Claude Code Cost Optimization

Cut Claude Code costs by 70-95% using shell scripts, provider switching, and task routing. When to use extended thinking vs cheap execution.

Claude Code provides access to some of the strongest reasoning models available, but they are expensive: Sonnet costs $15 per million output tokens, and a single long session can cost several dollars. This tutorial covers the patterns that keep costs under control without sacrificing output quality.

The Cost Problem

A typical Claude Code session:

| Activity | Tokens (in / out) | Cost (Sonnet) |
|---|---|---|
| Codebase indexing | 50,000 / 0 | $0.15 |
| Exploring a bug | 5,000 / 3,000 | $0.06 |
| Generating a fix | 2,000 / 500 | $0.01 |
| Generating tests | 1,000 / 4,000 | $0.06 |
| Code review of fix | 3,000 / 1,000 | $0.02 |
| Documentation update | 1,000 / 2,000 | $0.03 |
| Total | 62,000 / 10,500 (72,500 total) | $0.34 |

The indexing and exploration phases are unavoidable — Claude needs context. But test generation, bulk code output, and documentation are high-token, low-reasoning tasks. Those are the optimization targets.
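The per-activity costs in the table can be reproduced with a few lines of shell. This is a minimal sketch, not an official calculator: the function name token_cost is hypothetical, the $15 output rate comes from the introduction, and the $3 per million input rate is an assumption inferred from the table (50,000 input tokens for $0.15).

```shell
#!/bin/bash
# Sketch: estimate the cost of one activity from its token counts.
# Rates are USD per million tokens; the input rate is inferred, not quoted.
INPUT_RATE="${INPUT_RATE:-3}"     # assumption: $3 per million input tokens
OUTPUT_RATE="${OUTPUT_RATE:-15}"  # $15 per million output tokens

# token_cost INPUT_TOKENS OUTPUT_TOKENS -> prints cost in dollars, 4 decimals
token_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_rate="$INPUT_RATE" -v out_rate="$OUTPUT_RATE" \
      'BEGIN { printf "%.4f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

token_cost 5000 3000   # exploring a bug   -> 0.0600
token_cost 1000 4000   # generating tests  -> 0.0630
```

Summing the rows (62,000 input and 10,500 output tokens) with token_cost 62000 10500 gives 0.3435, which rounds to the $0.34 session total. Output tokens account for almost half of that cost despite being about a seventh of the volume, which is why generation-heavy tasks are the targets.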

Strategy 1: Task Routing

Not every prompt needs Claude-level reasoning. Route tasks by complexity:

Use Claude for:

  • Architecture decisions
  • Bug root cause analysis
  • Code review for correctness
  • Security audit
  • Multi-step refactoring plans

Route elsewhere for:

  • Test generation (DeepSeek: 95% cheaper)
  • Boilerplate code (DeepSeek)
  • Documentation generation (MiniMax or DeepSeek)
  • Commit message generation (DeepSeek)
  • Changelog generation (DeepSeek)

The routing decision is simple: does this task require reasoning, or generation? If generation, use a cheaper model.
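The two lists above can be written down as a lookup, which is handy if you want a script to pick the backend. This is only a sketch: the function name route_task and the category labels are hypothetical, and documentation is mapped to DeepSeek here although MiniMax would serve equally well.

```shell
#!/bin/bash
# Sketch: map a task category to a backend. Category names are made up
# for illustration; adjust them to your own workflow.
# route_task CATEGORY -> prints "claude" or "deepseek"
route_task() {
  case "$1" in
    architecture|root-cause|review|security|refactor-plan)
      echo "claude" ;;     # reasoning-heavy work stays on Claude
    tests|boilerplate|docs|commit-message|changelog)
      echo "deepseek" ;;   # generation-heavy work goes to the cheaper model
    *)
      echo "claude" ;;     # unknown categories default to the stronger model
  esac
}

route_task tests      # prints: deepseek
route_task security   # prints: claude
```

Defaulting unknown categories to Claude trades a little cost for safety: a misrouted reasoning task is more expensive to fix than a misrouted generation task.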

Strategy 2: Shell Scripts for Bulk Generation

Claude Code can invoke shell scripts via the bash tool. The pattern: Claude designs the approach, a script calls a cheaper API for bulk generation, Claude reviews the output.

The helper script

Create ~/.claude/hooks/cheap-gen.sh:

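#!/bin/bash
# Routes generation requests to the DeepSeek API.
# Pipe a prompt to stdin, get generated content on stdout.
set -euo pipefail

# Fail early with a clear message if the API key is missing
: "${DEEPSEEK_API_KEY:?Set DEEPSEEK_API_KEY before running this script}"

PROMPT=$(cat)
MODEL="${DEEPSEEK_MODEL:-deepseek-chat}"

# Build the JSON body with jq so quotes and newlines in the prompt are escaped
BODY=$(jq -n \
  --arg model "$MODEL" \
  --arg prompt "$PROMPT" \
  '{
    model: $model,
    messages: [{role: "user", content: $prompt}],
    temperature: 0.3,
    max_tokens: 4096
  }')

# --fail makes curl exit non-zero on HTTP errors; jq -e does the same
# when the response contains no message content
curl -sS --fail https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d "$BODY" | jq -er '.choices[0].message.content'

Make the script executable:

chmod +x ~/.claude/hooks/cheap-gen.sh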

Using the script

In a Claude Code session, pipe generation tasks through the script:

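# Generate tests for a module. The script cannot read your repository,
# so include the source file in the prompt.
{
  echo "Generate Jest unit tests for the module below (src/auth.ts), covering
login, logout, token refresh, and rate limiting. Include edge cases
for expired tokens and invalid credentials. Output only the test file."
  echo
  cat src/auth.ts
} | ~/.claude/hooks/cheap-gen.sh > __tests__/auth.test.ts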

Claude designs the test strategy and reviews the output. DeepSeek generates the roughly 4,000 tokens of test code for a fraction of a cent (about $0.001-0.005, depending on current DeepSeek pricing) instead of $0.06.
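Chat models often wrap code in Markdown fences and add a sentence of commentary, all of which would land in the test file verbatim. A small filter between the helper and the output file can handle that. The sketch below is an assumption about how you might do it, not part of the helper script, and strip_fences is a hypothetical name.

```shell
#!/bin/bash
# strip_fences: read model output on stdin. If it contains Markdown code
# fences, print only the lines inside them; otherwise print everything.
strip_fences() {
  local fence
  fence=$(printf '\140\140\140')   # three backticks, written as octal escapes
  awk -v fence="$fence" '
    index($0, fence) == 1 { inside = !inside; seen = 1; next }  # fence line: toggle and drop
    inside { print; next }                                      # keep fenced content
    { rest = rest $0 "\n" }                                     # buffer unfenced lines
    END { if (!seen) printf "%s", rest }                        # no fences: pass through
  '
}
```

Used in the pipeline, it sits right after the helper: ~/.claude/hooks/cheap-gen.sh | strip_fences > __tests__/auth.test.ts. If the response contains several fenced blocks, their contents are concatenated, so it is still worth having Claude review the result.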

When to use the script

| Task | Script? | Reason |
|---|---|---|
| Test generation | Yes | High volume, deterministic output |
| API endpoint stubs | Yes | Pattern is consistent across endpoints |
| Documentation | Yes | Volume task, Claude reviews for accuracy |
| Changelog generation | Yes | Templates from git log |
| Bug fixes | No | Needs reasoning, not generation |
| Architecture design | No | The whole point of using Claude |
| Security audit | No | Cannot trust cheap model for security |

Strategy 3: Minimize Context Bloat

Claude charges for input tokens too. Every message in the conversation history counts. Longer sessions = more tokens = higher cost.

Use CLAUDE.md aggressively

Put the project facts Claude needs in CLAUDE.md. It is loaded once at session start, so you don't have to restate them in chat:

## Project Conventions
- TypeScript strict mode
- Use Zod for validation
- Tests use Vitest
- Commit messages follow conventional commits

## Common Commands
- Build: npm run build
- Lint: npm run lint
- Test: npm test
- Type check: npm run typecheck

## Frequently Referenced Files
- src/types.ts — Core type definitions
- src/config.ts — App configuration
- src/lib/api.ts — API client

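Each time you restate a convention in chat, it is added to the conversation history again. CLAUDE.md is loaded into context once at session start, so keep conventions there instead of repeating them. This can save an estimated 500-2,000 tokens per turn.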

Compact long sessions

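After 20+ turns, the conversation history alone can reach 40,000 input tokens. At Sonnet's input rate of $3 per million tokens, every new message then costs about $0.12 just for context. Use /compact to replace the history with a summary and continue with a much smaller context.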
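To see how quickly this adds up, here is a rough model of cumulative input cost. It is a sketch under simplifying assumptions: every turn adds a fixed number of tokens to the history, the whole history is re-sent as input on each turn, the rate is $3 per million input tokens, and any prompt-caching discount is ignored. The function name context_cost is hypothetical.

```shell
#!/bin/bash
# context_cost TURNS TOKENS_PER_TURN [RATE] -> cumulative input cost in dollars
# Turn k re-sends k * TOKENS_PER_TURN tokens, so the total over n turns is
# TOKENS_PER_TURN * n * (n + 1) / 2.
context_cost() {
  awk -v n="$1" -v per_turn="$2" -v rate="${3:-3}" \
    'BEGIN {
       total = per_turn * n * (n + 1) / 2
       printf "%.2f\n", total * rate / 1000000
     }'
}

context_cost 20 2000   # one 20-turn session           -> 1.26
context_cost 10 2000   # each half if compacted midway -> 0.33
```

With 2,000 tokens added per turn, the history reaches 40,000 tokens at turn 20, matching the figure above. Under this model a single 20-turn session costs about $1.26 in input alone, while compacting at turn 10 gives two halves of about $0.33 each (plus the small cost of the summary), because cost grows with the square of the session length.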

Avoid reading unnecessary files

Claude reads the files you reference. If you say "check src/components/" and there are 30 files, Claude may read all 30. Be specific: "check src/components/Button.tsx".
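Before pointing Claude at a set of files, you can get a rough idea of how many tokens they represent. The sketch below assumes about 4 bytes per token, which is only a rule of thumb for English text and code, not the real tokenizer; estimate_tokens is a hypothetical helper name.

```shell
#!/bin/bash
# estimate_tokens FILE... -> rough token count for the given files,
# assuming ~4 bytes per token (a heuristic, not the actual tokenizer)
estimate_tokens() {
  cat -- "$@" | wc -c | awk '{ printf "%d\n", $1 / 4 }'
}
```

Comparing estimate_tokens src/components/*.tsx with estimate_tokens src/components/Button.tsx shows the difference between a broad and a specific reference before you pay for it.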

Strategy 4: Model Selection

Extended thinking — use sparingly

Extended thinking can double or triple the cost per response, because thinking tokens are billed as output tokens. Enable it only for:

  • Complex multi-step architecture decisions
  • Bug investigations where standard reasoning failed
  • Security audits requiring deep analysis

For everything else, standard mode is sufficient.

Haiku for quick tasks

Claude Haiku costs a fraction of Sonnet's price (up to roughly 90% cheaper, depending on the Haiku version). For simple questions, code explanations, and quick lookups:

claude --model haiku

Use Haiku for the "quick question" turns that don't need reasoning depth. Save Sonnet for the actual work.

Opus — almost never worth it

Opus costs 5x more than Sonnet. The quality difference is marginal for coding tasks. Use Sonnet. The cost difference adds up fast:

| Model | Per turn (10,000 output tokens) | Per day (50 turns) | Per month (20 days) |
|---|---|---|---|
| Sonnet | $0.15 | $7.50 | $150 |
| Opus | $0.75 | $37.50 | $750 |

Strategy 5: Session Discipline

One session, one task

Don't keep a session alive across tasks. Context accumulates. Start fresh for each feature or bug. Use CLAUDE.md to preserve project conventions across sessions.

Close sessions when done

Idle sessions still hold context in memory. Claude Code doesn't charge for idle time, but if you come back to an old session, you're paying for the entire stale context on the next message. Close it.

Use the right tool for the job

Claude Code is a coding agent. For non-coding questions, use the Claude web interface or API directly — you don't need the codebase context or tool overhead.

# Instead of asking Claude Code "what's the GitLab API rate limit",
# look it up directly in the docs
curl -sL https://docs.gitlab.com/api/ | grep -i "rate limit"

Cost Tracking

Check session cost

# During a session, run the /cost slash command to see
# token usage and estimated cost so far

# For account-wide usage, check the Anthropic console:
# console.anthropic.com

Set budget alerts

In the Anthropic console, set usage alerts at $5, $10, and $25. You'll get an email before costs surprise you.
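If you also track spend yourself, the same thresholds are easy to check locally. This is a sketch of a local complement to the console alerts, not a console or API feature: budget_alerts is a hypothetical function, and the spend figure is whatever you copy from the console or from a session's reported cost.

```shell
#!/bin/bash
# budget_alerts SPEND -> prints one line per threshold already reached.
# Thresholds mirror the $5, $10, and $25 alerts suggested above.
budget_alerts() {
  local spend="$1" limit
  for limit in 5 10 25; do
    # awk handles the decimal comparison; exit status 0 means "reached"
    if awk -v s="$spend" -v l="$limit" 'BEGIN { exit !(s + 0 >= l + 0) }'; then
      echo "ALERT: spend \$$spend has reached the \$$limit threshold"
    fi
  done
}

budget_alerts 12.40   # prints alerts for the $5 and $10 thresholds
```

Running it from a daily cron job or a shell prompt hook is one way to notice a costly day early.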

Monthly cost benchmark

| Usage Level | Typical Cost | Optimization Target |
|---|---|---|
| Light (1-2 hrs/day) | $20-50/mo | Haiku for quick tasks |
| Medium (4-6 hrs/day) | $80-150/mo | Shell scripts, task routing |
| Heavy (8+ hrs/day) | $200-400/mo | All strategies, consider OpenCode |

Quick Reference: Cost by Task

| Task | Recommended Model | Approx. Cost |
|---|---|---|
| Architecture design | Sonnet, extended | $0.05-0.15 |
| Bug investigation | Sonnet | $0.03-0.08 |
| Feature implementation | Sonnet | $0.05-0.20 |
| Test generation | DeepSeek via script | $0.001-0.005 |
| Documentation | DeepSeek via script | $0.002-0.01 |
| Code review | Sonnet | $0.02-0.05 |
| Quick question | Haiku | $0.001-0.005 |
| Refactoring plan | Sonnet, extended | $0.05-0.10 |
| Boilerplate/CRUD | DeepSeek via script | $0.001-0.003 |