Claude Code Cost Optimization

Claude Code provides access to some of the strongest reasoning models available, but they are expensive: Sonnet output is priced at $15 per million tokens, and a single long session can cost several dollars. This tutorial covers the patterns that keep costs under control without sacrificing output quality.

The Cost Problem

A typical Claude Code session:

Activity	Tokens (in/out)	Cost (Sonnet)
Codebase indexing	50,000 / 0	$0.15
Exploring a bug	5,000 / 3,000	$0.06
Generating a fix	2,000 / 500	$0.01
Generating tests	1,000 / 4,000	$0.06
Code review of fix	3,000 / 1,000	$0.02
Documentation update	1,000 / 2,000	$0.03
Total	62,000 / 10,500	$0.34

The indexing and exploration phases are unavoidable — Claude needs context. But test generation, bulk code output, and documentation are high-token, low-reasoning tasks. Those are the optimization targets.
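The per-row figures in the table can be reproduced with a small helper. This is a minimal sketch, not an official calculator: the function name session_cost is hypothetical, the $15 per million output rate is the one quoted above, and the $3 per million input rate is an assumption inferred from the indexing row (50,000 input tokens for $0.15).

```shell
#!/bin/bash
# Estimate the cost in dollars of one activity from its token counts.
# Assumed rates: $3 per million input tokens (inferred from the table),
# $15 per million output tokens (stated in the introduction).
session_cost() {
  local input_tokens="$1" output_tokens="$2"
  local input_rate="${3:-3}" output_rate="${4:-15}"
  awk -v i="$input_tokens" -v o="$output_tokens" \
      -v ir="$input_rate" -v orate="$output_rate" \
      'BEGIN { printf "%.2f\n", (i * ir + o * orate) / 1000000 }'
}

session_cost 50000 0       # codebase indexing -> 0.15
session_cost 1000 4000     # test generation   -> 0.06
session_cost 62000 10500   # whole session     -> 0.34
```

Note how the test-generation row costs as much as bug exploration even though it needs far less reasoning: output tokens are five times as expensive as input tokens at these rates.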

Strategy 1: Task Routing

Not every prompt needs Claude-level reasoning. Route tasks by complexity:

Use Claude for:

Architecture decisions
Bug root cause analysis
Code review for correctness
Security audit
Multi-step refactoring plans

Route elsewhere for:

Test generation (DeepSeek: 95% cheaper)
Boilerplate code (DeepSeek)
Documentation generation (MiniMax or DeepSeek)
Commit message generation (DeepSeek)
Changelog generation (DeepSeek)

The routing decision is simple: does this task require reasoning, or generation? If generation, use a cheaper model.
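The routing rule can be written down as a lookup so that it is applied consistently. The sketch below is illustrative only: the function name route_task and the task labels are hypothetical, and falling back to Claude for unknown tasks is an assumption (it trades cost for safety when a task has not been classified).

```shell
#!/bin/bash
# Map a task label to the provider that should handle it.
# Labels are made up for this sketch; adapt them to your own workflow.
route_task() {
  case "$1" in
    # Reasoning-heavy work stays on Claude
    architecture|root-cause|code-review|security-audit|refactor-plan)
      echo "claude" ;;
    # Generation-heavy work goes to the cheaper model
    tests|boilerplate|docs|commit-message|changelog)
      echo "deepseek" ;;
    # Unclassified tasks default to Claude (assumption: prefer quality when unsure)
    *)
      echo "claude" ;;
  esac
}

route_task tests            # prints: deepseek
route_task security-audit   # prints: claude
```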

Strategy 2: Shell Scripts for Bulk Generation

Claude Code can invoke shell scripts via the bash tool. The pattern: Claude designs the approach, a script calls a cheaper API for bulk generation, Claude reviews the output.

The helper script

Create ~/.claude/hooks/cheap-gen.sh:

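#!/bin/bash
# Routes generation requests to the DeepSeek API.
# Pipe a prompt to stdin, get generated content on stdout.
set -euo pipefail

# Fail early with a clear message if the API key is missing
: "${DEEPSEEK_API_KEY:?Set DEEPSEEK_API_KEY before running this script}"

PROMPT=$(cat)
MODEL="${DEEPSEEK_MODEL:-deepseek-chat}"

# --fail makes curl exit non-zero on HTTP errors; jq -e exits non-zero when
# the response has no content, so an API error is not passed off as output
curl -sS --fail https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d "$(jq -n \
    --arg model "$MODEL" \
    --arg prompt "$PROMPT" \
    '{
      model: $model,
      messages: [{role: "user", content: $prompt}],
      temperature: 0.3,
      max_tokens: 4096
    }')" | jq -er '.choices[0].message.content'

Make it executable:

chmod +x ~/.claude/hooks/cheap-gen.sh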

Using the script

In a Claude Code session, pipe generation tasks through the script:

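# Generate tests for a module. The script only sees what is piped in,
# so include the source file in the prompt.
{
  echo "Generate Jest unit tests for the module below covering login,
logout, token refresh, and rate limiting. Include edge cases
for expired tokens and invalid credentials. Output only code."
  echo
  cat src/auth.ts
} | ~/.claude/hooks/cheap-gen.sh > __tests__/auth.test.ts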

Claude designs the test strategy and reviews the output. DeepSeek generates 4,000 tokens of test code for $0.001 instead of $0.06.
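Chat models often wrap generated code in Markdown fences, which would break a file written straight from the script's output. A small filter can drop those fence lines first. This is a rough sketch under that assumption: strip_fences is a hypothetical helper, and it removes every line that consists only of a fence marker rather than parsing the response properly.

```shell
#!/bin/bash
# Remove lines that contain nothing but a Markdown fence marker
# (three backticks, optionally followed by a language tag).
# Everything else, including the code between the fences, passes through.
strip_fences() {
  sed -E '/^`{3}[A-Za-z0-9_-]*[[:space:]]*$/d'
}

# Usage (paths as in the example above):
#   ... | ~/.claude/hooks/cheap-gen.sh | strip_fences > __tests__/auth.test.ts
```

Any prose the model adds outside the fences is kept, so Claude's review step should still check the file before it is committed.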

When to use the script

Task	Script?	Reason
Test generation	Yes	High volume, predictable structure
API endpoint stubs	Yes	Pattern is consistent across endpoints
Documentation	Yes	Volume task, Claude reviews for accuracy
Changelog generation	Yes	Templates from git log
Bug fixes	No	Needs reasoning, not generation
Architecture design	No	The whole point of using Claude
Security audit	No	Cannot trust cheap model for security

Strategy 3: Minimize Context Bloat

Claude charges for input tokens too. Every message in the conversation history counts. Longer sessions = more tokens = higher cost.

Use CLAUDE.md aggressively

Put the conventions and commands Claude needs in CLAUDE.md. Claude Code loads the file at session start, so you do not have to restate them in chat:

## Project Conventions
- TypeScript strict mode
- Use Zod for validation
- Tests use Vitest
- Commit messages follow conventional commits

## Common Commands
- Build: npm run build
- Lint: npm run lint
- Test: npm test
- Type check: npm run typecheck

## Frequently Referenced Files
- src/types.ts — Core type definitions
- src/config.ts — App configuration
- src/lib/api.ts — API client

Every convention you restate in chat becomes part of the conversation history and is re-sent on every later turn. Put it in CLAUDE.md instead, where it is loaded once per session. This can save roughly 500-2,000 tokens per turn.

Compact long sessions

After 20+ turns, the conversation history alone can reach 40,000 input tokens. At Sonnet's input rate, every new message then costs about $0.12 just for context. Use /compact to replace the history with a summary and continue from there.
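The arithmetic behind that figure, and the way it compounds over a session, can be sketched as follows. The function names are hypothetical; the $3 per million input rate is inferred from $0.12 for 40,000 tokens, the growth of 2,000 tokens per turn is an assumption (40,000 tokens spread evenly over 20 turns), and any prompt-caching discount is ignored.

```shell
#!/bin/bash
# Cost in dollars of sending a context of a given size once.
context_cost() {
  awk -v t="$1" -v r="${2:-3}" 'BEGIN { printf "%.2f\n", t * r / 1000000 }'
}

# Total input cost over a session whose history grows by a fixed number of
# tokens each turn: turn k re-sends k * growth tokens, so the sum is
# growth * n * (n + 1) / 2.
cumulative_context_cost() {
  local turns="$1" tokens_per_turn="$2" rate="${3:-3}"
  awk -v n="$turns" -v g="$tokens_per_turn" -v r="$rate" \
    'BEGIN { printf "%.2f\n", g * n * (n + 1) / 2 * r / 1000000 }'
}

context_cost 40000               # one message at turn 20 -> 0.12
cumulative_context_cost 20 2000  # all 20 turns together  -> 1.26
```

Because the total grows with the square of the turn count, compacting halfway through a long session saves more than half of the remaining context cost.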

Avoid reading unnecessary files

Claude reads the files you reference. If you say "check src/components/" and there are 30 files, Claude may read all 30. Be specific: "check src/components/Button.tsx".
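To get a feel for the difference before pointing Claude at a path, you can estimate its size in tokens. The sketch below assumes roughly 4 bytes per token, which is a common rule of thumb and not the actual tokenizer; estimate_tokens is a hypothetical helper name.

```shell
#!/bin/bash
# Rough token estimate for one or more files or directories.
# Assumption: about 4 bytes per token (heuristic, not a real tokenizer).
estimate_tokens() {
  find "$@" -type f -exec cat {} + | wc -c | awk '{ printf "%d\n", $1 / 4 }'
}

# Usage:
#   estimate_tokens src/components/             # the whole directory
#   estimate_tokens src/components/Button.tsx   # just the file you need
```

Comparing the two numbers shows how much input a vague reference can add to a single turn.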

Strategy 4: Model Selection

Extended thinking — use sparingly

Extended thinking can double or triple the cost per response, because thinking tokens are billed as output tokens. Enable it only for:

Complex multi-step architecture decisions
Bug investigations where standard reasoning failed
Security audits requiring deep analysis

For everything else, standard mode is sufficient.

Haiku for quick tasks

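Claude Haiku is substantially cheaper than Sonnet, roughly 70-90% depending on the Haiku version. For simple questions, code explanations, and quick lookups:

# The ID below is Claude 3.5 Haiku; substitute whichever Haiku ID your account lists
claude --model claude-3-5-haiku-20241022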

Use Haiku for the "quick question" turns that don't need reasoning depth. Save Sonnet for the actual work.

Opus — almost never worth it

Opus costs 5x more than Sonnet. For most coding tasks the quality difference is marginal, so default to Sonnet. The cost difference adds up fast:

Model	Per turn (10,000 output tokens)	Per day (50 turns)	Per month (20 days)
Sonnet	$0.15	$7.50	$150
Opus	$0.75	$37.50	$750
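The table is a straight multiplication, which makes it easy to plug in your own usage. In this sketch the function name monthly_output_cost is hypothetical, the Sonnet rate is the $15 per million output tokens quoted earlier, and the Opus rate of $75 per million is derived from the "5x" figure above rather than from a price list. Input tokens are left out, as they are in the table.

```shell
#!/bin/bash
# Monthly output-token cost in dollars:
# tokens per turn * turns per day * working days * rate per million tokens.
monthly_output_cost() {
  local tokens_per_turn="$1" turns_per_day="$2" days_per_month="$3" rate_per_million="$4"
  awk -v t="$tokens_per_turn" -v n="$turns_per_day" \
      -v d="$days_per_month" -v r="$rate_per_million" \
      'BEGIN { printf "%.2f\n", t * n * d * r / 1000000 }'
}

monthly_output_cost 10000 50 20 15   # Sonnet -> 150.00
monthly_output_cost 10000 50 20 75   # Opus   -> 750.00
```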

Strategy 5: Session Discipline

One session, one task

Don't keep a session alive across tasks. Context accumulates. Start fresh for each feature or bug. Use CLAUDE.md to preserve project conventions across sessions.

Close sessions when done

Idle sessions still hold context in memory. Claude Code doesn't charge for idle time, but if you come back to an old session, you're paying for the entire stale context on the next message. Close it.

Use the right tool for the job

Claude Code is a coding agent. For non-coding questions, use the Claude web interface or API directly — you don't need the codebase context or tool overhead.

# Instead of asking Claude Code "what's the GitLab API rate limit",
# look it up directly or ask in the web interface
curl -sL https://docs.gitlab.com/api/ | grep -i "rate limit"

Cost Tracking

Check session cost

# During a session, run the /cost command to see token usage and spend so far
/cost

# Or check usage in the Anthropic console: console.anthropic.com

Set budget alerts

In the Anthropic console, set usage alerts at $5, $10, and $25. You'll get an email before costs surprise you.

Monthly cost benchmark

Usage Level	Typical Cost	Optimization Target
Light (1-2 hrs/day)	$20-50/mo	Haiku for quick tasks
Medium (4-6 hrs/day)	$80-150/mo	Shell scripts, task routing
Heavy (8+ hrs/day)	$200-400/mo	All strategies, consider OpenCode

Quick Reference: Cost by Task

Task	Recommended Model	Approx. Cost
Architecture design	Sonnet, extended	$0.05-0.15
Bug investigation	Sonnet	$0.03-0.08
Feature implementation	Sonnet	$0.05-0.20
Test generation	DeepSeek via script	$0.001-0.005
Documentation	DeepSeek via script	$0.002-0.01
Code review	Sonnet	$0.02-0.05
Quick question	Haiku	$0.001-0.005
Refactoring plan	Sonnet, extended	$0.05-0.10
Boilerplate/CRUD	DeepSeek via script	$0.001-0.003

Claude Code Patterns — Claude driving Gemini with Ralph loops
Offload Bulk Generation to DeepSeek — Detailed setup for the DeepSeek helper script
Multi-Model Workflows — Provider switching with OpenCode
Claude Code Getting Started — Installation and first session
