DeepSeek Reasoning Effort Control: High vs Max
Master DeepSeek's reasoning_effort parameter. When high vs max effort, cost implications, diminishing returns curve, and which task categories benefit most from maximum reasoning.
DeepSeek's reasoning_effort parameter is the only dial you have for controlling thinking depth — temperature and top_p are disabled in thinking mode. The two effective levels are high (default) and max. But "max" isn't always better. It costs more tokens, adds latency, and on some tasks, overthinking degrades quality.
Effort Level Comparison
| Aspect | high (default) | max |
|---|---|---|
| Reasoning depth | Thorough, multi-step | Exhaustive, explores alternatives |
| Token consumption | Baseline | +50-200% more reasoning tokens |
| Latency | Baseline | +30-100% more time |
| Best for | Most reasoning tasks | Complex proofs, architecture, strategy |
| Overkill on | Simple analysis, creative tasks | Most non-reasoning tasks |
| Agent contexts | Default for API | Auto-set by Claude Code, OpenCode |
When to Use effort=high
High is the correct default for most tasks:
- Standard reasoning tasks: Logic problems, analysis, explanation
- Code generation: Function-level coding, bug fixes, refactoring
- Document Q&A: Answering questions about provided documents
- Classification & extraction: Structured output tasks
- Creative writing: Where overthinking kills creativity
- Cost-sensitive workflows: Where token budget matters
Use high when:
- The task benefits from reasoning but doesn't require exhaustive exploration
- You're optimizing for speed and cost
- The problem has a clear solution path
- You're not sure — high is the safer default
When to Use effort=max
Max is justified for problems where errors cascade:
- Mathematical proofs: Multi-step derivations where each step must be verified
- Architecture decisions: System design with competing constraints and tradeoffs
- Complex debugging: Multi-hop error diagnosis across system boundaries
- Strategic analysis: Exploring scenarios, identifying hidden assumptions
- Competitive programming: Algorithmic problems with edge cases
- Legal/regulatory reasoning: Where missing a clause has real consequences
- Agentic coding: Auto-set by Claude Code and OpenCode for complex agent tasks
Use max when:
- The cost of a wrong answer exceeds the cost of extra tokens
- The problem has multiple valid approaches you want the model to consider
- You need the model to catch its own edge cases
- You're debugging and need to see exhaustive reasoning
The Diminishing Returns Curve
For most tasks, quality follows an S-curve but with fewer steps than Claude's token-budget approach:
| Effort | Quality Gain | ROI |
|---|---|---|
| Non-thinking → high | Large jump (reasoning enabled) | Excellent |
| high → max | Marginal improvement for most tasks | Fair to poor |
| Max (on the right task) | Significant for complex reasoning | Excellent |
The jump from no-thinking to high is the largest quality gain. high → max provides meaningful gains only on the specific task categories listed above. On typical Q&A, summarization, or coding tasks, max spends more tokens for indistinguishable output quality.
Cost Implications
Reasoning tokens are billed at output token rates:
| Model | Cost per 1M thinking tokens | Cost for 4K thinking tokens |
|---|---|---|
| Pro (high) | $0.87 | ~$0.0035 |
| Pro (max) | $0.87 | ~$0.007-0.014 (2-4x more tokens) |
| Flash (high) | $0.28 | ~$0.0011 |
| Flash (max) | $0.28 | ~$0.0022-0.0045 |
At DeepSeek's pricing, even max on Pro is cheaper than Claude's base API call. But the relative difference matters at scale: 1M requests at max vs high on Pro costs $3,500-$10,500 more.
When Thinking Mode Hurts
Some tasks are better without thinking mode:
| Task | Why thinking degrades quality |
|---|---|
| Creative writing | Over-rationalizing kills spontaneity and voice |
| Simple translation | Adds latency, no quality gain |
| Conversational chat | Thinking tokens are wasted on social conventions |
| Routine classification | Deterministic task, reasoning is overhead |
| JSON extraction (known schema) | JSON mode alone is sufficient |
Note:
Pro Move: For Claude Code integration, set CLAUDE_CODE_EFFORT_LEVEL=max for the main agent and leave sub-agents at default (high). The main agent benefits from exhaustive reasoning during planning; sub-agents executing specific instructions don't.
Note:
Don't confuse effort with correctness: In thinking mode, low and medium effort values are silently mapped to high. If you're testing different levels, only high and max are real. xhigh maps to max for compatibility.
Related Pages
- Thinking Mode Guide — Foundation: how to enable thinking mode and read
reasoning_contenttokens. - Multi-Turn Reasoning — Effort levels interact with multi-turn behavior — tool-call chains auto-set effort to
max.
Related Articles
1980s Neon & Synthwave SREF Codes
Bold neon colors, geometric patterns, and new wave aesthetics with authentic 80s period color grading.
Fantasy & Isekai SREF Codes for Midjourney
Epic fantasy worlds with detailed environments and RPG-inspired aesthetics for Midjourney prompts.
Midjourney Weapon Design: Master Prompts for Swords, Guns & More
Master Midjourney weapon design and creation. Learn to craft stunning swords, futuristic guns, and fantasy armaments with advanced prompting techniques and detailed examples.