DeepSeek Thinking Mode: Reasoning Tokens & Effort Control

Master DeepSeek's thinking mode with visible reasoning_content tokens. Learn to enable, read, and manage reasoning across turns — and how it differs from Claude's invisible extended thinking.

June 11, 2026
DeepSeekThinking ModeReasoningPrompt EngineeringCoT

DeepSeek's thinking mode is unlike anything from other providers. When enabled, the model outputs reasoning_content tokens — its chain-of-thought reasoning — alongside the final content. These tokens are visible, billable at output rates, and must be managed deliberately across conversation turns. This is fundamentally different from Claude's invisible extended thinking stream, where reasoning is hidden unless you explicitly access it.

The visibility of reasoning_content is both a superpower and a constraint. It's a superpower because you can debug the model's reasoning directly, understand where it went wrong, and use the reasoning as a quality signal. It's a constraint because you must decide whether to pass reasoning back to the API in subsequent turns — get this wrong and you'll get 400 errors or degraded multi-turn performance.

Note:

Key difference from Claude: DeepSeek's reasoning_content is always accessible. Claude's thinking stream requires special API handling. DeepSeek disables temperature and top_p in thinking mode — reasoning behavior is controlled solely by reasoning_effort.

What You'll Find Here

Thinking Mode Guide

How to enable thinking mode, read reasoning_content, and choose effort level. The fundamental differences from Claude's invisible extended thinking and GPT's chain-of-thought prompting. Stream and non-stream patterns.

Reasoning Effort Control

When to use high vs max effort. Cost implications — effort levels change token consumption. What tasks benefit most from reasoning, and where reasoning adds cost without benefit. The diminishing returns curve for effort levels.

Multi-Turn Reasoning

Managing reasoning_content across conversation turns. The critical distinction: optional passback for chat turns vs mandatory passback for tool-call loops. How to avoid 400 errors when reasoning content is mishandled.