DeepSeek Thinking Mode: Enable, Read & Compare

Complete guide to DeepSeek's thinking mode. How to enable thinking, read reasoning_content tokens, choose between high/max effort, and understand how visible reasoning differs from Claude's invisible extended thinking and GPT's CoT prompting.

June 11, 2026
DeepSeek, Thinking Mode, Reasoning, CoT, Prompt Engineering

DeepSeek's thinking mode is its most distinctive capability, and the one most likely to confuse users coming from other models. When enabled, the model returns reasoning_content (its chain-of-thought) alongside the final content. These reasoning tokens are visible, billed at output rates, and must be deliberately managed across turns. That combination sets it apart from both Claude's extended thinking and prompt-based chain-of-thought on GPT models.

How Thinking Mode Works

1. User sends prompt with thinking: { type: "enabled" }
2. Model generates reasoning_content (visible CoT tokens)
3. Model generates content (final answer)
4. Both are returned in the response
5. reasoning_content is billed as output tokens
6. In multi-turn: reasoning_content may or may not need to be passed back (depends on context)
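
Step 5 has a practical consequence: reasoning tokens can make up most of the output bill. The sketch below estimates how much of a response's output cost went to reasoning. The function name and the price in the example are made up for illustration, and it assumes the reasoning token count is available separately from the total; check the usage data your responses actually return and the current pricing page.

```python
def reasoning_cost_share(completion_tokens: int, reasoning_tokens: int,
                         usd_per_million_output: float) -> dict:
    """Split output cost into reasoning and answer portions.

    Assumes reasoning tokens are counted inside completion_tokens and billed
    at the same output rate, as described in the steps above.
    """
    if not 0 <= reasoning_tokens <= completion_tokens:
        raise ValueError("reasoning_tokens must be between 0 and completion_tokens")
    rate = usd_per_million_output / 1_000_000
    reasoning_cost = reasoning_tokens * rate
    total_cost = completion_tokens * rate
    return {
        "reasoning_usd": reasoning_cost,
        "answer_usd": total_cost - reasoning_cost,
        "reasoning_fraction": reasoning_tokens / completion_tokens if completion_tokens else 0.0,
    }

# Hypothetical numbers: 4,000 output tokens, 3,000 of them reasoning, $2 per million.
example = reasoning_cost_share(4000, 3000, 2.0)
print(example)
```

Logging this fraction per request makes it easier to see whether a higher effort level is worth its cost for a given workload.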

Enabling Thinking Mode

OpenAI-Compatible Format

from openai import OpenAI

client = OpenAI(
    api_key="<DeepSeek API Key>",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    reasoning_effort="high",  # or "max"
    extra_body={"thinking": {"type": "enabled"}}
)

# Access reasoning content
reasoning = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content

print(f"REASONING:\n{reasoning}\n")
print(f"ANSWER:\n{answer}")

Anthropic-Compatible Format

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.deepseek.com/anthropic",
    api_key="<DeepSeek API Key>"
)

response = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    output_config={"effort": "high"}  # or "max"
)

Thinking Mode vs Other Models

| Feature | DeepSeek Thinking | Claude Extended Thinking | GPT Chain-of-Thought |
| --- | --- | --- | --- |
| Reasoning visibility | Visible (reasoning_content) | Hidden (thinking stream, separate access) | Visible in output (clutters response) |
| Reasoning cost | Billed as output tokens | Billed at same rate, separate budget | Billed as output tokens |
| Effort control | high / max | Token budget (budget_tokens) | Not applicable |
| Temperature control | Disabled in thinking mode | Unaffected | Unaffected |
| Tool call support | Yes (V3.2+) | Yes | Not natively |
| Multi-turn | Must manage reasoning_content | Thinking is per-request, resets | CoT persists in message history |

Key Differences from Claude

  1. Visibility: DeepSeek's reasoning is always accessible. Claude's requires special API handling to access the thinking stream.

  2. Temperature: DeepSeek disables temperature and top_p in thinking mode. Claude doesn't — you can still control creativity.

  3. Budget control: Claude uses budget_tokens for thinking allocation. DeepSeek uses reasoning_effort levels. Claude gives you fine-grained token control; DeepSeek gives you coarse high/max.

  4. Multi-turn behavior: DeepSeek's reasoning persists across turns (you manage reasoning_content). Claude's thinking is per-request — each call is independent.
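
Managing reasoning_content across turns mostly comes down to deciding, for each assistant message, whether the reasoning goes back into the history. A minimal sketch, assuming messages are plain dicts in the OpenAI chat format. The helper name and the keep_reasoning flag are hypothetical, and whether a given endpoint expects reasoning to be dropped or returned should be confirmed in DeepSeek's documentation.

```python
from typing import Any

def to_history_message(assistant_message: dict[str, Any],
                       keep_reasoning: bool = False) -> dict[str, Any]:
    """Return a copy of an assistant message suitable for the next request.

    By default the reasoning_content field is dropped so that only the final
    answer is carried forward; pass keep_reasoning=True to send it back.
    """
    cleaned = dict(assistant_message)  # shallow copy, original stays untouched
    if not keep_reasoning:
        cleaned.pop("reasoning_content", None)
    return cleaned

# Hypothetical first-turn result, written as a dict for illustration.
first_reply = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "Let me work through the constraints...",
}

history = [
    {"role": "user", "content": "Solve this complex problem..."},
    to_history_message(first_reply),
    {"role": "user", "content": "Now explain the second step."},
]
```

Keeping the decision in one helper means the policy can change in a single place if the API's expectations differ between plain chat and tool-calling turns.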

Key Differences from GPT CoT

  1. Clean output: DeepSeek separates reasoning from answer. GPT's CoT is inline — your final answer is buried under a wall of "Let me think step by step..."

  2. Programmatic access: DeepSeek's reasoning_content is a separate field you can parse. GPT's CoT requires string parsing.

  3. Effort control: DeepSeek lets you dial reasoning up/down. GPT's CoT quality depends entirely on your prompt.
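
Point 2 is easiest to see side by side. The sketch below contrasts pulling an answer out of inline chain-of-thought text with reading separate fields. The "Final answer:" marker and both sample payloads are invented for illustration; inline CoT has no standard delimiter, which is the fragility being shown.

```python
import re

def split_inline_cot(text: str, marker: str = "Final answer:"):
    """Split prompt-style CoT output into (reasoning, answer) using a text marker.

    Returns (text, None) when the marker is missing, since there is then no
    reliable way to tell reasoning from answer.
    """
    match = re.search(re.escape(marker), text, flags=re.IGNORECASE)
    if match is None:
        return text.strip(), None
    return text[:match.start()].strip(), text[match.end():].strip()

# Inline CoT: reasoning and answer share one string.
inline_output = "Let me think step by step. 17 * 3 = 51, plus 4 is 55. Final answer: 55"
inline_reasoning, inline_answer = split_inline_cot(inline_output)

# Separate fields: no parsing needed (the dict stands in for a response message).
structured = {"reasoning_content": "17 * 3 = 51, plus 4 is 55.", "content": "55"}
field_answer = structured["content"]
```

The string-parsing path fails silently whenever the model phrases its conclusion differently, while the field-based path does not depend on wording at all.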

Reading reasoning_content

The reasoning content is your window into the model's thought process. Use it for:

Debugging Wrong Answers

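# answer_is_wrong comes from your own check (e.g. a failing test or a known result)
if answer_is_wrong:
    print(f"MODEL'S REASONING:\n{reasoning}")
    # Look for: faulty assumptions, skipped steps, premature conclusions
    # The faulty step is often, though not always, visible in the reasoning chain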

Quality Signal

# Rough heuristic, not a calibrated confidence score:
# long, structured reasoning tends to accompany careful answers;
# short or hand-wavy reasoning is worth a second look.

if len(reasoning) < 200 or "clearly" in reasoning.lower():
    print("Warning: Model may be overconfident or reasoning insufficiently")

Chain Verification

User prompt: "Verify your own reasoning before giving the final answer."

This prompts DeepSeek to include a self-check in reasoning_content:
"Let me verify: assumption A is correct because... assumption B holds given...
Wait — assumption B conflicts with constraint C. Let me reconsider..."

Streaming Mode

In streaming mode, reasoning_content and content arrive as separate chunks:

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    stream=True,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

reasoning_buffer = ""
answer_buffer = ""

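for chunk in stream:
    if not chunk.choices:  # some chunks may carry no choices
        continue
    delta = chunk.choices[0].delta
    # getattr guards against deltas that lack a reasoning_content attribute
    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        reasoning_buffer += reasoning_piece
    elif delta.content:
        answer_buffer += delta.content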
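
The same accumulation logic can be wrapped in a function so it can be exercised without a network call. A sketch, assuming each chunk exposes choices[0].delta with optional reasoning_content and content attributes as in the loop above. collect_stream and the fake chunks are illustrative names, not part of any SDK.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple

def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate streamed deltas into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:          # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        reasoning_piece = getattr(delta, "reasoning_content", None)
        answer_piece = getattr(delta, "content", None)
        if reasoning_piece:
            reasoning_parts.append(reasoning_piece)
        elif answer_piece:
            answer_parts.append(answer_piece)
    return "".join(reasoning_parts), "".join(answer_parts)

def fake_chunk(reasoning=None, content=None):
    """Build a stand-in chunk with the attribute layout assumed above."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

fake_stream = [
    fake_chunk(reasoning="First, check "),
    fake_chunk(reasoning="the constraints."),
    fake_chunk(content="The answer "),
    fake_chunk(content="is 7."),
    SimpleNamespace(choices=[]),       # a chunk with an empty choices list
]
reasoning_text, answer_text = collect_stream(fake_stream)
```

Collecting parts in lists and joining once avoids repeated string concatenation on long reasoning traces.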

Common Pitfall: In streaming mode, reasoning_content chunks arrive before content chunks. Don't display reasoning_content to end users unless you intend to show the thinking process; it can be verbose and confusing.

Pro Move: Use reasoning_content as a lightweight "second opinion" signal. If the reasoning seems sound (logical steps, constraint-aware, error-catching), the output is more likely to be trustworthy. If the reasoning is hand-wavy, re-prompt with reasoning_effort="max" or verify the output independently.
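
One way to act on this is a two-pass call: answer at high effort, inspect the reasoning, and retry at max only when it looks weak. A sketch with the model call injected as a function so the control flow can be tested offline. ask, looks_weak, and the 200-character threshold are illustrative assumptions, and the weakness check reuses the rough heuristic from the Quality Signal section rather than any calibrated measure.

```python
from typing import Callable, Tuple

# ask(prompt, effort) -> (reasoning, answer); in real use this would wrap the API call.
AskFn = Callable[[str, str], Tuple[str, str]]

def looks_weak(reasoning: str, min_chars: int = 200) -> bool:
    """Rough heuristic: very short reasoning or a hand-wavy 'clearly'."""
    return len(reasoning) < min_chars or "clearly" in reasoning.lower()

def answer_with_escalation(prompt: str, ask: AskFn) -> Tuple[str, str, str]:
    """Try effort 'high' first; retry once at 'max' if the reasoning looks weak.

    Returns (effort_used, reasoning, answer).
    """
    reasoning, answer = ask(prompt, "high")
    if not looks_weak(reasoning):
        return "high", reasoning, answer
    reasoning, answer = ask(prompt, "max")
    return "max", reasoning, answer

# Stand-in for the model: short reasoning at 'high', longer at 'max'.
def fake_ask(prompt: str, effort: str) -> Tuple[str, str]:
    if effort == "high":
        return "Clearly it is 4.", "4"
    return "Step-by-step check of each constraint. " * 10, "4"

effort_used, final_reasoning, final_answer = answer_with_escalation("2 + 2?", fake_ask)
```

Capping the escalation at a single retry keeps the worst-case cost predictable, since each pass bills its own reasoning tokens.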