OpenAI Agents SDK Setup Guide

Complete setup and configuration guide for OpenAI's official Agents SDK. Handoffs, guardrails, sandbox agents, tracing, sessions, MCP integration, and realtime voice agents.

June 12, 2026
openaiagents-sdkpythonhandoffsguardrailstracingvoice-agents

OpenAI Agents SDK

OpenAI's official Python SDK for building agentic applications. It provides a lightweight runtime with agents, handoffs, guardrails, and built-in tracing. Designed to be the production-ready successor to OpenAI's earlier Swarm experiment — same simplicity, real-world reliability.

The SDK uses the Responses API by default but adds a runtime layer: the Runner manages the agent loop, tool dispatch, guardrail execution, and handoffs. You focus on defining agents and tools — the SDK handles the rest.

Note:

Use the Agents SDK when you want the runtime to manage agent execution for you (tool dispatching, guardrails, handoffs, sessions). Use the Responses API directly when you want to own the loop and tool dispatch yourself. They're composable — many apps use both.

Installation

1

Install the SDK

Python 3.8+ required.

pip install openai-agents
2

Configure API Key

Set your OpenAI API key. The SDK reads from the environment by default.

export OPENAI_API_KEY=sk-...
3

Verify Installation

Run a minimal agent.

from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
result = Runner.run_sync(agent, "Write a haiku about recursion.")
print(result.final_output)

Core Concepts

Architecture Overview

Agent
An LLM with instructions, tools, and optional handoffs. The fundamental building block. Agents can have guardrails, output types, and model overrides.

Values: Agent(name, instructions, tools, handoffs, model)

Runner
The execution runtime. Manages the agent loop: call LLM → execute tools → repeat until done. Handles guardrails, handoffs, and tracing.

Values: Runner.run_sync() | Runner.run() (async)

Handoff
One agent delegates to another by returning a handoff. The runner transfers control. Use for specialized sub-agents.

Values: agent_1.handoffs = [agent_2]

Guardrail
Input/output validation that runs in parallel with agent execution. Fails fast when checks don't pass.

Values: @input_guardrail, @output_guardrail

Handoffs: Agents Calling Agents

The killer feature — agents that delegate to specialized sub-agents.

from agents import Agent, Runner

billing_agent = Agent(
    name="Billing Agent",
    instructions="You handle billing inquiries: charges, invoices, payment methods, refunds.",
)

support_agent = Agent(
    name="Support Agent",
    instructions="You handle general support. For billing questions, hand off to the Billing Agent.",
    handoffs=[billing_agent]
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route users to the right agent. For billing, hand off to Billing Agent. For everything else, hand off to Support Agent.",
    handoffs=[billing_agent, support_agent]
)

result = Runner.run_sync(triage_agent, "I was charged twice for my subscription.")
# The triage agent automatically hands off to billing_agent
print(result.final_output)

Guardrails

Validation that runs alongside agent execution. Input guardrails check prompts before the LLM sees them. Output guardrails check responses before they reach the user.

from agents import Agent, Runner, input_guardrail, output_guardrail, GuardrailFunctionOutput

@input_guardrail
async def no_pii_guardrail(context, agent, input):
    # Check if input contains PII
    pii_patterns = ["credit card", "ssn", "social security"]
    if any(p in str(input).lower() for p in pii_patterns):
        return GuardrailFunctionOutput(
            output_info={"blocked": True},
            tripwire_triggered=True  # This halts execution
        )
    return GuardrailFunctionOutput(allow=True)

@output_guardrail
async def check_toxicity(context, agent, output):
    # Run output through moderation
    if "harmful" in str(output).lower():
        return GuardrailFunctionOutput(tripwire_triggered=True)
    return GuardrailFunctionOutput(allow=True)

agent = Agent(
    name="Safe Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[no_pii_guardrail],
    output_guardrails=[check_toxicity]
)

Sandbox Agents

Run agents in isolated workspaces for code execution, file operations, and shell commands.

from agents import sandbox

# Docker-based sandbox
sandbox_agent = sandbox.SandboxAgent(
    name="Code Runner",
    instructions="Write and execute Python code in the sandbox.",
    sandbox=sandbox.DockerSandbox(image="python:3.12-slim"),
    capabilities=["filesystem", "shell"]
)

result = Runner.run_sync(
    sandbox_agent,
    "Clone the repo at github.com/example/project and list all Python files."
)

Sessions: Persistent Memory

Sessions maintain working context across agent turns. Use SQLite for development, Redis or Postgres for production.

from agents import Agent, Runner, SQLAlchemySession

# Create a persistent session
session = await SQLAlchemySession.create(
    connection_string="sqlite:///agent_memory.db"
)

agent = Agent(name="Memory Agent", instructions="Remember what the user tells you.")

# First turn
result = await Runner.run(agent, "My name is Alice.", session=session)
# Second turn — agent remembers Alice
result = await Runner.run(agent, "What's my name?", session=session)
# → "Your name is Alice."

Tracing

Every agent run is automatically traced. View traces with the OpenAI dashboard or export to your observability stack.

from agents import trace

# Traces capture: agent calls, tool executions, handoffs, guardrail checks, token usage
with trace(workflow_name="Customer Support"):
    result = Runner.run_sync(triage_agent, "I need help with billing.")

# View at: https://platform.openai.com/traces

Model Wiring

from agents import Agent, Runner

# Default: uses OPENAI_API_KEY, model="gpt-4o"
agent = Agent(name="Default", instructions="...")

# Override model per agent
agent = Agent(
    name="Fast Agent",
    instructions="...",
    model="gpt-4o-mini"
)

# Non-OpenAI models via LiteLLM (requires: pip install openai-agents[extensions])
from agents.extensions.models.litellm_model import LiteLLMModel

agent = Agent(
    name="Claude Agent",
    instructions="...",
    model=LiteLLMModel(
        model="anthropic/claude-sonnet-4-20250514",
        api_key="sk-ant-..."
    )
)

MCP Integration

Connect any MCP server as a tool for your agents.

from agents.mcp import MCPServerStdio

async with MCPServerStdio(
    name="Filesystem Server",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]}
) as server:
    agent = Agent(
        name="File Agent",
        instructions="Manage files for the user.",
        mcp_servers=[server]
    )
    result = await Runner.run(agent, "List files in /tmp")

Note:

Sandbox security. The Docker sandbox is the recommended production option — it provides container-level isolation for code execution. The Agents SDK does not include a local executor — all code execution requires a sandbox backend. For local development, the Docker sandbox runs on your machine with full isolation from your host filesystem.

Key Takeaway

The OpenAI Agents SDK is the most batteries-included option on this list. Guardrails, handoffs, tracing, sessions, sandbox agents, MCP integration, and realtime voice all ship in the same package. The tradeoff: it's the most OpenAI-coupled (non-OpenAI models require the extensions package). Use it when you're building on OpenAI infrastructure and want production-grade features without assembling them yourself. For multi-model flexibility, CrewAI or AutoGen offer broader provider support with less overhead.