Testing Agent Blueprint
Complete testing agent that reads source code, identifies test gaps, generates unit tests, runs them, and fixes failures. Ready-to-run with pytest and Jest integration.
Testing Agent
An AI agent that writes and runs tests like a QA engineer embedded in your team. It reads source code, checks coverage gaps, generates tests, runs them, and iterates on failures until the tests pass or the iteration limit (max_iterations) is reached.
Note:
Run this agent on PRs to auto-generate tests for new code, or point it at under-tested modules to fill coverage gaps. It uses your existing test framework, so your test configuration does not need to change.
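One way to wire up the PR workflow is to hand the agent only the source files a PR touches. The sketch below assumes you collect the changed file names yourself (for example from `git diff --name-only`). The helper name `select_sources` and the default `src` directory are illustrative and not part of the blueprint.

```python
from pathlib import PurePosixPath


def select_sources(changed_files, source_dir="src"):
    """Return the changed Python source files worth generating tests for.

    `changed_files` is a list of repo-relative paths, such as the lines
    printed by `git diff --name-only`. Hypothetical helper, not in agent.py.
    """
    selected = set()
    for name in changed_files:
        path = PurePosixPath(name.strip())
        if path.suffix != ".py":
            continue  # only Python modules
        if path.parts[:1] != (source_dir,):
            continue  # outside the source tree, which also skips tests/
        if path.name == "__init__.py" or path.name.startswith("test_"):
            continue  # package markers and test files are not targets
        selected.add(str(path))
    return sorted(selected)
```

Each returned path can then be passed to `python agent.py --source <path>`, one run per module.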
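Agent File Structure
agent.py (agent loop and CLI entry point)
tools.py (tool implementations)
config.json (API key, model, and project settings)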
Setup
Install Dependencies
Install the OpenAI client along with pytest and pytest-cov, which the coverage tool relies on. For Jest projects, the agent calls your project's existing Jest install through npx.
pip install openai pytest pytest-cov
Create config.json
Configure the agent with your project root and test framework.
{
  "openai_api_key": "sk-...",
  "model": "gpt-4o",
  "max_iterations": 8,
  "project_root": "./my-project",
  "test_framework": "pytest",
  "test_directory": "tests",
  "source_directory": "src",
  "coverage_threshold": 80
}
Note:
The agent writes files to your test directory and overwrites any existing file with the same name. Commit existing tests first or run on a branch.
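If you would rather not keep the API key in config.json, a small loader can take it from the environment instead. This is a minimal sketch: `load_config` is a hypothetical helper, and the rule that an `OPENAI_API_KEY` environment variable overrides the file is an assumption of this example. The defaults mirror the ones agent.py falls back to.

```python
import json
import os


def load_config(path="config.json", env=None):
    """Load config.json, apply defaults, and prefer an API key from the environment."""
    env = os.environ if env is None else env
    with open(path, encoding="utf-8") as f:
        config = json.load(f)

    if "project_root" not in config:
        raise ValueError("config is missing required key: project_root")

    config.setdefault("test_framework", "pytest")
    if config["test_framework"] not in ("pytest", "jest"):
        raise ValueError(f"unsupported test_framework: {config['test_framework']}")

    # Assumption: OPENAI_API_KEY, when set, wins over the value in the file.
    if env.get("OPENAI_API_KEY"):
        config["openai_api_key"] = env["OPENAI_API_KEY"]

    config.setdefault("test_directory", "tests")
    config.setdefault("coverage_threshold", 80)
    config.setdefault("max_iterations", 8)
    return config
```

With a loader like this, the `"openai_api_key"` entry can be left out of the committed file.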
Verify
Generate tests for a single module to verify the setup.
python agent.py --source "src/utils/validators.py"
The agent should create a test file, run it, and report pass/fail.
System Prompt
You are a senior test engineer. Your job is to write thorough, maintainable
tests for given source code. Follow this protocol:

1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
   boundary values, and the specific behavior described in docstrings
4. Run the tests. If any fail, analyze the failure and fix the test
   (not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after, any
   untestable code noted

Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies (APIs, databases); tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code; fix test failures by correcting the test
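The "one test file per source module" rule can be checked outside the model as well. The sketch below maps a source path to the test file name the rule implies for pytest. `target_test_path` is a hypothetical helper, and placing every test file directly in the test directory is an assumption of this example.

```python
from pathlib import PurePosixPath


def target_test_path(source_path, test_directory="tests"):
    """Map a source module to its test file, following test_{module}.py."""
    module = PurePosixPath(source_path).stem  # "validators" for ".../validators.py"
    return str(PurePosixPath(test_directory) / f"test_{module}.py")
```

For example, `target_test_path("src/services/payment.py")` returns `"tests/test_payment.py"`. Two modules with the same file name in different packages would map to the same test file under this flat layout, so a real project may need to mirror the package path.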
Tool Definitions
Agent Tools
Tool             Values
read_file        path: string
list_directory   path: string, recursive?: bool
write_file       path: string, content: string
run_tests        path?: string, framework?: 'pytest' | 'jest'
check_coverage   source_path: string
Tool Implementation
# tools.py
import os
import subprocess

# Set by agent.py from config.json before any tool is called.
PROJECT_ROOT = None
TEST_DIRECTORY = None
TEST_FRAMEWORK = None
COVERAGE_THRESHOLD = 80
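
def read_file(path):
    """Return the file's contents with 1-based line numbers prefixed."""
    full = os.path.join(PROJECT_ROOT, path)
    # isfile (not exists) so that a directory path returns an error string
    # instead of raising when opened.
    if not os.path.isfile(full):
        return f"ERROR: File not found: {path}"
    with open(full, "r", encoding="utf-8") as f:
        lines = f.readlines()
    return "".join(f"{i + 1:4d}| {line}" for i, line in enumerate(lines))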

def list_directory(path, recursive=False):
    """List Python/JS/TS files under path, relative to the project root."""
    full = os.path.join(PROJECT_ROOT, path)
    if not os.path.isdir(full):
        return f"ERROR: Directory not found: {path}"
    skipped = {"__pycache__", "node_modules"}
    items = []
    for root, dirs, files in os.walk(full):
        # Prune hidden and generated directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if not d.startswith(".") and d not in skipped]
        for name in files:
            if name.endswith((".py", ".js", ".ts", ".tsx")) and not name.startswith("."):
                items.append(os.path.relpath(os.path.join(root, name), PROJECT_ROOT))
        if not recursive:
            break  # only the top level was requested
    return "\n".join(sorted(items)) if items else f"No source files found in {path}"

def write_file(path, content):
    """Create or overwrite a file under the project root."""
    full = os.path.join(PROJECT_ROOT, path)
    os.makedirs(os.path.dirname(full) or ".", exist_ok=True)
    with open(full, "w", encoding="utf-8") as f:
        f.write(content)
    return f"Wrote {len(content)} characters to {path}"

def run_tests(path=None, framework=None):
    """Run one test file, or the whole test directory, and return the output."""
    framework = framework or TEST_FRAMEWORK
    # Paths stay relative to the project root because the command runs with
    # cwd=PROJECT_ROOT; joining PROJECT_ROOT again would double a relative root.
    if framework == "pytest":
        cmd = ["pytest", "-v", "--tb=short", path or TEST_DIRECTORY]
    elif framework == "jest":
        cmd = ["npx", "jest", "--verbose"]
        if path:
            cmd.append(path)
    else:
        return f"Unknown framework: {framework}"
    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout + "\n" + result.stderr
    except subprocess.TimeoutExpired:
        return "Test run timed out after 60s. Check for infinite loops."
    except FileNotFoundError:
        return f"ERROR: Could not run {cmd[0]}. Is it installed and on PATH?"
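
def check_coverage(source_path):
    """Report pytest-cov line coverage for one source module."""
    if TEST_FRAMEWORK != "pytest":
        return f"Coverage check not supported for {TEST_FRAMEWORK}. Only pytest --cov is available."
    # Convert "src/utils/validators.py" to the dotted name "src.utils.validators".
    # Strip only a trailing ".py": a blanket replace would also mangle
    # directory names such as "src/pyutils".
    module = source_path[:-3] if source_path.endswith(".py") else source_path
    module = module.replace("/", ".")
    # TEST_DIRECTORY stays relative because the command runs with cwd=PROJECT_ROOT.
    cmd = ["pytest", f"--cov={module}", "--cov-report=term-missing",
           TEST_DIRECTORY, "-q"]
    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout + "\n" + result.stderr
    except subprocess.TimeoutExpired:
        return "Coverage run timed out after 60s."
    except Exception as e:
        return f"Coverage check failed: {e}"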
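The tools above join model-supplied paths onto the project root without checking where the result lands, so a path such as `../secrets.txt` would resolve outside the project. A minimal sketch of a guard follows. `resolve_in_root` is a hypothetical helper, not part of the tools.py shown here; each tool could call it in place of `os.path.join`.

```python
import os


def resolve_in_root(project_root, relative_path):
    """Return an absolute path inside project_root, or raise ValueError."""
    root = os.path.realpath(project_root)
    candidate = os.path.realpath(os.path.join(root, relative_path))
    # commonpath compares whole path components, so a sibling directory
    # like "/repo-evil" is not treated as being inside "/repo".
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes project root: {relative_path}")
    return candidate
```

Catching the `ValueError` in the tool and returning it as an `ERROR: ...` string would let the model see the rejection and choose a different path.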
Agent Initialization
# agent.py
import json
import argparse
from openai import OpenAI
import tools as agent_tools

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a source or test file with line numbers",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List Python/JS/TS files in a directory, excluding hidden dirs",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "recursive": {"type": "boolean", "default": False}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Create or overwrite a test file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run tests on a specific file or the full test directory",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "framework": {"type": "string", "enum": ["pytest", "jest"]}
                },
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_coverage",
            "description": "Check code coverage for a source module",
            "parameters": {
                "type": "object",
                "properties": {"source_path": {"type": "string"}},
                "required": ["source_path"]
            }
        }
    }
]

SYSTEM_PROMPT = """You are a senior test engineer. Your job is to write thorough,
maintainable tests for given source code. Follow this protocol:
1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
boundary values, and behavior described in docstrings
4. Run the tests — if any fail, analyze the failure and fix the test
(not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after,
any untestable code noted
Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies — tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code — fix test failures by correcting the test"""

def run_agent(source_path, config):
    client = OpenAI(api_key=config["openai_api_key"])
    agent_tools.PROJECT_ROOT = config["project_root"]
    agent_tools.TEST_DIRECTORY = config.get("test_directory", "tests")
    agent_tools.TEST_FRAMEWORK = config.get("test_framework", "pytest")
    agent_tools.COVERAGE_THRESHOLD = config.get("coverage_threshold", 80)

    # Only functions declared in TOOL_SCHEMAS may be called by the model.
    allowed_tools = {schema["function"]["name"] for schema in TOOL_SCHEMAS}

    query = (
        f"Write comprehensive tests for {source_path}. "
        f"Read the source file, check existing test coverage, "
        f"generate tests, run them, and fix any failures. "
        f"Use {config.get('test_framework', 'pytest')} conventions. "
        # The prompt refers to "threshold", so tell the model its value.
        f"The coverage threshold is {config.get('coverage_threshold', 80)}%."
    )
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for _ in range(config.get("max_iterations", 8)):
        response = client.chat.completions.create(
            model=config.get("model", "gpt-4o"),
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=0.2
        )
        msg = response.choices[0].message
        messages.append(msg)

        if msg.content and "FINAL_REPORT:" in msg.content:
            return msg.content.split("FINAL_REPORT:", 1)[1].strip()

        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue. Use tools to read files, generate tests, and run them. End with FINAL_REPORT."
            })
            continue

        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            try:
                if func_name not in allowed_tools:
                    raise ValueError(f"unknown tool: {func_name}")
                func_args = json.loads(tool_call.function.arguments)
                result = getattr(agent_tools, func_name)(**func_args)
            except Exception as e:
                # Report the problem to the model instead of crashing the loop.
                result = f"ERROR: {func_name} failed: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })

    return "Agent reached max iterations."

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", required=True, help="Source file to generate tests for")
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    report = run_agent(args.source, config)
    print(report)
Walkthrough
Generating tests for src/services/payment.py — a Stripe integration module.
Agent reads the source
read_file("src/services/payment.py") returns a module with three functions: create_payment_intent(), handle_webhook(), refund_charge(). Each has type hints and docstrings.
Checks existing coverage
check_coverage("src/services/payment.py") shows 0% coverage because no tests exist. The agent also calls list_directory("tests") to confirm there is no test_payment.py file.
Generates tests
The agent calls write_file("tests/test_payment.py", ...) with:
# tests/test_payment.py
from unittest.mock import patch

import pytest

from src.services.payment import create_payment_intent, handle_webhook, refund_charge


@pytest.fixture
def mock_stripe():
    # Replace the stripe module as imported by payment.py, so no network calls happen.
    with patch("src.services.payment.stripe") as mock:
        yield mock


def test_create_payment_intent_success(mock_stripe):
    mock_stripe.PaymentIntent.create.return_value = {"id": "pi_123", "status": "succeeded"}
    result = create_payment_intent(amount=5000, currency="usd")
    assert result["id"] == "pi_123"


def test_create_payment_intent_zero_amount(mock_stripe):
    with pytest.raises(ValueError, match="amount must be positive"):
        create_payment_intent(amount=0)


def test_handle_webhook_invalid_signature(mock_stripe):
    mock_stripe.Webhook.construct_event.side_effect = ValueError("Invalid signature")
    result = handle_webhook(b"payload", "bad_sig")
    assert result["status"] == "error"


def test_refund_charge_success(mock_stripe):
    mock_stripe.Refund.create.return_value = {"id": "re_456", "status": "succeeded"}
    result = refund_charge("ch_789")
    assert result["id"] == "re_456"
Runs tests and fixes failures
run_tests("tests/test_payment.py") reports 3 passed, 1 failed. The refund_charge test fails because the function returns its result under a different key than the test expects. The agent reads the source again, spots "refund_id" vs "id", and calls write_file with the corrected assertion.
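If you want the loop itself (rather than the model) to know whether a run was clean, the pass/fail counts can be read from pytest's closing summary line. This is a sketch under an assumption: it relies on the human-readable summary format, for example `1 failed, 3 passed in 0.12s`, which is not a stable machine interface. `summarize_pytest` is a hypothetical helper.

```python
import re

# Matches pairs such as "3 passed" or "1 failed" in the summary line.
_COUNT = re.compile(r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed)")


def summarize_pytest(output):
    """Extract outcome counts from the last summary line of pytest output."""
    counts = {}
    for line in reversed(output.splitlines()):
        found = _COUNT.findall(line)
        # The summary line also carries the duration, e.g. "in 0.12s".
        if found and " in " in line:
            for number, outcome in found:
                key = "error" if outcome.startswith("error") else outcome
                counts[key] = int(number)
            break
    return counts
```

An empty dictionary means no summary line was found, which usually indicates that pytest did not start or timed out.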
Delivers FINAL_REPORT
FINAL_REPORT:
4 tests generated for src/services/payment.py
Coverage: 0% → 92% (3/3 functions covered)
Failures resolved: 1 (return key mismatch)
Untestable: None (all functions have deterministic inputs)
Customization
Test Framework Settings
Setting              Values
test_framework       pytest, jest
coverage_threshold   0-100 (default 80)
test_directory       tests, __tests__, spec
source_directory     src, lib, app
Note:
Mocking external services. The agent knows to mock APIs, databases, and file I/O. If your code uses a less common library, add it to the system prompt under "Mock external dependencies."
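Extending the prompt can be done by hand, or with a small helper that appends library names to the existing rule. The sketch below is one possible approach. `add_mock_targets` is a hypothetical helper, and the library names in the usage note are placeholders.

```python
MOCK_RULE = "- Mock external dependencies"


def add_mock_targets(system_prompt, libraries):
    """Append library names to the 'Mock external dependencies' rule."""
    if not libraries:
        return system_prompt
    lines = system_prompt.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith(MOCK_RULE):
            # Extend the rule in place so it stays a single bullet.
            lines[index] = f"{line.rstrip()}. Always mock: {', '.join(libraries)}."
            return "\n".join(lines)
    raise ValueError("prompt has no 'Mock external dependencies' rule")
```

Calling `add_mock_targets(SYSTEM_PROMPT, ["pika", "boto3"])` before building the messages list would name those two libraries explicitly in the rule.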
Key Takeaway
A testing agent's value is in coverage gaps, not test volume. Point it at modules with 0-40% coverage first. These are the modules where people avoid writing tests, either because the code is complex or because it was never tested to begin with. The agent doesn't get bored writing edge-case tests for a function with 15 branches.
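To find those low-coverage modules, one option is to parse the table that `pytest --cov-report=term-missing` prints and sort it by percentage. A minimal sketch follows, assuming the usual layout in which each row starts with a `.py` path and the `Cover` column is the first value ending in `%`. `low_coverage_modules` is a hypothetical helper.

```python
import re

# A module row: path, one or more integer columns, then the Cover percentage.
_ROW = re.compile(r"^(\S+\.py)\s+(?:\d+\s+)+(\d+)%")


def low_coverage_modules(report, ceiling=40):
    """Return (module, percent) pairs at or below `ceiling`, lowest first."""
    rows = []
    for line in report.splitlines():
        match = _ROW.match(line)
        if match and int(match.group(2)) <= ceiling:
            rows.append((match.group(1), int(match.group(2))))
    return sorted(rows, key=lambda row: (row[1], row[0]))
```

The resulting list gives an order in which to run the agent, starting with the modules that have the least coverage.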