Testing Agent

An AI agent that writes and runs tests like a QA engineer embedded in your team. It reads source code, checks coverage gaps, generates tests, runs them, and iterates on failures until the tests pass or it reaches the configured iteration limit (max_iterations).

Note:

Run this agent on PRs to auto-generate tests for new code, or point it at under-tested modules to fill coverage gaps. It uses your existing test framework, so your pytest or Jest setup needs no changes.
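One way to pick targets on a PR is to list the Python files the branch changed and pass each one to the agent. The sketch below is an illustration, not part of the agent: the helper names select_sources and changed_source_files are hypothetical, and the base branch origin/main and the src prefix are assumptions you would adjust.

```python
import subprocess

def select_sources(paths, source_dir="src"):
    """Keep Python source files under source_dir, skipping test files."""
    prefix = source_dir.rstrip("/") + "/"
    return [
        p for p in paths
        if p.startswith(prefix)
        and p.endswith(".py")
        and not p.rsplit("/", 1)[-1].startswith("test_")
    ]

def changed_source_files(base="origin/main", source_dir="src"):
    """List files added or modified on this branch relative to base.

    Assumption: the PR targets a branch reachable as `base`.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return select_sources(out.splitlines(), source_dir)
```

Each returned path can then be passed to the agent with --source, one module at a time.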

Agent File Structure

testing-agent/
  agent.py
  tools.py
  config.json

Setup

Install Dependencies

Install the OpenAI client along with pytest and pytest-cov. The agent runs your existing test runner (pytest or Jest); pytest-cov is needed for the coverage check.

pip install openai pytest pytest-cov

Create config.json

Configure the agent with your project root and test framework.

{
  "openai_api_key": "sk-...",
  "model": "gpt-4o",
  "max_iterations": 8,
  "project_root": "./my-project",
  "test_framework": "pytest",
  "test_directory": "tests",
  "source_directory": "src",
  "coverage_threshold": 80
}

Note:

The agent writes files to your test directory. Commit existing tests first or run on a branch.
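Because a bad config value only surfaces once the agent is already running, it can help to validate config.json up front. The following is a minimal sketch under stated assumptions: load_config is a hypothetical helper (agent.py as shown simply calls json.load), and preferring the OPENAI_API_KEY environment variable over the file is a suggestion to keep the key out of version control, not something the original agent does.

```python
import json
import os

REQUIRED_KEYS = ("project_root", "test_framework")

def load_config(path="config.json"):
    """Load config.json and apply basic sanity checks (hypothetical helper)."""
    with open(path) as f:
        config = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise ValueError(f"Missing config keys: {', '.join(missing)}")
    if config["test_framework"] not in ("pytest", "jest"):
        raise ValueError(f"Unsupported test_framework: {config['test_framework']}")
    threshold = config.get("coverage_threshold", 80)
    if not 0 <= threshold <= 100:
        raise ValueError("coverage_threshold must be between 0 and 100")
    # Assumption: an environment variable, when set, overrides the key in the file.
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        config["openai_api_key"] = env_key
    return config
```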

Verify

Generate tests for a single module to verify the setup.

python agent.py --source "src/utils/validators.py"

The agent should create a test file, run it, and report pass/fail.

System Prompt

You are a senior test engineer. Your job is to write thorough, maintainable
tests for given source code. Follow this protocol:

1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
   boundary values, and the specific behavior described in docstrings
4. Run the tests — if any fail, analyze the failure and fix the test
   (not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after, any
   untestable code noted

Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies (APIs, databases) — tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code — fix test failures by correcting the test
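The naming rule above (one test file per source module, named test_{module}.py) can be sketched as a small path mapping. This assumes a flat test directory, as in the walkthrough where src/services/payment.py maps to tests/test_payment.py; expected_test_path is a hypothetical helper, not part of the agent.

```python
from pathlib import PurePosixPath

def expected_test_path(source_path, test_directory="tests"):
    """Map a source module path to its test file: test_{module}.py.

    Assumption: tests live flat in test_directory rather than mirroring
    the source tree.
    """
    module = PurePosixPath(source_path).stem
    return str(PurePosixPath(test_directory) / f"test_{module}.py")
```

A check like this could be used to confirm that the path the model passes to write_file follows the convention before the file is written.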

Tool Definitions

Agent Tools

read_file

Read a source or test file. Returns numbered lines.

Values: path: string

list_directory

List files in source and test directories. Used to match source modules to existing tests.

Values: path: string, recursive?: bool

write_file

Create or overwrite a test file. Path is relative to project_root.

Values: path: string, content: string

run_tests

Run pytest (or Jest) on a specific test file or directory. Returns test output with pass/fail counts.

Values: path?: string, framework?: 'pytest' | 'jest'

check_coverage

Check code coverage for a source file using pytest-cov. Returns the coverage report with uncovered lines and overall percentage. Not available when test_framework is jest.

Values: source_path: string
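run_tests returns the raw runner output, and the pass/fail counts sit in pytest's final summary line. If you want those counts as data (for logging or a CI gate), a parser along these lines may help. It is a sketch that assumes the default summary format framed by = characters, as produced with -v; parse_pytest_summary is a hypothetical helper.

```python
import re

def parse_pytest_summary(output):
    """Extract outcome counts from pytest's final summary line."""
    counts = {}
    # Scan from the end: the summary is the last line framed by '=' characters.
    for line in reversed(output.splitlines()):
        if line.startswith("=") and " in " in line:
            pairs = re.findall(r"(\d+) (passed|failed|errors?|skipped)", line)
            for number, outcome in pairs:
                # Normalize "errors" to "error" so callers see one key.
                key = "error" if outcome.startswith("error") else outcome
                counts[key] = int(number)
            break
    return counts
```

An empty dictionary means no summary line was found, for example when the run timed out.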

Tool Implementation

# tools.py
import os
import subprocess

PROJECT_ROOT = None
TEST_DIRECTORY = None
TEST_FRAMEWORK = None
COVERAGE_THRESHOLD = 80

def read_file(path):
    full = os.path.join(PROJECT_ROOT, path)
    if not os.path.exists(full):
        return f"ERROR: File not found: {path}"
    with open(full, "r") as f:
        lines = f.readlines()
    return "".join(f"{i+1:4d}| {l}" for i, l in enumerate(lines))

def list_directory(path, recursive=False):
    full = os.path.join(PROJECT_ROOT, path)
    if not os.path.exists(full):
        return f"ERROR: Directory not found: {path}"
    items = []
    for root, dirs, files in os.walk(full):
        dirs[:] = [d for d in dirs if not d.startswith(".") and d != "__pycache__"]
        for f in files:
            if f.endswith((".py", ".js", ".ts", ".tsx")) and not f.startswith("."):
                rel = os.path.relpath(os.path.join(root, f), PROJECT_ROOT)
                items.append(rel)
        if not recursive:
            break
    return "\n".join(sorted(items)) if items else f"Directory {path} is empty"

def write_file(path, content):
    full = os.path.join(PROJECT_ROOT, path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return f"Written {len(content)} characters to {path}"

def run_tests(path=None, framework=None):
    framework = framework or TEST_FRAMEWORK
    if framework == "pytest":
        cmd = ["pytest", "-v", "--tb=short"]
        # The subprocess runs with cwd=PROJECT_ROOT, so pass the path relative
        # to it. Joining PROJECT_ROOT again would double a relative root such
        # as ./my-project.
        cmd.append(path or TEST_DIRECTORY)
    elif framework == "jest":
        cmd = ["npx", "jest", "--verbose"]
        if path:
            cmd.append(path)
    else:
        return f"Unknown framework: {framework}"

    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout + "\n" + result.stderr
    except subprocess.TimeoutExpired:
        return "Test run timed out after 60s. Check for infinite loops."

def check_coverage(source_path):
    if TEST_FRAMEWORK != "pytest":
        return f"Coverage check not supported for {TEST_FRAMEWORK}. Only pytest --cov is available."
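    # Convert a file path such as src/utils/validators.py into the dotted
    # module name passed to --cov. Strip only a trailing ".py" so names like
    # src/pytools/x.py are not mangled.
    module = source_path.replace("\\", "/")
    if module.endswith(".py"):
        module = module[:-3]
    module = module.replace("/", ".")
    cmd = [
        "pytest", f"--cov={module}", "--cov-report=term-missing",
        TEST_DIRECTORY, "-q"  # relative to cwd=PROJECT_ROOT
    ]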
    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout
    except Exception as e:
        return f"Coverage check failed: {e}"
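The tools above join whatever path the model supplies onto PROJECT_ROOT, so a path such as ../other/file.py would resolve outside the project. A guard along the following lines could be called at the top of read_file and write_file. This is a sketch, not part of the original tools.py; resolve_inside is a hypothetical helper.

```python
import os

def resolve_inside(root, relative_path):
    """Return the absolute path for relative_path, or raise if it escapes root."""
    root_abs = os.path.realpath(root)
    full = os.path.realpath(os.path.join(root_abs, relative_path))
    # commonpath equals root_abs only when full is root_abs or lies beneath it.
    if os.path.commonpath([root_abs, full]) != root_abs:
        raise ValueError(f"Path escapes allowed directory: {relative_path}")
    return full
```

For write_file, passing the test directory as root instead of the project root would also enforce the "never modify source code" rule at the tool level rather than relying on the prompt alone.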

Agent Initialization

# agent.py
import json
import argparse
from openai import OpenAI
import tools as agent_tools

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a source or test file with line numbers",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List Python/JS/TS files in a directory, excluding hidden dirs",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "recursive": {"type": "boolean", "default": False}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Create or overwrite a test file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run tests on a specific file or the full test directory",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "framework": {"type": "string", "enum": ["pytest", "jest"]}
                },
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_coverage",
            "description": "Check code coverage for a source module",
            "parameters": {
                "type": "object",
                "properties": {"source_path": {"type": "string"}},
                "required": ["source_path"]
            }
        }
    }
]

SYSTEM_PROMPT = """You are a senior test engineer. Your job is to write thorough,
maintainable tests for given source code. Follow this protocol:

1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
   boundary values, and behavior described in docstrings
4. Run the tests — if any fail, analyze the failure and fix the test
   (not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after,
   any untestable code noted

Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies — tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code — fix test failures by correcting the test"""


def run_agent(source_path, config):
    client = OpenAI(api_key=config["openai_api_key"])
    agent_tools.PROJECT_ROOT = config["project_root"]
    agent_tools.TEST_DIRECTORY = config.get("test_directory", "tests")
    agent_tools.TEST_FRAMEWORK = config.get("test_framework", "pytest")
    agent_tools.COVERAGE_THRESHOLD = config.get("coverage_threshold", 80)

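    query = (
        f"Write comprehensive tests for {source_path}. "
        f"Read the source file, check existing test coverage, "
        f"generate tests, run them, and fix any failures. "
        f"Use {config.get('test_framework', 'pytest')} conventions. "
        # The system prompt refers to a threshold, so state its value here.
        f"The coverage threshold is {config.get('coverage_threshold', 80)}%."
    )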

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for i in range(config.get("max_iterations", 8)):
        response = client.chat.completions.create(
            model=config.get("model", "gpt-4o"),
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=0.2
        )

        msg = response.choices[0].message
        messages.append(msg)

        if msg.content and "FINAL_REPORT:" in msg.content:
            return msg.content.split("FINAL_REPORT:", 1)[1].strip()

        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue. Use tools to read files, generate tests, and run them. End with FINAL_REPORT."
            })
            continue

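        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            # Dispatch only to declared tools; getattr on an arbitrary name
            # would expose every attribute of the tools module.
            allowed = {s["function"]["name"] for s in TOOL_SCHEMAS}
            try:
                if func_name not in allowed:
                    raise ValueError(f"Unknown tool: {func_name}")
                func_args = json.loads(tool_call.function.arguments)
                result = getattr(agent_tools, func_name)(**func_args)
            except Exception as e:
                # Report the error back so the model can correct its call.
                result = f"ERROR: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })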

    return "Agent reached max iterations."


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", required=True, help="Source file to generate tests for")
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    report = run_agent(args.source, config)
    print(report)

Walkthrough

Generating tests for src/services/payment.py — a Stripe integration module.

Agent reads the source

read_file("src/services/payment.py") returns a module with three functions: create_payment_intent(), handle_webhook(), refund_charge(). Each has type hints and docstrings.

Checks existing coverage

check_coverage("src/services/payment.py") shows 0% coverage because no tests exist yet. The agent also calls list_directory("tests") to confirm that no test_payment.py file exists.

Generates tests

The agent calls write_file("tests/test_payment.py", ...) with:

# tests/test_payment.py
from unittest.mock import patch, MagicMock
import pytest
from src.services.payment import create_payment_intent, handle_webhook, refund_charge

@pytest.fixture
def mock_stripe():
    with patch("src.services.payment.stripe") as mock:
        yield mock

def test_create_payment_intent_success(mock_stripe):
    mock_stripe.PaymentIntent.create.return_value = {"id": "pi_123", "status": "succeeded"}
    result = create_payment_intent(amount=5000, currency="usd")
    assert result["id"] == "pi_123"

def test_create_payment_intent_zero_amount(mock_stripe):
    with pytest.raises(ValueError, match="amount must be positive"):
        create_payment_intent(amount=0)

def test_handle_webhook_invalid_signature(mock_stripe):
    mock_stripe.Webhook.construct_event.side_effect = ValueError("Invalid signature")
    result = handle_webhook(b"payload", "bad_sig")
    assert result["status"] == "error"

def test_refund_charge_success(mock_stripe):
    mock_stripe.Refund.create.return_value = {"id": "re_456", "status": "succeeded"}
    result = refund_charge("ch_789")
    assert result["id"] == "re_456"

Runs tests and fixes failures

run_tests("tests/test_payment.py") returns 3 pass, 1 fail — the refund_charge test fails because the actual function has a different return key. The agent reads the source again, spots "refund_id" vs "id", and calls write_file with the corrected assertion.
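The corrected test might look like the sketch below. Since payment.py itself is not shown in this walkthrough, the refund_charge defined here is a hypothetical stand-in that returns a "refund_id" key and takes the Stripe client as an argument so the example runs on its own; the real function's signature and body are assumptions.

```python
from unittest.mock import MagicMock

def refund_charge(charge_id, stripe_client):
    """Hypothetical stand-in for the real function, which is not shown here."""
    refund = stripe_client.Refund.create(charge=charge_id)
    return {"refund_id": refund["id"], "status": refund["status"]}

def test_refund_charge_success():
    mock_stripe = MagicMock()
    mock_stripe.Refund.create.return_value = {"id": "re_456", "status": "succeeded"}
    result = refund_charge("ch_789", mock_stripe)
    # Corrected assertion: the function returns "refund_id", not "id".
    assert result["refund_id"] == "re_456"
    mock_stripe.Refund.create.assert_called_once_with(charge="ch_789")
```

Note that the fix changes only the test's expectation, in line with the rule that the agent never edits source code.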

Delivers FINAL_REPORT

FINAL_REPORT:
4 tests generated for src/services/payment.py
Coverage: 0% → 92% (3/3 functions covered)
Failures resolved: 1 (return key mismatch)
Untestable: None — all functions have deterministic inputs

Customization

Test Framework Settings

test_framework

Which test runner the agent uses. pytest for Python projects, jest for JS/TS.

Values: pytest, jest

coverage_threshold

If coverage is already above this percentage, the agent adds only edge-case tests for that module, as the system prompt instructs, rather than generating a full suite.

Values: 0-100 (default 80)

test_directory

Where test files live relative to project_root. The agent creates test_{module}.py files here.

Values: tests, __tests__, spec

source_directory

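Where source code lives relative to project_root. The tools shown above do not read this key, so limiting the agent to this directory requires adding a check to read_file and list_directory.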

Values: src, lib, app

Note:

Mocking external services. The agent knows to mock APIs, databases, and file I/O. If your code uses a less common library, add it to the system prompt under "Mock external dependencies."

Key Takeaway

A testing agent's value is in coverage gaps, not test volume. Point it at modules with 0-40% coverage first — these are where humans avoid writing tests because the code is complex or untested from the start. The agent doesn't get bored writing edge-case tests for a function with 15 branches.
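To find those low-coverage modules, one option is to parse the table printed by pytest --cov-report=term-missing and rank files by percentage. The sketch below assumes the default column layout (Name, Stmts, Miss, Cover) without branch coverage; modules_below is a hypothetical helper and SAMPLE is made-up illustrative output, not a real measurement.

```python
import re

# Matches rows like: src/utils/dates.py   20   13   35%   5-30
ROW = re.compile(r"^(\S+\.py)\s+(\d+)\s+(\d+)\s+(\d+)%")

def modules_below(report, max_percent=40):
    """Return (path, percent) pairs at or below max_percent, lowest first."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match and int(match.group(4)) <= max_percent:
            rows.append((match.group(1), int(match.group(4))))
    return sorted(rows, key=lambda row: row[1])

# Illustrative sample only; real reports come from check_coverage or pytest-cov.
SAMPLE = """\
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
src/services/payment.py      50     50     0%   1-80
src/utils/validators.py      40      4    90%   12-15
src/utils/dates.py           20     13    35%   5-30
TOTAL                       110     67    39%
"""

for path, percent in modules_below(SAMPLE):
    print(f"{percent:3d}%  {path}")
```

The TOTAL row is skipped because it does not end in .py, so only per-file rows are ranked.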
