Contract Review Agent

An AI agent that reviews contracts like a paralegal doing first-pass analysis. It extracts key clauses, flags unfavorable terms, compares contract versions (redline analysis), and generates executive summaries. Input is a text-based PDF or Markdown contract.

Note:

Not legal advice. This agent identifies patterns and flags potential concerns. A qualified lawyer must review its output before any legal decisions. Use as a first-pass triage tool, not a replacement for legal counsel.

Agent File Structure

contract-review-agent/
  agent.py
  tools.py
  risk_categories.json
  config.json

Setup

Install Dependencies

Install the OpenAI client plus PDF support.

pip install openai pymupdf

Create config.json

Configure the agent. risk_categories_path points to the JSON file defining what to flag.

{
  "openai_api_key": "sk-...",
  "model": "gpt-4o",
  "max_iterations": 6,
  "risk_categories_path": "risk_categories.json"
}
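
Keeping the API key in config.json is fine for a quick trial, but a key stored in a file is easy to commit by accident. The following is a minimal sketch of one alternative, not part of the blueprint: `load_config` is a hypothetical helper that merges config.json over the defaults shown above and lets the `OPENAI_API_KEY` environment variable take precedence when it is set.

```python
import json
import os

# Defaults mirror the values in the config.json example
DEFAULTS = {
    "model": "gpt-4o",
    "max_iterations": 6,
    "risk_categories_path": "risk_categories.json",
}

def load_config(path="config.json", env=None):
    """Hypothetical helper: merge config.json over defaults; the env var wins for the key."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            config.update(json.load(f))
    # Prefer the environment variable so the key need not live in a file
    if env.get("OPENAI_API_KEY"):
        config["openai_api_key"] = env["OPENAI_API_KEY"]
    if not config.get("openai_api_key"):
        raise ValueError("No API key: set OPENAI_API_KEY or openai_api_key in config.json")
    return config
```

If adopted, agent.py could call `load_config(args.config)` in place of its plain `json.load`.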

Verify

Run the agent on a sample contract to verify setup.

python agent.py --contract "./samples/vendor-agreement.pdf"

The agent should output extracted clauses, risk flags, and a summary.

System Prompt

You are a contract review specialist. Your role is first-pass analysis — identify
what's in the contract, flag potential concerns, and summarize. You are not a lawyer.
Always include this disclaimer in your output.

Protocol:
1. THOUGHT: What type of contract is this? What should I look for?
2. ACTION: Extract clauses by category (parties, term, payment, liability,
   termination, IP, confidentiality, governing law, etc.)
3. For each clause: summarize in plain English, note any unusual or one-sided terms
4. Cross-reference against risk categories — flag matches with severity
5. If a second contract version is provided, perform redline comparison
6. FINAL_REVIEW: Executive summary + clause table + risk flags + disclaimer

Rules:
- Flag missing clauses as HIGH risk when they are standard for this contract type
- Flag one-sided terms with the party they favor
- Use PLAIN ENGLISH summaries — the recipient may not be a lawyer
- If you're uncertain about a clause's implication, say so rather than guessing
- Always end with: "This is an automated first-pass review. Consult a qualified
  lawyer before making decisions based on this analysis."
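
Outside the prompt itself, the first rule (missing clauses flagged as HIGH) could also be checked in code rather than left entirely to the model. The sketch below assumes the clause extraction step returns a JSON object whose value is null for any category it did not find. `EXPECTED_CLAUSES` and `missing_clause_flags` are hypothetical names, and the list of expected clauses is only an illustration.

```python
import json

# Hypothetical expectations per contract type; adjust to your practice area
EXPECTED_CLAUSES = {
    "services_agreement": ["liability", "termination", "confidentiality", "governing_law"],
}

def missing_clause_flags(clauses_json, contract_type):
    """Turn null or absent categories into HIGH-severity findings."""
    clauses = json.loads(clauses_json)
    flags = []
    for category in EXPECTED_CLAUSES.get(contract_type, []):
        if clauses.get(category) is None:
            flags.append({
                "category": f"Missing clause: {category}",
                "severity": "high",
                "description": f"No {category} clause was extracted; confirm it is really absent.",
            })
    return flags
```

A null from the extractor can also mean the clause was worded unusually or fell outside the text window, which is why the description asks for confirmation.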

Risk Categories

{
  "risk_categories": [
    {
      "name": "Unlimited Liability",
      "severity": "critical",
      "patterns": [
        "indemnify.*without limitation",
        "unlimited liability",
        "liable for.*all.*damages",
        "waive.*all.*claims"
      ],
      "description": "Party assumes unlimited financial exposure — negotiate a cap."
    },
    {
      "name": "Automatic Renewal",
      "severity": "high",
      "patterns": [
        "automatically renew",
        "auto-renew",
        "shall renew.*unless.*notice"
      ],
      "description": "Contract renews without explicit action — may lock you in unexpectedly."
    },
    {
      "name": "One-Sided Termination",
      "severity": "high",
      "patterns": [
        "terminate.*at any time.*without cause",
        "sole discretion to terminate",
        "immediate termination.*without notice"
      ],
      "description": "Only one party can terminate — negotiate mutual or notice-period terms."
    },
    {
      "name": "IP Assignment",
      "severity": "critical",
      "patterns": [
        "assign.*all.*intellectual property",
        "work product.*becomes.*property of",
        "hereby assigns.*all.*right.*title",
        "irrevocably assign"
      ],
      "description": "You're giving away ownership of your work product — negotiate a license instead."
    },
    {
      "name": "Non-Compete Overreach",
      "severity": "medium",
      "patterns": [
        "non-compete",
        "shall not.*compete.*for.*years",
        "restricted from.*engaging.*similar business"
      ],
      "description": "May restrict future work — check geographic and time scope for reasonableness."
    },
    {
      "name": "Data Privacy Gap",
      "severity": "high",
      "patterns": [
        "no.*data.*processing.*agreement",
        "no.*privacy.*policy",
        "sell.*personal.*data",
        "share.*personal.*information.*third.*party"
      ],
      "description": "Missing or weak data protection terms — required under GDPR/CCPA."
    },
    {
      "name": "Vague Scope",
      "severity": "medium",
      "patterns": [
        "as.*reasonably.*requested",
        "other.*services.*as.*needed",
        "additional.*work.*at.*client.*discretion"
      ],
      "description": "Scope is open-ended — you may be on the hook for undefined work."
    }
  ]
}
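
Because every pattern in this file is compiled as a regular expression at scan time, a typo in a custom entry only surfaces when a contract is reviewed. A small pre-flight check can catch that earlier. This is a sketch under assumptions: `validate_risk_categories` is a hypothetical helper, and the allowed severity set (including "low") is a guess at what a team might permit, not something the blueprint defines.

```python
import re

ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}  # "low" is an assumption
REQUIRED_KEYS = {"name", "severity", "patterns", "description"}

def validate_risk_categories(data):
    """Return a list of problems found; an empty list means the file looks usable."""
    problems = []
    for i, cat in enumerate(data.get("risk_categories", [])):
        label = cat.get("name", f"category #{i}")
        missing = REQUIRED_KEYS - set(cat)
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
        if cat.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"{label}: unknown severity {cat.get('severity')!r}")
        for pattern in cat.get("patterns", []):
            try:
                re.compile(pattern)  # same regex dialect the scanner uses
            except re.error as exc:
                problems.append(f"{label}: bad pattern {pattern!r} ({exc})")
    return problems
```

Running this on the parsed JSON after each edit to risk_categories.json gives a quick list of entries to fix.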

Tool Definitions

Agent Tools

read_contract

Read a PDF or Markdown contract file. Returns the full text, with page markers for PDFs.

Values: path: string

extract_clauses

Extract clauses by category (liability, payment, termination, etc.) using the LLM. Returns structured JSON.

Values: contract_text: string, categories?: string[] (default: all)

flag_risks

Cross-reference contract text against risk categories. Returns matched patterns with severity and descriptions.

Values: contract_text: string

compare_versions

Compare two contract versions and identify added, removed, and modified clauses.

Values: version_a_path: string, version_b_path: string

summarize_contract

Generate a plain-English executive summary of the contract.

Values: clauses_json: string (the JSON returned by extract_clauses)
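
compare_versions leaves the whole comparison to the model. A deterministic pre-pass can narrow down which paragraphs changed before any model call. The sketch below uses Python's standard difflib and assumes paragraphs are separated by blank lines. `paragraph_diff` is a hypothetical helper that the blueprint does not include.

```python
import difflib

def paragraph_diff(text_a, text_b):
    """Classify paragraphs as added, removed, or modified between two versions."""
    paras_a = [p.strip() for p in text_a.split("\n\n") if p.strip()]
    paras_b = [p.strip() for p in text_b.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=paras_a, b=paras_b, autojunk=False)
    changes = {"added": [], "removed": [], "modified": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            changes["added"].extend(paras_b[j1:j2])
        elif tag == "delete":
            changes["removed"].extend(paras_a[i1:i2])
        elif tag == "replace":
            # Adjacent paragraphs that differ are reported as one old/new pair
            changes["modified"].append({
                "old": "\n\n".join(paras_a[i1:i2]),
                "new": "\n\n".join(paras_b[j1:j2]),
            })
    return changes
```

The result uses the same added / removed / modified keys that compare_versions asks the model for, so only the changed paragraphs would need to be sent for interpretation.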

Tool Implementation

# tools.py
import json
import os
import re

def read_contract(path):
    full = path if os.path.isabs(path) else os.path.join(os.getcwd(), path)
    if not os.path.exists(full):
        return f"ERROR: File not found: {path}"

    if path.lower().endswith(".pdf"):
        import fitz  # pymupdf
        text = []
        # Prefix each page with a marker so findings can be traced to a page
        with fitz.open(full) as doc:
            for i, page in enumerate(doc):
                text.append(f"--- Page {i+1} ---\n{page.get_text()}")
        return "\n".join(text)

    with open(full, "r", encoding="utf-8") as f:
        return f.read()


def extract_clauses(client, contract_text, model, categories=None):
    all_categories = categories or [
        "parties", "term", "payment", "liability", "termination",
        "intellectual_property", "confidentiality", "governing_law",
        "indemnification", "warranty", "limitation_of_liability",
        "force_majeure", "assignment", "dispute_resolution"
    ]
    prompt = f"""Extract clauses from this contract by category.
Return a JSON object with category names as keys.
For each category, provide: the clause text (exact quote) and a plain-English summary.
If a category is not present in the contract, set its value to null.

Contract text:
{contract_text[:15000]}

Categories to extract: {', '.join(all_categories)}

Return ONLY valid JSON."""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content


def flag_risks(contract_text, risk_categories_path="risk_categories.json"):
    if not os.path.exists(risk_categories_path):
        return f"ERROR: Risk categories file not found: {risk_categories_path}"
    with open(risk_categories_path) as f:
        categories = json.load(f)["risk_categories"]

    # Limit scan to first 20K chars to avoid regex performance issues
    scan_text = contract_text[:20000]

    findings = []
    for cat in categories:
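        for pattern in cat["patterns"]:
            # Bound each ".*" to 80 characters. With re.DOTALL, an unbounded greedy
            # ".*" can stretch from the first keyword to the last one in the document.
            bounded = pattern.replace(".*", ".{0,80}?")
            matches = re.finditer(bounded, scan_text, re.IGNORECASE | re.DOTALL)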
            for m in matches:
                context_start = max(0, m.start() - 80)
                context_end = min(len(scan_text), m.end() + 80)
                findings.append({
                    "category": cat["name"],
                    "severity": cat["severity"],
                    "matched_text": m.group().strip(),
                    "context": scan_text[context_start:context_end].replace("\n", " "),
                    "description": cat["description"]
                })

    unique = {f["matched_text"] + f["category"]: f for f in findings}
    result = list(unique.values())
    if len(contract_text) > 20000:
        result.append({"warning": "Contract text exceeds 20K characters. Only first 20K scanned for risks."})
    return json.dumps(result, indent=2)


def compare_versions(client, model, path_a, path_b):
    text_a = read_contract(path_a)
    text_b = read_contract(path_b)

    prompt = f"""Compare these two contract versions and identify changes.

Version A:
{text_a[:8000]}

Version B:
{text_b[:8000]}

Return a JSON object with:
- added: clauses present in B but not A
- removed: clauses present in A but not B
- modified: clauses that changed (show old vs new text)
- summary: one-sentence summary of the changes

Return ONLY valid JSON."""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content


def summarize_contract(client, model, clauses_json):
    prompt = f"""Given these extracted contract clauses, write an executive summary
in plain English. Include: what this contract is, key obligations of each party,
critical risks, and recommended next steps. Keep it under 300 words.

Clauses:
{clauses_json}

Return ONLY the summary text, no JSON wrapper."""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content
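
Note that extract_clauses sends only the first 15,000 characters of the contract and compare_versions only the first 8,000 of each version, so clauses further into a long document are never seen. One possible workaround is to process the text in overlapping windows. The following is a minimal sketch: `chunk_text` is a hypothetical helper, the overlap size is an arbitrary assumption, and merging the per-chunk results is left out.

```python
def chunk_text(text, size=15000, overlap=500):
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += size - overlap
    return chunks
```

The overlap reduces the chance that a clause is cut in half at a window boundary, at the cost of some duplicated text per call.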

Agent Initialization

# agent.py
import json
import os
import argparse
from openai import OpenAI
import tools as agent_tools

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_contract",
            "description": "Read a PDF or Markdown contract file",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "extract_clauses",
            "description": "Extract clauses by category from contract text",
            "parameters": {
                "type": "object",
                "properties": {
                    "contract_text": {"type": "string"},
                    "categories": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["contract_text"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "flag_risks",
            "description": "Cross-reference contract text against risk categories",
            "parameters": {
                "type": "object",
                "properties": {"contract_text": {"type": "string"}},
                "required": ["contract_text"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "compare_versions",
            "description": "Compare two contract versions (redline analysis)",
            "parameters": {
                "type": "object",
                "properties": {
                    "version_a_path": {"type": "string"},
                    "version_b_path": {"type": "string"}
                },
                "required": ["version_a_path", "version_b_path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "summarize_contract",
            "description": "Generate a plain-English executive summary",
            "parameters": {
                "type": "object",
                "properties": {"clauses_json": {"type": "string"}},
                "required": ["clauses_json"]
            }
        }
    }
]

SYSTEM_PROMPT = """You are a contract review specialist. Your role is first-pass
analysis — identify what's in the contract, flag potential concerns, and summarize.
You are not a lawyer. Always include a disclaimer.

Protocol:
1. Read the contract
2. Extract clauses by category (parties, term, payment, liability, termination,
   IP, confidentiality, governing law)
3. Flag risks against risk categories — report severity and description
4. Summarize in plain English
5. Finish with a line reading FINAL_REVIEW: followed by the executive summary,
   clause table, risk flags, and disclaimer

Rules:
- Flag missing clauses as HIGH risk when standard for this contract type
- Flag one-sided terms with the party they favor
- Use PLAIN ENGLISH — the recipient may not be a lawyer
- If uncertain, say so rather than guessing
- End with disclaimer about consulting a qualified lawyer"""


def run_agent(contract_path, config, compare_path=None):
    client = OpenAI(api_key=config["openai_api_key"])
    model = config.get("model", "gpt-4o")

    query = f"Review this contract: {contract_path}."
    if compare_path:
        query += f" Compare it against: {compare_path}."

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for i in range(config.get("max_iterations", 6)):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=0.2
        )

        msg = response.choices[0].message
        messages.append(msg)

        if msg.content and "FINAL_REVIEW:" in msg.content:
            return msg.content.split("FINAL_REVIEW:", 1)[1].strip()

        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue the review. Extract clauses, flag risks, then provide FINAL_REVIEW."
            })
            continue

        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)

            if func_name == "read_contract":
                result = agent_tools.read_contract(func_args.get("path", contract_path))
            elif func_name == "extract_clauses":
                result = agent_tools.extract_clauses(client,
                    func_args.get("contract_text", ""), model,
                    func_args.get("categories"))
            elif func_name == "flag_risks":
                text = func_args.get("contract_text", "")
                result = agent_tools.flag_risks(text,
                    config.get("risk_categories_path", "risk_categories.json"))
            elif func_name == "compare_versions":
                result = agent_tools.compare_versions(client, model,
                    func_args.get("version_a_path", contract_path),
                    func_args.get("version_b_path", compare_path or contract_path))
            elif func_name == "summarize_contract":
                result = agent_tools.summarize_contract(client, model,
                    func_args.get("clauses_json", ""))
            else:
                result = f"Unknown tool: {func_name}"

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    return "Agent reached max iterations."


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--contract", required=True, help="Path to contract file (PDF or MD)")
    parser.add_argument("--compare", help="Optional: second version to compare against")
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    result = run_agent(args.contract, config, args.compare)
    print(result)
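
The if/elif chain in run_agent grows with every new tool, and `json.loads` there will raise if the model emits malformed arguments. One alternative is a dispatch table that also handles that case. This is a sketch under assumptions: `build_dispatch` is a hypothetical helper and the handler shown is a stand-in, not the real `agent_tools` function.

```python
import json

def build_dispatch(handlers):
    """Return a function that routes a tool call to its handler by name."""
    def dispatch(name, raw_arguments):
        try:
            args = json.loads(raw_arguments or "{}")
        except json.JSONDecodeError as exc:
            # Report the problem back to the model instead of crashing the loop
            return f"ERROR: could not parse arguments for {name}: {exc}"
        handler = handlers.get(name)
        if handler is None:
            return f"Unknown tool: {name}"
        return handler(args)
    return dispatch

# Stand-in handler for illustration; in agent.py these would wrap agent_tools.*
dispatch = build_dispatch({
    "read_contract": lambda args: f"(would read {args['path']})",
})
```

Inside the loop, each tool call would then reduce to `result = dispatch(tool_call.function.name, tool_call.function.arguments)`.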

Walkthrough

Reviewing a software vendor agreement with unfavorable terms.

Agent reads the contract

read_contract returns the text of a 12-page PDF, with page markers, and the agent loads it into context. The agent identifies it as a Master Services Agreement with a Statement of Work appendix.

Extracts all clauses

extract_clauses returns a JSON object with 12 clause categories. Key findings:

Liability: "Vendor's total liability shall not exceed fees paid in the preceding 3 months." → Low cap, but mutual — standard.
IP Assignment: "All work product shall become the sole property of Client." → Full IP transfer, no license option.
Termination: "Client may terminate at any time without cause upon 5 days' notice." → One-sided. Vendor has no equivalent right.

Flags risks

flag_risks cross-references the contract text (only the first 20,000 characters are scanned) against risk categories:

[CRITICAL] IP Assignment — "hereby assigns all right, title, and interest"
[HIGH] One-Sided Termination — "terminate at any time without cause"
[MEDIUM] Vague Scope — "other services as reasonably requested"
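
To see how a flag like the One-Sided Termination entry arises, the snippet below applies that category's first pattern to the termination sentence quoted earlier, using the same regex flags as flag_risks. It is a standalone illustration and the variable names are arbitrary.

```python
import re

pattern = "terminate.*at any time.*without cause"
sentence = "Client may terminate at any time without cause upon 5 days' notice."

# Same flags as flag_risks: case-insensitive, and "." may cross line breaks
match = re.search(pattern, sentence, re.IGNORECASE | re.DOTALL)
matched_text = match.group() if match else None

# A mutual, notice-based clause does not trigger the pattern
other = re.search(pattern, "Either party may terminate upon 30 days' notice.",
                  re.IGNORECASE | re.DOTALL)
```

The match only shows that the wording is present. Deciding which party the term favors is left to the clause extraction and the model's summary.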

Delivers FINAL_REVIEW

FINAL_REVIEW:

CONTRACT TYPE: Software Master Services Agreement
PARTIES: Client (Acme Corp) and Vendor (DevCo LLC)

KEY OBLIGATIONS:
- Vendor provides software development services per SOW
- Client pays $15,000/month, net-30 terms
- Term: 12 months

CRITICAL RISKS:
[CRITICAL] IP Assignment: All work product becomes Client's sole property.
Recommend: Negotiate a perpetual license instead of full assignment, or
increase fees to account for IP transfer value.

[HIGH] One-Sided Termination: Client can terminate without cause in 5 days.
Vendor has no equivalent right. Recommend: Add mutual termination clause
with 30-day notice for both parties.

[MEDIUM] Vague Scope: "other services as reasonably requested" creates
open-ended obligations. Recommend: Cap additional work at X hours/month
or require a separate SOW for scope changes.

DISCLAIMER: This is an automated first-pass review. Consult a qualified
lawyer before making decisions based on this analysis.
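
The bracketed risk lines in this review follow a regular shape, so they could also be generated directly from the flag_risks JSON rather than retyped by the model. A minimal sketch, assuming the finding fields produced by flag_risks (`category`, `severity`, `matched_text`). `render_risk_flags` and the severity ordering are hypothetical.

```python
import json

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2}

def render_risk_flags(findings_json):
    """Format flag_risks output as '[SEVERITY] Category: "matched text"' lines."""
    # Entries without a severity (such as the truncation warning) are skipped
    findings = [f for f in json.loads(findings_json) if "severity" in f]
    # Unknown severities sort after the known ones
    findings.sort(key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)))
    return [
        f'[{f["severity"].upper()}] {f["category"]}: "{f["matched_text"]}"'
        for f in findings
    ]
```

Sorting by severity keeps critical items at the top regardless of the order in which patterns matched.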

Customization

Risk Configuration

risk_categories_path

Path to the JSON file defining risk patterns. Add your organization's standard red-flag clauses.

Values: path to .json file

model

gpt-4o is recommended for contract analysis. gpt-4o-mini works for simple contracts but may miss nuanced legal language.

Values: gpt-4o, gpt-4o-mini

max_iterations

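Maximum number of agent loop iterations (one model call each) before the run stops with "Agent reached max iterations." Increase for long or complex contracts with many clauses.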

Values: 1-10 (default 6)

Note:

PDF quality matters. Scanned PDFs (images of text) will not produce usable output. The agent works with text-based PDFs and Markdown files. Use OCR preprocessing for scanned documents.
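
A scanned PDF usually shows up as pages that yield little or no text. A rough check on the per-page text (for example, the strings returned by `page.get_text()` in read_contract) can warn before any model call is made. This is a heuristic sketch: `looks_scanned` is a hypothetical helper, and both the 50-character threshold and the "more than half the pages" rule are assumptions to tune.

```python
def looks_scanned(page_texts, min_chars_per_page=50):
    """Heuristic: True when most pages yield almost no extractable text."""
    if not page_texts:
        return True  # nothing extracted at all
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > 0.5
```

When this returns True, the file is a candidate for OCR preprocessing before it is passed to the agent.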

Key Takeaway

Contract review agents are best at surface-level pattern matching and clause extraction — the kind of work that consumes paralegal hours. They will not catch subtle legal implications or jurisdiction-specific nuances. The risk categories JSON is the most important file: it defines what the agent flags. Customize it for your industry's standard concerns before running on real contracts.
