What is AgentTape?¶
AgentTape records what your AI agent says and does, then replays it — so your tests run offline, for free, with zero side effects.
-
Record once
Run your agent against the real OpenAI API, database, and tools. AgentTape saves every call to a YAML file.
-
Replay forever
Run the same code with the network turned off. AgentTape serves the saved responses in milliseconds.
The one-minute version¶
AgentTape sits between your code and the outside world. It intercepts external calls — an OpenAI request, a database query, a custom Python tool — and saves the inputs and outputs to a local file called a cassette.
The next time your code runs, AgentTape blocks the real call and returns the saved response instead. Your application can't tell the difference. It thinks it's talking to live services.
import agenttape
from openai import OpenAI
def run_agent():
client = OpenAI()
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Say hi in 3 words"}],
)
return resp.choices[0].message.content
# 1. Record — calls the real API once, writes cassettes/hello.yaml
with agenttape.use_cassette("hello", mode="record"):
print(run_agent())
# 2. Replay — zero network, free, identical output every time
with agenttape.use_cassette("hello", mode="none"):
print(run_agent())
What happened?
The first block called OpenAI for real and saved the prompt and response into cassettes/hello.yaml. The second block intercepted the same call, matched the prompt against the cassette, and returned the saved response without touching the network.
Why this matters¶
Tests for AI agents are usually slow, flaky, expensive, and dangerous:
| Problem | Without AgentTape | With AgentTape |
|---|---|---|
| Cost | Every test run burns tokens | Replay is free |
| Speed | 1–5s per LLM call | < 5ms per call |
| Flakiness | Models drift; outputs vary | Byte-for-byte identical |
| Side effects | A tool really charges the card | Replayed tools never execute |
| Offline | No network, no tests | Runs on a plane |
The usual alternative is hand-written mocks. But mocks test your assumptions about an API, not the API itself — when the real service changes, your mocks keep passing while production breaks. AgentTape captures the real interaction once, then replays that.
How it works¶
flowchart LR
A[Your code] --> B[AgentTape]
B -->|forwards| C[Real world<br/>OpenAI · DB · Stripe]
C -->|response| B
B -->|saves| D[(Cassette<br/>YAML)]
B -->|returns| A
AgentTape forwards the call, waits for the real response, saves both the request and response to the cassette, then hands the response back to your code.
flowchart LR
A[Your code] --> B[AgentTape]
B -->|matches request| D[(Cassette<br/>YAML)]
D -->|saved response| B
B -->|returns| A
C[Real world] -.->|never called| B
AgentTape matches the request against the cassette and returns the saved response. The real world is never contacted. If no match is found, it raises an error rather than silently hitting the network.
What AgentTape captures¶
AgentTape doesn't only record LLM calls. It records every boundary your agent crosses:
-
LLM calls
OpenAI chat completions, responses, and embeddings — captured automatically by the built-in adapter.
-
Tools
Any Python function you mark with
@agenttape.tool— database writes, API calls, payments. -
Raw HTTP
Any
httpxorrequestscall, even to services AgentTape has no dedicated adapter for. -
Retrieval & memory
Vector-store lookups and agent memory reads/writes, tagged for clarity.
Core guarantees¶
AgentTape's promises
- Local-first — no servers, no telemetry, no network during replay.
- Deterministic — the same inputs always produce the same recorded output, byte-for-byte.
- Zero side effects — a replayed tool never executes for real. Safe for CI.
- Fail loud, never silent — an unmatched request raises an error instead of quietly calling the real service.
- Git-friendly — cassettes are plain YAML you can read, diff, and hand-edit.
- Zero core dependencies — the engine runs on the Python standard library alone.
Where to go next¶
-
A guided, five-minute walkthrough from zero to replaying offline.
-
Already know the idea? Copy-paste your way into an existing project.
-
Understand cassettes, the replay engine, determinism, and partial replay.
-
Every function, decorator, and class, with signatures and examples.
Summary¶
- AgentTape intercepts your agent's external calls — LLM and tool calls.
- It saves them to readable YAML cassettes, then replays them deterministically.
- Replay is offline, free, fast, and safe from side effects.
- Integration is one
withblock or one decorator — your agent code stays unchanged.