Record vs Replay

AgentTape has two phases. Recording calls real services and saves what they return. Replay reconstructs those responses offline. Understanding the difference is the whole mental model.

The two phases at a glance

	Record	Replay
Role	Passthrough observer	Air-gapped simulator
Network	Real requests	Blocked
API keys	Required & valid	Not needed
Cost	Billed per call	Free
Tools	Execute for real	Mocked
Side effects	Happen	Never happen
Cassette	Written	Read

The record phase

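While recording, AgentTape is a passthrough observer. For each intercepted call it forwards the request to the real service, receives the response, saves the request and response to the cassette, and returns the response to your code: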

flowchart LR
    A[Your code] --> B[AgentTape]
    B -->|1. forward| C[Real service]
    C -->|2. response| B
    B -->|3. save request + response| D[(Cassette)]
    B -->|4. return| A
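
The four steps in the diagram can be sketched in a few lines of plain Python. This is only an illustration of the passthrough idea, not AgentTape's implementation: `record_call`, `fake_service`, and the list-of-dicts cassette layout are all hypothetical names and shapes chosen for the example.

```python
import json
from typing import Any, Callable

Request = dict[str, Any]
Response = dict[str, Any]


def record_call(
    cassette: list[dict[str, Any]],
    request: Request,
    real_service: Callable[[Request], Response],
) -> Response:
    """Forward one call, save the exchange, and hand the response back."""
    response = real_service(request)  # steps 1-2: forward, receive the response
    cassette.append({"request": request, "response": response})  # step 3: save
    return response  # step 4: the caller sees the real response unchanged


def fake_service(request: Request) -> Response:
    """Stand-in for a live API so the sketch runs offline (hypothetical)."""
    return {"text": f"recipe for {request['prompt']}"}


cassette: list[dict[str, Any]] = []
result = record_call(cassette, {"prompt": "Vanilla Cake"}, fake_service)
serialized = json.dumps(cassette, indent=2)  # what might be written to disk
```

The caller receives exactly what the service returned; the only extra effect is the entry appended to the cassette.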

When to record

Record rarely: only when there is no cassette yet, or when you intend to replace a stale one:

Writing a brand-new test.
You deliberately changed the prompt, model, or tools.
The real API changed and you want the new baseline.

What recording costs you

Recording runs everything for real

Network requests hit live servers — API keys must be valid.
You are billed for usage.
Tools execute their actual code, so side effects happen: databases get written, emails get sent, cards get charged.

Record against staging/sandbox services when you can, or freeze dangerous tools with Partial Replay.

The replay phase

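While replaying, AgentTape is an air-gapped simulator. For each intercepted call it matches the request against the cassette and returns the saved response. The real service is never contacted: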

flowchart LR
    A[Your code] --> B[AgentTape]
    B -->|1. match request| D[(Cassette)]
    D -->|2. saved response| B
    B -->|3. return| A
    C[Real service] -.->|never contacted| B
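
As a rough sketch of the replay path, assuming the same hypothetical list-of-dicts cassette layout (not AgentTape's actual format), the lookup needs nothing but the cassette. Note that `replay_call`, a made-up name, takes no service argument at all, so there is nothing for it to contact.

```python
from typing import Any

Request = dict[str, Any]
Response = dict[str, Any]


def replay_call(cassette: list[dict[str, Any]], request: Request) -> Response:
    """Return the saved response for a request; never reach outside the cassette."""
    for entry in cassette:
        if entry["request"] == request:  # step 1: match the request
            return entry["response"]  # steps 2-3: return the saved response
    # A miss is an error, not a reason to go online.
    raise LookupError(f"no recorded response for {request!r}")


cassette = [
    {"request": {"prompt": "Vanilla Cake"}, "response": {"text": "recorded recipe"}}
]
replayed = replay_call(cassette, {"prompt": "Vanilla Cake"})
```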

When to replay

Almost always:

Running tests locally.
Running tests in CI.
Reproducing a failure without paying for API calls.
Refactoring the code around your agent.

What replay guarantees

Replay is safe and deterministic

Network requests are blocked.
API keys aren't needed (or even read).
Usage is free.
Tools are mocked — zero side effects.

Strict matching: the safety net

In replay, AgentTape never guesses. If your code asks for a "Chocolate Cake" recipe but the cassette only has "Vanilla Cake," it does not fall back to the network. It raises UnmatchedInteractionError immediately and tells you what differed.

flowchart TD
    Q[Incoming request] --> M{Match in<br/>cassette?}
    M -->|yes| R[Return saved response]
    M -->|no| F[Raise UnmatchedInteractionError<br/>never call the real service]

This strictness is the point: a test that drifted should fail loudly, not silently charge a card because an assertion changed.
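
To make "tells you what differed" concrete, here is a minimal sketch of strict matching with a field-by-field report. It is an assumption-laden illustration: the exception class below is a local stand-in that only borrows the name from the text, and `differing_fields`, `strict_replay`, the `example-model` value, and the choice to diff against the closest recorded request are all hypothetical.

```python
from typing import Any


class UnmatchedInteractionError(Exception):
    """Local stand-in borrowing the name from the text; not AgentTape's class."""


def differing_fields(recorded: dict[str, Any], requested: dict[str, Any]) -> list[str]:
    """Describe every top-level field whose value differs between two requests."""
    keys = sorted(set(recorded) | set(requested))
    return [
        f"{key}: recorded={recorded.get(key)!r}, requested={requested.get(key)!r}"
        for key in keys
        if recorded.get(key) != requested.get(key)
    ]


def strict_replay(cassette: list[dict[str, Any]], request: dict[str, Any]) -> Any:
    """Return the saved response on an exact match, otherwise fail loudly."""
    for entry in cassette:
        if entry["request"] == request:
            return entry["response"]
    if not cassette:
        raise UnmatchedInteractionError("cassette is empty")
    # Report against the recorded request with the fewest differing fields.
    closest = min(
        cassette, key=lambda e: len(differing_fields(e["request"], request))
    )
    report = "; ".join(differing_fields(closest["request"], request))
    raise UnmatchedInteractionError(f"no match in cassette ({report})")


cassette = [
    {
        "request": {"model": "example-model", "prompt": "Vanilla Cake"},
        "response": {"text": "recorded recipe"},
    }
]
```

Asking this sketch for a "Chocolate Cake" prompt raises the error with a message that names only the `prompt` field, since `model` is unchanged.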

The two exceptions

mode="new_episodes" records new requests while replaying known ones, and Partial Replay lets you mark specific boundaries live. Both are explicit opt-ins — the default mode="none" never touches the network.

Who picks the phase?

You don't set "record" or "replay" directly — you set a mode, and the mode decides per request. mode="record" always records; mode="none" always replays; once and new_episodes mix the two.
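
The per-request decision can be pictured as a small function. This is a sketch under stated assumptions, not AgentTape's logic: `choose_phase` and its parameters are hypothetical, the `record`, `none`, and `new_episodes` branches follow the descriptions on this page, and the `once` branch assumes the convention common in VCR-style tools (record only when no cassette exists yet), which this page does not confirm.

```python
def choose_phase(mode: str, request_in_cassette: bool, cassette_exists: bool) -> str:
    """Return "record" or "replay" for one intercepted request."""
    if mode == "record":
        return "record"  # always records
    if mode == "none":
        return "replay"  # always replays; a miss fails inside the replay phase
    if mode == "new_episodes":
        # Known requests are replayed, new ones are recorded.
        return "replay" if request_in_cassette else "record"
    if mode == "once":
        # ASSUMPTION: VCR-style semantics, not confirmed for AgentTape.
        return "replay" if cassette_exists else "record"
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions only `new_episodes` looks at the individual request; the other modes decide from the mode alone or from whether a cassette exists.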

How modes map to phases →

FAQ

Does replay re-run my prompt against the model?

No. Replay reconstructs the recorded bytes. It does not re-execute the LLM with your current prompt. With Partial Replay, the moment an input to a live boundary changes, that boundary really executes (at real cost) and a separate derived cassette is written. Your original is never mutated. See Partial Replay.

What if I forget and leave mode="record" in CI?

CI would hit real services and rewrite cassettes. Keep mode="none" as the default (it already is) and gate recording behind an explicit flag like pytest --agenttape-record. The pytest plugin does exactly this.
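
One way to picture the gating, purely as an illustration and not the plugin's actual code: resolve the mode from an explicit opt-in and fall back to "none" otherwise. `resolve_mode` is a hypothetical helper, and the extra refusal to record when a `CI` environment variable is set is an added assumption, not documented AgentTape behavior.

```python
import os
from typing import Mapping


def resolve_mode(record_flag: bool, env: Mapping[str, str] = os.environ) -> str:
    """Pick a cassette mode: replay unless recording was explicitly requested."""
    if not record_flag:
        return "none"  # the safe default: never touches the network
    # ASSUMPTION: many CI systems export CI=true; refuse to record there.
    if env.get("CI", "").lower() in {"1", "true"}:
        raise RuntimeError("refusing to record cassettes in CI")
    return "record"
```

With a guard like this, a stray recording flag in a CI job fails fast instead of hitting real services and rewriting cassettes.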

Summary

Record: passthrough, online, real side effects, writes the cassette. Do it rarely.
Replay: simulator, offline, zero side effects, reads the cassette. Do it almost always.
Replay matches strictly and fails loud — it never silently calls the real service.

Next: Cassette Modes →