Trace and Conversation Structure

What a trace contains, how to read one, and how traces relate to conversations.

Every agent run is captured as a trace: a tree of spans containing prompts, context, completions, token usage, latency, model metadata, tool calls, and errors. This page covers how traces are structured, how to navigate them, and how they relate to conversations.

What's in a trace

A trace is a tree of spans. Each span represents one unit of work: an LLM call, a tool invocation, or a custom step your agent emits.

Span type   | What it captures
----------- | ----------------
LLM span    | Prompt, completion, model name, token counts, latency
Tool span   | Tool name, the inputs the agent passed, the response that came back, duration
Custom span | Any application-defined step your agent instruments

Every span carries timing, status, and any custom attributes you emit (ticket IDs, customer names, tags, etc.).
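As a mental model, the span tree can be sketched as a small recursive data structure. This is a minimal sketch under stated assumptions: the `Span` class, its field names, and the example values are hypothetical and do not reflect Monte Carlo's internal schema. It shows how a depth-first walk visits every span and how per-span token counts roll up into a trace total.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    # Hypothetical span model; field names are illustrative only.
    name: str
    kind: str  # "llm", "tool", or "custom"
    duration_ms: float
    status: str = "ok"  # "ok" or "error"
    tokens: int = 0  # only meaningful for LLM spans
    attributes: dict[str, Any] = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)


def walk(span: Span):
    """Yield every span in the tree, depth-first, parent before children."""
    yield span
    for child in span.children:
        yield from walk(child)


def total_tokens(root: Span) -> int:
    """Sum token counts across all LLM spans in the trace."""
    return sum(s.tokens for s in walk(root) if s.kind == "llm")


# Example trace: a custom root step containing an LLM call, a tool call,
# and a follow-up LLM call. The custom attribute mimics a ticket ID.
trace = Span(
    name="handle_ticket",
    kind="custom",
    duration_ms=2400,
    attributes={"ticket_id": "T-1001"},
    children=[
        Span("plan", "llm", 800, tokens=512),
        Span("lookup_order", "tool", 350, attributes={"order_id": "A42"}),
        Span("answer", "llm", 1100, tokens=730),
    ],
)

print(total_tokens(trace))  # 1242
print([s.name for s in walk(trace)])
```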

Reading a trace

Open a trace to see its full span tree. Click any span to inspect its inputs, outputs, timing, and metadata.

Tool spans are often where debugging starts: they show what the agent actually did (queried a database, called an API, ran a function) and what it got back.
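To illustrate reading a trace the way the span tree presents it, here is a minimal sketch that prints an indented outline and then lists each tool call with its input and output. The nested-dict layout and its keys ("name", "kind", "input", "output", "children") are assumptions for illustration, not a documented format.

```python
from typing import Any, Iterator

# Hypothetical nested-dict trace; keys are illustrative only.
trace: dict[str, Any] = {
    "name": "support_agent",
    "kind": "custom",
    "children": [
        {"name": "plan", "kind": "llm", "children": []},
        {
            "name": "query_orders_db",
            "kind": "tool",
            "input": {"customer_id": 7},
            "output": {"rows": 0},
            "children": [],
        },
        {"name": "reply", "kind": "llm", "children": []},
    ],
}


def outline(span: dict[str, Any], depth: int = 0) -> Iterator[str]:
    """Yield one indented line per span, mirroring the span tree."""
    yield "  " * depth + f"[{span['kind']}] {span['name']}"
    for child in span.get("children", []):
        yield from outline(child, depth + 1)


def tool_calls(span: dict[str, Any]) -> Iterator[tuple[str, Any, Any]]:
    """Yield (tool name, input, output) for every tool span, in tree order."""
    if span["kind"] == "tool":
        yield span["name"], span.get("input"), span.get("output")
    for child in span.get("children", []):
        yield from tool_calls(child)


print("\n".join(outline(trace)))
for name, args, result in tool_calls(trace):
    print(f"{name}({args}) -> {result}")
```

An empty tool result, like the zero rows above, is often the detail that explains an otherwise puzzling final answer.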

Explain this trace

The Explain this trace button generates a plain-English summary of what the agent did, which is useful for triaging, sharing, or reviewing a run without walking the span tree by hand. An equivalent Explain this conversation button is available at the conversation level.

Errors in traces

An agent run can fail outright, or it can hit an error mid-flight and still recover. Monte Carlo tracks both, so errors in a recovered run aren't hidden behind an "OK" status and an outright failure isn't lost in the noise.

A run's status reflects its final outcome, while any errors that occurred during the run are surfaced separately. You can filter traces and conversations in two ways:

  • Final error state: did the run ultimately fail?
  • Has errors anywhere: did anything go wrong mid-flight, even if the agent recovered?
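The difference between the two filters can be sketched over a span tree. This is a minimal sketch that assumes the run's final outcome is the root span's status and that "has errors anywhere" means any span in the tree has an error status; the `Span` class is hypothetical, and Monte Carlo's actual rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span: a name, a status, and child spans.
    name: str
    status: str = "ok"  # "ok" or "error"
    children: list["Span"] = field(default_factory=list)


def final_error(root: Span) -> bool:
    """Assumption: the run's final outcome is the root span's status."""
    return root.status == "error"


def has_errors_anywhere(root: Span) -> bool:
    """True if any span in the tree errored, even if the run recovered."""
    return root.status == "error" or any(has_errors_anywhere(c) for c in root.children)


# A run whose first tool call failed, but which retried and succeeded.
recovered = Span("run", children=[
    Span("fetch_invoice", status="error"),
    Span("fetch_invoice_retry"),
])
# A run that failed outright.
failed = Span("run", status="error", children=[Span("fetch_invoice", status="error")])

for run in (recovered, failed):
    print(final_error(run), has_errors_anywhere(run))
# recovered: False True
# failed:    True True
```

Filtering only on the final state would show the recovered run as healthy; the second check is what surfaces it.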

Error counts are consistent across the trace list, the agent summary, dashboards, and conversation views, so the same run reads the same way everywhere.

To get notified when error rates rise, create an Agent Metric Monitor on the error-rate metric. An error-rate monitor is also one of the three metric monitors auto-generated when a new agent is registered.

How traces become conversations

Agents with a chat-like interface produce multi-turn interactions that span several traces, typically one trace per user turn. The conversation view stitches those traces into a single chronological dialogue, so you don't have to reconstruct the back-and-forth by jumping between traces.

  • Traces are grouped into a conversation by a shared thread ID. Monte Carlo auto-detects common fields (thread_id, session_id, conversation_id), and you can point it at a specific metadata field per agent.
  • Messages display in timestamp order, labeled by role (user, agent, system, or tool). Tool messages include the tool name, inputs, output, and duration.
  • The conversation header summarizes thread ID, start time, duration, message count, and total tokens.
  • Click any message to jump to its underlying span for full inputs, outputs, and timing.
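The grouping described above can be sketched as: find a thread ID in each trace's metadata, bucket traces by it, sort each bucket by start time, and compute header-style totals. This is a minimal sketch under assumptions: the three candidate field names come from this page, but the lookup order, the `override` parameter standing in for the per-agent setting, and the trace dict layout are all hypothetical. It counts turns (traces) rather than individual messages.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Optional

# Candidate thread-ID fields named on this page; the order is an assumption.
DEFAULT_THREAD_FIELDS = ("thread_id", "session_id", "conversation_id")


def thread_id(trace: dict[str, Any], override: Optional[str] = None) -> Optional[str]:
    """Return the trace's thread ID from its metadata, or None if absent."""
    meta = trace.get("metadata", {})
    fields = (override,) if override else DEFAULT_THREAD_FIELDS
    for name in fields:
        if meta.get(name):
            return str(meta[name])
    return None


def build_conversations(
    traces: list[dict[str, Any]], override: Optional[str] = None
) -> dict[str, dict[str, Any]]:
    """Group traces by thread ID and summarize each group chronologically."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for t in traces:
        tid = thread_id(t, override)
        if tid is not None:
            groups[tid].append(t)

    conversations = {}
    for tid, items in groups.items():
        items.sort(key=lambda t: t["start"])  # chronological turn order
        conversations[tid] = {
            "thread_id": tid,
            "start": items[0]["start"],
            "duration": max(t["end"] for t in items) - items[0]["start"],
            "turns": len(items),  # traces, not individual messages
            "total_tokens": sum(t["tokens"] for t in items),
            "traces": items,
        }
    return conversations


ts = datetime.fromisoformat
traces = [
    {"metadata": {"session_id": "s-1"}, "start": ts("2024-05-01T10:02:00"),
     "end": ts("2024-05-01T10:02:09"), "tokens": 300},
    {"metadata": {"session_id": "s-1"}, "start": ts("2024-05-01T10:00:00"),
     "end": ts("2024-05-01T10:00:05"), "tokens": 200},
    {"metadata": {"thread_id": "th-9"}, "start": ts("2024-05-01T11:00:00"),
     "end": ts("2024-05-01T11:00:03"), "tokens": 50},
]

convs = build_conversations(traces)
print(convs["s-1"]["turns"], convs["s-1"]["total_tokens"], convs["s-1"]["duration"])
```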

Evaluate a conversation on-demand

You can run an evaluation directly on any conversation from the conversation page; no monitor setup is required. Pick from the standard eval set or write a custom eval prompt, select one or more dimensions to run, and get results in seconds. This is ideal for ad-hoc investigation ("did the agent do the right thing here?") and for sanity-checking a prompt before standing up a full Agent Evaluation Monitor.

Finding traces

The Trace Explorer lists an agent's traces with status, duration, token usage, and error state. Filter and search to narrow down to specific runs.

  • Deep links: use the open-in-new-tab icon on a trace row to compare several traces side by side without losing your filtered view.
  • Custom metadata search: the search bar accepts key:value syntax on any attribute your agent emits, as well as on built-in fields. Prefix a term with - to exclude it, wrap values that contain spaces in quotes, and chain terms with AND.

Availability

Custom-metadata search is available for agents running on the ClickHouse trace architecture.
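To make the search syntax concrete, here is a minimal sketch of a parser and matcher for it: key:value terms, a leading - to negate, quotes for values with spaces, and AND as the connective. It illustrates the described rules only; the real search bar may differ in details such as case sensitivity, partial matching, and how built-in fields are resolved. The attribute names in the example are hypothetical.

```python
import shlex
from typing import Any


def parse_query(query: str) -> list[tuple[str, str, bool]]:
    """Parse 'key:value' terms into (key, value, negated) tuples.

    Assumed semantics: terms are joined by AND, a leading '-' negates a
    term, and double quotes group values that contain spaces.
    """
    terms = []
    for token in shlex.split(query):  # shlex handles the quoting
        if token == "AND":
            continue  # AND is the only connective in this sketch
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        terms.append((key, value, negated))
    return terms


def matches(attributes: dict[str, Any], terms: list[tuple[str, str, bool]]) -> bool:
    """True if the attributes satisfy every term (exact string match)."""
    for key, value, negated in terms:
        hit = key in attributes and str(attributes[key]) == value
        if hit == negated:  # required term missing, or excluded term present
            return False
    return True


terms = parse_query('customer:"Acme Corp" AND -status:error')
rows = [
    {"customer": "Acme Corp", "status": "ok"},
    {"customer": "Acme Corp", "status": "error"},
    {"customer": "Globex", "status": "ok"},
]
print([r for r in rows if matches(r, terms)])
```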

The Agent summary shows aggregate health over a time range β€” trace count, latency percentiles, token usage, and error rates β€” segmentable by workflow, task, or model. Use it to spot which agents are degrading before drilling into individual traces.
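The kind of aggregation behind such a summary can be sketched in a few lines. This is a minimal sketch, not Monte Carlo's computation: the trace record keys are hypothetical, percentiles use the nearest-rank definition (other definitions give slightly different values), and the error rate here counts runs whose final state is an error, which is an assumption.

```python
import math
from collections import defaultdict
from typing import Any


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(traces: list[dict[str, Any]], segment_by: str = "model") -> dict[str, dict[str, float]]:
    """Aggregate hypothetical trace records into per-segment health stats."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for t in traces:
        groups[t[segment_by]].append(t)

    summary = {}
    for key, items in groups.items():
        latencies = [t["latency_ms"] for t in items]
        summary[key] = {
            "traces": len(items),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "tokens": sum(t["tokens"] for t in items),
            "error_rate": sum(t["final_error"] for t in items) / len(items),
        }
    return summary


traces = [
    {"model": "model-a", "latency_ms": 300, "tokens": 900, "final_error": True},
    {"model": "model-a", "latency_ms": 100, "tokens": 400, "final_error": False},
    {"model": "model-a", "latency_ms": 400, "tokens": 1200, "final_error": False},
    {"model": "model-a", "latency_ms": 200, "tokens": 600, "final_error": False},
    {"model": "model-b", "latency_ms": 900, "tokens": 2000, "final_error": False},
]

for model, stats in summarize(traces).items():
    print(model, stats)
```

Passing a different `segment_by` key (for example a hypothetical "workflow" field) segments the same records another way.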

Exporting trace data

Monte Carlo lets you curate examples from production traces and export them as datasets for evals, fine-tuning, or offline debugging.

Export spans as JSONL

From the trace detail view, select individual LLM spans and export them as a JSONL file, with one record per span containing the input/output, tool calls, and model content. Capture the spans that were good (or bad) directly from a real trace, then use the file to:

  • build evaluation sets,
  • assemble fine-tuning data, or
  • reproduce a case for offline debugging.
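Once you have a JSONL export, turning it into an evaluation set is mostly a matter of reading one JSON object per line and reshaping it. This is a minimal sketch: the "input", "output", and "model" keys are hypothetical stand-ins for whatever your export actually contains, so inspect a real file and adjust them. An in-memory sample replaces the file so the example runs on its own.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one parsed record per non-blank line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_eval_cases(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map exported span records to simple eval cases.

    The "input", "output", and "model" keys are hypothetical; check an
    actual export file and adjust the field names to match.
    """
    return [
        {"prompt": r["input"], "reference": r["output"], "model": r.get("model")}
        for r in records
    ]


# Stand-in for an exported file; with a real export you would use
# open("<your-export>.jsonl", encoding="utf-8") instead.
sample = io.StringIO(
    '{"input": "Where is order A42?", "output": "It shipped on Monday.", "model": "model-a"}\n'
    '{"input": "Cancel my plan", "output": "Your plan is cancelled.", "model": "model-a"}\n'
)
cases = to_eval_cases(read_jsonl(sample))
print(len(cases), cases[0]["prompt"])
```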

Export a full trace

To export an entire run rather than hand-picked spans, use the Export trace action in the UI or the CLI:

montecarlo agent-traces export --trace-link <copy-link-from-UI>

This produces a gzipped JSON of the full span tree with per-span content (prompts, completions, tool I/O), suitable for replaying or archiving a trace.
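A gzipped JSON export can be read with the standard library. The exact layout of the exported file is not described on this page, so this sketch assumes a top-level "spans" list in which each span carries hypothetical "span_id", "parent_id" (None for the root), and "name" keys, and rebuilds the tree from those parent links. The example writes a small sample file to a temporary directory so it runs on its own.

```python
import gzip
import json
import os
import tempfile
from collections import defaultdict
from typing import Any, Optional


def load_trace(path: str) -> dict[str, Any]:
    """Read a gzipped JSON file into a Python object."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def outline(spans: list[dict[str, Any]]) -> list[str]:
    """Rebuild the span tree from parent links and return an indented outline.

    Assumes hypothetical "span_id", "parent_id", and "name" keys.
    """
    children: dict[Optional[str], list[dict[str, Any]]] = defaultdict(list)
    for s in spans:
        children[s["parent_id"]].append(s)

    lines: list[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        for s in children[parent_id]:
            lines.append("  " * depth + s["name"])
            visit(s["span_id"], depth + 1)

    visit(None, 0)  # start from the root span(s)
    return lines


sample = {"spans": [
    {"span_id": "1", "parent_id": None, "name": "agent_run"},
    {"span_id": "2", "parent_id": "1", "name": "llm:plan"},
    {"span_id": "3", "parent_id": "1", "name": "tool:search_docs"},
]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trace.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(sample, f)
    trace = load_trace(path)

print("\n".join(outline(trace["spans"])))
```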
