Pre-Prod Agent Monitors
Pre-prod agent monitors allow you to run agent evaluations on a static golden set of prompts or conversations. These evaluations serve as repeatable tests executed against a live production or pre-production environment, enabling CI/CD pipeline gating and tracking agent performance across builds.
Why use pre-prod monitors?
When deploying LLM-powered agents, you need confidence that code changes don't degrade agent quality. Pre-prod monitors enable:
- Regression testing — Run the same golden prompts against each build to detect quality regressions before deployment
- CI/CD gating — Automatically fail pipelines when agent evaluation scores drop below thresholds
- Performance tracking — Track agent quality metrics across builds over time
How pre-prod monitors differ from standard agent monitors
| Aspect | Standard Agent Monitors | Pre-prod Agent Monitors |
|---|---|---|
| Metric grouping | Time-bucketed (hourly/daily) | Bucketed by ci_build_id |
| Orchestration | Scheduled by Monte Carlo | Manually triggered via SDK |
| Use case | Production monitoring | CI/CD pipeline gating |
Setup guide
1. Instrument your agent
First, set up the Monte Carlo OpenTelemetry SDK if you haven't already.
When invoking your agent during CI builds, pass `ci_build_id` and optionally `expected_output` as metadata. These metadata attributes will be available in the `attr_map` struct column of your warehouse traces table. When configuring the monitor (see below), you can filter by these attributes.
If using LangGraph Server, pass these values in the metadata field of your request:
```
POST /threads/{thread_id}/runs
{
  "assistant_id": "your-assistant",
  "input": {...},
  "metadata": {
    "ci_build_id": "build-127",
    "expected_output": "Expected response text"
  }
}
```

If using a custom API layer, use `mc.create_span_with_attributes` to propagate attributes to child spans:
```python
from typing import Any

from montecarlo_opentelemetry import mc


@app.post("/analyze")
def analyze_sync() -> dict[str, Any]:
    body: dict[str, Any] = app.current_event.json_body or {}
    trace_ctx = body.get("trace_context") or {}
    # Propagate the attributes carried in trace_context (e.g. ci_build_id,
    # expected_output) to this span and its children.
    with mc.create_span_with_attributes("analyze_sync", trace_ctx):
        return _analyze_handler(body)
```

Alternatively, pass attributes via the `config` argument when invoking a LangGraph:
```python
await graph.ainvoke(
    input={...},
    config={"ci_build_id": "build-127", "expected_output": "..."},
)
```

2. Create a pre-prod agent monitor
Follow the standard agent monitor configuration steps with these key differences:
- Filter Spans By `ci_build_id`: Under "Filter spans by":
  - Select `attr_map` and select the `montecarlo.association.ci_build_id` attribute
  - Set the value to `{{ci_build_id}}`. The `{{ci_build_id}}` syntax creates a runtime variable that you provide when triggering the monitor via the SDK (a sketch follows this list).
- Group data: For "Group rows by", select "none" instead of hourly or daily bucketing, since pre-prod evaluations should assess all traces for a given build rather than grouping by time.
- Define Manual Schedule: Under "Define Schedule", select "Manually" instead of a recurring schedule, since the monitor will be triggered programmatically from your CI pipeline.
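To make the runtime variable concrete: when you later trigger this monitor from CI, you bind `{{ci_build_id}}` to the build you just tested. The sketch below uses the Pycarlo GraphQL client; only the `runMonitor` mutation name and the runtime-variable idea come from this guide, so treat the argument and return-field names as placeholders and confirm the exact shape against the sample CI build script referenced in step 3.

```python
from pycarlo.core import Client, Session

# Authenticate with a Monte Carlo API key ID and token.
client = Client(session=Session(mc_id="<api-key-id>", mc_token="<api-token>"))

# Placeholder argument and return-field names: only the runMonitor mutation
# name and the runtime-variable concept come from this guide. See the sample
# CI build script for the authoritative call.
mutation = """
mutation {
  runMonitor(
    uuid: "<pre-prod-monitor-uuid>"
    runtimeVariables: [{ name: "ci_build_id", value: "build-127" }]
  ) {
    executionUuid
  }
}
"""

response = client(mutation)
print(response.run_monitor.execution_uuid)  # snake_case access; field name is a placeholder
```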
3. Create your CI build script
Your CI script orchestrates the pre-prod evaluation flow:
- Execute golden prompts — Invoke your agent with each prompt, passing `ci_build_id` and `expected_output` in the request
- Wait for traces — Poll until all traces are visible in the warehouse
- Trigger the monitor — Call `runMonitor` with `ci_build_id` as a runtime variable
- Poll for completion — Wait until the monitor execution finishes
- Gate deployment — Check the `breached` status and fail or pass the pipeline accordingly
The Pycarlo SDK provides components for building your CI script. See the sample_agent_ci_build_script.py file for a complete reference implementation.
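For orientation, here is a minimal sketch of that flow. It assumes several things this guide does not specify: the agent is reachable over HTTP at a placeholder AGENT_URL, the golden set is a simple list of dicts, and the GraphQL argument/field names and the execution-status query are placeholders (only `runMonitor` and the `breached` flag are referenced above). Treat it as a skeleton to adapt from the sample script, not a drop-in implementation.

```python
"""Skeleton of a CI gate for a pre-prod agent monitor.

Everything marked "placeholder" (AGENT_URL, MONITOR_UUID, the GraphQL
argument/field names, the golden-set format, the status query) is an
assumption for illustration, not something defined by this guide.
"""
import sys
import time

import requests
from pycarlo.core import Client, Session

AGENT_URL = "https://agent.example.com/threads/{thread_id}/runs"  # placeholder
MONITOR_UUID = "<pre-prod-monitor-uuid>"  # placeholder
CI_BUILD_ID = "build-127"

GOLDEN_SET = [  # placeholder format: one entry per golden prompt
    {"thread_id": "t-1", "input": {"question": "..."}, "expected_output": "..."},
]

client = Client(session=Session(mc_id="<api-key-id>", mc_token="<api-token>"))


def run_golden_prompts() -> None:
    """Step 1: invoke the agent once per golden prompt, tagging each run with
    ci_build_id and expected_output so the monitor can find these traces."""
    for case in GOLDEN_SET:
        resp = requests.post(
            AGENT_URL.format(thread_id=case["thread_id"]),
            json={
                "assistant_id": "your-assistant",
                "input": case["input"],
                "metadata": {
                    "ci_build_id": CI_BUILD_ID,
                    "expected_output": case["expected_output"],
                },
            },
            timeout=60,
        )
        resp.raise_for_status()


def trigger_monitor() -> str:
    """Step 3: run the monitor with ci_build_id bound to {{ci_build_id}}.
    Argument and return-field names are placeholders (see the sketch in step 2)."""
    mutation = f"""
    mutation {{
      runMonitor(
        uuid: "{MONITOR_UUID}"
        runtimeVariables: [{{ name: "ci_build_id", value: "{CI_BUILD_ID}" }}]
      ) {{ executionUuid }}
    }}
    """
    return client(mutation).run_monitor.execution_uuid


def get_finished_execution(execution_id: str):
    """Step 4 placeholder: return the finished execution (including its
    breached flag) or None if still running. Fill in the real status query
    from sample_agent_ci_build_script.py; it is not covered by this guide."""
    raise NotImplementedError


def main() -> int:
    run_golden_prompts()

    # Step 2: wait for traces to land in the warehouse. A robust script polls
    # the traces table for rows whose attr_map carries this ci_build_id; a
    # fixed sleep is the naive stand-in used here.
    time.sleep(120)

    execution_id = trigger_monitor()

    # Step 4: poll until the monitor execution finishes.
    execution = None
    while execution is None:
        time.sleep(30)
        execution = get_finished_execution(execution_id)

    # Step 5: gate the pipeline on the breached status.
    return 1 if execution.breached else 0


if __name__ == "__main__":
    sys.exit(main())
```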
