Pre-Prod Agent Monitors

Pre-prod agent monitors allow you to run agent evaluations on a static golden set of prompts or conversations. These evaluations serve as repeatable tests executed against a live production or pre-production environment, enabling CI/CD pipeline gating and tracking agent performance across builds.

Why use pre-prod monitors?

When deploying LLM-powered agents, you need confidence that code changes don't degrade agent quality. Pre-prod monitors enable:

  • Regression testing — Run the same golden prompts against each build to detect quality regressions before deployment
  • CI/CD gating — Automatically fail pipelines when agent evaluation scores drop below thresholds
  • Performance tracking — Track agent quality metrics across builds over time

How pre-prod monitors differ from standard agent monitors

Aspect           | Standard Agent Monitors      | Pre-prod Agent Monitors
Metric grouping  | Time-bucketed (hourly/daily) | Bucketed by ci_build_id
Orchestration    | Scheduled by Monte Carlo     | Manually triggered via SDK
Use case         | Production monitoring        | CI/CD pipeline gating

Setup guide

1. Instrument your agent

First, set up the Monte Carlo OpenTelemetry SDK if you haven't already.

When invoking your agent during CI builds, pass ci_build_id and optionally expected_output as metadata. These metadata attributes will be available in the attr_map struct column of your warehouse traces table. When configuring the monitor (see below), you can filter by these attributes.

If using LangGraph Server, pass these values in the metadata field of your request:

POST /threads/{thread_id}/runs
{
  "assistant_id": "your-assistant",
  "input": {...},
  "metadata": {
    "ci_build_id": "build-127",
    "expected_output": "Expected response text"
  }
}
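
From a CI script, you can submit the same request with the LangGraph Python SDK. The sketch below is illustrative: it assumes a server reachable at http://localhost:2024 and an assistant ID of your-assistant, so adjust both for your deployment. The metadata parameter mirrors the metadata field in the HTTP request above.

# Illustrative CI helper: run one golden prompt against LangGraph Server,
# attaching ci_build_id and expected_output as run metadata.
import asyncio

from langgraph_sdk import get_client


async def run_golden_prompt(prompt: dict, expected_output: str, ci_build_id: str):
    client = get_client(url="http://localhost:2024")  # your LangGraph Server URL
    thread = await client.threads.create()
    # `wait` creates the run and blocks until it completes.
    return await client.runs.wait(
        thread["thread_id"],
        "your-assistant",
        input=prompt,
        metadata={
            "ci_build_id": ci_build_id,
            "expected_output": expected_output,
        },
    )


if __name__ == "__main__":
    asyncio.run(run_golden_prompt(
        prompt={"messages": [{"role": "user", "content": "What is our refund policy?"}]},
        expected_output="Expected response text",
        ci_build_id="build-127",
    ))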

If using a custom API layer, use mc.create_span_with_attributes to propagate attributes to child spans:

from typing import Any

from montecarlo_opentelemetry import mc

# `app` is your API framework's route resolver (this handler follows a Lambda
# Powertools-style resolver, hence `app.current_event.json_body`).
@app.post("/analyze")
def analyze_sync() -> dict[str, Any]:
    body: dict[str, Any] = app.current_event.json_body or {}
    # Attributes forwarded by the caller, e.g. ci_build_id and expected_output.
    trace_ctx = body.get("trace_context") or {}

    # Attach the attributes to this span and propagate them to child spans.
    with mc.create_span_with_attributes("analyze_sync", trace_ctx):
        return _analyze_handler(body)
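
The shape of trace_context is whatever contract you define between your CI script and your API. A hypothetical caller matching the handler above might look like this (the endpoint URL and payload keys are placeholders):

# Hypothetical CI-side caller for the /analyze handler above. The trace_context
# keys must match the attribute names your handler forwards to
# mc.create_span_with_attributes.
import requests

response = requests.post(
    "https://your-agent.example.com/analyze",  # placeholder URL
    json={
        "query": "What is our refund policy?",
        "trace_context": {
            "ci_build_id": "build-127",
            "expected_output": "Expected response text",
        },
    },
    timeout=60,
)
response.raise_for_status()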

Alternatively, pass attributes via the config argument when invoking a LangGraph:

# ainvoke is a coroutine, so await it (or call graph.invoke from synchronous code).
await graph.ainvoke(
    input={...},
    config={"ci_build_id": "build-127", "expected_output": "..."},
)

2. Create a pre-prod agent monitor

Follow the standard agent monitor configuration steps with these key differences:

  • Filter spans by ci_build_id: Under "Filter spans by"
    • Select attr_map, then choose the montecarlo.association.ci_build_id attribute
    • Set the value to {{ci_build_id}}. The {{ci_build_id}} syntax creates a runtime variable that you provide when triggering the monitor via the SDK.
  • Group data: For "Group rows by", select "none" instead of hourly or daily bucketing, since pre-prod evaluations should assess all traces for a given build rather than grouping by time.
  • Define Manual Schedule: Under "Define Schedule", select "Manually" instead of a recurring schedule, since the monitor will be triggered programmatically from your CI pipeline.

3. Create your CI build script

Your CI script orchestrates the pre-prod evaluation flow:

  1. Execute golden prompts — Invoke your agent with each prompt, passing ci_build_id and expected_output in the request
  2. Wait for traces — Poll until all traces are visible in the warehouse
  3. Trigger the monitor — Call runMonitor with ci_build_id as a runtime variable
  4. Poll for completion — Wait until the monitor execution finishes
  5. Gate deployment — Check the breached status and fail or pass the pipeline accordingly

The Pycarlo SDK provides components for building your CI script. See the sample_agent_ci_build_script.py file for a complete reference implementation.
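
For reference, here is a condensed, illustrative sketch of steps 3-5 using Pycarlo's raw GraphQL interface. The runMonitor argument and response field names below are assumptions; confirm them against the Monte Carlo API reference or the sample script before relying on them.

# ci_gate.py -- illustrative sketch of steps 3-5 (trigger, poll, gate).
# GraphQL argument and field names are placeholders; take the real operations
# from sample_agent_ci_build_script.py or the API reference.
import json
import sys
import time

from pycarlo.core import Client

CI_BUILD_ID = "build-127"
MONITOR_UUID = "<pre-prod-monitor-uuid>"

client = Client()  # reads credentials from your Monte Carlo CLI profile or environment


def fetch_execution(client: Client, execution_uuid: str):
    """Placeholder for the status query: return an object exposing `status`
    and `breached` for the given monitor execution (see the sample script)."""
    raise NotImplementedError


# Step 3: trigger the monitor, supplying ci_build_id for the {{ci_build_id}}
# runtime variable configured in the monitor's span filter.
run = client(
    """
    mutation runPreProdMonitor($uuid: UUID!, $runtimeVariables: JSONString) {
      runMonitor(uuid: $uuid, runtimeVariables: $runtimeVariables) {
        executionUuid   # illustrative field name
      }
    }
    """,
    variables={
        "uuid": MONITOR_UUID,
        "runtimeVariables": json.dumps({"ci_build_id": CI_BUILD_ID}),
    },
)
execution_uuid = run.run_monitor.execution_uuid

# Step 4: poll until the monitor execution finishes.
while True:
    execution = fetch_execution(client, execution_uuid)
    if execution.status in ("SUCCESS", "FAILED", "ERROR"):
        break
    time.sleep(15)

# Step 5: gate the deployment on the breached status.
if execution.breached:
    print(f"Agent evaluation breached for {CI_BUILD_ID}; failing the pipeline")
    sys.exit(1)
print(f"Agent evaluation passed for {CI_BUILD_ID}")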