Agent Validation Monitor

Overview

Agent validation monitors enforce row-level rules over an agent's spans, alerting when spans violate a predicate (for example, a required field is null, or a score falls outside an allowed range). Like trajectory monitors, they are rule-based: the logic lives in a single alert_condition predicate tree, evaluated over a time_filter window, rather than in an alert_conditions field.

Reference scope

This page covers MaC YAML configuration. For how validation monitoring works, see Agent Validation Monitors.

MaC key: agent_validation.

Author by export. The alert_condition predicate tree is intricate. The most reliable way to get its exact shape is to build the monitor in the Monte Carlo UI and export it with montecarlo monitors export, then manage the exported YAML.

Quick Start

montecarlo:
  agent_validation:
    - name: prompts_not_null
      description: Alert when an agent span has a null prompt
      agent: my-otel-agent # the agent's service_name
      alert_condition:
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: prompts
      time_filter:
        time_field:
          field: ingest_ts
        lookback_in_hrs: 10
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      domains:
        - my-domain

Configuration

agent - the agent to monitor

string · required

The agent whose spans the monitor reads. Agent monitors name an agent instead of a table; the warehouse source is derived from it. The value takes one of two forms, told apart by its shape:

  • Warehouse platform agents (e.g. Snowflake Cortex, Databricks): a <database>:<schema>.<name> reference, where <name> is the agent's display name when it has one (otherwise its underlying identifier). Do not set trace_table.
  • OpenTelemetry agents: the agent's bare service_name. By default the OpenTelemetry trace store is resolved automatically; set trace_table only when the spans live in a specific warehouse table.

agent: my-otel-agent
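
For comparison, here are the two forms side by side. The platform reference reuses the ANALYTICS:AGENTS.support_cortex_agent value from the examples on this page; both values are placeholders, not real agents.

```yaml
# OpenTelemetry agent: the bare service_name.
agent: my-otel-agent

# Warehouse platform agent: <database>:<schema>.<name>, quoted as in the
# examples on this page. Shown commented out because a monitor takes one agent.
# agent: "ANALYTICS:AGENTS.support_cortex_agent"
```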

trace_table - warehouse trace table for an OTel agent

string · optional

<database>:<schema>.<table> naming the warehouse trace table that holds an OpenTelemetry agent's spans. Required when the spans live in your warehouse, or when the warehouse holds more than one OpenTelemetry trace table. Omit it for platform agents and for agents resolved from the default OpenTelemetry trace store.

trace_table: "ingest:opentelemetry.traces"
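
In context, trace_table sits next to an OpenTelemetry agent's service_name. This fragment only pairs the two values already used on this page; the rest of the monitor is left out.

```yaml
# OpenTelemetry agent whose spans live in a specific warehouse table.
agent: my-otel-agent
trace_table: "ingest:opentelemetry.traces"   # <database>:<schema>.<table>
```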

alert_condition - the validation predicate

object · required

The predicate tree that decides which spans are flagged. In the examples on this page the predicate describes the failing case: a null predicate on prompts alerts on spans whose prompts value is null. The tree is a set of conditions combined by a boolean operator (AND / OR); each condition is a predicate over span fields. Leaf predicates are unary (type: UNARY, e.g. is-null), binary (type: BINARY, e.g. a comparison), or raw SQL (type: SQL), and groups can nest.

alert_condition:
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: prompts
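
As a sketch of combining conditions, the tree below flags a span when either prompts or completions is null. It reuses only the UNARY null predicate and the two field names that appear on this page, and it assumes both fields exist on the spans being checked. The shape of BINARY and SQL leaves is not shown on this page, so export a monitor from the UI to confirm those.

```yaml
alert_condition:
  operator: OR            # flag the span if any one condition matches
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: prompts
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: completions
```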

time_filter - evaluation window

object · required

The lookback window the condition is evaluated over. time_field must be the ingest_ts field; lookback_in_hrs must be at least 1.

time_filter:
  time_field:
    field: ingest_ts
  lookback_in_hrs: 10

exception_primary_key_column - dedup key for exceptions

string · optional

The span column used to identify and deduplicate exceptions across runs (for example, span_id). Lets the monitor track the same offending row over time rather than re-alerting on it.

exception_primary_key_column: span_id

is_agent_trace_aggregation - score whole traces

boolean · optional

Evaluate the validation at the trace grain rather than per span. When set, only the agent span filter is allowed; span-level refinements (workflow/task/span_name) are rejected. Include it every time you apply the configuration so the monitor stays at the same grain.

is_agent_trace_aggregation: true

filters - additional span filtering

object · optional

A predicate tree (FilterGroup) that further restricts which spans are validated, beyond the agent and the alert condition.

agent_span_filters - narrow to a workflow, task, or span

array of objects · optional

Refines the monitor to specific spans (workflow, task, span_name, each an object with a value). The agent is already set by agent. With is_agent_trace_aggregation: true, only the agent dimension is allowed here.

agent_span_filters:
  - workflow:
      value: checkout
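
The other dimensions are described as having the same object-with-a-value shape, so a task filter would presumably look like the sketch below. The task and span names are hypothetical, and how multiple entries combine is not covered on this page; export a monitor from the UI to confirm the exact shape.

```yaml
agent_span_filters:
  - task:
      value: summarize_order      # hypothetical task name

# Alternative: narrow by span name instead (hypothetical value).
# agent_span_filters:
#   - span_name:
#       value: llm.completion
```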

schedule - when the monitor runs

object · required

Accepts type (fixed or manual), interval_minutes, start_time, and timezone. Dynamic schedules are not supported for validation monitors.

schedule:
  type: fixed
  interval_minutes: 60
  start_time: "2025-01-01T00:00:00+00:00"
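
Two variations, sketched under assumptions: a fixed schedule with an explicit timezone, and a manual schedule. The timezone value format (an IANA name) and the idea that a manual schedule needs only type are assumptions, not confirmed by this page.

```yaml
# Fixed schedule with a timezone (IANA name assumed).
schedule:
  type: fixed
  interval_minutes: 60
  start_time: "2025-01-01T00:00:00+00:00"
  timezone: America/New_York

# Manual schedule (assumed to need no interval or start time).
# schedule:
#   type: manual
```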

Common fields

The shared rule-monitor envelope:

  • name (string · optional): unique identifier (auto-generated if omitted); renaming creates a new monitor.
  • description (string · optional): max 512 characters.
  • warehouse (string · optional): UUID or name; overrides the montecarlo.yml default.
  • connection_name (string · optional): query engine to use within the warehouse.
  • timeout (integer · optional): query timeout in seconds.
  • notes (string · optional): shown in the UI, not in notifications.
  • audiences / failure_audiences (array of strings · optional): notification channels.
  • priority (enum · optional): P1–P5.
  • event_rollup_count (integer · optional): minimum 2; rolls up repeated breaches into one incident.
  • tags (array of objects · optional): name (required) + value (optional).
  • data_quality_dimension (enum · optional): ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY.
  • domains (array of strings · optional; required on accounts created after January 2025).
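
A sketch of several envelope fields set together. These keys sit alongside agent, alert_condition, time_filter, and schedule on the same monitor entry, which are left out here. The audience name, tag, and note text are hypothetical.

```yaml
name: prompts_not_null
description: Alert when an agent span has a null prompt
priority: P2
data_quality_dimension: COMPLETENESS
event_rollup_count: 2              # minimum allowed value
notes: Owned by the agent team     # hypothetical note, shown in the UI only
audiences:
  - agent-oncall                   # hypothetical audience name
tags:
  - name: team                     # name is required
    value: agents                  # value is optional
domains:
  - my-domain
```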

Deprecated fields

Field          Use instead
resource       warehouse
domain_uuids   domains
labels         audiences

Examples

OpenTelemetry agent - null check

Alert when an agent span has a null prompts value, scanning the last 10 hours and deduplicating exceptions by span_id.

montecarlo:
  agent_validation:
    - name: prompts_not_null
      description: Alert when an agent span has a null prompt
      agent: my-otel-agent
      alert_condition:
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: prompts
      time_filter:
        time_field:
          field: ingest_ts
        lookback_in_hrs: 10
      exception_primary_key_column: span_id
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      domains:
        - my-domain

Platform agent - trace-level validation

Validate at the trace grain on a warehouse platform agent. With is_agent_trace_aggregation, only the agent span filter is allowed.

montecarlo:
  agent_validation:
    - name: cortex_trace_completion_present
      description: Alert when a trace has no completion
      agent: "ANALYTICS:AGENTS.support_cortex_agent"
      is_agent_trace_aggregation: true
      alert_condition:
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: completions
      time_filter:
        time_field:
          field: ingest_ts
        lookback_in_hrs: 24
      schedule:
        type: fixed
        interval_minutes: 120
        start_time: "2025-01-01T00:00:00+00:00"
      domains:
        - my-domain

Troubleshooting

Agent source

  • Omitting agent. Every agent monitor requires an agent. A monitor without one fails validation.
  • Platform reference not found / ambiguous. A <database>:<schema>.<name> reference must match exactly one registered agent in the target account and warehouse. A display name shared by two agents in a schema is ambiguous; use the underlying endpoint identity instead. For a cross-account agent, register the agent first.
  • Setting trace_table for a default-store agent. Redundant and rejected when the OpenTelemetry trace store resolves automatically. Omit it.

Validation specifics

  • Using alert_conditions. Validation monitors have no alert_conditions field; the predicate goes in the singular alert_condition.
  • Empty alert_condition. The predicate tree must contain at least one condition.
  • time_filter field other than ingest_ts. The time_field must be ingest_ts, and lookback_in_hrs must be at least 1.
  • Span-level filters with trace aggregation. With is_agent_trace_aggregation: true, only the agent span filter is allowed; workflow/task/span_name are rejected.
  • A dynamic schedule. Validation monitors don't support schedule.type: dynamic.
  • Forgetting PUT semantics on updates. Updating a monitor replaces its full configuration; omitted fields revert to defaults, including is_agent_trace_aggregation. Always specify the complete desired configuration.
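
The checks above can be read off an annotated configuration. This sketch restates the platform-agent example from this page with a comment on each line that one of the items refers to; it introduces no new fields.

```yaml
montecarlo:
  agent_validation:
    - name: cortex_trace_completion_present
      agent: "ANALYTICS:AGENTS.support_cortex_agent"   # required on every agent monitor
      is_agent_trace_aggregation: true   # restate on every update, or it reverts to the default
      alert_condition:                   # singular key; alert_conditions is not a field here
        operator: AND
        conditions:                      # at least one condition
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: completions
      time_filter:
        time_field:
          field: ingest_ts               # must be ingest_ts
        lookback_in_hrs: 24              # must be at least 1
      schedule:
        type: fixed                      # fixed or manual; dynamic is not supported
        interval_minutes: 120
        start_time: "2025-01-01T00:00:00+00:00"
      domains:
        - my-domain
```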