Agent Metric Monitor

Overview

Agent metric monitors track statistical metrics over an agent's spans (null rate, latency, token usage, and the like) and alert on ML-based or fixed thresholds. They use the same alert_conditions as the Metric Monitor; the difference is the source: an agent, not a table.

Reference scope

This page covers MaC YAML configuration. For how agent monitors work, see Agent Monitors Overview. Alert conditions and the available metrics follow the Metric Monitor reference.

MaC key: agent_metric. Transforms are not supported here; for LLM-as-judge evaluation, use the Agent Evaluation Monitor.

Quick Start

montecarlo:
  agent_metric:
    # Warehouse platform agent (e.g. Snowflake Cortex or Databricks)
    - name: agent_prompt_null_rate
      description: Null-rate of prompts on the support agent
      agent: "ANALYTICS:AGENTS.support_agent" # <database>:<schema>.<name>
      alert_conditions:
        - metric: NULL_RATE
          fields: [prompts]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR

Configuration

agent: the agent to monitor

string · required

The agent whose spans the monitor reads. Agent monitors name an agent instead of a table; the warehouse source is derived from it. The value takes one of two forms, told apart by its shape:

  • Warehouse platform agents (e.g. Snowflake Cortex, Databricks): a <database>:<schema>.<name> reference, where <name> is the agent's display name when it has one (otherwise its underlying identifier). Do not set trace_table.
  • OpenTelemetry agents: the agent's bare service_name. By default the OpenTelemetry trace store is resolved automatically; set trace_table only when the spans live in a specific warehouse table.

A value with a : and a dotted tail (db:schema.name) is read as a platform reference; a bare value is an OpenTelemetry service_name.

agent: "ANALYTICS:AGENTS.support_agent"

trace_table: warehouse trace table for an OTel agent

string · optional

<database>:<schema>.<table> naming the warehouse trace table that holds an OpenTelemetry agent's spans. Required when the spans live in your warehouse, or when the warehouse holds more than one OpenTelemetry trace table. Omit it for platform agents and for agents resolved from the default OpenTelemetry trace store.

trace_table: "ingest:opentelemetry.traces"
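
For context, here is a minimal sketch of a complete monitor that pairs a bare service_name with an explicit trace_table, for the case where the spans live in a warehouse table. The monitor name and description are illustrative; the service name, metric, and field are borrowed from the OpenTelemetry example in the Examples section.

```yaml
montecarlo:
  agent_metric:
    - name: otel_checkout_latency_warehouse   # illustrative name
      description: Latency of an OTel agent whose spans live in a warehouse table
      agent: my-otel-agent                    # bare service_name, so an OpenTelemetry agent
      trace_table: "ingest:opentelemetry.traces"   # <database>:<schema>.<table>
      alert_conditions:
        - metric: AVERAGE
          fields: [duration_ms]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR
```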

alert_conditions: metric alert rules

array of objects · required

Same shape as the Metric Monitor: a metric, optional fields, an operator (AUTO, GT, LT, range operators, …), and thresholds. See Available Metrics for metric names.

alert_conditions:
  - metric: NULL_RATE
    fields: [prompts]
    operator: AUTO
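
For a fixed threshold rather than an ML-based one, a condition might look like the sketch below. The threshold_value key and the 0.05 figure are assumptions based on the general Metric Monitor shape, not on anything stated on this page, and the sketch assumes the null rate is expressed as a fraction. Confirm the exact key names in the Metric Monitor reference before relying on it.

```yaml
alert_conditions:
  - metric: NULL_RATE
    fields: [prompts]
    operator: GT            # fixed comparison instead of AUTO
    threshold_value: 0.05   # assumed key name; intended to alert above a 5% null rate
```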

agent_span_filters: narrow to a workflow, task, or span

array of objects · optional

Refines the monitor to specific spans. The agent is already set by agent, so it need not be repeated. Each entry sets one or more of workflow, task, span_name, each an object with a value.

agent_span_filters:
  - workflow:
      value: checkout
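
Because each entry may set more than one of workflow, task, and span_name, a single filter can combine them. This is a minimal sketch: the span name llm_call is hypothetical, and how multiple keys within one entry combine is not stated here, so check the result against your own spans.

```yaml
agent_span_filters:
  - workflow:
      value: checkout
    span_name:
      value: llm_call   # hypothetical span name
```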

aggregate_by: time bucket

enum · optional

Accepted values: HOUR · DAY · WEEK · MONTH (uppercase)

Buckets metrics by the chosen interval.

aggregate_by: HOUR

is_agent_trace_aggregation / is_agent_conversation_aggregation: scoring grain

boolean · optional · mutually exclusive

Raise the scoring grain from individual spans to whole traces (is_agent_trace_aggregation) or whole conversations (is_agent_conversation_aggregation). Setting both fails validation. Include the flag in the template each time you re-apply it to keep the same grain; omitting it reverts the monitor to span grain.

is_agent_trace_aggregation: true
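
The conversation-grain variant follows the same pattern. The fragment below is a sketch that reuses the SUM over total_tokens condition from the trace-level example in the Examples section; whether that field suits conversation-level scoring in your data is an assumption.

```yaml
# Conversation grain: score whole conversations rather than individual spans.
is_agent_conversation_aggregation: true
# Leave is_agent_trace_aggregation unset; setting both fails validation.
alert_conditions:
  - metric: SUM
    fields: [total_tokens]
    operator: AUTO
```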

sensitivity: ML threshold sensitivity

enum · optional

Accepted values: low · medium · high

Tunes how lax the ML-generated thresholds are for AUTO alert conditions.

sensitivity: medium

segment_fields / segment_sql: segment the metrics

array of strings · optional

Segment metrics by up to 5 fields (segment_fields) or one SQL expression (segment_sql). Use one or the other, or neither.

segment_fields:
  - model_name
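
The two segmentation options might be written as in the sketch below. The second field name and the SQL expression are hypothetical, and writing segment_sql as a one-item list is an assumption drawn from the "array of strings" type above; verify its shape against the Metric Monitor reference.

```yaml
# Option A: segment by fields (up to 5).
segment_fields:
  - model_name
  - workflow_name   # hypothetical second field

# Option B: one SQL expression, used instead of Option A and never alongside it.
# segment_sql:
#   - "LOWER(model_name)"   # hypothetical expression; list form is assumed
```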

schedule: when the monitor runs

object · optional

Same shape as the Metric Monitor: type (fixed, dynamic, manual), interval_minutes, start_time, timezone.

schedule:
  type: fixed
  interval_minutes: 60
  start_time: "2025-01-01T00:00:00+00:00"

Common fields

The shared monitor envelope works the same as other monitor types:

  • name (string · required): unique identifier in the namespace; renaming creates a new monitor.
  • description (string · required): max 512 characters.
  • warehouse (string · optional): UUID or name; overrides the montecarlo.yml default.
  • connection_name (string · optional): query engine to use within the warehouse.
  • notes (string · optional): shown in the UI, not in notifications.
  • audiences / failure_audiences (array of strings · optional): notification channels.
  • priority (enum · optional): P1–P5.
  • tags (array of objects · optional): name (required) + value (optional).
  • data_quality_dimension (enum · optional): ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY.
  • domains (array of strings · optional; required on accounts created after January 2025): domain for the monitor.
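
Put together, the envelope fields sit alongside the agent-specific ones on each monitor entry. The sketch below builds on the Quick Start monitor; the note text, audience name, and tag values are placeholders rather than values your account will recognize.

```yaml
montecarlo:
  agent_metric:
    - name: agent_prompt_null_rate
      description: Null-rate of prompts on the support agent
      agent: "ANALYTICS:AGENTS.support_agent"
      notes: Owned by the support platform team   # shown in the UI only
      audiences:
        - support-oncall          # hypothetical audience name
      priority: P2
      tags:
        - name: team              # name is required
          value: support          # value is optional
      data_quality_dimension: COMPLETENESS
      domains:
        - my-domain
      alert_conditions:
        - metric: NULL_RATE
          fields: [prompts]
          operator: AUTO
```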

Deprecated fields

Field          Use instead
resource       warehouse
domain_uuids   domains
labels         audiences
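
Applied to a monitor entry, the renames look roughly like this. This is a sketch only: the values are placeholders, and it assumes each replacement key accepts a value equivalent to what its predecessor held, which this page does not confirm.

```yaml
warehouse: my-warehouse   # replaces the deprecated resource key
domains:
  - my-domain             # replaces the deprecated domain_uuids key
audiences:
  - support-oncall        # replaces the deprecated labels key
```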

Examples

Platform agent metric (Snowflake Cortex / Databricks)

Null-rate metric on a warehouse platform agent, referenced by its <database>:<schema>.<name> identity. No trace_table is needed; the source is derived from the agent.

montecarlo:
  agent_metric:
    - name: cortex_prompt_null_rate
      description: Null-rate of prompts on the Cortex agent
      agent: "ANALYTICS:AGENTS.support_cortex_agent"
      alert_conditions:
        - metric: NULL_RATE
          fields: [prompts]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR
      priority: P3
      domains:
        - my-domain

OpenTelemetry agent with a span filter

Addressed by the agent's bare service_name; the default OpenTelemetry trace store is resolved automatically. A span filter narrows the monitor to one workflow.

montecarlo:
  agent_metric:
    - name: otel_checkout_latency
      description: Latency of the checkout workflow
      agent: my-otel-agent # the agent's service_name
      agent_span_filters:
        - workflow:
            value: checkout
      alert_conditions:
        - metric: AVERAGE
          fields: [duration_ms]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR

Trace-level aggregation

Score whole traces rather than individual spans by setting is_agent_trace_aggregation.

montecarlo:
  agent_metric:
    - name: trace_token_usage
      description: Total token usage per trace
      agent: "ANALYTICS:AGENTS.support_cortex_agent"
      is_agent_trace_aggregation: true
      alert_conditions:
        - metric: SUM
          fields: [total_tokens]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR

Troubleshooting

Agent source

  • Omitting agent. Every agent monitor requires an agent. A monitor without one fails validation.
  • Platform reference not found. A <database>:<schema>.<name> reference must match an agent registered in the target account and warehouse. Cross-account: register the agent in the target account first.
  • Ambiguous platform reference. Export prefers an agent's display name, which is not unique. If two agents in the same schema share a display name, the reference is ambiguous and rejected. Use the agent's underlying endpoint identity instead.
  • Span-filter agent disagrees with agent. If you set an agent sub-field inside agent_span_filters, it must name the same agent as the top-level agent. Omit it, since the agent is already set.

Trace tables

  • Setting trace_table for a default-store agent. When the OpenTelemetry trace store is resolved automatically, naming it is redundant and rejected. Omit trace_table.
  • trace_table matches no table. The named table must exist in the target warehouse, resolved by its <database>:<schema>.<table> id.
  • trace_table names a platform agent's table. Platform agents are authored by their agent reference, not their trace table.
  • No OpenTelemetry trace store provisioned. A bare service_name with no trace_table requires the warehouse to have a provisioned OpenTelemetry trace store. If it doesn't, set trace_table to name the warehouse table holding the spans.

Metric-specific

  • Using transforms. Transforms are not supported on agent_metric. Use the Agent Evaluation Monitor.
  • Setting both aggregation flags. is_agent_trace_aggregation and is_agent_conversation_aggregation are mutually exclusive; set at most one.
  • Forgetting PUT semantics on updates. Updating a monitor replaces its full configuration: fields you omit revert to defaults, including the aggregation flags. Always specify the complete desired configuration.
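
To illustrate the last point, suppose only sensitivity needs to change on the trace-level monitor from the Examples section. A sketch of the definition to apply, under that assumption, still carries every other field, including the aggregation flag:

```yaml
montecarlo:
  agent_metric:
    - name: trace_token_usage
      description: Total token usage per trace
      agent: "ANALYTICS:AGENTS.support_cortex_agent"
      is_agent_trace_aggregation: true   # kept; omitting it would revert to span grain
      sensitivity: high                  # the one field being changed
      alert_conditions:
        - metric: SUM
          fields: [total_tokens]
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2025-01-01T00:00:00+00:00"
      aggregate_by: HOUR
```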