Metric Monitor

Track field-level and table-level metrics over time with anomaly detection or fixed thresholds.

Overview

Track field-level and table-level metrics (null rates, row counts, distributions, and more) and get alerted when values breach thresholds or deviate from learned baselines. The metric monitor handles time-series data quality on a single table, with optional segmentation and custom metric expressions.

Reference scope

This page covers MaC YAML configuration. For how metric monitors work and the full metrics list, see Metric Monitors and Available Metrics.

MaC key: metric. Replaces the legacy field_health monitor (blocked from new creation).

Quick Start

montecarlo:
  metric:
    - name: orders_row_count
      description: Alert on unexpected row count changes in the orders table
      data_source:
        table: my_database:my_schema.orders
      aggregate_time_field: created_at
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain

Create interactively with create_or_update_metric_monitor(dry_run=True) via the Monte Carlo MCP server.

Configuration

description - what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: Alert on unexpected row count changes in the orders table

data_source - the table or SQL query to monitor

object · required

Provide either table or sql, not both.

data_source:
  table: analytics:core.fct_orders

Properties


table - fully qualified table name

string · optional · one of table or sql required

Format: database:schema.table. Mutually exclusive with sql.

data_source:
  table: analytics:core.fct_orders

sql - custom SQL query

string · optional · one of table or sql required

Custom SQL query that returns the dataset to monitor. Mutually exclusive with table. Not compatible with use_partition_clause.

data_source:
  sql: "SELECT * FROM analytics.core.fct_orders WHERE region = 'NA'"

transforms - AI-powered field transforms

array of objects · optional

AI-powered field transforms applied to the data source before metric computation.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| function | string | yes | Transform function name (e.g., classification, extraction) |
| field | string | no | Column to apply the transform to |
| alias | string | no | Output alias for the transformed column |
| sql_expression | string | no | SQL expression defining the transform |
| prompt | string | no | LLM prompt for AI-powered transforms. Not supported by classification (which uses categories instead) |
| categories | array of objects | no | Category definitions for classification transforms. Each entry has a required label, optional description, and optional examples (array of strings) |
| model_connection_id | string | no | Connection ID for the AI model used by the transform |
| model_name | string | no | Name of the AI model to use |
| output_type | string | no | Expected output data type |
| field_config_list | array of objects | no | Field configuration list for multi-field transforms |
| field_value_range | object | no | Value range constraints with lower_bound and upper_bound |
| id | string | no | Unique identifier for this transform instance |

data_source:
  table: analytics:core.fct_orders
  transforms:
    - function: classification
      field: description
      alias: description_category
      categories:
        - label: electronics
          description: Electronic devices
        - label: clothing
          description: Apparel items

alert_conditions - metrics and thresholds to monitor

array of objects · required

Each entry defines a metric to track and the condition that triggers an alert. Provide either metric (built-in) or custom_metric, not both. Multiple conditions are allowed. Supported alert condition types: threshold (default), noop.

Available operators: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

Threshold types:

  • Static: fixed numeric value via threshold_value with an explicit operator (GT, LT, etc.).
  • Range: lower_threshold and upper_threshold with INSIDE_RANGE or OUTSIDE_RANGE.
  • Anomaly detection (AUTO): ML-based. AUTO catches both high and low deviations. AUTO_HIGH/AUTO_LOW catch one-sided deviations. Control aggressiveness with the monitor-level sensitivity field.

Pipeline metric operator restriction

Pipeline metrics (ROW_COUNT_CHANGE, TIME_SINCE_LAST_ROW_COUNT_CHANGE, RELATIVE_ROW_COUNT) only support AUTO. Using explicit operators (GT, LT, etc.) on these metrics produces an error.

alert_conditions:
  - metric: ROW_COUNT_CHANGE
    operator: AUTO
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.05
    fields:
      - EMAIL
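
The three threshold styles can be combined in one monitor. The fragment below is a sketch only: the column names (EMAIL, ORDER_TOTAL) and the numeric limits are assumptions chosen for illustration, and it uses only metrics and operators named on this page.

```yaml
# Illustrative fragment; column names and limits are hypothetical.
sensitivity: high                # monitor-level; applies to the AUTO-family condition
alert_conditions:
  # Anomaly detection, one-sided: flag only unusually high null rates
  - metric: NULL_RATE
    operator: AUTO_HIGH
    fields:
      - EMAIL
  # Static threshold: explicit operator plus threshold_value
  - metric: NUMERIC_MEAN
    operator: LTE
    threshold_value: 0
    fields:
      - ORDER_TOTAL
  # Range threshold: both bounds are required with OUTSIDE_RANGE
  - metric: NUMERIC_MAX
    operator: OUTSIDE_RANGE
    lower_threshold: 10
    upper_threshold: 1000
    fields:
      - ORDER_TOTAL
```

Each condition is evaluated independently, so a static condition does not need a sensitivity setting and an AUTO condition does not need a threshold_value.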

Properties


metric - built-in metric name

string · optional · one of metric or custom_metric required

Built-in metric name (e.g., ROW_COUNT_CHANGE, NULL_RATE, NUMERIC_MEAN). See Available Metrics. Mutually exclusive with custom_metric.

alert_conditions:
  - metric: NULL_RATE

custom_metric - custom SQL-based metric

object · optional · one of metric or custom_metric required

Mutually exclusive with metric.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| uuid | string | no | UUID of an existing custom metric to reuse |
| display_name | string | yes | Name for the metric |
| sql_expression | string | yes | SQL expression that evaluates to a single numeric value (e.g., SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)) |

alert_conditions:
  - custom_metric:
      display_name: Average Order Value
      sql_expression: "AVG(order_total)"
    operator: AUTO

fields - columns to compute the metric on

array of strings · optional · required for field-level metrics like NULL_COUNT

Not allowed for table-level metrics (e.g., ROW_COUNT_CHANGE).

alert_conditions:
  - metric: NULL_RATE
    fields:
      - EMAIL
      - PHONE_NUMBER

field_pattern - pattern-based field selection

object · optional

Selects fields dynamically by name pattern instead of listing them explicitly. Use instead of fields when column names follow conventions.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| operator | string | yes | CONTAINING · ENDING_WITH · MATCHING · STARTING_WITH |
| value | string | yes | Pattern string to match against field names |
| case_sensitive | boolean | no | Whether the match is case-sensitive. Default: false |
| field_type | enum | no | Restrict matching to a specific field type: BOOLEAN · DATE · NUMERIC · TEXT · TIME · TIME_OF_DAY |

alert_conditions:
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.01
    field_pattern:
      operator: ENDING_WITH
      value: _id
      field_type: NUMERIC

operator - comparison operator

enum · optional
Accepted values: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

AUTO uses ML anomaly detection (both high and low deviations). AUTO_HIGH/AUTO_LOW detect one-sided deviations. NOOP collects data without alerting. Explicit operators (GT, LT, etc.) require threshold_value. INSIDE_RANGE/OUTSIDE_RANGE require lower_threshold and upper_threshold.

alert_conditions:
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.05

threshold_value - static threshold for comparison

number · optional · required when using explicit operators like GT, LT, etc.

Ignored when operator is AUTO, AUTO_HIGH, or AUTO_LOW.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: LT
    threshold_value: 100

lower_threshold - lower bound for range operators

number · optional · required for INSIDE_RANGE / OUTSIDE_RANGE

Used with INSIDE_RANGE or OUTSIDE_RANGE operators.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: OUTSIDE_RANGE
    lower_threshold: 10
    upper_threshold: 1000

upper_threshold - upper bound for range operators

number · optional · required for INSIDE_RANGE / OUTSIDE_RANGE

Used with INSIDE_RANGE or OUTSIDE_RANGE operators.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: INSIDE_RANGE
    lower_threshold: 0
    upper_threshold: 100

type - alert condition type

string · optional · default: threshold
Accepted values: threshold · noop

Use noop to collect data without alerting. When type: noop, you must also set operator: NOOP; the CLI rejects a noop condition with no operator.

alert_conditions:
  - metric: NULL_RATE
    type: noop
    operator: NOOP

baseline_trailing_days - trailing days for ML baseline

integer · optional

Number of trailing days for the ML baseline window. Used with drift metrics like PSI, KS_TEST, JS_DIVERGENCE. Minimum: 1.

alert_conditions:
  - metric: PSI
    operator: AUTO
    baseline_trailing_days: 30

baseline_start / baseline_end - fixed baseline period

string (ISO 8601 datetime) · optional

Fixed start and end of the baseline period. Used with drift/cardinality metrics. Must be a full ISO 8601 datetime (YYYY-MM-DDThh:mm:ss); a date-only value like "2025-01-01" is rejected with "Not a valid datetime."

alert_conditions:
  - metric: KS_TEST
    operator: AUTO
    baseline_start: "2025-01-01T00:00:00"
    baseline_end: "2025-03-31T00:00:00"

num_bins - histogram bins for drift metrics

integer · optional

Number of histogram bins for drift metrics (PSI, KS_TEST, JS_DIVERGENCE). Range: 2 to 1000.

alert_conditions:
  - metric: PSI
    operator: AUTO
    num_bins: 50

id - stable identifier for this alert condition

string · optional

Preserved across updates.

alert_conditions:
  - metric: NULL_RATE
    id: null-rate-email

schedule - when the monitor runs

object · optional · default: system-managed schedule

Controls when the monitor runs. Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is not supported; use interval_minutes instead. Minimum interval_minutes is 60.

When using aggregate_by, interval_minutes must be a multiple of the bucket size, so the smallest valid value for each granularity is:

| aggregate_by | Minimum interval_minutes |
| --- | --- |
| hour | 60 |
| day | 1440 |
| week | 10080 |
| month | 43200 |

Omitting schedule means Monte Carlo runs the monitor on the default collection cycle (typically every 6 to 12 hours, depending on table activity and your plan).

schedule:
  type: fixed
  interval_minutes: 1440
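
As a sketch of the alignment rule, the fragment below pairs hourly buckets with a two-hour run interval, since 120 is a multiple of 60. The column name event_ts is hypothetical.

```yaml
# Hourly buckets, evaluated every two hours (120 = 2 x 60).
aggregate_time_field: event_ts   # hypothetical TIMESTAMP column
aggregate_by: hour
schedule:
  type: fixed
  interval_minutes: 120
```

By the same rule, aggregate_by: week would need an interval_minutes of 10080 or a multiple of it.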

Properties


type - schedule type

enum · optional · default: fixed
Accepted values: fixed · dynamic · manual

schedule:
  type: dynamic
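
For a monitor that should run only on demand, a manual schedule might look like the fragment below. It assumes that type: manual needs no other schedule keys, which this page does not state explicitly.

```yaml
# Assumption: a manual schedule takes no interval or trigger settings.
schedule:
  type: manual
```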

interval_minutes - run interval for fixed schedules

integer · optional

Must align with aggregate_by (e.g., daily aggregation requires a multiple of 1440). Minimum: 60.

schedule:
  type: fixed
  interval_minutes: 1440

interval_crontab - crontab schedule

array of strings · optional

Not supported for metric monitors. Use interval_minutes instead. Custom SQL and validation monitors support crontab.


start_time - schedule start time

string · optional

ISO 8601 format.

schedule:
  type: fixed
  interval_minutes: 1440
  start_time: "2025-01-01T06:00:00Z"

timezone - schedule timezone

string · optional

Timezone identifier (e.g., America/New_York).

schedule:
  type: fixed
  interval_minutes: 1440
  timezone: America/New_York

dynamic_schedule_tables - tables that trigger this monitor

array of strings · optional

Tables whose update events trigger this monitor (for dynamic schedules).

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:core.fct_orders

dynamic_schedule_jobs - jobs that trigger this monitor

array of objects · optional

Jobs whose completion triggers this monitor (for dynamic schedules).

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| job_type | string | yes | AdfJob · AirflowDag · DatabricksJob · DbtJob |
| job_name | string | yes | Name of the job |
| project_name | string | yes | Project or workspace containing the job |
| task_name | string | no | Specific task within the job |
| mcon | string | no | MCON identifier for the job |

schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: DbtJob
      job_name: daily_build
      project_name: analytics

min_interval_minutes - minimum interval between dynamic runs

integer · optional

Minimum interval between runs for dynamic schedules.

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:core.fct_orders
  min_interval_minutes: 60

aggregate_time_field - timestamp column for time bucketing

string · optional

Timestamp or date column used to bucket data by time. Omit for whole-table scans on each run. A DATE column limits you to daily (or coarser) aggregation; aggregate_by: hour on a DATE column produces meaningless buckets.

aggregate_time_field: created_at

aggregate_time_sql - SQL expression for time bucketing

string · optional

SQL expression that evaluates to a timestamp, used instead of aggregate_time_field when the time column needs transformation (e.g., epoch to timestamp).

aggregate_time_sql: "TO_TIMESTAMP(epoch_seconds)"
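
In context, aggregate_time_sql takes the place of aggregate_time_field and combines with aggregate_by as usual. A minimal sketch, assuming a hypothetical epoch_seconds column and a warehouse that provides TO_TIMESTAMP:

```yaml
# aggregate_time_field is omitted because aggregate_time_sql is used instead.
aggregate_time_sql: "TO_TIMESTAMP(epoch_seconds)"
aggregate_by: hour
schedule:
  type: fixed
  interval_minutes: 60   # hourly buckets need a multiple of 60
```
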
aggregate_by - time bucket granularity

enum · optional
Accepted values: hour · day · week · month

Must align with schedule.interval_minutes (e.g., daily requires a multiple of 1440, hourly requires a multiple of 60).

aggregate_by: day

aggregate_timezone - timezone for time aggregation

string · optional

Timezone identifier (e.g., America/New_York).

aggregate_timezone: America/New_York

collection_lag - offset for late-arriving data

integer · optional · default: 0

Number of hours to offset from the current period to account for late-arriving data (e.g., 24 for daily, 1 for hourly). Negative values are allowed to include one future time bucket (e.g., -24 for daily aggregation). Only valid when aggregate_by is set.

collection_lag: 2
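
To show how collection_lag pairs with aggregate_by, here is a sketch for a table whose rows can arrive up to a day late. The 24-hour value follows the daily example in the description above; the column name is an assumption.

```yaml
aggregate_time_field: created_at   # hypothetical timestamp column
aggregate_by: day                  # collection_lag is only valid when this is set
collection_lag: 24                 # hours to offset from the current period
schedule:
  type: fixed
  interval_minutes: 1440
```
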
where_condition - SQL WHERE clause to filter rows

string · optional

SQL WHERE clause (without the WHERE keyword) to filter rows before metric computation.

where_condition: "status != 'deleted'"

use_partition_clause - use the table's partition clause

boolean · optional · default: false

Use the table's partition clause for efficient querying. Not allowed when data_source.sql is used.

use_partition_clause: true

segment_fields - columns to segment metrics by

array of strings · optional · default: []

Each unique combination of values creates a separate time series. Maximum 5 segment fields. Use segment_sql when you need to bucket or transform values.

segment_fields:
  - region
  - product_category

segment_sql - SQL expressions for segmentation

array of strings · optional · default: []

SQL expressions for segmentation when column names alone are insufficient. Use instead of segment_fields when you need to bucket or transform values.

segment_sql:
  - "CASE WHEN country IN ('US','CA','MX') THEN 'NA' ELSE 'INTL' END"

high_segment_count - enable high-cardinality segmentation

boolean · optional · default: false

Enable support for high-cardinality segmentation (more than the default segment limit).

high_segment_count: true

sensitivity - anomaly detection aggressiveness

enum · optional
Accepted values: low · medium · high

Controls how aggressively AUTO thresholds flag anomalies.

| Value | Behavior |
| --- | --- |
| low | Fewer alerts, only large deviations |
| medium | Balanced |
| high | More alerts, smaller deviations flagged |

sensitivity: medium

domains - domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain

sampling_config - row sampling configuration

object · optional

sampling_config:
  percentage: 10

Properties


percentage - percentage of rows to sample

number · optional

Percentage of rows to sample (0 to 100).

sampling_config:
  percentage: 25

count - fixed number of rows to sample

integer · optional

sampling_config:
  count: 10000

name - unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one; incident history does not transfer.

name: orders_row_count

warehouse - which warehouse to use

string · optional · required if multiple warehouses

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake

connection_name - named connection

string · optional

Overrides the default connection.

connection_name: snowflake-analytics

timeout - query execution timeout

integer · optional

Query execution timeout in seconds.

timeout: 300

tags - key-value pairs for organizing monitors

array of objects · optional

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Tag key |
| value | string | no | Tag value |

tags:
  - name: team
    value: analytics
  - name: environment
    value: production

priority - incident priority level

enum · optional
Accepted values: P1 · P2 · P3 · P4 · P5

priority: P2

audiences - notification channels

array of strings · optional

Audience names linking this monitor to channels defined in Notifications as Code. In exported/rendered YAML, appears as labels.

audiences:
  - data-engineering
  - platform-alerts

failure_audiences - notification channels for run failures

array of strings · optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - data-engineering-oncall

data_quality_dimension - data quality category

enum · optional
Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: COMPLETENESS

notes - internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the analytics team. Reviewed quarterly.

is_draft - create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this on a later update resets it to false (active) because updates use PUT semantics, so always include it if you want the monitor to stay in draft.

is_draft: true

uuid - update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19

Deprecated fields

| Field | Use instead |
| --- | --- |
| resource | warehouse |
| domain | domains |
| domain_uuids | domains |
| labels | audiences |
| notify_rule_run_failure | notify_run_failure |

API-only fields

Some fields visible in the API (notify_run_failure, disable_look_back_bootstrap, skip_reset, fail_on_reset) are not available in MaC YAML.

Examples

Row count anomaly detection with daily aggregation

Detects unexpected changes in daily row volume using ML-based thresholds.

montecarlo:
  metric:
    - name: daily_orders_volume
      description: Track daily order volume for anomaly detection
      data_source:
        table: analytics:core.fct_orders
      aggregate_time_field: order_date
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      sensitivity: medium
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - data-engineering-alerts
      priority: P2
      data_quality_dimension: COMPLETENESS
      domains:
        - my-domain
      tags:
        - name: team
          value: analytics

Null rate monitoring with explicit threshold

Alerts when null rates on critical fields exceed 5%.

montecarlo:
  metric:
    - name: customer_null_rate_check
      description: Alert when null rate on email or phone exceeds 5%
      warehouse: my-snowflake-warehouse
      data_source:
        table: raw:crm.customers
      aggregate_time_field: updated_at
      aggregate_by: day
      alert_conditions:
        - metric: NULL_RATE
          operator: GT
          threshold_value: 0.05
          fields:
            - EMAIL
            - PHONE_NUMBER
      schedule:
        type: dynamic
        dynamic_schedule_tables:
          - raw:crm.customers
      audiences:
        - crm-data-quality
      priority: P3
      data_quality_dimension: COMPLETENESS
      domains:
        - my-domain

Segmented metric with custom SQL expression

Monitors average order value per region, using a custom metric and SQL-based segmentation.

montecarlo:
  metric:
    - name: avg_order_value_by_region
      description: Track average order value segmented by sales region
      data_source:
        table: analytics:core.fct_orders
      aggregate_time_field: created_at
      aggregate_by: day
      segment_sql:
        - "CASE WHEN country IN ('US','CA','MX') THEN 'NA' WHEN country IN ('GB','DE','FR') THEN 'EU' ELSE 'OTHER' END"
      high_segment_count: false
      alert_conditions:
        - custom_metric:
            display_name: Average Order Value
            sql_expression: "AVG(order_total)"
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - revenue-monitoring
      priority: P2
      domains:
        - revenue-domain

Troubleshooting

Metric names

Use the canonical metric names from Available Metrics. Common mistakes:

  • AVG / MEAN → use NUMERIC_MEAN
  • MIN → use NUMERIC_MIN
  • MAX → use NUMERIC_MAX
  • STDDEV → use NUMERIC_STDDEV
  • SUM stays SUM; there is no NUMERIC_SUM
  • APPROX_DISTINCT_COUNT / COUNT_DISTINCT → use UNIQUE_COUNT
  • COUNT_NULL → use NULL_COUNT
  • ROW_COUNT is not a column metric; the table-level metric is ROW_COUNT_CHANGE
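
Applied to a config, the corrections above look like the fragment below. The column names are hypothetical, and AUTO is used on every condition only to keep the sketch short.

```yaml
alert_conditions:
  - metric: NUMERIC_MEAN        # not AVG or MEAN
    operator: AUTO
    fields:
      - ORDER_TOTAL             # hypothetical column
  - metric: UNIQUE_COUNT        # not COUNT_DISTINCT or APPROX_DISTINCT_COUNT
    operator: AUTO
    fields:
      - CUSTOMER_ID             # hypothetical column
  - metric: NULL_COUNT          # not COUNT_NULL
    operator: AUTO
    fields:
      - EMAIL
  - metric: ROW_COUNT_CHANGE    # table-level metric, so no fields key
    operator: AUTO
```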

Operators and alert conditions

  • Explicit operators on pipeline metrics fail. ROW_COUNT_CHANGE, TIME_SINCE_LAST_ROW_COUNT_CHANGE, and RELATIVE_ROW_COUNT only support AUTO. GT, LT, or any explicit operator produces an error.
  • NE is not valid. The inequality operator is NEQ.
  • alert_conditions is required. A metric monitor with none fails validation.
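
A short sketch of both operator rules: the pipeline metric uses AUTO, and the inequality check uses NEQ. The STATUS_CODE column and the threshold of 0 are assumptions for illustration.

```yaml
alert_conditions:
  - metric: ROW_COUNT_CHANGE
    operator: AUTO              # pipeline metrics support AUTO only
  - metric: NUMERIC_MIN
    operator: NEQ               # NE is not a valid operator
    threshold_value: 0
    fields:
      - STATUS_CODE             # hypothetical numeric column
```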

Fields and data source

  • Don't pass fields for table-level metrics. Metrics like ROW_COUNT_CHANGE operate on the whole table; including fields causes a validation error.
  • use_partition_clause is table-source only. Combining use_partition_clause: true with data_source.sql causes a validation error.
  • Verify column names. They're case-sensitive on most warehouses (Snowflake returns uppercase). Check the actual table schema before writing alert conditions.
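
The fragment below sketches a combination that respects all three points: a table source paired with use_partition_clause, a table-level metric with no fields key, and a field-level metric whose column name matches the casing in the table schema. Table and column names are placeholders.

```yaml
data_source:
  table: analytics:core.fct_orders    # table source, so use_partition_clause is allowed
use_partition_clause: true
alert_conditions:
  - metric: ROW_COUNT_CHANGE          # table-level: omit fields
    operator: AUTO
  - metric: NULL_RATE                 # field-level: list the columns
    operator: GT
    threshold_value: 0.05
    fields:
      - EMAIL                         # uppercase, as Snowflake reports it
```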

Schedules

  • Bucket size must fit the interval. aggregate_by: day with interval_minutes: 60 fails validation because the interval must be at least as large as the bucket size (and a multiple of it).
  • No crontab on metric monitors. Use interval_minutes; only custom SQL, validation, and metric comparison monitors support interval_crontab.
  • interval_minutes minimum is 60. A lower value produces: "Metric monitors must have a interval_minutes >= 60."
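
For reference, a schedule that satisfies these rules for daily aggregation might look like the sketch below. The commented-out line shows the failing case from the first bullet; the column name is an assumption.

```yaml
aggregate_time_field: created_at   # hypothetical timestamp column
aggregate_by: day
schedule:
  type: fixed
  interval_minutes: 1440           # one daily bucket, and above the minimum of 60
  # interval_minutes: 60           # fails validation with aggregate_by: day
  # interval_crontab is not supported on metric monitors
```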

Updates and deprecated fields

  • PUT semantics on updates. When updating a monitor by uuid, every field you omit reverts to its default; it is not left unchanged. Always specify the complete desired configuration.
  • Prefer warehouse over resource. The resource field still works but is deprecated.
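
Because an update replaces the whole configuration, an update by uuid restates every setting that should survive. A sketch, reusing the placeholder UUID from the uuid section and the Quick Start values; the warehouse name is hypothetical.

```yaml
montecarlo:
  metric:
    - uuid: 0dae7702-0950-45c7-909c-7e183bddca19   # placeholder UUID from this page
      name: orders_row_count
      description: Alert on unexpected row count changes in the orders table
      warehouse: my-snowflake          # preferred over the deprecated resource key
      data_source:
        table: my_database:my_schema.orders
      aggregate_time_field: created_at
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      is_draft: true                   # restated; omitting it would reset to false
      domains:
        - my-domain
```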