Metric Monitor

Track field-level and table-level metrics over time with anomaly detection or fixed thresholds.

Overview

Track field-level and table-level metrics — null rates, row counts, distributions, and more — and get alerted when values breach thresholds or deviate from learned baselines. The metric monitor handles time-series data quality on a single table, with optional segmentation and custom metric expressions.

📘
Reference scope
This page covers Monitors as Code (MaC) YAML configuration. For how metric monitors work and the full metrics list, see Metric Monitors and Available Metrics.

MaC key: metric. Replaces the legacy field_health monitor, which can no longer be created.

Quick Start

montecarlo:
  metric:
    - name: orders_row_count
      description: Alert on unexpected row count changes in the orders table
      data_source:
        table: my_database:my_schema.orders
      aggregate_time_field: created_at
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain

Create interactively with create_or_update_metric_monitor(dry_run=True) via the Monte Carlo MCP server.

Configuration

description — what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: Alert on unexpected row count changes in the orders table

data_source — the table or SQL query to monitor

object · required

Provide either table or sql, not both.

data_source:
  table: analytics:core.fct_orders

Properties

table — fully qualified table name

string · optional · one of table or sql required

Format: database:schema.table. Mutually exclusive with sql.

data_source:
  table: analytics:core.fct_orders

sql — custom SQL query

string · optional · one of table or sql required

Custom SQL query that returns the dataset to monitor. Mutually exclusive with table. Not compatible with use_partition_clause.

data_source:
  sql: "SELECT * FROM analytics.core.fct_orders WHERE region = 'NA'"

transforms — AI-powered field transforms

array of objects · optional

AI-powered field transforms applied to the data source before metric computation.

Property	Type	Required	Description
`function`	string	yes	Transform function name (e.g., classification, extraction)
`field`	string	no	Column to apply the transform to
`alias`	string	no	Output alias for the transformed column
`sql_expression`	string	no	SQL expression defining the transform
`prompt`	string	no	LLM prompt for AI-powered transforms. Not supported by `classification` (which uses `categories` instead)
`categories`	array of objects	no	Category definitions for classification transforms. Each entry has a required `label`, optional `description`, and optional `examples` (array of strings)
`model_connection_id`	string	no	Connection ID for the AI model used by the transform
`model_name`	string	no	Name of the AI model to use
`output_type`	string	no	Expected output data type
`field_config_list`	array of objects	no	Field configuration list for multi-field transforms
`field_value_range`	object	no	Value range constraints with `lower_bound` and `upper_bound`
`id`	string	no	Unique identifier for this transform instance

data_source:
  table: analytics:core.fct_orders
  transforms:
    - function: classification
      field: description
      alias: description_category
      categories:
        - label: electronics
          description: Electronic devices
        - label: clothing
          description: Apparel items

alert_conditions — metrics and thresholds to monitor

array of objects · required

Each entry defines a metric to track and the condition that triggers an alert. Provide either metric (built-in) or custom_metric, not both. Multiple conditions are allowed. Supported alert condition types: threshold (default), noop.

Available operators: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

Threshold types:

Static — fixed numeric value via threshold_value with an explicit operator (GT, LT, etc.).
Range — lower_threshold and upper_threshold with INSIDE_RANGE or OUTSIDE_RANGE.
Anomaly detection (AUTO) — ML-based. AUTO catches both high and low deviations. AUTO_HIGH/AUTO_LOW catch one-sided deviations. Control aggressiveness with the monitor-level sensitivity field.

🚧
Pipeline metric operator restriction
Pipeline metrics (ROW_COUNT_CHANGE, TIME_SINCE_LAST_ROW_COUNT_CHANGE, RELATIVE_ROW_COUNT) only support AUTO. Using explicit operators (GT, LT, etc.) on these metrics produces an error.

alert_conditions:
  - metric: ROW_COUNT_CHANGE
    operator: AUTO
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.05
    fields:
      - EMAIL

Properties

metric — built-in metric name

string · optional · one of metric or custom_metric required

Built-in metric name (e.g., ROW_COUNT_CHANGE, NULL_RATE, NUMERIC_MEAN). See Available Metrics. Mutually exclusive with custom_metric.

alert_conditions:
  - metric: NULL_RATE

custom_metric — custom SQL-based metric

object · optional · one of metric or custom_metric required

Mutually exclusive with metric.

Property	Type	Required	Description
`uuid`	string	no	UUID of an existing custom metric to reuse
`display_name`	string	yes	Name for the metric
`sql_expression`	string	yes	SQL expression that evaluates to a single numeric value (e.g., `SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)`)

alert_conditions:
  - custom_metric:
      display_name: Average Order Value
      sql_expression: "AVG(order_total)"
    operator: AUTO
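
A second sketch, reusing the conditional-sum expression from the table above. It assumes the monitored table has a status column with a 'failed' value; the display name is hypothetical.

```yaml
alert_conditions:
  - custom_metric:
      display_name: Failed Order Count    # hypothetical name
      # Must evaluate to a single numeric value per bucket
      sql_expression: "SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)"
    operator: AUTO_HIGH                   # one-sided anomaly detection
```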

fields — columns to compute the metric on

array of strings · optional · required for field-level metrics like NULL_COUNT

Not allowed for table-level metrics (e.g., ROW_COUNT_CHANGE).

alert_conditions:
  - metric: NULL_RATE
    fields:
      - EMAIL
      - PHONE_NUMBER

field_pattern — pattern-based field selection

object · optional

Selects fields dynamically by name pattern instead of listing them explicitly. Use instead of fields when column names follow conventions.

Property	Type	Required	Description
`operator`	string	yes	`CONTAINING` · `ENDING_WITH` · `MATCHING` · `STARTING_WITH`
`value`	string	yes	Pattern string to match against field names
`case_sensitive`	boolean	no	Whether the match is case-sensitive. Default: `false`
`field_type`	enum	no	Restrict matching to a specific field type: `BOOLEAN` · `DATE` · `NUMERIC` · `TEXT` · `TIME` · `TIME_OF_DAY`

alert_conditions:
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.01
    field_pattern:
      operator: ENDING_WITH
      value: _id
      field_type: NUMERIC
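
The optional case_sensitive flag is not shown above. A minimal sketch, assuming a hypothetical PII_ column prefix that should match only in uppercase:

```yaml
alert_conditions:
  - metric: NULL_RATE
    operator: AUTO
    field_pattern:
      operator: STARTING_WITH
      value: PII_              # hypothetical column prefix
      case_sensitive: true     # default is false
      field_type: TEXT         # restrict matching to text columns
```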

operator — comparison operator

enum · optional

Accepted values: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

AUTO uses ML anomaly detection (both high and low deviations). AUTO_HIGH/AUTO_LOW detect one-sided deviations. NOOP collects data without alerting. Explicit operators (GT, LT, etc.) require threshold_value. INSIDE_RANGE/OUTSIDE_RANGE require lower_threshold and upper_threshold.

alert_conditions:
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.05

threshold_value — static threshold for comparison

number · optional · required when using explicit operators like GT, LT, etc.

Ignored when operator is AUTO, AUTO_HIGH, or AUTO_LOW.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: LT
    threshold_value: 100

lower_threshold — lower bound for range operators

number · optional · required for INSIDE_RANGE / OUTSIDE_RANGE

Used with INSIDE_RANGE or OUTSIDE_RANGE operators.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: OUTSIDE_RANGE
    lower_threshold: 10
    upper_threshold: 1000

upper_threshold — upper bound for range operators

number · optional · required for INSIDE_RANGE / OUTSIDE_RANGE

Used with INSIDE_RANGE or OUTSIDE_RANGE operators.

alert_conditions:
  - metric: NUMERIC_MEAN
    operator: INSIDE_RANGE
    lower_threshold: 0
    upper_threshold: 100

type — alert condition type

string · optional · default: threshold

Accepted values: threshold · noop

Use noop to collect data without alerting. When type: noop, you must set operator: NOOP — the CLI rejects a noop condition with no operator.

alert_conditions:
  - metric: NULL_RATE
    type: noop
    operator: NOOP

baseline_trailing_days — trailing days for ML baseline

integer · optional

Number of trailing days for the ML baseline window. Used with drift metrics like PSI, KS_TEST, JS_DIVERGENCE. Minimum: 1.

alert_conditions:
  - metric: PSI
    operator: AUTO
    baseline_trailing_days: 30

baseline_start / baseline_end — fixed baseline period

string (ISO 8601 datetime) · optional

Fixed start and end of the baseline period. Used with drift/cardinality metrics. Must be a full ISO 8601 datetime (YYYY-MM-DDThh:mm:ss) — a date-only value like "2025-01-01" is rejected with "Not a valid datetime."

alert_conditions:
  - metric: KS_TEST
    operator: AUTO
    baseline_start: "2025-01-01T00:00:00"
    baseline_end: "2025-03-31T00:00:00"

num_bins — histogram bins for drift metrics

integer · optional

Number of histogram bins for drift metrics (PSI, KS_TEST, JS_DIVERGENCE). Range: 2 to 1000.

alert_conditions:
  - metric: PSI
    operator: AUTO
    num_bins: 50

id — stable identifier for this alert condition

string · optional

Preserved across updates.

alert_conditions:
  - metric: NULL_RATE
    id: null-rate-email

schedule — when the monitor runs

object · optional · default: system-managed schedule

Controls when the monitor runs. Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is not supported — use interval_minutes instead. Minimum interval_minutes is 60.

When using aggregate_by, interval_minutes must be a multiple of the bucket size, so the smallest valid value for each granularity is:

`aggregate_by`	Minimum `interval_minutes`
`hour`	60
`day`	1440
`week`	10080
`month`	43200

Omitting schedule means Monte Carlo runs the monitor on the default collection cycle (typically every 6 to 12 hours, depending on table activity and your plan).

schedule:
  type: fixed
  interval_minutes: 1440
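
To illustrate the alignment rule with a different granularity, here is a sketch of the monitor-level keys for hourly buckets evaluated every two hours. The created_at column is an assumption carried over from the Quick Start.

```yaml
aggregate_time_field: created_at
aggregate_by: hour
schedule:
  type: fixed
  interval_minutes: 120    # a multiple of the 60-minute hourly bucket
```

A value such as 90 would satisfy the 60-minute minimum but not the multiple-of-bucket rule.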

Properties

type — schedule type

enum · optional · default: fixed

Accepted values: fixed · dynamic · manual

schedule:
  type: dynamic

interval_minutes — run interval for fixed schedules

integer · optional

Must align with aggregate_by (e.g., daily aggregation requires a multiple of 1440). Minimum: 60.

schedule:
  type: fixed
  interval_minutes: 1440

interval_crontab — crontab schedule

array of strings · optional

Not supported for metric monitors. Use interval_minutes instead. Custom SQL and validation monitors support crontab.

start_time — schedule start time

string · optional

ISO 8601 format.

schedule:
  type: fixed
  interval_minutes: 1440
  start_time: "2025-01-01T06:00:00Z"

timezone — schedule timezone

string · optional

Timezone identifier (e.g., America/New_York).

schedule:
  type: fixed
  interval_minutes: 1440
  timezone: America/New_York

dynamic_schedule_tables — tables that trigger this monitor

array of strings · optional

Tables whose update events trigger this monitor (for dynamic schedules).

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:core.fct_orders

dynamic_schedule_jobs — jobs that trigger this monitor

array of objects · optional

Jobs whose completion triggers this monitor (for dynamic schedules).

Property	Type	Required	Description
`job_type`	string	yes	`AdfJob` · `AirflowDag` · `DatabricksJob` · `DbtJob`
`job_name`	string	yes	Name of the job
`project_name`	string	yes	Project or workspace containing the job
`task_name`	string	no	Specific task within the job
`mcon`	string	no	MCON identifier for the job

schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: DbtJob
      job_name: daily_build
      project_name: analytics

min_interval_minutes — minimum interval between dynamic runs

integer · optional

Minimum interval between runs for dynamic schedules.

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:core.fct_orders
  min_interval_minutes: 60

aggregate_time_field — timestamp column for time bucketing

string · optional

Timestamp or date column used to bucket data by time. Omit for whole-table scans on each run. A DATE column limits you to daily (or coarser) aggregation; aggregate_by: hour on a DATE column produces meaningless buckets.

aggregate_time_field: created_at

aggregate_time_sql — SQL expression for time bucketing

string · optional

SQL expression that evaluates to a timestamp, used instead of aggregate_time_field when the time column needs transformation (e.g., epoch to timestamp).

aggregate_time_sql: "TO_TIMESTAMP(epoch_seconds)"

aggregate_by — time bucket granularity

enum · optional

Accepted values: hour · day · week · month

Must align with schedule.interval_minutes (e.g., daily requires a multiple of 1440, hourly requires a multiple of 60).

aggregate_by: day

aggregate_timezone — timezone for time aggregation

string · optional

Timezone identifier (e.g., America/New_York).

aggregate_timezone: America/New_York

collection_lag — offset for late-arriving data

integer · optional · default: 0

Number of hours to offset from the current period to account for late-arriving data (e.g., 24 for daily, 1 for hourly). Negative values are allowed to include one future time bucket (e.g., -24 for daily aggregation). Only valid when aggregate_by is set.

collection_lag: 2
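
Because collection_lag is only valid alongside aggregate_by, a fuller fragment may be clearer. This sketch uses the 24-hour offset suggested above for daily aggregation; the created_at column is hypothetical.

```yaml
aggregate_time_field: created_at
aggregate_by: day          # collection_lag requires aggregate_by
collection_lag: 24         # hours; the suggested offset for daily buckets
schedule:
  type: fixed
  interval_minutes: 1440
```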

where_condition — SQL WHERE clause to filter rows

string · optional

SQL WHERE clause (without the WHERE keyword) to filter rows before metric computation.

where_condition: "status != 'deleted'"

use_partition_clause — use the table's partition clause

boolean · optional · default: false

Use the table's partition clause for efficient querying. Not allowed when data_source.sql is used.

use_partition_clause: true
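
In context, the flag pairs with a table data source. A minimal sketch:

```yaml
data_source:
  table: analytics:core.fct_orders   # table source; a sql source is not allowed here
use_partition_clause: true
```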

segment_fields — columns to segment metrics by

array of strings · optional · default: []

Each unique combination of values creates a separate time series. Maximum 5 segment fields. Use segment_sql when you need to bucket or transform values.

segment_fields:
  - region
  - product_category

segment_sql — SQL expressions for segmentation

array of strings · optional · default: []

SQL expressions for segmentation when column names alone are insufficient. Use instead of segment_fields when you need to bucket or transform values.

segment_sql:
  - "CASE WHEN country IN ('US','CA','MX') THEN 'NA' ELSE 'INTL' END"

high_segment_count — enable high-cardinality segmentation

boolean · optional · default: false

Enable support for high-cardinality segmentation (more than the default segment limit).

high_segment_count: true
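
A sketch of the flag next to the segmentation it applies to, assuming a hypothetical store_id column with many distinct values:

```yaml
segment_fields:
  - store_id               # hypothetical high-cardinality column
high_segment_count: true   # default is false
```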

sensitivity — anomaly detection aggressiveness

enum · optional

Accepted values: low · medium · high

Controls how aggressively AUTO thresholds flag anomalies.

Value	Behavior
`low`	Fewer alerts, only large deviations
`medium`	Balanced
`high`	More alerts, smaller deviations flagged

sensitivity: medium

domains — domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain
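
A sketch of the project-level alternative. Only the two keys named on this page are shown, and the value shape of default_domain (a single domain name) is an assumption rather than a documented format:

```yaml
# montecarlo.yml (project-level config; other keys omitted)
default_resource: my-snowflake   # default warehouse, see the warehouse field
default_domain: my-domain        # assumed to take a single domain name
```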

sampling_config — row sampling configuration

object · optional

sampling_config:
  percentage: 10

Properties

percentage — percentage of rows to sample

number · optional

Percentage of rows to sample (0 to 100).

sampling_config:
  percentage: 25

count — fixed number of rows to sample

integer · optional

sampling_config:
  count: 10000

name — unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one — incident history does not transfer.

name: orders_row_count

warehouse — which warehouse to use

string · optional · required if multiple warehouses

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake

connection_name — named connection

string · optional

Overrides the default connection.

connection_name: snowflake-analytics

timeout — query execution timeout

integer · optional

Query execution timeout in seconds.

timeout: 300

tags — key-value pairs for organizing monitors

array of objects · optional

Property	Type	Required	Description
`name`	string	yes	Tag key
`value`	string	no	Tag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production

priority — incident priority level

enum · optional

Accepted values: P1 · P2 · P3 · P4 · P5

priority: P2

audiences — notification channels

array of strings · optional

Audience names linking this monitor to channels defined in Notifications as Code. In exported/rendered YAML, appears as labels.

audiences:
  - data-engineering
  - platform-alerts

failure_audiences — notification channels for run failures

array of strings · optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - data-engineering-oncall

alert_grouping — control how breaches are grouped into alerts

object · optional · default: no grouping (a new alert is created every time the monitor breaches)

Groups subsequent breaches into the currently open alert rather than creating a new one, while the alert remains open and unresolved.

mode
Accepted values: group_into_open_alert

alert_grouping:
  mode: group_into_open_alert

data_quality_dimension — data quality category

enum · optional

Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: COMPLETENESS

notes — internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the analytics team. Reviewed quarterly.

is_draft — create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics — always include it if you want the monitor to stay in draft.

is_draft: true

uuid — update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19
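
Because updates use PUT semantics (omitted fields revert to their defaults), an update by uuid is safest when it restates the whole configuration. A sketch that reuses the Quick Start monitor and keeps it in draft; the UUID is the placeholder shown above, not a real monitor:

```yaml
montecarlo:
  metric:
    - uuid: 0dae7702-0950-45c7-909c-7e183bddca19   # placeholder UUID from this page
      name: orders_row_count
      description: Alert on unexpected row count changes in the orders table
      is_draft: true             # restated so the update does not reset it to false
      data_source:
        table: my_database:my_schema.orders
      aggregate_time_field: created_at
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain
```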

Deprecated fields

Field	Use instead
`resource`	`warehouse`
`domain`	`domains`
`domain_uuids`	`domains`
`labels`	`audiences`
`notify_rule_run_failure`	`notify_run_failure`
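
As a migration aid, the fragment below shows the current monitor-level keys with the deprecated name each one replaces in a comment. Values are placeholders, and notify_rule_run_failure is left out on purpose.

```yaml
warehouse: my-snowflake    # replaces deprecated `resource`
domains:                   # replaces deprecated `domain` and `domain_uuids`
  - my-domain
audiences:                 # replaces deprecated `labels`
  - data-engineering
```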

📘
API-only fields
Some fields visible in the API (notify_run_failure, disable_look_back_bootstrap, skip_reset, fail_on_reset) are not available in MaC YAML.

Examples

Row count anomaly detection with daily aggregation

Detects unexpected changes in daily row volume using ML-based thresholds.

montecarlo:
  metric:
    - name: daily_orders_volume
      description: Track daily order volume for anomaly detection
      data_source:
        table: analytics:core.fct_orders
      aggregate_time_field: order_date
      aggregate_by: day
      alert_conditions:
        - metric: ROW_COUNT_CHANGE
          operator: AUTO
      sensitivity: medium
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - data-engineering-alerts
      priority: P2
      data_quality_dimension: COMPLETENESS
      domains:
        - my-domain
      tags:
        - name: team
          value: analytics

Null rate monitoring with explicit threshold

Alerts when null rates on critical fields exceed 5%.

montecarlo:
  metric:
    - name: customer_null_rate_check
      description: Alert when null rate on email or phone exceeds 5%
      warehouse: my-snowflake-warehouse
      data_source:
        table: raw:crm.customers
      aggregate_time_field: updated_at
      aggregate_by: day
      alert_conditions:
        - metric: NULL_RATE
          operator: GT
          threshold_value: 0.05
          fields:
            - EMAIL
            - PHONE_NUMBER
      schedule:
        type: dynamic
        dynamic_schedule_tables:
          - raw:crm.customers
      audiences:
        - crm-data-quality
      priority: P3
      data_quality_dimension: COMPLETENESS
      domains:
        - my-domain

Segmented metric with custom SQL expression

Monitors average order value per region, using a custom metric and SQL-based segmentation.

montecarlo:
  metric:
    - name: avg_order_value_by_region
      description: Track average order value segmented by sales region
      data_source:
        table: analytics:core.fct_orders
      aggregate_time_field: created_at
      aggregate_by: day
      segment_sql:
        - "CASE WHEN country IN ('US','CA','MX') THEN 'NA' WHEN country IN ('GB','DE','FR') THEN 'EU' ELSE 'OTHER' END"
      high_segment_count: false
      alert_conditions:
        - custom_metric:
            display_name: Average Order Value
            sql_expression: "AVG(order_total)"
          operator: AUTO
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - revenue-monitoring
      priority: P2
      domains:
        - revenue-domain

Troubleshooting

Metric names

Use the canonical metric names from Available Metrics. Common mistakes:

AVG / MEAN → use NUMERIC_MEAN
MIN → use NUMERIC_MIN
MAX → use NUMERIC_MAX
STDDEV → use NUMERIC_STDDEV
SUM stays SUM — there is no NUMERIC_SUM
APPROX_DISTINCT_COUNT / COUNT_DISTINCT → use UNIQUE_COUNT
COUNT_NULL → use NULL_COUNT
ROW_COUNT is not a column metric → the table-level metric is ROW_COUNT_CHANGE
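
The mapping above can be sketched as a set of alert conditions that use only canonical names. Column names are hypothetical and AUTO is used throughout to keep the fragment short.

```yaml
alert_conditions:
  - metric: NUMERIC_MEAN       # not AVG or MEAN
    operator: AUTO
    fields:
      - ORDER_TOTAL            # hypothetical column
  - metric: UNIQUE_COUNT       # not COUNT_DISTINCT or APPROX_DISTINCT_COUNT
    operator: AUTO
    fields:
      - CUSTOMER_ID            # hypothetical column
  - metric: NULL_COUNT         # not COUNT_NULL
    operator: AUTO
    fields:
      - EMAIL
  - metric: ROW_COUNT_CHANGE   # table-level metric, so no fields
    operator: AUTO
```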

Operators and alert conditions

Explicit operators on pipeline metrics fail. ROW_COUNT_CHANGE, TIME_SINCE_LAST_ROW_COUNT_CHANGE, and RELATIVE_ROW_COUNT only support AUTO. GT, LT, or any explicit operator produces an error.
NE is not valid. The inequality operator is NEQ.
alert_conditions is required. A metric monitor with none fails validation.

Fields and data source

Don't pass fields for table-level metrics. Metrics like ROW_COUNT_CHANGE operate on the whole table; including fields causes a validation error.
use_partition_clause is table-source only. Combining use_partition_clause: true with data_source.sql causes a validation error.
Verify column names. Matching is case-sensitive on most warehouses, and Snowflake reports unquoted identifiers in uppercase (EMAIL, not email). Check the actual table schema before writing alert conditions.

Schedules

Bucket size must fit the interval. aggregate_by: day with interval_minutes: 60 fails validation because the interval must be a multiple of the bucket size (1440 for day).
No crontab on metric monitors. Use interval_minutes; only custom SQL, validation, and metric comparison monitors support interval_crontab.
interval_minutes minimum is 60. A lower value produces: "Metric monitors must have a interval_minutes >= 60."

Updates and deprecated fields

PUT semantics on updates. When updating a monitor by uuid, every field you omit reverts to its default — it is not left unchanged. Always specify the complete desired configuration.
Prefer warehouse over resource. The resource field still works but is deprecated.

