Comparison Monitor

Compare metrics across two data sources and alert when differences exceed thresholds.

Overview

Compare the same metric across two tables — source and target — and get alerted when they diverge beyond a threshold. Useful for migration validation, cross-environment parity checks (staging vs production), or any scenario where two datasets should agree on row counts, null rates, or custom expressions.

📘
Reference scope
This page covers MaC YAML configuration. For an introduction to comparison monitors, see Comparison Monitors.

MaC key: metric_comparison. Replaces the legacy comparison_rule (blocked from new creation).

Quick Start

montecarlo:
  metric_comparison:
    - name: orders_row_count_parity
      description: Verify row count matches between staging and production orders
      source:
        warehouse: staging-snowflake
        table: staging:core.fct_orders
      target:
        warehouse: prod-snowflake
        table: analytics:core.fct_orders
      alert_conditions:
        - metric: ROW_COUNT
          type: comparison_delta
          operator: GT
          threshold_value: 100
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain

Create interactively with create_or_update_comparison_monitor(dry_run=True) via the Monte Carlo MCP server.

Configuration

description — what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: Verify staging and prod orders tables have matching row counts

source — source-side data configuration

object · required

Defines one side of the comparison — the table or SQL query, warehouse connection, and optional segmentation. The warehouse field inside source is always required; the top-level warehouse does not substitute for it.

source:
  warehouse: staging-snowflake
  table: staging:core.fct_orders
  where_condition: "status = 'ACTIVE'"

Properties

warehouse — warehouse for this side of the comparison

string · required

Warehouse UUID or name. Must be specified on each side independently — the top-level warehouse does not substitute for it.

warehouse: staging-snowflake

table — fully qualified table name

string · optional · one of table or sql required

Format: database:schema.table. Mutually exclusive with sql.

table: staging:core.fct_orders

sql — custom SQL query as the data source

string · optional · one of table or sql required

Custom SQL query. Mutually exclusive with table.

sql: "SELECT * FROM staging.core.fct_orders WHERE created_at > DATEADD(day, -7, CURRENT_DATE)"

transforms — AI-powered field transforms

array of objects · optional

Applied before metric computation. Same structure as metric monitor transforms.

transforms:
  - field: revenue
    expression: "amount_cents / 100.0"

where_condition — SQL WHERE filter

string · optional

SQL WHERE clause (without WHERE) to filter rows before comparison.

where_condition: "status = 'ACTIVE'"

segment_fields — columns to segment comparison by

array of strings · optional

Column names to segment the comparison by. Maximum 1 field.

segment_fields:
  - region

segment_sql — SQL expressions for segmentation

array of strings · optional

SQL expressions for segmentation when column names alone are insufficient. Maximum 1 expression.

segment_sql:
  - "DATE_TRUNC('month', created_at)"

connection_name — named connection override

string · optional

Named connection to use when executing queries on this side.

connection_name: staging-read-only

timeout — query execution timeout

integer · optional

Query execution timeout in seconds.

timeout: 300

target — target-side data configuration

object · required

Same structure as source. Defines the other side of the comparison. The warehouse field inside target is always required; the top-level warehouse does not substitute for it.

target:
  warehouse: prod-snowflake
  table: analytics:core.fct_orders

alert_conditions — comparison alert conditions

array of objects · required

Each entry defines which metric to compare and the threshold that triggers an alert. Supported alert condition types: comparison_delta, noop.

Available operators: GT · NOOP only. Other operators (LT, GTE, AUTO, etc.) are not supported.

Threshold type: comparison_delta alerts when the difference between source and target exceeds threshold_value. Set is_threshold_relative: true to treat threshold_value as a percentage difference rather than an absolute count.

alert_conditions:
  - metric: ROW_COUNT
    type: comparison_delta
    operator: GT
    threshold_value: 100

Properties

metric — built-in metric name

string · optional · one of metric or custom_metric required

Built-in metric name (e.g., ROW_COUNT, NULL_COUNT, NUMERIC_MEAN). Mutually exclusive with custom_metric. See Available Metrics.

metric: ROW_COUNT

custom_metric — custom SQL-based metric

object · optional · one of metric or custom_metric required

Custom SQL-based metric with separate expressions for source and target. Mutually exclusive with metric.

Property	Type	Required	Description
`uuid`	string	no	UUID of an existing custom metric to reuse
`display_name`	string	yes	Name for the metric
`source_sql_expression`	string	yes	SQL expression for the source side. Must return a single numeric value.
`target_sql_expression`	string	yes	SQL expression for the target side. Must return a single numeric value.

custom_metric:
  display_name: Total Revenue
  source_sql_expression: "SUM(amount_cents) / 100.0"
  target_sql_expression: "SUM(revenue_usd)"

source_field — column name on the source side

string · optional

Required when comparing field-level metrics and columns have different names on each side.

source_field: EMAIL_ADDR

target_field — column name on the target side

string · optional

Required when comparing field-level metrics and columns have different names on each side.

target_field: EMAIL

type — condition type

enum · optional · default: comparison_delta

Accepted values: comparison_delta · noop

Use noop to collect data without alerting.

type: comparison_delta

operator — comparison operator

enum · optional · default: GT

Accepted values: GT · NOOP

Only GT is supported for comparison_delta conditions.

operator: GT

threshold_value — maximum allowed delta

number · required, for comparison_delta conditions

Maximum allowed delta between source and target.

threshold_value: 100

is_threshold_relative — treat threshold as percentage

boolean · optional · default: false

When true, threshold_value is treated as a percentage (0-100) rather than an absolute value.

is_threshold_relative: true

schedule — execution schedule

object · optional · default: system-managed schedule

Controls when the monitor runs. Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported. Omitting schedule means Monte Carlo runs the monitor on the default collection cycle.

schedule:
  type: fixed
  interval_minutes: 1440

Properties

type — schedule type

enum · optional · default: fixed

Accepted values: fixed · dynamic · manual

type: fixed

interval_minutes — run interval

integer · optional

Run interval for fixed schedules.

interval_minutes: 1440

interval_crontab — crontab expressions

array of strings · optional

Crontab expressions for scheduling.

interval_crontab:
  - "0 6 * * *"

interval_crontab_day_operator — crontab day combination

enum · optional

Accepted values: AND · OR

How to combine day-of-week and day-of-month in crontab.

interval_crontab_day_operator: AND

start_time — schedule start time

string · optional

ISO 8601 format.

start_time: "2024-01-01T06:00:00Z"

timezone — schedule timezone

string · optional

timezone: America/New_York

dynamic_schedule_tables — tables that trigger the monitor

array of strings · required, when type is dynamic (unless dynamic_schedule_jobs is set)

Tables whose update events trigger this monitor.

dynamic_schedule_tables:
  - analytics:core.fct_orders

dynamic_schedule_jobs — jobs that trigger the monitor

array of objects · optional

Property	Type	Required	Description
`job_type`	string	yes	`AdfJob` · `AirflowDag` · `DatabricksJob` · `DbtJob`
`job_name`	string	yes	Name of the job
`project_name`	string	yes	Project or workspace containing the job
`task_name`	string	no	Specific task within the job
`mcon`	string	no	MCON identifier for the job

dynamic_schedule_jobs:
  - job_type: DbtJob
    job_name: nightly_build
    project_name: analytics

min_interval_minutes — minimum interval between dynamic runs

integer · optional

Minimum interval between runs for dynamic schedules.

min_interval_minutes: 60

domains — domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain

name — unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one — incident history does not transfer.

name: orders_row_count_parity

warehouse — top-level warehouse

string · optional

Warehouse UUID or name. Does not substitute for warehouse inside source and target — those are always required. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake

notes — internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Migration parity check. Reviewed weekly during cutover.

notify_run_failure — notify on query failure

boolean · optional

Notify when the monitor query itself fails.

notify_run_failure: true

audiences — notification channels

array of strings · optional

Audience names linking this monitor to channels defined in Notifications as Code.

audiences:
  - migration-alerts
  - data-engineering

failure_audiences — run-failure notification channels

array of strings · optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - oncall-alerts

alert_grouping — control how breaches are grouped into alerts

object · optional · default: no grouping (a new alert is created every time the monitor breaches)

Groups subsequent breaches into the currently open alert rather than creating a new one, while the alert remains open and unresolved.

mode
Accepted values: group_into_open_alert

alert_grouping:
  mode: group_into_open_alert

priority — incident priority level

enum · optional

Accepted values: P1 · P2 · P3 · P4 · P5

priority: P2

tags — key-value pairs for organizing monitors

array of objects · optional

Property	Type	Required	Description
`name`	string	yes	Tag key
`value`	string	no	Tag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production

data_quality_dimension — data quality category

enum · optional

Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: CONSISTENCY

is_draft — create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics — always include it if you want the monitor to stay in draft.

is_draft: true

uuid — update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19

Deprecated fields

Field	Use instead
`resource`	`warehouse`
`domain`	`domains`
`domain_uuids`	`domains`
`labels`	`audiences`
`notify_rule_run_failure`	`notify_run_failure`

📘
API-only fields
Some fields visible in the API or JSON Schema (skip_reset, fail_on_reset) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.

Available Metrics

Table-level (no source_field / target_field): ROW_COUNT

Field-level (require source_field and target_field):

Uniqueness: UNIQUE_COUNT, DUPLICATE_COUNT, APPROX_DISTINCT_COUNT
Null/empty: NULL_COUNT, NON_NULL_COUNT, EMPTY_STRING_COUNT, TEXT_ALL_SPACES_COUNT, NAN_COUNT, TEXT_NULL_KEYWORD_COUNT
Numeric: NUMERIC_MEAN, NUMERIC_MEDIAN, NUMERIC_MIN, NUMERIC_MAX, NUMERIC_STDDEV, SUM, ZERO_COUNT, NEGATIVE_COUNT
Percentiles: PERCENTILE_20, PERCENTILE_40, PERCENTILE_60, PERCENTILE_80
Text stats: TEXT_MAX_LENGTH, TEXT_MIN_LENGTH, TEXT_MEAN_LENGTH, TEXT_STD_LENGTH
Text format: TEXT_NOT_INT_COUNT, TEXT_NOT_NUMBER_COUNT, TEXT_NOT_UUID_COUNT, TEXT_NOT_SSN_COUNT, TEXT_NOT_US_PHONE_COUNT, TEXT_NOT_US_STATE_CODE_COUNT, TEXT_NOT_US_ZIP_CODE_COUNT, TEXT_NOT_EMAIL_ADDRESS_COUNT, TEXT_NOT_TIMESTAMP_COUNT
Boolean: TRUE_COUNT, FALSE_COUNT
Timestamp: FUTURE_TIMESTAMP_COUNT, PAST_TIMESTAMP_COUNT, UNIX_ZERO_COUNT

Examples

Row count parity with absolute threshold

Alerts when the row count difference between source and target exceeds 100 rows.

montecarlo:
  metric_comparison:
    - name: orders_row_count_parity
      description: Verify staging and prod orders tables have matching row counts
      source:
        warehouse: staging-snowflake
        table: staging:core.fct_orders
      target:
        warehouse: prod-snowflake
        table: analytics:core.fct_orders
      alert_conditions:
        - metric: ROW_COUNT
          type: comparison_delta
          operator: GT
          threshold_value: 100
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - migration-alerts
      priority: P2
      data_quality_dimension: CONSISTENCY
      domains:
        - my-domain

Field-level comparison with different column names

Compares null counts between two tables where the column is named differently on each side.

montecarlo:
  metric_comparison:
    - name: email_nulls_comparison
      description: Compare email null counts between legacy and new CRM
      source:
        warehouse: prod-snowflake
        table: legacy:crm.contacts
      target:
        warehouse: prod-snowflake
        table: new:crm.customers
      alert_conditions:
        - metric: NULL_COUNT
          source_field: EMAIL_ADDR
          target_field: EMAIL
          type: comparison_delta
          operator: GT
          threshold_value: 50
      schedule:
        type: fixed
        interval_minutes: 720
      priority: P3
      domains:
        - my-domain

Custom metric comparison

Uses custom SQL expressions to compare a derived metric across both sides.

montecarlo:
  metric_comparison:
    - name: revenue_match
      description: Verify total revenue matches between source system and warehouse
      source:
        warehouse: source-postgres
        table: public:billing.invoices
      target:
        warehouse: prod-snowflake
        table: analytics:finance.fct_invoices
      alert_conditions:
        - custom_metric:
            display_name: Total Revenue
            source_sql_expression: "SUM(amount_cents) / 100.0"
            target_sql_expression: "SUM(revenue_usd)"
          type: comparison_delta
          operator: GT
          threshold_value: 0.01
          is_threshold_relative: true
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - finance-data-quality
      priority: P1
      domains:
        - my-domain

Troubleshooting

Metrics and fields

Passing source_field or target_field with ROW_COUNT. ROW_COUNT is a table-level metric. Including field references causes a validation error. This is the single most common mistake with comparison monitors.
Using custom_metric together with metric or field references. These are mutually exclusive. When using custom_metric, omit metric, source_field, and target_field entirely.

Source and target

Forgetting warehouse inside source / target. The warehouse field is required on each side. The top-level warehouse field does not substitute for it.

Alert conditions

Omitting threshold_value for comparison_delta conditions. The threshold is required — without it, the backend cannot evaluate the comparison.
Confusing camelCase and snake_case. Alert condition fields like source_field, target_field, threshold_value, and is_threshold_relative use snake_case in MaC YAML. The MCP tool uses camelCase (sourceField, targetField, thresholdValue, isThresholdRelative). Do not mix conventions.

Tags and updates

Wrong tags format. Tags must be objects with name and optional value keys. Writing tags: ["my-tag"] or tags: - my-tag fails validation.
Forgetting PUT semantics on updates. When updating a monitor by including uuid, every field you omit reverts to its default — it is not left unchanged. Always specify the complete desired configuration.

Updated about 1 month ago

Did this page help you?