Comparison Monitor

Compare metrics across two data sources and alert when differences exceed thresholds.

Overview

Compare the same metric across two tables β€” source and target β€” and get alerted when they diverge beyond a threshold. Useful for migration validation, cross-environment parity checks (staging vs production), or any scenario where two datasets should agree on row counts, null rates, or custom expressions.

πŸ“˜

Reference scope

This page covers MaC YAML configuration. For an introduction to comparison monitors, see Comparison Monitors.

MaC key: metric_comparison. Replaces the legacy comparison_rule (blocked from new creation).

Quick Start

montecarlo:
  metric_comparison:
    - name: orders_row_count_parity
      description: Verify row count matches between staging and production orders
      source:
        warehouse: staging-snowflake
        table: staging:core.fct_orders
      target:
        warehouse: prod-snowflake
        table: analytics:core.fct_orders
      alert_conditions:
        - metric: ROW_COUNT
          type: comparison_delta
          operator: GT
          threshold_value: 100
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain

Create interactively with create_or_update_comparison_monitor(dry_run=True) via the Monte Carlo MCP server.

Configuration

description β€” what this monitor checks

string Β· required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: Verify staging and prod orders tables have matching row counts
source β€” source-side data configuration

object Β· required

Defines one side of the comparison β€” the table or SQL query, warehouse connection, and optional segmentation. The warehouse field inside source is always required; the top-level warehouse does not substitute for it.

source:
  warehouse: staging-snowflake
  table: staging:core.fct_orders
  where_condition: "status = 'ACTIVE'"

Properties


warehouse β€” warehouse for this side of the comparison

string Β· required

Warehouse UUID or name. Must be specified on each side independently β€” the top-level warehouse does not substitute for it.

warehouse: staging-snowflake

table β€” fully qualified table name

string Β· optional Β· one of table or sql required

Format: database:schema.table. Mutually exclusive with sql.

table: staging:core.fct_orders

sql β€” custom SQL query as the data source

string Β· optional Β· one of table or sql required

Custom SQL query. Mutually exclusive with table.

sql: "SELECT * FROM staging.core.fct_orders WHERE created_at > DATEADD(day, -7, CURRENT_DATE)"

transforms β€” AI-powered field transforms

array of objects Β· optional

Applied before metric computation. Same structure as metric monitor transforms.

transforms:
  - field: revenue
    expression: "amount_cents / 100.0"

where_condition β€” SQL WHERE filter

string Β· optional

SQL WHERE clause (without WHERE) to filter rows before comparison.

where_condition: "status = 'ACTIVE'"

segment_fields β€” columns to segment comparison by

array of strings Β· optional

Column names to segment the comparison by. Maximum 1 field.

segment_fields:
  - region

segment_sql β€” SQL expressions for segmentation

array of strings Β· optional

SQL expressions for segmentation when column names alone are insufficient. Maximum 1 expression.

segment_sql:
  - "DATE_TRUNC('month', created_at)"

connection_name β€” named connection override

string Β· optional

Named connection to use when executing queries on this side.

connection_name: staging-read-only

timeout β€” query execution timeout

integer Β· optional

Query execution timeout in seconds.

timeout: 300
target β€” target-side data configuration

object Β· required

Same structure as source. Defines the other side of the comparison. The warehouse field inside target is always required; the top-level warehouse does not substitute for it.

target:
  warehouse: prod-snowflake
  table: analytics:core.fct_orders
alert_conditions β€” comparison alert conditions

array of objects Β· required

Each entry defines which metric to compare and the threshold that triggers an alert. Supported alert condition types: comparison_delta, noop.

Available operators: GT Β· NOOP only. Other operators (LT, GTE, AUTO, etc.) are not supported.

Threshold type: comparison_delta alerts when the difference between source and target exceeds threshold_value. Set is_threshold_relative: true to treat threshold_value as a percentage difference rather than an absolute count.

alert_conditions:
  - metric: ROW_COUNT
    type: comparison_delta
    operator: GT
    threshold_value: 100

Properties


metric β€” built-in metric name

string Β· optional Β· one of metric or custom_metric required

Built-in metric name (e.g., ROW_COUNT, NULL_COUNT, NUMERIC_MEAN). Mutually exclusive with custom_metric. See Available Metrics.

metric: ROW_COUNT

custom_metric β€” custom SQL-based metric

object Β· optional Β· one of metric or custom_metric required

Custom SQL-based metric with separate expressions for source and target. Mutually exclusive with metric.

PropertyTypeRequiredDescription
uuidstringnoUUID of an existing custom metric to reuse
display_namestringyesName for the metric
source_sql_expressionstringyesSQL expression for the source side. Must return a single numeric value.
target_sql_expressionstringyesSQL expression for the target side. Must return a single numeric value.

custom_metric:
  display_name: Total Revenue
  source_sql_expression: "SUM(amount_cents) / 100.0"
  target_sql_expression: "SUM(revenue_usd)"

source_field β€” column name on the source side

string Β· optional

Required when comparing field-level metrics and columns have different names on each side.

source_field: EMAIL_ADDR

target_field β€” column name on the target side

string Β· optional

Required when comparing field-level metrics and columns have different names on each side.

target_field: EMAIL

type β€” condition type

enum Β· optional Β· default: comparison_delta

Accepted values: comparison_delta Β· noop

Use noop to collect data without alerting.

type: comparison_delta

operator β€” comparison operator

enum Β· optional Β· default: GT

Accepted values: GT Β· NOOP

Only GT is supported for comparison_delta conditions.

operator: GT

threshold_value β€” maximum allowed delta

number Β· required, for comparison_delta conditions

Maximum allowed delta between source and target.

threshold_value: 100

is_threshold_relative β€” treat threshold as percentage

boolean Β· optional Β· default: false

When true, threshold_value is treated as a percentage (0-100) rather than an absolute value.

is_threshold_relative: true
schedule β€” execution schedule

object Β· optional Β· default: system-managed schedule

Controls when the monitor runs. Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported. Omitting schedule means Monte Carlo runs the monitor on the default collection cycle.

schedule:
  type: fixed
  interval_minutes: 1440

Properties


type β€” schedule type

enum Β· optional Β· default: fixed

Accepted values: fixed Β· dynamic Β· manual

type: fixed

interval_minutes β€” run interval

integer Β· optional

Run interval for fixed schedules.

interval_minutes: 1440

interval_crontab β€” crontab expressions

array of strings Β· optional

Crontab expressions for scheduling.

interval_crontab:
  - "0 6 * * *"

interval_crontab_day_operator β€” crontab day combination

enum Β· optional

Accepted values: AND Β· OR

How to combine day-of-week and day-of-month in crontab.

interval_crontab_day_operator: AND

start_time β€” schedule start time

string Β· optional

ISO 8601 format.

start_time: "2024-01-01T06:00:00Z"

timezone β€” schedule timezone

string Β· optional

timezone: America/New_York

dynamic_schedule_tables β€” tables that trigger the monitor

array of strings Β· required, when type is dynamic (unless dynamic_schedule_jobs is set)

Tables whose update events trigger this monitor.

dynamic_schedule_tables:
  - analytics:core.fct_orders

dynamic_schedule_jobs β€” jobs that trigger the monitor

array of objects Β· optional

PropertyTypeRequiredDescription
job_typestringyesAdfJob Β· AirflowDag Β· DatabricksJob Β· DbtJob
job_namestringyesName of the job
project_namestringyesProject or workspace containing the job
task_namestringnoSpecific task within the job
mconstringnoMCON identifier for the job

dynamic_schedule_jobs:
  - job_type: DbtJob
    job_name: nightly_build
    project_name: analytics

min_interval_minutes β€” minimum interval between dynamic runs

integer Β· optional

Minimum interval between runs for dynamic schedules.

min_interval_minutes: 60
domains β€” domain for this monitor

array of strings (exactly one entry) Β· required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain
name β€” unique identifier within the namespace

string Β· required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one β€” incident history does not transfer.

name: orders_row_count_parity
warehouse β€” top-level warehouse

string Β· optional

Warehouse UUID or name. Does not substitute for warehouse inside source and target β€” those are always required. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake
notes β€” internal notes

string Β· optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Migration parity check. Reviewed weekly during cutover.
notify_run_failure β€” notify on query failure

boolean Β· optional

Notify when the monitor query itself fails.

notify_run_failure: true
audiences β€” notification channels

array of strings Β· optional

Audience names linking this monitor to channels defined in Notifications as Code.

audiences:
  - migration-alerts
  - data-engineering
failure_audiences β€” run-failure notification channels

array of strings Β· optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - oncall-alerts
priority β€” incident priority level

enum Β· optional

Accepted values: P1 Β· P2 Β· P3 Β· P4 Β· P5

priority: P2
tags β€” key-value pairs for organizing monitors

array of objects Β· optional

PropertyTypeRequiredDescription
namestringyesTag key
valuestringnoTag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production
data_quality_dimension β€” data quality category

enum Β· optional

Accepted values: ACCURACY Β· COMPLETENESS Β· CONSISTENCY Β· TIMELINESS Β· UNIQUENESS Β· VALIDITY

data_quality_dimension: CONSISTENCY
is_draft β€” create as draft without activating

boolean Β· optional Β· default: false

Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics β€” always include it if you want the monitor to stay in draft.

is_draft: true
uuid β€” update an existing monitor

string Β· optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19
Deprecated fields
FieldUse instead
resourcewarehouse
domaindomains
domain_uuidsdomains
labelsaudiences
notify_rule_run_failurenotify_run_failure
πŸ“˜

API-only fields

Some fields visible in the API or JSON Schema (skip_reset, fail_on_reset) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.

Available Metrics

Table-level (no source_field / target_field): ROW_COUNT

Field-level (require source_field and target_field):

  • Uniqueness: UNIQUE_COUNT, DUPLICATE_COUNT, APPROX_DISTINCT_COUNT
  • Null/empty: NULL_COUNT, NON_NULL_COUNT, EMPTY_STRING_COUNT, TEXT_ALL_SPACES_COUNT, NAN_COUNT, TEXT_NULL_KEYWORD_COUNT
  • Numeric: NUMERIC_MEAN, NUMERIC_MEDIAN, NUMERIC_MIN, NUMERIC_MAX, NUMERIC_STDDEV, SUM, ZERO_COUNT, NEGATIVE_COUNT
  • Percentiles: PERCENTILE_20, PERCENTILE_40, PERCENTILE_60, PERCENTILE_80
  • Text stats: TEXT_MAX_LENGTH, TEXT_MIN_LENGTH, TEXT_MEAN_LENGTH, TEXT_STD_LENGTH
  • Text format: TEXT_NOT_INT_COUNT, TEXT_NOT_NUMBER_COUNT, TEXT_NOT_UUID_COUNT, TEXT_NOT_SSN_COUNT, TEXT_NOT_US_PHONE_COUNT, TEXT_NOT_US_STATE_CODE_COUNT, TEXT_NOT_US_ZIP_CODE_COUNT, TEXT_NOT_EMAIL_ADDRESS_COUNT, TEXT_NOT_TIMESTAMP_COUNT
  • Boolean: TRUE_COUNT, FALSE_COUNT
  • Timestamp: FUTURE_TIMESTAMP_COUNT, PAST_TIMESTAMP_COUNT, UNIX_ZERO_COUNT

Examples

Row count parity with absolute threshold

Alerts when the row count difference between source and target exceeds 100 rows.

montecarlo:
  metric_comparison:
    - name: orders_row_count_parity
      description: Verify staging and prod orders tables have matching row counts
      source:
        warehouse: staging-snowflake
        table: staging:core.fct_orders
      target:
        warehouse: prod-snowflake
        table: analytics:core.fct_orders
      alert_conditions:
        - metric: ROW_COUNT
          type: comparison_delta
          operator: GT
          threshold_value: 100
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - migration-alerts
      priority: P2
      data_quality_dimension: CONSISTENCY
      domains:
        - my-domain

Field-level comparison with different column names

Compares null counts between two tables where the column is named differently on each side.

montecarlo:
  metric_comparison:
    - name: email_nulls_comparison
      description: Compare email null counts between legacy and new CRM
      source:
        warehouse: prod-snowflake
        table: legacy:crm.contacts
      target:
        warehouse: prod-snowflake
        table: new:crm.customers
      alert_conditions:
        - metric: NULL_COUNT
          source_field: EMAIL_ADDR
          target_field: EMAIL
          type: comparison_delta
          operator: GT
          threshold_value: 50
      schedule:
        type: fixed
        interval_minutes: 720
      priority: P3
      domains:
        - my-domain

Custom metric comparison

Uses custom SQL expressions to compare a derived metric across both sides.

montecarlo:
  metric_comparison:
    - name: revenue_match
      description: Verify total revenue matches between source system and warehouse
      source:
        warehouse: source-postgres
        table: public:billing.invoices
      target:
        warehouse: prod-snowflake
        table: analytics:finance.fct_invoices
      alert_conditions:
        - custom_metric:
            display_name: Total Revenue
            source_sql_expression: "SUM(amount_cents) / 100.0"
            target_sql_expression: "SUM(revenue_usd)"
          type: comparison_delta
          operator: GT
          threshold_value: 0.01
          is_threshold_relative: true
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - finance-data-quality
      priority: P1
      domains:
        - my-domain

Troubleshooting

Metrics and fields

  • Passing source_field or target_field with ROW_COUNT. ROW_COUNT is a table-level metric. Including field references causes a validation error. This is the single most common mistake with comparison monitors.
  • Using custom_metric together with metric or field references. These are mutually exclusive. When using custom_metric, omit metric, source_field, and target_field entirely.

Source and target

  • Forgetting warehouse inside source / target. The warehouse field is required on each side. The top-level warehouse field does not substitute for it.

Alert conditions

  • Omitting threshold_value for comparison_delta conditions. The threshold is required β€” without it, the backend cannot evaluate the comparison.
  • Confusing camelCase and snake_case. Alert condition fields like source_field, target_field, threshold_value, and is_threshold_relative use snake_case in MaC YAML. The MCP tool uses camelCase (sourceField, targetField, thresholdValue, isThresholdRelative). Do not mix conventions.

Tags and updates

  • Wrong tags format. Tags must be objects with name and optional value keys. Writing tags: ["my-tag"] or tags: - my-tag fails validation.
  • Forgetting PUT semantics on updates. When updating a monitor by including uuid, every field you omit reverts to its default β€” it is not left unchanged. Always specify the complete desired configuration.