Comparison Monitor
Compare metrics across two data sources and alert when differences exceed thresholds.
Overview
Compare the same metric across two tables β source and target β and get alerted when they diverge beyond a threshold. Useful for migration validation, cross-environment parity checks (staging vs production), or any scenario where two datasets should agree on row counts, null rates, or custom expressions.
Reference scopeThis page covers MaC YAML configuration. For an introduction to comparison monitors, see Comparison Monitors.
MaC key: metric_comparison. Replaces the legacy comparison_rule (blocked from new creation).
Quick Start
montecarlo:
metric_comparison:
- name: orders_row_count_parity
description: Verify row count matches between staging and production orders
source:
warehouse: staging-snowflake
table: staging:core.fct_orders
target:
warehouse: prod-snowflake
table: analytics:core.fct_orders
alert_conditions:
- metric: ROW_COUNT
type: comparison_delta
operator: GT
threshold_value: 100
schedule:
type: fixed
interval_minutes: 1440
domains:
- my-domainCreate interactively with create_or_update_comparison_monitor(dry_run=True) via the Monte Carlo MCP server.
Configuration
string Β· required
Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.
description: Verify staging and prod orders tables have matching row countsobject Β· required
Defines one side of the comparison β the table or SQL query, warehouse connection, and optional segmentation. The warehouse field inside source is always required; the top-level warehouse does not substitute for it.
source:
warehouse: staging-snowflake
table: staging:core.fct_orders
where_condition: "status = 'ACTIVE'"Properties
warehouse β warehouse for this side of the comparison
string Β· required
Warehouse UUID or name. Must be specified on each side independently β the top-level warehouse does not substitute for it.
warehouse: staging-snowflaketable β fully qualified table name
string Β· optional Β· one of table or sql required
Format: database:schema.table. Mutually exclusive with sql.
table: staging:core.fct_orderssql β custom SQL query as the data source
string Β· optional Β· one of table or sql required
Custom SQL query. Mutually exclusive with table.
sql: "SELECT * FROM staging.core.fct_orders WHERE created_at > DATEADD(day, -7, CURRENT_DATE)"transforms β AI-powered field transforms
array of objects Β· optional
Applied before metric computation. Same structure as metric monitor transforms.
transforms:
- field: revenue
expression: "amount_cents / 100.0"where_condition β SQL WHERE filter
string Β· optional
SQL WHERE clause (without WHERE) to filter rows before comparison.
where_condition: "status = 'ACTIVE'"segment_fields β columns to segment comparison by
array of strings Β· optional
Column names to segment the comparison by. Maximum 1 field.
segment_fields:
- regionsegment_sql β SQL expressions for segmentation
array of strings Β· optional
SQL expressions for segmentation when column names alone are insufficient. Maximum 1 expression.
segment_sql:
- "DATE_TRUNC('month', created_at)"connection_name β named connection override
string Β· optional
Named connection to use when executing queries on this side.
connection_name: staging-read-onlytimeout β query execution timeout
integer Β· optional
Query execution timeout in seconds.
timeout: 300object Β· required
Same structure as source. Defines the other side of the comparison. The warehouse field inside target is always required; the top-level warehouse does not substitute for it.
target:
warehouse: prod-snowflake
table: analytics:core.fct_ordersarray of objects Β· required
Each entry defines which metric to compare and the threshold that triggers an alert. Supported alert condition types: comparison_delta, noop.
Available operators: GT Β· NOOP only. Other operators (LT, GTE, AUTO, etc.) are not supported.
Threshold type: comparison_delta alerts when the difference between source and target exceeds threshold_value. Set is_threshold_relative: true to treat threshold_value as a percentage difference rather than an absolute count.
alert_conditions:
- metric: ROW_COUNT
type: comparison_delta
operator: GT
threshold_value: 100Properties
metric β built-in metric name
string Β· optional Β· one of metric or custom_metric required
Built-in metric name (e.g., ROW_COUNT, NULL_COUNT, NUMERIC_MEAN). Mutually exclusive with custom_metric. See Available Metrics.
metric: ROW_COUNTcustom_metric β custom SQL-based metric
object Β· optional Β· one of metric or custom_metric required
Custom SQL-based metric with separate expressions for source and target. Mutually exclusive with metric.
| Property | Type | Required | Description |
|---|---|---|---|
uuid | string | no | UUID of an existing custom metric to reuse |
display_name | string | yes | Name for the metric |
source_sql_expression | string | yes | SQL expression for the source side. Must return a single numeric value. |
target_sql_expression | string | yes | SQL expression for the target side. Must return a single numeric value. |
custom_metric:
display_name: Total Revenue
source_sql_expression: "SUM(amount_cents) / 100.0"
target_sql_expression: "SUM(revenue_usd)"source_field β column name on the source side
string Β· optional
Required when comparing field-level metrics and columns have different names on each side.
source_field: EMAIL_ADDRtarget_field β column name on the target side
string Β· optional
Required when comparing field-level metrics and columns have different names on each side.
target_field: EMAILtype β condition type
enum Β· optional Β· default: comparison_delta
Accepted values: comparison_delta Β· noop
Use noop to collect data without alerting.
type: comparison_deltaoperator β comparison operator
enum Β· optional Β· default: GT
Accepted values: GT Β· NOOP
Only GT is supported for comparison_delta conditions.
operator: GTthreshold_value β maximum allowed delta
number Β· required, for comparison_delta conditions
Maximum allowed delta between source and target.
threshold_value: 100is_threshold_relative β treat threshold as percentage
boolean Β· optional Β· default: false
When true, threshold_value is treated as a percentage (0-100) rather than an absolute value.
is_threshold_relative: trueobject Β· optional Β· default: system-managed schedule
Controls when the monitor runs. Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported. Omitting schedule means Monte Carlo runs the monitor on the default collection cycle.
schedule:
type: fixed
interval_minutes: 1440Properties
type β schedule type
enum Β· optional Β· default: fixed
Accepted values: fixed Β· dynamic Β· manual
type: fixedinterval_minutes β run interval
integer Β· optional
Run interval for fixed schedules.
interval_minutes: 1440interval_crontab β crontab expressions
array of strings Β· optional
Crontab expressions for scheduling.
interval_crontab:
- "0 6 * * *"interval_crontab_day_operator β crontab day combination
enum Β· optional
Accepted values: AND Β· OR
How to combine day-of-week and day-of-month in crontab.
interval_crontab_day_operator: ANDstart_time β schedule start time
string Β· optional
ISO 8601 format.
start_time: "2024-01-01T06:00:00Z"timezone β schedule timezone
string Β· optional
timezone: America/New_Yorkdynamic_schedule_tables β tables that trigger the monitor
array of strings Β· required, when type is dynamic (unless dynamic_schedule_jobs is set)
Tables whose update events trigger this monitor.
dynamic_schedule_tables:
- analytics:core.fct_ordersdynamic_schedule_jobs β jobs that trigger the monitor
array of objects Β· optional
| Property | Type | Required | Description |
|---|---|---|---|
job_type | string | yes | AdfJob Β· AirflowDag Β· DatabricksJob Β· DbtJob |
job_name | string | yes | Name of the job |
project_name | string | yes | Project or workspace containing the job |
task_name | string | no | Specific task within the job |
mcon | string | no | MCON identifier for the job |
dynamic_schedule_jobs:
- job_type: DbtJob
job_name: nightly_build
project_name: analyticsmin_interval_minutes β minimum interval between dynamic runs
integer Β· optional
Minimum interval between runs for dynamic schedules.
min_interval_minutes: 60array of strings (exactly one entry) Β· required on all accounts created after January 2025
Set default_domain in montecarlo.yml to avoid repeating it on every monitor.
domains:
- my-domainstring Β· required
Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one β incident history does not transfer.
name: orders_row_count_paritystring Β· optional
Warehouse UUID or name. Does not substitute for warehouse inside source and target β those are always required. Overrides default_resource from montecarlo.yml.
warehouse: my-snowflakestring Β· optional
Visible in the Monte Carlo UI. Not included in notifications.
notes: Migration parity check. Reviewed weekly during cutover.boolean Β· optional
Notify when the monitor query itself fails.
notify_run_failure: truearray of strings Β· optional
Audience names linking this monitor to channels defined in Notifications as Code.
audiences:
- migration-alerts
- data-engineeringarray of strings Β· optional
Separate audiences for run-failure notifications. Falls back to audiences if not set.
failure_audiences:
- oncall-alertsenum Β· optional
Accepted values: P1 Β· P2 Β· P3 Β· P4 Β· P5
priority: P2array of objects Β· optional
| Property | Type | Required | Description |
|---|---|---|---|
name | string | yes | Tag key |
value | string | no | Tag value |
tags:
- name: team
value: analytics
- name: environment
value: productionenum Β· optional
Accepted values: ACCURACY Β· COMPLETENESS Β· CONSISTENCY Β· TIMELINESS Β· UNIQUENESS Β· VALIDITY
data_quality_dimension: CONSISTENCYboolean Β· optional Β· default: false
Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics β always include it if you want the monitor to stay in draft.
is_draft: truestring Β· optional
Include the UUID of an existing monitor to update it instead of creating a new one.
uuid: 0dae7702-0950-45c7-909c-7e183bddca19Deprecated fields
| Field | Use instead |
|---|---|
resource | warehouse |
domain | domains |
domain_uuids | domains |
labels | audiences |
notify_rule_run_failure | notify_run_failure |
API-only fieldsSome fields visible in the API or JSON Schema (
skip_reset,fail_on_reset) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.
Available Metrics
Table-level (no source_field / target_field): ROW_COUNT
Field-level (require source_field and target_field):
- Uniqueness:
UNIQUE_COUNT,DUPLICATE_COUNT,APPROX_DISTINCT_COUNT - Null/empty:
NULL_COUNT,NON_NULL_COUNT,EMPTY_STRING_COUNT,TEXT_ALL_SPACES_COUNT,NAN_COUNT,TEXT_NULL_KEYWORD_COUNT - Numeric:
NUMERIC_MEAN,NUMERIC_MEDIAN,NUMERIC_MIN,NUMERIC_MAX,NUMERIC_STDDEV,SUM,ZERO_COUNT,NEGATIVE_COUNT - Percentiles:
PERCENTILE_20,PERCENTILE_40,PERCENTILE_60,PERCENTILE_80 - Text stats:
TEXT_MAX_LENGTH,TEXT_MIN_LENGTH,TEXT_MEAN_LENGTH,TEXT_STD_LENGTH - Text format:
TEXT_NOT_INT_COUNT,TEXT_NOT_NUMBER_COUNT,TEXT_NOT_UUID_COUNT,TEXT_NOT_SSN_COUNT,TEXT_NOT_US_PHONE_COUNT,TEXT_NOT_US_STATE_CODE_COUNT,TEXT_NOT_US_ZIP_CODE_COUNT,TEXT_NOT_EMAIL_ADDRESS_COUNT,TEXT_NOT_TIMESTAMP_COUNT - Boolean:
TRUE_COUNT,FALSE_COUNT - Timestamp:
FUTURE_TIMESTAMP_COUNT,PAST_TIMESTAMP_COUNT,UNIX_ZERO_COUNT
Examples
Row count parity with absolute threshold
Alerts when the row count difference between source and target exceeds 100 rows.
montecarlo:
metric_comparison:
- name: orders_row_count_parity
description: Verify staging and prod orders tables have matching row counts
source:
warehouse: staging-snowflake
table: staging:core.fct_orders
target:
warehouse: prod-snowflake
table: analytics:core.fct_orders
alert_conditions:
- metric: ROW_COUNT
type: comparison_delta
operator: GT
threshold_value: 100
schedule:
type: fixed
interval_minutes: 1440
audiences:
- migration-alerts
priority: P2
data_quality_dimension: CONSISTENCY
domains:
- my-domainField-level comparison with different column names
Compares null counts between two tables where the column is named differently on each side.
montecarlo:
metric_comparison:
- name: email_nulls_comparison
description: Compare email null counts between legacy and new CRM
source:
warehouse: prod-snowflake
table: legacy:crm.contacts
target:
warehouse: prod-snowflake
table: new:crm.customers
alert_conditions:
- metric: NULL_COUNT
source_field: EMAIL_ADDR
target_field: EMAIL
type: comparison_delta
operator: GT
threshold_value: 50
schedule:
type: fixed
interval_minutes: 720
priority: P3
domains:
- my-domainCustom metric comparison
Uses custom SQL expressions to compare a derived metric across both sides.
montecarlo:
metric_comparison:
- name: revenue_match
description: Verify total revenue matches between source system and warehouse
source:
warehouse: source-postgres
table: public:billing.invoices
target:
warehouse: prod-snowflake
table: analytics:finance.fct_invoices
alert_conditions:
- custom_metric:
display_name: Total Revenue
source_sql_expression: "SUM(amount_cents) / 100.0"
target_sql_expression: "SUM(revenue_usd)"
type: comparison_delta
operator: GT
threshold_value: 0.01
is_threshold_relative: true
schedule:
type: fixed
interval_minutes: 1440
audiences:
- finance-data-quality
priority: P1
domains:
- my-domainTroubleshooting
Metrics and fields
- Passing
source_fieldortarget_fieldwithROW_COUNT.ROW_COUNTis a table-level metric. Including field references causes a validation error. This is the single most common mistake with comparison monitors. - Using
custom_metrictogether withmetricor field references. These are mutually exclusive. When usingcustom_metric, omitmetric,source_field, andtarget_fieldentirely.
Source and target
- Forgetting
warehouseinsidesource/target. Thewarehousefield is required on each side. The top-levelwarehousefield does not substitute for it.
Alert conditions
- Omitting
threshold_valueforcomparison_deltaconditions. The threshold is required β without it, the backend cannot evaluate the comparison. - Confusing camelCase and snake_case. Alert condition fields like
source_field,target_field,threshold_value, andis_threshold_relativeuse snake_case in MaC YAML. The MCP tool uses camelCase (sourceField,targetField,thresholdValue,isThresholdRelative). Do not mix conventions.
Tags and updates
- Wrong
tagsformat. Tags must be objects withnameand optionalvaluekeys. Writingtags: ["my-tag"]ortags: - my-tagfails validation. - Forgetting PUT semantics on updates. When updating a monitor by including
uuid, every field you omit reverts to its default β it is not left unchanged. Always specify the complete desired configuration.
Updated about 2 hours ago
