Validation Monitor
Row-level data quality assertions that flag invalid rows.
Overview
Define row-level data quality rules and get alerted when rows break them. You write an alert_condition, a predicate tree that describes what invalid data looks like. Rows that match are breaches; if any exist (or if their share exceeds a percentage threshold), an incident is created.
Reference scope: This page covers MaC YAML configuration. For available predicates and UI setup, see Validation Monitors and Available Conditions.
Conditions define INVALID data. The single most common mistake is writing conditions that match valid rows. A validation condition describes what's wrong: rows that match are flagged as breaches.
Predicate types: UNARY (single-field checks like null), BINARY (field-to-value comparisons like greater_than), SQL (free-form boolean expressions), and GROUP (AND/OR combinators). Any UNARY or BINARY predicate can be inverted with negated: true.
For aggregate metrics, use a Metric Monitor. For arbitrary SQL returning a single number, use a Custom SQL Monitor.
Quick Start
montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "my_database:my_schema.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Create interactively with create_or_update_validation_monitor(dry_run=True) via the Monte Carlo MCP server. Use get_validation_predicates(warehouse=<uuid>) to discover available predicates for your warehouse.
Configuration
schedule
object · required
Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported.
Properties
type - schedule type
enum · optional · default: fixed
Accepted values: fixed · dynamic · manual
schedule:
  type: fixed

interval_minutes - run interval for fixed schedules
integer · optional
Required when type is fixed.
schedule:
  type: fixed
  interval_minutes: 720

interval_crontab - cron expressions
array of strings · optional
5-field cron format.
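For orientation, the five fields of a cron expression are minute, hour, day of month, month, and day of week. The sketch below is illustrative only: the two expressions and their run times are assumed values, not defaults.

```yaml
schedule:
  type: fixed
  interval_crontab:
    # minute hour day-of-month month day-of-week
    - "30 6 * * 1-5"   # 06:30 on Monday through Friday
    - "0 12 1 * *"     # 12:00 on the first day of each month
```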
schedule:
  type: fixed
  interval_crontab:
    - "0 8 * * *"

interval_crontab_day_operator - day-of-week/day-of-month combination
enum · optional
Accepted values: AND · OR
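As a hedged illustration, assume the operator decides how a restricted day-of-month field and a restricted day-of-week field in the same expression are combined, as the property name suggests. The expression below is a made-up value, and the comments describe that assumed reading rather than confirmed behavior.

```yaml
schedule:
  type: fixed
  interval_crontab:
    - "0 8 13 * 5"   # day-of-month = 13, day-of-week = 5 (Friday)
  # Assumed reading:
  #   AND -> 08:00 only on a Friday that is also the 13th
  #   OR  -> 08:00 on every 13th and on every Friday
  interval_crontab_day_operator: AND
```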
schedule:
  interval_crontab_day_operator: AND

start_time - ISO 8601 start time
string · optional
schedule:
  start_time: "2024-01-01T08:00:00Z"

timezone - IANA timezone
string · optional
schedule:
  timezone: America/Los_Angeles

dynamic_schedule_tables - tables that trigger the monitor
array of strings · optional (required when type is dynamic)
schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:public.orders

dynamic_schedule_jobs - jobs that trigger the monitor
array of objects · optional
| Property | Type | Required | Description |
|---|---|---|---|
| job_type | enum | yes | AirflowDag · DatabricksJob · AdfJob · DbtJob |
| job_name | string | yes | Name of the job |
| project_name | string | yes | Project or resource name |
| task_name | string | no | Task within the job |
| mcon | string | no | MCON identifier |
schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: AirflowDag
      job_name: etl_orders
      project_name: my-airflow

min_interval_minutes - minimum interval for dynamic schedules
integer · optional
schedule:
  type: dynamic
  min_interval_minutes: 30

schedule:
  type: fixed
  interval_minutes: 720

data_source
object · required
Provide either table or sql, not both.
Properties
table - fully qualified table name
string · required (if sql is not provided)
Format: database:schema.table.
data_source:
  table: "analytics_db:public.orders"

sql - SQL query returning rows to validate
string · required (if table is not provided)
data_source:
  sql: |
    SELECT * FROM orders
    WHERE created_at >= CURRENT_DATE - 1

transforms - AI-powered field transforms
array of objects · optional
See data_source.transforms in the Metric Monitor reference.
data_source:
  table: "analytics_db:public.orders"
  transforms:
    - field: status
      transform: normalize

data_source:
  table: "analytics_db:public.orders"

alert_condition
object · required
A predicate tree. The root node must be a GROUP. Each node is one of four types: GROUP, UNARY, BINARY, or SQL. Conditions match invalid data: a row that matches is a breach. Validation monitors do not use operators or threshold types; alerting is determined by the predicate tree and the optional percentage_threshold/percentage_operator fields.
Noise reduction: Set event_rollup_count (min 2) or event_rollup_until_changed at the monitor level to suppress notifications until N consecutive breaches or until the result changes. These two fields are mutually exclusive.
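A minimal sketch of where the rollup field sits, assuming the same monitor shape used elsewhere on this page. The table, field, and domain names are placeholders.

```yaml
montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      # Monitor-level field, a sibling of alert_condition:
      # notify only after 3 consecutive breaches.
      event_rollup_count: 3
      # event_rollup_until_changed: true   # alternative; do not set both
      domains:
        - my-domain
```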
alert_condition is singular. Other monitor types use alert_conditions (plural, an array). Validation monitors use alert_condition (singular, a single object). The plural form causes a validation error.
Properties
GROUP node - combine conditions with AND/OR logic
The root alert_condition must be a GROUP. Groups can be nested for complex logic.
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | yes | GROUP |
| operator | string | no | AND (default) or OR |
| conditions | array | yes | Child nodes (GROUP, UNARY, BINARY, or SQL) |
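Nesting can be sketched as follows. This hypothetical condition flags a row when user_id is null, or when status is "shipped" while shipped_at is null; the column names are assumptions made for illustration.

```yaml
alert_condition:
  type: GROUP
  operator: OR
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id
    # Nested group: both child conditions must match
    - type: GROUP
      operator: AND
      conditions:
        - type: BINARY
          predicate:
            name: equal
          left:
            - field: status
          right:
            - literal: "shipped"
        - type: UNARY
          predicate:
            name: "null"
          value:
            - field: shipped_at
```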
alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id

UNARY predicate - single-field check with no comparison value
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | yes | UNARY |
| predicate | object | yes | name (string) and optional negated (boolean) |
| value | array | yes | Field references: - field: <column_name> |
Available UNARY predicates by field type:
| Field Type | Predicates |
|---|---|
| Numeric | null, is_zero, is_negative, is_between_0_and_1, is_between_0_and_100, is_nan |
| String | null, empty_string, all_space, containing_spaces, null_string, is_uuid, email, true_string, false_string, timestamp_iso_8601, timestamp_mm-dd-yyyy, timestamp_dd-mm-yyyy, timestamp_mm/dd/yyyy, timestamp_dd/mm/yyyy, us_zip_code, us_ssn, us_state_code, us_phone_number, can_sin, ca_postal_code, fr_insee_code, fr_postal_code, de_postal_code, de_tax_id, ie_ppsn, ie_postal_code, it_postal_code, it_fiscal_code, es_dni, es_postal_code, uk_postal_code, uk_nino, nl_postal_code, nl_bsn, tr_postal_code, tr_id_no, ch_oasi, ch_postal_code, pl_postal_code, pl_pesel, aus_postal_code, aus_state_code |
| Boolean | null |
| Date | null, in_past, in_future, today, yesterday, in_past_7_days, in_past_30_days, in_past_365_days, in_past_calendar_month, in_past_calendar_week, weekday, sunday through saturday |
| Timestamp | null, in_past, in_future, in_past_60_minutes, in_past_24_hours, in_past_7_days, in_past_30_days, in_past_365_days, in_past_calendar_month, in_past_calendar_week, today, yesterday, weekday, sunday through saturday |
Predicate availability varies by warehouse. Run get_validation_predicates to get the current list for your warehouse.
- type: UNARY
  predicate:
    name: "null"
  value:
    - field: email

BINARY predicate - field-to-value comparison
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | yes | BINARY |
| predicate | object | yes | name (string) and optional negated (boolean) |
| left | array | yes | Field references: - field: <column_name> |
| right | array | yes | One of: literal, field, or sql. See right-side value types below. |
Available BINARY predicates by field type:
| Field Type | Predicates |
|---|---|
| Numeric | equal, greater_than, greater_than_or_equal, less_than, less_than_or_equal, in_set |
| String | equal, contains, starts_with, ends_with, matches_regex, in_set |
| Boolean | equal |
| Date / Timestamp | equal, greater_than, greater_than_or_equal, less_than, less_than_or_equal |
Right-side value types:
| Value Type | Syntax | Description |
|---|---|---|
| literal | - literal: "value" | A static value. Always pass as a string, even for numbers (e.g., - literal: "150", not - literal: 150). |
| field | - field: column_name | Reference to another column. Only with date/timestamp left fields or in_set. |
| sql | - sql: "expression" | A scalar SQL expression or subquery (must not start with SELECT; wrap a subquery in parentheses). |
Left side must always be a field reference. You cannot put a literal, sql, or aggregate on the left side.
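To illustrate the two non-literal right-side types, here is a hedged sketch of two condition nodes. The columns delivered_at, ordered_at, and amount, and the order_limits table, are hypothetical.

```yaml
conditions:
  # field on the right: breach when a row was delivered before it was ordered
  - type: BINARY
    predicate:
      name: less_than
    left:
      - field: delivered_at
    right:
      - field: ordered_at
  # sql on the right: a scalar subquery wrapped in parentheses
  - type: BINARY
    predicate:
      name: greater_than
    left:
      - field: amount
    right:
      - sql: "(SELECT MAX(max_amount) FROM order_limits)"
```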
- type: BINARY
  predicate:
    name: greater_than
  left:
    - field: age
  right:
    - literal: "150"

SQL predicate - free-form SQL boolean expression
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | yes | SQL |
| sql | string | yes | A SQL boolean expression (e.g. amount > 0 AND amount < 10000). Not a query: no SELECT. |
- type: SQL
  sql: "status = 'failed' AND amount < 0"

Negation - invert any UNARY or BINARY predicate
Any UNARY or BINARY predicate can be inverted with negated: true in the predicate object.
# Alert when status is NOT in the allowed set
- type: BINARY
  predicate:
    name: in_set
    negated: true
  left:
    - field: status
  right:
    - literal: "active"
    - literal: "inactive"
    - literal: "pending"

alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id

description
string · required
Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.
description: "Alert when user_id is null in orders table"

domains
array of strings (exactly one entry) · required on all accounts created after January 2025
Set default_domain in montecarlo.yml to avoid repeating it on every monitor.
domains:
  - my-domain

percentage_threshold
number · optional
Alert based on the percentage of invalid rows compared with this value using percentage_operator (for example, GT alerts when the percentage exceeds it). Minimum: 0. When omitted, the monitor alerts on any matching row. Must be paired with percentage_operator; setting one without the other causes a validation error.
percentage_threshold: 5
percentage_operator: GT

percentage_operator
enum · optional
Accepted values: EQ · GT · GTE · LT · LTE
Must be paired with percentage_threshold; setting one without the other causes a validation error.
percentage_threshold: 1.0
percentage_operator: GT

warehouse
string · optional (required if multiple warehouses)
Warehouse UUID or name. Overrides default_resource from montecarlo.yml.
warehouse: my-snowflake

name
string · required
Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one; incident history does not transfer.
name: null_user_id_check

connection_name
string · optional
Overrides the default connection for this warehouse.
connection_name: analytics-readonly

event_rollup_count
integer · optional
Minimum 2. Mutually exclusive with event_rollup_until_changed.
event_rollup_count: 3

event_rollup_until_changed
boolean · optional · default: false
Notifies only on status transitions. Mutually exclusive with event_rollup_count.
event_rollup_until_changed: true

exception_primary_key_column
string · optional
Track specific invalid rows across runs to see whether the same rows keep failing or new ones appear.
exception_primary_key_column: order_id

severity
enum · optional
Accepted values: SEV-0 · SEV-1 · SEV-2 · SEV-3 · SEV-4
severity: SEV-2

priority
enum · optional
Accepted values: P1 · P2 · P3 · P4 · P5
priority: P2

audiences
array of strings · optional · default: []
Audience names linking this monitor to channels defined in Notifications as Code. In exported/rendered YAML, appears as labels.
audiences:
  - data-eng-oncall
  - platform-alerts

failure_audiences
array of strings · optional
Separate audience for run failures. Defaults to audiences when omitted.
failure_audiences:
  - data-eng-oncall

notify_run_failure
boolean · optional
notify_run_failure: true

timeout
integer · optional
timeout: 300

tags
array of objects · optional · default: []
| Property | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Tag key |
| value | string | no | Tag value |
tags:
  - name: team
    value: analytics
  - name: environment
    value: production

data_quality_dimension
enum · optional
Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY
data_quality_dimension: VALIDITY

notes
string · optional
Visible in the Monte Carlo UI. Not included in notifications.
notes: Owned by the analytics team. Reviewed quarterly.

is_draft
boolean · optional · default: false
Creates the monitor in a paused state. Omitting this on a later update resets it to false (active) due to PUT semantics; always include it if you want the monitor to stay in draft.
is_draft: true

uuid
string · optional
Include the UUID of an existing monitor to update it instead of creating a new one.
uuid: 0dae7702-0950-45c7-909c-7e183bddca19

Deprecated fields
| Field | Use instead |
|---|---|
| resource | warehouse |
| domain | domains |
| domain_uuids | domains |
| labels | audiences |
| notify_rule_run_failure | notify_run_failure |
API-only fields: Some fields visible in the API or JSON Schema (metadata) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.
Examples
Null check - alert when required field is null
montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Range validation - alert when age is out of bounds
montecarlo:
  validation:
    - name: users_age_range
      description: "age must be between 0 and 150"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.users"
      alert_condition:
        type: GROUP
        operator: OR
        conditions:
          - type: UNARY
            id: age_negative
            predicate:
              name: is_negative
            value:
              - field: age
          - type: BINARY
            id: age_too_high
            predicate:
              name: greater_than
            left:
              - field: age
            right:
              - literal: "150"
      domains:
        - my-domain

Percentage threshold - tolerate small violation rates
montecarlo:
  validation:
    - name: orders_failed_amount_check
      description: "Custom business logic check"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: SQL
            id: custom_check
            sql: "status = 'failed' AND amount < 0"
      percentage_threshold: 5
      percentage_operator: GT
      priority: P3
      audiences:
        - data-eng-oncall
      domains:
        - my-domain

Troubleshooting
Conditions and predicates
Conditions match INVALID data. The most common mistake is writing conditions that match valid rows. A row that matches the predicate is a breach.
| Intent | Wrong | Correct |
|---|---|---|
| Email must not be null | null with negated: true | null (no negation) |
| Status must be "active" | equal to "active" | equal to "active" with negated: true |
| Amount must not be negative | is_negative with negated: true | is_negative (no negation) |
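As a sketch of the second row in the table above, the condition below flags rows whose status is anything other than "active". The status column is a placeholder, and operator is omitted because it defaults to AND.

```yaml
alert_condition:
  type: GROUP
  conditions:
    # Breach when status is NOT "active"
    - type: BINARY
      predicate:
        name: equal
        negated: true
      left:
        - field: status
      right:
        - literal: "active"
```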
Wrong predicate names. Use the exact names from the predicates list.
| Wrong | Correct |
|---|---|
| not_null | null with negated: true |
| is_null | null |
| gt, > | greater_than |
| eq, == | equal |
| regex | matches_regex |
| between | Two conditions: greater_than_or_equal AND less_than_or_equal |
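The "between" replacement can be sketched as an AND group of two BINARY nodes. This hypothetical condition flags rows whose discount_pct falls between 50 and 100 inclusive; the column name and bounds are assumptions.

```yaml
alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: BINARY
      predicate:
        name: greater_than_or_equal
      left:
        - field: discount_pct
      right:
        - literal: "50"
    - type: BINARY
      predicate:
        name: less_than_or_equal
      left:
        - field: discount_pct
      right:
        - literal: "100"
```

Because conditions describe invalid data, this form flags values inside the range. To flag values outside a valid range, use an OR group of less_than and greater_than instead.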
Schema and syntax errors
- alert_conditions (plural) vs alert_condition (singular). Validation uses the singular form. The plural form causes a validation error.
- percentage_threshold and percentage_operator must be paired. Setting one without the other causes a validation error. Both must be present or both must be omitted.
- Literal on left side of BINARY. Left side must always be a field reference: you cannot put a literal, sql, or aggregate on the left side.
- SELECT in a SQL node. SQL nodes must be boolean expressions, not queries. No SELECT keyword.
- alert_condition as a JSON string. Must be a YAML/JSON object, not a serialized string.
- Size limit. The serialized condition tree must be under 200,000 characters.
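A short sketch of the accepted shape for two of the items above, with the rejected forms noted in comments. The amount column and the commented-out query are illustrative only.

```yaml
# alert_condition is written as a YAML object, not as a quoted JSON string
alert_condition:
  type: GROUP
  conditions:
    - type: SQL
      # Boolean expression only. A full query such as
      # "SELECT * FROM orders WHERE amount < 0" is not accepted here.
      sql: "amount < 0"
```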
Updates
- Forgetting PUT semantics on updates. When providing uuid, omitted fields revert to defaults. Always include every field you want to preserve.
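An update might look like the sketch below, which restates every setting that should survive. The UUID is the sample value used on this page, and the other values are placeholders.

```yaml
montecarlo:
  validation:
    - uuid: 0dae7702-0950-45c7-909c-7e183bddca19   # existing monitor to update
      name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      is_draft: true          # restated so the monitor stays in draft
      event_rollup_count: 3   # restated; omitting it would revert to the default
      domains:
        - my-domain
```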
