Validation Monitor

Row-level data quality assertions that flag invalid rows.

Overview

Define row-level data quality rules and get alerted when rows break them. You write an alert_condition — a predicate tree — that describes what invalid data looks like. Rows that match are breaches; if any exist (or exceed a percentage threshold), an incident is created.

📘
Reference scope
This page covers MaC YAML configuration. For available predicates and UI setup, see Validation Monitors and Available Conditions.

🚧
Conditions define INVALID data.
The single most common mistake: writing conditions that match valid rows. A validation condition describes what's wrong — rows that match are flagged as breaches.

Predicate types: UNARY (single-field checks like null), BINARY (field-to-value comparisons like greater_than), SQL (free-form boolean expressions), and GROUP (AND/OR combinators). Any predicate can be inverted with negated: true.

For aggregate metrics, use a Metric Monitor. For arbitrary SQL returning a single number, use a Custom SQL Monitor.

Quick Start

montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "my_database:my_schema.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Create interactively with create_or_update_validation_monitor(dry_run=True) via the Monte Carlo MCP server. Use get_validation_predicates(warehouse=<uuid>) to discover available predicates for your warehouse.

Configuration

schedule — when and how often to run

object · required

Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported.

Properties

type — schedule type

enum · optional · default: fixed

Accepted values: fixed · dynamic · manual

schedule:
  type: fixed

interval_minutes — run interval for fixed schedules

integer · optional

Required when type is fixed.

schedule:
  type: fixed
  interval_minutes: 720

interval_crontab — cron expressions

array of strings · optional

5-field cron format.

schedule:
  type: fixed
  interval_crontab:
    - "0 8 * * *"

interval_crontab_day_operator — day-of-week/day-of-month combination

enum · optional

Accepted values: AND · OR

schedule:
  interval_crontab_day_operator: AND

start_time — ISO 8601 start time

string · optional

schedule:
  start_time: "2024-01-01T08:00:00Z"

timezone — IANA timezone

string · optional

schedule:
  timezone: America/Los_Angeles

dynamic_schedule_tables — tables that trigger the monitor

array of strings · optional (required when type is dynamic)

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:public.orders

dynamic_schedule_jobs — jobs that trigger the monitor

array of objects · optional

Property	Type	Required	Description
`job_type`	enum	yes	`AirflowDag` · `DatabricksJob` · `AdfJob` · `DbtJob`
`job_name`	string	yes	Name of the job
`project_name`	string	yes	Project or resource name
`task_name`	string	no	Task within the job
`mcon`	string	no	MCON identifier

schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: AirflowDag
      job_name: etl_orders
      project_name: my-airflow

min_interval_minutes — minimum interval for dynamic schedules

integer · optional

schedule:
  type: dynamic
  min_interval_minutes: 30

schedule:
  type: fixed
  interval_minutes: 720

data_source — the table or SQL query to validate

object · required

Provide either table or sql, not both.

Properties

table — fully qualified table name

string · required (if sql is not provided)

Format: database:schema.table.

data_source:
  table: "analytics_db:public.orders"

sql — SQL query returning rows to validate

string · required (if table is not provided)

data_source:
  sql: |
    SELECT * FROM orders
    WHERE created_at >= CURRENT_DATE - 1

transforms — AI-powered field transforms

array of objects · optional

See data_source.transforms in the Metric Monitor reference.

data_source:
  table: "analytics_db:public.orders"
  transforms:
    - field: status
      transform: normalize

data_source:
  table: "analytics_db:public.orders"

alert_condition — predicate tree defining invalid rows

object · required

A predicate tree. The root node must be a GROUP. Each node is one of four types: GROUP, UNARY, BINARY, or SQL. Conditions match invalid data — a row that matches is a breach. Validation monitors do not use operators or threshold types — alerting is determined by the predicate tree and optional percentage_threshold/percentage_operator fields.

Noise reduction: Set event_rollup_count (min 2), event_rollup_until_changed, or alert_grouping at the monitor level to suppress notifications for subsequent breaches. These fields are mutually exclusive — only one can be set at a time.

🚧
alert_condition is singular.
Other monitor types use alert_conditions (plural, an array). Validation monitors use alert_condition (singular, a single object). The plural form causes a validation error.

Properties

GROUP node — combine conditions with AND/OR logic

The root alert_condition must be a GROUP. Groups can be nested for complex logic.

Property	Type	Required	Description
`type`	string	yes	`GROUP`
`operator`	string	no	`AND` (default) or `OR`
`conditions`	array	yes	Child nodes (GROUP, UNARY, BINARY, or SQL)

alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id

UNARY predicate — single-field check with no comparison value

Property	Type	Required	Description
`type`	string	yes	`UNARY`
`predicate`	object	yes	`name` (string) and optional `negated` (boolean)
`value`	array	yes	Field references: `- field: <column_name>`

Available UNARY predicates by field type:

Field Type	Predicates
Numeric	`null`, `is_zero`, `is_negative`, `is_between_0_and_1`, `is_between_0_and_100`, `is_nan`
String	`null`, `empty_string`, `all_space`, `containing_spaces`, `null_string`, `is_uuid`, `email`, `true_string`, `false_string`, `timestamp_iso_8601`, `timestamp_mm-dd-yyyy`, `timestamp_dd-mm-yyyy`, `timestamp_mm/dd/yyyy`, `timestamp_dd/mm/yyyy`, `us_zip_code`, `us_ssn`, `us_state_code`, `us_phone_number`, `can_sin`, `ca_postal_code`, `fr_insee_code`, `fr_postal_code`, `de_postal_code`, `de_tax_id`, `ie_ppsn`, `ie_postal_code`, `it_postal_code`, `it_fiscal_code`, `es_dni`, `es_postal_code`, `uk_postal_code`, `uk_nino`, `nl_postal_code`, `nl_bsn`, `tr_postal_code`, `tr_id_no`, `ch_oasi`, `ch_postal_code`, `pl_postal_code`, `pl_pesel`, `aus_postal_code`, `aus_state_code`
Boolean	`null`
Date	`null`, `in_past`, `in_future`, `today`, `yesterday`, `in_past_7_days`, `in_past_30_days`, `in_past_365_days`, `in_past_calendar_month`, `in_past_calendar_week`, `weekday`, `sunday` through `saturday`
Timestamp	`null`, `in_past`, `in_future`, `in_past_60_minutes`, `in_past_24_hours`, `in_past_7_days`, `in_past_30_days`, `in_past_365_days`, `in_past_calendar_month`, `in_past_calendar_week`, `today`, `yesterday`, `weekday`, `sunday` through `saturday`

📘
Predicate availability varies by warehouse
Run get_validation_predicates to get the current list for your warehouse.

- type: UNARY
  predicate:
    name: "null"
  value:
    - field: email

BINARY predicate — field-to-value comparison

Property	Type	Required	Description
`type`	string	yes	`BINARY`
`predicate`	object	yes	`name` (string) and optional `negated` (boolean)
`left`	array	yes	Field references: `- field: <column_name>`
`right`	array	yes	One of: `literal`, `field`, or `sql`. See right-side value types below.

Available BINARY predicates by field type:

Field Type	Predicates
Numeric	`equal`, `greater_than`, `greater_than_or_equal`, `less_than`, `less_than_or_equal`, `in_set`
String	`equal`, `contains`, `starts_with`, `ends_with`, `matches_regex`, `in_set`
Boolean	`equal`
Date / Timestamp	`equal`, `greater_than`, `greater_than_or_equal`, `less_than`, `less_than_or_equal`

Right-side value types:

Value Type	Syntax	Description
`literal`	`- literal: "value"`	A static value. Always pass as a string, even for numbers (e.g., `- literal: "150"`, not `- literal: 150`).
`field`	`- field: column_name`	Reference to another column. Only with date/timestamp left fields or `in_set`.
`sql`	`- sql: "expression"`	A scalar SQL expression or subquery (must not start with `SELECT` — use parentheses).

🚧
Left side must always be a field reference.
You cannot put a literal, sql, or aggregate on the left side.

- type: BINARY
  predicate:
    name: greater_than
  left:
    - field: age
  right:
    - literal: "150"

SQL predicate — free-form SQL boolean expression

Property	Type	Required	Description
`type`	string	yes	`SQL`
`sql`	string	yes	A SQL boolean expression (e.g. `amount > 0 AND amount < 10000`). Not a query — no `SELECT`.

- type: SQL
  sql: "status = 'failed' AND amount < 0"

Negation — invert any UNARY or BINARY predicate

Any UNARY or BINARY predicate can be inverted with negated: true in the predicate object.

# Alert when status is NOT in the allowed set
- type: BINARY
  predicate:
    name: in_set
    negated: true
  left:
    - field: status
  right:
    - literal: "active"
    - literal: "inactive"
    - literal: "pending"

alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id

description — what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: "Alert when user_id is null in orders table"

domains — domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain

percentage_threshold — invalid row percentage before alerting

number · optional

Alert only when the percentage of invalid rows exceeds this value. Minimum: 0. When omitted, the monitor alerts on any matching row. Must be paired with percentage_operator — setting one without the other causes a validation error.

percentage_threshold: 5
percentage_operator: GT

percentage_operator — operator for percentage threshold

enum · optional

Accepted values: EQ · GT · GTE · LT · LTE

Must be paired with percentage_threshold — setting one without the other causes a validation error.

percentage_threshold: 1.0
percentage_operator: GT

warehouse — which warehouse to use

string · optional (yes if multiple warehouses)

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake

name — unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one — incident history does not transfer.

name: null_user_id_check

connection_name — named connection

string · optional

Overrides the default connection for this warehouse.

connection_name: analytics-readonly

event_rollup_count — consecutive breaches before alerting

integer · optional

Minimum 2. Mutually exclusive with event_rollup_until_changed and alert_grouping.

event_rollup_count: 3

event_rollup_until_changed — suppress repeat notifications

boolean · optional · default: false

Notifies only on status transitions. Mutually exclusive with event_rollup_count and alert_grouping.

event_rollup_until_changed: true

alert_grouping — group subsequent breaches into the currently open alert

object · optional · default: no grouping

Groups subsequent breaches into the currently open alert rather than creating a new one, for as long as the alert remains open and unresolved. Mutually exclusive with event_rollup_count and event_rollup_until_changed.

mode
Accepted values: group_into_open_alert

alert_grouping:
  mode: group_into_open_alert

exception_primary_key_column — column for exception tracking

string · optional

Track specific invalid rows across runs — see whether the same rows keep failing or new ones appear.

exception_primary_key_column: order_id

severity — incident severity level

enum · optional

Accepted values: SEV-0 · SEV-1 · SEV-2 · SEV-3 · SEV-4

severity: SEV-2

priority — incident priority level

enum · optional

Accepted values: P1 · P2 · P3 · P4 · P5

priority: P2

audiences — notification channels

array of strings · optional · default: []

Audience names linking this monitor to channels defined in Notifications as Code. In exported/rendered YAML, appears as labels.

audiences:
  - data-eng-oncall
  - platform-alerts

failure_audiences — notification channels for run failures

array of strings · optional

Separate audience for run failures. Defaults to audiences when omitted.

failure_audiences:
  - data-eng-oncall

notify_run_failure — notify on run failures

boolean · optional

notify_run_failure: true

timeout — query timeout in seconds

integer · optional

timeout: 300

tags — key-value pairs for organizing monitors

array of objects · optional · default: []

Property	Type	Required	Description
`name`	string	yes	Tag key
`value`	string	no	Tag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production

data_quality_dimension — data quality category

enum · optional

Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: VALIDITY

notes — internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the analytics team. Reviewed quarterly.

is_draft — create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics — always include it if you want the monitor to stay in draft.

is_draft: true

uuid — update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19

Deprecated fields

Field	Use instead
`resource`	`warehouse`
`domain`	`domains`
`domain_uuids`	`domains`
`labels`	`audiences`
`notify_rule_run_failure`	`notify_run_failure`

📘
API-only fields
Some fields visible in the API or JSON Schema (metadata) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.

Examples

Null check — alert when required field is null

montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Range validation — alert when age is out of bounds

montecarlo:
  validation:
    - name: users_age_range
      description: "age must be between 0 and 150"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.users"
      alert_condition:
        type: GROUP
        operator: OR
        conditions:
          - type: UNARY
            id: age_negative
            predicate:
              name: is_negative
            value:
              - field: age
          - type: BINARY
            id: age_too_high
            predicate:
              name: greater_than
            left:
              - field: age
            right:
              - literal: "150"
      domains:
        - my-domain

Percentage threshold — tolerate small violation rates

montecarlo:
  validation:
    - name: orders_failed_amount_check
      description: "Custom business logic check"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: SQL
            id: custom_check
            sql: "status = 'failed' AND amount < 0"
      percentage_threshold: 5
      percentage_operator: GT
      priority: P3
      audiences:
        - data-eng-oncall
      domains:
        - my-domain

Troubleshooting

Conditions and predicates

Conditions match INVALID data. The most common mistake is writing conditions that match valid rows. A row that matches the predicate is a breach.

Intent	Wrong	Correct
Email must not be null	`null` with `negated: true`	`null` (no negation)
Status must be "active"	`equal` to "active"	`equal` to "active" with `negated: true`
Amount must be positive	`is_negative` with `negated: true`	`is_negative` (no negation)

Wrong predicate names. Use the exact names from the predicates list.

Wrong	Correct
`not_null`	`null` with `negated: true`
`is_null`	`null`
`gt`, `>`	`greater_than`
`eq`, `==`	`equal`
`regex`	`matches_regex`
`between`	Two conditions: `greater_than_or_equal` AND `less_than_or_equal`

Schema and syntax errors

alert_conditions (plural) vs alert_condition (singular). Validation uses the singular form. The plural form causes a validation error.
percentage_threshold and percentage_operator must be paired. Setting one without the other causes a validation error. Both must be present or both must be omitted.
Literal on left side of BINARY. Left side must always be a field reference — you cannot put a literal, sql, or aggregate on the left side.
SELECT in a SQL node. SQL nodes must be boolean expressions, not queries. No SELECT keyword.
alert_condition as a JSON string. Must be a YAML/JSON object, not a serialized string.
Size limit. The serialized condition tree must be under 200,000 characters.

Noise reduction and updates

event_rollup_count , event_rollup_until_changed , and alert grouping are mutually exclusive. Setting more than one causes a validation error. Choose one strategy.

Forgetting PUT semantics on updates. When providing uuid, omitted fields revert to defaults. Always include every field you want to preserve.

Updated about 1 month ago

Did this page help you?

Overview

Reference scope

Conditions define INVALID data.

Quick Start

Configuration

alert_condition is singular.

Predicate availability varies by warehouse

Left side must always be a field reference.

API-only fields

Examples

Null check — alert when required field is null

Range validation — alert when age is out of bounds

Percentage threshold — tolerate small violation rates

Troubleshooting

Conditions and predicates

Schema and syntax errors

Noise reduction and updates

`alert_condition` is singular.