Validation Monitor

Row-level data quality assertions that flag invalid rows.

Overview

Define row-level data quality rules and get alerted when rows break them. You write an alert_condition, a predicate tree that describes what invalid data looks like. Rows that match are breaches. If any breaches exist, or if their share of rows meets the optional percentage threshold, an incident is created.

πŸ“˜

Reference scope

This page covers Monitors as Code (MaC) YAML configuration. For available predicates and UI setup, see Validation Monitors and Available Conditions.

🚧

Conditions define INVALID data.

The most common mistake is writing conditions that match valid rows. A validation condition describes what is wrong: rows that match are flagged as breaches.

Node types: UNARY (single-field checks such as null), BINARY (field-to-value comparisons such as greater_than), SQL (free-form boolean expressions), and GROUP (AND/OR combinators). Any UNARY or BINARY predicate can be inverted with negated: true.
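A minimal sketch of how the node types fit together in one tree. The column names (email, age, status, amount) are hypothetical; with operator: OR, a row is a breach if any child condition matches it.

```yaml
# Root node: always a GROUP. OR means a row breaches if any child matches.
alert_condition:
  type: GROUP
  operator: OR
  conditions:
    # UNARY: single-field check with no comparison value
    - type: UNARY
      predicate:
        name: "null"        # quoted so YAML reads it as a string, not a null value
      value:
        - field: email
    # BINARY: field compared to a value (literals are always strings)
    - type: BINARY
      predicate:
        name: greater_than
      left:
        - field: age
      right:
        - literal: "150"
    # SQL: free-form boolean expression, no SELECT
    - type: SQL
      sql: "status = 'failed' AND amount < 0"
```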

For aggregate metrics, use a Metric Monitor. For arbitrary SQL returning a single number, use a Custom SQL Monitor.

Quick Start

montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "my_database:my_schema.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Create interactively with create_or_update_validation_monitor(dry_run=True) via the Monte Carlo MCP server. Use get_validation_predicates(warehouse=<uuid>) to discover available predicates for your warehouse.

Configuration

schedule β€” when and how often to run

object Β· required

Supported modes: fixed, dynamic, manual. Crontab (interval_crontab) is supported.

Properties


type β€” schedule type

enum Β· optional Β· default: fixed

Accepted values: fixed Β· dynamic Β· manual

schedule:
  type: fixed

interval_minutes β€” run interval for fixed schedules

integer Β· optional

Required when type is fixed.

schedule:
  type: fixed
  interval_minutes: 720

interval_crontab β€” cron expressions

array of strings Β· optional

5-field cron format.

schedule:
  type: fixed
  interval_crontab:
    - "0 8 * * *"

interval_crontab_day_operator β€” day-of-week/day-of-month combination

enum Β· optional

Accepted values: AND Β· OR

schedule:
  interval_crontab_day_operator: AND
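A sketch of a cron expression that restricts both day fields, paired with interval_crontab_day_operator. The comments assume AND requires both day fields to match and OR accepts either, as the field's name suggests; confirm the behavior on your account before relying on it.

```yaml
schedule:
  type: fixed
  interval_crontab:
    # minute hour day-of-month month day-of-week
    - "0 8 1-7 * 1"   # 08:00 on days 1-7 of the month, Mondays
  # AND (assumed): only when both day fields match, i.e. the first Monday of each month
  # OR (assumed): on each of days 1-7 and on every Monday
  interval_crontab_day_operator: AND
  timezone: America/Los_Angeles
```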

start_time β€” ISO 8601 start time

string Β· optional

schedule:
  start_time: "2024-01-01T08:00:00Z"

timezone β€” IANA timezone

string Β· optional

schedule:
  timezone: America/Los_Angeles

dynamic_schedule_tables β€” tables that trigger the monitor

array of strings Β· optional (required when type is dynamic)

schedule:
  type: dynamic
  dynamic_schedule_tables:
    - analytics:public.orders

dynamic_schedule_jobs β€” jobs that trigger the monitor

array of objects Β· optional

PropertyTypeRequiredDescription
job_typeenumyesAirflowDag Β· DatabricksJob Β· AdfJob Β· DbtJob
job_namestringyesName of the job
project_namestringyesProject or resource name
task_namestringnoTask within the job
mconstringnoMCON identifier

schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: AirflowDag
      job_name: etl_orders
      project_name: my-airflow

min_interval_minutes β€” minimum interval for dynamic schedules

integer Β· optional

schedule:
  type: dynamic
  min_interval_minutes: 30

schedule:
  type: fixed
  interval_minutes: 720

data_source - the table or SQL query to validate

object · required

Provide either table or sql, not both.

Properties


table β€” fully qualified table name

string Β· required (if sql is not provided)

Format: database:schema.table.

data_source:
  table: "analytics_db:public.orders"

sql β€” SQL query returning rows to validate

string Β· required (if table is not provided)

data_source:
  sql: |
    SELECT * FROM orders
    WHERE created_at >= CURRENT_DATE - 1
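A sketch of a complete monitor that validates only the rows a query returns. It assumes predicate fields refer to columns in the query's result set; the table and column names are hypothetical.

```yaml
montecarlo:
  validation:
    - name: recent_orders_amount_not_null
      description: "Alert when orders from the last day have a null amount"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        # sql replaces table here; never set both
        sql: |
          SELECT order_id, amount
          FROM orders
          WHERE created_at >= CURRENT_DATE - 1
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: amount   # a column returned by the query
      domains:
        - my-domain
```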

transforms β€” AI-powered field transforms

array of objects Β· optional

See data_source.transforms in the Metric Monitor reference.

data_source:
  table: "analytics_db:public.orders"
  transforms:
    - field: status
      transform: normalize

data_source:
  table: "analytics_db:public.orders"

alert_condition - predicate tree defining invalid rows

object · required

A predicate tree whose root node must be a GROUP. Each node is one of four types: GROUP, UNARY, BINARY, or SQL. Conditions match invalid data: a row that matches is a breach. Unlike other monitor types, validation monitors do not take an operator or threshold type in the alert condition; alerting depends only on the predicate tree and the optional monitor-level percentage_threshold and percentage_operator fields.

Noise reduction: set event_rollup_count (minimum 2) or event_rollup_until_changed at the monitor level to suppress notifications until N consecutive breaches occur or until the result changes. The two fields are mutually exclusive.

🚧

alert_condition is singular.

Other monitor types use alert_conditions (plural, an array). Validation monitors use alert_condition (singular, a single object). The plural form causes a validation error.
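The difference is a single character, so it is easy to miss. This sketch shows the rejected plural form commented out above the accepted singular form; the condition is the null check from the Quick Start.

```yaml
# Rejected for validation monitors: plural key holding an array
# alert_conditions:
#   - type: GROUP
#     ...

# Accepted: singular key holding one GROUP object
alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id
```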

Properties


GROUP node β€” combine conditions with AND/OR logic

The root alert_condition must be a GROUP. Groups can be nested for complex logic.

| Property | Type | Required | Description |
|---|---|---|---|
| type | string | yes | GROUP |
| operator | string | no | AND (default) or OR |
| conditions | array | yes | Child nodes (GROUP, UNARY, BINARY, or SQL) |

alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id
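A sketch of nested groups: this tree flags orders marked shipped that are missing either a shipment timestamp or a tracking number. The column names are hypothetical.

```yaml
alert_condition:
  type: GROUP
  operator: AND            # breach only if both children match
  conditions:
    - type: BINARY
      predicate:
        name: equal
      left:
        - field: status
      right:
        - literal: "shipped"
    - type: GROUP
      operator: OR         # ...and at least one of these matches
      conditions:
        - type: UNARY
          predicate:
            name: "null"
          value:
            - field: shipped_at
        - type: UNARY
          predicate:
            name: empty_string
          value:
            - field: tracking_number
```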

UNARY predicate β€” single-field check with no comparison value

PropertyTypeRequiredDescription
typestringyesUNARY
predicateobjectyesname (string) and optional negated (boolean)
valuearrayyesField references: - field: <column_name>

Available UNARY predicates by field type:

| Field type | Predicates |
|---|---|
| Numeric | null, is_zero, is_negative, is_between_0_and_1, is_between_0_and_100, is_nan |
| String | null, empty_string, all_space, containing_spaces, null_string, is_uuid, email, true_string, false_string, timestamp_iso_8601, timestamp_mm-dd-yyyy, timestamp_dd-mm-yyyy, timestamp_mm/dd/yyyy, timestamp_dd/mm/yyyy, us_zip_code, us_ssn, us_state_code, us_phone_number, can_sin, ca_postal_code, fr_insee_code, fr_postal_code, de_postal_code, de_tax_id, ie_ppsn, ie_postal_code, it_postal_code, it_fiscal_code, es_dni, es_postal_code, uk_postal_code, uk_nino, nl_postal_code, nl_bsn, tr_postal_code, tr_id_no, ch_oasi, ch_postal_code, pl_postal_code, pl_pesel, aus_postal_code, aus_state_code |
| Boolean | null |
| Date | null, in_past, in_future, today, yesterday, in_past_7_days, in_past_30_days, in_past_365_days, in_past_calendar_month, in_past_calendar_week, weekday, sunday through saturday |
| Timestamp | null, in_past, in_future, in_past_60_minutes, in_past_24_hours, in_past_7_days, in_past_30_days, in_past_365_days, in_past_calendar_month, in_past_calendar_week, today, yesterday, weekday, sunday through saturday |

πŸ“˜

Predicate availability varies by warehouse

Run get_validation_predicates to get the current list for your warehouse.

- type: UNARY
  predicate:
    name: "null"
  value:
    - field: email
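A UNARY check on a date column: this sketch flags orders dated in the future. order_date is a hypothetical column; in_future appears in the Date and Timestamp rows above, but availability still depends on the warehouse.

```yaml
# Breach: the order date is later than now
- type: UNARY
  predicate:
    name: in_future
  value:
    - field: order_date
```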

BINARY predicate β€” field-to-value comparison

PropertyTypeRequiredDescription
typestringyesBINARY
predicateobjectyesname (string) and optional negated (boolean)
leftarrayyesField references: - field: <column_name>
rightarrayyesOne of: literal, field, or sql. See right-side value types below.

Available BINARY predicates by field type:

| Field type | Predicates |
|---|---|
| Numeric | equal, greater_than, greater_than_or_equal, less_than, less_than_or_equal, in_set |
| String | equal, contains, starts_with, ends_with, matches_regex, in_set |
| Boolean | equal |
| Date / Timestamp | equal, greater_than, greater_than_or_equal, less_than, less_than_or_equal |

Right-side value types:

| Value type | Syntax | Description |
|---|---|---|
| literal | - literal: "value" | A static value. Always pass it as a string, even for numbers (e.g., - literal: "150", not - literal: 150). |
| field | - field: column_name | A reference to another column. Allowed only when the left field is a date or timestamp, or with in_set. |
| sql | - sql: "expression" | A scalar SQL expression or subquery. Wrap subqueries in parentheses; the expression must not start with a bare SELECT. |

🚧

Left side must always be a field reference.

You cannot put a literal, sql, or aggregate on the left side.

- type: BINARY
  predicate:
    name: greater_than
  left:
    - field: age
  right:
    - literal: "150"
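Sketches of the other two right-side value types. The column names and the finance.credit_limits table are hypothetical, and the subquery must be valid SQL for your warehouse.

```yaml
# field on the right: breach when an order ships before it was created
- type: BINARY
  predicate:
    name: less_than
  left:
    - field: shipped_at     # timestamp column
  right:
    - field: created_at     # allowed because the left side is a timestamp

# sql on the right: wrap a subquery in parentheses instead of starting with SELECT
- type: BINARY
  predicate:
    name: greater_than
  left:
    - field: amount
  right:
    - sql: "(SELECT MAX(credit_limit) FROM finance.credit_limits)"
```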

SQL predicate β€” free-form SQL boolean expression

PropertyTypeRequiredDescription
typestringyesSQL
sqlstringyesA SQL boolean expression (e.g. amount > 0 AND amount < 10000). Not a query β€” no SELECT.

- type: SQL
  sql: "status = 'failed' AND amount < 0"

Negation β€” invert any UNARY or BINARY predicate

Any UNARY or BINARY predicate can be inverted with negated: true in the predicate object.

# Alert when status is NOT in the allowed set
- type: BINARY
  predicate:
    name: in_set
    negated: true
  left:
    - field: status
  right:
    - literal: "active"
    - literal: "inactive"
    - literal: "pending"
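Negation works the same way on UNARY predicates. This sketch assumes the email predicate matches well-formed addresses, so negating it flags rows whose contact_email is not a valid address; contact_email is a hypothetical column.

```yaml
# Alert when contact_email is NOT a well-formed email address
- type: UNARY
  predicate:
    name: email
    negated: true
  value:
    - field: contact_email
```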

alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate:
        name: "null"
      value:
        - field: user_id

description - what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: "Alert when user_id is null in orders table"

domains - domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain

percentage_threshold - invalid row percentage before alerting

number · optional

Alert only when the percentage of invalid rows compares against this value as specified by percentage_operator (for example, 5 with GT alerts when more than 5% of rows are invalid). Minimum: 0. When omitted, the monitor alerts on any matching row. Must be paired with percentage_operator; setting one without the other causes a validation error.

percentage_threshold: 5
percentage_operator: GT
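A worked example for percentage_threshold: 5 with percentage_operator: GT, assuming the percentage is breaching rows divided by rows evaluated, on a 0 to 100 scale. The row counts are made up for illustration.

```latex
\text{invalid rows (\%)} = 100 \times \frac{\text{breaching rows}}{\text{rows evaluated}}

100 \times \frac{600}{10{,}000} = 6 > 5 \;\Rightarrow\; \text{incident}

100 \times \frac{400}{10{,}000} = 4 \le 5 \;\Rightarrow\; \text{no incident}
```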
percentage_operator β€” operator for percentage threshold

enum Β· optional

Accepted values: EQ Β· GT Β· GTE Β· LT Β· LTE

Must be paired with percentage_threshold β€” setting one without the other causes a validation error.

percentage_threshold: 1.0
percentage_operator: GT
warehouse β€” which warehouse to use

string Β· optional (yes if multiple warehouses)

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: my-snowflake
name β€” unique identifier within the namespace

string Β· required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one β€” incident history does not transfer.

name: null_user_id_check
connection_name β€” named connection

string Β· optional

Overrides the default connection for this warehouse.

connection_name: analytics-readonly
event_rollup_count β€” consecutive breaches before alerting

integer Β· optional

Minimum 2. Mutually exclusive with event_rollup_until_changed.

event_rollup_count: 3
event_rollup_until_changed β€” suppress repeat notifications

boolean Β· optional Β· default: false

Notifies only on status transitions. Mutually exclusive with event_rollup_count.

event_rollup_until_changed: true
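A sketch of choosing between the two noise-reduction settings at the monitor level. Only one is active; the other is commented out because the fields are mutually exclusive.

```yaml
# Option 1: notify only after 3 consecutive breaching runs
event_rollup_count: 3

# Option 2: notify only when the result changes
# (mutually exclusive with event_rollup_count, so left unset here)
# event_rollup_until_changed: true
```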
exception_primary_key_column β€” column for exception tracking

string Β· optional

Track specific invalid rows across runs β€” see whether the same rows keep failing or new ones appear.

exception_primary_key_column: order_id
severity β€” incident severity level

enum Β· optional

Accepted values: SEV-0 Β· SEV-1 Β· SEV-2 Β· SEV-3 Β· SEV-4

severity: SEV-2
priority β€” incident priority level

enum Β· optional

Accepted values: P1 Β· P2 Β· P3 Β· P4 Β· P5

priority: P2
audiences β€” notification channels

array of strings Β· optional Β· default: []

Audience names linking this monitor to channels defined in Notifications as Code. In exported/rendered YAML, appears as labels.

audiences:
  - data-eng-oncall
  - platform-alerts
failure_audiences β€” notification channels for run failures

array of strings Β· optional

Separate audience for run failures. Defaults to audiences when omitted.

failure_audiences:
  - data-eng-oncall
notify_run_failure β€” notify on run failures

boolean Β· optional

notify_run_failure: true
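A sketch that routes breach notifications and run-failure notifications to different audiences. The audience names must match channels defined in Notifications as Code; the comment on notify_run_failure assumes it is what turns on the failure notifications that failure_audiences receives.

```yaml
audiences:
  - data-eng-oncall        # incidents from breaching rows
failure_audiences:
  - platform-alerts        # run failures (falls back to audiences if omitted)
notify_run_failure: true   # assumed: enables run-failure notifications
```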
timeout β€” query timeout in seconds

integer Β· optional

timeout: 300
tags β€” key-value pairs for organizing monitors

array of objects Β· optional Β· default: []

PropertyTypeRequiredDescription
namestringyesTag key
valuestringnoTag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production
data_quality_dimension β€” data quality category

enum Β· optional

Accepted values: ACCURACY Β· COMPLETENESS Β· CONSISTENCY Β· TIMELINESS Β· UNIQUENESS Β· VALIDITY

data_quality_dimension: VALIDITY
notes β€” internal notes

string Β· optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the analytics team. Reviewed quarterly.
is_draft β€” create as draft without activating

boolean Β· optional Β· default: false

Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics β€” always include it if you want the monitor to stay in draft.

is_draft: true
uuid β€” update an existing monitor

string Β· optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19
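Because updates use PUT semantics, an update should resend the full definition, not just the changed field. In this sketch the Quick Start monitor is moved to a 6-hour interval while staying in draft; the UUID is the placeholder value shown above.

```yaml
montecarlo:
  validation:
    - uuid: 0dae7702-0950-45c7-909c-7e183bddca19
      name: orders_user_id_not_null          # unchanged: renaming creates a new monitor
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 360                # the only value being changed
      data_source:
        table: "my_database:my_schema.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain
      is_draft: true                         # omit this and the monitor becomes active
```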

Deprecated fields

| Field | Use instead |
|---|---|
| resource | warehouse |
| domain | domains |
| domain_uuids | domains |
| labels | audiences |
| notify_rule_run_failure | notify_run_failure |

📘

API-only fields

Some fields, such as metadata, appear in the API or JSON Schema but are silently stripped during MaC YAML processing. They have no effect and should not be included.

Examples

Null check β€” alert when required field is null

montecarlo:
  validation:
    - name: orders_user_id_not_null
      description: "Alert when user_id is null in orders table"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: "null"
            value:
              - field: user_id
      domains:
        - my-domain

Range validation β€” alert when age is out of bounds

montecarlo:
  validation:
    - name: users_age_range
      description: "age must be between 0 and 150"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.users"
      alert_condition:
        type: GROUP
        operator: OR
        conditions:
          - type: UNARY
            id: age_negative
            predicate:
              name: is_negative
            value:
              - field: age
          - type: BINARY
            id: age_too_high
            predicate:
              name: greater_than
            left:
              - field: age
            right:
              - literal: "150"
      domains:
        - my-domain

Percentage threshold β€” tolerate small violation rates

montecarlo:
  validation:
    - name: orders_failed_amount_check
      description: "Custom business logic check"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: SQL
            id: custom_check
            sql: "status = 'failed' AND amount < 0"
      percentage_threshold: 5
      percentage_operator: GT
      priority: P3
      audiences:
        - data-eng-oncall
      domains:
        - my-domain
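Format check with exception tracking: a further sketch that combines a negated UNARY predicate with exception_primary_key_column, so the same failing rows can be followed across runs. It assumes is_uuid matches well-formed UUIDs; the columns are hypothetical.

```yaml
montecarlo:
  validation:
    - name: orders_customer_id_is_uuid
      description: "Alert when customer_id is not a valid UUID"
      schedule:
        type: fixed
        interval_minutes: 720
      data_source:
        table: "analytics_db:public.orders"
      alert_condition:
        type: GROUP
        operator: AND
        conditions:
          - type: UNARY
            predicate:
              name: is_uuid
              negated: true                  # breach when customer_id is NOT a UUID
            value:
              - field: customer_id
      exception_primary_key_column: order_id # follow failing rows across runs
      data_quality_dimension: VALIDITY
      severity: SEV-3
      domains:
        - my-domain
```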

Troubleshooting

Conditions and predicates

Conditions match INVALID data. The most common mistake is writing conditions that match valid rows. A row that matches the predicate is a breach.

| Intent | Wrong | Correct |
|---|---|---|
| Email must not be null | null with negated: true | null (no negation) |
| Status must be "active" | equal to "active" | equal to "active" with negated: true |
| Amount must not be negative | is_negative with negated: true | is_negative (no negation) |
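The status row of the table above, written out as YAML: to require status = 'active', flag the rows where status does not equal it.

```yaml
# Intent: status must be "active"  ->  breach when status is NOT "active"
- type: BINARY
  predicate:
    name: equal
    negated: true
  left:
    - field: status
  right:
    - literal: "active"
```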

Wrong predicate names. Use the exact names from the predicates list.

| Wrong | Correct |
|---|---|
| not_null | null with negated: true |
| is_null | null |
| gt, > | greater_than |
| eq, == | equal |
| regex | matches_regex |
| between | Two conditions: greater_than_or_equal AND less_than_or_equal |

Schema and syntax errors

  • alert_conditions (plural) vs alert_condition (singular). Validation uses the singular form. The plural form causes a validation error.
  • percentage_threshold and percentage_operator must be paired. Set both or neither; setting only one causes a validation error.
  • Literal on left side of BINARY. The left side must always be a field reference; you cannot put a literal, sql, or aggregate there.
  • SELECT in a SQL node. SQL nodes must be boolean expressions, not queries. No SELECT keyword.
  • alert_condition as a JSON string. Must be a YAML/JSON object, not a serialized string.
  • Size limit. The serialized condition tree must be under 200,000 characters.
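A sketch of the serialized-string mistake next to the accepted form; both describe the same null check on user_id.

```yaml
# Rejected: the tree is a serialized JSON string
# alert_condition: '{"type": "GROUP", "conditions": [...]}'

# Accepted: the tree is a YAML object (flow-style mappings inside it are fine)
alert_condition:
  type: GROUP
  operator: AND
  conditions:
    - type: UNARY
      predicate: {name: "null"}
      value: [{field: user_id}]
```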

Updates

  • Forgetting PUT semantics on updates. When providing uuid, omitted fields revert to defaults. Always include every field you want to preserve.