Bulk Monitor

Apply field-level monitoring rules across many tables at once.

Overview

Apply metric monitoring rules across many tables in a single definition. Instead of one metric monitor per table, define an asset_selection that matches tables by database and schema, then use field_pattern rules to automatically target matching columns across all of them.

Reference scope

This page covers MaC YAML configuration. Bulk monitors are the code-first equivalent of multi-table metric monitors in the UI, with additional field_pattern matching available only through MaC.

MaC key: bulk_monitor. Two sub-types: bulk_metric for field-level metric monitoring, and bulk_pii for PII detection.

Quick Start

montecarlo:
  bulk_monitor:
    - name: id_null_rate_bulk
      description: Monitor null rates on all ID columns across analytics tables
      monitor_type: bulk_metric
      asset_selection:
        databases:
          - name: ANALYTICS
            schemas:
              - CORE
      alert_conditions:
        - metric: NULL_RATE
          operator: AUTO
          field_pattern:
            operator: ENDING_WITH
            value: _ID
            field_type: NUMERIC
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain

Configuration

description - what this monitor checks

string · required

Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.

description: Monitor null rates on ID columns across core analytics

monitor_type - bulk monitor sub-type

enum · required

Accepted values: bulk_metric · bulk_pii

Use bulk_metric for field-level metric monitoring and bulk_pii for PII detection.

monitor_type: bulk_metric

asset_selection - which tables to monitor

object · required

Select tables by database, schema, name patterns, or tags. Same structure as the table monitor asset selection.

asset_selection:
  databases:
    - name: ANALYTICS
      schemas:
        - CORE
  filters:
    - table_tags:
        - criticality:high
      table_tags_operator: HAS_ANY

Properties


databases - databases to monitor

array of objects · required

Each entry selects a database and optionally narrows to specific schemas.

Property | Type | Required | Description
name | string | yes | Database name
schemas | array of strings | no | Schemas to include. Omit to include all.
tables | object | no | Table-level selection within the database

databases:
  - name: ANALYTICS
    schemas:
      - CORE
      - STAGING
  - name: RAW

filters - narrow selection to matching tables

array of objects · optional

Each entry matches tables by name pattern, tag, or type. The type field is auto-inferred from the other fields present.

By name pattern:

Property | Type | Required | Description
table_name | string | yes | Pattern to match against table names
table_name_operator | string | yes | STARTS_WITH · ENDS_WITH · CONTAINS · MATCH_PATTERN
negated | boolean | no | Invert the filter. Default: false

By tag:

Property | Type | Required | Description
table_tags | array of strings | yes | Tag values to match
table_tags_operator | string | yes | HAS_ALL · HAS_ANY
negated | boolean | no | Invert the filter. Default: false

By type:

Property | Type | Required | Description
table_type | string | yes | TABLE · VIEW · EXTERNAL
negated | boolean | no | Invert the filter. Default: false

filters:
  - table_tags:
      - criticality:high
    table_tags_operator: HAS_ANY
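As a sketch of the name-pattern and type variants described in the tables above, the fragment below lists one filter of each kind. The FCT_ prefix is a hypothetical naming convention, and this page does not say how multiple filter entries combine, so check the resulting table selection before relying on it.

```yaml
filters:
  # Name-pattern filter: table names beginning with FCT_ (hypothetical prefix)
  - table_name: FCT_
    table_name_operator: STARTS_WITH
  # Type filter inverted with negated: matches tables that are not views
  - table_type: VIEW
    negated: true
```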

exclusions - remove matching tables from selection

array of objects · optional

Same structure as filters. Tables matching any exclusion are removed even if they match a filter.

exclusions:
  - table_name: _deprecated
    table_name_operator: ENDS_WITH

alert_conditions - metric and field-matching rules

array of objects · required

Each entry defines a metric and field-matching pattern. Every alert condition in a bulk monitor must include a field_pattern. Supported alert condition types: threshold (default), noop.

Available operators: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

Threshold types:

  • Static: fixed numeric value via threshold_value with an explicit operator.
  • Range: lower_threshold and upper_threshold with INSIDE_RANGE or OUTSIDE_RANGE.
  • Anomaly detection (AUTO): ML-based. Control aggressiveness with the monitor-level sensitivity field (if supported).

alert_conditions:
  - metric: NULL_RATE
    operator: AUTO
    field_pattern:
      operator: ENDING_WITH
      value: _ID
      field_type: NUMERIC
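The example above shows the anomaly-detection style. A minimal sketch of the other two threshold types, static and range, might look like the following. The _RATE column suffix is a hypothetical naming convention, and the 0.05 value assumes NULL_RATE is reported as a fraction between 0 and 1.

```yaml
alert_conditions:
  # Static threshold: explicit operator plus threshold_value
  - metric: NULL_RATE
    operator: GT
    threshold_value: 0.05
    field_pattern:
      operator: ENDING_WITH
      value: _ID
  # Range threshold: OUTSIDE_RANGE needs both bounds
  - metric: NUMERIC_MEAN
    operator: OUTSIDE_RANGE
    lower_threshold: 0.01
    upper_threshold: 0.10
    field_pattern:
      operator: ENDING_WITH
      value: _RATE          # hypothetical suffix
      field_type: NUMERIC
```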

Properties


metric - built-in metric name

string · optional (one of metric or custom_metric required)

Built-in metric name (e.g., NULL_RATE, NUMERIC_MEAN). Mutually exclusive with custom_metric.

metric: NULL_RATE

custom_metric - custom SQL-based metric

object · optional (one of metric or custom_metric required)

Mutually exclusive with metric.

Property | Type | Required | Description
uuid | string | no | UUID of an existing custom metric to reuse
display_name | string | yes | Name for the metric
sql_expression | string | yes | SQL expression that evaluates to a single numeric value

custom_metric:
  display_name: Active Rate
  sql_expression: "COUNT(CASE WHEN status = 'ACTIVE' THEN 1 END)::FLOAT / COUNT(*)"

field_pattern - pattern-based field selection

object · required

Selects fields dynamically by name pattern across all tables matched by asset_selection.

Property | Type | Required | Description
operator | string | yes | CONTAINING · ENDING_WITH · MATCHING · STARTING_WITH
value | string | yes | Pattern string to match against field names
case_sensitive | boolean | no | Whether the match is case-sensitive. Default: false
field_type | enum | no | Restrict matching to a specific field type: BOOLEAN · DATE · NUMERIC · TEXT · TIME · TIME_OF_DAY

field_pattern:
  operator: ENDING_WITH
  value: _ID
  field_type: NUMERIC
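To illustrate the remaining options in the table, here is a sketch of two alert conditions: one using a case-sensitive prefix match and one using MATCHING. The email_ prefix and the pattern string are hypothetical, and the pattern syntax accepted by MATCHING is not specified on this page; the form below simply mirrors the ".*_TIMESTAMP" style used in the examples section.

```yaml
alert_conditions:
  # Case-sensitive prefix match, limited to text columns
  - metric: NULL_RATE
    operator: AUTO
    field_pattern:
      operator: STARTING_WITH
      value: email_            # hypothetical column prefix
      case_sensitive: true
      field_type: TEXT
  # Pattern match, limited to date columns
  - metric: NULL_RATE
    operator: AUTO
    field_pattern:
      operator: MATCHING
      value: ".*_DATE"         # hypothetical pattern
      field_type: DATE
```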

fields - explicit column names

array of strings · n/a

Not usable for bulk monitors. field_pattern is required on every bulk alert condition, and the backend rejects an alert condition that sets both fields and field_pattern ("Bulk monitors match fields either by pattern or by an explicit list, not both"). Because field_pattern is mandatory, no valid configuration uses fields here; match columns with field_pattern instead. To pin specific columns, use a metric monitor with an explicit fields list.


type - alert condition type

enum · optional · default: threshold

Accepted values: threshold · noop

Use noop to collect data without alerting.

type: threshold

operator - comparison operator

enum · optional

Accepted values: AUTO · AUTO_HIGH · AUTO_LOW · GT · GTE · LT · LTE · EQ · NEQ · INSIDE_RANGE · OUTSIDE_RANGE · NOOP

AUTO uses ML anomaly detection. GT, LT, etc. require threshold_value. INSIDE_RANGE / OUTSIDE_RANGE require lower_threshold and upper_threshold.

operator: AUTO

threshold_value - static threshold

number · required when using an explicit comparison operator (GT, LT, etc.)

Static threshold for comparison.

threshold_value: 0.05

lower_threshold - lower bound for range operators

number · required for INSIDE_RANGE / OUTSIDE_RANGE

lower_threshold: 0.01

upper_threshold - upper bound for range operators

number · required for INSIDE_RANGE / OUTSIDE_RANGE

upper_threshold: 0.10

baseline_trailing_days - ML baseline window

integer · optional

Number of trailing days for the ML baseline window. Minimum: 1.

baseline_trailing_days: 14

baseline_start - fixed baseline start date

string · optional

ISO 8601 format.

baseline_start: "2024-01-01"

baseline_end - fixed baseline end date

string · optional

ISO 8601 format.

baseline_end: "2024-03-31"
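The baseline properties are listed among the alert condition properties, so a sketch of how they might sit next to operator: AUTO is shown below, once with a trailing window and once with a fixed date range. The _SCORE suffix is a hypothetical column convention, and this page does not say whether the two baseline styles can be combined on a single condition.

```yaml
alert_conditions:
  # Trailing baseline: the most recent 14 days
  - metric: NULL_RATE
    operator: AUTO
    baseline_trailing_days: 14
    field_pattern:
      operator: ENDING_WITH
      value: _ID
  # Fixed baseline: an explicit ISO 8601 date range
  - metric: NUMERIC_MEAN
    operator: AUTO
    baseline_start: "2024-01-01"
    baseline_end: "2024-03-31"
    field_pattern:
      operator: ENDING_WITH
      value: _SCORE          # hypothetical suffix
      field_type: NUMERIC
```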

num_bins - histogram bins for drift metrics

integer · optional

Number of histogram bins. Range: 2 to 1000.

num_bins: 50

id - stable condition identifier

string · optional

Preserved across updates.

id: null_rate_id_check

schedule - execution schedule

object · required

Controls when the monitor runs. Bulk monitors do not support crontab-based scheduling and require a minimum interval of 60 minutes.

schedule:
  type: fixed
  interval_minutes: 1440

Properties


type - schedule type

enum · optional · default: fixed

Accepted values: fixed · dynamic

Crontab, loose, and manual are not supported.

type: fixed

interval_minutes - run interval

integer · optional

Run interval for fixed schedules. Minimum 60 minutes.

interval_minutes: 1440

start_time - schedule start time

string · optional

ISO 8601 format.

start_time: "2024-01-01T06:00:00Z"
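Putting the fixed-schedule properties together, a sketch of a schedule that runs every six hours might look like this. It assumes that runs are spaced from start_time, which this page does not spell out.

```yaml
schedule:
  type: fixed
  interval_minutes: 360                 # every 6 hours; must be at least 60
  start_time: "2024-01-01T06:00:00Z"    # ISO 8601
```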

timezone - schedule timezone

string · optional

timezone: America/New_York

dynamic_schedule_tables - tables that trigger the monitor

array of strings · required when type is dynamic (unless dynamic_schedule_jobs is set)

dynamic_schedule_tables:
  - raw:ingestion.events

dynamic_schedule_jobs - jobs that trigger the monitor

array of objects · optional

Property | Type | Required | Description
job_type | string | yes | AdfJob · AirflowDag · DatabricksJob · DbtJob
job_name | string | yes | Name of the job
project_name | string | yes | Project or workspace containing the job
task_name | string | no | Specific task within the job
mcon | string | no | MCON identifier for the job

dynamic_schedule_jobs:
  - job_type: DbtJob
    job_name: nightly_build
    project_name: analytics
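For context, a sketch of the same property inside a complete schedule block, here triggered by a single task of an Airflow DAG. The DAG, project, and task names are hypothetical.

```yaml
schedule:
  type: dynamic
  dynamic_schedule_jobs:
    - job_type: AirflowDag
      job_name: load_events          # hypothetical DAG name
      project_name: data-platform    # hypothetical project
      task_name: publish             # optional: narrow to one task
```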

min_interval_minutes - minimum interval between dynamic runs

integer · optional

min_interval_minutes: 120

domains - domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain

warehouse - which warehouse to use

string · optional (required when the account has multiple warehouses)

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: prod-snowflake

name - unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one.

name: id_null_rate_monitoring

notes - internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the analytics team. Reviewed quarterly.

audiences - notification channels

array of strings · optional

Audience names linking this monitor to channels defined in Notifications as Code.

audiences:
  - data-engineering-alerts
  - platform-alerts

failure_audiences - run-failure notification channels

array of strings · optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - oncall-alerts

priority - incident priority level

enum · optional

Accepted values: P1 · P2 · P3 · P4 · P5

priority: P3

tags - key-value pairs for organizing monitors

array of objects · optional

Property | Type | Required | Description
name | string | yes | Tag key
value | string | no | Tag value

tags:
  - name: team
    value: analytics
  - name: environment
    value: production

data_quality_dimension - data quality category

enum · optional

Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: COMPLETENESS

collection_lag_hours - lag for late-arriving data

integer · optional

Number of hours to lag data collection, accounting for late-arriving data.

collection_lag_hours: 6

aggregate_time_field - timestamp column for time bucketing

string · optional

Timestamp column used to bucket data by time across all matched tables. The column must exist on every matched table.

aggregate_time_field: CREATED_AT

aggregate_by - time bucket granularity

enum · optional

Accepted values: hour · day · week · month

aggregate_by: day

sampling_config - row sampling configuration

object · optional

Only valid when monitor_type is bulk_pii.

Property | Type | Required | Description
percentage | number | no | Percentage of rows to sample (0 to 100)
count | integer | no | Fixed number of rows to sample

sampling_config:
  percentage: 10

auto_prune_enabled - automatically remove deleted tables

boolean · optional · default: false

Automatically remove tables from monitoring when they are deleted or become inaccessible. Only valid when monitor_type is bulk_pii.

auto_prune_enabled: true

lineage_narrowing_enabled - narrow by lineage importance

boolean · optional · default: false

Only monitor tables that are upstream of important downstream assets. Only valid when monitor_type is bulk_pii.

lineage_narrowing_enabled: true

is_draft - create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this field on a later update resets it to false (active) because updates use PUT semantics, so always include it if the monitor should stay in draft.

is_draft: true

uuid - update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19

Deprecated fields

Field | Use instead
resource | warehouse
domain | domains
domain_uuids | domains
labels | audiences

API-only fields

Some fields that appear in the API or JSON Schema (domain_restrictions) are silently stripped during MaC YAML processing. They have no effect and should not be included.

Examples

Null rate monitoring on all ID columns

Monitors null rates across all columns ending in _ID in the analytics core schema.

montecarlo:
  bulk_monitor:
    - name: core_id_null_rate
      description: Monitor null rates on ID columns across core analytics
      monitor_type: bulk_metric
      warehouse: prod-snowflake
      asset_selection:
        databases:
          - name: ANALYTICS
            schemas:
              - CORE
      alert_conditions:
        - metric: NULL_RATE
          operator: AUTO
          field_pattern:
            operator: ENDING_WITH
            value: _ID
            field_type: NUMERIC
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - data-engineering-alerts
      priority: P3
      data_quality_dimension: COMPLETENESS
      tags:
        - name: team
          value: analytics
      domains:
        - my-domain

PII detection across multiple schemas

Scans for PII patterns in text columns across raw data schemas.

montecarlo:
  bulk_monitor:
    - name: raw_pii_scan
      description: Scan for PII in raw ingestion tables
      monitor_type: bulk_pii
      warehouse: prod-snowflake
      asset_selection:
        databases:
          - name: RAW
            schemas:
              - INGESTION
              - API_IMPORTS
      alert_conditions:
        - metric: NULL_COUNT
          operator: GT
          threshold_value: 0
          field_pattern:
            operator: CONTAINING
            value: ""
            field_type: TEXT
      schedule:
        type: fixed
        interval_minutes: 1440
      audiences:
        - security-alerts
      priority: P1
      data_quality_dimension: VALIDITY
      domains:
        - my-domain

Dynamic scheduling with auto-pruning

Runs after table updates and automatically removes deleted tables. auto_prune_enabled and lineage_narrowing_enabled are only valid for bulk_pii monitors.

montecarlo:
  bulk_monitor:
    - name: timestamp_cols_bulk_pii
      description: Monitor freshness-sensitive columns after each load
      monitor_type: bulk_pii
      warehouse: prod-snowflake
      asset_selection:
        databases:
          - name: RAW
      auto_prune_enabled: true
      lineage_narrowing_enabled: true
      alert_conditions:
        - metric: NULL_RATE
          operator: AUTO
          field_pattern:
            operator: MATCHING
            value: ".*_TIMESTAMP"
            field_type: TIME
      schedule:
        type: dynamic
        dynamic_schedule_tables:
          - raw:ingestion.events
          - raw:ingestion.transactions
        min_interval_minutes: 120
      priority: P2
      domains:
        - my-domain
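Daily time bucketing with collection lag

The following is a sketch rather than a tested configuration. It combines aggregate_time_field, aggregate_by, and collection_lag_hours on a bulk_metric monitor. The monitor name and the _AMOUNT suffix are hypothetical, and the example assumes every table in ANALYTICS.CORE has a CREATED_AT column.

```yaml
montecarlo:
  bulk_monitor:
    - name: core_amount_mean_daily        # hypothetical name
      description: Track the daily mean of amount columns in core analytics
      monitor_type: bulk_metric
      asset_selection:
        databases:
          - name: ANALYTICS
            schemas:
              - CORE
      # Bucket rows by day using a timestamp column present on every matched table
      aggregate_time_field: CREATED_AT
      aggregate_by: day
      # Lag collection by 6 hours to account for late-arriving data
      collection_lag_hours: 6
      alert_conditions:
        - metric: NUMERIC_MEAN
          operator: AUTO
          field_pattern:
            operator: ENDING_WITH
            value: _AMOUNT                # hypothetical suffix
            field_type: NUMERIC
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:
        - my-domain
```

If some matched tables lack the timestamp column, narrowing asset_selection or splitting the monitor, as described under Troubleshooting, avoids per-table failures.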

Troubleshooting

Monitor type and fields

  • Using monitor_type: metric instead of bulk_metric. The monitor_type enum values are bulk_metric and bulk_pii. Using metric (the MaC key for the single-table metric monitor) is not valid here.
  • Omitting field_pattern from alert conditions. Unlike the single-table metric monitor where you can list specific fields, bulk monitors require field_pattern on every alert condition.

Scheduling

  • Setting schedule.interval_minutes below 60. Bulk monitors enforce a minimum interval of 60 minutes. Lower values fail validation.
  • Using crontab scheduling. Bulk monitors do not support interval_crontab. Use interval_minutes for fixed schedules.
  • Using loose or manual schedule types. Bulk monitors only support fixed and dynamic schedule types.

Asset and field selection

  • Using aggregate_time_field with tables that lack the column. If any table matched by asset_selection does not have the specified timestamp column, the monitor fails on that table. Ensure the column exists across all matched tables, or split into separate monitors.
  • Wrong tags format. Tags must be objects with name and optional value keys. Writing tags: ["my-tag"] fails validation.
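The tags format issue is easiest to see side by side. In this sketch the rejected form is left as a comment so the fragment stays valid.

```yaml
# Rejected: tags written as plain strings
# tags: ["my-tag"]

# Accepted: each tag is an object with name and an optional value
tags:
  - name: my-tag
  - name: team
    value: analytics
```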

Updates and deprecated fields

  • Forgetting PUT semantics on updates. When updating a monitor by including uuid, every field you omit reverts to its default; it is not left unchanged. Always specify the complete desired configuration.
  • Using domain instead of domains. The singular domain field is deprecated. Use domains (plural) in all new configurations.
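As a sketch of what a complete update looks like under PUT semantics, the definition below restates every setting alongside uuid, including is_draft, so nothing falls back to its default. The UUID is the placeholder used earlier on this page, and the remaining values mirror the Quick Start.

```yaml
montecarlo:
  bulk_monitor:
    - uuid: 0dae7702-0950-45c7-909c-7e183bddca19   # existing monitor to update
      name: id_null_rate_bulk
      description: Monitor null rates on all ID columns across analytics tables
      monitor_type: bulk_metric
      is_draft: true          # restated so the update does not reset it to false
      asset_selection:
        databases:
          - name: ANALYTICS
            schemas:
              - CORE
      alert_conditions:
        - metric: NULL_RATE
          operator: AUTO
          field_pattern:
            operator: ENDING_WITH
            value: _ID
            field_type: NUMERIC
      schedule:
        type: fixed
        interval_minutes: 1440
      domains:                # plural form; the singular domain is deprecated
        - my-domain
```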