Bulk Monitor
Apply field-level monitoring rules across many tables at once.
Overview
Apply metric monitoring rules across many tables in a single definition. Instead of one metric monitor per table, define an asset_selection that matches tables by database and schema, then use field_pattern rules to automatically target matching columns across all of them.
Reference scopeThis page covers MaC YAML configuration. Bulk monitors are the code-first equivalent of multi-table metric monitors in the UI, with additional
field_patternmatching available only through MaC.
MaC key: bulk_monitor. Two sub-types: bulk_metric for field-level metric monitoring, and bulk_pii for PII detection.
Quick Start
montecarlo:
bulk_monitor:
- name: id_null_rate_bulk
description: Monitor null rates on all ID columns across analytics tables
monitor_type: bulk_metric
asset_selection:
databases:
- name: ANALYTICS
schemas:
- CORE
alert_conditions:
- metric: NULL_RATE
operator: AUTO
field_pattern:
operator: ENDING_WITH
value: _ID
field_type: NUMERIC
schedule:
type: fixed
interval_minutes: 1440
domains:
- my-domainConfiguration
string Β· required
Displayed in the Monte Carlo UI and in incident notifications. Max 512 characters.
description: Monitor null rates on ID columns across core analyticsenum Β· required
Accepted values: bulk_metric Β· bulk_pii
bulk_metric for field-level metric monitoring, bulk_pii for PII detection.
monitor_type: bulk_metricobject Β· required
Select tables by database, schema, name patterns, or tags. Same structure as the table monitor asset selection.
asset_selection:
databases:
- name: ANALYTICS
schemas:
- CORE
filters:
- table_tags:
- criticality:high
table_tags_operator: HAS_ANYProperties
databases β databases to monitor
array of objects Β· required
Each entry selects a database and optionally narrows to specific schemas.
| Property | Type | Required | Description |
|---|---|---|---|
name | string | yes | Database name |
schemas | array of strings | no | Schemas to include. Omit to include all. |
tables | object | no | Table-level selection within the database |
databases:
- name: ANALYTICS
schemas:
- CORE
- STAGING
- name: RAWfilters β narrow selection to matching tables
array of objects Β· optional
Each entry matches tables by name pattern, tag, or type. The type field is auto-inferred from the other fields present.
By name pattern:
| Property | Type | Required | Description |
|---|---|---|---|
table_name | string | yes | Pattern to match against table names |
table_name_operator | string | yes | STARTS_WITH Β· ENDS_WITH Β· CONTAINS Β· MATCH_PATTERN |
negated | boolean | no | Invert the filter. Default: false |
By tag:
| Property | Type | Required | Description |
|---|---|---|---|
table_tags | array of strings | yes | Tag values to match |
table_tags_operator | string | yes | HAS_ALL Β· HAS_ANY |
negated | boolean | no | Invert the filter. Default: false |
By type:
| Property | Type | Required | Description |
|---|---|---|---|
table_type | string | yes | TABLE Β· VIEW Β· EXTERNAL |
negated | boolean | no | Invert the filter. Default: false |
filters:
- table_tags:
- criticality:high
table_tags_operator: HAS_ANYexclusions β remove matching tables from selection
array of objects Β· optional
Same structure as filters. Tables matching any exclusion are removed even if they match a filter.
exclusions:
- table_name: _deprecated
table_name_operator: ENDS_WITHarray of objects Β· required
Each entry defines a metric and field-matching pattern. Every alert condition in a bulk monitor must include a field_pattern. Supported alert condition types: threshold (default), noop.
Available operators: AUTO Β· AUTO_HIGH Β· AUTO_LOW Β· GT Β· GTE Β· LT Β· LTE Β· EQ Β· NEQ Β· INSIDE_RANGE Β· OUTSIDE_RANGE Β· NOOP
Threshold types:
- Static β fixed numeric value via
threshold_valuewith an explicit operator. - Range β
lower_thresholdandupper_thresholdwithINSIDE_RANGEorOUTSIDE_RANGE. - Anomaly detection (AUTO) β ML-based. Control aggressiveness with the monitor-level
sensitivityfield (if supported).
alert_conditions:
- metric: NULL_RATE
operator: AUTO
field_pattern:
operator: ENDING_WITH
value: _ID
field_type: NUMERICProperties
metric β built-in metric name
string Β· optional (one of metric or custom_metric required)
Built-in metric name (e.g., NULL_RATE, NUMERIC_MEAN). Mutually exclusive with custom_metric.
metric: NULL_RATEcustom_metric β custom SQL-based metric
object Β· optional (one of metric or custom_metric required)
Mutually exclusive with metric.
| Property | Type | Required | Description |
|---|---|---|---|
uuid | string | no | UUID of an existing custom metric to reuse |
display_name | string | yes | Name for the metric |
sql_expression | string | yes | SQL expression that evaluates to a single numeric value |
custom_metric:
display_name: Active Rate
sql_expression: "COUNT(CASE WHEN status = 'ACTIVE' THEN 1 END)::FLOAT / COUNT(*)"field_pattern β pattern-based field selection
object Β· required
Selects fields dynamically by name pattern across all tables matched by asset_selection.
| Property | Type | Required | Description |
|---|---|---|---|
operator | string | yes | CONTAINING Β· ENDING_WITH Β· MATCHING Β· STARTING_WITH |
value | string | yes | Pattern string to match against field names |
case_sensitive | boolean | no | Whether the match is case-sensitive. Default: false |
field_type | enum | no | Restrict matching to a specific field type: BOOLEAN Β· DATE Β· NUMERIC Β· TEXT Β· TIME Β· TIME_OF_DAY |
field_pattern:
operator: ENDING_WITH
value: _ID
field_type: NUMERICfields β explicit column names
array of strings Β· n/a
Not usable for bulk monitors. field_pattern is required on every bulk alert condition, and the backend rejects an alert condition that sets both fields and field_pattern ("Bulk monitors match fields either by pattern or by an explicit list, not both"). Because field_pattern is mandatory, there is no valid configuration that uses fields here β match columns with field_pattern instead. To pin specific columns, use a metric monitor with an explicit fields list.
type β alert condition type
enum Β· optional Β· default: threshold
Accepted values: threshold Β· noop
Use noop to collect data without alerting.
type: thresholdoperator β comparison operator
enum Β· optional
Accepted values: AUTO Β· AUTO_HIGH Β· AUTO_LOW Β· GT Β· GTE Β· LT Β· LTE Β· EQ Β· NEQ Β· INSIDE_RANGE Β· OUTSIDE_RANGE Β· NOOP
AUTO uses ML anomaly detection. GT, LT, etc. require threshold_value. INSIDE_RANGE / OUTSIDE_RANGE require lower_threshold and upper_threshold.
operator: AUTOthreshold_value β static threshold
number Β· yes, when using explicit operators (GT, LT, etc.)
Static threshold for comparison.
threshold_value: 0.05lower_threshold β lower bound for range operators
number Β· yes, for INSIDE_RANGE / OUTSIDE_RANGE
lower_threshold: 0.01upper_threshold β upper bound for range operators
number Β· yes, for INSIDE_RANGE / OUTSIDE_RANGE
upper_threshold: 0.10baseline_trailing_days β ML baseline window
integer Β· optional
Number of trailing days for the ML baseline window. Minimum: 1.
baseline_trailing_days: 14baseline_start β fixed baseline start date
string Β· optional
ISO 8601 format.
baseline_start: "2024-01-01"baseline_end β fixed baseline end date
string Β· optional
ISO 8601 format.
baseline_end: "2024-03-31"num_bins β histogram bins for drift metrics
integer Β· optional
Number of histogram bins. Range: 2--1000.
num_bins: 50id β stable condition identifier
string Β· optional
Preserved across updates.
id: null_rate_id_checkobject Β· required
Controls when the monitor runs. Bulk monitors do not support crontab-based scheduling and require a minimum interval of 60 minutes.
schedule:
type: fixed
interval_minutes: 1440Properties
type β schedule type
enum Β· optional Β· default: fixed
Accepted values: fixed Β· dynamic
Crontab, loose, and manual are not supported.
type: fixedinterval_minutes β run interval
integer Β· optional
Run interval for fixed schedules. Minimum 60 minutes.
interval_minutes: 1440start_time β schedule start time
string Β· optional
ISO 8601 format.
start_time: "2024-01-01T06:00:00Z"timezone β schedule timezone
string Β· optional
timezone: America/New_Yorkdynamic_schedule_tables β tables that trigger the monitor
array of strings Β· yes, when type is dynamic (unless dynamic_schedule_jobs is set)
dynamic_schedule_tables:
- raw:ingestion.eventsdynamic_schedule_jobs β jobs that trigger the monitor
array of objects Β· optional
| Property | Type | Required | Description |
|---|---|---|---|
job_type | string | yes | AdfJob Β· AirflowDag Β· DatabricksJob Β· DbtJob |
job_name | string | yes | Name of the job |
project_name | string | yes | Project or workspace containing the job |
task_name | string | no | Specific task within the job |
mcon | string | no | MCON identifier for the job |
dynamic_schedule_jobs:
- job_type: DbtJob
job_name: nightly_build
project_name: analyticsmin_interval_minutes β minimum interval between dynamic runs
integer Β· optional
min_interval_minutes: 120array of strings (exactly one entry) Β· required on all accounts created after January 2025
Set default_domain in montecarlo.yml to avoid repeating it on every monitor.
domains:
- my-domainstring Β· optional (yes if multiple warehouses)
Warehouse UUID or name. Overrides default_resource from montecarlo.yml.
warehouse: prod-snowflakestring Β· required
Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one.
name: id_null_rate_monitoringstring Β· optional
Visible in the Monte Carlo UI. Not included in notifications.
notes: Owned by the analytics team. Reviewed quarterly.array of strings Β· optional
Audience names linking this monitor to channels defined in Notifications as Code.
audiences:
- data-engineering-alerts
- platform-alertsarray of strings Β· optional
Separate audiences for run-failure notifications. Falls back to audiences if not set.
failure_audiences:
- oncall-alertsenum Β· optional
Accepted values: P1 Β· P2 Β· P3 Β· P4 Β· P5
priority: P3array of objects Β· optional
| Property | Type | Required | Description |
|---|---|---|---|
name | string | yes | Tag key |
value | string | no | Tag value |
tags:
- name: team
value: analytics
- name: environment
value: productionenum Β· optional
Accepted values: ACCURACY Β· COMPLETENESS Β· CONSISTENCY Β· TIMELINESS Β· UNIQUENESS Β· VALIDITY
data_quality_dimension: COMPLETENESSinteger Β· optional
Number of hours to lag data collection, accounting for late-arriving data.
collection_lag_hours: 6string Β· optional
Timestamp column used to bucket data by time across all matched tables. The column must exist on every matched table.
aggregate_time_field: CREATED_ATenum Β· optional
Accepted values: hour Β· day Β· week Β· month
aggregate_by: dayobject Β· optional
Only valid when monitor_type is bulk_pii.
| Property | Type | Required | Description |
|---|---|---|---|
percentage | number | no | Percentage of rows to sample (0--100) |
count | integer | no | Fixed number of rows to sample |
sampling_config:
percentage: 10boolean Β· optional Β· default: false
Automatically remove tables from monitoring when they are deleted or become inaccessible. Only valid when monitor_type is bulk_pii.
auto_prune_enabled: trueboolean Β· optional Β· default: false
Only monitor tables that are upstream of important downstream assets. Only valid when monitor_type is bulk_pii.
lineage_narrowing_enabled: trueboolean Β· optional Β· default: false
Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics β always include it if you want the monitor to stay in draft.
is_draft: truestring Β· optional
Include the UUID of an existing monitor to update it instead of creating a new one.
uuid: 0dae7702-0950-45c7-909c-7e183bddca19Deprecated fields
| Field | Use instead |
|---|---|
resource | warehouse |
domain | domains |
domain_uuids | domains |
labels | audiences |
API-only fieldsSome fields visible in the API or JSON Schema (
domain_restrictions) are present in the schema but silently stripped during MaC YAML processing. They have no effect and should not be included.
Examples
Null rate monitoring on all ID columns
Monitors null rates across all columns ending in _ID in the analytics core schema.
montecarlo:
bulk_monitor:
- name: core_id_null_rate
description: Monitor null rates on ID columns across core analytics
monitor_type: bulk_metric
warehouse: prod-snowflake
asset_selection:
databases:
- name: ANALYTICS
schemas:
- CORE
alert_conditions:
- metric: NULL_RATE
operator: AUTO
field_pattern:
operator: ENDING_WITH
value: _ID
field_type: NUMERIC
schedule:
type: fixed
interval_minutes: 1440
audiences:
- data-engineering-alerts
priority: P3
data_quality_dimension: COMPLETENESS
tags:
- name: team
value: analytics
domains:
- my-domainPII detection across multiple schemas
Scans for PII patterns in text columns across raw data schemas.
montecarlo:
bulk_monitor:
- name: raw_pii_scan
description: Scan for PII in raw ingestion tables
monitor_type: bulk_pii
warehouse: prod-snowflake
asset_selection:
databases:
- name: RAW
schemas:
- INGESTION
- API_IMPORTS
alert_conditions:
- metric: NULL_COUNT
operator: GT
threshold_value: 0
field_pattern:
operator: CONTAINING
value: ""
field_type: TEXT
schedule:
type: fixed
interval_minutes: 1440
audiences:
- security-alerts
priority: P1
data_quality_dimension: VALIDITY
domains:
- my-domainDynamic scheduling with auto-pruning
Runs after table updates and automatically removes deleted tables. auto_prune_enabled and lineage_narrowing_enabled are only valid for bulk_pii monitors.
montecarlo:
bulk_monitor:
- name: timestamp_cols_bulk_pii
description: Monitor freshness-sensitive columns after each load
monitor_type: bulk_pii
warehouse: prod-snowflake
asset_selection:
databases:
- name: RAW
auto_prune_enabled: true
lineage_narrowing_enabled: true
alert_conditions:
- metric: NULL_RATE
operator: AUTO
field_pattern:
operator: MATCHING
value: ".*_TIMESTAMP"
field_type: TIME
schedule:
type: dynamic
dynamic_schedule_tables:
- raw:ingestion.events
- raw:ingestion.transactions
min_interval_minutes: 120
priority: P2
domains:
- my-domainTroubleshooting
Monitor type and fields
- Using
monitor_type: metricinstead ofbulk_metric. Themonitor_typeenum values arebulk_metricandbulk_pii. Usingmetric(the MaC key for the single-table metric monitor) is not valid here. - Omitting
field_patternfrom alert conditions. Unlike the single-table metric monitor where you can list specificfields, bulk monitors requirefield_patternon every alert condition.
Scheduling
- Setting
schedule.interval_minutesbelow 60. Bulk monitors enforce a minimum interval of 60 minutes. Lower values fail validation. - Using crontab scheduling. Bulk monitors do not support
interval_crontab. Useinterval_minutesfor fixed schedules. - Using
looseormanualschedule types. Bulk monitors only supportfixedanddynamicschedule types.
Asset and field selection
- Using
aggregate_time_fieldwith tables that lack the column. If any table matched byasset_selectiondoes not have the specified timestamp column, the monitor fails on that table. Ensure the column exists across all matched tables, or split into separate monitors. - Wrong
tagsformat. Tags must be objects withnameand optionalvaluekeys. Writingtags: ["my-tag"]fails validation.
Updates and deprecated fields
- Forgetting PUT semantics on updates. When updating a monitor by including
uuid, every field you omit reverts to its default β it is not left unchanged. Always specify the complete desired configuration. - Using
domaininstead ofdomains. The singulardomainfield is deprecated. Usedomains(plural) in all new configurations.
Updated about 2 hours ago
