JSON Schema Monitor

Monitor JSON column structure for unexpected schema changes.

Overview

Detect structural changes in a JSON column: new keys appearing, existing keys vanishing, or value types shifting. Point it at any semi-structured column (JSON, VARIANT, SUPER, etc.) to catch upstream schema drift before it breaks downstream consumers.
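The three kinds of drift can be illustrated with a short Python sketch that compares the top-level key-to-type map of two JSON payloads. It illustrates the concept only and is not how Monte Carlo computes schemas; `key_types` and `diff_schemas` are hypothetical names, and nested objects are not inspected.

```python
import json

# Map Python types produced by json.loads to JSON type names.
JSON_TYPE_NAMES = {dict: "object", list: "array", str: "string", bool: "boolean",
                   int: "number", float: "number", type(None): "null"}


def key_types(payload: str) -> dict[str, str]:
    """Map each top-level key of a JSON object to the JSON type of its value."""
    obj = json.loads(payload)
    return {key: JSON_TYPE_NAMES[type(value)] for key, value in obj.items()}


def diff_schemas(before: dict[str, str], after: dict[str, str]) -> dict[str, object]:
    """Classify drift as added keys, removed keys, or keys whose type changed."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = {key: (before[key], after[key])
               for key in sorted(before.keys() & after.keys())
               if before[key] != after[key]}
    return {"added": added, "removed": removed, "type_changed": changed}


old = key_types('{"user_id": 42, "plan": "pro", "ts": "2024-01-01"}')
new = key_types('{"user_id": "42", "plan": "pro", "country": "DE"}')
print(diff_schemas(old, new))
# {'added': ['country'], 'removed': ['ts'], 'type_changed': {'user_id': ('number', 'string')}}
```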


Reference scope

This page covers MaC YAML configuration. For how JSON schema monitors work, see JSON Schema Monitors.

MaC key: json_schema.

Quick Start

montecarlo:
  json_schema:
    - name: raw_events_payload_schema
      description: Detect schema changes in raw event payloads
      table: analytics:events.raw_events
      field: payload
      timestamp_field: received_at
      lookback_days: 2
      schedule:
        type: fixed
        interval_minutes: 720
      domains:
        - my-domain

Configuration

table - fully qualified table name

string · required

Format: database:schema.table.

table: analytics:events.raw_events
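The table value packs three identifiers into one string. A minimal sketch that splits it, assuming no identifier itself contains `:` or `.` (quoted identifiers are not handled); `parse_table` is a hypothetical helper, not part of the MaC tooling.

```python
def parse_table(table: str) -> tuple[str, str, str]:
    """Split a 'database:schema.table' string into its three parts."""
    database, colon, rest = table.partition(":")
    schema, dot, name = rest.partition(".")
    # Reject missing separators, empty parts, or extra dots in the table part.
    if not (colon and dot and database and schema and name) or "." in name:
        raise ValueError(f"expected database:schema.table, got {table!r}")
    return database, schema, name


print(parse_table("analytics:events.raw_events"))  # ('analytics', 'events', 'raw_events')
```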
field - JSON column to monitor

string · required

Name of the JSON column to monitor for schema changes.

field: payload
timestamp_field - timestamp column for lookback window

string · optional

Timestamp column used to scope the lookback window. Without this, the monitor scans all available rows.

timestamp_field: received_at
timestamp_field_expression - SQL expression for timestamp

string · optional

SQL expression that evaluates to a timestamp, used instead of timestamp_field when the column needs transformation.

timestamp_field_expression: "TO_TIMESTAMP(epoch_seconds)"
lookback_days - days of data to scan

integer · optional

Number of days to look back when scanning data. Recommended range: 0 to 7. Only effective when timestamp_field is set.

lookback_days: 2
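How timestamp_field and lookback_days combine can be modeled in memory. This sketch assumes the window ends at run time and starts lookback_days earlier, and that rows are dicts holding timezone-aware datetimes; the real monitor applies the filter in warehouse SQL, `rows_in_scope` is a hypothetical name, and treating a missing lookback_days as "no window" is an assumption.

```python
from datetime import datetime, timedelta, timezone


def rows_in_scope(rows, timestamp_field=None, lookback_days=None, now=None):
    """Return the rows a single run would scan."""
    if not timestamp_field or lookback_days is None:
        # No timestamp column (lookback has no effect) or no lookback: scan every row.
        return list(rows)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=lookback_days)
    return [row for row in rows if row[timestamp_field] >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
rows = [
    {"received_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    {"received_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
print(len(rows_in_scope(rows, "received_at", 2, now)))  # 1
print(len(rows_in_scope(rows, None, 2, now)))           # 2 (lookback ignored)
```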
where_condition - SQL WHERE filter

string · optional

SQL WHERE clause (without the WHERE keyword) to filter rows before analysis.

where_condition: "event_type = 'purchase'"
unnest_field - nested JSON array to unnest

string · optional

Name of a nested JSON array field to unnest before schema analysis. Use when the JSON column contains an array of objects and you want to monitor the schema of the array elements.

unnest_field: items
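Conceptually, unnesting turns each array element into its own record, so the schema is tracked for the element objects rather than for the wrapper around them. A minimal sketch, assuming unnest_field names a top-level key of the JSON value; `element_keys` is hypothetical, and the top-level `page` key is deliberately ignored.

```python
import json


def element_keys(payload: str, unnest_field: str) -> list[list[str]]:
    """Return the sorted key list of each object in the named array."""
    doc = json.loads(payload)
    items = doc.get(unnest_field, [])
    # Each array element is analyzed on its own; non-object elements are skipped.
    return [sorted(item) for item in items if isinstance(item, dict)]


body = '{"page": 1, "items": [{"id": 1, "sku": "A"}, {"id": 2, "sku": "B", "qty": 3}]}'
print(element_keys(body, "items"))  # [['id', 'sku'], ['id', 'qty', 'sku']]
```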
aggregation_time_interval - time bucket granularity

enum · optional

Accepted values: hour · day · week · month

aggregation_time_interval: day
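One way to picture the bucket granularity is to truncate each timestamp to the start of its hour, day, week, or month. This is a generic sketch, assuming naive UTC timestamps and weeks that start on Monday; the monitor's actual bucket boundaries are not specified on this page, and `bucket_start` is a hypothetical name.

```python
from datetime import datetime, timedelta


def bucket_start(ts: datetime, interval: str) -> datetime:
    """Truncate ts to the start of its hour, day, week, or month bucket."""
    if interval == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "day":
        return day
    if interval == "week":
        return day - timedelta(days=day.weekday())  # assumes Monday-start weeks
    if interval == "month":
        return day.replace(day=1)
    raise ValueError(f"unsupported aggregation_time_interval: {interval!r}")


ts = datetime(2024, 3, 14, 15, 42)  # a Thursday
for interval in ("hour", "day", "week", "month"):
    print(interval, bucket_start(ts, interval))
```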
schedule - execution schedule

object · optional · default: system-managed schedule

Controls when the monitor runs. JSON schema monitors do not support crontab-based scheduling.

schedule:
  type: fixed
  interval_minutes: 720

Properties


type - schedule type

enum · optional · default: fixed

Accepted values: fixed · dynamic

Crontab and manual are not supported. loose is deprecated; the backend rejects it with "Loose schedules are deprecated, use fixed instead."

type: fixed
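A pre-apply check along these lines can catch unsupported schedule settings before the backend does. It is a hypothetical lint, not part of the Monte Carlo CLI; it reuses the documented rejection message for loose and treats interval_crontab and interval_crontab_day_operator (named under Troubleshooting) as disallowed keys.

```python
ALLOWED_TYPES = {"fixed", "dynamic"}
CRONTAB_KEYS = {"interval_crontab", "interval_crontab_day_operator"}


def check_schedule(schedule: dict) -> list[str]:
    """Return a list of problems with a json_schema schedule block."""
    problems = []
    kind = schedule.get("type", "fixed")  # documented default
    if kind == "loose":
        problems.append("Loose schedules are deprecated, use fixed instead.")
    elif kind not in ALLOWED_TYPES:
        problems.append(f"schedule type {kind!r} is not supported; use fixed or dynamic")
    # Crontab-style keys are never valid for JSON schema monitors.
    for key in sorted(schedule.keys() & CRONTAB_KEYS):
        problems.append(f"{key} is not supported; use interval_minutes")
    return problems


print(check_schedule({"type": "fixed", "interval_minutes": 720}))  # []
print(check_schedule({"type": "loose"}))
```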

interval_minutes - run interval

integer · optional

Run interval, in minutes, for fixed schedules.

interval_minutes: 720

start_time - schedule start time

string · optional

ISO 8601 format.

start_time: "2024-01-01T06:00:00Z"

timezone - schedule timezone

string · optional

timezone: America/New_York

dynamic_schedule_tables - tables that trigger the monitor

array of strings · required when type is dynamic (unless dynamic_schedule_jobs is set)

dynamic_schedule_tables:
  - raw:api.webhook_responses

dynamic_schedule_jobs - jobs that trigger the monitor

array of objects · optional

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| job_type | string | yes | One of AdfJob · AirflowDag · DatabricksJob · DbtJob |
| job_name | string | yes | Name of the job |
| project_name | string | yes | Project or workspace containing the job |
| task_name | string | no | Specific task within the job |
| mcon | string | no | MCON identifier for the job |

dynamic_schedule_jobs:
  - job_type: AirflowDag
    job_name: raw_events_pipeline
    project_name: data-platform
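For dynamic schedules, the rules above reduce to: provide dynamic_schedule_tables or dynamic_schedule_jobs, and give every job entry a job_type from the four listed values plus job_name and project_name. A hypothetical sketch of that check; the optional task_name and mcon keys are not validated.

```python
JOB_TYPES = {"AdfJob", "AirflowDag", "DatabricksJob", "DbtJob"}
REQUIRED_JOB_KEYS = ("job_type", "job_name", "project_name")


def check_dynamic(schedule: dict) -> list[str]:
    """Validate the dynamic-schedule fields of a schedule block."""
    if schedule.get("type") != "dynamic":
        return []
    problems = []
    tables = schedule.get("dynamic_schedule_tables") or []
    jobs = schedule.get("dynamic_schedule_jobs") or []
    if not tables and not jobs:
        problems.append("dynamic schedules need dynamic_schedule_tables or dynamic_schedule_jobs")
    for i, job in enumerate(jobs):
        missing = [key for key in REQUIRED_JOB_KEYS if not job.get(key)]
        if missing:
            problems.append(f"job {i}: missing {', '.join(missing)}")
        elif job["job_type"] not in JOB_TYPES:
            problems.append(f"job {i}: unknown job_type {job['job_type']!r}")
    return problems


ok = {"type": "dynamic", "dynamic_schedule_jobs": [
    {"job_type": "AirflowDag", "job_name": "raw_events_pipeline", "project_name": "data-platform"}]}
print(check_dynamic(ok))  # []
```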

min_interval_minutes - minimum interval between runs

integer · optional

Minimum interval between runs for dynamic schedules.

min_interval_minutes: 60
domains - domain for this monitor

array of strings (exactly one entry) · required on all accounts created after January 2025

Set default_domain in montecarlo.yml to avoid repeating it on every monitor.

domains:
  - my-domain
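The domain rules can be sketched as a small resolver: use the monitor's own domains if present, otherwise default_domain from montecarlo.yml, and insist on exactly one entry. `resolve_domains` is hypothetical; it assumes an account where domains are required and that default_domain holds a single string.

```python
def resolve_domains(monitor: dict, project_defaults: dict) -> list[str]:
    """Return the single-entry domains list a monitor ends up with."""
    domains = monitor.get("domains")
    if domains is None and project_defaults.get("default_domain"):
        # Fall back to the project-wide default from montecarlo.yml.
        domains = [project_defaults["default_domain"]]
    if not domains or len(domains) != 1:
        raise ValueError("domains must contain exactly one entry")
    return domains


print(resolve_domains({}, {"default_domain": "my-domain"}))  # ['my-domain']
```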
warehouse - which warehouse to use

string · optional (required if the account has more than one warehouse)

Warehouse UUID or name. Overrides default_resource from montecarlo.yml.

warehouse: prod-snowflake
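Warehouse selection follows an override chain: the monitor's warehouse, then default_resource, then the account's only warehouse. A minimal sketch, assuming the account's warehouse names are known up front; `resolve_warehouse` and its `account_warehouses` argument are hypothetical.

```python
def resolve_warehouse(monitor: dict, project_defaults: dict,
                      account_warehouses: list[str]) -> str:
    """Pick the warehouse: monitor value, then default_resource, then the only one."""
    if monitor.get("warehouse"):
        return monitor["warehouse"]
    if project_defaults.get("default_resource"):
        return project_defaults["default_resource"]
    if len(account_warehouses) == 1:
        return account_warehouses[0]
    raise ValueError("warehouse is required when the account has multiple warehouses")


print(resolve_warehouse({}, {}, ["prod-snowflake"]))  # prod-snowflake
```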
name - unique identifier within the namespace

string · required

Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one.

name: raw_events_schema_check
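Because the name is the monitor's identity within the namespace, a rename looks like one monitor disappearing and another appearing. A hypothetical sketch of that planning step, comparing monitor names in the previous and new configuration; `plan_changes` is not a Monte Carlo function.

```python
def plan_changes(old_names: set[str], new_names: set[str]) -> dict[str, list[str]]:
    """Classify monitors by name as created, deleted, or kept."""
    return {
        "create": sorted(new_names - old_names),
        "delete": sorted(old_names - new_names),
        "keep": sorted(old_names & new_names),
    }


print(plan_changes({"raw_events_payload_schema"}, {"raw_events_schema_check"}))
# {'create': ['raw_events_schema_check'], 'delete': ['raw_events_payload_schema'], 'keep': []}
```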
description - monitor description

string · required

Max 512 characters.

description: Detect schema changes in raw event payloads
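The required fields and the description length limit can be checked before applying. A hypothetical sketch covering only the fields this page marks as required (name, description, table, field) and the 512-character cap.

```python
REQUIRED = ("name", "description", "table", "field")


def check_required(monitor: dict) -> list[str]:
    """Report missing required fields and an over-long description."""
    problems = [f"{key} is required" for key in REQUIRED if not monitor.get(key)]
    if len(monitor.get("description") or "") > 512:
        problems.append("description exceeds 512 characters")
    return problems


base = {"name": "raw_events_payload_schema", "description": "Detect schema changes",
        "table": "analytics:events.raw_events", "field": "payload"}
print(check_required(base))  # []
```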
notes - internal notes

string · optional

Visible in the Monte Carlo UI. Not included in notifications.

notes: Owned by the data platform team.
audiences - notification channels

array of strings · optional

Audience names linking this monitor to channels defined in Notifications as Code.

audiences:
  - data-engineering-alerts
  - api-integration-alerts
failure_audiences - run-failure notification channels

array of strings · optional

Separate audiences for run-failure notifications. Falls back to audiences if not set.

failure_audiences:
  - oncall-alerts
notify_run_failure - notify on query failure

boolean · optional

Notify when the monitor query itself fails.

notify_run_failure: true
severity - severity level

string · optional

Severity level for incidents created by this monitor.

severity: SEV-2
priority - incident priority level

enum · optional

Accepted values: P1 · P2 · P3 · P4 · P5

priority: P3
connection_name - named connection override

string · optional

Overrides the default warehouse connection.

connection_name: analytics-read-only
disable_look_back_bootstrap - skip initial bootstrap

boolean · optional

Skip the initial historical data bootstrap when the monitor is first created.

disable_look_back_bootstrap: true
tags - key-value pairs for organizing monitors

array of objects · optional

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Tag key |
| value | string | no | Tag value |

tags:
  - name: team
    value: data-platform
  - name: source
    value: api
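The tag shape is easy to get wrong, as the Troubleshooting section notes for tags: ["my-tag"]. A hypothetical sketch of a shape check that accepts a list of objects with a string name and an optional string value.

```python
def check_tags(tags) -> list[str]:
    """Validate tags as a list of {name, optional value} objects."""
    if not isinstance(tags, list):
        return ["tags must be a list"]
    problems = []
    for i, tag in enumerate(tags):
        if not isinstance(tag, dict) or not isinstance(tag.get("name"), str):
            problems.append(f"tag {i}: expected an object with a string 'name'")
        elif "value" in tag and not isinstance(tag["value"], str):
            problems.append(f"tag {i}: 'value' must be a string")
    return problems


print(check_tags([{"name": "team", "value": "data-platform"}, {"name": "source"}]))  # []
print(check_tags(["my-tag"]))  # ["tag 0: expected an object with a string 'name'"]
```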
data_quality_dimension - data quality category

enum · optional

Accepted values: ACCURACY · COMPLETENESS · CONSISTENCY · TIMELINESS · UNIQUENESS · VALIDITY

data_quality_dimension: CONSISTENCY
is_draft - create as draft without activating

boolean · optional · default: false

Creates the monitor in a paused state. Omitting this field on a later update resets it to false (active) because updates use PUT semantics; always include it if you want the monitor to stay in draft.

is_draft: true
uuid - update an existing monitor

string · optional

Include the UUID of an existing monitor to update it instead of creating a new one.

uuid: 0dae7702-0950-45c7-909c-7e183bddca19
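To see why omitted fields revert on update, compare a PUT-style replace with a PATCH-style merge. `DEFAULTS` below is a simplified, hypothetical subset of the defaults documented on this page, and the functions model the semantics only; they call no Monte Carlo API.

```python
# Hypothetical subset of documented defaults (is_draft false, schedule type fixed).
DEFAULTS = {"is_draft": False, "schedule": {"type": "fixed"}}


def apply_put(submitted: dict) -> dict:
    """PUT: the stored config is replaced wholesale; omitted fields take defaults."""
    return {**DEFAULTS, **submitted}


def apply_patch(stored: dict, submitted: dict) -> dict:
    """PATCH-style merge, shown only for contrast: omitted fields keep stored values."""
    return {**stored, **submitted}


stored = {"uuid": "0dae7702-0950-45c7-909c-7e183bddca19", "is_draft": True, "priority": "P3"}
update = {"uuid": stored["uuid"], "priority": "P2"}  # is_draft omitted
print(apply_put(update)["is_draft"])            # False: the monitor leaves draft
print(apply_patch(stored, update)["is_draft"])  # True (not how MaC behaves)
```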
Deprecated fields

| Field | Use instead |
| --- | --- |
| resource | warehouse |
| domain | domains |
| domain_uuids | domains |
| labels | audiences |
| notify_rule_run_failure | notify_run_failure |

API-only fields

Some fields that appear in the API or in the JSON Schema (skip_reset, fail_on_reset, select_expressions) are silently stripped during MaC YAML processing. They have no effect and should not be included.
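Older configurations can be migrated with a small pass that renames the deprecated keys from the table above and drops the API-only fields. `modernize` is a hypothetical helper; it assumes domain holds a single value that must be wrapped in a one-entry list and that domain_uuids is already a list.

```python
RENAMES = {
    "resource": "warehouse",
    "labels": "audiences",
    "notify_rule_run_failure": "notify_run_failure",
    "domain_uuids": "domains",
}
API_ONLY = {"skip_reset", "fail_on_reset", "select_expressions"}


def modernize(monitor: dict) -> dict:
    """Rename deprecated keys and drop API-only keys from one monitor config."""
    out = {}
    for key, value in monitor.items():
        if key in API_ONLY:
            continue  # stripped during MaC processing anyway
        if key == "domain":
            out["domains"] = [value]  # single value becomes a one-entry list
        else:
            out[RENAMES.get(key, key)] = value
    return out


print(modernize({"name": "x", "resource": "prod-snowflake", "labels": ["a"],
                 "domain": "my-domain", "skip_reset": True}))
# {'name': 'x', 'warehouse': 'prod-snowflake', 'audiences': ['a'], 'domains': ['my-domain']}
```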

Examples

Basic JSON schema monitoring

Monitors a JSON column for structural changes, looking back 2 days from each run.

montecarlo:
  json_schema:
    - name: raw_events_payload_schema
      table: analytics:events.raw_events
      field: payload
      timestamp_field: received_at
      lookback_days: 2
      description: Detect schema changes in raw event payloads
      schedule:
        type: fixed
        interval_minutes: 720
      audiences:
        - data-engineering-alerts
      priority: P3
      data_quality_dimension: CONSISTENCY
      tags:
        - name: team
          value: data-platform
      domains:
        - my-domain

Monitoring nested JSON arrays

Uses unnest_field to monitor the schema of objects inside a nested JSON array.

montecarlo:
  json_schema:
    - name: webhook_items_schema
      table: raw:api.webhook_responses
      field: response_body
      unnest_field: items
      timestamp_field: created_at
      lookback_days: 3
      description: Monitor schema of items array in webhook response bodies
      schedule:
        type: dynamic
        dynamic_schedule_tables:
          - raw:api.webhook_responses
      audiences:
        - api-integration-alerts
      priority: P2
      domains:
        - my-domain

Filtered monitoring with WHERE clause

Monitors JSON schema only for a specific event type.

montecarlo:
  json_schema:
    - name: purchase_properties_schema
      table: analytics:events.raw_events
      field: properties
      where_condition: "event_type = 'purchase'"
      timestamp_field: event_time
      lookback_days: 1
      aggregation_time_interval: day
      description: Track schema stability of purchase event properties
      schedule:
        type: fixed
        interval_minutes: 1440
      priority: P2
      domains:
        - my-domain

Troubleshooting

Timestamp and lookback

  • Omitting timestamp_field when using lookback_days. Without a timestamp column, lookback_days has no effect and the monitor scans all rows.
  • Setting a large lookback_days. Values outside the recommended 0 to 7 range are accepted at apply time, but a wider window scans more data and increases cost; keep it small unless you have a reason not to.
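Both timestamp pitfalls can be flagged by a lint pass before applying. A hypothetical sketch that follows the wording above literally (it checks only timestamp_field, not timestamp_field_expression) and warns rather than fails, since out-of-range values are accepted at apply time.

```python
def lookback_warnings(monitor: dict) -> list[str]:
    """Warn about lookback_days settings that are ineffective or costly."""
    warnings = []
    days = monitor.get("lookback_days")
    if days is None:
        return warnings
    if not monitor.get("timestamp_field"):
        warnings.append("lookback_days has no effect without timestamp_field; all rows are scanned")
    if not 0 <= days <= 7:
        warnings.append(f"lookback_days={days} is outside the recommended 0 to 7 range")
    return warnings


print(lookback_warnings({"timestamp_field": "received_at", "lookback_days": 2}))  # []
```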

Scheduling

  • Using crontab scheduling. JSON schema monitors do not support interval_crontab or interval_crontab_day_operator. Use interval_minutes for fixed schedules.
  • Using manual or loose schedule type. JSON schema monitors support fixed and dynamic schedule types only. loose is deprecated and rejected by the backend.

Data source and fields

  • Using data_source. JSON schema monitors do not support a data_source block; the backend rejects it with "Data source is not supported for this monitor type." Specify the table with the top-level table field instead.
  • Wrong tags format. Tags must be objects with name and optional value keys. Writing tags: ["my-tag"] fails validation.

Updates and deprecated fields

  • Forgetting PUT semantics on updates. When updating a monitor by including uuid, every field you omit reverts to its default rather than keeping its current value. Always specify the complete desired configuration.
  • Using resource instead of warehouse. The resource field still works but is deprecated. Use warehouse in all new configurations.