Monitors as Code

Overview

Monte Carlo provides a YAML-based monitor configuration format that lets teams deploy monitors as part of their CI/CD process. This guide explains how to get started with monitors as code.

Prerequisites

  1. Install the CLI — https://docs.getmontecarlo.com/docs/using-the-cli
  2. When running montecarlo configure, provide your API key. You may leave the AWS settings blank.

Integrating into your CI/CD pipeline

Deploying monitors within a continuous integration pipeline is straightforward. Once changes are merged into your main production branch, configure your CI pipeline to install the montecarlodata CLI:

pip install montecarlodata

And run this command:

MCD_DEFAULT_API_ID=${MCD_DEFAULT_API_ID} \
    MCD_DEFAULT_API_TOKEN=${MCD_DEFAULT_API_TOKEN} \
    montecarlo monitors apply \
            --namespace ${MC_MONITORS_NAMESPACE} \
            --project-dir ${PROJECT_DIR}

These environment variables need to be populated:

  • MCD_DEFAULT_API_ID: Your Monte Carlo API ID
  • MCD_DEFAULT_API_TOKEN: Your Monte Carlo API Token
  • MC_MONITORS_NAMESPACE: Namespace to apply monitor configuration to
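  • PROJECT_DIR: Optional. The path to your Monte Carlo project directory, needed only if the montecarlo command is not run from within that directory. If you run the command from the project directory, you can omit the --project-dir option.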
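
If your CI pipeline runs Python, the invocation above can be assembled from the same environment variables. This is a minimal sketch, not part of the montecarlodata package: build_apply_command is a hypothetical helper that only uses the command and flags documented on this page, and it fails early when a required variable is missing.

```python
import os


def build_apply_command(env=None, dry_run=False):
    """Build the argv for `montecarlo monitors apply` (hypothetical helper).

    The API ID and token are read by the CLI from the environment, so they
    are checked here but not passed as arguments.
    """
    env = os.environ if env is None else env
    required = ("MCD_DEFAULT_API_ID", "MCD_DEFAULT_API_TOKEN", "MC_MONITORS_NAMESPACE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    argv = ["montecarlo", "monitors", "apply", "--namespace", env["MC_MONITORS_NAMESPACE"]]
    # PROJECT_DIR is optional: only pass --project-dir when it is set.
    if env.get("PROJECT_DIR"):
        argv += ["--project-dir", env["PROJECT_DIR"]]
    # --dry-run reports each operation without applying it.
    if dry_run:
        argv.append("--dry-run")
    return argv


# Usage in a CI step (not executed here):
#   import subprocess
#   subprocess.run(build_apply_command(), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems if the namespace or path contains spaces.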

Using code to define monitors

First, create a Monte Carlo project. A Monte Carlo project is simply a directory containing a montecarlo.yml file, which holds project-level configuration options. If you are using DBT, we recommend placing montecarlo.yml in the same directory as dbt_project.yml.

The montecarlo.yml format:

version: 1
default_resource: <string>
include_file_patterns:
  - <string>
exclude_file_patterns:
  - <string>

Description of options:

  • version: The version of the MC configuration format. Set to 1.
  • default_resource: The warehouse friendly name or UUID where YAML-defined monitors will be created.
    • If your account only has a single warehouse configured, MC will use this warehouse by default, and this option does not need to be defined.
    • If you have multiple warehouses configured, you will need to either (1) define default_resource, or (2) specify the warehouse friendly name or UUID for each monitor explicitly in the resource property (see YAML format for configuring monitors below).
  • include_file_patterns: List of file patterns to include when searching for monitor configuration files. Defaults to **/*.yaml and **/*.yml, which makes MC search the project directory and every directory nested within it, recursively, for files with a .yaml or .yml extension.
  • exclude_file_patterns: List of file patterns to exclude when searching for monitor configuration files.
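
As a rough illustration of how include and exclude patterns select files, the Python sketch below lists candidate configuration files in a project. It assumes include patterns behave like pathlib's recursive glob and exclude patterns are matched with fnmatch against paths relative to the project directory; Monte Carlo's actual matching rules may differ, and discover_config_files is a hypothetical helper.

```python
import fnmatch
from pathlib import Path

# Defaults described above for include_file_patterns.
DEFAULT_INCLUDE = ["**/*.yaml", "**/*.yml"]


def discover_config_files(project_dir, include=None, exclude=None):
    """Return sorted POSIX paths, relative to project_dir, of candidate files.

    Assumption: include patterns use pathlib's recursive glob (where ** also
    matches the top-level directory) and exclude patterns use fnmatch.
    """
    root = Path(project_dir)
    include = include or DEFAULT_INCLUDE
    exclude = exclude or []

    found = set()
    for pattern in include:
        for path in root.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(root).as_posix())

    # Note that montecarlo.yml itself also matches the default patterns.
    return sorted(
        rel for rel in found
        if not any(fnmatch.fnmatch(rel, pat) for pat in exclude)
    )
```

For example, discover_config_files(".", exclude=["target/*"]) would skip YAML files directly under a target directory while still finding DBT schema files elsewhere in the project.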

Example montecarlo.yml configuration file, which should be sufficient for customers with a single warehouse:

version: 1

Example montecarlo.yml configuration file for customers with multiple warehouses configured:

version: 1
default_resource: bigquery
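
The warehouse selection rules for default_resource can be sketched as a small function. This is illustrative only: resolve_resource is hypothetical, and the rule that an explicit per-monitor resource takes precedence over default_resource is an assumption rather than documented behavior.

```python
def resolve_resource(monitor, default_resource, warehouses):
    """Pick the warehouse a monitor will be created in (hypothetical helper).

    monitor: dict for a single monitor definition.
    default_resource: default_resource from montecarlo.yml, or None.
    warehouses: friendly names or UUIDs of the warehouses in the account.
    """
    # Assumption: an explicit resource on the monitor wins.
    if monitor.get("resource"):
        return monitor["resource"]
    if default_resource:
        return default_resource
    # With a single warehouse, MC uses it by default.
    if len(warehouses) == 1:
        return warehouses[0]
    raise ValueError(
        "multiple warehouses configured: set default_resource in "
        "montecarlo.yml or resource on each monitor"
    )
```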

Defining individual monitors

Monitors are defined in YAML files in the project directory or any directory nested within it. They can be configured in standalone YAML files, or embedded in DBT schema.yml files under the meta property of a DBT model definition.

A standalone monitor configuration can be placed in any YAML file matched by include_file_patterns (by default, any .yml or .yaml file) in any directory nested within the project. Example:

montecarlo:
  field_health:
    - table: project:dataset.table_name
      timestamp_field: created
  dimension_tracking:
    - table: project:dataset.table_name
      timestamp_field: created
      field: order_status

Example of monitors embedded in a DBT schema.yml file:

version: 2

models:
  - name: table_name
    description: My table
    meta:
      montecarlo:
        field_health:
          - table: project:dataset.table_name
            timestamp_field: created
        dimension_tracking:
          - table: project:dataset.table_name
            timestamp_field: created
            field: order_status
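
Both layouts carry the same montecarlo block, just at different depths. The sketch below shows where that block is found in each case, assuming each YAML file has already been parsed into a Python dict (for example with a YAML library); extract_monitor_blocks is a hypothetical helper, not part of the montecarlodata CLI.

```python
def extract_monitor_blocks(doc):
    """Return the montecarlo blocks found in one parsed YAML document."""
    blocks = []
    if not isinstance(doc, dict):
        return blocks
    # Standalone file: montecarlo is a top-level key.
    if isinstance(doc.get("montecarlo"), dict):
        blocks.append(doc["montecarlo"])
    # DBT schema.yml: montecarlo sits under models[].meta.
    for model in doc.get("models") or []:
        if not isinstance(model, dict):
            continue
        meta = model.get("meta") or {}
        if isinstance(meta.get("montecarlo"), dict):
            blocks.append(meta["montecarlo"])
    return blocks
```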

Tip: Using monitors as code with DBT

Embedding monitor configurations in DBT schema.yml files can make maintenance easier, because all configuration and metadata for a given table are kept in the same place.

Monitor configuration reference

montecarlo:
  field_health:
    - table: <string>  # required
      fields:
        - <string>
      timestamp_field: <string>
      where_condition: <string>
      schedule:  # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic>  # required
        interval_minutes: <integer>  # required if loose or fixed
        start_time: <date as isoformatted string>  # required if fixed
  dimension_tracking:
    - table: <string>  # required
      field: <string>  # required
      timestamp_field: <string>
      where_condition: <string>
      schedule:  # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic>  # required
        interval_minutes: <integer>  # required if loose or fixed
        start_time: <date as isoformatted string>  # required if fixed
  json_schema:
    - table: <string>  # required
      field: <string>  # required
      timestamp_field: <string>
      where_condition: <string>
      schedule:  # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic>  # required
        interval_minutes: <integer>  # required if loose or fixed
        start_time: <date as isoformatted string>  # required if fixed
  custom_sql:
    - sql: <string>  # required
      comparisons:  # required -- list of comparisons, see comparisons below
        - <comparison>
      description: <string>
      schedule:
        type: fixed  # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer> 
  freshness:
    - table: <string>  # required
      freshness_threshold: <integer>  # required
      description: <string>
      schedule:
        type: fixed  # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer>
  volume:
    - table: <string>  # required
      comparisons:  # required -- list of comparisons, see comparisons below
        - <comparison>
      volume_metric: <row_count or byte_count>  # row_count by default
      description: <string>
      schedule:
        type: fixed  # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer>
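
The schedule rules in the reference above can be checked locally before running apply. This is a sketch under stated assumptions: check_schedule is hypothetical, it treats start_time as required for every fixed schedule (the reference states this explicitly only for field_health, dimension_tracking and json_schema), and it returns None when a custom_sql, freshness or volume monitor omits its schedule, because no default is documented for those types.

```python
from datetime import datetime

SCHEDULE_TYPES = {"loose", "fixed", "dynamic"}
FIXED_ONLY = {"custom_sql", "freshness", "volume"}


def check_schedule(monitor_type, schedule=None):
    """Return the schedule that would apply, or raise ValueError."""
    if schedule is None:
        if monitor_type in FIXED_ONLY:
            return None  # no default documented here; left to MC
        # Documented default: loose schedule every 720 minutes (12h).
        return {"type": "loose", "interval_minutes": 720}

    kind = schedule.get("type")
    if kind not in SCHEDULE_TYPES:
        raise ValueError(f"unknown schedule type: {kind!r}")
    if monitor_type in FIXED_ONLY and kind != "fixed":
        raise ValueError(f"{monitor_type} schedules must be fixed")
    if kind in ("loose", "fixed") and not isinstance(schedule.get("interval_minutes"), int):
        raise ValueError("interval_minutes is required for loose and fixed schedules")
    if kind == "fixed":
        if "start_time" not in schedule:
            raise ValueError("start_time is required for fixed schedules")
        # Raises ValueError if start_time is not an ISO 8601 string.
        datetime.fromisoformat(schedule["start_time"])
    return schedule
```

Quoting start_time in YAML, as the example later on this page does, keeps it a string; some YAML loaders otherwise convert unquoted timestamps into datetime objects.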

field_health

Configures a field health monitor.

  • table: MC global table ID (format <database>:<schema>.<table name>)
  • fields: List of fields in the table to monitor. Optional; by default, all fields are monitored.
  • timestamp_field: Timestamp field
  • where_condition: SQL snippet of a WHERE condition to add to the field health query
  • schedule
    • type: One of loose, fixed, or dynamic
    • interval_minutes: For loose or fixed, how frequently to run the monitor
    • start_time: For fixed, when to start the schedule
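
The global table ID format used by table can be checked with a small parser. This is a sketch: parse_table_id is hypothetical, and because it splits on the first colon and the first period after it, it assumes database and schema names contain neither character. Note the colon after the database, which differs from the dotted form used inside SQL queries.

```python
def parse_table_id(table_id):
    """Split an MC global table ID of the form <database>:<schema>.<table name>."""
    database, colon, rest = table_id.partition(":")
    schema, dot, table = rest.partition(".")
    if not (colon and dot and database and schema and table):
        raise ValueError(f"expected <database>:<schema>.<table name>, got {table_id!r}")
    return database, schema, table
```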

dimension_tracking

Configures a dimension tracking monitor.

  • table: MC global table ID (format <database>:<schema>.<table name>)
  • field: Field in table to monitor
  • timestamp_field: Timestamp field
  • where_condition: SQL snippet of a WHERE condition to add to the monitor's query
  • schedule
    • type: One of loose, fixed, or dynamic
    • interval_minutes: For loose or fixed, how frequently to run the monitor
    • start_time: For fixed, when to start the schedule

json_schema

Configures a JSON schema monitor.

  • table: MC global table ID (format <database>:<schema>.<table name>)
  • field: Field in table to monitor
  • timestamp_field: Timestamp field
  • where_condition: SQL snippet of a WHERE condition to add to the monitor's query
  • schedule
    • type: One of loose, fixed, or dynamic
    • interval_minutes: For loose or fixed, how frequently to run the monitor
    • start_time: For fixed, when to start the schedule

custom_sql

  • sql: The rule's SQL query
  • comparisons: See comparisons below
  • description: Friendly description of rule
  • schedule
    • type: Must be fixed
    • interval_minutes: How frequently to run the monitor
    • start_time: When to start the schedule

freshness

  • table: MC global table ID (format <database>:<schema>.<table name>)
  • freshness_threshold: Freshness breach threshold in minutes
  • description: Friendly description of rule
  • schedule
    • type: Must be fixed
    • interval_minutes: How frequently to run the monitor
    • start_time: When to start the schedule
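
One plausible reading of freshness_threshold is sketched below: the rule breaches when the table has gone more than freshness_threshold minutes without an update. Both that interpretation and the strict greater-than comparison are assumptions, and freshness_breached is a hypothetical helper.

```python
from datetime import datetime, timedelta


def freshness_breached(last_updated, now, freshness_threshold):
    """True if more than freshness_threshold minutes passed since last_updated."""
    return now - last_updated > timedelta(minutes=freshness_threshold)
```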

volume

  • table: MC global table ID (format <database>:<schema>.<table name>)
  • volume_metric: Which volume metric to monitor, either row_count or byte_count. Defaults to row_count.
  • comparisons: See comparisons below
  • description: Friendly description of rule
  • schedule
    • type: Must be fixed
    • interval_minutes: How frequently to run the monitor
    • start_time: When to start the schedule

comparisons

  • type: threshold or change. If threshold, threshold_value is compared as an absolute value. If change, threshold_value is compared against the change from the historical baseline.
  • operator: One of EQ, NEQ, GT, GTE, LT, LTE. Operator of comparison, =, ≠, >, ≥, <, ≤ respectively.
  • threshold_value: Threshold value
  • baseline_agg_function: If type = change, the aggregation function used to aggregate data points to calculate historical baseline
  • baseline_interval_minutes: If type = change, the time interval in minutes (backwards from current time) to aggregate over to calculate historical baseline
  • is_threshold_relative: If type = change, whether threshold_value is a relative threshold rather than an absolute one.
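
The comparison fields above can be read as a small predicate. In this sketch, comparison_matches is hypothetical; it assumes a relative change is expressed as a fraction of the baseline (0.5 meaning 50%), and it expects the caller to have already aggregated the historical baseline using baseline_agg_function over baseline_interval_minutes, since the accepted aggregation names are not listed here.

```python
import operator

# Operator names from the comparisons reference above.
OPERATORS = {
    "EQ": operator.eq, "NEQ": operator.ne,
    "GT": operator.gt, "GTE": operator.ge,
    "LT": operator.lt, "LTE": operator.le,
}


def comparison_matches(comparison, value, baseline=None):
    """Apply one comparison to an observed value (hypothetical helper)."""
    op = OPERATORS[comparison["operator"]]
    threshold = comparison["threshold_value"]

    if comparison["type"] == "threshold":
        # Absolute comparison of the observed value.
        return op(value, threshold)

    if comparison["type"] == "change":
        if baseline is None:
            raise ValueError("change comparisons need a historical baseline")
        delta = value - baseline
        if comparison.get("is_threshold_relative"):
            if baseline == 0:
                raise ValueError("relative change is undefined for a zero baseline")
            delta = delta / baseline  # assumption: fraction of the baseline
        return op(delta, threshold)

    raise ValueError(f"unknown comparison type: {comparison['type']!r}")
```

For example, with a baseline of 100 rows, a relative GT comparison with threshold_value 0.5 matches an observed 160 rows (a 60% increase) but not 140.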

Example

montecarlo:
  field_health:
    - table: project:dataset.table_name
      timestamp_field: created
      schedule:
        type: dynamic
  dimension_tracking:
    - table: project:dataset.table_name
      timestamp_field: created
      field: order_status
  custom_sql:
    - description: Test rule
      sql: |
         select foo from project.dataset.my_table
      comparisons:
        - type: threshold
          operator: GT
          threshold_value: 0
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2021-07-27T19:00:00"
  freshness:
    - table: project:dataset.table_name
      freshness_threshold: 30
      schedule:
        type: fixed
        interval_minutes: 30
        start_time: "2021-07-27T19:00:00"

Developing and testing locally

To apply monitor configuration to MC:

montecarlo monitors apply --namespace <namespace>

Monitors configured using the CLI are organized under namespaces. All apply operations are scoped to a namespace. Namespaces make it easier to organize and manage monitors as code.

The apply command behaves as follows:

  1. MC searches the project for all monitor configuration elements, both in standalone files and embedded in DBT schema files, and concatenates them into a single configuration template.
  2. MC applies the configuration template to your MC account:
    1. Any monitors newly defined since the last apply are created.
    2. Any previously defined monitors still present in the current configuration template are updated. Note that the CLI reports such a monitor as UPDATED even if none of its attributes changed.
    3. Any previously defined monitors absent from the current configuration template are deleted.
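
The two apply steps can be modeled in a few lines of Python. This is a minimal sketch: concatenate and plan_apply are hypothetical, and keying monitors by a stable identity is an assumption, since this page does not describe how MC matches a monitor to its previous definition. The operation labels are illustrative, not the CLI's output format.

```python
def concatenate(blocks):
    """Step 1: merge montecarlo blocks into one template, per monitor type."""
    template = {}
    for block in blocks:
        for monitor_type, monitors in block.items():
            template.setdefault(monitor_type, []).extend(monitors)
    return template


def plan_apply(previous, current):
    """Step 2: plan operations from {identity: definition} mappings."""
    plan = [("create", key) for key in sorted(current.keys() - previous.keys())]
    # Monitors present in both are always updated, even if unchanged.
    plan += [("update", key) for key in sorted(current.keys() & previous.keys())]
    plan += [("delete", key) for key in sorted(previous.keys() - current.keys())]
    return plan
```

Because every surviving monitor is planned as an update regardless of whether it changed, this mirrors the note above that the CLI reports unchanged monitors as UPDATED.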

The apply command also supports a --dry-run argument, which performs a dry run of the configuration update and reports each operation without applying it.

To delete (destroy) a namespace:

montecarlo monitors delete --namespace <namespace>

This deletes all monitors in the given namespace.

