Monitors as Code
Overview
Monte Carlo developed a YAML-based monitors configuration to help teams deploy monitors as part of their CI/CD process. The following guide explains how to get started with monitors as code.
Prerequisites
- Install the CLI: https://docs.getmontecarlo.com/docs/using-the-cli
- When running montecarlo configure, provide your API key. You may leave the AWS settings blank.
Integrating into your CI/CD pipeline
Deploying monitors within a continuous integration pipeline is straightforward. Once changes are merged into your main production branch, configure your CI pipeline to install the montecarlodata CLI:
pip install montecarlodata
And run this command:
MCD_DEFAULT_API_ID=${MCD_DEFAULT_API_ID} \
MCD_DEFAULT_API_TOKEN=${MCD_DEFAULT_API_TOKEN} \
montecarlo monitors apply \
  --namespace ${MC_MONITORS_NAMESPACE} \
  --project-dir ${PROJECT_DIR}
These environment variables need to be populated:
- MCD_DEFAULT_API_ID: Your Monte Carlo API ID
- MCD_DEFAULT_API_TOKEN: Your Monte Carlo API Token
- MC_MONITORS_NAMESPACE: Namespace to apply monitor configuration to
- PROJECT_DIR: Optional. If the montecarlo command is not run within your Monte Carlo project directory, you can specify the directory here.
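In a CI job it can help to fail fast when a required variable is missing, before the CLI is invoked. The following is a minimal Python sketch, not part of the Monte Carlo tooling: the helper name build_apply_command is hypothetical, and it only assembles the montecarlo monitors apply invocation from the variables listed above.

```python
import os

REQUIRED_VARS = ("MCD_DEFAULT_API_ID", "MCD_DEFAULT_API_TOKEN", "MC_MONITORS_NAMESPACE")


def build_apply_command(env):
    """Return the argv list for `montecarlo monitors apply`.

    `env` is any mapping of environment variables (for example os.environ).
    Raises RuntimeError if a required variable is missing or empty.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))

    command = ["montecarlo", "monitors", "apply", "--namespace", env["MC_MONITORS_NAMESPACE"]]
    # PROJECT_DIR is optional; only pass --project-dir when it is set.
    if env.get("PROJECT_DIR"):
        command += ["--project-dir", env["PROJECT_DIR"]]
    return command


if __name__ == "__main__":
    # Print the command instead of running it. A CI step could pass the list to
    # subprocess.run(..., check=True) with the API credentials left in the environment.
    try:
        print(" ".join(build_apply_command(os.environ)))
    except RuntimeError as error:
        print(error)
```

The API ID and token are checked but not placed on the command line, since the CLI reads them from the environment as shown in the command above.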
Namespaces
You can think of namespaces like CloudFormation stack names. A namespace is a logical separation of a collection of resources that you can define. Monitors from different namespaces are isolated from each other.
Some examples of why this is useful:
- You have multiple people (or teams) managing monitors and don't want them to conflict with or override each other's configurations.
- You want to manage different groups of monitors in different pipelines (e.g. dbt models in CI/CD x and non-dbt models in CI/CD y).
Using code to define monitors
First, you will need to create a Monte Carlo project. A Monte Carlo project is simply a directory that contains a montecarlo.yml file, which holds project-level configuration options. If you are using dbt, we recommend placing montecarlo.yml in the same directory as dbt_project.yml.
The montecarlo.yml format:
version: 1
default_resource: <string>
include_file_patterns:
- <string>
exclude_file_patterns:
- <string>
Description of options:
- version: The version of MC configuration. Set to 1.
- default_resource: The warehouse friendly name or UUID where YAML-defined monitors will be created.
  - If your account only has a single warehouse configured, MC will use this warehouse by default, and this option does not need to be defined.
  - If you have multiple warehouses configured, you will need to either (1) define default_resource, or (2) specify the warehouse friendly name or UUID for each monitor explicitly in the resource property (see YAML format for configuring monitors below).
- include_file_patterns: List of file patterns to include when searching for monitor configuration files. By default, this is set to **/*.yaml and **/*.yml. With these defaults, MC will search recursively through all directories nested within the project directory for any files with a yaml or yml extension.
- exclude_file_patterns: List of file patterns to exclude when searching for monitor configuration files.
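To get a feel for which files the default include patterns would pick up, here is a rough Python sketch using pathlib globbing. It is an illustration only: the function name find_monitor_files is hypothetical, and the exact matching rules Monte Carlo applies (in particular for exclude patterns) are an assumption here and may differ.

```python
import fnmatch
import tempfile
from pathlib import Path

DEFAULT_INCLUDE = ("**/*.yaml", "**/*.yml")


def find_monitor_files(project_dir, include=DEFAULT_INCLUDE, exclude=()):
    """Return sorted relative paths matching any include pattern and no exclude pattern."""
    root = Path(project_dir)
    matched = set()
    for pattern in include:
        # Path.glob treats "**" as "this directory and all subdirectories".
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    # Assumption: exclusions are shell-style patterns tested against the relative path.
    return sorted(
        rel for rel in matched
        if not any(fnmatch.fnmatch(rel, pattern) for pattern in exclude)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("montecarlo.yml", "models/schema.yml", "monitors/orders.yaml", "README.md"):
            target = Path(tmp) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("")
        print(find_monitor_files(tmp))
        print(find_monitor_files(tmp, exclude=("models/*",)))
```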
Example montecarlo.yml configuration file, which should be sufficient for customers with a single warehouse:
version: 1
Example montecarlo.yml configuration file for customers with multiple warehouses configured:
version: 1
default_resource: bigquery
Define monitors in YML files separate from montecarlo.yml
Your montecarlo.yml file should only be used to define project-level configuration options. Use separate YML files to define individual monitors.
Defining individual monitors
Monitors are defined in YAML files within directories nested within the project. Monitors can be configured in standalone YAML files, or embedded within dbt schema.yml files within the meta property of a dbt model definition.
Standalone monitor YAML files can be contained in any .yml file in any directory nested within the project. Example:
montecarlo:
  field_health:
    - table: project:dataset.table_name
      timestamp_field: created
  dimension_tracking:
    - table: project:dataset.table_name
      timestamp_field: created
      field: order_status
Example of a monitor embedded within a dbt schema.yml file:
version: 2
models:
  - name: table_name
    description: My table
    meta:
      montecarlo:
        field_health:
          - table: project:dataset.table_name
            timestamp_field: created
        dimension_tracking:
          - table: project:dataset.table_name
            timestamp_field: created
            field: order_status
Tip: Using monitors as code with dbt
You may find that embedding monitor configurations within dbt schema.yml files makes maintenance easier, as all configuration and metadata concerning a given table is maintained in the same location.
Monitor configuration reference
montecarlo:
  field_health:
    - table: <string> # required
      name: <string> # optional -- by default it will be autogenerated
      description: <string>
      fields:
        - <string>
      use_important_fields: <bool> # optional -- by default, do not use important fields
      segmented_expressions:
        - <string> # Can be a field or a SQL expression
      timestamp_field: <string>
      timestamp_field_expression: <string>
      where_condition: <string>
      lookback_days: <int>
      aggregation_time_interval: <one of 'day' or 'hour'>
      schedule: # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic> # required
        interval_minutes: <integer> # required if loose or fixed
        start_time: <date as isoformatted string> # required if fixed
      labels:
        - <string>
  dimension_tracking:
    - table: <string> # required
      name: <string> # optional -- by default it will be autogenerated
      description: <string>
      field: <string> # required
      timestamp_field: <string>
      timestamp_field_expression: <string>
      where_condition: <string>
      lookback_days: <int>
      aggregation_time_interval: <one of 'day' or 'hour'>
      schedule: # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic> # required
        interval_minutes: <integer> # required if loose or fixed
        start_time: <date as isoformatted string> # required if fixed
      labels:
        - <string>
  json_schema:
    - table: <string> # required
      name: <string> # optional -- by default it will be autogenerated
      description: <string>
      field: <string> # required
      timestamp_field: <string>
      timestamp_field_expression: <string>
      where_condition: <string>
      schedule: # optional -- by default, loose schedule with interval_minutes=720 (12h)
        type: <loose, fixed, or dynamic> # required
        interval_minutes: <integer> # required if loose or fixed
        start_time: <date as isoformatted string> # required if fixed
      labels:
        - <string>
  custom_sql:
    - sql: <string> # required
      name: <string> # optional -- by default it will be autogenerated
      comparisons: <comparison> # required
      variables: <variable values>
      description: <string>
      notes: <string>
      schedule:
        type: fixed # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer>
        interval_crontab:
          - <string>
      labels:
        - <string>
      severity: <string>
  freshness:
    - table: <string> / tables: <list> # required
      name: <string> # optional -- by default it will be autogenerated
      freshness_threshold: <integer> # required
      description: <string>
      schedule:
        type: fixed # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer>
        interval_crontab:
          - <string>
      labels:
        - <string>
      severity: <string>
  volume:
    - table: <string> / tables: <list> # required
      name: <string> # optional -- by default it will be autogenerated
      comparisons: <comparison> # required
      volume_metric: <row_count or byte_count> # row_count by default
      description: <string>
      schedule:
        type: fixed # must be fixed
        start_time: <date as isoformatted string>
        interval_minutes: <integer>
        interval_crontab:
          - <string>
      labels:
        - <string>
      severity: <string>
Lookback Limits
Some monitors let you specify a longer lookback period (in case the data in your table has historical timestamps), but you cannot pick a number larger than 7. This is because each day of lookback runs an additional query against your table. The limit is a safeguard against specifying a very large period, such as 90 days, and then having 90 queries run against your warehouse each time the monitor runs. If you need help with these windows, please feel free to reach out to [email protected] or the chat bot in the lower right-hand corner.
field_health
Configures a field health monitor
- table: MC global table ID (format <database>:<schema>.<table name>)
- name: Optional name used to identify the monitor (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using table, timestamp_field and where_condition.
- description: Friendly description of rule
- fields: List of fields in table to monitor. Optional; by default all fields are monitored.
- use_important_fields: Defaults to false. If true, use the table's current important fields to build the monitor. You can use important fields and also provide a specific list of fields at the same time.
- segmented_expressions: List of fields or SQL expressions used to segment the field (must have exactly one field in fields). Enables Monitoring by Dimension.
- timestamp_field: Timestamp field
- timestamp_field_expression: Arbitrary SQL expression to be used as timestamp field, e.g. DATE(created). Must use either timestamp_field or timestamp_field_expression, or neither.
- where_condition: SQL snippet of where condition to add to field health query
- lookback_days: Lookback period in days. Default: 3
- aggregation_time_interval: Aggregation bucket time interval, either hour (default) or day
- schedule
  - type: One of loose, fixed, or dynamic
  - interval_minutes: For loose or fixed, how frequently to run the monitor
  - start_time: For fixed, when to start the schedule
- labels: Optional list of labels associated with the monitor.
The monitored fields cannot exceed 300 fields, including important fields and manually specified fields.
dimension_tracking
Configures a dimension tracking monitor
- table: MC global table ID (format <database>:<schema>.<table name>)
- name: Optional name used to identify the monitor (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using table, field, timestamp_field and where_condition.
- description: Friendly description of rule
- field: Field in table to monitor or a valid SQL expression that returns the row's dimension value as a string
- timestamp_field: Timestamp field
- timestamp_field_expression: Arbitrary SQL expression to be used as timestamp field, e.g. DATE(created). Must use either timestamp_field or timestamp_field_expression, or neither.
- where_condition: SQL snippet of where condition to add to field health query
- lookback_days: Lookback period in days. Default: 3
- aggregation_time_interval: Aggregation bucket time interval, either hour (default) or day
- schedule
  - type: One of loose, fixed, or dynamic
  - interval_minutes: For loose or fixed, how frequently to run the monitor
  - start_time: For fixed, when to start the schedule
- labels: Optional list of labels associated with the monitor.
json_schema
Configures a JSON schema monitor
- table: MC global table ID (format <database>:<schema>.<table name>)
- name: Optional name used to identify the monitor (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using table, field, timestamp_field and where_condition.
- description: Friendly description of rule
- field: Field in table to monitor
- timestamp_field: Timestamp field
- timestamp_field_expression: Arbitrary SQL expression to be used as timestamp field, e.g. DATE(created). Must use either timestamp_field or timestamp_field_expression, or neither.
- where_condition: SQL snippet of where condition to add to field health query
- schedule
  - type: One of loose, fixed, or dynamic
  - interval_minutes: For loose or fixed, how frequently to run the monitor
  - start_time: For fixed, when to start the schedule
- labels: Optional list of labels associated with the monitor.
custom_sql
- sql: SQL of rule
- query_result_type: Optional, can be set to SINGLE_NUMERIC to make the rule use a value-based threshold
- sampling_sql: Optional custom SQL query to be run on breach (results will be displayed in Incident IQ to help with investigation). Only supported for value-based thresholds (query_result_type is SINGLE_NUMERIC, see above).
- name: Optional name used to identify the rule (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using sql.
- comparisons: See comparisons below
- variables: See variables below
- description: Friendly description of rule
- notes: Additional context for the rule
- schedule
  - type: Can be fixed or manual. Manual would be for SQL rules implemented during processes like Circuit Breakers.
  - interval_minutes: How frequently to run the monitor (in minutes).
  - interval_crontab: How frequently to run the monitor (using a list of CRON expressions, check example below).
  - start_time: When to start the schedule. Required for fixed.
- labels: Optional list of labels associated with the rule.
- severity: Optional, pre-set the severity of incidents generated by this monitor.
comparisons
comparisons are definitions of breaches, not expected return values. This section is where you define the logic for when to be alerted about anomalous behavior in your monitor. For example, if you make a custom SQL rule and pick:
- type: threshold
- operator: GT
- threshold_value: 100
When Monte Carlo runs your monitor and the returned result is greater than 100, we will fire an alert to any routes configured to be notified about breaches to this monitor.
- type: threshold, dynamic_threshold or change. If threshold, threshold_value below is an absolute value. If dynamic_threshold, no threshold is needed (it will be determined automatically). If change, threshold_value is interpreted as a change from the historical baseline.
- operator: One of EQ, NEQ, GT, GTE, LT, LTE. Operator of comparison: =, ≠, >, ≥, <, ≤ respectively.
- threshold_value: Threshold value
- baseline_agg_function: If type = change, the aggregation function used to aggregate data points to calculate historical baseline. One of AVG, MAX, MIN.
- baseline_interval_minutes: If type = change, the time interval in minutes (backwards from current time) to aggregate over to calculate historical baseline
- is_threshold_relative: If type = change, whether threshold_value is a relative or an absolute threshold. With is_threshold_relative: true, threshold_value is treated as a percentage. With is_threshold_relative: false, threshold_value is treated as an absolute number (an actual count of rows).
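The idea that a comparison describes the breach condition, not the expected value, can be sketched in a few lines of Python. This is only an illustration of the threshold type using the operator names listed above; the is_breach helper is hypothetical and says nothing about how Monte Carlo evaluates rules internally.

```python
import operator

# Operator names from the reference above, mapped to Python comparison functions.
OPERATORS = {
    "EQ": operator.eq,
    "NEQ": operator.ne,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
}


def is_breach(value, comparison):
    """Return True when `value` breaches a comparison of type `threshold`."""
    if comparison["type"] != "threshold":
        raise ValueError("this sketch only covers type: threshold")
    compare = OPERATORS[comparison["operator"]]
    return compare(value, comparison["threshold_value"])


if __name__ == "__main__":
    rule = {"type": "threshold", "operator": "GT", "threshold_value": 100}
    print(is_breach(150, rule))  # 150 > 100, so this is a breach
    print(is_breach(100, rule))  # 100 is not greater than 100, so no breach
```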
variables
When defining custom SQL statements, you can use variables to execute the same statement for different combinations of values. Variables are written as {{variable_name}}. You can then define one or more values for each variable, and all combinations will be tested.
Here is an example defining the same statement for several tables and conditions (4 statements will be executed):
custom_sql:
  - sql: |
      select foo from {{table}} where {{cond}}
    variables:
      table:
        - project:dataset.table1
        - project:dataset.table2
      cond:
        - col1 > 1
        - col2 > 2
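The count of 4 comes from taking every combination of the two table values and the two cond values. A minimal Python sketch of that expansion follows; expand_variables is a hypothetical helper written for illustration, and the actual substitution performed by Monte Carlo may differ in detail.

```python
import itertools


def expand_variables(sql, variables):
    """Return one SQL string per combination of variable values."""
    names = list(variables)
    statements = []
    # itertools.product yields every combination, one value per variable.
    for values in itertools.product(*(variables[name] for name in names)):
        statement = sql
        for name, value in zip(names, values):
            statement = statement.replace("{{" + name + "}}", value)
        statements.append(statement)
    return statements


if __name__ == "__main__":
    expanded = expand_variables(
        "select foo from {{table}} where {{cond}}",
        {
            "table": ["project:dataset.table1", "project:dataset.table2"],
            "cond": ["col1 > 1", "col2 > 2"],
        },
    )
    for statement in expanded:
        print(statement)
```

Adding a third value to either variable would raise the count to 6, so large value lists multiply the number of queries run.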
freshness
- table: MC global table ID (format <database>:<schema>.<table name>)
- tables: Instead of table, can also use tables to define a list of tables (check example with multiple tables below).
- name: Optional name used to identify the rule (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using table.
- freshness_threshold: Freshness breach threshold in minutes
- description: Friendly description of rule
- schedule
  - type: Must be fixed
  - interval_minutes: How frequently to run the monitor (in minutes).
  - interval_crontab: How frequently to run the monitor (using a list of CRON expressions, check example below).
  - start_time: When to start the schedule
- labels: Optional list of labels associated with the rule.
- severity: Optional, pre-set the severity of incidents generated by this monitor.
volume
- table: MC global table ID (format <database>:<schema>.<table name>)
- tables: Instead of table, can also use tables to define a list of tables (check example with multiple tables below).
- name: Optional name used to identify the rule (to determine whether to create a new one or update an existing one for the namespace). By default, it will be autogenerated using table.
- volume_metric: Must be total_row_count or total_byte_count. Defines which volume metric to monitor.
- comparisons: See comparisons below
- description: Friendly description of rule
- schedule
  - type: Must be fixed
  - interval_minutes: How frequently to run the monitor (in minutes).
  - interval_crontab: How frequently to run the monitor (using a list of CRON expressions, check example below).
  - start_time: When to start the schedule
- labels: Optional list of labels associated with the rule.
- severity: Optional, pre-set the severity of incidents generated by this monitor.
comparisons
- type: absolute_volume or growth_volume.
If absolute_volume:
- operator: One of EQ, GTE, LTE. Operator of comparison: =, ≥, ≤ respectively.
- threshold_lookback_minutes: If operator is EQ, the time to look back to compare with the current value.
- threshold_value: If operator is GTE or LTE, the threshold value
If growth_volume:
- operator: One of EQ, GT, GTE, LT, LTE. Operator of comparison: =, >, ≥, <, ≤ respectively.
- baseline_agg_function: The aggregation function used to aggregate data points to calculate historical baseline. One of AVG, MAX, MIN.
- number_of_agg_periods: The number of periods to use in the aggregate comparison.
- baseline_interval_minutes: The aggregation period length.
- min_buffer_value / max_buffer_value: The lower / upper bound buffer to modify the alert threshold.
- min_buffer_modifier_type / max_buffer_modifier_type: The modifier type of min / max buffer, can be METRIC (absolute value) or PERCENTAGE.
Example
montecarlo:
  field_health:
    - table: project:dataset.table_name
      timestamp_field: created
      schedule:
        type: dynamic
      labels:
        - label_name1
    - table: project:dataset.table_name
      timestamp_field: created
      fields:
        - field_name
      segmented_expressions:
        - segmented_expression
      schedule:
        type: dynamic
  dimension_tracking:
    - table: project:dataset.table_name
      timestamp_field: created
      field: order_status
      labels:
        - label_name2
  custom_sql:
    - description: Test rule
      sql: |
        select foo from project.dataset.my_table
      comparisons:
        - type: threshold
          operator: GT
          threshold_value: 0
      schedule:
        type: fixed
        interval_minutes: 60
        start_time: "2021-07-27T19:00:00"
      severity: SEV1
  freshness:
    - table: project:dataset.table_name
      freshness_threshold: 30
      schedule:
        type: fixed
        interval_minutes: 30
        start_time: "2021-07-27T19:00:00"
Example with multiple tables
montecarlo:
  freshness:
    - tables:
        - project:dataset.table_name1
        - project:dataset.table_name2
      freshness_threshold: 30
      schedule:
        type: fixed
        interval_minutes: 30
        start_time: "2021-07-27T19:00:00"
Example with CRON expressions
montecarlo:
  custom_sql:
    - description: Test rule
      sql: |
        select foo from project.dataset.my_table
      comparisons:
        - type: threshold
          operator: GT
          threshold_value: 0
      schedule:
        type: fixed
        interval_crontab:
          - "0 10,16 * * MON-FRI"
          - "0 12 * * SAT-SUN"
        start_time: "2021-07-27T19:00:00"
Developing and testing locally
To apply monitor configuration to MC:
montecarlo monitors apply --namespace <namespace>
Monitors configured using the CLI are organized under namespaces. All apply operations are scoped to a namespace. Namespaces make it easier to organize and manage monitors as code.
The apply command behaves as follows:
- MC will search for all monitor configuration elements in the project, both in standalone files and embedded in dbt schema files. All monitor configuration elements will be concatenated into a single configuration template.
- MC will apply the configuration template to your MC account:
  - Any new monitors defined since the last apply will be created
  - Any previously defined monitors present in the current configuration template will be updated if any attributes have changed
  - Any previously defined monitors absent from the current configuration template will be deleted
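The create, update, and delete rules above amount to a diff between the previously applied template and the current one within a namespace. The sketch below shows that logic in Python under a simplifying assumption: each monitor is identified by a name and described by a dictionary of attributes. plan_apply is a hypothetical helper, not a Monte Carlo API.

```python
def plan_apply(previous, current):
    """Compare two {monitor_name: attributes} mappings and return the planned operations."""
    # New since the last apply: present now, absent before.
    create = sorted(name for name in current if name not in previous)
    # Absent from the current template: present before, missing now.
    delete = sorted(name for name in previous if name not in current)
    # Present in both, but with changed attributes.
    update = sorted(
        name for name in current
        if name in previous and current[name] != previous[name]
    )
    return {"create": create, "update": update, "delete": delete}


if __name__ == "__main__":
    previous = {
        "orders_freshness": {"freshness_threshold": 30},
        "orders_volume": {"volume_metric": "row_count"},
    }
    current = {
        "orders_freshness": {"freshness_threshold": 60},
        "orders_field_health": {"timestamp_field": "created"},
    }
    print(plan_apply(previous, current))
```

One practical consequence is that removing a monitor definition from the project, or applying from a directory that omits it, deletes that monitor in the namespace on the next apply.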
Dry Runs
The apply command also supports a --dry-run argument, which performs a dry run of the configuration update and reports each operation. This argument only shows the planned changes; it doesn't apply them.
To delete (destroy) a namespace:
montecarlo monitors delete --namespace <namespace>
This will delete all monitors for a given namespace.