In metric monitors, users can now define segments with a SQL Expression. Previously, segmentation could only be configured by picking 1 or 2 fields.

This supports a long tail of segmentation use cases. For example, users can now concatenate several fields into one segment (e.g. to segment by 4 or 5 different fields), or shorten very long field values that impair usability.
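As a rough illustration of what such an expression can look like, the sketch below runs a segment expression against an in-memory SQLite table. The table, the column names, and the expression itself are hypothetical, and the concatenation and substring syntax shown is SQLite's; the equivalent functions in your warehouse may be spelled differently.

```python
import sqlite3

# Hypothetical table and columns, used only to illustrate a segment expression.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (region TEXT, channel TEXT, tier TEXT, campaign TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("us", "web", "gold", "spring_sale_2024_email_retargeting_v2"),
        ("us", "web", "gold", "spring_sale_2024_email_retargeting_v2"),
        ("eu", "app", "free", "none"),
    ],
)

# One expression that concatenates several fields and shortens a long value
# by keeping only its first 11 characters.
segment_expr = (
    "region || '-' || channel || '-' || tier || '-' || substr(campaign, 1, 11)"
)

# Group by the computed segment, as a segmented metric would.
rows = conn.execute(
    f"SELECT {segment_expr} AS segment, COUNT(*) AS row_count "
    "FROM orders GROUP BY segment ORDER BY segment"
).fetchall()

for segment, row_count in rows:
    print(segment, row_count)
```

Only the expression held in `segment_expr` corresponds to what would be entered as the segment definition; the surrounding query is just scaffolding to show the resulting segments.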

Learn more about segmentation.

Up until this week, Cardinality Rules and Referential Integrity Rules were options on the Monitor Menu in Monte Carlo. These were purpose-built monitor creation experiences that produced a SQL Rule, for use cases like:

  • Alert me if any of the values in [field] are not included in set [value1, value2, value3, etc]
  • Alert me if any of the values in [field] are not present in [table > field]

Existing Cardinality Rules and Referential Integrity Rules will continue to run normally, but the workflows to create them have now been removed from the monitor menu.

Referential Integrity Rules and Cardinality Rules were made redundant by new ways to define a set in Validation Monitors. In Validation Monitors, the recommended way to address these use cases is now with the Is in set and Is not in set operators, which allow a user to define a set:

  • From a list: manually enter the values
  • From a field: select a table and field, and the set will be populated with the distinct values in that field. The selected field will be referenced each time the monitor is run, so the set values may change.
  • From a query: write SQL to define the values in the set. The query will run each time the monitor is run, so the set values may change.

Read more about this change here.

The is in set and is not in set operators now allow users to define a set using 3 possible methods:

  • From a list: manually enter the values
  • From a field: select a table and field, and the set will be populated with the distinct values in that field. The selected field will be referenced each time the monitor is run, so the set values may change.
  • From a query: write SQL to define the values in the set. The query will run each time the monitor is run, so the set values may change.

Previously, "From a list" was the only way to define a set. These new options make it easy to generate large sets and keep them automatically updated as your data evolves. They are ideal for referential integrity checks and scenarios where you have large numbers of allowed values.
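To make the difference between a fixed list and a set that is re-read on every run concrete, here is a minimal sketch using an in-memory SQLite database. The tables, columns, and values are hypothetical, and the queries only approximate the kind of check these operators express; they are not the SQL that Monte Carlo generates.

```python
import sqlite3

# Hypothetical tables used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES
        (10, 1, 'shipped'),
        (11, 2, 'pending'),
        (12, 9, 'lost');
    """
)

# "From a list": the allowed values are entered by hand and never change.
allowed_statuses = ("shipped", "pending", "cancelled")
placeholders = ", ".join("?" for _ in allowed_statuses)
bad_status = conn.execute(
    f"SELECT order_id FROM orders WHERE status NOT IN ({placeholders})",
    allowed_statuses,
).fetchall()

# "From a field" / "From a query": the set is read from another table on each
# run, so new customers are picked up automatically (a referential integrity check).
orphans = conn.execute(
    "SELECT order_id FROM orders "
    "WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM customers)"
).fetchall()

print("orders with an unexpected status:", bad_status)
print("orders with an unknown customer:", orphans)
```

Because the subquery is evaluated on every run, adding a row to `customers` changes the result of the second check without any edit to the check itself, which is the property that makes the field and query options suited to large or changing sets.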

dbt and Airflow alerts will now be raised on ALL tables in Monte Carlo, regardless of whether the table is monitored or muted.

The alerts can be configured on a Job-level basis for each integration.

dbt

To configure which dbt Jobs will raise alerts, go to Settings -> Integrations -> dbt integration -> Edit and find the below configuration.

Learn more at dbt Failures As Monte Carlo Alerts

Airflow

To configure which Airflow Jobs will raise alerts, there are two options:

  1. Go to Settings -> Integrations -> Airflow integration -> Configure Jobs and find the below configuration:

  2. Under Assets, search and navigate to the Airflow Job that you want to configure alerts for. On the top right, toggle "Generates alerts".

Learn more at Airflow Alerts and Task Observability

dbt snapshot and seed errors are now available in Monte Carlo as alerts, alongside model and test alerts. To configure whether these alerts are sent, go to Settings -> dbt integrations -> Edit. To receive notifications, make sure to add the new alert types to the relevant audiences. (docs)

snapshot errors in alert feed


Configure alerts options in Settings

Now that SQL query history is available, several features have been added to the Databricks integration:

  • Importance scores: estimates the importance (0 to 1) of assets based on various query history data (see details here).
  • Usage stats: usage information such as reads/day, writes/day, and users is now available on the "general information" tab of the "assets" page.
  • Performance dashboard and monitors: the dashboard helps identify the slowest SQL queries and enables investigation on performance issues. Users can also use performance monitors to detect slow running SQL queries.
  • Query logs: SQL query logs are now shown on the assets page
  • Query Change RCA insights: associated SQL query changes are presented for a volume / freshness / field alert to help uncover query related root causes.

Limitations: these features are available only for assets on Unity Catalog and cover only SQL queries.

To enable these features, grant read permission on the system.query.history table, per the docs here.

Usage stats

Performance Dashboard for Databricks SQL queries

Databricks Query Logs

Query Change Insights in Incident

For many businesses, incoming data looks very different during major holidays. Maybe there is more or less data than normal, or the profile of that data is much different. For example, a financial technology firm may see less new data than normal on July 4, because markets are closed in the United States. Or an e-commerce company based in the United States may see more new data than normal on Black Friday, because it's one of the busiest shopping days of the year.

You can now easily use exclusion windows to Manage Holidays in order to:

  • Prevent unusual holiday data from influencing machine learning thresholds
  • Avoid receiving unwanted alerts during holidays

The available holidays include the 11 Federal Holidays within the United States, plus a few additional unofficial holidays that often influence data (e.g. Black Friday).
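To see why excluding holiday data matters for thresholds, consider the simplified sketch below. It uses a plain mean plus or minus three standard deviations as a stand-in for a learned threshold; this is an assumption made for illustration and not how Monte Carlo's models work. The row counts are invented, and `thresholds` is a hypothetical helper.

```python
from datetime import date
from statistics import mean, stdev

# Invented daily row counts around US Thanksgiving 2024.
daily_rows = {
    date(2024, 11, 25): 1000,
    date(2024, 11, 26): 1020,
    date(2024, 11, 27): 980,
    date(2024, 11, 28): 400,   # Thanksgiving: unusually little data
    date(2024, 11, 29): 3000,  # Black Friday: unusually large volume
    date(2024, 11, 30): 1010,
    date(2024, 12, 1): 990,
}

# Days covered by an exclusion window.
excluded = {date(2024, 11, 28), date(2024, 11, 29)}


def thresholds(counts, k=3.0):
    """Return (low, high) as mean -/+ k sample standard deviations."""
    m, s = mean(counts), stdev(counts)
    return m - k * s, m + k * s


# Thresholds computed from every day, holidays included.
low_all, high_all = thresholds(list(daily_rows.values()))

# Thresholds computed only from days outside the exclusion window.
training = [n for d, n in daily_rows.items() if d not in excluded]
low_excl, high_excl = thresholds(training)

print(f"with holidays:    {low_all:.0f} to {high_all:.0f}")
print(f"holidays removed: {low_excl:.0f} to {high_excl:.0f}")
```

With the two holiday days included, the band becomes so wide that it would miss most real anomalies on ordinary days; with them excluded, the band stays close to normal volume.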