Custom Monitors - SQL Rules

This guide explains how to create SQL Rules in Monte Carlo.

Introduction to SQL rules

With this advanced feature, you can write custom SQL to check for anomalies specific to your business. Given the flexibility of SQL, Monte Carlo customers often use these rules to get notified when thresholds for null values, referential integrity, empty strings, etc. are breached.

A list of common use cases and associated SQL templates can be found below .

To view current SQL Rules and create new rules, visit the Monitors tab in the Monte Carlo app and click on the Rules bar.

When creating a new SQL rule, keep the following in mind:

  • The SQL query must be a valid expression that can be run as a subquery. To avoid pulling data from your warehouse, Monte Carlo will wrap your expression so it only fetches the number of rows returned. This means rules should be written such that the query returns the anomalous rows.
  • The SQL query must take less than 15 minutes to run. We recommend filtering on partitions and indexes to minimize query time.
  • When testing the SQL query from the app, it must take less than 30 seconds to run. You can add a limit to reduce query time, or test the query in your warehouse.

Setting up a SQL Rule

To create a rule in Monte Carlo, please visit the monitors page, select Rules, and follow the wizard.

Common SQL rules

The following patterns cover some common scenarios you can address with SQL rules.

📘

How should I use the templates in this article?

  1. Replace any {{table}} and {{column}} placeholders with the table and column names you want to test for. {{value}} placeholders should be replaced with thresholds that makes sense for your domain. The query should return only rows you would consider anomalous, or breaking the rule.
  2. Set the rule to alert when row count is greater than 0.

🚧

Please consider the volume of data you will be processing in your rule

To prevent expensive queries for large tables, we recommend filtering records on a partition (e.g. rows created in the last day).

Referential integrity
All IDs in a child table must be present in a parent table.

WITH child 
     AS (SELECT {{column}} AS id 
         FROM   {{table_1}}), 
     parent 
     AS (SELECT {{column}} AS id 
         FROM   {{table_2}}) 
SELECT * 
FROM   child 
       LEFT JOIN parent using (id) 
WHERE  child.id IS NOT NULL 
       AND parent.id IS NULL

No duplicates
No duplicates in a given column.

SELECT {{column}}
FROM   {{table}}
GROUP  BY {{column}} 
HAVING Count(*) > 1

Label validation
All values in a column are part of a set of valid labels.

SELECT {{column}} 
FROM   {{table}} 
WHERE  {{column}} NOT IN ( '{{value_1}}', '{{value_2}}' )

Null values
There are no records with a NULL value in a given column.

SELECT {{column}} 
FROM   {{table}} 
WHERE  {{column}} IS NULL

Empty strings
There are no records with an empty string in a given column.

SELECT {{column}} 
FROM   {{table}} 
WHERE  TRIM({{column}}) = ''

Minimum and maximum
There are no values that exceed particular thresholds.

SELECT {{column}} 
FROM   {{table}} 
WHERE  NOT( {{column}} > {{value_1}} 
            AND {{column}} <= {{value_2}} )

Null-like strings
No records contain the string "Null" or "None".

SELECT {{column}} 
FROM   {{table}}
WHERE  Lower({{column}}) LIKE 'null' 
        OR Lower({{column}}) LIKE 'none'

Row count match
Two tables have the same number of rows.

WITH t1 
     AS (SELECT Count(*) 
         FROM   {{table_1}}), 
     t2 
     AS (SELECT Count(*) 
         FROM   {{table_2}}), 
     t3 
     AS (SELECT (SELECT * 
                 FROM   t1) - (SELECT * 
                               FROM   t2) AS diff_count) 
SELECT diff_count 
FROM   t3 
WHERE  diff_count != 0

Identical tables
Two tables are identical record for record.

(SELECT * 
 FROM   {{table_1}} 
 {{except_operator}} 
 SELECT * 
 FROM   {{table_2}}) 
UNION ALL 
(SELECT * 
 FROM   {{table_2}} 
 {{except_operator}}  
 SELECT * 
 FROM   {{table_1}})

📘

What should I use for the except operator?

This operator varies per warehouse.

On BigQuery you can use EXCEPT DISTINCT, while EXCEPT works for Redshift and Snowflake.

More than one value
A column has at least two distinct values.

SELECT Count(DISTINCT {{column}}) c 
FROM   {{table}}
HAVING c = 1

Record validation
A SQL expression holds true for all individual records in a table.

SELECT * 
FROM   {{table}} 
WHERE  NOT({{expression}})

📘

What is an example of an expression?

For instance, you can check that the sum of two columns is constant
{{expression}} = {{column_1}} + {{column_2}} = {{value}}

String patterns
All strings in column match a particular pattern, expressed as a regular expression. Can be used to validate common patterns like UUIDs, phone numbers, emails, numbers, escape characters, dates, etc.

Please note:

  • {{regex_function}} will be the appropriate regex matching function for your use case and your data warehouse. For example, to fully match the string with a pattern on Snowflake you would use REGEXP_LIKE, whereas to partially match the string with a pattern on BigQuery, you would use REGEXP_CONTAINS.
  • {{pattern}} can be any regular expression. For example, to check for US zip codes, you might use ^[0-9]{5}(?:-[0-9]{4})?$.
SELECT * 
FROM   {{table}} 
WHERE  NOT({{regex_function}}({{column}}, {{pattern}}))

Did this page help you?