The following templates are available by default for SQL Rules in Monte Carlo. Users can also create their own, custom templates.

To use the templates below, the user must populate the {{variable}} values. This can either be done by adding variable values in the 'advanced' dropdown of the Define SQL section. Or, they can be manually overwritten in the SQL itself.

By far, the most common threshold for these templates is "alert when row count is greater than 0".


Referential integrity

All IDs in a child table must be present in a parent table.

WITH child 
     AS (SELECT {{column}} AS id 
         FROM   {{table_1}}), 
     parent 
     AS (SELECT {{column}} AS id 
         FROM   {{table_2}}) 
SELECT * 
FROM   child 
       LEFT JOIN parent using (id) 
WHERE  child.id IS NOT NULL 
       AND parent.id IS NULL

No duplicates

No duplicates in a given column.

SELECT {{column}}
FROM   {{table}}
GROUP  BY {{column}} 
HAVING Count(*) > 1

Label validation

All values in a column are part of a set of valid labels.

SELECT {{column}} 
FROM   {{table}} 
WHERE  {{column}} NOT IN ( '{{value_1}}', '{{value_2}}' )

Null values

There are no records with a NULL value in a given column.

SELECT {{column}} 
FROM   {{table}} 
WHERE  {{column}} IS NULL

Empty strings

There are no records with an empty string in a given column.

SELECT {{column}} 
FROM   {{table}} 
WHERE  TRIM({{column}}) = ''

Minimum and maximum

There are no values that exceed particular thresholds.

SELECT {{column}} 
FROM   {{table}} 
WHERE  NOT( {{column}} > {{value_1}} 
            AND {{column}} <= {{value_2}} )

Null-like strings

No records contain the string "Null" or "None".

SELECT {{column}} 
FROM   {{table}}
WHERE  Lower({{column}}) LIKE 'null' 
        OR Lower({{column}}) LIKE 'none'

Row count match

Two tables have the same number of rows.

WITH t1 
     AS (SELECT Count(*) 
         FROM   {{table_1}}), 
     t2 
     AS (SELECT Count(*) 
         FROM   {{table_2}}), 
     t3 
     AS (SELECT (SELECT * 
                 FROM   t1) - (SELECT * 
                               FROM   t2) AS diff_count) 
SELECT diff_count 
FROM   t3 
WHERE  diff_count != 0

Identical tables

Two tables are identical record for record.

(SELECT * 
 FROM   {{table_1}} 
 {{except_operator}} 
 SELECT * 
 FROM   {{table_2}}) 
UNION ALL 
(SELECT * 
 FROM   {{table_2}} 
 {{except_operator}}  
 SELECT * 
 FROM   {{table_1}})

The "except_operator" varies per warehouse. On BigQuery, you can use EXCEPT DISTINCT, while EXCEPT works for Redshift and Snowflake.


More than one value

A column has at least two distinct values.

SELECT Count(DISTINCT {{column}}) c 
FROM   {{table}}
HAVING c = 1

Record validation

A SQL expression holds true for all individual records in a table.

SELECT * 
FROM   {{table}} 
WHERE  NOT({{expression}})

An example of an expression is {{column_1}} + {{column_2}} = {{value}}, which would check that the sum of two values is constant.


String patterns

All strings in column match a particular pattern, expressed as a regular expression. Can be used to validate common patterns like UUIDs, phone numbers, emails, numbers, escape characters, dates, etc.

SELECT * 
FROM   {{table}} 
WHERE  NOT({{regex_function}}({{column}}, {{pattern}}))

Please note:

  • {{regex_function}} will be the appropriate regex matching function for your use case and your data warehouse. For example, to fully match the string with a pattern on Snowflake you would use REGEXP_LIKE, whereas to partially match the string with a pattern on BigQuery, you would use REGEXP_CONTAINS.
  • {{pattern}} can be any regular expression. For example, to check for US zip codes, you might use ^[0-9]{5}(?:-[0-9]{4})?$.