Templates
The following templates are available by default for SQL Rules in Monte Carlo. Users can also create their own, custom templates.
To use the templates below, the user must populate the {{variable}} values. This can either be done by adding variable values in the 'advanced' dropdown of the Define SQL section. Or, they can be manually overwritten in the SQL itself.
By far, the most common threshold for these templates is "alert when row count is greater than 0".
Referential integrity
All IDs in a child table must be present in a parent table.
WITH child
AS (SELECT {{column}} AS id
FROM {{table_1}}),
parent
AS (SELECT {{column}} AS id
FROM {{table_2}})
SELECT *
FROM child
LEFT JOIN parent using (id)
WHERE child.id IS NOT NULL
AND parent.id IS NULL
No duplicates
No duplicates in a given column.
SELECT {{column}}
FROM {{table}}
GROUP BY {{column}}
HAVING Count(*) > 1
Label validation
All values in a column are part of a set of valid labels.
SELECT {{column}}
FROM {{table}}
WHERE {{column}} NOT IN ( '{{value_1}}', '{{value_2}}' )
Null values
There are no records with a NULL value in a given column.
SELECT {{column}}
FROM {{table}}
WHERE {{column}} IS NULL
Empty strings
There are no records with an empty string in a given column.
SELECT {{column}}
FROM {{table}}
WHERE TRIM({{column}}) = ''
Minimum and maximum
There are no values that exceed particular thresholds.
SELECT {{column}}
FROM {{table}}
WHERE NOT( {{column}} > {{value_1}}
AND {{column}} <= {{value_2}} )
Null-like strings
No records contain the string "Null" or "None".
SELECT {{column}}
FROM {{table}}
WHERE Lower({{column}}) LIKE 'null'
OR Lower({{column}}) LIKE 'none'
Row count match
Two tables have the same number of rows.
WITH t1
AS (SELECT Count(*)
FROM {{table_1}}),
t2
AS (SELECT Count(*)
FROM {{table_2}}),
t3
AS (SELECT (SELECT *
FROM t1) - (SELECT *
FROM t2) AS diff_count)
SELECT diff_count
FROM t3
WHERE diff_count != 0
Identical tables
Two tables are identical record for record.
(SELECT *
FROM {{table_1}}
{{except_operator}}
SELECT *
FROM {{table_2}})
UNION ALL
(SELECT *
FROM {{table_2}}
{{except_operator}}
SELECT *
FROM {{table_1}})
The "except_operator" varies per warehouse. On BigQuery, you can use EXCEPT DISTINCT
, while EXCEPT
works for Redshift and Snowflake.
More than one value
A column has at least two distinct values.
SELECT Count(DISTINCT {{column}}) c
FROM {{table}}
HAVING c = 1
Record validation
A SQL expression holds true for all individual records in a table.
SELECT *
FROM {{table}}
WHERE NOT({{expression}})
An example of an expression is {{column_1}} + {{column_2}} = {{value}}
, which would check that the sum of two values is constant.
String patterns
All strings in column match a particular pattern, expressed as a regular expression. Can be used to validate common patterns like UUIDs, phone numbers, emails, numbers, escape characters, dates, etc.
SELECT *
FROM {{table}}
WHERE NOT({{regex_function}}({{column}}, {{pattern}}))
Please note:
{{regex_function}}
will be the appropriate regex matching function for your use case and your data warehouse. For example, to fully match the string with a pattern on Snowflake you would use REGEXP_LIKE, whereas to partially match the string with a pattern on BigQuery, you would use REGEXP_CONTAINS.{{pattern}}
can be any regular expression. For example, to check for US zip codes, you might use^[0-9]{5}(?:-[0-9]{4})?$
.
Updated 4 days ago