The field lineage product (located in Catalog page under Fields section) aims to help users track field level dependencies among tables and reports. Monte Carlos displays 2 types of field relationships:
- SELECT clause lineage: field relationships defined by SQL clause
SELECT; these are field-to-field relationships where a change in upstream field directly changes the downstream field.
- Non SELECT lineage: relationships defined by all other SQL clauses, i.e.
WHERE; these are field-to-table relationships, where the downstream fields are often shaped by a filtering or ordering logic defined by upstream fields.
Please note that in the UI, for a chosen field, its upstream non-SELECT lineage fields are the filtering/ordering fields that result in the chosen field, where as its downstream non-SELECT lineage fields are the resulting fields from the filtering/ordering logic defined by the chosen field.
Here is a video that explains the product.
Currently, there are a number of exceptions to what the field lineage tool covers. However, Monte Carlo is continuously adding coverage for these use cases and this list is shortening every week.
- Wildcard Tables: Field lineages with wildcard tables are not covered.
USINGClause Pair: If the fields defined following
USINGclause do not have prefix, those columns are not covered.
- BI Integrations Field lineage with BI tools are currently being developed and will be available soon.
Below are tips on how to optimize your queries to help Monte Carlo mostly accurately capture the field lineages for your data.
Add table prefixes when
In complex queries with subqueries, adding prefixes to indicate table sources will help strengthen field lineage coverage by Monte Carlo.
select a.col1 as col1, b.col2 as col2 from table1 as a inner join table2 as b on a.id = b.id
Try to avoid:
select col1 as col1, col2 as col2 from table1 as a inner join table2 as b on a.id = b.id
Use alias in
asto define column names as much as possible, especially for columns that contain complex functions/expressions. This will increase accuracy of the field lineage coverage.
select concat(a.name, b.email) as personal_info from table1 as a inner join table2 as b on a.id = b.id
Try to avoid:
select concat(a.name, b.email) from table1 as a inner join table2 as b on a.id = b.id
Whenever possible, name columns instead of using
*. Fields can be especially easily mismatched when multiple
SELECT *exist in a query and columns from different tables have the same names.
select a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name from table1
Try to avoid:
select * from table1 as a inner join table2 as b on a.id = b.id
*used in functions, i.e.
create or replace table final_count as select count(*) as total_count from table1, Monte Carlo would consider every column in table1 as a source to
Updated 5 days ago