Field Lineage
Overview
The field lineage product (located in Catalog page under Fields section) aims to help users track field level dependencies among tables and reports. Monte Carlos displays 2 types of field relationships:
- SELECT clause lineage: field relationships defined by SQL clause
SELECT
; these are field-to-field relationships where a change in upstream field directly changes the downstream field. - Non SELECT lineage: relationships defined by all other SQL clauses, i.e.
WHERE
; these are field-to-table relationships, where the downstream fields are often shaped by a filtering or ordering logic defined by upstream fields.
Please note that in the UI, for a chosen field, its upstream non-SELECT lineage fields are the filtering/ordering fields that result in the chosen field, where as its downstream non-SELECT lineage fields are the resulting fields from the filtering/ordering logic defined by the chosen field.
Here is a video that explains the product.
Current Limitations
Currently, there are a number of exceptions to what the field lineage tool covers. However, Monte Carlo is continuously adding coverage for these use cases and this list is shortening every week.
-
Wildcard Tables:
Field lineages with wildcard tables are not covered. -
JOIN
+USING
Clause Pair:
If the fields defined followingUSING
clause do not have prefix, those columns are not covered. -
BI Integrations:
To enable Tableau field level lineage, please make sure your data-collector's version is equals to or above v2530. If Monte Carlo hosts your data collector, please reach out to your Monte Carlo representative for help. -
Nested Fields:
Currently, nested fields are not supported in field lineage.
Field lineage with other BI tools are currently being developed and will be available soon.
Tips to Optimize Queries for Field Lineage
Monte Carlo will be able to parse majority of your query cases. Nevertheless, following the suggestions below will help Monte Carlo most accurately capture your field lineages even in the edge cases.
-
Add table prefixes when
SELECT
ing columns.
In complex queries with subqueries, adding prefixes to indicate table sources will help strengthen field lineage coverage by Monte Carlo.
Preferred:select a.col1 as col1, b.col2 as col2 from table1 as a inner join table2 as b on a.id = b.id
Try to avoid:
select col1 as col1, col2 as col2 from table1 as a inner join table2 as b on a.id = b.id
-
Use alias in
SELECT
ed columns.Use
as
to define column names as much as possible, especially for columns that contain complex functions/expressions. This will increase accuracy of the field lineage coverage.Preferred:
select concat(a.name, b.email) as personal_info from table1 as a inner join table2 as b on a.id = b.id
Try to avoid:
select concat(a.name, b.email) from table1 as a inner join table2 as b on a.id = b.id
-
Specify column names when possible, instead of
SELECT *
.Monte Carlo SQL parser could understand most of the complex nested
SELECT *
situations. Whenever possible, name columns instead of using*
. Fields can be especially easily mismatched when multipleSELECT *
exist in a query and columns from different tables have the same names.Preferred:
select a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name from table1
Try to avoid:
select * from table1 as a inner join table2 as b on a.id = b.id
Note for
*
used in functions, i.e.create or replace table final_count as select count(*) as total_count from table1
, Monte Carlo would consider every column in table1 as a source tototal_count
column infinal_count
table.
Updated about 1 year ago