Additional Support for ADF in Lineage

Monte Carlo also offers query tagging as an additional way to surface ADF pipeline info in lineage. This is needed in cases where a custom SQL query is executed as part of an ADF pipeline. Examples of this include use of the Script Activity or Stored Procedure Activity. In those cases Monte Carlo can't provide ADF info in lineage without the queries being tagged as described below.

📘

ADF Connection Required for ADF in Lineage

An ADF connection in Monte Carlo is necessary to display ADF lineage.

How to tag a query

Monte Carlo will look for the following tags in queries (uppercase or lowercase can be used):
adf_pipeline_name, adf_activity_name.

Tags are added to a query using a special comment. The comment should be a JSON formatted object with the tag names as keys. Example:

-- {"adf_pipeline_name": "my-pipeline", "adf_activity_name": "my-activity"}

INSERT INTO PUBLIC.MY_TABLE ... ;

In the case when Script Activity is used, the query is written in ADF and therefore the tags are added to the query in ADF. If the Stored Procedure Activity is used, the query is written in the target data warehouse, and so the tags need to be added to the stored procedure in the data warehouse system.

Note: if tags are added to a stored procedure, they will be included in all executions of the stored procedure, regardless of whether it was actually triggered by ADF or not. It is therefore recommended to only add ADF tags to stored procedures that are intended to only be executed by ADF.

If the query is written in ADF, the name of the pipeline can be retrieved from the System Variable pipeline().Pipeline instead of hard-coding it. Example:

-- {"adf_pipeline_name": "@{pipeline().Pipeline}", "adf_activity_name": "my-activity"}

INSERT INTO PUBLIC.MY_TABLE ... ;

ADF will replace @{pipeline().Pipeline} with the name of the pipeline before sending the query to the target system.

Snowflake

If the target system is Snowflake and the query is written in ADF (using the Script Activity), then comments are removed before the query is executed in Snowflake and the tags will be lost. In this case, the tags can be passed as Snowflake Query Tags instead of comments. Example:

alter session set query_tag='{"adf_pipeline_name": "@{pipeline().Pipeline}", "adf_activity_name": "my-activity"}';

INSERT INTO PUBLIC.MY_TABLE ... ;

Multiple ADF Integrations

If you have multiple Azure Data Factory integrations, you may need to include an additional query tag to specify the integration. This is only required in the situation where there are shared Pipeline or Activity names between different ADF integrations.

Monte Carlo will look for the following tags to identify the integration (lowercase or uppercase can be used): adf_environment, adf_env.

The value of the environment tag must be the name of the integration in Monte Carlo.
Example:

-- {"adf_environment_name": "ADF-Prod", "adf_pipeline_name": "@{pipeline().Pipeline}", "adf_activity_name": "my-activity"}

INSERT INTO PUBLIC.MY_TABLE ... ;