Azure Data Factory (Beta)

With the Monte Carlo Azure Data Factory (ADF) Integration, you can quickly determine which ADF pipeline potentially caused an anomaly downstream, accelerating your time to resolution. You will also be able to manage your ADF pipeline failure alerts along with all other data quality alerts in Monte Carlo, so you have centralized incident triage, notification routing, and data quality reporting across all your data and system issues.

I. ADF Pipeline in Lineage & ADF Pipeline as Asset

ADF Pipeline in Lineage

The integration generates lineage from each ADF pipeline that moves data from one table to another, so you can see at a glance which ADF pipeline creates the lineage between two tables. Click the ADF icon to see recent runs of that pipeline and its activities.

ADF Pipeline as Asset: Run History and Run Details

Use Monte Carlo as a single pane of glass for all data quality context for your stack, including ADF pipeline run results. On the asset summary page of a table updated by an ADF pipeline, the "ADF pipeline runs" module shows the pipeline's run history, which helps you understand how pipeline runs correlate with the freshness, volume, and other aspects of your table.

Go to the asset page for an ADF pipeline to check the status and duration of its recent runs. For each pipeline run, you can also view the dependencies among the activities that ran as part of the pipeline.

How to Set Up

  1. Create Azure Service Principal with Reader access to your factory
  2. Create the ADF integration at https://getmontecarlo.com/settings/integrations

Create Azure Service Principal with Reader access to your factory

To obtain ADF pipeline metadata, Monte Carlo needs service principal credentials that have Reader access to your factory.

  1. Create a new Azure App Registration by visiting the Azure Portal: https://portal.azure.com/#view/Microsoft_AAD_RegisteredApps/ApplicationsListBlade
  2. Copy the following values from your App Registration:
    1. Tenant ID
    2. Client ID
  3. Create a new Secret in your new App Registration
    1. Click "Manage", "Certificates & secrets", "New client secret"
  4. Copy the "value" of your Client Secret (note: this secret can only be viewed once!)
  5. From the Azure Portal, search for your factory and copy the following values:
    1. Resource Group name
    2. Subscription ID
  6. On this same page, select Access control (IAM) and create a new Role Assignment
  7. Choose the "Reader" role
  8. Select your newly created App Registration and save the Role Assignment

Congrats! You've created a new service principal with read access to your factory. The next step is to provide these credentials to Monte Carlo.
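Before handing the credentials over, it can help to confirm that the Tenant ID, Client ID, and Client Secret you copied actually yield a token. The sketch below is one way to do that, assuming the standard Microsoft identity platform client-credentials flow against Azure Resource Manager. The function names are illustrative and are not part of Monte Carlo or any Azure SDK.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth2 client-credentials token request.

    Assumption: the token is requested for Azure Resource Manager, which is
    the API surface that exposes Data Factory metadata.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def fetch_access_token(request):
    """Send the request and return the bearer token from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

Calling `fetch_access_token(build_token_request(tenant_id, client_id, client_secret))` with your own values should return a token string. An HTTP 400 or 401 error at this point usually indicates a mistyped ID or an expired or mis-copied secret.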

Create the ADF integration in Monte Carlo

To create the Azure Data Factory integration in Monte Carlo, start by logging into your account and visiting the Integrations page: https://getmontecarlo.com/settings/integrations

  1. From the Orchestration section, select Create followed by Azure Data Factory
  2. Complete the form with the credentials copied in the previous steps
  3. Click Add and Monte Carlo will verify that your credentials have the required access to the ADF REST APIs
  4. Click Continue to save the integration

Congrats! You now have an Azure Data Factory integration added to your Monte Carlo account. Please allow 24 hours for the integration to collect your pipeline metadata and lineage.
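If credential verification fails, you can check independently whether the service principal can read your factory. The sketch below builds a "list pipelines" call against the ADF REST API (API version 2018-06-01), which the Reader role should permit. It is not Monte Carlo's own verification logic, and it makes two assumptions: that you already hold a bearer token for Azure Resource Manager, and that you know the factory name, which is not one of the values collected in the steps above. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2018-06-01"  # current ADF REST API version


def factory_resource_id(subscription_id, resource_group, factory_name):
    """Return the Azure Resource Manager ID of a Data Factory instance."""
    parts = [urllib.parse.quote(p, safe="")
             for p in (subscription_id, resource_group, factory_name)]
    return ("/subscriptions/{}/resourceGroups/{}"
            "/providers/Microsoft.DataFactory/factories/{}").format(*parts)


def build_list_pipelines_request(access_token, subscription_id,
                                 resource_group, factory_name):
    """Build (but do not send) a GET request that lists the factory's pipelines."""
    query = urllib.parse.urlencode({"api-version": API_VERSION})
    url = ("https://management.azure.com"
           + factory_resource_id(subscription_id, resource_group, factory_name)
           + "/pipelines?" + query)
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def pipeline_names(response_body):
    """Extract pipeline names from the JSON body of a list-pipelines response."""
    return [item["name"] for item in json.loads(response_body).get("value", [])]
```

Sending the request with `urllib.request.urlopen` and passing the decoded body to `pipeline_names` should list your pipelines. An HTTP 403 response suggests the Reader role assignment is missing or was made at a scope that does not include the factory.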


II. ADF Pipeline Failures in Monte Carlo

This feature is currently in development and is expected to be available in the coming weeks.



III. FAQs

Are multiple Azure Data Factory factories supported?

Yes! Repeat the Monte Carlo integration onboarding at https://getmontecarlo.com/settings/integrations for each of your factories.

How long does it take for Azure Data Factory data to show up in Monte Carlo?

Pipeline and activity run data is available immediately on the asset page and in incidents. However, ADF data in Lineage and the Catalog may be delayed by up to 24 hours due to batch processing.