Azure Data Factory (public preview)

With the Monte Carlo Azure Data Factory (ADF) Integration, you can quickly determine which ADF pipeline potentially caused an anomaly downstream, accelerating your time to resolution. You will also be able to manage your ADF pipeline failure alerts along with all other data quality alerts in Monte Carlo, so you have centralized incident triage, notification routing, and data quality reporting across all your data and system issues.

I. ADF Pipeline in Lineage & ADF Pipeline as Asset

ADF Pipeline in Lineage

The integration generates lineage from each ADF pipeline that moves data from point A to point B, so you can see at a glance which ADF pipeline creates the lineage between two tables. Click the ADF icon to see recent pipeline and activity runs for that pipeline.

ADF Pipeline as Asset: Run History and Run Details

Use Monte Carlo as a single pane of glass for all data quality context for your stack, including ADF pipeline run results. On the asset summary page of a table updated by an ADF pipeline, the "ADF pipeline runs" module shows the pipeline's run history, which helps you understand how pipeline runs correlate with the freshness, volume, and other aspects of your table.

Go to the asset page for an ADF pipeline to check the status and duration of its recent runs. For each pipeline run, you can also view the dependencies among the activities that ran as part of the pipeline.

How to Set Up

  1. Create Azure Service Principal with Reader access to your factory
  2. Create the ADF integration at https://getmontecarlo.com/settings/integrations

Create Azure Service Principal with Reader access to your factory

To obtain ADF pipeline metadata, Monte Carlo needs service principal credentials that have Reader access to your ADF factory.

  1. Create a new Azure App Registration by visiting the Azure Portal: https://portal.azure.com/#view/Microsoft_AAD_RegisteredApps/ApplicationsListBlade
  2. Copy the following values from your App Registration:
    1. Tenant ID
    2. Client ID
  3. Create a new Secret in your new App Registration:
    1. Click "Manage", "Certificates & secrets", "New client secret"
  4. Copy the "value" of your Client Secret (note: this secret can only be viewed once!)
  5. From the Azure Portal, search for your factory and copy the following values:
    1. Resource Group name
    2. Subscription ID
  6. On this same page, select Access control (IAM) and create a new Role Assignment
  7. Choose the "Reader" role
  8. Select your newly created App Registration and save the Role Assignment

Congrats! You've created a new service principal with read access to your factory. The next step is to provide these credentials to Monte Carlo.
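
Before pasting the values into Monte Carlo, it can help to sanity-check them. The following is a minimal Python sketch, not part of Monte Carlo or Azure tooling. The `AdfCredentials` class and its field names are hypothetical, and the checks rest on two assumptions: that tenant, client, and subscription IDs are GUIDs, and that a client secret shaped like a GUID is probably the Secret ID column rather than the secret value.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


def _is_guid(value: str) -> bool:
    """Return True if the string parses as a GUID/UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


@dataclass(frozen=True)
class AdfCredentials:
    """Hypothetical container for the values copied from the Azure Portal."""

    tenant_id: str
    client_id: str
    client_secret: str
    subscription_id: str
    resource_group: str

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the values look plausible."""
        issues = []
        for name in ("tenant_id", "client_id", "subscription_id"):
            if not _is_guid(getattr(self, name)):
                issues.append(f"{name} does not look like a GUID")
        if not self.client_secret:
            issues.append("client_secret is empty")
        elif _is_guid(self.client_secret):
            # Assumption: a GUID here usually means the Secret ID was copied
            # instead of the secret value.
            issues.append("client_secret looks like a GUID; it may be the Secret ID")
        if not self.resource_group.strip():
            issues.append("resource_group is empty")
        return issues
```

Calling `problems()` on the collected values and getting an empty list only shows that the values are well formed. It does not prove that the Role Assignment was saved correctly.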

Create the ADF integration in Monte Carlo

To create the Azure Data Factory integration in Monte Carlo, start by logging into your account and visiting the Integrations page: https://getmontecarlo.com/settings/integrations

  1. From the Orchestration section, select Create followed by Azure Data Factory.
  2. Complete the form with the credentials copied in the previous steps.
  3. Click Add, and Monte Carlo will verify that your credentials have the required access to the ADF REST APIs.
  4. Click Continue to save the integration.

Congrats! You now have an Azure Data Factory integration added to your Monte Carlo account. Please allow 24 hours for the integration to collect your pipeline metadata and lineage.
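
If verification fails, you may want to check independently that the service principal can read the factory. The sketch below is one way to do that and is not a description of the calls Monte Carlo itself makes. It requests a token through the Microsoft identity platform client credentials flow and then lists pipelines through the Azure management REST API. The function names are hypothetical, and `factory_name` is an extra input that the setup steps above do not ask you to copy.

```python
import json
import urllib.parse
import urllib.request

ARM = "https://management.azure.com"


def token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth 2.0 client-credentials token request for the service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": f"{ARM}/.default",
        }
    ).encode()
    return urllib.request.Request(url, data=body, method="POST")


def pipelines_url(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Build the 'list pipelines by factory' URL of the Data Factory REST API."""
    quote = lambda part: urllib.parse.quote(part, safe="")
    return (
        f"{ARM}/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name)}"
        "/pipelines?api-version=2018-06-01"
    )


def list_pipeline_names(tenant_id, client_id, client_secret,
                        subscription_id, resource_group, factory_name):
    """Fetch a token, then return the pipeline names from the first page of results.

    Performs real network calls; paging via nextLink is ignored for brevity.
    """
    with urllib.request.urlopen(token_request(tenant_id, client_id, client_secret)) as resp:
        token = json.load(resp)["access_token"]
    request = urllib.request.Request(
        pipelines_url(subscription_id, resource_group, factory_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return [pipeline["name"] for pipeline in json.load(resp)["value"]]
```

An HTTP 401 from the token request usually points at the tenant ID, client ID, or secret. An HTTP 403 from the pipelines request suggests the Reader Role Assignment is missing or was scoped to a different resource.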

II. ADF Pipeline Failures in MC

Monte Carlo allows you to surface ADF pipeline failures as Monte Carlo alerts. Among other things, this will enable you to:

  • Route and receive notifications similar to other Monte Carlo alerts
  • Analyze the downstream impact of those alerts
  • Create holistic incident reporting and tooling for all data issues

How to Set Up:

  1. Configure an Azure Alert using a Monte Carlo webhook from your ADF integration.
  2. Select which pipelines should produce Monte Carlo alerts.
  3. Create an audience in Monte Carlo to receive ADF pipeline failure alerts.

1. Configure Azure Alert with Monte Carlo webhook

To route ADF pipeline failures to Monte Carlo, you will need to configure a new Azure Alert that invokes a Monte Carlo webhook.

  1. From Monte Carlo's Settings -> Integrations page, click the "Manage webhook" item from your ADF integration menu.
  2. Copy the webhook URL provided by Monte Carlo.
  3. Log in to the Azure Portal and visit the Alerts section of the Monitor page. From there, create a new Action Group.
  4. After configuring the Action Group with the required fields in the Basics tab, move forward to the Actions tab and select "Webhook".
    1. Paste the webhook URL provided by Monte Carlo.
    2. Select "Yes" to "Enable the common alert schema".
    3. Review & Create the Action Group.
  5. Next, create the Alert Rule from the Alerts section of the Monitor page in the Azure Portal.
  6. On the Scope tab, select your Azure Data Factory from the Azure resource panel.
  7. On the Condition tab, make sure to select the following items. Use the defaults for all other values.
    1. Signal Name should be: "Failed pipeline run metrics".
    2. Dimension name should be: "Pipeline".
    3. Select all pipelines in the "Dimension values" dropdown.
    4. Check the "Include all future values" checkbox.
  8. On the Actions tab, select the Action Group you previously created.
  9. On the Details tab, complete the required fields. Review and Create the Alert Rule.

Congrats! You've configured an Azure Alert which will invoke the Monte Carlo webhook when your ADF pipelines fail.
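
Because the common alert schema is enabled, the webhook receives a JSON body with an `essentials` block and an `alertContext` block. The sketch below shows roughly how a receiver could pull the failing pipeline names out of such a body. It is illustrative only and does not describe how Monte Carlo processes the webhook. `SAMPLE_PAYLOAD` is a hand-written, trimmed example rather than captured output, and the dimension key `Name` is an assumption: the portal label "Pipeline" may not match the key used in the payload, so check a real alert before relying on it.

```python
from __future__ import annotations

# Hand-written, trimmed example in the shape of the Azure Monitor common alert
# schema for a metric alert. All values are hypothetical.
SAMPLE_PAYLOAD = {
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {
            "alertRule": "adf-pipeline-failures",
            "monitorCondition": "Fired",
        },
        "alertContext": {
            "condition": {
                "allOf": [
                    {
                        "metricName": "PipelineFailedRuns",
                        "dimensions": [
                            {"name": "Name", "value": "copy_orders"},
                        ],
                    }
                ]
            }
        },
    },
}


def failed_pipelines(payload: dict, dimension: str) -> list[str]:
    """Return the values of `dimension` from a fired alert; empty for anything else."""
    data = payload.get("data") or {}
    essentials = data.get("essentials") or {}
    if essentials.get("monitorCondition") != "Fired":
        return []  # "Resolved" notifications carry no new failure
    condition = (data.get("alertContext") or {}).get("condition") or {}
    names = []
    for criterion in condition.get("allOf") or []:
        for dim in criterion.get("dimensions") or []:
            if dim.get("name") == dimension:
                names.append(dim["value"])
    return names


print(failed_pipelines(SAMPLE_PAYLOAD, "Name"))
```

Selecting all pipelines and checking "Include all future values" in the Condition tab is what lets a single Alert Rule report the individual pipeline in the dimensions list, instead of requiring one rule per pipeline.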

2. Select which pipelines should produce Monte Carlo alerts

  1. From the Monte Carlo Settings -> Integrations page, click the "Configure pipelines" item from your ADF integration menu.
  2. Enable the "Generates alerts" toggle for each applicable pipeline.

Congrats! You've enabled Monte Carlo to produce alerts for your ADF pipeline failures.

3. Create an audience for ADF pipeline failures

Now that you have enabled some ADF pipelines to produce Monte Carlo alerts, you will need to create an Audience to route these alerts to the proper notification channels.

  1. Create an audience that includes an "Other Notification".
  2. Select "ADF pipeline failures" as the alert type.
  3. Under "Affected Data", either select "All" to send all Data Factory alerts to the audience, or select "Databases, schemas, tables, jobs, and tags" and add the ADF pipelines of interest to include in this audience.

Congrats! Now any pipeline failures from the selected ADF pipelines will notify the configured audience. This completes the required steps for configuring your ADF integration in Monte Carlo to produce alerts for ADF pipeline failures.
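
The three steps combine as follows: a failure notifies an audience only if the pipeline has "Generates alerts" enabled and the audience's "Affected Data" setting covers that pipeline. The sketch below is a conceptual model of that rule, not Monte Carlo's implementation. The `Audience` class, the `recipients` function, and all pipeline and audience names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Audience:
    """Hypothetical model of an audience subscribed to ADF pipeline failures."""

    name: str
    # None models the "All" option; a set models an explicit pipeline selection.
    pipelines: Optional[frozenset] = None

    def covers(self, pipeline: str) -> bool:
        return self.pipelines is None or pipeline in self.pipelines


def recipients(pipeline: str, alerting_enabled: set, audiences: list) -> list[str]:
    """Return the audiences notified when `pipeline` fails."""
    if pipeline not in alerting_enabled:
        return []  # the "Generates alerts" toggle is off, so no alert is produced
    return [audience.name for audience in audiences if audience.covers(pipeline)]


audiences = [
    Audience("data-platform"),                             # "All"
    Audience("orders-team", frozenset({"copy_orders"})),   # selected pipelines
]
enabled = {"copy_orders", "load_customers"}

print(recipients("copy_orders", enabled, audiences))
print(recipients("load_customers", enabled, audiences))
print(recipients("nightly_cleanup", enabled, audiences))
```

In this model, a pipeline that never produces notifications has one of two causes: its toggle is off, or no audience covers it.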

III. FAQs

Are multiple Azure Data Factory factories supported?

Yes! Repeat the Monte Carlo integration onboarding at https://getmontecarlo.com/settings/integrations for each of your factories.

How long does it take for Azure Data Factory data to show up in Monte Carlo?

Pipeline and activity run data are available immediately on the Asset Page and in Incidents. However, ADF data in Lineage and the Catalog may be delayed by up to 24 hours due to batch processing.