dbt Integration

Overview

Monte Carlo's dbt integration imports metadata about dbt models and dbt runs into the Monte Carlo dashboard. The following guide explains how to set up the integration.

Monte Carlo integrates with dbt by importing two build artifacts: the manifest file (manifest.json) and the run results file (run_results.json). This makes dbt metadata available within the Monte Carlo dashboard.

Performing a one-time import of dbt metadata

The Monte Carlo CLI can be used to perform a one-time import of dbt metadata.

Prerequisites

  1. Install the CLI — https://docs.getmontecarlo.com/docs/using-the-cli
  2. When running montecarlo configure, provide your API key; the AWS settings may be left blank

One-time import with standalone dbt

When executing dbt compile or dbt run, dbt writes a manifest.json file into the target directory. The CLI reads this file and sends the metadata to Monte Carlo.

To import the manifest:

    montecarlo import dbt-manifest \
             target/manifest.json --project-name <project name>

The project name can be set to the name of the dbt project.
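As a quick sanity check before importing, the manifest can be inspected locally. The helper below is a hypothetical sketch, not part of the Monte Carlo CLI; it assumes the standard dbt manifest layout of a top-level "nodes" mapping:

```python
import json
from pathlib import Path

def count_models(manifest: dict) -> int:
    # Model nodes carry the descriptions and tags that get imported
    return sum(
        1 for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    )

# Pre-flight check: confirm the artifact exists before running the import
manifest_path = Path("target/manifest.json")
if manifest_path.exists():
    manifest = json.loads(manifest_path.read_text())
    print(f"{count_models(manifest)} models in {manifest_path}")
```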

Specifically, the CLI will import descriptions and tags defined under models in schema.yml as tags for tables and fields in Monte Carlo.
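For reference, here is a minimal schema.yml showing where those descriptions and tags live (the model, tag, and column names are illustrative):

```yaml
# models/schema.yml (illustrative)
version: 2
models:
  - name: orders
    description: "One row per order"
    tags: ["finance"]
    columns:
      - name: order_id
        description: "Primary key for orders"
```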

To import run results:

    montecarlo import dbt-run-results \
             target/run_results.json --project-name <project name>
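The run results file can be inspected the same way. A rough sketch follows; the structure shown is a simplified assumption about run_results.json content, which in reality carries many more fields per entry:

```python
from collections import Counter

def summarize_statuses(run_results: dict) -> Counter:
    # Each entry in "results" corresponds to one executed node
    return Counter(r.get("status") for r in run_results.get("results", []))

# Trimmed-down example of run_results.json content
example = {
    "results": [
        {"unique_id": "model.my_project.orders", "status": "success"},
        {"unique_id": "model.my_project.customers", "status": "error"},
    ]
}
print(summarize_statuses(example))
```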

One-time import with dbt Cloud

When a dbt run has completed in dbt Cloud, the manifest.json and run_results.json build artifacts are available to download via dbt Cloud's API. The CLI will download these artifacts and import them into Monte Carlo.
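To illustrate where the artifacts come from, the builder below assumes the shape of dbt Cloud's v2 run-artifact endpoint; verify the exact URL against the dbt Cloud API reference for your account:

```python
def artifact_url(account_id: int, run_id: int, path: str) -> str:
    # Assumed dbt Cloud v2 artifact endpoint shape, e.g. for
    # path="manifest.json" or path="run_results.json"
    return (
        f"https://cloud.getdbt.com/api/v2/accounts/{account_id}"
        f"/runs/{run_id}/artifacts/{path}"
    )

print(artifact_url(1234, 5678, "manifest.json"))
```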

The following command will import the latest run from every dbt project and job available in the specified dbt Cloud account:

    DBT_CLOUD_API_TOKEN=<dbt cloud API token here> \
        DBT_CLOUD_ACCOUNT_ID=<dbt cloud account ID here> \
        montecarlo import dbt-cloud

  • The dbt Cloud API token can be generated via the API Settings page in your dbt Cloud profile.
  • The dbt Cloud account ID can be found within the URL when navigating to any project in the dbt Cloud dashboard -- https://cloud.getdbt.com/#/accounts/<account ID here>/...

The montecarlo import dbt-cloud command also accepts these optional arguments:

  • --project-id: To limit import to a specific dbt Cloud project ID
  • --job-id: To limit import to a specific dbt Cloud job ID
  • --manifest-only: To import only the manifest, not the run results
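For example, to import only the manifest for a single job, the flags can be combined (all angle-bracket values are placeholders):

```shell
DBT_CLOUD_API_TOKEN=<dbt cloud API token here> \
    DBT_CLOUD_ACCOUNT_ID=<dbt cloud account ID here> \
    montecarlo import dbt-cloud \
        --project-id <dbt cloud project ID> \
        --job-id <dbt cloud job ID> \
        --manifest-only
```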

Performing periodic import of dbt metadata

To ensure that the latest dbt metadata is imported into Monte Carlo, you can perform the import on an ongoing basis.

Periodic import with standalone dbt

The CLI import commands must run after your dbt pipeline completes, and must have access to dbt's build artifacts in the target directory. In most cases, this means installing the CLI on the same machine or container that runs dbt in your production environment and executing the import commands there.
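A minimal sketch of such a pipeline step, assuming the CLI is installed alongside dbt and using a placeholder project name:

```shell
# Run dbt, then push the resulting build artifacts to Monte Carlo
dbt run
montecarlo import dbt-manifest target/manifest.json --project-name <project name>
montecarlo import dbt-run-results target/run_results.json --project-name <project name>
```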

Periodic import with dbt Cloud

In order to perform a periodic import with dbt Cloud, you will need an external scheduler (e.g. Airflow, AWS Lambda, cron) to run the import.

If you are using Airflow, you can use the Monte Carlo Python SDK to perform the import within a custom Airflow operator:

  1. Add pycarlo==0.0.3 to your Airflow instance's requirements.txt
  2. Create a PythonOperator that executes this code:
    from pycarlo.core import Client, Session
    from pycarlo.features.dbt import DbtCloudImporter, DbtCloudClient

    # It's recommended that you populate these variables using Airflow
    # Connections. The angle-bracket values below are placeholders.
    mcd_api_id = "<MC API ID>"
    mcd_token = "<MC API token>"
    dbt_cloud_token = "<dbt Cloud API token>"
    dbt_cloud_account_id = "<dbt Cloud account ID>"
    dbt_cloud_project_id = "<dbt Cloud project ID>"  # set to None to import all projects
    dbt_cloud_job_id = "<dbt Cloud job ID>"  # set to None to import all jobs

    dbt_cloud_client = DbtCloudClient(
        dbt_cloud_api_token=dbt_cloud_token,
        dbt_cloud_account_id=dbt_cloud_account_id
    )
    dbt_cloud_importer = DbtCloudImporter(
        dbt_cloud_client=dbt_cloud_client,
        client=Client(Session(
            mcd_id=mcd_api_id,
            mcd_token=mcd_token
        )),
        print_func=print
    )

    dbt_cloud_importer.import_dbt_cloud(
        project_id=dbt_cloud_project_id,
        job_id=dbt_cloud_job_id
    )

It's best to schedule this import task directly after the dbt jobs are run.

