πŸ“˜

Minimum dbt version

This integration requires dbt v0.22+.

Overview

Monte Carlo integrates with dbt by importing the manifest file, the run results file, and, optionally, a file containing the logs generated by dbt during command execution. This makes dbt metadata available within the Monte Carlo dashboard.

Detailed Guide

You will need to use the Monte Carlo CLI to upload the dbt run artifacts whenever a dbt build, dbt run, or dbt test command is executed in your production workflow. Follow these steps to complete the integration:

  1. Install the Monte Carlo CLI on the production machine or image that runs dbt
  2. Create a Monte Carlo API key
  3. Modify your production dbt CI/CD workflow to run the CLI

Install the Monte Carlo CLI

The Monte Carlo CLI is installed using pip:

pip install -U montecarlodata

You will need to install this pip library on the same image or machine that runs dbt. In most cases, this means adding the library to the Dockerfile that builds your dbt machine image, as sketched below.
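
For example, a Dockerfile for the image that runs dbt might install the CLI alongside your dbt adapter (a minimal sketch; the base image and the dbt-snowflake adapter below are assumptions that will vary by project):

# Hypothetical dbt runner image; substitute your own base image and adapter
FROM python:3.10-slim
RUN pip install -U dbt-snowflake montecarlodata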

Create a Monte Carlo API key

If you do not already have a Monte Carlo API key, you can create one in the "Settings" tab on the Monte Carlo dashboard. You will need both the key ID and secret.
The key must be generated with the "Editor" or "Owner" role. For example, if you create a Service Account Key, select "Editors" or "Account Owners" under "Authorization Groups"; if you are using a personal key, the user who generated it must be an "Editor" or "Owner".
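
Once the key is created, the CLI on your production machine needs to authenticate with it. You can run montecarlo configure interactively to store a local profile; in non-interactive CI/CD environments, the key can instead be supplied via environment variables (the variable names below are assumptions; verify them against the CLI documentation):

# One-time interactive setup; prompts for the key ID and secret
montecarlo configure

# Or export the key in CI/CD (verify variable names in the CLI docs)
export MCD_DEFAULT_API_ID="<your-key-id>"
export MCD_DEFAULT_API_TOKEN="<your-key-secret>"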

Modify your production dbt CI/CD workflow

Whenever dbt build, dbt run, or dbt test is executed in your production dbt workflow, you will need to run the CLI directly afterwards, within the same shell session: these dbt commands write build artifacts to the target directory of the dbt project on the local filesystem, and the CLI needs direct access to those files.

🚧

Note: Whenever a dbt command is executed, files in the target directory are overwritten, so you will need to run the Monte Carlo CLI command after each dbt command.

The CLI command to run is:

% montecarlo import dbt-run --help
Usage: montecarlo import dbt-run [OPTIONS]

  Import dbt run artifacts.

Options:
  --project-name TEXT   Project name (perhaps a logical group of dbt models,
                        analogous to a project in dbt Cloud)  [default:
                        default-project]
  --job-name TEXT       Job name (perhaps a logical sequence of dbt
                        executions, analogous to a job in dbt Cloud)
                        [default: default-job]
  --manifest PATH       Path to the dbt manifest file (manifest.json)
                        [required]
  --run-results PATH    Path to the dbt run results file (run_results.json)
                        [required]
  --logs PATH           Path to a file containing dbt run logs
  --connection-id UUID  Identifier of warehouse or lake connection to use to
                        resolve dbt models to tables. Required if you have
                        more than one warehouse or lake connection.
                        (Connection Id can be found by running the 
                        montecarlo integrations list CLI command)
  --help                Show this message and exit.
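
For example, a production workflow step might pair each dbt command with an import (a sketch; the project and job names are illustrative, and --connection-id is only needed if you have more than one connection):

# Run dbt, then immediately import the artifacts it wrote to target/
dbt run
montecarlo import dbt-run \
  --project-name analytics \
  --job-name nightly-run \
  --manifest target/manifest.json \
  --run-results target/run_results.json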

Project

The project name is a descriptive name for your dbt project. It will be associated with all dbt objects imported by Monte Carlo, and can be used to disambiguate results from different dbt projects. If a project name is not provided, a default-project will be created.

Job

The job name can be used to logically group sets of dbt run results within a project. For example, if you orchestrate subsets of models within a project to be run with independent dbt commands, you could associate each subset/command with a unique job name. If a job name is not provided, a default-job will be created.

Connection ID

The Connection ID identifies the warehouse or lake connection used to resolve dbt models and tests to assets in Monte Carlo. It is required if you have more than one warehouse or lake connected to Monte Carlo. The Connection ID can be found by running the montecarlo integrations list CLI command.

Logs

To send the console logging produced by dbt to Monte Carlo, you will need to write it to a file first. There are several ways to do this; below is an example using the tee command:

dbt build | tee logs.txt

This command would still produce the normal dbt console output, but also copy that output to logs.txt. You can then use the --logs option to send that file to Monte Carlo.

❗️

Warning on the tee command and Job Failures

If you are using Airflow to monitor failures, it is important to note that the above command will not report that the dbt build has failed as long as the logs are successfully written by tee: when commands are piped like this, the exit code of the full pipeline is the exit code of the last command.

If this is a concern, you can use the following variation of the command, which returns any failed exit code as the exit code for the full pipeline:

set -o pipefail && dbt build | tee logs.txt
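
Putting the pieces together, a logging-safe production step might look like this (a sketch; file, project, and job names are illustrative):

# Fail the pipeline if dbt fails, while still capturing logs
set -o pipefail && dbt build | tee logs.txt
montecarlo import dbt-run \
  --manifest target/manifest.json \
  --run-results target/run_results.json \
  --logs logs.txt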

Note: dbt transformations run via Fivetran are currently not supported in this integration.

πŸ“˜

Python SDK

You can also use DbtImporter.import_run in our Python SDK as an alternative to using the CLI.
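
A minimal sketch of the SDK call is shown below; the module path and parameter names are assumptions modeled on the CLI options, so check the SDK reference for the exact signature:

# Hypothetical sketch: module path and parameter names mirror the CLI
# options and may differ in the SDK; verify against the SDK reference.
from pycarlo.features.dbt import DbtImporter

DbtImporter().import_run(
    manifest_path='target/manifest.json',
    run_results_path='target/run_results.json',
    logs_path='logs.txt',  # optional, like --logs
    project_name='analytics',
    job_name='nightly-run',
)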

πŸ“˜

Airflow

If you are using Airflow to run dbt, you can use the dbt operators in our Airflow provider that include this integration.

Query Tagging to Get Performance Features

The setup in this section enables Monte Carlo to link your queries to dbt objects, which unlocks additional features from the integration:

  • You can filter for dbt jobs and models in the performance dashboard to drill into performance issues for a specific dbt job or model
  • You can identify the dbt job run for each query run from the performance dashboard details drawer
  • You can use the dbt job gantt view on the assets page to identify problematic models and queries in a job, then leverage Monte Carlo's query-level metadata to investigate what caused the performance problems

See how you can use those features here.

To enable these features, configure your dbt project with the following macro, which adds additional metadata to dbt's query comments.

  1. In your dbt project, create a new file macros/query_comment.sql with these contents:
{% macro query_comment(node) %}
    {%- set comment_dict = {} -%}
    {%- do comment_dict.update(
        app='dbt',
        dbt_version=dbt_version,
        profile_name=target.get('profile_name'),
        target_name=target.get('target_name'),
        invocation_id=invocation_id,
    ) -%}
    {%- if node is not none -%}
      {%- do comment_dict.update(
        file=node.original_file_path,
        node_id=node.unique_id,
        node_name=node.name,
        resource_type=node.resource_type,
        package_name=node.package_name,
        relation={
            "database": node.database,
            "schema": node.schema,
            "identifier": node.identifier
        }
      ) -%}
    {% else %}
      {%- do comment_dict.update(node_id='internal') -%}
    {%- endif -%}
    {% do return(tojson(comment_dict)) %}
{% endmacro %}
  2. Update dbt_project.yml to include a query-comment configuration that references the macro you added in step 1:
query-comment:
  comment: "{{ query_comment(node) }}"
  append: true # Required for Snowflake
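
With this configuration in place, each query dbt executes carries a JSON comment that Monte Carlo can parse. Appended to a model's query, it would look roughly like this (values are illustrative):

select ... /* {"app": "dbt", "dbt_version": "1.5.0", "profile_name": "analytics",
"target_name": "prod", "invocation_id": "01234567-89ab-cdef-0123-456789abcdef",
"file": "models/my_model.sql", "node_id": "model.analytics.my_model",
"node_name": "my_model", "resource_type": "model", "package_name": "analytics",
"relation": {"database": "prod_db", "schema": "analytics", "identifier": "my_model"}} */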

πŸ“˜

query-comment

The append: true property is REQUIRED for Snowflake, because Snowflake removes leading SQL comments from queries.