dbt Core
Minimum dbt version
This integration requires dbt v0.22+.
Overview
Monte Carlo integrates with dbt by importing the manifest file, run results file, and optionally, a file containing the logs generated by dbt during command execution. This will make dbt metadata available within the Monte Carlo dashboard.
Detailed Guide
You will need to use the Monte Carlo CLI to upload the dbt run artifacts whenever a dbt build
, dbt run
, or dbt test
command is executed in your production workflow. Follow these steps to complete the integration:
- Install the Monte Carlo CLI on the production machine or image that runs dbt
- Create a Monte Carlo API key
- Modify your production dbt CI/CD workflow to run the CLI
Install the Monte Carlo CLI
The Monte Carlo CLI is installed using pip:
pip install -U montecarlodata
You will need to install this pip library on the same image or machine that runs dbt. In most cases, this will involve updating a Dockerfile to update the machine image with this library.
Create a Monte Carlo API key
If you do not already have a Monte Carlo API key, you can create one in the "Settings" tab on the Monte Carlo dashboard. You will need both the key ID and secret.
The key is required to be generated with the "Editor" or "Owner" roles, for example if you create a Service Account Key you need to select "Editors" or "Account Owners" under "Authorization Groups". If you're using a personal key, the user that generated it needs to be an "Editor" or "Owner".
Modify your production dbt CI/CD workflow (⚠️ updated)
Whenever dbt build
, dbt run
, or dbt test
is executed in your production dbt workflow, you will need to run the CLI directly afterwards. This command needs to be run within the same shell session, because when these dbt commands are executed, dbt writes build artifacts to the local filesystem in the target
directory of the dbt project, and the CLI will need direct access to these files.
Note: Whenever a dbt command is executed, files in the
target
directory will be overwritten. So you will need to run the MC CLI command after each dbt command.
The CLI command to run is (⚠️ updated):
% montecarlo import dbt-run --help
Usage: montecarlo import dbt-run [OPTIONS]
Import dbt run artifacts.
Options:
--project-name TEXT Project name (perhaps a logical group of dbt models,
analogous to a project in dbt Cloud) [default:
default-project]
--job-name TEXT Job name (perhaps a logical sequence of dbt
executions, analogous to a job in dbt Cloud)
[required]
--manifest PATH Path to the dbt manifest file (manifest.json)
[required]
--run-results PATH Path to the dbt run results file (run_results.json)
[required]
--logs PATH Path to a file containing dbt run logs
--connection-id UUID Identifier of warehouse or lake connection to use to
resolve dbt models to tables. Required if you have
more than one warehouse or lake connection.
(Connection Id can be found by running the
montecarlo integrations list CLI command)
--help Show this message and exit.
Project
The project name is a descriptive name for your dbt project. It will be associated with all dbt objects imported by Monte Carlo, and can be used to disambiguate results from different dbt projects. If a project name is not provided, a default-project
will be created.
Job
The job name can be used to logically group sets of dbt run results within a project. For example, if you orchestrate subsets of models within a project to be run with independent dbt commands, you could associate each subset/command with a unique job name. You will get a distinct assets page for each job including a gantt view for each run of the job.
If a job name is not provided, a default-job
will be created and all run results will be grouped under the same job.
Connection ID
The Connection ID is an identifier of the warehouse or lake connection that is used to resolve dbt models and tests to assets in Monte Carlo. This is required if you have more than one warehouse or lake connected to Monte Carlo. Connection ID can be found be running the command montecarlo integrations list
in the CLI as shown here
Logs
To send the console logging produced by dbt to Monte Carlo, you will need to first write it to a file. There are several ways to do this, below is an example using the tee
command:
dbt build | tee logs.txt
This command would still produce the normal dbt console output, but also copy that output to logs.txt
. You can then use the --logs
option to send that file to Monte Carlo.
Warning on the tee command and Job Failures
If you are using Airflow to monitor failures, it is important to note that the above command will not show the dbt build job has failed if the logs are successfully written via the tee command. When piping commands like this the exit code of the full pipeline is the exit code of the last command.
If the above warning is a concern you can use the following variation of the command, which will return any failed exit code as the exit code for the full pipeline:
set -o pipefail && dbt build | tee logs.txt
Note: dbt transformations run via Fivetran are currently not supported in this integration.
Python SDK
You can also use
DbtImporter.import_run
in our Python SDK as an alternative to using the CLI.
Airflow
If you are using Airflow to run dbt, you can use the dbt operators in our Airflow provider that include this integration.
Query Tagging to Get Performance Features
Setup in this section enables MC to link your queries to dbt objects to provide additional features from the integration:
- you will be able to filter for dbt jobs and models in the performance dashboard to drill into performance issues for a dbt job or model of interest
- you will be able to identify the dbt job run for each query run from the performance dashboard details drawer
- you will be able to leverage a dbt job gantt view in assets page to identify problematic models and queries in a job, then leverage MC's query-level metadata to investigate what caused the performance problems
See how you can use those features here.
To achieve those, configure your dbt project with the following macro to add additional query comments.
- In your dbt project, create a new file
macros/query_comment.sql
with these contents:
{% macro query_comment(node) %}
{%- set comment_dict = {} -%}
{%- do comment_dict.update(
app='dbt',
dbt_version=dbt_version,
profile_name=target.get('profile_name'),
target_name=target.get('target_name'),
invocation_id=invocation_id,
) -%}
{%- if node is not none -%}
{%- do comment_dict.update(
file=node.original_file_path,
node_id=node.unique_id,
node_name=node.name,
resource_type=node.resource_type,
package_name=node.package_name,
relation={
"database": node.database,
"schema": node.schema,
"identifier": node.identifier
}
) -%}
{% else %}
{%- do comment_dict.update(node_id='internal') -%}
{%- endif -%}
{% do return(tojson(comment_dict)) %}
{% endmacro %}
- Update
dbt_project.yml
to include a query-comment configuration that references the macro you just added in step (1):
query-comment:
comment: "{{ query_comment(node) }}"
append: true # Required for Snowflake
query-comment
The
append: true
property is REQUIRED for Snowflake, which removes leading SQL comments from queries.
Updated 8 months ago