dbt Integration
What is dbt?
dbt (data build tool) enables data analysts and engineers to transform data in their warehouses more effectively. It is the T (Transform) in ELT (Extract, Load, Transform).
dbt lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.
dbt comes in two flavors. dbt Core is an open-source command-line tool that relies on external tools, such as Airflow or a CI pipeline (for example, on GitHub), to schedule jobs. dbt Cloud is a managed service that offers more advanced scheduling directly from its web UI. Monte Carlo supports both dbt Core and dbt Cloud.
Why connect dbt to Monte Carlo?
Monte Carlo centralizes context on dbt jobs, models, and their associated tables in a single pane of glass. dbt context is overlaid on the lineage graph to help you troubleshoot issues and evaluate the impact of failures. Toggle on "show dbt status" to see the latest model run status and timestamps. Error statuses are shown by default, without turning on the toggle.
If a downstream table was updated by a dbt job that references an upstream table, the dbt job is displayed on the edge between the two tables. Hover over or click the icon to see details about the job.
Tags, descriptions, and meta configs are imported from dbt into Monte Carlo.
Monte Carlo also shows an overview of each dbt model. Clicking "View Model" in the upper right corner displays the SQL that defines it. You can also see the model's run history: the graph visualizes the execution time of model runs over the time period selected on the Assets page, and you can filter the runs by status (success, error, or skipped). You can also switch to the "Test Runs" tab to see an overview of the dbt tests defined for the model, along with a similar run history for all of those tests.
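Monte Carlo builds this run history for you, and the sketch below is not its implementation. It only illustrates, assuming you have the run_results.json artifact that dbt Core writes after each invocation (to the target/ directory by default), how the same per-model status and execution-time data can be filtered locally. The helper names load_run_results and filter_model_runs are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# Statuses dbt records for model nodes in run_results.json; these mirror the
# run history filters (success, error, skipped).
MODEL_STATUSES = {"success", "error", "skipped"}


def load_run_results(path: str = "target/run_results.json") -> dict[str, Any]:
    """Load the run_results.json artifact written by dbt (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def filter_model_runs(run_results: dict[str, Any], status: str) -> list[tuple[str, float]]:
    """Return (unique_id, execution_time in seconds) for models with the given status."""
    if status not in MODEL_STATUSES:
        raise ValueError(f"unknown model status: {status!r}")
    return [
        (result["unique_id"], result["execution_time"])
        for result in run_results.get("results", [])
        # Keep model nodes only; tests use their own statuses (pass, fail, warn, ...).
        if result["unique_id"].startswith("model.") and result["status"] == status
    ]
```

For example, filter_model_runs(load_run_results(), "error") lists the models that errored in the most recent local run, together with how long each one ran before failing.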
You can also use this integration to centralize all data incidents, including dbt model errors and test failures, in one place. Monte Carlo can generate incidents from dbt model errors and test errors, so you can detect, triage, investigate, and analyze dbt failures entirely within Monte Carlo.
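To make the distinction between model errors and test failures concrete, here is a minimal sketch (not Monte Carlo's incident logic) that groups failed nodes from a parsed run_results.json, the artifact dbt Core writes after each invocation. The function collect_failures is hypothetical, and the set of statuses counted as failures is an assumption you may want to change.

```python
from __future__ import annotations

from typing import Any

# Statuses this sketch treats as failures: "error" (a model or test that could
# not run) and "fail" (a test whose assertion failed). "warn" is deliberately
# excluded here; whether it should count is a policy choice.
FAILURE_STATUSES = {"error", "fail"}


def collect_failures(run_results: dict[str, Any]) -> dict[str, list[str]]:
    """Group failed dbt nodes by resource type, e.g. {"model": [...], "test": [...]}."""
    failures: dict[str, list[str]] = {}
    for result in run_results.get("results", []):
        if result["status"] in FAILURE_STATUSES:
            # unique_id has the form "<resource_type>.<project>.<name>[...]".
            resource_type = result["unique_id"].split(".", 1)[0]
            failures.setdefault(resource_type, []).append(result["unique_id"])
    return failures
```

Each entry in the returned mapping corresponds to one kind of failure the paragraph above describes: model errors under "model" and test failures or errors under "test".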
Clicking "Investigate Job" takes you to the job's view in Assets. There you can analyze a dbt run in a waterfall view that makes bottlenecks easy to spot, so you can quickly decide whether a model needs to be optimized.
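Monte Carlo draws this waterfall for you. As a rough sketch of the underlying idea, assuming dbt's run_results.json artifact, where each result carries a timing list with "compile" and "execute" phases, the helpers below compute each node's start offset and duration and then pick the longest-running node. The names waterfall and bottleneck are hypothetical, and measuring only the "execute" phase is a simplifying assumption.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Optional


def _parse_ts(value: str) -> datetime:
    # dbt timestamps end in "Z"; fromisoformat before Python 3.11 needs "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def waterfall(run_results: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Return (unique_id, start_offset_s, duration_s) per node, ordered by start.

    Offsets are measured from the earliest "execute" start in the run.
    """
    spans = []
    for result in run_results.get("results", []):
        execute = next((t for t in result.get("timing", []) if t["name"] == "execute"), None)
        if execute is None:  # e.g. skipped nodes have no execute timing
            continue
        spans.append((result["unique_id"], _parse_ts(execute["started_at"]),
                      _parse_ts(execute["completed_at"])))
    if not spans:
        return []
    run_start = min(start for _, start, _ in spans)
    rows = [(uid, (start - run_start).total_seconds(), (end - start).total_seconds())
            for uid, start, end in spans]
    rows.sort(key=lambda row: row[1])  # left-to-right order of the waterfall
    return rows


def bottleneck(rows: list[tuple[str, float, float]]) -> Optional[tuple[str, float, float]]:
    """Return the row with the longest duration, or None for an empty run."""
    return max(rows, key=lambda row: row[2], default=None)
```

Sorting by start offset reproduces the left-to-right layout of a waterfall. The longest node is the first candidate for optimization, although a long node that runs in parallel with others may not extend the total run time as much as its duration suggests.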
Setting Up dbt
The following guides explain how to set up the integration. If you are using dbt Core, this guide will walk you through setup. The dbt Core integration can be built into most existing CI/CD pipelines with pycarlo, as well as into Airflow.
If you are using dbt Cloud, this guide will walk you through setup.