dbt Cloud

πŸ“˜

Prerequisites

  • requires admin privileges for the dbt Cloud account that hosts and runs your production dbt models
  • requires Data Collector v14099+

Obtain a dbt Cloud API token and account ID

Please follow the instructions in dbt Cloud's API documentation to create a dbt Cloud API token. This token will be needed when creating the integration with the Monte Carlo CLI. The token should have read-only access.

To obtain your dbt Cloud account ID, sign into dbt Cloud in your browser. Take note of the number directly following the accounts path component of the URL -- this is your account ID. For example, if the URL is https://cloud.getdbt.com/#/accounts/1234/projects/6789/dashboard/, the account ID is 1234.

dbt Cloud Webhooks

Monte Carlo integrates with dbt Cloud using the --webhook-hmac-secret and --webhook-id options. This provides real-time notifications that dbt Cloud jobs have completed and run results are available for collection by Monte Carlo.

  1. Create an outbound webhook in the Account Settings view in dbt Cloud:
    1. Endpoint URL: https://integrations.getmontecarlo.com/webhooks/dbt-cloud
    2. Events: Only job.run.completed is supported by Monte Carlo.

To Be Deprecated: dbt Cloud integration without webhooks

Monte Carlo now requires all new integrations with dbt Cloud to use webhooks. Previously configured integrations need to be migrated by May 31st, 2024.

Create a dbt Cloud integration through the Monte Carlo UI

You can set up the dbt cloud integration all through the UI.

  1. From the integration tab on the settings page scroll down to the ETL Tool Connection section and click the Create dropdown and select DBT

  1. Fill out the fields in the form. Add the webhook related fields to ensure the timeliness of your dbt data in MC. See the section above on how to create the webhook.

Create a dbt Cloud integration with the Monte Carlo CLI

  1. Make sure Monte Carlo CLI version is v0.54.2+
  2. Choose a warehouse to link with your dbt Cloud:
    • If you only have a single warehouse integrated with Monte Carlo, proceed to step two.
    • Otherwise, you will need to select a warehouse to link with the dbt Cloud integration. Use montecarlo integrations list to list out all available connections, and take note of the ID of the warehouse that is used as the target for your dbt Cloud production models.

πŸ“˜

EMEA and Single Tenant Setups

If you are using dbt cloud and have a single tenant setup or are based in EMEA, you will need to include --dbt-cloud-base-url in the command set to your base URL for the validation to work. Please refer to the dbt documentation to obtain this.

Note: Make sure to include the api and version in the URL you provide. For example, for the URL https://cloud.getdbt.com, you should provide https://cloud.getdbt.com/api/v2.

Note: If you are using dbt cloud in EMEA, please use https://emea.dbt.com/api/v2.

  1. Run montecarlo integrations add-dbt-cloud:
$ montecarlo integrations add-dbt-cloud --help
Usage: montecarlo integrations add-dbt-cloud [OPTIONS]

  Setup a dbt Cloud integration.

Options:
  --dbt-cloud-api-token TEXT   dbt Cloud API token. If you prefer a prompt
                               (with hidden input) enter -1.  [required]
  --dbt-cloud-account-id TEXT  dbt Cloud Account ID.  [required]
  --dbt-cloud-base-url TEXT    dbt Cloud Base URL.
  --webhook-hmac-secret        HMAC secret of the dbt webhook obtained
    													 after successfully creating the webhook in dbt. [required]
  --webhook-id                 ID of the webhook obtained after successfully 
                               creating the webhook in dbt.[required]
  --name TEXT                  Friendly name of the warehouse which the
                               connection will belong to.
  --collector-id UUID          ID for the data collector. To disambiguate
                               accounts with multiple collectors.
  --skip-validation            Skip all connection tests. This option cannot
                               be used with 'validate-only'.
  --validate-only              Run connection tests without adding. This
                               option cannot be used with 'skip-validation'.
  --auto-yes                   Skip any interactive approval.
  --help                       Show this message and exit.
  • montecarlo integrations add-dbt-cloud --dbt-cloud-api-token <dbt api token> --dbt-cloud-account-id <dbt account id> --webhook-hmac-secret <webhook hmac secret> --webhook-id <webhook id>
  • See section the above on how to create the Webhook
  • Monte Carlo will perform a validation step to ensure that the dbt Cloud account can be successfully accessed. If validation passes, you will be prompted to create the dbt Cloud integration. If validation fails, please double-check that you provided the correct API token and account ID.

At this point, the integration has been created.

Query Tagging To Get Performance Features

Setup in this section enables MC to link your queries to dbt objects to provide additional features from the integration:

  • you will be able to filter for dbt jobs and models in the performance dashboard to drill into performance issues for a dbt job or model of interest
  • you will be able to identify the dbt job run for each query run from the performance dashboard details drawer
  • you will be able to leverage a dbt job gantt view in assets page to identify problematic models and queries in a job, then leverage MC's query-level metadata to investigate what caused the performance problems

See how you can use those features here

To achieve those, configure your dbt project with the following macro to add additional query comments.

  1. In your dbt project, create a new file macros/query_comment.sql with these contents:
{% macro query_comment(node) %}
    {%- set comment_dict = {} -%}
    {%- do comment_dict.update(
        app='dbt',
        dbt_version=dbt_version,
        profile_name=target.get('profile_name'),
        target_name=target.get('target_name'),
        invocation_id=invocation_id,
        dbt_cloud_project_id=env_var('DBT_CLOUD_PROJECT_ID', ''),
        dbt_cloud_job_id=env_var('DBT_CLOUD_JOB_ID', ''),
        dbt_cloud_run_id=env_var('DBT_CLOUD_RUN_ID', ''),
        dbt_cloud_pr_id=env_var('DBT_CLOUD_PR_ID', ''),
        dbt_cloud_git_sha=env_var('DBT_CLOUD_GIT_SHA', '')
    ) -%}
    {%- if node is not none -%}
      {%- do comment_dict.update(
        file=node.original_file_path,
        node_id=node.unique_id,
        node_name=node.name,
        resource_type=node.resource_type,
        package_name=node.package_name,
        relation={
            "database": node.database,
            "schema": node.schema,
            "identifier": node.identifier
        }
      ) -%}
    {% else %}
      {%- do comment_dict.update(node_id='internal') -%}
    {%- endif -%}
    {% do return(tojson(comment_dict)) %}
{% endmacro %}
  1. Update dbt_project.yml to include a query-comment configuration that references the macro you just added in step (1):
query-comment:
	comment: "{{ query_comment(node) }}"
	append: true # Required for Snowflake

πŸ“˜

query-comment

The append: true property is REQUIRED for Snowflake, which removes leading SQL comments from queries.