Github

Integrate Monte Carlo with GitHub to gain visibility into how code changes impact your data

Overview

The Github integration allows customers to:

  1. Reduce time to resolution by checking potentially relevant pull requests in the context of an incident, via PRs overlaid on incident charts.

  2. Prevent data issues by reviewing an impact report that lists the potentially impacted tables and reports for a pull request, delivered as a message from the MC-Github bot (requires a dbt integration).

  3. Get context on tables by reviewing recent pull request history on the asset page, both as a list and as a visual overlay on volume charts.


Prerequisite: dbt Integrations

A dbt integration is required for the Github integration, because it provides the mapping between source code, models, and the actual tables.

Follow docs here to set up your dbt integration.

For customers with MC - dbt cloud integration

With data collector v14050 or later, the setup should be automatic, and you can follow the Github Integration Setup steps below to enable it.

For earlier versions, the remote location of dbt projects needs to be configured manually (see "dbt core" below).

For customers with MC - dbt core integration

For dbt-core integrations, the remote location of each dbt project needs to be provided. You can do this on the Integrations settings page once you have completed the Github Integration Setup below. Under Notifications and Collaboration, the GitHub integration row will show an alert asking for more Github information. Open the editing drawer via the alert or the pen icon and enter the missing remote URL.

Alternatively, the remote location of each dbt project can be configured using the GraphQL API:

mutation updateDbtProjectInfo($uuid: UUID!, $remoteUrl: String, $subdirectory: String) {
    updateDbtProjectInfo(uuid: $uuid, remoteUrl: $remoteUrl, subdirectory: $subdirectory) {
        project {
            uuid
        }
    }
}

Parameters:

uuid - dbt project UUID

remoteUrl - e.g. git@github.com:monte-carlo-data/dbt.git

subdirectory - root directory of the dbt project within the repo (e.g. analytics). This field is needed only if the dbt project does not sit at the top level of the repo, i.e. if directories such as models and macros live under a subdirectory. For projects located directly at the top level of the repo, leave this field empty.
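
As a rough illustration, the mutation above could be sent from a script like the following minimal sketch. The endpoint URL, the `x-mcd-id` / `x-mcd-token` header names, and the helper names `build_payload` and `send` are assumptions made for this example and should be checked against your Monte Carlo API key settings; the UUID and repository URL in the usage note are placeholders.

```python
import json
import urllib.request

# Assumption: API endpoint and auth header names; verify them for your account.
API_URL = "https://api.getmontecarlo.com/graphql"

MUTATION = """
mutation updateDbtProjectInfo($uuid: UUID!, $remoteUrl: String, $subdirectory: String) {
    updateDbtProjectInfo(uuid: $uuid, remoteUrl: $remoteUrl, subdirectory: $subdirectory) {
        project {
            uuid
        }
    }
}
"""


def build_payload(uuid, remote_url, subdirectory=None):
    """Build the GraphQL request body (hypothetical helper)."""
    variables = {"uuid": uuid, "remoteUrl": remote_url}
    # Leave subdirectory out when the dbt project sits at the top level of the repo.
    if subdirectory:
        variables["subdirectory"] = subdirectory
    return {"query": MUTATION, "variables": variables}


def send(payload, key_id, key_secret):
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-mcd-id": key_id,         # assumed header name
            "x-mcd-token": key_secret,  # assumed header name
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `send(build_payload("<dbt project UUID>", "https://github.com/<org>/<repo>", "analytics"), key_id, key_secret)` would submit the update for a project rooted in the analytics directory. Building the payload separately from sending it makes it easy to inspect the variables before any request is made.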

Github Integration Setup

You can set up the integration by installing an instance of the official MC Github App for your organization.

If you manage multiple Github organizations which all have code relevant to data collected by MC, you need to install the app for each organization.

Permissions

The application requires the following permissions:

  • Read access to administration and metadata
  • Read and write access to issues and pull requests

Note that the write access is used only for pull request comments. It is needed by the "Downstream Impact" feature.

  1. In Monte Carlo, go to Settings -> Integrations
  2. In the "Notifications and Collaborations" section, click on "Create" and then select "GitHub"
  3. The page will navigate to the Github UI. Select:
    • the organization
    • (optionally) the repositories that will be accessible to MC
  4. Click on "Install and Authorize"
  5. The page will navigate back to Monte Carlo

If you have the "owner" role for the Github account, the integration will now appear in the Settings -> Integrations -> Notifications and Collaborations list.

If you are not the "owner", a request will be sent to the Github account owner for approval. Once the owner approves, the integration will appear in the Settings -> Integrations -> Notifications and Collaborations list.

Once set up, the app starts collecting pull requests merged after the integration setup time. For customers without a dbt integration, pull requests show up in incidents only if the PR merged after the integration was set up AND the incident started after the PR merged.

Warnings for the setup workflow requested by a non-"owner" Github user

If the installation request has been sent to the Github account owner for approval, the setup will succeed only if the owner is a Monte Carlo user and is signed in to Monte Carlo at the time of approval (or signs in as part of the approval flow).

It could be simpler to have the Github account owner perform all the installation steps instead of going through the "installation request" flow.

Alternatively, once you have installed the app in your organization, you can also reach out to MC support to finalize the setup.

FAQ

Q: I have an MC-dbt core integration. How do I know whether I filled out the remote URL and subdirectory correctly on the settings page?

A: The remote URL should be in one of the following formats:

  • https://github.com/<org>/<repo>
  • git://github.com/<org>/<repo>.git

Subdirectory should be the root directory of the dbt project within the repo. It is only needed if the dbt project is not at the top level of the repo. For example:

If a model path is analytics/models/foo/bar.sql, then the subdirectory is analytics.

If a model path is models/foo/bar.sql, then the subdirectory field should be left empty.
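
The checks described in this answer can be sketched as a small script. This is only an illustration: `matches_listed_format` and `infer_subdirectory` are hypothetical helper names, the patterns cover only the two URL formats listed above (other forms may also be accepted), and the code assumes the dbt models live in a directory named models, which is the dbt default but can be changed in a project's configuration.

```python
import re
from pathlib import PurePosixPath

# Only the two remote URL formats listed in this FAQ answer.
REMOTE_URL_PATTERNS = (
    re.compile(r"^https://github\.com/[^/\s]+/[^/\s]+$"),
    re.compile(r"^git://github\.com/[^/\s]+/[^/\s]+\.git$"),
)


def matches_listed_format(remote_url):
    """Return True if the URL has one of the two formats listed above."""
    return any(pattern.match(remote_url) for pattern in REMOTE_URL_PATTERNS)


def infer_subdirectory(model_path, models_dir="models"):
    """Return the part of a model path that precedes the models directory.

    An empty string means the dbt project sits at the top level of the repo,
    so the subdirectory field should be left empty.
    """
    parts = PurePosixPath(model_path).parts
    if models_dir not in parts:
        raise ValueError(f"{model_path!r} does not contain a {models_dir!r} directory")
    return "/".join(parts[: parts.index(models_dir)])
```

With these helpers, `infer_subdirectory("analytics/models/foo/bar.sql")` returns `"analytics"` and `infer_subdirectory("models/foo/bar.sql")` returns an empty string, matching the two cases above.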

Q: Why don't I see the expected impacted models in the impact report on a Github PR?

A: Note that the impact report is only available if i) an MC-dbt integration is set up, ii) remote URLs and subdirectories are correctly provided (for customers with an MC-dbt core integration), and iii) the PR modifies models in dbt repos included in the GitHub integration.

If the above conditions are met, the PR may be creating a new model. In that case no impacted assets are available yet, because MC has no query logs from which to generate lineage until the model has run. It could also be that the impacted model is ephemeral; MC does not yet surface ephemeral models as impacted assets (but this will be added soon).

Q: I just set up the integration, why don't I see pull requests showing up yet?

A: After the integration is set up, MC starts collecting PRs that merge after the integration setup time. MC does not have access to historical PRs, so it might take a few hours or days for PRs to start showing up, depending on how frequently your organization merges PRs. For customers without a dbt integration, PRs show up only in the context of an incident, and each incident includes only PRs merged before the incident start time (to limit the list to PRs that could potentially have led to the incident).
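
The timing rule for customers without a dbt integration can be pictured with a minimal sketch. It only illustrates the rule as described here and is not Monte Carlo's implementation: the function name `candidate_prs`, the `merged_at` field, and the strict comparisons at the boundaries are all assumptions.

```python
from datetime import datetime


def candidate_prs(prs, integration_setup_time, incident_start_time):
    """Keep PRs merged after the integration setup and before the incident start."""
    return [
        pr
        for pr in prs
        if integration_setup_time < pr["merged_at"] < incident_start_time
    ]


# Hypothetical data: the integration was set up on 1 March, the incident began on 10 March.
setup = datetime(2024, 3, 1)
incident = datetime(2024, 3, 10)
prs = [
    {"number": 1, "merged_at": datetime(2024, 2, 20)},  # merged before setup: never collected
    {"number": 2, "merged_at": datetime(2024, 3, 5)},   # collected and before the incident
    {"number": 3, "merged_at": datetime(2024, 3, 12)},  # merged after the incident started
]
shown = candidate_prs(prs, setup, incident)
```

In this made-up example only PR 2 would be listed on the incident, which is why a freshly installed integration can look empty until new PRs merge and a later incident occurs.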