Data Products

Overview

Data Products in Monte Carlo are a way to organize and monitor the data assets that drive key business outcomes. By grouping related tables and reports together, you gain a holistic view of the monitoring coverage, health and reliability of your data, making it easier to ensure that your data pipelines are delivering accurate and trustworthy information.

Create a data product

Select a list of assets that form the core of your Data Product. These can be tables or BI Reports and Dashboards.

📘

Tip

Focus on selecting only the most downstream assets that serve as the core interfaces of your data product. Think of these as the last stop before the data is used by a client application or viewed by a user.

Once created, the Data Product will automatically include assets that are upstream via lineage to give you a complete picture of the overall health of your Data Product. The number of core assets that can be included in a Data Product is currently limited to 40.
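The automatic inclusion of upstream assets can be pictured as a graph traversal over lineage. The following is a minimal sketch, not Monte Carlo's implementation: the UPSTREAM mapping, the asset names, and the included_assets helper are hypothetical, and only the limit of 40 core assets comes from this guide.

```python
from collections import deque

MAX_CORE_ASSETS = 40  # limit stated in this guide

# Hypothetical lineage: each asset maps to the tables directly upstream of it.
UPSTREAM = {
    "revenue_dashboard": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
}

def included_assets(core_assets, upstream=UPSTREAM):
    """Return the core assets plus every asset reachable upstream of them."""
    if len(core_assets) > MAX_CORE_ASSETS:
        raise ValueError(f"at most {MAX_CORE_ASSETS} core assets are allowed")
    seen = set(core_assets)
    queue = deque(core_assets)
    while queue:
        asset = queue.popleft()
        for parent in upstream.get(asset, []):
            if parent not in seen:  # count each asset only once
                seen.add(parent)
                queue.append(parent)
    return seen

assets = included_assets(["revenue_dashboard"])
print(len(assets))  # 6: the dashboard plus five upstream tables
```

Because the result is a set, a table that sits upstream of several core assets is counted only once.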

Start Tier customers are limited to 3 Data Products.

  1. Add a Name and Description that will make this Data Product easily identifiable by other users viewing this across your workspace.

  2. Search for assets that you would like to add to this Data Product. Tables and Reports can be added. Click "+" under the "Add" column to add them to the Data Product.

  3. As you add Tables and Reports, you will see them appear on the right side.

    1. The number after "Included assets" (e.g. "8" in the screenshot below) represents the total number of unique tables necessary to build the data product. It includes all tables used directly by the product, as well as their upstream dependencies.

    2. For tables and reports where upstream lineage is available, you will see a column for the number of upstream tables connected to that asset that will be included in the Data Product, e.g. "4 upstream tables".

  4. Click "Next: Review".

  5. In the Review step, you will be able to see all Tables and Reports to be included in your Data Product. The tables and reports you included in the previous step and all their upstream tables have been included in the Data Product. Refer to Why Include All Upstream Tables in Data Products.

    1. Use the "Lineage" and "List" views to see the Data Product in different visual layouts.

    2. Use the Filters at the top to filter both of these views by the monitoring status of each of the tables included in the Data Product.

      1. Monitored - tables that are already monitored through other monitoring inclusion rules in the Usage UI.
      2. Not monitored - tables that are not monitored or explicitly excluded from monitoring.
      3. Not supported - tables that exist in the lineage but for which monitoring through Monte Carlo is not supported.
    3. Click "Create and Monitor" to create the Data Product and monitor all "Not monitored" tables.

      1. Note: The estimate of new tables to be monitored is based on the tables currently included in the data product. The final count may vary depending on the complete lineage, which will be analyzed after creation. The exact number of monitored tables will be visible on the data product details page once it's ready.

Once a Data Product is created, all tables in the Data Product will be automatically tagged with a Table tag.

Why Include All Upstream Tables in Data Products?

At Monte Carlo, we believe that true data quality monitoring requires a comprehensive view of your data pipeline. While many define data products solely as the final tables consumed by users, we take a broader approach.

We include all upstream tables in a data product because data quality isn't just about the end result; it's about the entire journey. Issues originating in upstream tables can cascade down and impact the reliability of your final output. By monitoring the entire lineage, you gain a deeper understanding of your data health and can proactively identify and resolve issues before they impact your users.

Think of it like a car assembly line. The final product (the car) is critical, but quality control throughout the process (engine, chassis, etc.) is essential for a truly reliable outcome. Similarly, by monitoring all upstream tables in your data product, you ensure the delivery of trustworthy and high-quality data to your users.

Access and permissions

Data Products can be created by Account Owners, Domain Managers and Editor Managed Roles and are visible to all roles except Asset Editor and Asset Viewer.

  • Account Owners and Domain Managers can edit all Data Products
  • Editors are only able to edit Data Products they created

See full permissions under Managed Roles and Groups.
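The rules above can be summarized as two small checks. This is an illustrative sketch only: the role strings and the can_view and can_edit helpers are hypothetical and are not part of any Monte Carlo API.

```python
# Role names taken from this section; the string values are assumptions.
EDIT_ALL_ROLES = {"Account Owner", "Domain Manager"}
HIDDEN_FROM_ROLES = {"Asset Editor", "Asset Viewer"}

def can_view(role):
    """Data Products are visible to every role except the two asset roles."""
    return role not in HIDDEN_FROM_ROLES

def can_edit(role, user, created_by):
    """Owners and Domain Managers edit everything; Editors only their own."""
    if role in EDIT_ALL_ROLES:
        return True
    if role == "Editor":
        return user == created_by
    return False

print(can_edit("Editor", "ana", "ana"))  # True
print(can_edit("Editor", "ana", "ben"))  # False
```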

Coverage

The Coverage tab of Data Products shows all assets that are included in your Data Product and their current monitoring status.

  • The coverage chart at the top right provides a quick view of the percentage of tables that are "Not monitored" out of the total tables that could be monitored in the Data Product. This total excludes tables that are "Excluded" (through rules added in the Usage UI) and tables that are "Not supported".
  • The "Data product tag" is automatically generated by Monte Carlo and applied to all tables in the Data Product. Refer to Why Include All Upstream Tables in Data Products.
  • Clicking "Monitor this tag" will add a rule into the Usage UI to monitor any tables with the "Data product tag" for this Data Product.
  • The "Lineage" and "List" views show the Data Product in different visual layouts.
  • The Filters at the top filter both of these views by the monitoring status of each of the tables included in the Data Product.
    1. Monitored - tables that are already monitored through other monitoring inclusion rules in the Usage UI.
    2. Not monitored - tables that are not monitored or explicitly excluded from monitoring.
    3. Not supported - tables that exist in the lineage but for which monitoring through Monte Carlo is not supported.
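As a rough illustration of how the coverage chart's percentage could be derived, the sketch below counts tables by status and leaves "Excluded" and "Not supported" tables out of the total. The status strings and the not_monitored_percent helper are hypothetical, and the exact formula Monte Carlo uses is an assumption here.

```python
from collections import Counter

def not_monitored_percent(statuses):
    """Percent of monitorable tables that are not monitored.

    Tables marked "excluded" or "not_supported" are left out of the total.
    """
    counts = Counter(statuses)
    monitorable = counts["monitored"] + counts["not_monitored"]
    if monitorable == 0:
        return 0.0  # nothing could be monitored, so nothing is missing
    return 100.0 * counts["not_monitored"] / monitorable

example = ["monitored"] * 6 + ["not_monitored"] * 2 + ["excluded", "not_supported"]
print(not_monitored_percent(example))  # 25.0
```

In this example, 2 of the 8 monitorable tables are not monitored; the excluded and unsupported tables do not affect the result.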

Health

The Health tab allows users to get an overview of the data quality across a selected list of assets that make up a Data Product. This dashboard enables you to build trustworthy data products by providing visibility into critical data asset health and reliability.

Filters

There are 4 main filters to control the incidents shown: Lookback Range, Incident Status, Incident Severity and Upstream Depth.

Lookback Range

Lookback Range filters incidents by the date they were detected. It consists of a multi-select drop-down menu that includes Today, Yesterday and Today, 7 Days, 2 Weeks, 3 Weeks, and 4 Weeks. We typically recommend using a 2-week lookback window as best practice.

Incident Status

The Incident Status filter limits the incidents shown to those with the selected statuses: Fixed, Expected, Investigating, No Action Needed, False Positive, and No Status.

  • Fixed: Incident has been flagged and resolved
  • Expected: Change has been made with anticipation of incident
  • Investigating: Incident has been flagged and is under investigation
  • No Action Needed: Incident has been flagged and required no action
  • False Positive: Incident was incorrectly identified as anomalous behavior
  • No Status: Incident has not been flagged and needs an update

Incident Severity

The Incident Severity filter limits the incidents shown by the severity level assigned to each incident. Incident severity is a categorization method that you can update manually, and it is up to you to define the meaning of each level of severity. This filter displays a checkbox-style menu that consists of 6 different severity options (No Severity, Sev-0, Sev-1, Sev-2, Sev-3, Sev-4).
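Conceptually, the Lookback Range, Incident Status and Incident Severity filters act as predicates that an incident must satisfy to be shown. The sketch below assumes the filters combine with a logical AND and that the lookback window is a simple number of days before the current time. The Incident class and filter_incidents function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    detected_at: datetime
    status: str      # e.g. "Fixed", "Investigating", "No Status"
    severity: str    # e.g. "No Severity", "Sev-0" ... "Sev-4"

def filter_incidents(incidents, now, lookback_days, statuses, severities):
    """Keep incidents inside the lookback window that match both selections."""
    cutoff = now - timedelta(days=lookback_days)
    return [
        i for i in incidents
        if i.detected_at >= cutoff
        and i.status in statuses
        and i.severity in severities
    ]

now = datetime(2024, 1, 15)
incidents = [
    Incident(datetime(2024, 1, 14), "Investigating", "Sev-1"),
    Incident(datetime(2024, 1, 10), "Fixed", "Sev-3"),
    Incident(datetime(2023, 12, 1), "No Status", "No Severity"),
]
recent = filter_incidents(
    incidents, now, 14, {"Investigating", "No Status"}, {"Sev-1", "No Severity"}
)
print(len(recent))  # 1: only the Jan 14 incident passes all three filters
```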

Upstream Depth

Upstream Depth specifies how many levels of tables upstream of the core assets should be included in the dashboard. If there are incidents on upstream tables that meet the filters, they will be included in the Incident Metrics and shown under Upstream incidents in the Assets table.
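One way to think about Upstream Depth is as a depth-limited walk over lineage starting from a core asset. The sketch below is an assumption about the behavior, not Monte Carlo's implementation; the lineage mapping, table names, and upstream_within_depth function are hypothetical.

```python
def upstream_within_depth(core_asset, upstream, depth):
    """Return tables at most `depth` lineage hops upstream of `core_asset`."""
    found = set()
    frontier = {core_asset}
    for _ in range(depth):
        next_frontier = set()
        for asset in frontier:
            for parent in upstream.get(asset, []):
                if parent not in found and parent != core_asset:
                    found.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier  # move one level further upstream
    return found

# Hypothetical lineage: each table maps to its direct upstream tables.
lineage = {
    "fct_orders": ["stg_orders"],
    "stg_orders": ["raw_orders"],
}
print(sorted(upstream_within_depth("fct_orders", lineage, 1)))  # ['stg_orders']
print(sorted(upstream_within_depth("fct_orders", lineage, 2)))  # ['raw_orders', 'stg_orders']
```

With a depth of 0, the function returns an empty set, which corresponds to showing incidents on the core assets only.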

Incident Metrics

Total Incidents

The first chart shows the total incidents related to the assets included in the Data Product that match the filters set. Hovering over the time series will provide exact numbers for each point on the graph.

By Status

This chart shows all incidents related to the assets included in the Data Product that match the filters set, distributed by their Status.

By Severity

This chart shows all incidents related to the assets included in the Data Product that match the filters set, distributed by their Severity.

Assets Table

The assets table shows incidents related to the core assets included when defining the Data Product Dashboard, as well as any upstream incidents if an Upstream Depth is set. If an Upstream Depth is set, all incidents on tables upstream of the core asset will be rolled into a single "Upstream incidents" row nested under the core asset they relate to.

The far left column Incidents shows the total incidents related to that row. The cell will also be shown in one of three colors:

  • Green - All incidents are in a resolved status of Fixed, Expected, No Action Needed, or False Positive.
  • Yellow - At least one incident related to this asset is in Investigating status.
  • Orange - At least one incident related to this asset has No Status.

A Green check mark indicates there are no incidents for that asset that match the filters set.

📘

Tip

The coloring of the cell indicates the most severe incident status on the asset. For example, if there are 10 total incidents, 3 of which have been Fixed, 6 that are in Investigating status, and 1 that has No Status, the cell will remain orange because it reflects the most severe status, No Status.
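The color rule can be expressed as a short precedence check: No Status outranks Investigating, which outranks the resolved statuses. The sketch below follows the rules listed above; the cell_color function and its return strings are hypothetical.

```python
RESOLVED = {"Fixed", "Expected", "No Action Needed", "False Positive"}

def cell_color(statuses):
    """Pick the cell color from the most severe incident status present."""
    if not statuses:
        return "green check mark"  # no incidents match the filters
    if "No Status" in statuses:
        return "orange"
    if "Investigating" in statuses:
        return "yellow"
    if all(s in RESOLVED for s in statuses):
        return "green"
    raise ValueError("unknown status in input")

example = ["Fixed"] * 3 + ["Investigating"] * 6 + ["No Status"]
print(cell_color(example))  # orange
```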

The right side of the assets table has the columns SQL Rule, Freshness, Volume, Field Quality, Field Health, Dimension, Schema and dbt errors. These represent the different monitor types that Monte Carlo allows on a table. For each column, the number of incidents for each monitor type for each asset will be shown. Similar color coding of Green, Yellow and Orange is used as above with the Incidents column.

A Green check mark here indicates there are no incidents for that asset that match the filters set, but there are monitors (either out-of-the-box or custom) of that type currently set up and running for that asset. If there are no monitors of that type currently set up for that asset, a "+" will be shown to allow easy creation of that monitor type for that asset.

Viewing Incidents

Clicking any number shown in the Assets Table will pull out a side drawer with all incidents matching the respective number clicked. Use this side drawer to quickly examine and triage Incidents or dig deeper into them by selecting "View Incident IQ".

Sharing a dashboard

🚧

Deprecated

This feature is deprecated and will soon be removed. Data Products will be visible to all users in a workspace and will not need to be shared individually.

Once created, a Data Product Dashboard can be shared with members of a Monte Carlo account.

To share, select the "Share data product" icon in the top right when viewing a Data Product Dashboard.

Toggle the setting to "Share dashboard".

Once shared, a Data Product Dashboard will show for users that have access to a Domain that has at least one of the core assets included in the definition of the Data Product. These dashboards will show under "Shared dashboards" under Dashboards -> Data Product.

If a user is viewing the Data Product Dashboard from a Domain that does not have access to certain assets included in the dashboard or in the upstream lineage of one of the assets, these assets, and incidents related to these assets, will not be included in the Incident metrics or shown in the Assets table. This will be indicated by a tooltip in the Assets table to let the user know some assets are hidden.