Data quality (beta)

Overview

The data quality dashboard provides a way to view the health of a set of monitors. Customers use the data quality dashboard for:

  • SLAs: report on the current state and trend of specific monitors, rather than all monitors.
  • Critical data elements: report on a set of validations or metric monitors for a critical data element, applying tags like cde:total_revenue.
  • Enforcing data contracts: tag monitors with a specific tag per data contract, or simply with data-contract.
  • Data quality dimensions (coming soon): break down the data quality score by dimension.
  • Team monitor reporting: report on monitors owned by a specific team, using monitor tags like team:finance.
  • Use case monitor reporting: report on specific use cases or data products, using monitor tags like data-product:salesforce_revenue or project:revenue_reliability.

Supported monitor types

The following monitor types are supported for measuring data quality:

  • Validations
  • Comparison
  • Custom SQL

Legacy Freshness and Volume rules can also be viewed. Other monitor types are being evaluated for inclusion, but this dashboard is generally meant to measure deterministic monitors rather than non-deterministic anomaly detection monitors.

Using the data quality dashboard

By default, all Validation, Comparison, and Custom SQL monitors are shown. As a best practice, tag monitors so you can measure data quality for a specific data product, asset, or team.

Filtering by monitor tags

The dashboard can be filtered by monitor tags. Learn more about applying monitor tags.

Data quality score

The data quality score is calculated as the number of monitors that passed on their last run divided by the total number of monitors.

Think of the score as the "current state": what percent of monitors are passing right now?
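To illustrate the arithmetic, here is a minimal Python sketch; the monitor list and the last_run_passed field are hypothetical stand-ins rather than the product's actual data model.

```python
# Hypothetical monitors, each recording the outcome of its most recent run.
monitors = [
    {"name": "orders_not_null",   "last_run_passed": True},
    {"name": "revenue_vs_source", "last_run_passed": True},
    {"name": "daily_row_count",   "last_run_passed": False},
    {"name": "custom_sql_check",  "last_run_passed": True},
]

passing = sum(1 for m in monitors if m["last_run_passed"])
score = passing / len(monitors)  # 3 passing / 4 total = 0.75
print(f"Data quality score: {score:.0%}")  # -> Data quality score: 75%
```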

Why might the total monitors number be different than expected?

There are a few reasons:

  • Monitors of all statuses are displayed in this dashboard, including Enabled, Error, Disabled, Snoozed, Insufficient data, and Training. Confirm whether any of these monitors are no longer needed, or apply a tag filter to filter them out.

Why might the alerting monitors number be different than expected?

  • For monitors in "Error" status, the last run displayed is the last "successful" run. Error status means the monitor failed to execute at all, not that it ran and alerted. Confirm whether any of these monitors are no longer needed, or apply a tag filter to filter them out.
  • The monitor may have run and alerted, then been disabled. You may not have received alerts for this monitor recently, but because its last successful run alerted, it will still show as alerting in the "current state."

Score trend

The score trend is a daily aggregation of the monitor runs that occur each day. It is calculated as the number of successful monitor runs that day divided by the total number of monitor runs that day.
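As a rough sketch of that daily aggregation, assuming a hypothetical log of monitor runs where each entry has a date and a pass/fail outcome:

```python
from collections import defaultdict

# Hypothetical run log: one entry per monitor run.
runs = [
    {"date": "2024-06-01", "passed": True},
    {"date": "2024-06-01", "passed": True},
    {"date": "2024-06-01", "passed": False},
    {"date": "2024-06-02", "passed": True},
    {"date": "2024-06-02", "passed": True},
]

by_day = defaultdict(lambda: {"passed": 0, "total": 0})
for run in runs:
    by_day[run["date"]]["total"] += 1
    if run["passed"]:
        by_day[run["date"]]["passed"] += 1

for day, counts in sorted(by_day.items()):
    print(day, f"{counts['passed'] / counts['total']:.0%}")
# 2024-06-01 67%   (2 passing runs out of 3)
# 2024-06-02 100%  (2 passing runs out of 2)
```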

Why might the trend number be different than the current data quality score?

  • If a monitor is not on a "daily" schedule, this may cause discrepancies between the score and the trend. It is recommended to use monitors with similar frequencies for measuring data quality; see the sketch after this list. For example:
    • If a monitor runs hourly, it carries a heavier weight because it has 24 runs per day instead of the 1 run per day of a daily monitor.
    • If a monitor runs weekly, it carries a lower weight because it appears in the trend only once per week.
  • If a monitor is no longer actively running on a schedule, it won't be reflected in the trend but will still be reflected in the current data quality score.
  • Trend data is delayed by up to 12 hours, so newly tagged monitors will not show trend data until that delay passes.
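To make the weighting concrete, here is a hedged illustration with hypothetical numbers: nine daily monitors each pass their single run, while one hourly monitor fails all 24 of its runs, so the trend for that day dips well below the current-state score.

```python
# Hypothetical counts for a single day.
daily_passing_runs = 9    # 9 daily monitors x 1 passing run each
hourly_failing_runs = 24  # 1 hourly monitor x 24 failing runs

trend = daily_passing_runs / (daily_passing_runs + hourly_failing_runs)
score = 9 / 10  # current-state score: 9 of 10 monitors passed their last run

print(f"Trend for the day: {trend:.0%}")    # -> Trend for the day: 27%
print(f"Current-state score: {score:.0%}")  # -> Current-state score: 90%
```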

Alerting monitors

Alerting monitors provides detailed information on the monitors that alerted on their most recent run. Invalid records are shown, and drilling into them takes you to the corresponding alert for further investigation.