Ingestion Validation

Ingestion Validation checks that data from external partners lands in your warehouse on time, and in the size and quality that you expect. Examples of data from external partners are:

  • Ad campaign data that’s coming from a social media platform
  • Policy data that’s coming from an insurance provider
  • Shipping data that’s coming from a mail carrier

Monte Carlo automatically monitors these tables for freshness and volume anomalies. However, there are many reasons you might want to add deeper or more explicit monitoring through Ingestion Validation, such as:

  • The external partner is chronically unreliable, which has desensitized Monte Carlo’s machine learning
  • There is a very short turnaround from when the data arrives to when it is consumed, so it needs to be validated very quickly
  • You would like to hold the external partner accountable to specific SLAs
  • The quality within certain columns is erratic (e.g., a large spike in nulls)

How it works

Ingestion Validation is not a distinct custom monitor type. Rather, it collects a series of user inputs and uses them to create Freshness SLOs, Volume SLOs, and Field Health monitors. This is faster and easier than creating each custom monitor individually.
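The fan-out from one set of inputs to several monitors can be pictured with a minimal Python sketch. This is an illustration only, not Monte Carlo's API or implementation: the names IngestionValidationInput, MonitorSpec, and expand_to_monitors, and the setting keys, are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional


@dataclass
class IngestionValidationInput:
    """Hypothetical container for what the user enters in one pass."""
    table: str
    arrival_deadline: Optional[time] = None   # manual freshness threshold
    min_rows: Optional[int] = None            # manual volume threshold
    fields: List[str] = field(default_factory=list)  # columns to check


@dataclass
class MonitorSpec:
    """Hypothetical description of one resulting monitor."""
    kind: str
    table: str
    settings: dict


def expand_to_monitors(inp: IngestionValidationInput) -> List[MonitorSpec]:
    """Turn one set of inputs into zero or more independent monitors."""
    monitors: List[MonitorSpec] = []
    if inp.arrival_deadline is not None:
        monitors.append(MonitorSpec(
            "freshness_slo", inp.table,
            {"arrive_by": inp.arrival_deadline.isoformat()}))
    if inp.min_rows is not None:
        monitors.append(MonitorSpec(
            "volume_slo", inp.table, {"min_rows": inp.min_rows}))
    if inp.fields:
        monitors.append(MonitorSpec(
            "field_health", inp.table, {"fields": list(inp.fields)}))
    return monitors
```

Because each resulting monitor is a separate object in this sketch, it mirrors the behavior described here: the monitors are created together but exist independently afterward.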

Currently, there is no unified edit experience for Ingestion Validation. After creation, the resulting Freshness SLOs, Volume SLOs, and Field Health monitors must be edited or deleted individually.

To configure Ingestion Validation

  1. Go to the Create Monitor menu and select Ingestion Validation
  2. Select a table. Ingestion Validation is most effective for tables updated once or twice a day.
  3. Define a schedule. This sets when and how frequently the validations below will run.
  4. Select which validations to add.
    • You are shown the current thresholds for Monte Carlo’s automatic freshness and volume monitors. You can then set additional, manual thresholds for the arrival time of the data and the quantity of data received. These selections create corresponding Freshness SLOs and Volume SLOs.
    • You can also select specific fields for automated monitoring of statistics like % null, % unique, and percentiles. This creates a corresponding Field Health monitor.
  5. Input labels and notes (optional). Labels can be used to organize monitors and route alerts.
  6. Click ‘Create’

This creates the corresponding Freshness SLOs, Volume SLOs, and Field Health monitors, which can then be viewed and edited from the Monitors page.
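To make the three kinds of validation concrete, here is a minimal Python sketch of the conditions they conceptually evaluate: an arrival deadline, a row-count threshold, and a % null statistic. It is a simplified illustration under assumed inputs, not how Monte Carlo computes these checks. The function names freshness_ok, volume_ok, and percent_null are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def freshness_ok(last_update: datetime, deadline: datetime) -> bool:
    """Freshness: the data arrived at or before the agreed deadline."""
    return last_update <= deadline


def volume_ok(row_count: int, min_rows: int,
              max_rows: Optional[int] = None) -> bool:
    """Volume: the received row count falls within the expected range."""
    if row_count < min_rows:
        return False
    return max_rows is None or row_count <= max_rows


def percent_null(values: Sequence[object]) -> float:
    """Field quality: share of null (None) values in a column, as a percentage."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)
```

For example, a partner file expected by 06:00 with at least 1,000 rows would pass freshness_ok and volume_ok only if it landed by that time with enough rows, while percent_null on a column can be compared against whatever null rate you consider acceptable.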

Validations for the arrival time of data and quantity of data received

Selector to add fields to be checked for quality