Data Sampling

Some features in Monte Carlo can temporarily surface rows from your warehouse within the UI, or let users query the warehouse in a way that can return sensitive business metrics.

Collectively, we refer to this set of features as data sampling. For security, compliance, or regulatory reasons, a small subset of customers choose to disable data sampling for their Monte Carlo environment. Data sampling is a warehouse-level setting, and a technical member of your Monte Carlo account team can switch it on or off.

Data returned by these features is stored in Object Storage. See Architecture & Deployment Options for details on where Object Storage fits within the broader Monte Carlo architecture.

Features that are unavailable when Data Sampling is disabled

Within Monitoring

  • Within SQL Rules:
    • "Value-based" SQL rules: unlike count-based SQL rules, which wrap the results of the query in a count (helping to obscure any information within), a value-based SQL rule lets the user return a numeric value. For example, the user may want to check that "sum of sales from yesterday's orders is > $1,000,000".
    • Parameterized values in SQL rules: within count-based SQL rules, a user can pass values from breached rows directly into notifications by using the syntax {{query_result:field_name}} in the monitor's notes. This helps shorten time-to-resolution. For example, for a SQL rule that checks a quality condition on a table of Salesforce opportunities, it is helpful for the notification to list the opportunity_id values that had faulty data.
    • Test your SQL query: users can test their SQL query to confirm that it completes successfully. The test shows the row count or the value returned by the query.
  • Within Validation Monitors:
    • Previewing results of "sets": when using the "is in set" or "is not in set" operators, users can define a set by referring to another field or by writing a query. When testing the set, the user sees a preview of several values from it.
  • Within Comparison Rules:
    • "Single-value" comparison rules: unlike count-of-rows comparison rules, which wrap the results of the source and target queries in a count (helping to obscure any sensitive information within), a single-value comparison rule lets the user return numeric values. For example, the user may want to check that "sum of sales from yesterday's orders is within 0.1% between Postgres and Snowflake".
    • "Segmented values" comparison rules: like single-value comparison rules, segmented-value comparison rules let the user return numeric values, but segmented by a particular dimension. For example, the user may want to check that "sum of sales from yesterday's orders by product is within 0.1% between Postgres and Snowflake".
    • Test your SQL queries: users can test their source and target SQL queries to confirm that they will complete successfully. These tests will show the count of rows or values returned by the queries.
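To make the difference between count-based and value-based checks concrete, here is a minimal Python sketch that uses an in-memory SQLite table as a stand-in for the warehouse. It is not Monte Carlo's implementation: the `orders` table, the helper functions, the choice to measure the 0.1% tolerance relative to the source value, and the way `render_notes` expands `{{query_result:field_name}}` placeholders are all assumptions for illustration. The count-based helper wraps the rule's query so only a row count comes back; the value-based, comparison, and notification helpers each hand actual warehouse values to the caller, which is why those features are unavailable when data sampling is disabled.

```python
import re
import sqlite3

# Hypothetical sample table standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A1", 500000.0), ("A2", 700000.0), ("A3", -20.0)],
)

# Hypothetical rule query: selects the rows that violate a quality check.
BREACH_QUERY = "SELECT order_id, amount FROM orders WHERE amount < 0"


def count_based_rule(conn, query):
    """Wrap the query in a count so only the number of breached rows is returned."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()
    return count


def value_based_rule(conn, query):
    """Return the single numeric value the query produces (e.g. a sum of sales)."""
    (value,) = conn.execute(query).fetchone()
    return value


def within_relative_tolerance(source, target, tolerance=0.001):
    """Single-value comparison: is target within `tolerance` (0.001 = 0.1%) of source?"""
    # Assumption: the tolerance is measured relative to the source value.
    return abs(source - target) <= tolerance * abs(source)


def render_notes(template, rows, field_names):
    """Hypothetical expansion of {{query_result:field}} with breached-row values."""

    def substitute(match):
        # Raises ValueError if the placeholder names a column the query did not return.
        idx = field_names.index(match.group(1))
        return ", ".join(str(row[idx]) for row in rows)

    return re.sub(r"\{\{query_result:(\w+)\}\}", substitute, template)


if __name__ == "__main__":
    print(count_based_rule(conn, BREACH_QUERY))  # 1 (row contents stay hidden)
    total = value_based_rule(conn, "SELECT SUM(amount) FROM orders")
    print(total > 1_000_000)  # True, but the caller has now seen the actual sum
    cursor = conn.execute(BREACH_QUERY)
    fields = [col[0] for col in cursor.description]
    print(render_notes("Faulty orders: {{query_result:order_id}}", cursor.fetchall(), fields))
```

Running the script prints `1`, `True`, and `Faulty orders: A3`: only the first line avoids exposing data from the table.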

Within Resolution

  • Root cause analyses: after certain types of anomalies are detected, Monte Carlo runs follow-up queries that identify characteristics of the anomalous data. The results serve as clues that help identify the root cause of the data issue more quickly.
  • "Breached rows" from SQL rule breaches: when a count-based SQL rule is breached, users can view the specific breached rows in the Monte Carlo UI to better understand the faulty data.

Within Assets

  • "Segments" section in Data Explorer: when exploring the statistical profile of data within a table, users can filter to specific segments of the data. They select the fields to filter by and see the most frequent values in those fields to choose from.
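The Segments picker counts as data sampling because listing the most frequent values means returning raw field contents, not just aggregates. Below is a minimal sketch of that kind of top-values query, again with SQLite standing in for the warehouse. The `most_frequent_values` helper and the sample `orders` data are hypothetical, and this is not the query Monte Carlo actually runs.

```python
import sqlite3


def most_frequent_values(conn, table, field, limit=5):
    """Return (value, row_count) pairs for the most common values of `field`."""
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # This sketch assumes trusted table/field names (no injection hardening).
    query = (
        f'SELECT "{field}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{field}" ORDER BY n DESC, "{field}" LIMIT ?'
    )
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical sample data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A1", "EU"), ("A2", "US"), ("A3", "US"), ("A4", "APAC"), ("A5", "EU"), ("A6", "US")],
)

if __name__ == "__main__":
    # Each returned value is raw data from the table, which is what the UI would display.
    print(most_frequent_values(conn, "orders", "region"))  # [('US', 3), ('EU', 2), ('APAC', 1)]
```

Ties in frequency are broken by value so the ordering is deterministic.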