Data Sampling

Some features in Monte Carlo can temporarily surface rows from your warehouse within the UI, or let users query the warehouse in a way that can return sensitive business metrics.

Collectively, we refer to this set of features as data sampling. For security, compliance, or regulatory reasons, a small subset of customers choose to disable data sampling for their Monte Carlo environment. Data sampling is a warehouse-level control that can be turned on or off by a technical member of your Monte Carlo account team.

The data from these features sits in Object Storage. See our Architecture & Deployment Options for details on where Object Storage fits within the broader Monte Carlo architecture.

Features that are unavailable when Data Sampling is disabled

Within Monitoring

Within SQL Rules:
- "Value-based" SQL rules: unlike count-based SQL rules, which wrap the results of the query in a count (helping to obscure any information within), a value-based SQL rule lets the user return a numeric value. For example, the user may want to check that "sum of sales from yesterday's orders is > $1,000,000".
- Parameterized values in SQL rules: within count-based SQL rules, a user can pass values from breached rows directly into notifications by using the syntax {{query_result:field_name}} within the monitor's notes. This helps accelerate time-to-resolution. For example, for a SQL rule that checks for a certain quality in a table that contains Salesforce opportunities, it is helpful to include in the notification the list of opportunity_id values that had faulty data.
- Test your SQL query: users can test their SQL query to confirm that it will complete successfully. These tests will show the count of rows or value returned by the query.
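
The difference between count-based and value-based rules can be sketched as follows. This is only an illustration run against an in-memory SQLite database, not Monte Carlo's implementation: the `orders` table, its columns, the sample rows, and the helper names `run_count_based` and `run_value_based` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 400000.0),
        (2, "2024-01-01", 350000.0),
        (3, "2024-01-01", -50.0),  # faulty row: negative amount
    ],
)


def run_count_based(connection, rule_sql):
    """Wrap the rule query in COUNT(*) so only a row count is returned."""
    wrapped = f"SELECT COUNT(*) FROM ({rule_sql}) AS breached"
    return connection.execute(wrapped).fetchone()[0]


def run_value_based(connection, rule_sql):
    """Return the single numeric value produced by the rule query itself."""
    return connection.execute(rule_sql).fetchone()[0]


# Count-based: the caller learns how many rows breached, not what they contain.
breach_count = run_count_based(conn, "SELECT * FROM orders WHERE amount < 0")

# Value-based: the caller sees an actual business metric (total sales for a day).
total_sales = run_value_based(
    conn, "SELECT SUM(amount) FROM orders WHERE order_date = '2024-01-01'"
)

print(breach_count)
print(total_sales > 1_000_000)
```

The value-based query returns the metric itself, which is why it falls under data sampling while the count-based wrapper does not.
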
Within Validation Monitors:
- Previewing results of "sets": when using the is in set or is not in set operators, users can define a set by referring to another field or by writing a query. When testing the set, the user will see a preview of several values from the set.
Monitoring agent:
- Monitoring suggestions are unavailable because they require a sample of data from the table to validate the AI model's output and catch hallucinations.

Within Resolution

Root cause analyses: after certain types of anomalies are detected, Monte Carlo will run follow-up queries that help to identify traits of the erroneous data. These can be used as helpful clues to more quickly identify the root cause of the data issue.
"Breached rows" from SQL rule breaches: after the breach of "count-based" SQL rules, the user can view the specific breached rows within the Monte Carlo UI to better understand the faulty data.

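Breached rows are also the source of the values substituted into {{query_result:field_name}} placeholders in a monitor's notes. The sketch below shows one way such a substitution could work. It is an assumption-laden illustration, not Monte Carlo's code: the `render_notes` helper, the sample rows, and the choice to join multiple values with commas are all hypothetical.

```python
import re

# Matches placeholders of the form {{query_result:field_name}}.
PLACEHOLDER = re.compile(r"\{\{query_result:(\w+)\}\}")


def render_notes(notes, breached_rows):
    """Replace each placeholder with that field's values from the breached rows."""

    def substitute(match):
        field = match.group(1)
        values = [str(row[field]) for row in breached_rows if field in row]
        # Joining with ", " is an assumption; the real formatting may differ.
        return ", ".join(values)

    return PLACEHOLDER.sub(substitute, notes)


# Hypothetical breached rows from a rule on a Salesforce opportunities table.
breached_rows = [
    {"opportunity_id": "006A1", "amount": None},
    {"opportunity_id": "006B2", "amount": None},
]
notes = "Opportunities with faulty data: {{query_result:opportunity_id}}"
rendered = render_notes(notes, breached_rows)
print(rendered)
```

Because the rendered notification contains raw field values from the warehouse, this substitution is only possible when data sampling is enabled.
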
Within Assets

"Segments" section in Data Profiler: when exploring the statistical profile of data within a table, users can filter for specific segments of data. Users can select which fields they'd like to filter by, and can see the most frequent values in those fields that they can filter by.

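To see why the "Segments" section depends on data sampling, consider a minimal sketch of computing the most frequent values in a field. The `most_frequent_values` helper and the sample rows are hypothetical and say nothing about how Data Profiler is actually implemented.

```python
from collections import Counter


def most_frequent_values(rows, field, limit=3):
    """Return the `limit` most common non-null values of `field` with their counts."""
    counts = Counter(row[field] for row in rows if row.get(field) is not None)
    return counts.most_common(limit)


# Hypothetical rows from a profiled table.
rows = [
    {"region": "EMEA"},
    {"region": "AMER"},
    {"region": "EMEA"},
    {"region": "APAC"},
    {"region": "EMEA"},
    {"region": "AMER"},
]

top_regions = most_frequent_values(rows, "region", limit=2)
print(top_regions)

# Filtering to one segment means selecting rows by one of those raw values.
emea_rows = [row for row in rows if row["region"] == "EMEA"]
print(len(emea_rows))
```

The list of candidate segment values is itself raw data from the table, which is why the section is hidden when data sampling is disabled.
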