Data Explorer (beta)

Data Explorer makes it easy to profile the contents of a table or view. This can be helpful when investigating a data quality issue reported by a business partner, when considering which monitors to create, or when simply getting familiar with the contents of a table.

The experience is interactive and no-code, making it approachable for less technical roles. Users can point and click to adjust the time range of data and filter down for particular segments. In the future, we'd like to make it easy to compare multiple segments of data side-by-side.

Currently, Data Explorer is available for just a subset of customers and for the Snowflake integration only. Your Monte Carlo Agent must be version 16624 or higher.

Using Data Explorer

No monitor or prior configuration is needed to use Data Explorer for a table. When a user loads the Data Explorer tab in Assets, Data Explorer runs queries against the source warehouse to retrieve up-to-date statistics about the table. See the Architecture and Permissions section to learn how the results of these queries are handled so that data is not stored by Monte Carlo.

**Filters**, **Row count**, and **Segments** sections of Data Explorer. Adjust the slider in **Row count** and click on values in **Segments** to filter the rest of the data in the dashboard.


Data Explorer contains 4 sections:

  • Filters: summary of the filters applied on the data being shown. By default, a filter is applied for the trailing 7 days on a user-selected time field. The time filter can be changed using the time range selector at the top of the Assets page. You can further filter the data by applying a custom WHERE clause.
  • Row count: histogram of the count of rows, aggregated using a user-selected time field. At the bottom of this section is an easily adjustable slider to shorten or slide the desired time range.
  • Segments: bar charts that show the distribution of values of user-selected fields. Clicking the values will filter the rest of the data in Data Explorer for that segment. These are most useful if you select categorical fields. Results are limited to the 50 most frequent values.
  • Field profile: common statistical metrics for each field, like the count and % of nulls, count and % unique, minimum, maximum, mean, and standard deviation. Users can drill down into specific metrics to see how they trend over time.

The first 3 sections are intended for filtering and refining down to a specific segment or time range of data, so that users can then see the Field profile for that segment.
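To make the Segments and Field profile metrics concrete, here is a minimal Python sketch of how such statistics could be computed over an in-memory column. It is an illustration only, not Monte Carlo's implementation (the real metrics are computed by queries in the warehouse). The function names `field_profile` and `top_segments` are hypothetical, `None` stands in for SQL NULL, percentages are assumed to be relative to the total row count, and excluding nulls from the segment counts is also an assumption.

```python
from collections import Counter
from statistics import mean, stdev


def field_profile(values):
    """Summarize one column of values; None stands in for SQL NULL."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    unique_count = len(set(non_null))
    profile = {
        "null_count": null_count,
        "null_pct": 100.0 * null_count / total if total else 0.0,
        "unique_count": unique_count,
        "unique_pct": 100.0 * unique_count / total if total else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
    # Mean and standard deviation only make sense for numeric columns.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        profile["mean"] = mean(numeric)
        # Sample standard deviation needs at least two data points.
        profile["stddev"] = stdev(numeric) if len(numeric) > 1 else 0.0
    return profile


def top_segments(values, limit=50):
    """Most frequent non-null values, capped at `limit` (50 in the Segments section)."""
    return Counter(v for v in values if v is not None).most_common(limit)


if __name__ == "__main__":
    print(field_profile([10, 20, None, 20]))
    print(top_segments(["web", "mobile", "web", None]))
```

Filtering the input list first (by time range or segment) and then calling `field_profile` mirrors the intended workflow: narrow the data in the first 3 sections, then read the profile for what remains.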

**Field profile** and **Sample rows** sections.

Field profile section of Data Explorer. Users can click into any metric to see its trend over time.

Users can also drill down into a field profile metric to see how that metric has trended over time. This helps a user validate an issue reported by a business partner. For example, if someone reports a sudden spike of nulls in a key field, it's now much easier to validate that without ever writing SQL or setting up a monitor.

Drill down into a specific field metric to see how it has trended over time.


Architecture and Permissions

Data Explorer is only available for users with the Account Owner, Domains Manager, or Editor role.

The queries executed by the Data Explorer are created in the backend by Monte Carlo and dispatched to the customer's Monte Carlo Agent. The agent executes the query and stores the result in the customer-specific object storage. It then returns a signed URL (with a 5-minute expiration time) to Monte Carlo, which passes the URL on to the browser so that the browser can download the result.
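The idea behind an expiring signed URL can be sketched in a few lines of Python. This is a generic HMAC-based illustration under stated assumptions, not the scheme used by the Monte Carlo Agent or by any particular object store (S3, for example, uses its own signing protocol). The host `storage.example.com`, the secret, and the function names `sign_url` and `is_valid` are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

EXPIRY_SECONDS = 5 * 60  # the 5-minute expiration mentioned above


def sign_url(base_url, secret, now=None):
    """Append an expiry timestamp and an HMAC-SHA256 signature to base_url."""
    issued_at = int(now if now is not None else time.time())
    expires = issued_at + EXPIRY_SECONDS
    payload = f"{base_url}?expires={expires}".encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"{base_url}?{query}"


def is_valid(url, secret, now=None):
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    payload = f"{base_url}?expires={expires}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature, expected) and current < expires


if __name__ == "__main__":
    url = sign_url("https://storage.example.com/results/query-1.json", b"demo-secret")
    print(url, is_valid(url, b"demo-secret"))
```

The point of the design is that only the short-lived URL travels through Monte Carlo; whoever holds it can fetch the result directly from object storage until the URL expires.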

Data queried by the Data Explorer does not pass through the Monte Carlo Cloud Service. It's only handled by the Monte Carlo Agent and the user's browser. For more detail, see our Architecture & Deployment Options.

Care is taken to ensure that Monte Carlo does not run a large or costly query. By default, Monte Carlo will filter for the trailing week's worth of data using a user-selected time field. If Monte Carlo anticipates that this will query too much data, it pre-emptively suggests that the user select a narrower time window or apply a WHERE clause.

If Monte Carlo anticipates that a query will be too large, it will prompt the user to select a shorter time range or apply a WHERE clause.
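As a rough illustration of the default filtering described above, the following Python sketch composes a Snowflake-style query that restricts the data to a trailing window on a user-selected time field and optionally adds a custom WHERE condition. The actual SQL generated by Monte Carlo is not documented here, so the function names `build_filter` and `row_count_query`, the daily bucketing, and the table and column names are assumptions. Identifiers are interpolated without escaping, which is acceptable for a sketch but not for untrusted input.

```python
def build_filter(time_field, custom_where=None, trailing_days=7):
    """Return a WHERE clause for the trailing window plus an optional custom condition."""
    clause = (
        f"WHERE {time_field} >= "
        f"DATEADD(day, -{int(trailing_days)}, CURRENT_TIMESTAMP())"
    )
    if custom_where:
        # Parenthesize so an OR inside the custom condition cannot widen the time filter.
        clause += f" AND ({custom_where})"
    return clause


def row_count_query(table, time_field, custom_where=None, trailing_days=7):
    """Build a row-count-per-day query over the filtered data (daily buckets assumed)."""
    return (
        f"SELECT DATE_TRUNC('day', {time_field}) AS bucket, COUNT(*) AS row_count\n"
        f"FROM {table}\n"
        f"{build_filter(time_field, custom_where, trailing_days)}\n"
        "GROUP BY bucket\n"
        "ORDER BY bucket"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, narrowed to 3 days and one segment.
    print(row_count_query("analytics.orders", "created_at", "region = 'EMEA'", 3))
```

Shortening `trailing_days` or passing a selective `custom_where` are the two levers the prompt suggests when a query is expected to scan too much data.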


Running the Data Explorer using customer-hosted Object Storage

If you are hosting the object storage, you need to ensure it allows CORS requests from the browser to allow the Data Explorer UI to fetch the query responses.

For S3, the following CORS configuration allows the Data Explorer UI to fetch the data exported by the Data Explorer:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "https://*.getmontecarlo.com"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]

See https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html for more information.
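If you prefer to apply the configuration programmatically rather than through the S3 console, a sketch along these lines should work with boto3. It assumes boto3 is installed and AWS credentials with permission to modify the bucket's CORS settings are available; the bucket name and the helper name `apply_cors` are hypothetical.

```python
# The same CORS rules as above, expressed as a Python structure.
CORS_RULES = [
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["https://*.getmontecarlo.com"],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000,
    }
]


def apply_cors(bucket_name):
    """Replace the bucket's CORS configuration and return what S3 now reports."""
    # Imported lazily so the rules can be inspected without boto3 or AWS access.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_cors(
        Bucket=bucket_name,
        CORSConfiguration={"CORSRules": CORS_RULES},
    )
    return s3.get_bucket_cors(Bucket=bucket_name)["CORSRules"]


if __name__ == "__main__":
    # Hypothetical bucket name; substitute the bucket your agent writes to.
    print(apply_cors("my-monte-carlo-agent-bucket"))
```

Note that `put_bucket_cors` replaces any existing CORS configuration on the bucket, so merge these rules with existing ones if the bucket already serves other origins.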