Metadata Insights

Access Monte Carlo-generated metadata

We offer multiple mechanisms to access the metadata Monte Carlo collects to support a range of analytical and tracking use cases. Many of our customers today access this data to define and track SLI performance, monitor data asset usage, determine the most important (or least important) data assets, and more.

This document outlines the options for accessing this data.

Dashboard

Customers can download CSV reports directly from the UI: navigate to the Dashboard tab and click to download the reports.

CLI

Customers can use the CLI to download all CSV reports programmatically, saving them locally or uploading them directly to S3.

Supported schemes:

  • file:// - save insight locally.
  • s3:// - save insight to S3.

Follow this guide to install and configure the CLI. For reference, see help for these commands:

❗️

These commands will overwrite a file if it exists in the path and create any missing directories or prefixes.

$ montecarlo insights
Usage: montecarlo insights [OPTIONS] COMMAND [ARGS]...

  Aggregated insights on your tables.

Options:
  --help  Show this message and exit.

Commands:
  get-cleanup-suggestions    Get cleanup suggestions insight.
  get-coverage-overview      Get coverage overview (monitors) insight.
  get-deteriorating-queries  Get deteriorating queries insight.
  get-events                 Get events insight.
  get-incident-queries       Get incident query changes insight.
  get-key-assets             Get key assets insight.
  get-rule-results           Get rule and SLI results insight.
  get-table-activity         Get table read/write activity insight.
  list                       List insights details and availability.

# Save an insight locally to a directory called 'mc_data' with filename 'assets.csv'
$ montecarlo insights get-key-assets file://mc_data/assets.csv

# Save an insight to S3 bucket called 'bucket' with key 'mc_data/alerts.csv'
$ montecarlo insights get-events s3://bucket/mc_data/alerts.csv

# List all insights details and availability
$ montecarlo insights list
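
For scheduled exports, the commands above can be wrapped in a small script. The following is a minimal Python sketch, not an official client. It assumes the `montecarlo` CLI is already installed and configured, takes the subcommand names from the help output above, and uses hypothetical helper names (`build_command`, `download_all`) and a hypothetical file-naming convention (`<insight>.csv`).

```python
import subprocess
from typing import Callable, List

# Subcommand names copied from the `montecarlo insights` help output.
INSIGHT_COMMANDS = [
    "get-cleanup-suggestions",
    "get-coverage-overview",
    "get-deteriorating-queries",
    "get-events",
    "get-incident-queries",
    "get-key-assets",
    "get-rule-results",
    "get-table-activity",
]


def build_command(subcommand: str, base_uri: str) -> List[str]:
    """Build the CLI invocation for one insight.

    The destination file name (e.g. key_assets.csv) is a hypothetical
    convention chosen for this sketch, not something the CLI requires.
    """
    name = subcommand[len("get-"):] if subcommand.startswith("get-") else subcommand
    destination = f"{base_uri.rstrip('/')}/{name.replace('-', '_')}.csv"
    return ["montecarlo", "insights", subcommand, destination]


def download_all(base_uri: str, run: Callable[..., object] = subprocess.run) -> None:
    """Download every insight to base_uri (file://... or s3://...)."""
    for subcommand in INSIGHT_COMMANDS:
        # check=True raises CalledProcessError if the CLI exits with a
        # non-zero status, so a failed download stops the loop.
        run(build_command(subcommand, base_uri), check=True)
```

Calling `download_all("file://mc_data")` would write one CSV per insight under `mc_data`, overwriting existing files as noted in the warning above. The `run` parameter exists only so the function can be exercised without invoking the real CLI.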

Snowflake Data Marketplace

Customers with Snowflake data warehouses located in us-east-1 can request access to a subset of the Insight reports directly from their Snowflake environment. Reach out to your CSM or to [email protected] to learn more about this feature!

Available Fields

There are many fields available on each report. To name a few:

Key Assets Fields:

  • FULL_TABLE_ID
  • PROJECT_NAME - database
  • DATASET_NAME - (e.g. a "schema" in Snowflake)
  • TABLE_NAME
  • TABLE_TYPE - field values: table, view, external, wildcard_table (table groups which are collapsed by Monte Carlo into "canonical" tables)
  • IMPORTANCE - high (score >= 0.8), medium (0.6 <= score < 0.8), low (score < 0.6)
  • IMPORTANCE_SCORE - based on # of dependencies, avg. reads/writes, users, and more
  • PRC_ACTIVE_DAYS - % of days with > 0 queries out of the last 30 days
  • USERS - # of query executors (reading and writing)
  • READ_USERS
  • WRITE_USERS
  • READS - total read queries (in the last 30 days)
  • DISTINCT_READS - # of distinct read queries
  • PRC_DISTINCT_READS - DISTINCT_READS / READS
  • AVG_READS_PER_ACTIVE_DAY
  • WRITES - total write queries (in the last 30 days)
  • DISTINCT_WRITES
  • PRC_DISTINCT_WRITES
  • AVG_WRITES_PER_ACTIVE_DAY
  • LAST_READ
  • DAYS_SINCE_LAST_READ
  • LAST_WRITE
  • DAYS_SINCE_LAST_WRITE
  • DIRECT_UPSTREAM_TABLES - # of parent tables
  • DIRECT_DOWNSTREAM_TABLES - # of child tables
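
To illustrate how the IMPORTANCE and PRC_DISTINCT_READS fields relate to the other fields in the list, here is a short Python sketch. It assumes the CSV header names match the field names listed above, and that a score of exactly 0.6 counts as medium (the list only gives the range). The function names are hypothetical.

```python
import csv
import io
from typing import List, Optional


def importance_bucket(score: float) -> str:
    """Map IMPORTANCE_SCORE to the IMPORTANCE label using the listed thresholds."""
    if score >= 0.8:
        return "high"
    if score >= 0.6:  # assumption: the lower bound of "medium" is inclusive
        return "medium"
    return "low"


def distinct_read_ratio(distinct_reads: int, reads: int) -> Optional[float]:
    """PRC_DISTINCT_READS as described above: DISTINCT_READS / READS."""
    if reads == 0:
        return None  # no reads in the window, so the ratio is undefined
    return distinct_reads / reads


def high_importance_tables(csv_text: str) -> List[str]:
    """Return FULL_TABLE_ID for every row whose score falls in the high bucket."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["FULL_TABLE_ID"]
        for row in reader
        if importance_bucket(float(row["IMPORTANCE_SCORE"])) == "high"
    ]
```

In practice `csv_text` would be the contents of a file saved with `montecarlo insights get-key-assets`. Recomputing the bucket from the score is only useful if you want different thresholds from the ones in the IMPORTANCE column.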

Cleanup Suggestions Fields:

  • FULL_TABLE_ID
  • PROJECT_NAME
  • DATASET_NAME
  • TABLE_NAME
  • TABLE_TYPE - field values: table, view, external, wildcard_table (table groups which are collapsed by Monte Carlo into "canonical" tables)
  • DAYS_SINCE_LAST_READ - days since the last read query. If the last read was more than 30 days ago, the field value is ">30".
  • LAST_READ
  • DAYS_SINCE_LAST_WRITE - same as DAYS_SINCE_LAST_READ, but for write queries
  • LAST_WRITE
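
Because DAYS_SINCE_LAST_READ and DAYS_SINCE_LAST_WRITE can hold either a number or the string ">30", they need a little care when parsed. The sketch below shows one way to handle that in Python. It assumes numeric values are written as plain integers and that an empty cell means "unknown"; both are assumptions, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_days_since(value: str) -> Optional[Tuple[int, bool]]:
    """Return (days, is_lower_bound), or None for an empty cell.

    ">30" becomes (30, True): the true value is some number above 30.
    "12" becomes (12, False): the value is exact.
    """
    text = value.strip()
    if not text:
        return None
    if text.startswith(">"):
        return int(text[1:]), True
    return int(text), False


def is_stale(value: str, threshold_days: int = 30) -> bool:
    """True when the table has gone more than threshold_days without activity."""
    parsed = parse_days_since(value)
    if parsed is None:
        return False  # assumption: treat unknown as not stale
    days, is_lower_bound = parsed
    if is_lower_bound:
        # ">30" means strictly more than 30, so it exceeds any threshold <= 30.
        return days >= threshold_days
    return days > threshold_days
```

A cleanup candidate could then be selected with something like `is_stale(row["DAYS_SINCE_LAST_READ"]) and is_stale(row["DAYS_SINCE_LAST_WRITE"])`.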

Events Fields:

  • TYPE - anomaly type
  • FULL_TABLE_ID
  • PROJECT_NAME
  • DATASET_NAME
  • TABLE_NAME
  • TABLE_TYPE - field values: table, view, external, wildcard_table (table groups which are collapsed by Monte Carlo into "canonical" tables)
  • FIELD
  • CREATED_AT - anomaly creation time in MC
  • LAST_UPDATE - anomaly time (for type = freshness | unchanged_size only)
  • ANOMALOUS_DATA_TIME - anomaly time for all other types
  • MIN_EXPECTED - min expected number/timestamp
  • EXPECTED - expected timestamp (for type = freshness | unchanged_size)
  • MAX_EXPECTED - max expected number/timestamp
  • ACTUAL - actual value measurement (timestamp or number)
  • FIRST_STATUS - first user feedback (expected, false positive, etc.)
  • FIRST_STATUS_TIME
  • HOURS_TO_FIRST_STATUS - (first_feedback_time - created_time)
  • LAST_STATUS
  • LAST_STATUS_TIME
  • RESOLVE_TIME - anomaly's end_time
  • HOURS_TO_RESOLUTION - (anomaly's end_time - created_time)
  • MC_INCIDENT_ID
  • EVENT_APP_URL - a link to our app
  • MC_EVENT_ID
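
The duration fields above are simple differences between timestamps. As a rough illustration, the following Python sketch recomputes HOURS_TO_RESOLUTION from CREATED_AT and RESOLVE_TIME and averages it across events. It assumes the timestamps are ISO 8601 strings and that unresolved events have an empty RESOLVE_TIME; the report's actual timestamp format may differ, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional


def hours_between(start: str, end: str) -> Optional[float]:
    """Elapsed hours from start to end, or None if either timestamp is missing."""
    if not start or not end:
        return None
    # Assumption: both values parse with datetime.fromisoformat.
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def mean_hours_to_resolution(events: Iterable[Dict[str, str]]) -> Optional[float]:
    """Average of (RESOLVE_TIME - CREATED_AT) over resolved events only."""
    durations = [
        hours_between(event.get("CREATED_AT", ""), event.get("RESOLVE_TIME", ""))
        for event in events
    ]
    resolved = [d for d in durations if d is not None]
    if not resolved:
        return None
    return sum(resolved) / len(resolved)
```

The same `hours_between` helper applies to HOURS_TO_FIRST_STATUS by passing CREATED_AT and FIRST_STATUS_TIME instead.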
