📘

Beta access

Managing ingestion and monitoring of tables in the Usage UI is currently in Beta and is being progressively rolled out to customers.

The Usage UI introduces an intuitive interface to manage what data assets are ingested and monitored by Monte Carlo.

  • Ingested tables are all unique tables (tables, views, external tables) with metadata and lineage in Monte Carlo. Ingested tables show up under Assets and are available in lineage.
  • Monitored tables are all unique tables (tables, views, external tables) monitored for downtime by Monte Carlo.

📘

Ingested vs Monitored tables

Tables that are enabled for monitoring can have Monitors set up for them and will raise incidents and send notifications if data quality issues occur.

If a table is ingested but not monitored, you will NOT be able to view monitoring data, see incidents, or receive alerts for that table.

Benefits of the Usage UI

  • Consolidated controls give you one place to manage what is ingested and monitored.
  • By default, no tables are monitored, allowing you to specify exactly which tables you want to monitor.
  • Specify rules, based on table names, for which tables you want to monitor.
  • Specify rules for which schemas you do not want to ingest.

📘

How this was managed previously

Previously, ingested tables were managed by going to Edit for the Data Lake and Warehouse under Integrations and selecting the "Filtering" tab to choose what Databases and Schemas to ingest. Additionally, Monte Carlo previously monitored all tables by default and you managed what was not monitored by Muting Schemas and Tables. These features will be deprecated.

Access

The Usage UI is accessible to users with Account Owner, Domain Manager, or Editor roles. Learn more about these roles under Managed Roles and Groups.

Charts in the Usage UI

Two charts are available in the Usage UI: Total ingested tables and Total monitored tables. Both charts show daily totals for up to 90 days.

Using the Usage UI

Configure ingestion

  1. On the Usage page, the Data lakes and Warehouses section shows all Integrations that can be configured. Select an integration to configure by clicking on its name.
  2. Under the Integration, you will see a list of Databases that Monte Carlo can see. Each Database can be toggled on/off for ingestion. Select the Database name you want to configure or select "Set up" under the Monitored tables column.
  3. Within each Database, you will see a list of visible schemas. You can control which schemas to ingest by specifying rules. By default, all schemas are ingested into Monte Carlo, and exclude rules can be applied to leave schemas out. Select "Add exception rule" to add a new rule; multiple rules can be added. You may match schemas by name with the following conditions:
    1. Exact match
    2. Starts with
    3. Ends with
    4. Contains
    5. Matches Regex
  4. After selecting "Save", the "Ingested" and "Excluded from Ingestion" tabs at the bottom will be updated to reflect which schemas in that database are included in or excluded from ingestion. The numbers in the header will also be updated to reflect the total counts of ingested and monitored tables.
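
To make the rule conditions above concrete, here is a minimal Python sketch of how exclude rules could split a database's schemas into ingested and excluded sets. It is an illustration only, not Monte Carlo's implementation: the condition keys, the function names, and the choice to require a regex to match the whole schema name are all assumptions.

```python
import re


def schema_matches(condition: str, value: str, schema: str) -> bool:
    """Return True if `schema` satisfies one rule (case-sensitive).

    The condition keys are hypothetical stand-ins for the UI options.
    """
    if condition == "exact_match":
        return schema == value
    if condition == "starts_with":
        return schema.startswith(value)
    if condition == "ends_with":
        return schema.endswith(value)
    if condition == "contains":
        return value in schema
    if condition == "matches_regex":
        # Assumption: the expression must match the entire schema name.
        return re.fullmatch(value, schema) is not None
    raise ValueError(f"unknown condition: {condition}")


def split_schemas(schemas, exclude_rules):
    """Split schemas into (ingested, excluded) given a list of exclude rules."""
    ingested, excluded = [], []
    for schema in schemas:
        if any(schema_matches(c, v, schema) for c, v in exclude_rules):
            excluded.append(schema)
        else:
            ingested.append(schema)
    return ingested, excluded


# Hypothetical schema names and rules.
rules = [("starts_with", "tmp_"), ("matches_regex", r"scratch_\d+")]
ingested, excluded = split_schemas(
    ["analytics", "tmp_load", "scratch_42", "TMP_old"], rules
)
print("ingested:", ingested)
print("excluded:", excluded)
```

In this sketch a schema is excluded as soon as any one rule matches it, and "TMP_old" stays ingested because the comparison is case-sensitive.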

📘

Tables must be ingested to be eligible for monitoring

A table must first be ingested before monitoring can be enabled for it. Ensure that you are not excluding a schema from ingestion that you want to enable monitoring for.

Configure monitoring

  1. Under the appropriate Database on the Usage page, select the Schema that contains tables you want to enable monitoring for.
  2. By default, no tables are monitored in a Schema. Select "All tables" or "Select tables" at the top under Monitor rules to enable monitoring. For "Select tables" you can specify tables based on name with the following conditions:
    1. Monitor table names that
      1. Starts with
      2. Ends with
      3. Contains
      4. Matches pattern
    2. Except table name that
      1. Starts with
      2. Ends with
      3. Contains
      4. Matches pattern
  3. After selecting "Save", the Monitored and Not Monitored tables at the bottom will update to reflect which tables are monitored and not monitored based on the rules you have applied. The numbers in the header will also reflect the updated total counts.
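
The interplay between "Monitor table names that" and "Except table name that" rules can be sketched in Python as follows. This is a hedged illustration rather than the product's actual logic: it assumes that except rules take precedence over monitor rules, it uses hypothetical condition keys and table names, and it leaves out the "Matches pattern" condition for brevity.

```python
def name_matches(condition: str, value: str, table: str) -> bool:
    """Case-sensitive check of one rule against a table name."""
    if condition == "starts_with":
        return table.startswith(value)
    if condition == "ends_with":
        return table.endswith(value)
    if condition == "contains":
        return value in table
    raise ValueError(f"unknown condition: {condition}")


def is_monitored(table, monitor_rules, except_rules, all_tables=False):
    """Return True if the table ends up monitored.

    Assumption: a table is monitored when it is included ("All tables" or
    any monitor rule matches) and no except rule matches it.
    """
    included = all_tables or any(
        name_matches(c, v, table) for c, v in monitor_rules
    )
    excepted = any(name_matches(c, v, table) for c, v in except_rules)
    return included and not excepted


tables = ["orders", "orders_backup", "customers", "tmp_orders"]

# "Select tables": monitor names starting with "orders", except backups.
selected = [
    t for t in tables
    if is_monitored(t, [("starts_with", "orders")], [("ends_with", "_backup")])
]
print(selected)

# "All tables" with a single except rule.
most = [
    t for t in tables
    if is_monitored(t, [], [("starts_with", "tmp_")], all_tables=True)
]
print(most)
```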

📘

Rules are case-sensitive

Be aware that schema and table names are case-sensitive when specifying patterns for exclusion from ingestion or inclusion in monitoring.

📘

Using the "Matches pattern" rule

Use "*" to match one or more characters.

For example, specifying the pattern "prod_*_table" would match a table named "prod_1_18_snapshot_table".
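
One way to reason about the "*" wildcard is to translate the pattern into a regular expression in which each "*" becomes "one or more of any character". The Python sketch below does that. The helper names are hypothetical, and it assumes the pattern must match the whole table name, which this page does not state explicitly.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a "Matches pattern" string into a compiled regex.

    Each "*" becomes ".+" (one or more characters); every other character
    is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".+".join(literal_parts))


def matches_pattern(pattern: str, name: str) -> bool:
    # Assumption: the pattern must cover the entire name (case-sensitive).
    return pattern_to_regex(pattern).fullmatch(name) is not None


print(matches_pattern("prod_*_table", "prod_1_18_snapshot_table"))
print(matches_pattern("prod_*_table", "prod__table"))
print(matches_pattern("prod_*_table", "PROD_1_table"))
```

Under these assumptions the first call matches, the second does not because "*" needs at least one character, and the third does not because matching is case-sensitive.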

📘

Tip: Exclude just a few tables

If you want to monitor most of a schema but exclude a few tables, specify "All tables" in your monitor rules, then add a few "Except table name" rules.

Steps to migrate to the new Usage UI

Since we are switching from an exclude model (where you muted what not to monitor) to an include model (where you choose what you DO want to monitor), we will need your help setting up what you want to monitor. Here are the steps we ask you to take:

  1. Navigate to the Usage page.

    1. You will see a banner at the top that indicates this is currently in "Staging". This means that changes made here to which tables are monitored will not take effect until you toggle this to "Active".

  2. Under Data Lakes and Warehouses, you will see the integrations you currently have set up. Each will initially show Monitored tables: 0.

  3. Click into each integration and into each Database that you want to enable monitoring on.

  4. Beside each Schema you will see an "Enable" button under the Monitored Tables column. Selecting this will start monitoring All tables in that schema.

    1. If you want to be more selective on exactly what tables to monitor in that schema, you can select the Schema name and choose "Select tables" under Monitor rules. Make sure to click "Save"!
  5. As you specify what tables you want to monitor for each schema, you will see the numbers at the top of the page reflect the total rollup count at each level.

  6. Once you have specified the necessary rules on what tables to monitor, toggle the switch in the top banner to "Activate" these new rules.

🚧

Make sure your tables are being monitored!

Once you "Activate" these new rules, ensure that the Total Monitored Tables number shows the number of tables you expect to be monitored. If it shows "0", nothing is being monitored and you won't receive incidents and alerts on data quality issues! Check back through each Database and Schema to make sure you have applied and Saved your rules.

See a video walkthrough

CLI

Management of the collection block list is supported on CLI v0.40.0+. View CLI docs here: https://clidocs.getmontecarlo.com/

You can see which schemas and entities you already have specified to be blocked from collection using the get-collection-block-list command.

% montecarlo management get-collection-block-list --help
Usage: montecarlo management get-collection-block-list [OPTIONS]

  List entities blocked from collection on this account.

Options:
  --resource-name TEXT  Name of a specific resource to filter by. Shows all
                        resources by default.
  --help                Show this message and exit.

You can make changes to the collection block list using the update-collection-block-list command.

% montecarlo management update-collection-block-list --help
Usage: montecarlo management update-collection-block-list [OPTIONS]

  Update entities for which collection is blocked on this account.

Options:
  --add / --remove                Whether the entities being specified should
                                  be added or removed from the block list.
                                  [required]
  --resource-name TEXT            Name of a specific resource to apply
                                  collection block to. This option cannot be
                                  used with 'filename'. This option requires
                                  setting 'project'.
  --project TEXT                  Top-level object hierarchy e.g. database,
                                  catalog, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name'.
  --dataset TEXT                  Intermediate object hierarchy e.g. schema,
                                  database, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name', and 'project'.
  --collection-block-list-filename TEXT
                                  Filename that contains collection block
                                  definitions. This file is expected to be in
                                  a CSV format with the headers resource_name,
                                  project, and dataset. This option cannot be
                                  used with 'resource-name', 'dataset', and
                                  'project'.
  --help                          Show this message and exit.

  • Resources are Monte Carlo integrations
  • Projects would be a metastore in Databricks (like hive_metastore) or database in Redshift
  • Datasets would be a schema in Databricks or Redshift
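
As an illustration of the file-based option, the Python sketch below writes a CSV with the headers named in the help text (resource_name, project, dataset) and assembles the matching command line without running it. The file name, integration name, database names, and schema names are all hypothetical, and filling in every column on every row is an assumption, since the help text does not say whether any column may be left empty.

```python
import csv
import shlex

# Hypothetical block list: one row per schema to block from collection.
rows = [
    {"resource_name": "redshift-prod", "project": "analytics_db", "dataset": "scratch"},
    {"resource_name": "redshift-prod", "project": "analytics_db", "dataset": "tmp_loads"},
]

path = "block_list.csv"  # hypothetical file name

# Write the CSV with the three headers the CLI help text describes.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["resource_name", "project", "dataset"])
    writer.writeheader()
    writer.writerows(rows)

# Build (but do not execute) the command that would add these entries.
command = [
    "montecarlo", "management", "update-collection-block-list",
    "--add",
    "--collection-block-list-filename", path,
]
print(shlex.join(command))
```

Printing the command instead of executing it lets you review it before changing the block list. Swapping `--add` for `--remove` would, per the help text, remove the same entries.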

GraphQL API

Limitations

  • The following Integrations are not currently supported for configuration under the Usage UI:
    • Glue
    • Pinecone
    • Kafka; Confluent
  • There is currently a limit of 100 table monitor/except rules within a given schema.
  • The criteria for a rule may only contain up to 255 characters.
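
If you generate rules programmatically before entering them, a small pre-check against the two limits above can catch problems early. The Python sketch below is one way to do that. It assumes the 100-rule limit counts monitor and except rules together for a schema, which is one reading of the limit as stated, and the function name and rule representation are hypothetical.

```python
MAX_RULES_PER_SCHEMA = 100   # limit stated above
MAX_CRITERIA_LENGTH = 255    # limit stated above


def validate_rules(monitor_rules, except_rules):
    """Return a list of human-readable problems (empty if none).

    Each rule is a (condition, criteria) tuple of strings.
    Assumption: monitor and except rules share one 100-rule budget.
    """
    problems = []
    all_rules = list(monitor_rules) + list(except_rules)
    if len(all_rules) > MAX_RULES_PER_SCHEMA:
        problems.append(
            f"{len(all_rules)} rules exceeds the limit of {MAX_RULES_PER_SCHEMA}"
        )
    for index, (condition, criteria) in enumerate(all_rules):
        if len(criteria) > MAX_CRITERIA_LENGTH:
            problems.append(
                f"rule {index} ({condition}) has {len(criteria)} characters, "
                f"limit is {MAX_CRITERIA_LENGTH}"
            )
    return problems


ok = validate_rules([("starts_with", "orders")], [("ends_with", "_backup")])
too_long = validate_rules([("contains", "x" * 300)], [])
print(ok)
print(too_long)
```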

Troubleshooting

The Current total for the Total monitored tables chart and the Monitored tables column don't match

This happens when the Usage UI is still in "Staging" mode. While Monitoring rules are in staging, Muted data selections are still in effect, so the Current total for the Total monitored tables chart respects the muting toggles you have in place and shows the total of non-muted tables.

Once the toggle is moved to "Active", the Current total for the Total monitored tables chart will match the sum of the Monitored tables column in the Data lakes and Warehouses section at the bottom of the page.

Use this "Staging" state to migrate from what is muted to what you want included in monitoring before toggling to "Active". Once activated, Muted data selections will no longer be available and the Monitoring rules applied here will take effect. For more details, refer to Steps to migrate to the new Usage UI.