Monte Carlo supports specifying a list of projects/databases/catalogs and schemas/datasets to block from collection.
Note that by default, new schemas are automatically detected and included in monitoring. If you do not want new schemas to be monitored, we recommend using the CLI commands described here to block them.
This feature is currently supported for Redshift, Databricks, BigQuery, and Snowflake.
Changes made to data collection filtering may take up to 48 hours to be visible in dashboard metrics and the Catalog.
Management of the collection block list is supported in CLI v0.40.0+. View the CLI docs here: https://clidocs.getmontecarlo.com/
You can see which schemas and entities are already blocked from collection using the get-collection-block-list command:
% montecarlo management get-collection-block-list --help
Usage: montecarlo management get-collection-block-list [OPTIONS]

  List entities blocked from collection on this account.

Options:
  --resource-name TEXT  Name of a specific resource to filter by. Shows all
                        resources by default.
  --help                Show this message and exit.
You can make changes to the collection block list using the update-collection-block-list command:
% montecarlo management update-collection-block-list --help
Usage: montecarlo management update-collection-block-list [OPTIONS]

  Update entities for which collection is blocked on this account.

Options:
  --add / --remove                Whether the entities being specified should
                                  be added or removed from the block list.
                                  [required]
  --resource-name TEXT            Name of a specific resource to apply
                                  collection block to. This option cannot be
                                  used with 'filename'. This option requires
                                  setting 'project'.
  --project TEXT                  Top-level object hierarchy e.g. database,
                                  catalog, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name'.
  --dataset TEXT                  Intermediate object hierarchy e.g. schema,
                                  database, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name', and 'project'.
  --collection-block-list-filename TEXT
                                  Filename that contains collection block
                                  definitions. This file is expected to be in
                                  a CSV format with the headers resource_name,
                                  project, and dataset. This option cannot be
                                  used with 'resource-name', 'dataset', and
                                  'project'.
  --help                          Show this message and exit.
- Resources are Monte Carlo integrations
- Projects would be a metastore in Databricks (like hive_metastore) or a database in Redshift
- Datasets would be a schema in Databricks or Redshift
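The option rules in the help text above can be easy to get wrong when scripting updates. The following is a minimal Python sketch, not an official client: it writes a block list CSV with the documented headers (resource_name, project, dataset) and assembles the argument list for update-collection-block-list while checking the documented option combinations. The helper names write_block_list_csv and build_update_command, and the example values my-warehouse, analytics, and scratch, are hypothetical. The sketch only builds the command; running it is left to the caller.

```python
import csv
from typing import Iterable, List, Optional, Tuple

# Headers the CLI help text says the CSV file is expected to have.
HEADERS = ["resource_name", "project", "dataset"]


def write_block_list_csv(path: str, rows: Iterable[Tuple[str, str, str]]) -> None:
    """Write (resource_name, project, dataset) rows to a CSV file (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADERS)
        writer.writerows(rows)


def build_update_command(
    add: bool,
    resource_name: Optional[str] = None,
    project: Optional[str] = None,
    dataset: Optional[str] = None,
    filename: Optional[str] = None,
) -> List[str]:
    """Assemble argv for update-collection-block-list (hypothetical helper).

    Mirrors the rules in the help text: the filename option cannot be mixed
    with the per-entity options, resource-name and project must be given
    together, and dataset requires both of them.
    """
    cmd = [
        "montecarlo", "management", "update-collection-block-list",
        "--add" if add else "--remove",
    ]
    if filename is not None:
        if resource_name or project or dataset:
            raise ValueError(
                "filename cannot be combined with resource-name, project, or dataset"
            )
        return cmd + ["--collection-block-list-filename", filename]
    if not (resource_name and project):
        raise ValueError("resource-name and project must be set together")
    cmd += ["--resource-name", resource_name, "--project", project]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    print(build_update_command(True, "my-warehouse", "analytics", "scratch"))
```

The returned list can be passed to subprocess.run on a machine where the montecarlo CLI is installed and configured. Keeping command construction separate from execution makes the option checks testable without touching an account.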
You can also view and manage the collection block list via our GraphQL API using the following operations:
- query getCollectionBlockList — shows which entities you already have specified to be blocked from collection.
- mutation addToCollectionBlockList — allows you to add or update entities specified to be blocked from collection.
- mutation removeFromCollectionBlockList — allows you to remove entities from the list, thereby allowing them to be discovered during the collection process.
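As a rough illustration of calling one of these operations, the Python sketch below builds a JSON request body for getCollectionBlockList. Only the operation name comes from this page. The selected fields (resourceId, project, dataset) and the helper name build_graphql_body are assumptions made for illustration and should be checked against the API reference before use. The API endpoint and authentication headers are deliberately left out.

```python
import json
from typing import Any, Dict, Optional

# ASSUMPTION: the field names in this selection set are illustrative only.
# Confirm the actual schema in the GraphQL API reference.
GET_BLOCK_LIST_QUERY = """
query {
  getCollectionBlockList {
    resourceId
    project
    dataset
  }
}
"""


def build_graphql_body(query: str, variables: Optional[Dict[str, Any]] = None) -> bytes:
    """Encode a GraphQL query and its variables as a JSON request body (hypothetical helper)."""
    payload = {"query": query, "variables": variables or {}}
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    body = build_graphql_body(GET_BLOCK_LIST_QUERY)
    print(body.decode("utf-8"))
```

The resulting bytes can be sent as the body of an HTTP POST with a JSON content type using whichever HTTP client you prefer.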