Setting up dataset/schema controls

❗️
Page deprecated
This page is deprecated, refer to our Usage page.

Monte Carlo supports specifying a list of projects/databases/catalogs and schemas/datasets that will be controlled in collection.

Note that by default, new schemas are automatically detected and included in the monitoring. It is recommended that the users use the CLI commands to turn them off if they do not want the new schemas to be included in the monitoring.

📘
Currently supported for Redshift, Databricks, BigQuery and Snowflake

📘
Changes made to data collection filtering may take up to 48 hrs to be visible through dashboard metrics and the Catalog.

CLI

Management of the collection block list is supported on CLI v0.40.0+. View CLI docs here: https://clidocs.getmontecarlo.com/

You can see which schemas and entities you already have specified to be blocked from collection using the get-collection-block-list command.

% montecarlo management get-collection-block-list --help
Usage: montecarlo management get-collection-block-list [OPTIONS]

  List entities blocked from collection on this account.

Options:
  --resource-name TEXT  Name of a specific resource to filter by. Shows all
                        resources by default.
  --help                Show this message and exit.

You can make changes to the collection block list using the update-collection-block-list command.

% montecarlo management update-collection-block-list --help
Usage: montecarlo management update-collection-block-list [OPTIONS]

  Update entities for which collection is blocked on this account.

Options:
  --add / --remove                Whether the entities being specified should
                                  be added or removed from the block list.
                                  [required]
  --resource-name TEXT            Name of a specific resource to apply
                                  collection block to. This option cannot be
                                  used with 'filename'. This option requires
                                  setting 'project'.
  --project TEXT                  Top-level object hierarchy e.g. database,
                                  catalog, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name'.
  --dataset TEXT                  Intermediate object hierarchy e.g. schema,
                                  database, etc. This option cannot be used
                                  with 'filename'. This option requires
                                  setting 'resource-name', and 'project'.
  --collection-block-list-filename TEXT
                                  Filename that contains collection block
                                  definitions. This file is expected to be in
                                  a CSV format with the headers resource_name,
                                  project, and dataset. This option cannot be
                                  used with 'resource-name', 'dataset', and
                                  'project'.
  --help                          Show this message and exit.

Resources are Monte Carlo integrations
Projects would be a metastore in Databricks (like hive_metastore) or database in Redshift
Datasets would be a schema in Databricks or Redshift

API

You can also manage, select, and specify the schema and collection block list via our GraphQL API using the following operations:

query getCollectionBlockList — shows which entities you already have specified to be blocked from collection.
mutation addToCollectionBlockList — allows you to add or update entities specified to be blocked from collection.
mutation removeFromCollectionBlockList — allows you to remove entities from the list, thereby allowing them to be discovered during the collection process.

Updated about 2 months ago