During setup, the Monte Carlo CLI is used to interact with the Monte Carlo API as well as with AWS. Please follow this guide to install and configure the CLI on your local machine.

Prerequisites

You need permission to create IAM roles and policies in AWS.

The easiest way to set up Glue is to use the Monte Carlo CLI exclusively (Option 1), because the CLI fully automates infrastructure creation and policy configuration. If that is not possible, proceed to Option 2.

Option 1: Use the Monte Carlo CLI [Recommended]

  1. Generate the Glue access policy
  2. Create an access role
  3. Provide role information to Monte Carlo

1. Generate the Glue access policy

  1. Run montecarlo discovery glue-policy-gen [parameters] > glue_access_policy.json with the necessary parameters. If the data account is not the same as the collector account, use --resource-aws-region and --resource-aws-profile to pass the data account's region and profile.
$ montecarlo discovery glue-policy-gen --help
Usage: montecarlo discovery glue-policy-gen [OPTIONS]

  Generate an IAM policy for Glue. After review, output of this command can
  be redirected into `montecarlo integrations create-role` or `montecarlo
  discovery cf-role-gen` if you prefer IaC.

Options:
  --database-name TEXT         Glue/Athena database name to generate a policy
                               from. Enter '*' to give Monte Carlo access to
                               all databases. This option can be passed
                               multiple times for more than one database.
                               [required]

  --data-bucket-name TEXT      Name of a S3 bucket storing the data for your
                               Glue/Athena tables. If this option is not
                               specified the bucket names are derived (looked
                               up) from the tables in your databases. This
                               option can be passed multiple times for more
                               than one bucket. Enter '*' to give Monte Carlo
                               access to all buckets.

  --resource-aws-region TEXT   Override the AWS region where the resource is
                               located. Defaults to the region where the
                               collector is hosted.

  --resource-aws-profile TEXT  Override the AWS profile use by the CLI for the
                               resource. This can be helpful if the resource
                               and collector are in different accounts.

  --collector-id UUID          ID for the data collector. To disambiguate
                               accounts with multiple collectors.

  --help                       Show this message and exit.
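
As a minimal sketch, the invocation for a single database in a separate data account might look like the following. The database name `sales`, the region `us-east-1`, and the profile `data-account` are placeholder assumptions, not values from this guide. The `DRY_RUN` switch is also an addition for illustration: by default the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): substitute your own.
DATABASE_NAME="sales"
DATA_REGION="us-east-1"       # region where the Glue catalog lives
DATA_PROFILE="data-account"   # AWS profile for the account that holds the data

# Assemble the command as positional parameters so each argument stays intact.
set -- montecarlo discovery glue-policy-gen \
  --database-name "$DATABASE_NAME" \
  --resource-aws-region "$DATA_REGION" \
  --resource-aws-profile "$DATA_PROFILE"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: print the command for review without calling AWS or Monte Carlo.
  echo "$* > glue_access_policy.json"
else
  # Run the command and save the generated policy for the next step.
  "$@" > glue_access_policy.json
fi
```

Run the script with `DRY_RUN=0` to execute the command. To grant access to all databases, pass `--database-name '*'` with the asterisk in quotes; otherwise the shell may expand it to the file names in the current directory.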

2. Create an access role

  1. Run montecarlo integrations create-role glue_access_policy.json. If the data account is not the same as the collector account, use --aws-profile to pass the data account profile.
  2. The command prints a role ARN and an external ID; both are used in the next section.
$ montecarlo integrations create-role --help
Usage: montecarlo integrations create-role [OPTIONS] FILE

  Create an IAM role from a policy FILE. The returned role ARN and external
  id should be used for adding lake assets.

Options:
  --aws-profile TEXT  Override the AWS profile used by the CLI, which
                      determines where the role is created. This can be
                      helpful when the account that manages the asset is not
                      the same as the collector.

  --help              Show this message and exit.
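
A sketch of this step, assuming the policy was saved as `glue_access_policy.json` in the current directory. The profile name `data-account` is a placeholder, and the `DRY_RUN` switch and the file check are additions for illustration rather than part of the CLI.

```shell
#!/bin/sh
set -eu

POLICY_FILE="glue_access_policy.json"
DATA_PROFILE="data-account"   # placeholder (assumption): profile for the data account

# Build the create-role command; FILE is the policy generated in the previous step.
set -- montecarlo integrations create-role --aws-profile "$DATA_PROFILE" "$POLICY_FILE"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: print the command for review instead of creating the role.
  echo "$*"
elif [ -s "$POLICY_FILE" ]; then
  # The policy file exists and is not empty, so create the role.
  "$@"
else
  echo "error: $POLICY_FILE is missing or empty; generate the policy first" >&2
  exit 1
fi
```

Omit `--aws-profile` when the data and the collector live in the same account. Note the role ARN and external ID that the command prints, since the next step needs both.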

3. Provide role information to Monte Carlo

  1. Run montecarlo integrations add-glue with the necessary parameters.
$ montecarlo integrations add-glue --help
Usage: montecarlo integrations add-glue [OPTIONS]

  Setup a Glue integration. For metadata.

Options:
  --region TEXT        Glue catalog region. If not specified the region the
                       collector is deployed in is used.
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
                       [required]
  --external-id TEXT   An external id, per assumable role conditions.
  --name TEXT          Friendly name for the created warehouse. Name must be
                       unique.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.
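
One way to use these options is to run the connection tests with `--validate-only` before adding the integration. The sketch below assumes the CLI exits with a nonzero status when validation fails (this guide does not confirm that), so that `set -e` stops the script before the second call. The role ARN, external ID, and name are placeholders, and the `add_glue` wrapper with its `DRY_RUN` switch is a hypothetical helper, not part of the CLI.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): use the ARN and external ID printed by create-role.
ROLE_ARN="arn:aws:iam::123456789012:role/example-glue-role"
EXTERNAL_ID="example-external-id"
NAME="glue-prod"

# Hypothetical wrapper: with DRY_RUN=1 (the default) it only prints each command.
add_glue() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "montecarlo integrations add-glue $*"
  else
    montecarlo integrations add-glue "$@"
  fi
}

# First run the connection tests only, then add the integration.
add_glue --role "$ROLE_ARN" --external-id "$EXTERNAL_ID" --name "$NAME" --validate-only
add_glue --role "$ROLE_ARN" --external-id "$EXTERNAL_ID" --name "$NAME"
```

Add `--region` if the Glue catalog is not in the region where the collector is deployed, and `--collector-id` if the account has more than one collector.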

Option 2: Use the AWS UI

1. Create an access role

  1. Follow the steps outlined in Creating IAM Roles to create a role with the policy below, replacing the placeholders REGION, ACCOUNT_ID, DATABASE_NAME, and S3_ARN. S3_ARN is the ARN of the S3 bucket(s) storing the data for your Glue/Athena tables; alternatively, use "*" to give access to all buckets.
  2. Save the role ARN and external ID for use in the next step.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "<S3_ARN>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "glue:GetConnections",
            "Resource": [
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:connection/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "glue:GetDatabases",
            "Resource": [
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTables",
                "glue:GetTable",
                "glue:GetPartitions",
                "glue:GetPartition"
            ],
            "Resource": [
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>",
                "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*"
            ]
        }
    ]
}
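
If you prefer to fill in the placeholders with a script rather than by hand, a small `sed` filter is one option. This is only a sketch: the values below and the helper name `fill_policy` are illustrative assumptions, and the filter expects values that contain neither `|` nor `&`, which `sed` treats specially here.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): substitute your own.
REGION="us-east-1"
ACCOUNT_ID="123456789012"
DATABASE_NAME="sales"
S3_ARN="arn:aws:s3:::example-data-bucket"

# Replace each <PLACEHOLDER> in a policy template read from stdin.
# '|' is the sed delimiter because ARNs contain ':' and may contain '/'.
fill_policy() {
  sed -e "s|<REGION>|$REGION|g" \
      -e "s|<ACCOUNT_ID>|$ACCOUNT_ID|g" \
      -e "s|<DATABASE_NAME>|$DATABASE_NAME|g" \
      -e "s|<S3_ARN>|$S3_ARN|g"
}

# Example: fill one resource line from the policy above.
echo '"arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*"' | fill_policy
```

To process the whole policy, save the template to a file and run, for example, `fill_policy < policy_template.json > glue_access_policy.json` (both file names are hypothetical).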

2. Provide role information to Monte Carlo

  1. Run montecarlo integrations add-glue with the necessary parameters.
$ montecarlo integrations add-glue --help
Usage: montecarlo integrations add-glue [OPTIONS]

  Setup a Glue integration. For metadata.

Options:
  --region TEXT        Glue catalog region. If not specified the region the
                       collector is deployed in is used.
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
                       [required]
  --external-id TEXT   An external id, per assumable role conditions.
  --name TEXT          Friendly name for the created warehouse. Name must be
                       unique.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.
