Glue
During setup, the Monte Carlo CLI is used to interact with the Monte Carlo API as well as with AWS. Please follow this guide to install and configure the CLI on your local machine.
Prerequisites
Requires permission to create IAM roles and policies in AWS.
The easiest way to set up Glue is to use the Monte Carlo CLI exclusively (Option 1), because the CLI automates the infrastructure creation and policy configuration. If that is not possible, use Option 2.
Option 1: Use the Monte Carlo CLI [Recommended]
- Generate the Glue access policy
- Create an access role
- Provide role information to Monte Carlo
1. Generate the Glue access policy
- Run
montecarlo discovery glue-policy-gen [parameters] > glue_access_policy.json
with the necessary parameters. If the data account is not the same as the collector account, use --resource-aws-region and --resource-aws-profile to pass the data account's region and profile.
$ montecarlo discovery glue-policy-gen --help
Usage: montecarlo discovery glue-policy-gen [OPTIONS]

  Generate an IAM policy for Glue. After review, output of this command can
  be redirected into `montecarlo integrations create-role` or `montecarlo
  discovery cf-role-gen` if you prefer IaC.

Options:
  --database-name TEXT         Glue/Athena database name to generate a policy
                               from. Enter '*' to give Monte Carlo access to
                               all databases. This option can be passed
                               multiple times for more than one database.
                               [required]
  --data-bucket-name TEXT      Name of a S3 bucket storing the data for your
                               Glue/Athena tables. If this option is not
                               specified the bucket names are derived (looked
                               up) from the tables in your databases. This
                               option can be passed multiple times for more
                               than one bucket. Enter '*' to give Monte Carlo
                               access to all buckets.
  --resource-aws-region TEXT   Override the AWS region where the resource is
                               located. Defaults to the region where the
                               collector is hosted.
  --resource-aws-profile TEXT  Override the AWS profile use by the CLI for the
                               resource. This can be helpful if the resource
                               and collector are in different accounts.
  --collector-id UUID          ID for the data collector. To disambiguate
                               accounts with multiple collectors.
  --help                       Show this message and exit.
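As an illustration of the cross-account case, the sketch below composes a glue-policy-gen command from the flags listed in the help output and prints it for review. The database name, region, and profile are hypothetical placeholders, not values from this guide.

```shell
# Hypothetical values: replace with your own database, region, and AWS profile.
DATABASE_NAME="analytics"
DATA_REGION="us-west-2"
DATA_PROFILE="data-account"

# Compose the command using only flags shown in the --help output.
POLICY_CMD="montecarlo discovery glue-policy-gen \
--database-name $DATABASE_NAME \
--resource-aws-region $DATA_REGION \
--resource-aws-profile $DATA_PROFILE"

# Print the full command, including the redirect into the policy file,
# so it can be reviewed before it is run.
echo "$POLICY_CMD > glue_access_policy.json"
```

Review the generated glue_access_policy.json before passing it to the next step.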
2. Create an access role
- Run
montecarlo integrations create-role glue_access_policy.json
If the data account is not the same as the collector account, use --aws-profile to pass the data account profile.
- The command prints a role ARN and an external ID. Both are used in the next step.
$ montecarlo integrations create-role --help
Usage: montecarlo integrations create-role [OPTIONS] FILE

  Create an IAM role from a policy FILE. The returned role ARN and external
  id should be used for adding lake assets.

Options:
  --aws-profile TEXT  Override the AWS profile used by the CLI, which
                      determines where the role is created. This can be
                      helpful when the account that manages the asset is not
                      the same as the collector.
  --help              Show this message and exit.
3. Provide role information to Monte Carlo
- Run
montecarlo integrations add-glue
with the necessary parameters.
$ montecarlo integrations add-glue --help
Usage: montecarlo integrations add-glue [OPTIONS]

  Setup a Glue integration. For metadata.

Options:
  --region TEXT        Glue catalog region. If not specified the region the
                       collector is deployed in is used.
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
                       [required]
  --external-id TEXT   An external id, per assumable role conditions.
  --name TEXT          Friendly name for the created warehouse. Name must be
                       unique.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.
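One cautious way to use these options is to run the connection tests with --validate-only first and add the integration afterwards. The sketch below composes both commands and prints them. The role ARN, external ID, region, and name are hypothetical placeholders; substitute the values printed by create-role.

```shell
# Hypothetical values: use the role ARN and external ID from the previous step.
ROLE_ARN="arn:aws:iam::123456789012:role/monte-carlo-glue-access"
EXTERNAL_ID="example-external-id"
GLUE_REGION="us-west-2"
INTEGRATION_NAME="glue-prod"

# Arguments shared by both invocations (flags taken from the --help output).
COMMON_ARGS="--role $ROLE_ARN --external-id $EXTERNAL_ID --region $GLUE_REGION --name $INTEGRATION_NAME"

# First pass: run the connection tests without adding the integration.
VALIDATE_CMD="montecarlo integrations add-glue $COMMON_ARGS --validate-only"

# Second pass: add the integration once the tests succeed.
ADD_CMD="montecarlo integrations add-glue $COMMON_ARGS"

# Print both commands for review.
echo "$VALIDATE_CMD"
echo "$ADD_CMD"
```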
Option 2: Use the AWS UI
1. Create an access role
- Follow the steps outlined in Creating IAM Roles to create a role with the policy below, replacing the placeholders <REGION>, <ACCOUNT_ID>, <S3_ARN>, and <DATABASE_NAME>. <S3_ARN> is the ARN of the S3 bucket(s) storing the data for your Glue/Athena tables; alternatively, pass "*" to give access to all buckets.
- Save the role ARN and external ID for use in the next step.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "<S3_ARN>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "glue:GetConnections",
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:connection/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "glue:GetDatabases",
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetTables",
        "glue:GetTable",
        "glue:GetPartitions",
        "glue:GetPartition"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DATABASE_NAME>",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*"
      ]
    }
  ]
}
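The placeholder substitution can be scripted rather than done by hand. The following is a minimal sketch that uses sed on one resource line from the policy as sample input. The region, account ID, database name, and bucket ARN are hypothetical values.

```shell
# Hypothetical values: replace with your own region, account, database, and bucket.
REGION="us-west-2"
ACCOUNT_ID="123456789012"
DATABASE_NAME="analytics"
S3_ARN="arn:aws:s3:::example-data-bucket"

# One resource line from the policy, used here as sample input.
TEMPLATE='arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DATABASE_NAME>/*'

# Replace each placeholder. '|' is the sed delimiter because ARNs contain '/' and ':'.
FILLED=$(printf '%s\n' "$TEMPLATE" | sed \
  -e "s|<REGION>|$REGION|g" \
  -e "s|<ACCOUNT_ID>|$ACCOUNT_ID|g" \
  -e "s|<DATABASE_NAME>|$DATABASE_NAME|g" \
  -e "s|<S3_ARN>|$S3_ARN|g")

echo "$FILLED"
```

The same sed expressions can be applied to a file containing the whole policy. Check the result before attaching it to the role.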
2. Provide role information to Monte Carlo
- Run
montecarlo integrations add-glue
with the necessary parameters.
$ montecarlo integrations add-glue --help
Usage: montecarlo integrations add-glue [OPTIONS]

  Setup a Glue integration. For metadata.

Options:
  --region TEXT        Glue catalog region. If not specified the region the
                       collector is deployed in is used.
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
                       [required]
  --external-id TEXT   An external id, per assumable role conditions.
  --name TEXT          Friendly name for the created warehouse. Name must be
                       unique.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.