Athena
Prerequisites
Requires permission to create IAM roles and policies in AWS.
To connect Monte Carlo to Athena for query logs and data health queries, follow these steps:
- Install the CLI, it will be used in the next steps.
- Create a workgroup for Monte Carlo queries.
- Create an IAM role that allows Athena access for Monte Carlo's data collector.
- Provide the role's information to Monte Carlo to validate and complete the integration.
Installing the CLI
Please follow this guide to install and configure the CLI.
Creating a workgroup for Monte Carlo queries
Create a workgroup for Monte Carlo to use to perform queries, see instructions here. An S3 location (bucket and folder) to which query results will be written must be designated. It is recommended that the bucket be in the same region as the catalog that Athena uses.
Creating an IAM role for Athena access
In order to provide access to Athena, you will create an IAM role with the necessary API permissions:
- Create a JSON file on your computer with the following content, replacing the placeholders:
The CLI command
athena-policy-gen
can be used auto-generate this policy.
<region>
: you can specify the Athena AWS region or*
for the default region.<account-id>
: the Athena AWS account id.<data-bucket>
: the S3 bucket storing the data for your Athena tables - if more than one bucket, just add the others to the resource list as well.<database-name>
: you can specify one or more database names or use*
to give Monte Carlo access to all Athena databases.<montecarlo-workgroup>
: the workgroup created on the previous step.<results-bucket>
: the bucket configured for the workgroup.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::<data-bucket>",
"arn:aws:s3:::<results-bucket>"
]
},
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::<data-bucket>/*",
"arn:aws:s3:::<results-bucket>/*"
]
},
{
"Effect": "Allow",
"Action": "s3:PutObject",
"Resource": [
"arn:aws:s3:::<results-bucket>/*"
]
},
{
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:StopQueryExecution",
"athena:GetQueryResults"
],
"Resource": "arn:aws:athena:<region>:<account-id>:workgroup/<montecarlo-workgroup>"
},
{
"Effect": "Allow",
"Action": "athena:ListWorkGroups",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "athena:ListDatabases",
"Resource": [
"arn:aws:athena:<region>:<account-id>:datacatalog/*"
]
},
{
"Effect": "Allow",
"Action": "glue:GetDatabases",
"Resource": [
"arn:aws:glue:<region>:<account-id>:catalog",
"arn:aws:glue:<region>:<account-id>:database/<database-name>"
]
},
{
"Effect": "Allow",
"Action": [
"athena:GetQueryExecution",
"athena:BatchGetQueryExecution",
"athena:ListQueryExecutions",
"athena:GetWorkGroup"
],
"Resource": [
"arn:aws:athena:<region>:<account-id>:workgroup/*",
"arn:aws:athena:<region>:<account-id>:datacatalog/*"
]
},
{
"Effect": "Allow",
"Action": [
"glue:GetTables",
"glue:GetTable",
"glue:GetPartitions",
"glue:GetPartition"
],
"Resource": [
"arn:aws:glue:<region>:<account-id>:catalog",
"arn:aws:glue:<region>:<account-id>:database/<database-name>",
"arn:aws:glue:<region>:<account-id>:table/<database-name>/*"
]
}
]
}
- Please follow the steps outlined here to create the create an IAM role with the policy from step 1. The role ARN and external ID should be saved to be used in the next step.
Recommendation - use infrastructure as code
The CLI command
cf-role-gen
can be used to help create a Data Collector compatible IAM role by generating a CloudFormation template with one or more IAM policies.
Providing role information to Monte Carlo
Please use the command montecarlo integrations add-athena
to set up Athena connectivity. For reference, see help for this command below:
$ montecarlo integrations add-athena --help
Usage: montecarlo integrations add-athena [OPTIONS]
Setup an Athena integration. For query logs and health queries.
Options:
--catalog TEXT Glue data catalog. If not specified the AwsDataCatalog
is used.
--workgroup TEXT Workbook for running queries and retrieving logs. If
not specified the primary is used.
--region TEXT Athena cluster region. If not specified the region the
collector is deployed in is used.
--name TEXT Friendly name of the warehouse which the connection
will belong to.
--role TEXT Assumable role ARN to use for accessing AWS resources.
[required]
--external-id TEXT An external id, per assumable role conditions.
--collector-id UUID ID for the data collector. To disambiguate accounts
with multiple collectors.
--skip-validation Skip all connection tests. This option cannot be used
with 'validate-only'.
--validate-only Run connection tests without adding. This option cannot
be used with 'skip-validation'.
--auto-yes Skip any interactive approval.
--option-file FILE Read configuration from FILE.
--help Show this message and exit.
Updated about 1 year ago