EMR/Presto logs
S3 events are the recommended mechanism for fetching query logs from EMR.
Prerequisites
You need permission to create IAM roles and policies in AWS.
To enable query log ingestion by Monte Carlo from an S3 location, follow these steps:
- Create a role that allows S3 access for Monte Carlo's data collector.
- Provide the role's information to Monte Carlo to validate and complete the integration.
Log formats supported by Monte Carlo
Monte Carlo can currently ingest and process the following formats:
- Hive logs created by AWS EMR using its default logging configuration.
- Presto query logs exported to S3. The logs are expected to have the following schema:
{
  "queryId": "20200219_173831_00731_6rarz",
  "query": "\nselect * from some_table\n\n",
  "sessionSchema": "default",
  "sessionCatalog": "hive",
  "user": "joe",
  "userAgent": "python-requests/2.18.4",
  "principal": null,
  "sourceIp": "1.2.3.4",
  "coordinatorIp": "1.2.3.5",
  "connectorType": "pyhive",
  "environment": "prod",
  "startTime": 1582133911984,
  "endTime": 1582134280539,
  "outputRows": 5955227,
  "outputBytes": 5884105195,
  "writtenRows": 0,
  "writtenBytes": 0,
  "peakUserMemoryBytes": 6864015462,
  "cpuTime": 1547293,
  "queryFailureType": null,
  "queryFailureMessage": null,
  "queryFailureCode": null
}
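Before exporting Presto logs to S3, it can help to check that each record carries the fields shown in the sample above. The following is a minimal sketch, not part of Monte Carlo's tooling: the helper names are hypothetical, and treating every field in the sample as required is an assumption. The sample values also suggest that startTime and endTime are Unix epoch milliseconds (the sample startTime falls at 17:38:31 UTC on 2020-02-19, which matches the queryId prefix); that reading is an inference from the sample, not a documented guarantee.

```python
from datetime import datetime, timezone

# Field names copied from the sample record above. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_FIELDS = {
    "queryId", "query", "sessionSchema", "sessionCatalog", "user",
    "userAgent", "principal", "sourceIp", "coordinatorIp", "connectorType",
    "environment", "startTime", "endTime", "outputRows", "outputBytes",
    "writtenRows", "writtenBytes", "peakUserMemoryBytes", "cpuTime",
    "queryFailureType", "queryFailureMessage", "queryFailureCode",
}


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from a log record."""
    return EXPECTED_FIELDS - set(record.keys())


def start_time_utc(record: dict) -> datetime:
    """Interpret startTime as epoch milliseconds (assumed) and return UTC."""
    return datetime.fromtimestamp(record["startTime"] / 1000, tz=timezone.utc)


def query_duration_seconds(record: dict) -> float:
    """Wall-clock duration, assuming both timestamps are epoch milliseconds."""
    return (record["endTime"] - record["startTime"]) / 1000.0
```

A record can be loaded with `json.loads` and passed to `missing_fields`; an empty set means every field from the sample is present.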
Creating an IAM role for log access
To provide access to logs on S3, create an IAM role with the necessary API permissions:
- Copy the policy below, replacing <bucket> with the name of the S3 bucket that contains your query logs.
{
  "Statement": [
    {
      "Action": [
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket>/*",
        "arn:aws:s3:::<bucket>"
      ]
    }
  ],
  "Version": "2012-10-17"
}
- Follow the steps outlined here to create the IAM role. As part of that process, attach the policy from the previous step to the role.
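If you prefer to generate the policy rather than edit it by hand, the substitution of <bucket> can be sketched as below. This is an illustrative helper, not part of Monte Carlo's tooling: the function name and the example bucket name are hypothetical, while the actions and resource patterns are copied from the policy above.

```python
import json


def build_log_access_policy(bucket: str) -> dict:
    """Return the read-only S3 policy shown above for the given bucket.

    `bucket` is the bare bucket name (no "s3://" prefix and no path).
    """
    if not bucket or "/" in bucket or bucket.startswith("s3:"):
        raise ValueError("expected a bare S3 bucket name")
    return {
        "Statement": [
            {
                "Action": [
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects inside the bucket
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                ],
            }
        ],
        "Version": "2012-10-17",
    }


if __name__ == "__main__":
    # "example-query-logs" is a placeholder bucket name.
    print(json.dumps(build_log_access_policy("example-query-logs"), indent=2))
```

The printed JSON can then be pasted into the IAM console when creating the policy.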
Providing role information to Monte Carlo
You will provide connection details for EMR/Presto logs using Monte Carlo's CLI:
- Please follow this guide to install and configure the CLI.
- Use the command montecarlo integrations add-presto-logs to set up Presto logs, or the command montecarlo integrations add-hive-logs to set up EMR (Hive) logs. For reference, see the help output below:
$ montecarlo integrations add-presto-logs --help
Usage: montecarlo integrations add-presto-logs [OPTIONS]

  Setup a Presto logs integration (S3). For query logs.

Options:
  --bucket TEXT        S3 Bucket where query logs are contained.  [required]
  --prefix TEXT        Path to query logs.  [required]
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
  --external-id TEXT   An external id, per assumable role conditions.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.  [default: False]
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.
$ montecarlo integrations add-hive-logs --help
Usage: montecarlo integrations add-hive-logs [OPTIONS]

  Setup a Hive EMR logs integration (S3). For query logs.

Options:
  --bucket TEXT        S3 Bucket where query logs are contained.  [required]
  --prefix TEXT        Path to query logs.  [required]
  --role TEXT          Assumable role ARN to use for accessing AWS resources.
  --external-id TEXT   An external id, per assumable role conditions.
  --collector-id UUID  ID for the data collector. To disambiguate accounts
                       with multiple collectors.
  --skip-validation    Skip all connection tests. This option cannot be used
                       with 'validate-only'.
  --validate-only      Run connection tests without adding. This option cannot
                       be used with 'skip-validation'.
  --auto-yes           Skip any interactive approval.  [default: False]
  --option-file FILE   Read configuration from FILE.
  --help               Show this message and exit.
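When the integration is set up from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from flags that appear in the help output above; it does not run the CLI. The function name, the example bucket, prefix, and role ARN are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_add_logs_command(
    kind: str,
    bucket: str,
    prefix: str,
    role: Optional[str] = None,
    external_id: Optional[str] = None,
    validate_only: bool = False,
    skip_validation: bool = False,
) -> List[str]:
    """Build the argv for `montecarlo integrations add-<kind>-logs`."""
    if kind not in ("presto", "hive"):
        raise ValueError("kind must be 'presto' or 'hive'")
    # Per the help text, these two flags cannot be combined.
    if validate_only and skip_validation:
        raise ValueError("validate-only and skip-validation are exclusive")

    cmd = ["montecarlo", "integrations", f"add-{kind}-logs",
           "--bucket", bucket, "--prefix", prefix]
    if role:
        cmd += ["--role", role]
    if external_id:
        cmd += ["--external-id", external_id]
    if validate_only:
        cmd.append("--validate-only")
    if skip_validation:
        cmd.append("--skip-validation")
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket, prefix, and role ARN.
    argv = build_add_logs_command(
        "presto", "example-query-logs", "presto/logs/",
        role="arn:aws:iam::123456789012:role/example-log-reader",
        validate_only=True,
    )
    print(shlex.join(argv))  # pass argv to subprocess.run to execute it
```

Running with --validate-only first exercises the connection tests without adding the integration, which is a low-risk way to confirm the role and bucket settings.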