S3 Events - Metadata Graveyard
Prerequisites: To complete this guide, you will need admin credentials for AWS.
To track data freshness and volume at scale for tables stored on S3, follow these steps:
- Identify data lake buckets.
- Configure event notifications for data lake buckets.
- Enable events to complete the integration.
Identify data lake buckets
Identify the buckets that store data for the external tables on which you want to enable health metric collection. Once a metadata job is enabled, your representative can help with this.
Configure event notifications
The CLI can be used to configure event notifications automatically. See here.
If you prefer using the AWS console to configure S3 event notification for a data lake bucket, please follow one of the guides below, based on your environment.
Guide | Are the bucket and the data collector in the same region? | Are the bucket and the data collector in the same AWS account? | Are there existing S3 event triggers? |
---|---|---|---|
Scenario One | Yes | Doesn't matter | No |
Scenario Two | No | Yes | No |
Scenario Three | Doesn't matter | Yes | Yes |
Scenario Four | No | No | No |
Scenario Five | No | No | Yes |
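The decision table can also be read as a small lookup. The sketch below is only an illustration: it assumes the five rows correspond, in order, to Scenario One through Scenario Five described later in this guide, and the function name `choose_scenario` is hypothetical.

```python
from typing import Optional


def choose_scenario(same_region: bool, same_account: bool,
                    has_existing_triggers: bool) -> Optional[str]:
    """Return the guide to follow, or None if the table has no matching row.

    Assumption: the table rows map, in order, to Scenario One..Five.
    """
    if not has_existing_triggers:
        if same_region:
            return "Scenario One"  # the account does not matter for this row
        return "Scenario Two" if same_account else "Scenario Four"
    if same_account:
        return "Scenario Three"  # the region does not matter for this row
    if not same_region:
        return "Scenario Five"
    # Same region, different account, existing triggers: not listed in the table.
    return None
```

A combination that the table does not list returns `None`; in that case your representative is the better source of guidance.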
Enable events
You will enable events using Monte Carlo's CLI:
- Please follow this guide to install and configure the CLI.
- Please use the command
montecarlo integrations toggle-metadata-events --connection-type [hive-mysql|glue|databricks-metastore]
to enable events based on your metastore.
$ montecarlo integrations toggle-metadata-events --help
Usage: montecarlo integrations toggle-metadata-events [OPTIONS]

  Toggle S3 metadata events. For tracking data freshness and volume at
  scale. Requires s3 notifications to be configured first.

Options:
  --enable / --disable            Enable or disable events. Enables if not
                                  specified.
  --connection-id UUID            ID for the connection.
  --connection-type [hive-mysql|glue|databricks-metastore]
                                  Type of the integration. This option cannot
                                  be used with 'connection-id'.
  --option-file FILE              Read configuration from FILE.
  --help                          Show this message and exit.
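If you script this step, the command line can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the help output above; the helper name `build_toggle_command` is hypothetical, and the sketch assumes the `montecarlo` CLI is already installed and configured.

```python
from typing import List, Optional

# Connection types listed in the help output above.
CONNECTION_TYPES = ("hive-mysql", "glue", "databricks-metastore")


def build_toggle_command(enable: bool = True,
                         connection_id: Optional[str] = None,
                         connection_type: Optional[str] = None) -> List[str]:
    """Build the argument list for `montecarlo integrations toggle-metadata-events`."""
    if connection_id and connection_type:
        # The help text says these two options cannot be combined.
        raise ValueError("--connection-type cannot be used with --connection-id")
    if connection_type and connection_type not in CONNECTION_TYPES:
        raise ValueError(f"unknown connection type: {connection_type!r}")
    command = ["montecarlo", "integrations", "toggle-metadata-events",
               "--enable" if enable else "--disable"]
    if connection_id:
        command += ["--connection-id", connection_id]
    if connection_type:
        command += ["--connection-type", connection_type]
    return command


# To run it:
#   import subprocess
#   subprocess.run(build_toggle_command(connection_type="glue"), check=True)
```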
Scenario One
Follow these steps to enable S3 events if your needs fit under "scenario one":
- Retrieve relevant SQS ARNs
- Retrieve your account ID
- Open the S3 event management pane
- Update the SQS access policy
- Create event notification
Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
- Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack:

- Select the "Outputs" tab:

- Save the Metadata Queue ARN for later
Key: MetadataEventQueue
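The same lookup can be scripted instead of using the console. This sketch reads the `MetadataEventQueue` output from a response shaped like the one returned by the CloudFormation `describe_stacks` call; the helper name is hypothetical, and the commented `boto3` usage assumes you have credentials for the collector's account and know the stack name.

```python
def metadata_queue_arn(describe_stacks_response: dict,
                       output_key: str = "MetadataEventQueue") -> str:
    """Return the value of the stack output named output_key."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output["OutputValue"]
    raise KeyError(f"no stack output named {output_key!r}")


# With boto3 (assumed installed; replace the placeholder with your stack name):
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(StackName="<stack name>")
#   queue_arn = metadata_queue_arn(response)
```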
Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.
- From the console, select your username in the upper right corner.
- Select "My Account".
- Save the Account Id (without dashes) for later.
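Because the policies later in this guide expect the account ID without dashes, it can help to normalize the value once. A minimal sketch, assuming the standard 12-digit AWS account ID format; the function name is hypothetical.

```python
import re


def normalize_account_id(raw: str) -> str:
    """Strip dashes and surrounding whitespace, then check for 12 digits."""
    account_id = raw.strip().replace("-", "")
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {raw!r}")
    return account_id


# Alternatively, with boto3 and credentials for the collector's account:
#   import boto3
#   account_id = boto3.client("sts").get_caller_identity()["Account"]
```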
Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.
- Open the S3 Console and search for the bucket that you would like to enable events for.
- Select the bucket.
- Save the bucket ARN for later by selecting "Copy Bucket ARN".
- Select the "Properties" tab. Leave this page open; you will come back to it later.
Update the SQS access policy
Follow these steps to allow your S3 bucket to write to the relevant queue. If the data collector is managed by Monte Carlo, you can skip these steps by sending the S3 bucket ARN to your representative, who will in turn send you the SQS ARN and relevant account ID.
- Open the SQS console in the account the Monte Carlo Collector was deployed to
- Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
- Select the queue and confirm the ARN matches the ARN you saved previously
- Select the "Access Policy" tab and select "Edit".
If the access policy is empty or looks something like this:
{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}
Paste the following (replacing any values in brackets):
- The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
- The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
- The S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
  "Version":"2008-10-17",
  "Statement":[
    {
      "Sid":"__owner",
      "Effect":"Allow",
      "Principal":{
        "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
      },
      "Action":"SQS:*",
      "Resource":"<EVENT_QUEUE_ARN>"
    },
    {
      "Sid":"__sender",
      "Effect":"Allow",
      "Principal":{
        "AWS":"*"
      },
      "Action":"SQS:SendMessage",
      "Resource":"<EVENT_QUEUE_ARN>",
      "Condition":{
        "ArnLike":{
          "aws:SourceArn":[
            "<S3_ARN>"
          ]
        }
      }
    }
  ]
}
However, if the access policy already has a statement with the Sid "__sender" (i.e. it looks like the policy above), append your S3_ARN to the SourceArn list instead. The S3_ARN was saved in the "Open the S3 event management pane" subsection.
"aws:SourceArn": [
  "arn:aws:s3:::existing_bucket",
  "<S3_ARN>"
]
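The two cases above (paste the full policy, or append to an existing "__sender" statement) can be sketched as a single function. This is an illustration under stated assumptions, not Monte Carlo's tooling: the name `allow_source` is hypothetical, and a policy that has statements but no "__sender" is rejected because the guide does not cover that case.

```python
import copy


def allow_source(policy: dict, collector_account_id: str,
                 queue_arn: str, source_arn: str) -> dict:
    """Return a copy of an SQS access policy that lets source_arn send messages."""
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    for statement in statements:
        if statement.get("Sid") == "__sender":
            arn_like = statement.setdefault("Condition", {}).setdefault("ArnLike", {})
            sources = arn_like.get("aws:SourceArn", [])
            if isinstance(sources, str):  # a lone ARN may be stored as a string
                sources = [sources]
            if source_arn not in sources:
                sources = [*sources, source_arn]
            arn_like["aws:SourceArn"] = sources
            return updated
    if statements:
        # Not covered by the guide: refuse rather than drop unrelated statements.
        raise ValueError("policy has statements but no '__sender' statement")
    # Empty or default policy: use the full policy from the guide.
    return {
        "Version": "2008-10-17",
        "Statement": [
            {"Sid": "__owner", "Effect": "Allow",
             "Principal": {"AWS": f"arn:aws:iam::{collector_account_id}:root"},
             "Action": "SQS:*", "Resource": queue_arn},
            {"Sid": "__sender", "Effect": "Allow", "Principal": {"AWS": "*"},
             "Action": "SQS:SendMessage", "Resource": queue_arn,
             "Condition": {"ArnLike": {"aws:SourceArn": [source_arn]}}},
        ],
    }


# Applying the result with boto3 (assumed installed; the queue URL is yours):
#   import json, boto3
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl="<queue url>", Attributes={"Policy": json.dumps(new_policy)})
```

Returning a copy rather than editing in place makes it easy to compare the old and new policies before saving.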
Create event notification
Follow these steps to create an event notification in S3.
- Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
- Select "Create event notification" under Event notifications.
- Fill in a meaningful name.
- Optionally specify a prefix and/or suffix.
- Select "All object create events" and "All object delete events" under Event types.

- Enter the SQS queue ARN you had saved from the "Retrieve relevant SQS ARNs" subsection as the Destination queue ARN.

- Save changes.
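For readers who prefer to script this step, the console choices above correspond to a notification configuration like the one built below. This is a sketch: the helper name is hypothetical, and it assumes "All object create events" and "All object delete events" map to the `s3:ObjectCreated:*` and `s3:ObjectRemoved:*` event types.

```python
from typing import Optional


def queue_notification(name: str, queue_arn: str,
                       prefix: Optional[str] = None,
                       suffix: Optional[str] = None) -> dict:
    """Build an S3 NotificationConfiguration that targets an SQS queue."""
    config = {
        "Id": name,
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
    }
    rules = [{"Name": key, "Value": value}
             for key, value in (("prefix", prefix), ("suffix", suffix)) if value]
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"QueueConfigurations": [config]}


# Applying it with boto3 (assumed installed). Note that this call replaces the
# bucket's entire notification configuration, so merge in any existing entries first:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="<bucket name>",
#       NotificationConfiguration=queue_notification("<name>", "<EVENT_QUEUE_ARN>"))
```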
That's it! You are now set up with S3 metadata events.
Scenario Two
Follow these steps to enable S3 events if your needs fit under "scenario two":
- Retrieve relevant SQS ARNs
- Retrieve your account ID
- Open the S3 event management pane
- Create a SNS Topic
- Update the SQS access policy
- Create event notification
- Create a SNS subscription
Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
- Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack:

- Select the "Outputs" tab:

- Save the Metadata Queue ARN for later
Key: MetadataEventQueue
Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.
- From the console, select your username in the upper right corner.
- Select "My Account".
- Save the Account Id (without dashes) for later.
Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.
- Open the S3 Console and search for the bucket that you would like to enable events for.
- Select the bucket.
- Save the bucket ARN for later by selecting "Copy Bucket ARN".
- Select the "Properties" tab. Leave this page open; you will come back to it later.
Create a SNS Topic
What region should I create my topic in? Make sure you are in the same region as the bucket you want to add an event for.
- Open the SNS console and select "Topics".
- Select "Create Topic". Choose the "Standard" type, enter a meaningful name, and fill in any optional fields.
It's highly recommended to enable delivery status logging for SQS.
- Select "Create Topic" and save the Topic ARN for later.
- Update (append) the topic you just created with the following policy statement:
- SNS_ARN is the topic ARN you saved in the previous step.
- S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "SNS:Publish",
  "Resource": "<SNS_ARN>",
  "Condition": {
    "StringEquals": {
      "aws:SourceArn": "<S3_ARN>"
    }
  }
}
You may need to include a "Sid" here too.
- Save changes.
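Appending the statement can also be scripted. The sketch below adds the publish statement shown above to an existing topic policy and includes a "Sid", as the note suggests may be needed; the function name and the default Sid value `__s3_publish` are hypothetical.

```python
import copy


def allow_bucket_publish(topic_policy: dict, topic_arn: str, bucket_arn: str,
                         sid: str = "__s3_publish") -> dict:
    """Return a copy of an SNS topic policy that lets the bucket publish events."""
    updated = copy.deepcopy(topic_policy)
    updated.setdefault("Statement", []).append({
        "Sid": sid,  # hypothetical Sid; pick any value unique within the policy
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "SNS:Publish",
        "Resource": topic_arn,
        "Condition": {"StringEquals": {"aws:SourceArn": bucket_arn}},
    })
    return updated


# Applying it with boto3 (assumed installed):
#   import json, boto3
#   boto3.client("sns").set_topic_attributes(
#       TopicArn="<SNS_ARN>", AttributeName="Policy",
#       AttributeValue=json.dumps(new_policy))
```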
Update the SQS access policy
Follow these steps to allow your SNS topic to write to the relevant queue. If the data collector is managed by Monte Carlo these steps can be skipped by just sending the SNS Topic ARN to your representative. Your representative will in turn send you the SQS ARN and relevant account ID.
- Open the SQS console in the account the Monte Carlo Collector was deployed to
- Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
- Select the queue and confirm the ARN matches the ARN you saved previously
- Select the "Access Policy" tab and select "Edit".
If the access policy is empty or looks something like this:
{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}
Paste the following (replacing any values in brackets):
- The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
- The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
- The SNS_ARN is the SNS ARN, which you saved in the "Create a SNS Topic" subsection.
Be sure to use the SNS topic ARN and not the S3 bucket ARN here!
{
  "Version":"2008-10-17",
  "Statement":[
    {
      "Sid":"__owner",
      "Effect":"Allow",
      "Principal":{
        "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
      },
      "Action":"SQS:*",
      "Resource":"<EVENT_QUEUE_ARN>"
    },
    {
      "Sid":"__sender",
      "Effect":"Allow",
      "Principal":{
        "AWS":"*"
      },
      "Action":"SQS:SendMessage",
      "Resource":"<EVENT_QUEUE_ARN>",
      "Condition":{
        "ArnLike":{
          "aws:SourceArn":[
            "<SNS_ARN>"
          ]
        }
      }
    }
  ]
}
However, if the access policy already has a statement with the Sid "__sender" (i.e. it looks like the policy above), append your SNS_ARN to the SourceArn list instead. The SNS_ARN was saved in the "Create a SNS Topic" subsection.
"aws:SourceArn": [
  "arn:aws:s3:::existing_bucket",
  "<SNS_ARN>"
]
Create event notification
Follow these steps to create an event notification in S3.
- Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
- Select "Create event notification" under Event notifications.
- Fill in a meaningful name.
- Optionally specify a prefix and/or suffix.
- Select "All object create events" and "All object delete events" under Event types.

- Enter the SNS topic ARN you had saved from the "Create a SNS Topic" subsection as the Destination topic ARN.

- Save changes.
Create a SNS subscription
Follow these steps to subscribe SQS to the SNS topic you created and enabled notifications for.
- Open the SNS console and select "Subscriptions".
- Select "Create Subscription".
- Select the topic ARN you saved above in the "Create a SNS Topic" subsection.
- Select Amazon SQS as the protocol
- Select (or paste) the SQS ARN you saved above in the "Retrieve relevant SQS ARNs" subsection.

Be sure to select "Enable raw message delivery"!
- Select "Create Subscription".
- Validate the status is "Confirmed".
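The subscription settings above translate into a small set of request parameters. The following sketch builds the arguments for the SNS `subscribe` call with raw message delivery enabled; the helper name is hypothetical, and the commented `boto3` call assumes credentials that are allowed to subscribe to the topic.

```python
def sqs_subscription_request(topic_arn: str, queue_arn: str) -> dict:
    """Build keyword arguments for subscribing an SQS queue to an SNS topic."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "sqs",
        "Endpoint": queue_arn,
        # Matches the "Enable raw message delivery" checkbox in the console.
        "Attributes": {"RawMessageDelivery": "true"},
        "ReturnSubscriptionArn": True,
    }


# With boto3 (assumed installed):
#   import boto3
#   boto3.client("sns").subscribe(**sqs_subscription_request("<SNS_ARN>", "<EVENT_QUEUE_ARN>"))
```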
That's it! You are now set up with S3 metadata events.
Scenario Three
Heads up: These steps may temporarily affect (disable) existing event notifications!
Follow these steps to enable S3 events if your needs fit under "scenario three":
Follow the same steps as Scenario Two, except that you also need to temporarily delete the conflicting trigger and subscribe its target to the SNS topic that was created. This pattern is known as SNS fanout: one topic publishes to multiple destinations, which allows for parallel asynchronous processing.
That's it! You are now set up with S3 metadata events.
Scenario Four
Follow these steps to enable S3 events if your needs fit under "scenario four":
- Retrieve relevant SQS ARNs
- Retrieve your account ID
- Open the S3 event management pane
- Create a SNS Topic
- Update the SQS access policy
- Create event notification
- Create a SNS subscription
Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
- Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack:

- Select the "Outputs" tab:

- Save the Metadata Queue ARN for later
Key: MetadataEventQueue
Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.
Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.
- From the console, select your username in the upper right corner.
- Select "My Account".
- Save the Account Id (without dashes) for later.
Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.
- Open the S3 Console and search for the bucket that you would like to enable events for.
- Select the bucket.
- Save the bucket ARN for later by selecting "Copy Bucket ARN".
- Select the "Properties" tab. Leave this page open; you will come back to it later.
Create a SNS Topic
What region should I create my topic in? Make sure you are in the same region as the bucket you want to add an event for.
- Open the SNS console and select "Topics".
- Select "Create Topic". Choose the "Standard" type, enter a meaningful name, and fill in any optional fields.
It's highly recommended to enable delivery status logging for SQS.
- Select "Create Topic" and save the Topic ARN for later.
- Update (append) the topic you just created with the following policy statement:
- SNS_ARN is the topic ARN you saved in the previous step.
- S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "SNS:Publish",
  "Resource": "<SNS_ARN>",
  "Condition": {
    "StringEquals": {
      "aws:SourceArn": "<S3_ARN>"
    }
  }
}
You may need to include a "Sid" here too.
- Update (append) the topic you just created with the following policy statement:
- COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
- SNS_ARN is the topic ARN you saved earlier in this subsection.
{
  "Sid": "__dc_sub",
  "Effect": "Allow",
  "Principal": {
    "AWS": "<COLLECTOR_ACCOUNT_ID>"
  },
  "Action": "sns:Subscribe",
  "Resource": "<SNS_ARN>"
}
- Save changes
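Since this scenario requires two separate statements on the topic, a quick check before moving on can catch a missed one. The sketch below inspects a topic policy document for the two permissions added above. It is a literal comparison of the policy text, not an evaluation of IAM semantics, and the function names are hypothetical.

```python
def _as_list(value):
    """Policy fields may hold a single value or a list; always return a list."""
    return value if isinstance(value, list) else [value]


def missing_topic_permissions(topic_policy: dict, bucket_arn: str,
                              collector_account_id: str) -> list:
    """List which of the two statements from this subsection appear to be missing."""
    can_publish = can_subscribe = False
    for statement in topic_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        # Action names are compared case-insensitively.
        actions = {action.lower() for action in _as_list(statement.get("Action", []))}
        if "sns:publish" in actions:
            sources = [arn
                       for operator in statement.get("Condition", {}).values()
                       for arn in _as_list(operator.get("aws:SourceArn", []))]
            can_publish = can_publish or bucket_arn in sources
        if "sns:subscribe" in actions:
            principal = statement.get("Principal", {})
            principals = _as_list(principal.get("AWS", [])) if isinstance(principal, dict) else []
            # Matches both a bare account ID and an arn:aws:iam::<id>:root principal.
            can_subscribe = can_subscribe or any(collector_account_id in p for p in principals)
    missing = []
    if not can_publish:
        missing.append("SNS:Publish for the bucket")
    if not can_subscribe:
        missing.append("sns:Subscribe for the collector account")
    return missing
```

An empty list means both statements were found in the policy document.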
Update the SQS access policy
Follow these steps to allow your SNS topic to write to the relevant queue. If the data collector is managed by Monte Carlo these steps can be skipped by just sending the SNS Topic ARN to your representative. Your representative will in turn send you the SQS ARN and relevant account ID.
- Open the SQS console in the account the Monte Carlo Collector was deployed to
- Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
- Select the queue and confirm the ARN matches the ARN you saved previously
- Select the "Access Policy" tab and select "Edit".
If the access policy is empty or looks something like this:
{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}
Paste the following (replacing any values in brackets):
- The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
- The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
- The SNS_ARN is the SNS ARN, which you saved in the "Create a SNS Topic" subsection.
Be sure to use the SNS topic ARN and not the S3 bucket ARN here!
{
  "Version":"2008-10-17",
  "Statement":[
    {
      "Sid":"__owner",
      "Effect":"Allow",
      "Principal":{
        "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
      },
      "Action":"SQS:*",
      "Resource":"<EVENT_QUEUE_ARN>"
    },
    {
      "Sid":"__sender",
      "Effect":"Allow",
      "Principal":{
        "AWS":"*"
      },
      "Action":"SQS:SendMessage",
      "Resource":"<EVENT_QUEUE_ARN>",
      "Condition":{
        "ArnLike":{
          "aws:SourceArn":[
            "<SNS_ARN>"
          ]
        }
      }
    }
  ]
}
However, if the access policy already has a statement with the Sid "__sender" (i.e. it looks like the policy above), append your SNS_ARN to the SourceArn list instead. The SNS_ARN was saved in the "Create a SNS Topic" subsection.
"aws:SourceArn": [
  "arn:aws:s3:::existing_bucket",
  "<SNS_ARN>"
]
Create event notification
Follow these steps to create an event notification in S3.
- Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
- Select "Create event notification" under Event notifications.
- Fill in a meaningful name.
- Optionally specify a prefix and/or suffix.
- Select "All object create events" and "All object delete events" under Event types.

- Enter the SNS topic ARN you had saved from the "Create a SNS Topic" subsection as the Destination topic ARN.

- Save changes.
Create a SNS subscription
Follow these steps to subscribe SQS to the SNS topic you created and enabled notifications for.
- Open the SNS console and select "Subscriptions".
- Select "Create Subscription".
- Select the topic ARN you saved above in the "Create a SNS Topic" subsection.
- Select Amazon SQS as the protocol
- Select (or paste) the SQS ARN you saved above in the "Retrieve relevant SQS ARNs" subsection.

Be sure to select "Enable raw message delivery".
- Select "Create Subscription".
- Validate the status is "Confirmed".
The subscription may not necessarily confirm right away. Please check back in 24 hours.
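Because confirmation can lag in this scenario, you may want to re-check the status from a script rather than the console. This sketch inspects a response shaped like the one returned by the SNS `list_subscriptions_by_topic` call. It assumes that an unconfirmed subscription reports a placeholder value instead of a real ARN in `SubscriptionArn`; the function name is hypothetical.

```python
def sqs_subscription_confirmed(list_response: dict, queue_arn: str) -> bool:
    """Return True if the queue's subscription to the topic has a real ARN."""
    for subscription in list_response.get("Subscriptions", []):
        if (subscription.get("Protocol") == "sqs"
                and subscription.get("Endpoint") == queue_arn):
            # Assumption: pending subscriptions carry a placeholder, not an ARN.
            return subscription.get("SubscriptionArn", "").startswith("arn:")
    return False


# With boto3 (assumed installed):
#   import boto3
#   response = boto3.client("sns").list_subscriptions_by_topic(TopicArn="<SNS_ARN>")
#   print(sqs_subscription_confirmed(response, "<EVENT_QUEUE_ARN>"))
```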
That's it! You are now set up with S3 metadata events.
Scenario Five
Heads up: These steps may temporarily affect (disable) existing event notifications!
Follow these steps to enable S3 events if your needs fit under "scenario five":
Follow the same steps as Scenario Four, except that you also need to temporarily delete the conflicting trigger and subscribe its target to the SNS topic that was created. This pattern is known as SNS fanout: one topic publishes to multiple destinations, which allows for parallel asynchronous processing.
That's it! You are now set up with S3 metadata events.