S3 Events - Metadata

📘

Prerequisites

To complete this guide, you will need admin credentials for AWS.

To track data freshness and volume at scale for tables stored on S3, follow these steps:

  1. Identify data lake buckets.
  2. Configure event notifications for data lake buckets.
  3. Enable events to complete the integration.

Identify data lake buckets

Identify the buckets that store data for the external tables you want to collect health metrics for. After a metadata job is enabled, your representative can help with this.

Configure event notifications

👍

The CLI can be used to configure event notifications automatically. See here.

If you prefer using the AWS console to configure S3 event notifications for a data lake bucket, follow one of the guides below, based on your environment.

| Guide | Are the bucket and the data collector in the same region? | Are the bucket and the data collector in the same AWS account? | Are there existing S3 event triggers? |
|---|---|---|---|
| Scenario one | Yes | Doesn't matter | No |
| Scenario two | No | Yes | No |
| Scenario three | Doesn't matter | Yes | Yes |
| Scenario four | No | No | No |
| Scenario five | No | No | Yes |
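
The table above can be read as a small decision function. The sketch below is only an illustration of that lookup; the function name `pick_scenario` is hypothetical, and combinations the table does not list return `None` rather than a guess.

```python
from typing import Optional


def pick_scenario(same_region: bool, same_account: bool,
                  existing_triggers: bool) -> Optional[str]:
    """Return the guide name from the table, or None if no row matches."""
    if not existing_triggers:
        if same_region:
            return "Scenario one"  # the account does not matter
        return "Scenario two" if same_account else "Scenario four"
    # From here on the bucket already has S3 event triggers.
    if same_account:
        return "Scenario three"  # the region does not matter
    if not same_region:
        return "Scenario five"
    # Same region, different account, existing triggers: not listed in the table.
    return None
```

For example, a bucket in a different region and a different account with no existing triggers maps to "Scenario four".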

Enable events

You will enable events using Monte Carlo's CLI:

  1. Follow this guide to install and configure the CLI.
  2. Use the command montecarlo integrations toggle-hive-metadata-events to enable events for a Hive/Presto lake, or montecarlo integrations toggle-glue-metadata-events to enable events for a Glue/Athena lake.
$ montecarlo integrations toggle-hive-metadata-events --help
Usage: montecarlo integrations toggle-hive-metadata-events 
           [OPTIONS]

  Toggle S3 metadata events for a Hive/Presto lake. For tracking data
  freshness and volume at scale. Requires s3 notifications to be configured
  first.

Options:
  --enable / --disable  Enable or disable events. Enables if not specified.
  --option-file FILE    Read configuration from FILE.
  --help                Show this message and exit.
$ montecarlo integrations toggle-glue-metadata-events --help
Usage: montecarlo integrations toggle-glue-metadata-events 
           [OPTIONS]

  Toggle S3 metadata events for a Glue/Athena lake. For tracking data
  freshness and volume at scale. Requires s3 notifications to be configured
  first.

Options:
  --enable / --disable  Enable or disable events. Enables if not specified.
  --option-file FILE    Read configuration from FILE.
  --help                Show this message and exit.
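
If you script this step, a minimal sketch for assembling the command line might look like the following. It uses only the command names and the --enable / --disable flags shown in the help output above; the helper name `build_toggle_command` and the "hive" / "glue" keys are assumptions made for illustration.

```python
from typing import List

# Command names taken from the help output above; the dictionary keys are illustrative.
_COMMANDS = {
    "hive": "toggle-hive-metadata-events",
    "glue": "toggle-glue-metadata-events",
}


def build_toggle_command(lake: str, enable: bool = True) -> List[str]:
    """Build the argv list for toggling S3 metadata events for one lake type."""
    if lake not in _COMMANDS:
        raise ValueError(f"unknown lake type {lake!r}; expected one of {sorted(_COMMANDS)}")
    flag = "--enable" if enable else "--disable"
    return ["montecarlo", "integrations", _COMMANDS[lake], flag]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the CLI is already installed and configured.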

Scenario One

Follow these steps to enable S3 events if your needs fit under "scenario one":

  1. Retrieve relevant SQS ARNs
  2. Retrieve your account ID
  3. Open the S3 event management pane
  4. Update the SQS access policy
  5. Create event notification

Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

  1. Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack.
  2. Select the “Outputs” tab.
  3. Save the Metadata Queue ARN for later.
    Key: MetadataEventQueue
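
If you would rather read the output programmatically, the sketch below extracts the MetadataEventQueue value from a CloudFormation DescribeStacks response. It assumes you already have that response as a dictionary (for example from boto3's `describe_stacks`); the helper name `get_stack_output` is hypothetical.

```python
from typing import Any, Dict


def get_stack_output(describe_stacks_response: Dict[str, Any],
                     key: str = "MetadataEventQueue") -> str:
    """Return the OutputValue for the given OutputKey, or raise KeyError."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output["OutputValue"]
    raise KeyError(f"stack output {key!r} not found")
```

With boto3 installed, the response could come from `boto3.client("cloudformation").describe_stacks(StackName=...)`, where the stack name is that of your data collector stack.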

Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.

  1. From the console, select your username in the upper right corner.
  2. Select “My Account”.
  3. Save the Account Id (without dashes) for later.
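
The console displays the account ID with dashes, while the policies later in this guide expect the plain 12-digit form. A small sketch for normalizing the value follows; the helper name `normalize_account_id` is an assumption for illustration.

```python
import re


def normalize_account_id(raw: str) -> str:
    """Strip dashes and whitespace and check that 12 digits remain."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{12}", digits):
        raise ValueError(f"expected a 12-digit AWS account ID, got {raw!r}")
    return digits
```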

Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.

  1. Open the S3 Console and search for the bucket that you would like to enable events for.
  2. Select the bucket.
  3. Save the bucket ARN by selecting “Copy Bucket ARN” for later.
  4. Select the “Properties” tab. Leave this page open; you will come back to it later.

Update the SQS access policy
Follow these steps to allow your S3 bucket to write to the relevant queue. If the data collector is managed by Monte Carlo, you can skip these steps by sending the S3 bucket ARN to your representative. Your representative will in turn send you the SQS ARN and relevant account ID.

  1. Open the SQS console in the account the Monte Carlo Collector was deployed to.
  2. Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
  3. Select the queue and confirm that the ARN matches the ARN you saved previously.
  4. Select the “Access Policy” tab and select “Edit”.

If the access policy is empty or looks something like this:

{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}

Paste the following (replacing any values in brackets):

  • The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
  • The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
  • The S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
   "Version":"2008-10-17",
   "Statement":[
      {
         "Sid":"__owner",
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
         },
         "Action":"SQS:*",
         "Resource":"<EVENT_QUEUE_ARN>"
      },
      {
         "Sid":"__sender",
         "Effect":"Allow",
         "Principal":{
            "AWS":"*"
         },
         "Action":"SQS:SendMessage",
         "Resource":"<EVENT_QUEUE_ARN>",
         "Condition":{
            "ArnLike":{
               "aws:SourceArn":[
                  "<S3_ARN>"
               ]
            }
         }
      }
   ]
}
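
To avoid hand-editing the placeholders, the policy above can also be generated. The following is a minimal sketch that mirrors the JSON shown; the function name `build_queue_policy` is hypothetical, and the output should still be reviewed before it is pasted into the SQS console.

```python
import json
from typing import Any, Dict, List


def build_queue_policy(collector_account_id: str, event_queue_arn: str,
                       source_arns: List[str]) -> Dict[str, Any]:
    """Build the SQS access policy with the __owner and __sender statements."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "__owner",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{collector_account_id}:root"},
                "Action": "SQS:*",
                "Resource": event_queue_arn,
            },
            {
                "Sid": "__sender",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "SQS:SendMessage",
                "Resource": event_queue_arn,
                # Only the listed sources may send messages to the queue.
                "Condition": {"ArnLike": {"aws:SourceArn": list(source_arns)}},
            },
        ],
    }


def policy_as_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy for pasting into the access policy editor."""
    return json.dumps(policy, indent=2)
```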

If the access policy already has a statement with the Sid “__sender” (i.e. it looks like the policy above), append your S3_ARN to the SourceArn list instead. The S3_ARN was saved in the "Open the S3 event management pane" subsection.

"aws:SourceArn": [
            "arn:aws:s3:::existing_bucket",
            "<S3_ARN>"
          ]
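
The append can be sketched as a small function over the existing policy document. This is an illustration only: it assumes the policy has been loaded into a dictionary (for example with `json.loads`), and the name `add_source_arn` is hypothetical.

```python
import copy
from typing import Any, Dict


def add_source_arn(policy: Dict[str, Any], new_arn: str,
                   sid: str = "__sender") -> Dict[str, Any]:
    """Return a copy of the policy with new_arn added to the sender's SourceArn list."""
    updated = copy.deepcopy(policy)
    for statement in updated.get("Statement", []):
        if statement.get("Sid") != sid:
            continue
        arn_like = statement.setdefault("Condition", {}).setdefault("ArnLike", {})
        current = arn_like.get("aws:SourceArn", [])
        if isinstance(current, str):
            current = [current]  # a single ARN may be stored as a plain string
        if new_arn not in current:
            current = current + [new_arn]
        arn_like["aws:SourceArn"] = current
        return updated
    raise ValueError(f"no statement with Sid {sid!r} in the policy")
```

The function leaves the input untouched and is safe to call twice with the same ARN, since duplicates are skipped.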

Create event notification
Follow these steps to create an event notification in S3.

  1. Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
  2. Select “Create event notification” under Event notifications.
  3. Fill in a meaningful name.
  4. Optionally specify a prefix and/or suffix.
  5. Select “All object create events” and “All object delete events” under Event types.
  6. Enter the SQS queue ARN you had saved from the "Retrieve relevant SQS ARNs" subsection as the Destination queue ARN.
  7. Save changes.
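
For reference, the console steps above correspond roughly to an S3 notification configuration document like the one this sketch builds. The function name `build_queue_notification` is hypothetical; "All object create events" and "All object delete events" are assumed to map to the `s3:ObjectCreated:*` and `s3:ObjectRemoved:*` event types.

```python
from typing import Any, Dict, Optional


def build_queue_notification(name: str, queue_arn: str,
                             prefix: Optional[str] = None,
                             suffix: Optional[str] = None) -> Dict[str, Any]:
    """Build a notification configuration that sends create/delete events to SQS."""
    config: Dict[str, Any] = {
        "Id": name,
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
    }
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        # The key filter is optional, matching step 4 above.
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"QueueConfigurations": [config]}
```

If you apply such a document with boto3's `put_bucket_notification_configuration`, keep in mind that the call replaces the bucket's whole notification configuration, so any existing entries need to be included in the document as well.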

👍

That's it! You are now set up with S3 metadata events.

Scenario Two

Follow these steps to enable S3 events if your needs fit under "scenario two":

  1. Retrieve relevant SQS ARNs
  2. Retrieve your account ID
  3. Open the S3 event management pane
  4. Create a SNS Topic
  5. Update the SQS access policy
  6. Create event notification
  7. Create a SNS subscription

Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

  1. Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack.
  2. Select the “Outputs” tab.
  3. Save the Metadata Queue ARN for later.
    Key: MetadataEventQueue

Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.

  1. From the console, select your username in the upper right corner.
  2. Select “My Account”.
  3. Save the Account Id (without dashes) for later.

Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.

  1. Open the S3 Console and search for the bucket that you would like to enable events for.
  2. Select the bucket.
  3. Save the bucket ARN by selecting “Copy Bucket ARN” for later.
  4. Select the “Properties” tab. Leave this page open; you will come back to it later.

Create a SNS Topic

📘

What region should I create my topic in?

Make sure you are in the same region as the bucket you want to add an event for.

  1. Open the SNS console and select “Topics”.
  2. Select “Create Topic”. Choose the "Standard" type, enter a meaningful name and fill in any optional fields.

🚧

It’s highly recommended to enable delivery status logging for SQS.

  3. Select “Create Topic” and save the Topic ARN for later.
  4. Update (append) the topic you just created with the following policy statement:
  • SNS_ARN is the ARN from above.
  • S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
    "Effect": "Allow",
    "Principal": {
        "AWS": "*"
    },
    "Action": "SNS:Publish",
    "Resource": "<SNS_ARN>",
    "Condition": {
        "StringEquals": {
            "aws:SourceArn": "<S3_ARN>"
        }
    }
}

You may need to include a "Sid" here too.

  5. Save changes.
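
The "update (append)" step can be sketched as follows, assuming the topic's current access policy has been loaded into a dictionary. The function names and the Sid value `__s3_publish` are hypothetical; the statement body mirrors the JSON above.

```python
import copy
from typing import Any, Dict


def build_s3_publish_statement(sns_arn: str, s3_arn: str,
                               sid: str = "__s3_publish") -> Dict[str, Any]:
    """Statement that lets the given bucket publish events to the topic."""
    return {
        "Sid": sid,  # illustrative value; any Sid unique within the policy works
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "SNS:Publish",
        "Resource": sns_arn,
        "Condition": {"StringEquals": {"aws:SourceArn": s3_arn}},
    }


def append_statement(policy: Dict[str, Any], statement: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the policy with the statement appended."""
    updated = copy.deepcopy(policy)
    updated.setdefault("Statement", []).append(statement)
    return updated
```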

Update the SQS access policy
Follow these steps to allow your SNS topic to write to the relevant queue. If the data collector is managed by Monte Carlo, you can skip these steps by sending the SNS topic ARN to your representative. Your representative will in turn send you the SQS ARN and relevant account ID.

  1. Open the SQS console in the account the Monte Carlo Collector was deployed to.
  2. Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
  3. Select the queue and confirm that the ARN matches the ARN you saved previously.
  4. Select the “Access Policy” tab and select “Edit”.

If the access policy is empty or looks something like this:

{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}

Paste the following (replacing any values in brackets):

  • The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
  • The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
  • The SNS_ARN is the SNS ARN, which you saved in the "Create a SNS Topic" subsection.

🚧

Be sure to use the SNS topic ARN and not the S3 bucket ARN here!

{
   "Version":"2008-10-17",
   "Statement":[
      {
         "Sid":"__owner",
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
         },
         "Action":"SQS:*",
         "Resource":"<EVENT_QUEUE_ARN>"
      },
      {
         "Sid":"__sender",
         "Effect":"Allow",
         "Principal":{
            "AWS":"*"
         },
         "Action":"SQS:SendMessage",
         "Resource":"<EVENT_QUEUE_ARN>",
         "Condition":{
            "ArnLike":{
               "aws:SourceArn":[
                  "<SNS_ARN>"
               ]
            }
         }
      }
   ]
}

If the access policy already has a statement with the Sid “__sender” (i.e. it looks like the policy above), append your SNS_ARN to the SourceArn list instead. The SNS_ARN was saved in the "Create a SNS Topic" subsection.

"aws:SourceArn": [
            "arn:aws:s3:::existing_bucket",
            "<SNS_ARN>"
          ]

Create event notification
Follow these steps to create an event notification in S3.

  1. Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
  2. Select “Create event notification” under Event notifications.
  3. Fill in a meaningful name.
  4. Optionally specify a prefix and/or suffix.
  5. Select “All object create events” and “All object delete events” under Event types.
  6. Enter the SNS topic ARN you had saved from the "Create a SNS Topic" subsection as the Destination topic ARN.
  7. Save changes.

Create a SNS subscription
Follow these steps to subscribe SQS to the SNS topic you created and enabled notifications for.

  1. Open the SNS console and select “Subscriptions”.
  2. Select “Create Subscription”.
  3. Select the topic ARN you saved above in the "Create a SNS Topic" subsection.
  4. Select Amazon SQS as the protocol.
  5. Select (or paste) the SQS ARN you saved above in the "Retrieve relevant SQS ARNs" subsection.

🚧

Be sure to select “Enable raw message delivery”!

  6. Select “Create Subscription”.
  7. Validate the status is “Confirmed”.
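
As a sketch of the same subscription outside the console, the helper below assembles the arguments that boto3's SNS `subscribe` call accepts, with raw message delivery switched on. The helper name `build_subscribe_kwargs` is hypothetical, and nothing is sent to AWS by this code.

```python
from typing import Any, Dict


def build_subscribe_kwargs(topic_arn: str, queue_arn: str) -> Dict[str, Any]:
    """Arguments for subscribing an SQS queue to a topic with raw delivery enabled."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "sqs",
        "Endpoint": queue_arn,
        # Attribute values are strings in the SNS API, not booleans.
        "Attributes": {"RawMessageDelivery": "true"},
        "ReturnSubscriptionArn": True,
    }
```

With boto3 installed, these could be passed as `boto3.client("sns").subscribe(**kwargs)`, using a client in the same region as the topic.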

👍

That's it! You are now set up with S3 metadata events.

Scenario Three

🚧

Heads up

These steps may temporarily affect (disable) existing event notifications!

Follow these steps to enable S3 events if your needs fit under "scenario three":

Follow the same steps as Scenario two, except that you also need to temporarily delete the conflicting trigger and subscribe its destination to the SNS topic that was created. This pattern is known as SNS fanout: one topic publishes to multiple destinations, which allows parallel asynchronous processing.
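
To illustrate the fanout idea, the sketch below lists the endpoints that would need a subscription on the shared topic: the collector queue plus the destinations of the bucket's existing triggers. It assumes the existing configuration has the shape returned by S3's GetBucketNotificationConfiguration, and it treats every existing queue or Lambda trigger as conflicting, which may be broader than your case. The name `plan_fanout` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def plan_fanout(existing_config: Dict[str, Any],
                collector_queue_arn: str) -> List[Tuple[str, str]]:
    """Return (protocol, endpoint) pairs to subscribe to the shared SNS topic."""
    endpoints = [("sqs", collector_queue_arn)]
    for cfg in existing_config.get("QueueConfigurations", []):
        endpoints.append(("sqs", cfg["QueueArn"]))
    for cfg in existing_config.get("LambdaFunctionConfigurations", []):
        endpoints.append(("lambda", cfg["LambdaFunctionArn"]))
    return endpoints
```

Whether the existing consumers can handle events that now arrive through SNS is outside the scope of this sketch and worth checking before removing their triggers.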

👍

That's it! You are now set up with S3 metadata events.

Scenario Four

Follow these steps to enable S3 events if your needs fit under "scenario four":

  1. Retrieve relevant SQS ARNs
  2. Retrieve your account ID
  3. Open the S3 event management pane
  4. Create a SNS Topic
  5. Update the SQS access policy
  6. Create event notification
  7. Create a SNS subscription

Retrieve relevant SQS ARNs
Follow these steps to get the relevant SQS ARNs. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

  1. Open the CloudFormation console and search for the Monte Carlo data collector. Select the stack.
  2. Select the “Outputs” tab.
  3. Save the Metadata Queue ARN for later.
    Key: MetadataEventQueue

Retrieve your account ID
Follow these steps to retrieve your account ID. If the data collector is managed by Monte Carlo, please reach out to your representative for these values instead.

Be sure you are logged in to the same account as the Monte Carlo Collector before proceeding.

  1. From the console, select your username in the upper right corner.
  2. Select “My Account”.
  3. Save the Account Id (without dashes) for later.

Open the S3 event management pane
Follow these steps to help locate the event configuration page for the bucket you want to enable events for.

  1. Open the S3 Console and search for the bucket that you would like to enable events for.
  2. Select the bucket.
  3. Save the bucket ARN by selecting “Copy Bucket ARN” for later.
  4. Select the “Properties” tab. Leave this page open; you will come back to it later.

Create a SNS Topic

📘

What region should I create my topic in?

Make sure you are in the same region as the bucket you want to add an event for.

  1. Open the SNS console and select “Topics”.
  2. Select “Create Topic”. Choose "Standard" type, enter a meaningful name and fill any optional fields.

🚧

It’s highly recommended to enable delivery status logging for SQS.

  3. Select “Create Topic” and save the Topic ARN for later.
  4. Update (append) the topic you just created with the following policy statement:
  • SNS_ARN is the ARN from above.
  • S3_ARN is the bucket ARN, which you saved in the "Open the S3 event management pane" subsection.
{
    "Effect": "Allow",
    "Principal": {
        "AWS": "*"
    },
    "Action": "SNS:Publish",
    "Resource": "<SNS_ARN>",
    "Condition": {
        "StringEquals": {
            "aws:SourceArn": "<S3_ARN>"
        }
    }
}

You may need to include a "Sid" here too.

  5. Update (append) the topic you just created with a second policy statement, which allows the collector account to subscribe to the topic:
  • COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
  • SNS_ARN is the ARN from above.
{
    "Sid": "__dc_sub",
    "Effect": "Allow",
    "Principal": {
        "AWS": "<COLLECTOR_ACCOUNT_ID>"
    },
    "Action": "sns:Subscribe",
    "Resource": "<SNS_ARN>"
}
  6. Save changes.
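
The cross-account subscribe statement can be generated the same way as the other policies. This is a minimal sketch mirroring the JSON above; the function name `build_collector_subscribe_statement` is hypothetical, and the 12-digit check is an added safeguard rather than part of the original steps.

```python
import re
from typing import Any, Dict


def build_collector_subscribe_statement(collector_account_id: str,
                                        sns_arn: str) -> Dict[str, Any]:
    """Statement that lets the collector account subscribe to the topic."""
    if not re.fullmatch(r"\d{12}", collector_account_id):
        raise ValueError("collector_account_id must be 12 digits, without dashes")
    return {
        "Sid": "__dc_sub",
        "Effect": "Allow",
        "Principal": {"AWS": collector_account_id},
        "Action": "sns:Subscribe",
        "Resource": sns_arn,
    }
```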

Update the SQS access policy
Follow these steps to allow your SNS topic to write to the relevant queue. If the data collector is managed by Monte Carlo, you can skip these steps by sending the SNS topic ARN to your representative. Your representative will in turn send you the SQS ARN and relevant account ID.

  1. Open the SQS console in the account the Monte Carlo Collector was deployed to.
  2. Search for the queue. The name follows this structure: {CF_STACK}-MetadataEventQueue-{RANDOM_STR}
  3. Select the queue and confirm that the ARN matches the ARN you saved previously.
  4. Select the “Access Policy” tab and select “Edit”.

If the access policy is empty or looks something like this:

{
  "Version": "2012-10-17",
  "Id": "arn:aws:sqs:<region>:<account>:<name>/SQSDefaultPolicy"
}

Paste the following (replacing any values in brackets):

  • The COLLECTOR_ACCOUNT_ID is the account ID you saved in the "Retrieve your account ID" subsection.
  • The EVENT_QUEUE_ARN is the ARN you saved in the "Retrieve relevant SQS ARNs" subsection.
  • The SNS_ARN is the SNS ARN, which you saved in the "Create a SNS Topic" subsection.

🚧

Be sure to use the SNS topic ARN and not the S3 bucket ARN here!

{
   "Version":"2008-10-17",
   "Statement":[
      {
         "Sid":"__owner",
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam::<COLLECTOR_ACCOUNT_ID>:root"
         },
         "Action":"SQS:*",
         "Resource":"<EVENT_QUEUE_ARN>"
      },
      {
         "Sid":"__sender",
         "Effect":"Allow",
         "Principal":{
            "AWS":"*"
         },
         "Action":"SQS:SendMessage",
         "Resource":"<EVENT_QUEUE_ARN>",
         "Condition":{
            "ArnLike":{
               "aws:SourceArn":[
                  "<SNS_ARN>"
               ]
            }
         }
      }
   ]
}

If the access policy already has a statement with the Sid “__sender” (i.e. it looks like the policy above), append your SNS_ARN to the SourceArn list instead. The SNS_ARN was saved in the "Create a SNS Topic" subsection.

"aws:SourceArn": [
            "arn:aws:s3:::existing_bucket",
            "<SNS_ARN>"
          ]

Create event notification
Follow these steps to create an event notification in S3.

  1. Return to the page you had opened in step 4 of the "Open the S3 event management pane" subsection.
  2. Select “Create event notification” under Event notifications.
  3. Fill in a meaningful name.
  4. Optionally specify a prefix and/or suffix.
  5. Select “All object create events” and “All object delete events” under Event types.
  6. Enter the SNS topic ARN you had saved from the "Create a SNS Topic" subsection as the Destination topic ARN.
  7. Save changes.

Create a SNS subscription
Follow these steps to subscribe SQS to the SNS topic you created and enabled notifications for.

  1. Open the SNS console and select “Subscriptions”.
  2. Select “Create Subscription”.
  3. Select the topic ARN you saved above in the "Create a SNS Topic" subsection.
  4. Select Amazon SQS as the protocol.
  5. Select (or paste) the SQS ARN you saved above in the "Retrieve relevant SQS ARNs" subsection.

🚧

Be sure to select “Enable raw message delivery”.

  6. Select “Create Subscription”.
  7. Validate the status is “Confirmed”.

📘

The subscription may not necessarily confirm right away. Please check back in 24 hours.
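
When checking back later, the confirmation state can also be read from the subscription's attributes. The sketch below assumes a response dictionary shaped like the one from SNS GetSubscriptionAttributes, where `PendingConfirmation` is the string "true" or "false"; the helper name `is_confirmed` is hypothetical.

```python
from typing import Any, Dict


def is_confirmed(subscription_attributes_response: Dict[str, Any]) -> bool:
    """True once the subscription is no longer pending confirmation."""
    attrs = subscription_attributes_response.get("Attributes", {})
    # Treat a missing attribute as still pending, to stay on the safe side.
    return attrs.get("PendingConfirmation", "true").lower() == "false"
```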

👍

That's it! You are now set up with S3 metadata events.

Scenario Five

🚧

Heads up

These steps may temporarily affect (disable) existing event notifications!

Follow these steps to enable S3 events if your needs fit under "scenario five":

Follow the same steps as Scenario four, except that you also need to temporarily delete the conflicting trigger and subscribe its destination to the SNS topic that was created. This pattern is known as SNS fanout: one topic publishes to multiple destinations, which allows parallel asynchronous processing.

👍

That's it! You are now set up with S3 metadata events.
