📝
Prerequisites

You are an admin in AWS (for step 1).

You are an Account Owner (for step 2).

This guide outlines how to set up an Agent (with object storage) in your AWS cloud.

The FAQs answer common questions like how to review resources and what integrations are supported.

Steps

1. Deploy the Agent

You can deploy the agent either via CloudFormation or Terraform.

Before getting started, please review which Monte Carlo AWS account your collection service is hosted in.

When provisioning resources for Monte Carlo deployments on the V2 Platform, use the Monte Carlo AWS Account ID 590183797493. Accounts created after April 24th, 2024, will automatically be on the V2 platform or newer.

If you are using an older version of the platform, please contact your Monte Carlo representative for the ID.

If you wish to connect the agent to a VPC when deploying, please see the details here. Specifying a VPC is not strictly required to run the agent, but it enables certain connectivity scenarios, such as when you have an IP allowlist for your resource, want to peer, or want to deploy in your existing VPC.

Deploy with CloudFormation

You can use the following quick-create link to deploy the agent in any supported region in your AWS account:

If you need to share the template with a colleague or review it first, you can download a copy here (source).

Note that the Monte Carlo AWS account ID is not the ID of the account where you will deploy the agent. Make sure it is the value you provide for the "Monte Carlo AWS Account ID" parameter when deploying the agent; otherwise, registration will fail.

Deploy with Terraform

You can use the mcd-agent Terraform module to deploy the Agent in any supported region in your AWS account.

For instance, you can use the following example Terraform config:

module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  version          = "1.0.0"
  cloud_account_id = "590183797493"
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}

You can build and deploy via:

terraform init && terraform apply

Additional module inputs, options, and defaults can be found here. And other details can be found here.

2. Register the Agent

After deploying the agent you can register either via the Monte Carlo UI or CLI.

See here for examples of how to retrieve the deployment output (i.e. the registration input).

After this step is complete, all supported integrations using this deployment will automatically use this agent (and object store for troubleshooting and temporary data). You can add these integrations as you normally would using Monte Carlo's UI wizard or CLI.

UI

👍
If you are onboarding a new account, you can also register by following the steps on the onscreen

Navigate to settings/integrations/agents and select the Create button.
Follow the onscreen wizard for the "AWS" Platform and "Data Store + Agent" Type.

Monte Carlo Registration Wizard UI Example

CLI

Use montecarlo agents register-aws-agent to register.

See reference documentation here. And see here for how to install and configure the CLI. For instance:

montecarlo agents register-aws-agent \
  --lambda-arn arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda \
  --assumable-role arn:aws:iam::123456789:role/mcd-agent-InvocationRole-12345 \
  --external-id f3840b31-772e-4fe3-8a5f-3aa5ff7e6fec

FAQs

What integrations does the Agent support?

The agent supports all integrations except for the following:

Data Lake Query Logs from S3 Buckets are not supported: Learn more.
Tableau requires using the connected app authentication flow: Learn more.

Note that onboarding (connecting) any supported integration using this deployment will use the agent if one is provisioned. Otherwise, any other integrations will use the cloud service to connect directly.

Some integrations, such as dbt Core, Atlan, and Airflow, either leverage our developer toolkit or are managed by a third party and do not require an agent. These integrations natively push data to Monte Carlo, so an agent is not needed.

Can I use more than one Agent?

Yes, please reach out to [email protected] or contact your account representative if you would like to use more than one.

What regions does the Agent support?

You can deploy an agent in the following AWS regions:

Supported Regions
us-east-1
us-east-2
us-west-1
us-west-2
af-south-1
ap-south-1
ap-south-2
ap-southeast-1
ap-southeast-2
ap-southeast-3
ap-southeast-4
ap-northeast-1
ap-northeast-2
ap-northeast-3
ca-central-1
ca-west-1
eu-central-1
eu-central-2
eu-west-1
eu-west-2
eu-west-3
eu-north-1
eu-south-1
eu-south-2
il-central-1
sa-east-1

Can I review agent resources and code?

Absolutely! You can find details here:

Component	Target	Repository
Code	https://hub.docker.com/r/montecarlodata/agent*	https://github.com/monte-carlo-data/apollo-agent
CloudFormation Resources	https://mcd-public-resources.s3.amazonaws.com/cloudformation/aws_apollo_agent.yaml	https://github.com/monte-carlo-data/mcd-iac-resources#monte-carlos-agent-template-for-customer-hosted-deployments-in-aws-source
Terraform Resources	https://registry.terraform.io/modules/monte-carlo-data/mcd-agent/aws/latest	https://github.com/monte-carlo-data/terraform-aws-mcd-agent

*Note that due to an AWS limitation the agent image is also uploaded and then sourced from AWS ECR when executed on Lambda.

Repository: 590183797493.dkr.ecr.*.amazonaws.com/mcd-agent

How do I retrieve registration input from CloudFormation or Terraform?

CloudFormation

Navigate to your stack and then filter for "To be used in registering" in the Outputs tab.

These three values are to be used when you register an agent with Monte Carlo.

If you prefer, you can also use the AWS CLI or API to retrieve these values.
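If you would rather script this than use the console, the sketch below shows one way to pull the same values through the AWS API. It assumes boto3 is installed and AWS credentials are configured, and that the registration outputs carry the "To be used in registering" text in their description (as the Outputs tab filter suggests). The function names are illustrative and not part of any Monte Carlo tooling.

```python
from typing import Any

# Text the registration outputs are assumed to share in their description.
MARKER = "To be used in registering"


def registration_outputs(describe_stacks_response: dict[str, Any]) -> dict[str, str]:
    """Return {OutputKey: OutputValue} for outputs whose description contains MARKER."""
    stack = describe_stacks_response["Stacks"][0]
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
        if MARKER in output.get("Description", "")
    }


def fetch_registration_outputs(stack_name: str) -> dict[str, str]:
    """Query CloudFormation for the named stack (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter above works without boto3

    client = boto3.client("cloudformation")
    return registration_outputs(client.describe_stacks(StackName=stack_name))
```

Calling fetch_registration_outputs with the name of your agent stack should return the values you pass to the registration step; output key names depend on the template, so check them against your stack.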

Terraform

Outputs can be retrieved via terraform output. For instance:

 % terraform output
function_arn = "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-service-12345"
invoker_role_arn = "arn:aws:iam::123456789:role/mcd_agent_service_invocation_role12345"
invoker_role_external_id = "example"
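As a rough sketch of how these outputs feed into registration, the snippet below turns the result of terraform output -json into the register-aws-agent command shown in the CLI section. It assumes the three output names from the example config above; the helper name register_command is hypothetical.

```python
import json
import shlex


def register_command(terraform_output_json: str) -> list[str]:
    """Build the montecarlo CLI arguments from `terraform output -json` text."""
    # `terraform output -json` wraps each output as {"value": ..., "type": ..., "sensitive": ...}.
    outputs = json.loads(terraform_output_json)
    return [
        "montecarlo", "agents", "register-aws-agent",
        "--lambda-arn", outputs["function_arn"]["value"],
        "--assumable-role", outputs["invoker_role_arn"]["value"],
        "--external-id", outputs["invoker_role_external_id"]["value"],
    ]


def render(command: list[str]) -> str:
    """Quote the arguments so the command can be pasted into a shell."""
    return shlex.join(command)
```

For example, you could save the JSON with terraform output -json > outputs.json, read the file in Python, and print render(register_command(text)) to review the command before running it.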

How do I monitor the Agent?

The Agent automatically generates a log of all operations, which can be retrieved via this CLI command or from CloudWatch. Operational metrics can similarly be retrieved from CloudWatch metrics (or the Lambda console).

See additional details here.

How do I upgrade the Agent?

Please refer to the documentation here.

Can I further constrain inbound access (ingress) to the Agent?

👍
VPC endpoints by default
For all deployments on the V2 Platform or newer, Monte Carlo uses VPC endpoints to communicate with the AWS agent by default. This constraint is no longer applicable. Additional details and limitations can be found here.
To update the IAM invoke policy permissions to include an aws:SourceVpce condition, refer to this document.

Absolutely! By default, this is done via the trust policy defined in the invocation role, but if you prefer, you can further restrict requests via an IP allowlist. For instance, you can:

Reach out to your Monte Carlo representative or support at [email protected] for an IP address to allowlist. All inbound requests to the agent will originate from this address.
Update the IAM invoke policy permissions to include an aws:SourceIp condition with the IP address from step #1. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For instance, the updated policy could be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "<AGENT_FUNCTION>"
            ],
            "Effect": "Allow",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "<IP>/32"
                    ]
                }
            }
        }
    ]
}

Test connectivity between Monte Carlo's Service and the Agent (e.g. via the health CLI command or the "test" button on the UI).

You can also do this before registering an agent if you prefer. If you do so, you can skip step #3, as reachability is automatically validated during registration.
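If you manage the policy in code, the steps above can be sketched as a small helper that fills in the two placeholders. This is only an illustration: the function name is hypothetical, and it assumes a single IPv4 address, matching the /32 suffix in the example policy.

```python
import ipaddress
import json


def ip_restricted_invoke_policy(agent_function_arn: str, ip: str) -> dict:
    """Build the invoke policy with an aws:SourceIp condition for one IPv4 address."""
    # Raises ValueError if `ip` is not a valid IPv4 address.
    address = ipaddress.IPv4Address(ip)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [agent_function_arn],
                "Effect": "Allow",
                "Condition": {"IpAddress": {"aws:SourceIp": [f"{address}/32"]}},
            }
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for pasting into the IAM console or a template."""
    return json.dumps(policy, indent=4)
```

Validating the address up front keeps a typo from producing a policy that silently blocks all requests to the agent.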

For more information on connectivity, please refer to our Network Connectivity documentation.

Can I use private endpoints to configure inbound access (ingress) to the Agent?

For all deployments on the V2 Platform or newer, Monte Carlo will use VPC endpoints to communicate with the AWS agent by default. Additional details and limitations can be found here.

You can update the IAM invoke policy permissions to include an aws:SourceVpce condition with the endpoint provided in the table below. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For example, the updated policy could be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "<AGENT_FUNCTION>"
            ],
            "Effect": "Allow",
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpce": "<VPCE_ID_FROM_TABLE_BELOW>"
                }
            }
        }
    ]
}

And for reference, below is the regional mapping for the V2 Platform:

Region	VPCE ID
us-east-1	vpce-019013ff90cf68727
us-east-2	vpce-012bbc4487e64055a
us-west-1	vpce-061dfb089827eef5e
us-west-2	vpce-0fd5fcf09921fe336
af-south-1	vpce-0309c6adabc62ee18
ap-south-1	vpce-0ad230c44f5e778ec
ap-south-2	vpce-0329e929b5b0ef117
ap-southeast-1	vpce-0d852cd249f75c98f
ap-southeast-2	vpce-0f879dec7e0f72e4c
ap-southeast-3	vpce-01b00e012bf003e7f
ap-southeast-4	vpce-07b97088fab273a7e
ap-northeast-1	vpce-03119d6dacf373214
ap-northeast-2	vpce-0cc30de1838047e40
ap-northeast-3	vpce-037a4f877617d88ad
ca-central-1	vpce-017b70e4d73cd789a
ca-west-1	vpce-0718d13003e07f530
eu-central-1	vpce-02b2728832eb229fc
eu-central-2	vpce-0a31bc7d13be7a02a
eu-north-1	vpce-074aecb112e945f49
eu-south-1	vpce-0863b6ff9e264d718
eu-south-2	vpce-08d11b71a2e25cfd0
eu-west-1	vpce-0997154afc4527ba9
eu-west-2	vpce-07b61fafd692c56bf
eu-west-3	vpce-069fe27d81aad21bb
il-central-1	vpce-0eb968f248f3aea0c
sa-east-1	vpce-082af521c2cc84550

Note that this condition cannot be used with the source IP condition outlined here as requests will not be sourced from that IP address and instead will use the AWS backbone.
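As a minimal sketch, the policy above can be generated from the regional mapping. The dictionary below copies only three rows of the table for brevity, so extend it with the region you need. The function name is hypothetical, and which region's endpoint applies to your deployment is an assumption you should confirm against the linked details.

```python
# Excerpt of the V2 Platform region-to-VPCE table above; add other regions as needed.
VPCE_BY_REGION = {
    "us-east-1": "vpce-019013ff90cf68727",
    "us-west-2": "vpce-0fd5fcf09921fe336",
    "eu-west-1": "vpce-0997154afc4527ba9",
}


def vpce_restricted_invoke_policy(agent_function_arn: str, region: str) -> dict:
    """Build the invoke policy with an aws:SourceVpce condition for the given region."""
    if region not in VPCE_BY_REGION:
        raise KeyError(f"No VPCE ID recorded for region {region!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [agent_function_arn],
                "Effect": "Allow",
                # Do not combine with aws:SourceIp; requests arrive over the AWS backbone.
                "Condition": {"StringEquals": {"aws:SourceVpce": VPCE_BY_REGION[region]}},
            }
        ],
    }
```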

For more information on connectivity, please refer to our Network Connectivity documentation.

Can I further constrain outbound access (egress) from the Agent?

👍
If you are connecting to a VPC with limited or no outbound internet connection, you will need to include VPC endpoints for the following services:

S3

CloudWatch

ECR*

Lambda*

CloudFormation**

*Not necessary if remote upgrades are disabled.
**Not necessary if remote upgrades are disabled or if using Terraform.
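The footnotes in the list above amount to a small decision rule, sketched here for illustration. The function and parameter names are hypothetical, and the service names are the labels from the list rather than exact VPC endpoint service names.

```python
def required_vpc_endpoints(remote_upgrades_enabled: bool, uses_terraform: bool) -> list[str]:
    """Return the services that need VPC endpoints, following the footnotes above."""
    services = ["S3", "CloudWatch"]  # always required
    if remote_upgrades_enabled:
        services += ["ECR", "Lambda"]  # (*) only needed for remote upgrades
        if not uses_terraform:
            services.append("CloudFormation")  # (**) also skipped with Terraform
    return services
```

For a CloudFormation deployment with remote upgrades enabled, this returns all five services; with remote upgrades disabled, only S3 and CloudWatch remain.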

Absolutely! As with any Lambda-based service, you can connect to a VPC to enable connectivity to your resources.

Some scenarios where you might want to do this can include:

You want to IP allowlist connectivity between the agent and your resource.
You want to deploy the agent in a new VPC and peer and/or set up PrivateLink between services.
You want to deploy the agent in your existing VPC.

CloudFormation

With CloudFormation deployments you can specify a VPC ID and at least two private subnets in that VPC as parameters.

To assist with scenarios 1-2, after review, you can use this (source) template to create a new VPC.

Then, specify the generated VPC and private subnets when deploying the agent.

This (source) example template further demonstrates how you can deploy an agent with a connected VPC in one CloudFormation stack by leveraging nesting.

After deploying, the outputs of this stack include a PublicIP, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g., via VPC endpoint service).

Terraform

With Terraform deployments you can specify at least two private subnets in the private_subnets variable.

To assist with scenarios 1-2, after review, you can use this module to connect an agent to a VPC, and then specify the generated private subnets when deploying the agent.

The following example template demonstrates how you can deploy an agent with a connected VPC with Terraform:

data "aws_region" "current" {}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name               = "apollo-vpc"
  cidr               = "10.0.0.0/16"
  azs                = formatlist("${data.aws_region.current.name}%s", ["a", "b"])
  private_subnets    = ["10.0.0.0/24", "10.0.1.0/24"]
  public_subnets     = ["10.0.2.0/24", "10.0.3.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id          = module.vpc.vpc_id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.s3"
  route_table_ids = concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids)
}

module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  cloud_account_id = "590183797493"
  region           = data.aws_region.current.name
  private_subnets  = module.vpc.private_subnets
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}

output "public_ip" {
  value       = module.vpc.nat_public_ips
  description = "IP address from which agent resources access the Internet (e.g. for IP whitelisting)."
}

After deploying, the outputs include a public_ip, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g. via VPC endpoint service).

How do I check the reachability between Monte Carlo and the Agent?

Please refer to the documentation here.

How do I debug connectivity between the Agent and my integration?

Please refer to the documentation here.