AWS: Agent Deployment
How to create and register
Prerequisites
- You are an admin in AWS (for step 1).
- You are an Account Owner (for step 2).
This guide outlines how to set up an Agent (with object storage) in your AWS cloud.
The FAQs answer common questions like how to review resources and what integrations are supported.
Steps
1. Deploy the Agent
You can deploy the agent either via CloudFormation or Terraform.
Before getting started, please review the Monte Carlo AWS account that your collection service is hosted in.
When provisioning resources for Monte Carlo deployments on the V2 Platform, use the Monte Carlo AWS Account ID 590183797493. Accounts created after April 24th, 2024, will automatically be on the V2 Platform or newer.
If you are using an older version of the platform, please contact your Monte Carlo representative for the ID.
If you wish to connect the agent to a VPC when deploying, please see details here. Specifying a VPC is not strictly required to run the agent, but it enables certain connectivity scenarios, such as when you have an IP allowlist for your resource, want to peer, or want to deploy in your existing VPC.
Deploy with CloudFormation
You can use the following quick-create link to deploy the agent on any supported region in your AWS account:
If you need to share with a colleague or first review the template you can download a copy here (source).
Note that the Monte Carlo AWS account ID is not the ID of the account where you will deploy the agent. Make sure it is the value you enter for the "Monte Carlo AWS Account ID" parameter when deploying the agent, as registration will fail otherwise.
Deploy with Terraform
You can use the mcd-agent Terraform module to deploy the Agent on any supported region in your AWS account.
For instance, using the following example Terraform config:
module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  version          = "1.0.0"
  cloud_account_id = "590183797493"
}
output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}
output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}
output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}
You can build and deploy via:
terraform init && terraform apply
Additional module inputs, options, and defaults can be found here. And other details can be found here.
Note that the Monte Carlo AWS account ID is not the ID of the account where you will deploy the agent. Make sure it is the value you set for the "Monte Carlo AWS Account ID" parameter (cloud_account_id in the example above) when deploying the agent, as registration will fail otherwise.
2. Register the Agent
After deploying the agent, you can register it via either the Monte Carlo UI or the CLI.
See here for examples of how to retrieve deployment output (i.e., registration input).
After this step is complete, all supported integrations using this deployment will automatically use this agent (and object store for troubleshooting and temporary data). You can add these integrations as you normally would using Monte Carlo's UI wizard or CLI.
UI
If you are onboarding a new account, you can also register by following the steps on the onscreen
- Navigate to settings/integrations/agents and select the Create button.
- Follow the onscreen wizard for the "AWS" Platform and "Data Store + Agent" Type.
CLI
Use montecarlo agents register-aws-agent to register.
See the reference documentation here, and see here for how to install and configure the CLI. For instance:
montecarlo agents register-aws-agent \
--lambda-arn arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda \
--assumable-role arn:aws:iam::123456789:role/mcd-agent-InvocationRole-12345 \
--external-id f3840b31-772e-4fe3-8a5f-3aa5ff7e6fec
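Before running the command above, it can help to confirm that the three values have the expected shape, since swapping the two ARNs is an easy mistake. The following is a minimal sketch, not part of the Monte Carlo CLI: the function name and the regular expressions are assumptions based on the general AWS ARN layout, and account IDs are matched loosely as digits so that the shortened example values in this guide pass.

```python
import re

# Loose shape checks only (assumed patterns, not an official validator).
LAMBDA_ARN = re.compile(
    r"^arn:aws[\w-]*:lambda:[a-z0-9-]+:\d+:function:[\w-]+(:[\w$-]+)?$"
)
ROLE_ARN = re.compile(r"^arn:aws[\w-]*:iam::\d+:role/[\w+=,.@/-]+$")


def check_registration_input(lambda_arn: str, role_arn: str, external_id: str) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not LAMBDA_ARN.match(lambda_arn):
        problems.append("--lambda-arn does not look like a Lambda function ARN")
    if not ROLE_ARN.match(role_arn):
        problems.append("--assumable-role does not look like an IAM role ARN")
    if not external_id.strip():
        problems.append("--external-id is empty")
    return problems


if __name__ == "__main__":
    for problem in check_registration_input(
        "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
        "arn:aws:iam::123456789:role/mcd-agent-InvocationRole-12345",
        "f3840b31-772e-4fe3-8a5f-3aa5ff7e6fec",
    ):
        print(problem)
```

This only checks the format of the values. Registration itself still validates that the agent is reachable.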
FAQs
What integrations does the Agent support?
The agent supports all integrations except for the following:
- Data Lake Query Logs from S3 Buckets are not supported: Learn more.
- Tableau requires using the connected app authentication flow: Learn more.
Note that onboarding (connecting) any supported integration using this deployment will use the agent if one is provisioned. Otherwise, any other integrations will use the cloud service to connect directly.
Some integrations, such as dbt Core, Atlan, and Airflow, either leverage our developer toolkit or are managed by a third party and do not require an agent. These integrations natively push data to Monte Carlo, so an agent is not needed.
Can I use more than one Agent?
Yes, please reach out to [email protected] or contact your account representative if you would like to use more than one.
What regions does the Agent support?
You can deploy an agent in the following AWS regions:
Supported Regions |
---|
us-east-1 |
us-east-2 |
us-west-1 |
us-west-2 |
af-south-1 |
ap-south-1 |
ap-south-2 |
ap-southeast-1 |
ap-southeast-2 |
ap-southeast-3 |
ap-southeast-4 |
ap-northeast-1 |
ap-northeast-2 |
ap-northeast-3 |
ca-central-1 |
ca-west-1 |
eu-central-1 |
eu-central-2 |
eu-west-1 |
eu-west-2 |
eu-west-3 |
eu-north-1 |
eu-south-1 |
eu-south-2 |
il-central-1 |
sa-east-1 |
Can I review agent resources and code?
Absolutely! You can find details here:
*Note that due to an AWS limitation the agent image is also uploaded and then sourced from AWS ECR when executed on Lambda.
Repository: 590183797493.dkr.ecr.*.amazonaws.com/mcd-agent
How do I retrieve registration input from CloudFormation or Terraform?
CloudFormation
Navigate to your stack and then filter for "To be used in registering" in the Outputs tab.
These three values are to be used when you register an agent with Monte Carlo.
If you prefer you can also use the AWS CLI or API to retrieve these values.
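As a rough illustration of the AWS CLI route, the sketch below filters the JSON printed by `aws cloudformation describe-stacks --stack-name <your-stack> --output json` for outputs whose description contains "To be used in registering". The function name is hypothetical, and the output keys depend on the template you deployed, so treat this as a starting point rather than a supported tool.

```python
import json

# Same phrase used to filter the Outputs tab in the console.
MARKER = "To be used in registering"


def registration_outputs(describe_stacks_json: str) -> dict[str, str]:
    """Map OutputKey -> OutputValue for outputs whose description mentions MARKER."""
    stacks = json.loads(describe_stacks_json)["Stacks"]
    # describe-stacks returns a list; with --stack-name it holds a single stack.
    outputs = stacks[0].get("Outputs", [])
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in outputs
        if MARKER in output.get("Description", "")
    }


if __name__ == "__main__":
    import sys

    # Example: aws cloudformation describe-stacks --stack-name <your-stack> --output json | python this_script.py
    for key, value in registration_outputs(sys.stdin.read()).items():
        print(f"{key} = {value}")
```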
Terraform
Outputs can be retrieved via terraform output. For instance:
% terraform output
function_arn = "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-service-12345"
invoker_role_arn = "arn:aws:iam::123456789:role/mcd_agent_service_invocation_role12345"
invoker_role_external_id = "example"
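If you would rather not copy the values by hand, the outputs can be read in machine-readable form with `terraform output -json` and turned into the registration command. This is a minimal sketch under the assumption that your outputs use the names from the example config in this guide (function_arn, invoker_role_arn, invoker_role_external_id); the helper name is hypothetical.

```python
import json
import shlex


def register_command(terraform_output_json: str) -> str:
    """Build the register-aws-agent command line from `terraform output -json`."""
    outputs = json.loads(terraform_output_json)

    def value(name: str) -> str:
        # Each entry in `terraform output -json` wraps the value in an object.
        return outputs[name]["value"]

    args = [
        "montecarlo", "agents", "register-aws-agent",
        "--lambda-arn", value("function_arn"),
        "--assumable-role", value("invoker_role_arn"),
        "--external-id", value("invoker_role_external_id"),
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    import sys

    # Example: terraform output -json | python this_script.py
    print(register_command(sys.stdin.read()))
```

The script only prints the command so that you can review it before running it.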
How do I monitor the Agent?
The Agent automatically generates a log of all operations, which can be retrieved via this CLI command or from CloudWatch. Operational metrics can similarly be retrieved from CloudWatch metrics (or the Lambda console).
See additional details here.
How do I upgrade the Agent?
Please refer to the documentation here.
Can I further constrain inbound access (ingress) to the Agent?
VPC endpoints by default
For all deployments on the V2 Platform or newer, Monte Carlo uses VPC endpoints to communicate with the AWS agent by default, so the IP allowlist constraint described below is no longer applicable. Additional details and limitations can be found here.
To update the IAM invoke policy permissions to include an aws:SourceVpce condition, refer to this document.
Absolutely! By default, this is done via the defined trust policy in the invocation role, but if you prefer, you can further restrict requests via an IP allowlist. For instance, you can:
1. Reach out to your Monte Carlo representative or support at [email protected] for an IP address to allowlist. All inbound requests to the agent will originate from this address.
2. Update the IAM invoke policy permissions to include an aws:SourceIp condition with the IP address from step #1. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For instance, the updated policy can be:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "<AGENT_FUNCTION>"
      ],
      "Effect": "Allow",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "<IP>/32"
          ]
        }
      }
    }
  ]
}
3. Test connectivity between Monte Carlo's service and the agent (e.g., via the health CLI command or the "test" button in the UI).
You can also do this before registering an agent if you prefer. If you do so, you can skip step #3, as reachability is automatically validated during registration.
For more information on connectivity, please refer to our Network Connectivity documentation.
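To avoid hand-editing the JSON, the source IP policy shown above can be generated. The sketch below is one possible approach; the function name is hypothetical, and the address 203.0.113.10 is a documentation placeholder, not a Monte Carlo address. Use the IP address you receive from Monte Carlo and the ARN of your own agent function.

```python
import ipaddress
import json


def invoke_policy_with_source_ip(function_arn: str, ip: str) -> dict:
    """Build the invoke policy with an aws:SourceIp condition for a single address."""
    # Raises ValueError if `ip` is not a valid IP address.
    address = ipaddress.ip_address(ip)
    # Single-host CIDR: /32 for IPv4 (as in the policy above), /128 for IPv6.
    cidr = f"{address}/{address.max_prefixlen}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
                "Effect": "Allow",
                "Condition": {"IpAddress": {"aws:SourceIp": [cidr]}},
            }
        ],
    }


if __name__ == "__main__":
    policy = invoke_policy_with_source_ip(
        "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
        "203.0.113.10",  # placeholder address
    )
    print(json.dumps(policy, indent=2))
```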
Can I use private endpoints to configure inbound access (ingress) to the Agent?
For all deployments on the V2 Platform or newer, Monte Carlo will use VPC endpoints to communicate with the AWS agent by default. Additional details and limitations can be found here.
You can update the IAM invoke policy permissions to include an aws:SourceVpce condition with the endpoint provided in the table below. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For example, the updated policy could be:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "<AGENT_FUNCTION>"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "<VPCE_ID_FROM_TABLE_BELOW>"
        }
      }
    }
  ]
}
And for reference, below is the regional mapping for the V2 Platform:
Region | VPCE ID |
---|---|
us-east-1 | vpce-019013ff90cf68727 |
us-east-2 | vpce-012bbc4487e64055a |
us-west-1 | vpce-061dfb089827eef5e |
us-west-2 | vpce-0fd5fcf09921fe336 |
af-south-1 | vpce-0309c6adabc62ee18 |
ap-south-1 | vpce-0ad230c44f5e778ec |
ap-south-2 | vpce-0329e929b5b0ef117 |
ap-southeast-1 | vpce-0d852cd249f75c98f |
ap-southeast-2 | vpce-0f879dec7e0f72e4c |
ap-southeast-3 | vpce-01b00e012bf003e7f |
ap-southeast-4 | vpce-07b97088fab273a7e |
ap-northeast-1 | vpce-03119d6dacf373214 |
ap-northeast-2 | vpce-0cc30de1838047e40 |
ap-northeast-3 | vpce-037a4f877617d88ad |
ca-central-1 | vpce-017b70e4d73cd789a |
ca-west-1 | vpce-0718d13003e07f530 |
eu-central-1 | vpce-02b2728832eb229fc |
eu-central-2 | vpce-0a31bc7d13be7a02a |
eu-north-1 | vpce-074aecb112e945f49 |
eu-south-1 | vpce-0863b6ff9e264d718 |
eu-south-2 | vpce-08d11b71a2e25cfd0 |
eu-west-1 | vpce-0997154afc4527ba9 |
eu-west-2 | vpce-07b61fafd692c56bf |
eu-west-3 | vpce-069fe27d81aad21bb |
il-central-1 | vpce-0eb968f248f3aea0c |
sa-east-1 | vpce-082af521c2cc84550 |
Note that this condition cannot be used with the source IP condition outlined here as requests will not be sourced from that IP address and instead will use the AWS backbone.
For more information on connectivity, please refer to our Network Connectivity documentation.
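The lookup in the table above and the policy update can be combined into a small helper. This is a sketch only: the function name is hypothetical, the mapping is an excerpt of the table (extend it with the regions you use), and it assumes the region you pass in is the one whose VPCE ID applies to your agent.

```python
import json

# Excerpt of the V2 Platform table above; add further regions as needed.
VPCE_BY_REGION = {
    "us-east-1": "vpce-019013ff90cf68727",
    "us-east-2": "vpce-012bbc4487e64055a",
    "us-west-2": "vpce-0fd5fcf09921fe336",
    "eu-west-1": "vpce-0997154afc4527ba9",
}


def invoke_policy_with_source_vpce(function_arn: str, region: str) -> dict:
    """Build the invoke policy with an aws:SourceVpce condition for `region`."""
    try:
        vpce_id = VPCE_BY_REGION[region]
    except KeyError:
        raise ValueError(f"no VPCE ID recorded for region {region!r}") from None
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
                "Effect": "Allow",
                "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = invoke_policy_with_source_vpce(
        "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
        "us-east-1",
    )
    print(json.dumps(policy, indent=2))
```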
Can I further constrain outbound access (egress) from the Agent?
If you are connecting to a VPC with limited or no outbound internet connection, you will need to include VPC endpoints for the following services:
- S3
- CloudWatch
- ECR*
- Lambda*
- CloudFormation**
*Not necessary if remote upgrades are disabled.
**Not necessary if remote upgrades are disabled or if using Terraform.
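The two footnotes above amount to a small decision rule, which the following sketch spells out. The function and parameter names are hypothetical; the logic is only a restatement of the list and its footnotes.

```python
def required_vpc_endpoints(remote_upgrades_enabled: bool, deployed_with_terraform: bool) -> list[str]:
    """Services that need a VPC endpoint when outbound internet access is limited."""
    # S3 and CloudWatch carry no footnote, so they are always needed.
    endpoints = ["S3", "CloudWatch"]
    if remote_upgrades_enabled:
        # ECR and Lambda are only needed when remote upgrades are enabled.
        endpoints += ["ECR", "Lambda"]
        # CloudFormation is additionally skipped for Terraform deployments.
        if not deployed_with_terraform:
            endpoints.append("CloudFormation")
    return endpoints


if __name__ == "__main__":
    print(required_vpc_endpoints(remote_upgrades_enabled=True, deployed_with_terraform=True))
```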
Absolutely! As with any Lambda-based service, you can connect to a VPC to enable connectivity to your resources.
Some scenarios where you might want to do this include:
1. You want to IP allowlist connectivity between the agent and your resource.
2. You want to deploy the agent in a new VPC and peer and/or set up PrivateLink between services.
3. You want to deploy the agent in your existing VPC.
CloudFormation
With CloudFormation deployments you can specify a VPC ID and at least two private subnets in that VPC as parameters.
To assist with scenarios 1-2, after review, you can use this (source) template to create a new VPC.
Then, specify the generated VPC and private subnets when deploying the agent.
This (source) example template further demonstrates how you can deploy an agent with a connected VPC in one CloudFormation stack by leveraging nesting:
After deploying, the outputs of this stack include a PublicIP, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g., via a VPC endpoint service).
Terraform
With Terraform deployments, you can specify at least two private subnets in the private_subnets variable.
To assist with scenarios 1-2, after review, you can use this module to connect an agent to a VPC, and then specify the generated private subnets when deploying the agent.
The following example template demonstrates how you can deploy an agent with a connected VPC with Terraform:
data "aws_region" "current" {}
module "vpc" {
  source             = "terraform-aws-modules/vpc/aws"
  name               = "apollo-vpc"
  cidr               = "10.0.0.0/16"
  azs                = formatlist("${data.aws_region.current.name}%s", ["a", "b"])
  private_subnets    = ["10.0.0.0/24", "10.0.1.0/24"]
  public_subnets     = ["10.0.2.0/24", "10.0.3.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = module.vpc.vpc_id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.s3"
  route_table_ids = concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids)
}
module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  cloud_account_id = "590183797493"
  region           = data.aws_region.current.name
  private_subnets  = module.vpc.private_subnets
}
output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}
output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}
output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}
output "public_ip" {
  value       = module.vpc.nat_public_ips
  description = "IP address from which agent resources access the Internet (e.g. for IP whitelisting)."
}
After deploying, the outputs include a public_ip, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g., via a VPC endpoint service).
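If you adapt the CIDR ranges in the example to your own network, a quick offline check can catch mistakes before you run terraform apply. The sketch below is an illustration with a hypothetical helper name: it verifies that at least two private subnets are given, that every subnet sits inside the VPC CIDR, and that no two subnets overlap. It does not inspect your AWS account.

```python
import ipaddress


def check_subnets(vpc_cidr: str, private_subnets: list[str], public_subnets: tuple[str, ...] = ()) -> list[str]:
    """Return a list of problems with the subnet layout (empty if none were found)."""
    problems = []
    if len(private_subnets) < 2:
        problems.append("the agent needs at least two private subnets")
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(s) for s in [*private_subnets, *public_subnets]]
    # Every subnet must fall inside the VPC range.
    for net in nets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside {vpc}")
    # No two subnets may share addresses.
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    # Values taken from the example config above.
    print(check_subnets(
        "10.0.0.0/16",
        ["10.0.0.0/24", "10.0.1.0/24"],
        ("10.0.2.0/24", "10.0.3.0/24"),
    ))
```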
How do I check the reachability between Monte Carlo and the Agent?
Please refer to the documentation here.
How do I debug connectivity between the Agent and my integration?
Please refer to the documentation here.