AWS: Agent Deployment
How to create and register
Prerequisites
- You are an admin in AWS (for step 1).
- You are an Account Owner (for step 2).
This guide outlines how to set up an Agent (with object storage) in your AWS cloud.
The FAQs answer common questions like how to review resources and what integrations are supported.
Steps
1. Deploy the Agent
You can deploy the agent either via CloudFormation or Terraform.
Before getting started, please review which Monte Carlo AWS account your collection service is hosted in.
For all Monte Carlo accounts created after April 24th, 2024, the relevant Monte Carlo AWS Account ID is 590183797493. If your account was created before this date, please reach out to your Monte Carlo representative for the ID.
The AWS account ID can also be found as part of the Stack ARN on the UI here or via this CLI command. It is not the same account where you will deploy the agent. Make sure this is the ID you select as the "Monte Carlo AWS Account ID" parameter when deploying the agent, as registration will fail otherwise.
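Because a mismatched account ID causes registration to fail, it can help to read the ID out of the Stack ARN programmatically rather than by eye. The following is a minimal sketch, assuming you have already copied the Stack ARN from the UI or CLI. The function name and the sample ARN are hypothetical and are not part of any Monte Carlo tooling.

```python
def account_id_from_arn(arn: str) -> str:
    """Return the 12-digit AWS account ID embedded in an ARN.

    ARNs follow the layout arn:partition:service:region:account-id:resource.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    account_id = parts[4]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"ARN has no 12-digit account ID: {arn!r}")
    return account_id


# Hypothetical Stack ARN, for illustration only.
example_arn = (
    "arn:aws:cloudformation:us-east-1:590183797493:"
    "stack/example-collector/00000000-0000-0000-0000-000000000000"
)
print(account_id_from_arn(example_arn))
```

The returned value is what you would compare against the "Monte Carlo AWS Account ID" parameter before deploying.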
If you wish to connect the agent to a VPC when deploying, please see details here. Specifying a VPC is not strictly required to run the agent, but it enables certain connectivity scenarios, such as when you have an IP allowlist for your resource, want to peer VPCs, or want to deploy in your existing VPC.
Deploy with CloudFormation
You can use the following quick-create link to deploy the agent on any supported region in your AWS account:
If you need to share the template with a colleague or review it first, you can download a copy here (source).
Deploy with Terraform
You can use the mcd-agent Terraform module to deploy the Agent on any supported region in your AWS account.
For instance, using the following example Terraform config:
module "apollo" {
  source  = "monte-carlo-data/mcd-agent/aws"
  version = "0.1.3"
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}
You can build and deploy via:
terraform init && terraform apply
Additional module inputs, options, and defaults can be found here. Other details can be found here.
2. Register the Agent
After deploying the agent you can register either via the Monte Carlo UI or CLI.
See here for examples of how to retrieve deployment output (i.e. registration input).
After this step is complete all supported integrations will automatically use this agent (and object store for troubleshooting and temporary data). You can add these integrations as you normally would using Monte Carlo's UI wizard or CLI.
UI
If you are onboarding a new account, you can also register by following the onscreen steps.
- Navigate to settings/integrations/agents and select the Create button.
- Follow the onscreen wizard for the "AWS" Platform and "Data Store + Agent" Type.
CLI
Use montecarlo agents register-aws-agent to register.
See reference documentation here, and see here for how to install and configure the CLI. For instance:
montecarlo agents register-aws-agent \
--lambda-arn arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda \
--assumable-role arn:aws:iam::123456789:role/mcd-agent-InvocationRole-12345 \
--external-id f3840b31-772e-4fe3-8a5f-3aa5ff7e6fec
FAQs
What integrations does the Agent support?
The agent supports all integrations, with the following exceptions:
- Data Lake Query Logs from S3 Buckets: https://docs.getmontecarlo.com/docs/s3-events-query-logs
Note that onboarding (connecting) any supported integration will use the agent if one is provisioned. Otherwise, integrations will use your automatically managed and hosted data collection service to connect directly.
Some integrations, like dbt Core, Atlan, and Airflow, either leverage our developer toolkit or are managed by a third party. These integrations natively push data to Monte Carlo, so an Agent is not required.
What regions does the Agent support?
You can deploy an agent in the following AWS regions:
Supported Regions |
---|
af-south-1 |
ap-northeast-2 |
ap-south-1 |
ap-southeast-1 |
ap-southeast-2 |
ca-central-1 |
eu-central-1 |
eu-north-1 |
eu-west-1 |
eu-west-2 |
us-east-1 |
us-east-2 |
us-west-2 |
If there is a region you would like supported that is not listed, please let us know!
Can I review agent resources and code?
Absolutely! You can find details here:
*Note that due to an AWS limitation the agent image is also uploaded and then sourced from AWS ECR when executed on Lambda.
Repository: 752656882040.dkr.ecr.*.amazonaws.com/mcd-agent
How do I retrieve registration input from CloudFormation or Terraform?
CloudFormation
Navigate to your stack and then filter for "To be used in registering" in the Outputs tab.
These three values are to be used when you register an agent with Monte Carlo.
If you prefer you can also use the AWS CLI or API to retrieve these values.
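If you retrieve the outputs through the API, the same "To be used in registering" filter can be applied in code. The sketch below assumes you already have the dictionary returned by boto3's CloudFormation describe_stacks call for a single stack. The function name is hypothetical, and the sample response is hand-written with placeholder values and output keys borrowed from the nested-stack example in this guide.

```python
from typing import Any

REGISTRATION_MARKER = "To be used in registering"


def registration_outputs(describe_stacks_response: dict[str, Any]) -> dict[str, str]:
    """Map OutputKey to OutputValue for outputs whose description flags them for registration."""
    stacks = describe_stacks_response.get("Stacks", [])
    if len(stacks) != 1:
        raise ValueError("expected exactly one stack in the response")
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
        if REGISTRATION_MARKER in output.get("Description", "")
    }


# Hand-written sample in the shape of a describe_stacks response (placeholder values).
sample_response = {
    "Stacks": [
        {
            "Outputs": [
                {
                    "OutputKey": "FunctionArn",
                    "OutputValue": "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
                    "Description": "Agent Function ARN. To be used in registering.",
                },
                {
                    "OutputKey": "InvocationRoleArn",
                    "OutputValue": "arn:aws:iam::123456789:role/mcd-agent-InvocationRole-12345",
                    "Description": "Assumable role ARN. To be used in registering.",
                },
                {
                    "OutputKey": "InvocationRoleExternalId",
                    "OutputValue": "example",
                    "Description": "Assumable role External ID. To be used in registering.",
                },
                {
                    "OutputKey": "PublicIP",
                    "OutputValue": "203.0.113.10",
                    "Description": "IP address from which agent resources access the Internet.",
                },
            ]
        }
    ]
}
print(registration_outputs(sample_response))
```

Outputs without the marker in their description, such as the public IP in the sample, are left out of the result.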
Terraform
Outputs can be retrieved via terraform output. For instance:
% terraform output
function_arn = "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-service-12345"
invoker_role_arn = "arn:aws:iam::123456789:role/mcd_agent_service_invocation_role12345"
invoker_role_external_id = "example"
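To avoid copying these values by hand, you can feed the machine-readable form of the outputs into the registration command. The following is a rough sketch, assuming the output names from the example Terraform config in this guide and the text produced by terraform output -json. The helper name is hypothetical, and the sample values are the placeholders shown above.

```python
import json
import shlex


def register_command(terraform_output_json: str) -> list[str]:
    """Build the register-aws-agent command from `terraform output -json` text."""
    outputs = json.loads(terraform_output_json)

    def value(name: str) -> str:
        # The JSON form wraps each output as {"value": ..., "type": ..., "sensitive": ...}.
        if name not in outputs:
            raise KeyError(f"missing Terraform output: {name}")
        return outputs[name]["value"]

    return [
        "montecarlo", "agents", "register-aws-agent",
        "--lambda-arn", value("function_arn"),
        "--assumable-role", value("invoker_role_arn"),
        "--external-id", value("invoker_role_external_id"),
    ]


# Placeholder values matching the example output above.
sample = json.dumps({
    "function_arn": {
        "sensitive": False,
        "type": "string",
        "value": "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-service-12345",
    },
    "invoker_role_arn": {
        "sensitive": False,
        "type": "string",
        "value": "arn:aws:iam::123456789:role/mcd_agent_service_invocation_role12345",
    },
    "invoker_role_external_id": {"sensitive": False, "type": "string", "value": "example"},
})
command = register_command(sample)
print(shlex.join(command))
```

Returning an argument list rather than a single string keeps the values safe to pass to subprocess.run without shell quoting concerns.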
How do I monitor the Agent?
The Agent automatically generates a log of all operations, which can be retrieved via this CLI command or from CloudWatch.
Operational metrics can similarly be retrieved from CloudWatch metrics (or the Lambda console).
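When looking for the logs in CloudWatch, the log group name can usually be derived from the function ARN. This sketch assumes the agent function uses Lambda's default log group naming (/aws/lambda/ followed by the function name); your deployment may be configured differently, so treat the result as a starting point. The function name is hypothetical.

```python
def default_log_group(function_arn: str) -> str:
    """Return the default CloudWatch log group for a Lambda function ARN.

    Assumes the default naming scheme; a deployment can override it.
    """
    parts = function_arn.split(":")
    # Expected layout: arn:partition:lambda:region:account:function:name[:qualifier]
    if (
        len(parts) not in (7, 8)
        or parts[0] != "arn"
        or parts[2] != "lambda"
        or parts[5] != "function"
    ):
        raise ValueError(f"not a Lambda function ARN: {function_arn!r}")
    return f"/aws/lambda/{parts[6]}"


# Placeholder ARN taken from the Terraform output example in this guide.
log_group = default_log_group(
    "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-service-12345"
)
print(log_group)
```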
How do I upgrade the Agent?
If you have opted out of remote upgrades you can upgrade the agent image by setting the ImageUri parameter in the CloudFormation stack or the image variable in the Terraform module.
Please reach out to your Monte Carlo representative or support for the correct tag for your deployment.
Otherwise, and by default, Monte Carlo will automatically manage upgrades for you. If this is the case and you'd still like to explicitly upgrade you can do so via the upgrade command on the CLI or "Upgrade" button on the UI.
Can I further constrain inbound access (ingress) to the Agent?
VPC endpoints by default
For all accounts created after April 24th, 2024, the Monte Carlo platform uses private endpoints to communicate with the AWS agent by default, so the source IP constraint described in this section can no longer be used.
See here for how to update the IAM invoke policy permissions to include an aws:SourceVpce condition.
Absolutely! By default this is done via the defined trust policy in the invocation role, but if you prefer you can further restrict requests via an IP allowlist. For instance you can:
1. Reach out to your Monte Carlo representative or support for an IP address to allowlist. All inbound requests to the agent will originate from this address.
2. Update the IAM invoke policy permissions to include an aws:SourceIp condition with the IP address from step #1. If the agent was deployed via CloudFormation, the IAM role containing the policy has the logical ID InvocationRole. For instance, the updated policy can be:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "<AGENT_FUNCTION>"
      ],
      "Effect": "Allow",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "<IP>/32"
          ]
        }
      }
    }
  ]
}
3. Test connectivity between Monte Carlo's service and the Agent (e.g. via the health CLI command or the "test" button on the UI).
You can also do this before registering an agent if you prefer. If you do so you can skip step #3, as reachability is automatically validated during registration.
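If you manage IAM policies from scripts, the policy document from step #2 can be generated rather than edited by hand. The following is a minimal sketch that validates the address and fills in the two placeholders. The function name is hypothetical, the sample address comes from a documentation-reserved range, and applying the policy to the role is left to your own tooling.

```python
import ipaddress
import json


def invoke_policy_with_source_ip(function_arn: str, ip: str) -> dict:
    """Return an invoke policy restricted to a single IPv4 source address."""
    # Raises a ValueError subclass if `ip` is not a valid IPv4 address.
    address = ipaddress.IPv4Address(ip)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
                "Effect": "Allow",
                "Condition": {
                    # /32 limits the allowlist to exactly one address.
                    "IpAddress": {"aws:SourceIp": [f"{address}/32"]}
                },
            }
        ],
    }


# Placeholder ARN and a documentation-range IP address, for illustration only.
ip_policy = invoke_policy_with_source_ip(
    "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
    "203.0.113.10",
)
print(json.dumps(ip_policy, indent=2))
```

Validating the address up front means a typo fails locally instead of producing a policy that silently blocks every request.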
Can I use private endpoints to configure inbound access (ingress) to the Agent?
For all accounts created after April 24th, 2024 the Monte Carlo platform will use private endpoints to communicate to the AWS agent by default. Mapping per region:
Region | VPCE ID |
---|---|
us-east-1 | vpce-019013ff90cf68727 |
us-east-2 | vpce-012bbc4487e64055a |
us-west-1 | vpce-061dfb089827eef5e |
us-west-2 | vpce-0fd5fcf09921fe336 |
af-south-1 | vpce-0309c6adabc62ee18 |
ap-south-1 | vpce-0ad230c44f5e778ec |
ap-south-2 | vpce-0329e929b5b0ef117 |
ap-southeast-1 | vpce-0d852cd249f75c98f |
ap-southeast-2 | vpce-0f879dec7e0f72e4c |
ap-southeast-3 | vpce-01b00e012bf003e7f |
ap-southeast-4 | vpce-07b97088fab273a7e |
ap-northeast-1 | vpce-03119d6dacf373214 |
ap-northeast-2 | vpce-0cc30de1838047e40 |
ap-northeast-3 | vpce-037a4f877617d88ad |
ca-central-1 | vpce-017b70e4d73cd789a |
ca-west-1 | vpce-0718d13003e07f530 |
eu-central-1 | vpce-02b2728832eb229fc |
eu-central-2 | vpce-0a31bc7d13be7a02a |
eu-north-1 | vpce-074aecb112e945f49 |
eu-south-1 | vpce-0863b6ff9e264d718 |
eu-south-2 | vpce-08d11b71a2e25cfd0 |
eu-west-1 | vpce-0997154afc4527ba9 |
eu-west-2 | vpce-07b61fafd692c56bf |
eu-west-3 | vpce-069fe27d81aad21bb |
il-central-1 | vpce-0eb968f248f3aea0c |
sa-east-1 | vpce-082af521c2cc84550 |
For older accounts this feature is only available in certain subscription tiers and is restricted by AWS region. Please reach out to your Monte Carlo representative to learn more. If supported, Monte Carlo will set up a VPC endpoint for Lambda for all communication between the Monte Carlo collection service and your agent.
Afterwards you can update the IAM invoke policy permissions to include an aws:SourceVpce condition with the endpoint provided above. If the agent was deployed via CloudFormation, the IAM role containing the policy has the logical ID InvocationRole. For instance, the updated policy can be:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "<AGENT_FUNCTION>"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "<VPCE_ID>"
        }
      }
    }
  ]
}
Note that this condition cannot be used with the source IP condition outlined here as requests will not be sourced from that IP address and instead will use the AWS backbone.
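The VPC endpoint policy can be generated in the same way by looking up the endpoint ID for a region. This sketch copies only a few rows from the table above, and the function name is hypothetical. Which region to pass is an assumption for you to confirm with your Monte Carlo representative, so the region is an explicit argument rather than being inferred.

```python
import json

# A few rows copied from the region table above; extend as needed.
VPCE_BY_REGION = {
    "us-east-1": "vpce-019013ff90cf68727",
    "us-east-2": "vpce-012bbc4487e64055a",
    "us-west-2": "vpce-0fd5fcf09921fe336",
    "eu-west-1": "vpce-0997154afc4527ba9",
}


def invoke_policy_with_source_vpce(function_arn: str, region: str) -> dict:
    """Return an invoke policy restricted to the VPC endpoint listed for `region`."""
    if region not in VPCE_BY_REGION:
        raise ValueError(f"no VPC endpoint listed for region {region!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
                "Effect": "Allow",
                "Condition": {
                    "StringEquals": {"aws:SourceVpce": VPCE_BY_REGION[region]}
                },
            }
        ],
    }


# Placeholder ARN, for illustration only.
vpce_policy = invoke_policy_with_source_vpce(
    "arn:aws:lambda:us-east-1:123456789:function:mcd-agent-AgentLambda",
    "us-east-1",
)
print(json.dumps(vpce_policy, indent=2))
```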
Can I further constrain outbound access (egress) from the Agent?
Absolutely! As with any Lambda-based service you can connect to a VPC to enable connectivity to your resources.
Some scenarios where you might want to do this include:
1. You want to IP allowlist connectivity between the agent and your resource.
2. You want to deploy the agent in a new VPC and peer and/or set up PrivateLink between services.
3. You want to deploy the agent in your existing VPC.
CloudFormation
With CloudFormation deployments you can specify a VPC ID and at least two private subnets in that VPC as parameters.
To assist with scenarios 1-2, after review, you can use this (source) template to create a new VPC:
And then specify the generated VPC and private subnets when deploying the agent.
The following example template demonstrates how you can deploy an agent with a connected VPC in one CloudFormation stack by leveraging nesting:
AWSTemplateFormatVersion: '2010-09-09'
Description: Example template that deploys an agent with a connected VPC by leveraging nested stacks.
Parameters:
  CloudAccountId:
    Description: >
      Select the Monte Carlo account your collection service is hosted in. This can be found in the
      'settings/integrations/collectors' tab on the UI or via the 'montecarlo collectors list' command on the CLI.
    Type: String
    Default: 190812797848
    AllowedValues: [ 190812797848, 799135046351, 682816785079 ]
  ConcurrentExecutions:
    Default: 20
    Description: The number of concurrent lambda executions for the agent.
    MaxValue: 200
    MinValue: 0
    Type: Number
  ImageUri:
    Default: 752656882040.dkr.ecr.*.amazonaws.com/mcd-agent:latest
    Description: >
      URI of the Agent container image (ECR Repo). Note that the region automatically maps to where this stack
      is deployed in.
    Type: String
  MemorySize:
    Default: 512
    Description: >
      The amount of memory (MB) available to the agent at runtime; this value can be any multiple of
      1 MB greater than 256MB.
    MinValue: 256
    MaxValue: 10240
    Type: Number
Outputs:
  FunctionArn:
    Description: Agent Function ARN. To be used in registering.
    Value: !GetAtt Agent.Outputs.FunctionArn
  InvocationRoleArn:
    Description: Assumable role ARN. To be used in registering.
    Value: !GetAtt Agent.Outputs.InvocationRoleArn
  InvocationRoleExternalId:
    Description: Assumable role External ID. To be used in registering.
    Value: !GetAtt Agent.Outputs.InvocationRoleExternalId
  PublicIP:
    Description: IP address from which agent resources access the Internet (e.g. for IP whitelisting).
    Value: !GetAtt Networking.Outputs.PublicIP
Resources:
  Networking:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://mcd-public-resources.s3.amazonaws.com/cloudformation/basic_vpc.yaml
  Agent:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://mcd-public-resources.s3.amazonaws.com/cloudformation/aws_apollo_agent.yaml
      Parameters:
        CloudAccountId: !Ref CloudAccountId
        ConcurrentExecutions: !Ref ConcurrentExecutions
        ExistingVpcId: !GetAtt Networking.Outputs.VpcId
        ExistingSubnetIds: !Join [ ',', [ !GetAtt Networking.Outputs.PrivateSubnetAz1, !GetAtt Networking.Outputs.PrivateSubnetAz2 ] ]
        ImageUri: !Ref ImageUri
        MemorySize: !Ref MemorySize
After deploying, the outputs of this stack include a PublicIP, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g. via VPC endpoint service).
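For the existing-VPC case, where you pass a VPC ID and at least two private subnets directly to the agent template, the parameters can be assembled and checked before you create the stack. This is a sketch only: the parameter names are taken from the nested template above, the list shape is the one boto3's CloudFormation create_stack accepts for Parameters, and the function name and all IDs are hypothetical.

```python
def agent_stack_parameters(
    cloud_account_id: str, vpc_id: str, subnet_ids: list[str]
) -> list[dict[str, str]]:
    """Build CloudFormation parameters for deploying the agent into an existing VPC."""
    unique_subnets = list(dict.fromkeys(subnet_ids))  # drop duplicates, keep order
    if len(unique_subnets) < 2:
        raise ValueError("at least two distinct private subnets are required")
    return [
        {"ParameterKey": "CloudAccountId", "ParameterValue": cloud_account_id},
        {"ParameterKey": "ExistingVpcId", "ParameterValue": vpc_id},
        # The template joins subnet IDs with commas, so the same format is used here.
        {"ParameterKey": "ExistingSubnetIds", "ParameterValue": ",".join(unique_subnets)},
    ]


# Hypothetical VPC and subnet IDs, for illustration only.
stack_parameters = agent_stack_parameters(
    "590183797493",
    "vpc-0123456789abcdef0",
    ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"],
)
print(stack_parameters)
```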
Terraform
With Terraform deployments you can specify at least two private subnets in the private_subnets variable.
To assist with scenarios 1-2, after review, you can use this module to connect an agent to a VPC, and then specify the generated private subnets when deploying the agent.
The following example configuration demonstrates how you can deploy an agent with a connected VPC with Terraform:
data "aws_region" "current" {}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name                = "apollo-vpc"
  cidr                = "10.0.0.0/16"
  azs                 = formatlist("${data.aws_region.current.name}%s", ["a", "b"])
  private_subnets     = ["10.0.0.0/24", "10.0.1.0/24"]
  public_subnets      = ["10.0.2.0/24", "10.0.3.0/24"]
  enable_nat_gateway  = true
  single_nat_gateway  = true
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id          = module.vpc.vpc_id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.s3"
  route_table_ids = concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids)
}

module "apollo" {
  source          = "monte-carlo-data/mcd-agent/aws"
  version         = "0.1.1"
  region          = data.aws_region.current.name
  private_subnets = module.vpc.private_subnets
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

output "invoker_role_external_id" {
  value       = module.apollo.mcd_agent_invoker_role_external_id
  description = "Assumable role External ID. To be used in registering."
}

output "public_ip" {
  value       = module.vpc.nat_public_ips
  description = "IP address from which agent resources access the Internet (e.g. for IP whitelisting)."
}
After deploying, the outputs include a public_ip, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g. via VPC endpoint service).
How do I check the reachability between Monte Carlo and the Agent?
Reachability is automatically validated during registration, but you can also use this CLI command or the "test" button on the UI to test at any time.
How do I debug connectivity between the Agent and my integration?
Even though each network configuration is unique, you can try the following to help debug connectivity:
- Double-check the connection details provided to Monte Carlo, such as host, port, database, and user, for typos or omissions.
- Confirm that the service user you created works (e.g. you are able to log in as the service user).
- Use the MC network utilities on the integrations page. These utilities are also available via the CLI.
  - Test TCP Open: Tests if a destination exists and accepts requests. Opens a TCP socket to a specific port from the agent.
  - Test Telnet: Checks if a Telnet connection is usable from the agent.
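To make the "TCP Open" check concrete, the sketch below performs an equivalent probe with Python's standard library. It is not Monte Carlo's utility: it runs from whichever machine executes it rather than from the agent, so it only helps when run from a host with comparable network access (for example, inside the same VPC and subnets). The function name is hypothetical.

```python
import socket


def tcp_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, tcp_open("warehouse.internal.example.com", 5432) would report whether a hypothetical database host accepts connections on port 5432. A False result does not say why the connection failed, so security groups, route tables, and DNS are the usual next things to check.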