AWS: Agent Deployment

How to create and register

Prerequisites

  1. You are an admin in AWS (for step 1).
  2. You are an Account Owner (for step 2).

This guide outlines how to set up an Agent (with object storage) in your AWS cloud.

The FAQs answer common questions like how to review resources and what integrations are supported.

Steps

1. Provision the Agent in Monte Carlo

Before deploying any AWS resources, register the agent in Monte Carlo. The backend records the registration and generates the IAM External ID that you'll plug into your CloudFormation parameters or Terraform variables in step 2.

UI

Navigate to settings/deployments and click Add. Choose the "AWS" Platform and "Data store + agent" type, supply a deployment name, then click Provision. Monte Carlo creates the agent and routes you to the Edit page, where the generated External ID is displayed. Copy it; you'll need it in step 2.

CLI

montecarlo agents register-aws-agent

Calling register-aws-agent with no --lambda-arn, --assumable-role, or --external-id arguments creates the agent and prints the External ID along with the next-step command:

Agent successfully registered!
AgentId: <AGENT_ID>

ExternalId generated.
AWS ExternalId: <EXTERNAL_ID>

Next steps:
  1. Configure your IAM role trust policy to require this ExternalId (sts:ExternalId condition).
  2. Deploy your IAM role and the agent Lambda function.
  3. Run `montecarlo agents register-aws-agent --agent-id <AGENT_ID> --assumable-role <ROLE_ARN> --lambda-arn <LAMBDA_ARN>` to complete the registration. The agent is enabled automatically once everything is in place.

2. Deploy the Agent

You can deploy the agent either via CloudFormation or Terraform. Supply the External ID from step 1 as a stack parameter (CloudFormation) or input variable (Terraform); the deployment wires it into the invoker role's IAM trust policy.
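For orientation, a trust policy that requires an External ID generally has the shape built below. This is a minimal sketch in Python, not the exact policy that the CloudFormation template or Terraform module generates: the function name build_trust_policy and the choice of the account root as the trusted principal are assumptions made for illustration. The sts:ExternalId condition is the standard IAM mechanism referred to in step 1.

```python
import json


def build_trust_policy(collection_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that requires an External ID.

    Assumption: the trusted principal is the root of the Collection AWS
    account; the deployed template may scope the principal differently.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{collection_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Only callers that present this External ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account ID and External ID.
    print(json.dumps(build_trust_policy("111122223333", "<EXTERNAL_ID>"), indent=2))
```

Comparing this shape against the role created by your deployment is a quick way to confirm that the External ID from step 1 reached the trust policy.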

Before getting started, please review which Monte Carlo AWS account your collection service is hosted in.

When provisioning resources for Monte Carlo deployments on the V2 Platform, use the Collection AWS account id provided in the Account information page. Accounts created after April 24th, 2024, will automatically be on the V2 platform or newer.

If you are using an older version of the platform, please contact your Monte Carlo representative for the ID.

If you wish to connect the agent to a VPC when deploying, please see details here. Specifying a VPC is not strictly required to run the agent, but it enables certain connectivity scenarios, such as when you have an IP allowlist for your resource, want to peer, or want to deploy in your existing VPC.

Deploy with CloudFormation

You can use the following quick-create link to deploy the agent on any supported region in your AWS account:

If you need to share with a colleague or first review the template you can download a copy here (source).

AWS Parameters Wizard Example

The External ID stack parameter is the value Monte Carlo generated for the agent in step 1. It is wired into the invoker role's IAM trust policy so that only Monte Carlo's collection service can assume the role. The field is optional only for existing deployments that already use a CloudFormation Stack-ID-derived External ID; new deployments must supply the value generated by Monte Carlo.

As noted above, use the Collection AWS account ID provided in the Account information page as the "Monte Carlo AWS Account ID" parameter when deploying the agent. Note that this is not the ID of the account where you deploy the agent; registration will fail if any other ID is supplied.

Deploy with Terraform

You can use the mcd-agent Terraform module to deploy the Agent on any supported region in your AWS account.

For instance, using the following example Terraform config:

module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  version          = "1.0.0"
  cloud_account_id = "<COLLECTION_AWS_ACCOUNT_ID>" # Collection AWS account ID from the Account information page
  external_id      = "<EXTERNAL_ID_FROM_STEP_1>"
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

The external_id variable is the value Monte Carlo generated for the agent in step 1. It is wired into the invoker role's IAM trust policy so that only Monte Carlo's collection service can assume the role.

You can build and deploy via:

terraform init && terraform apply

Additional module inputs, options, and defaults can be found here. And other details can be found here.

As noted above, use the Collection AWS account ID provided in the Account information page as the cloud_account_id input variable (the "Monte Carlo AWS Account ID") when deploying the agent. Note that this is not the ID of the account where you deploy the agent; registration will fail if any other ID is supplied.


3. Complete Registration

After deploying the AWS resources, complete registration by supplying the Lambda ARN and invoker role ARN. Monte Carlo auto-enables the agent on successful completion.

See here for examples of how to retrieve the values from your deployment outputs.

After this step is complete, all supported integrations using this deployment will automatically use this agent (and object store for troubleshooting and temporary data). You can add these integrations as you normally would using Monte Carlo's UI wizard or CLI.

UI

If you are onboarding a new account, you can also register by following the steps on the onboarding screen.


  1. Open the agent's Edit page (Monte Carlo will have routed you there after step 1; you can also navigate from settings/deployments).
  2. Fill in the AWS Lambda ARN and AWS assumable role, then click Enable. Monte Carlo validates the configuration and enables the agent.

Monte Carlo Registration Wizard UI Example

CLI

Re-run register-aws-agent with the agent ID from step 1 plus the deployed Lambda ARN and invoker role ARN:

montecarlo agents register-aws-agent \
  --agent-id <AGENT_ID> \
  --lambda-arn arn:aws:lambda:us-east-1:123456789012:function:mcd-agent-AgentLambda \
  --assumable-role arn:aws:iam::123456789012:role/mcd-agent-InvocationRole-12345

See reference documentation here. And see here for how to install and configure the CLI.
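If you script this step, it can help to sanity-check the two ARNs before invoking the CLI. The sketch below is one hedged way to do that in Python: it uses only the flags shown above (--agent-id, --lambda-arn, --assumable-role), while the helper name build_register_command and the deliberately loose ARN patterns are assumptions for illustration, not part of the Monte Carlo CLI.

```python
import re
import shlex

# Loose ARN shapes (assumed for illustration); AWS account IDs are 12 digits.
LAMBDA_ARN = re.compile(
    r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[\w-]+(:[\w$-]+)?$"
)
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def build_register_command(agent_id: str, lambda_arn: str, role_arn: str) -> str:
    """Validate both ARNs and return the shell-quoted registration command."""
    if not LAMBDA_ARN.match(lambda_arn):
        raise ValueError(f"not a Lambda function ARN: {lambda_arn!r}")
    if not ROLE_ARN.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return shlex.join([
        "montecarlo", "agents", "register-aws-agent",
        "--agent-id", agent_id,
        "--lambda-arn", lambda_arn,
        "--assumable-role", role_arn,
    ])


if __name__ == "__main__":
    # Placeholder values; substitute the outputs of your own deployment.
    print(build_register_command(
        "<AGENT_ID>",
        "arn:aws:lambda:us-east-1:123456789012:function:mcd-agent-AgentLambda",
        "arn:aws:iam::123456789012:role/mcd-agent-InvocationRole-12345",
    ))
```

A swapped pair of ARNs (the role passed as the Lambda, or the reverse) raises a ValueError before any request is sent.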

FAQs

Why am I seeing an "account limit reached" error when I click Provision?

Monte Carlo applies a per-account limit on the number of provisioned deployments to prevent runaway resource allocation. If you hit it, the Provision action fails with an error like:

Could not register deployment

Cannot allocate new resources for a <platform> agent, account limit reached. Please contact support.

If you need to provision more deployments, reach out via our Support Agent or contact your account representative and we'll raise the limit.

What integrations does the Agent support?

The agent supports all integrations except for the following:

  • Data Lake Query Logs from S3 Buckets are not supported: Learn more.
  • Tableau requires using the connected app authentication flow: Learn more.

Note that onboarding (connecting) any supported integration using this deployment will use the agent if one is provisioned. Otherwise, any other integrations will use the cloud service to connect directly.

Some integrations, such as dbt Core, Atlan, and Airflow, either leverage our developer toolkit or are managed by a third party and do not require an agent. These integrations natively push data to Monte Carlo, so an agent is not needed.

Can I use more than one Agent?

Yes, please reach out via our Support Agent or contact your account representative if you would like to use more than one.

What regions does the Agent support?

You can deploy an agent in the following AWS regions:

Supported Regions
us-east-1
us-east-2
us-west-1
us-west-2
af-south-1
ap-south-1
ap-south-2
ap-southeast-1
ap-southeast-2
ap-southeast-3
ap-southeast-4
ap-northeast-1
ap-northeast-2
ap-northeast-3
ca-central-1
ca-west-1
eu-central-1
eu-central-2
eu-west-1
eu-west-2
eu-west-3
eu-north-1
eu-south-1
eu-south-2
il-central-1
sa-east-1
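If you automate deployments, a small pre-flight check against this list can catch an unsupported region early. The following is a sketch only: the region set is copied from the table above and may go stale, and the helper names are hypothetical.

```python
# Copied from the Supported Regions table in this guide; may go out of date.
SUPPORTED_REGIONS = frozenset({
    "us-east-1", "us-east-2", "us-west-1", "us-west-2",
    "af-south-1", "ap-south-1", "ap-south-2",
    "ap-southeast-1", "ap-southeast-2", "ap-southeast-3", "ap-southeast-4",
    "ap-northeast-1", "ap-northeast-2", "ap-northeast-3",
    "ca-central-1", "ca-west-1",
    "eu-central-1", "eu-central-2", "eu-west-1", "eu-west-2", "eu-west-3",
    "eu-north-1", "eu-south-1", "eu-south-2",
    "il-central-1", "sa-east-1",
})


def region_from_lambda_arn(arn: str) -> str:
    """Extract the region from a Lambda ARN.

    ARN layout: arn:partition:service:region:account-id:resource
    """
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "lambda":
        raise ValueError(f"not a Lambda ARN: {arn!r}")
    return parts[3]


def is_supported_region(region: str) -> bool:
    """Return True if the region appears in the table above."""
    return region in SUPPORTED_REGIONS
```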

Can I review agent resources and code?

Absolutely! You can find details here:

*Note that due to an AWS limitation the agent image is also uploaded and then sourced from AWS ECR when executed on Lambda.

Repository: 590183797493.dkr.ecr.*.amazonaws.com/mcd-agent

How do I retrieve registration input from CloudFormation or Terraform?

CloudFormation

Navigate to your stack and then filter for "To be used in registering" in the Outputs tab.

AWS Outputs Example

The FunctionArn and InvocationRoleArn outputs are the values you'll supply in step 3 to complete registration. The InvocationRoleExternalId output is the External ID you passed in as a stack parameter in step 2; it is surfaced here for reference, but you don't need to re-enter it during registration (Monte Carlo already has it from step 1).

If you prefer you can also use the AWS CLI or API to retrieve these values.
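As one hedged illustration of the AWS CLI route, the Python sketch below parses the JSON printed by aws cloudformation describe-stacks and keeps the two outputs named above. The helper names are hypothetical, the stack name is whatever you chose at deploy time, and fetch_registration_inputs assumes the AWS CLI is installed and configured.

```python
import json
import subprocess

# Output keys named in this guide; any other outputs are ignored.
WANTED = ("FunctionArn", "InvocationRoleArn")


def registration_inputs(describe_stacks_json: str) -> dict:
    """Extract the registration outputs from `describe-stacks` JSON."""
    stack = json.loads(describe_stacks_json)["Stacks"][0]
    outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
    missing = [key for key in WANTED if key not in outputs]
    if missing:
        raise KeyError(f"stack is missing outputs: {missing}")
    return {key: outputs[key] for key in WANTED}


def fetch_registration_inputs(stack_name: str) -> dict:
    """Call the AWS CLI for one stack and return the two registration values."""
    raw = subprocess.run(
        ["aws", "cloudformation", "describe-stacks",
         "--stack-name", stack_name, "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return registration_inputs(raw)
```

The parsing is kept separate from the CLI call so it can be checked against saved output without AWS credentials.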

Terraform

Outputs can be retrieved via terraform output. For instance:

 % terraform output
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:mcd-agent-service-12345"
invoker_role_arn = "arn:aws:iam::123456789012:role/mcd_agent_service_invocation_role12345"
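For scripting, terraform output -json prints the same values in machine-readable form, with each output wrapped in an object that carries a "value" key. The sketch below, with a hypothetical helper name, maps that JSON to the two values needed in step 3; it assumes the output names function_arn and invoker_role_arn from the example config in this guide.

```python
import json


def terraform_registration_inputs(output_json: str) -> dict:
    """Map `terraform output -json` to the two values needed in step 3."""
    outputs = json.loads(output_json)
    # Each output is an object with "value", "type", and "sensitive" keys.
    return {
        "lambda_arn": outputs["function_arn"]["value"],
        "assumable_role": outputs["invoker_role_arn"]["value"],
    }
```

One way to use it is to pipe the command's output into a script that calls this function, for example by reading sys.stdin.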

Where does the External ID come from?

Monte Carlo generates the IAM External ID server-side during provisioning (step 1) and displays it on the deployment's Edit page and in the CLI output. You supply this value to your CloudFormation stack (as the ExternalId parameter) or Terraform module (as the external_id input variable) so it ends up in the IAM role's trust policy.

If you ever need to rotate the External ID (e.g. for compliance reasons), use the Rotate action on the deployment's Edit page or the montecarlo agents rotate-aws-external-id CLI command. After rotating, update your CloudFormation stack or run terraform apply with the new value so that the trust policy is updated.

How do I monitor the Agent?

The Agent automatically generates a log of all operations, which can be retrieved via this CLI command or from CloudWatch. Operational metrics can similarly be retrieved from CloudWatch metrics (or the Lambda console).

See additional details here.

How do I upgrade the Agent?

Please refer to the documentation here.

Can I further constrain inbound access (ingress) to the Agent?

VPC endpoints by default

For all deployments on the V2 Platform or newer, Monte Carlo uses VPC endpoints to communicate with the AWS agent by default. For these deployments, the IP allowlist approach described below is no longer applicable. Additional details and limitations can be found here.

To update the IAM invoke policy permissions to include an aws:SourceVpce condition, refer to this document.

Absolutely! By default this is done via the defined trust policy in the invocation role, but if you prefer you can further restrict requests via an IP allowlist. For instance you can:

  1. Reach out to your Monte Carlo representative or our Support Agent for an IP Address to allowlist. All inbound requests to the agent will originate here.
  2. Update the IAM invoke policy permissions to include an aws:SourceIp condition with the IP address from step #1. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For instance, the updated policy could be:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "<AGENT_FUNCTION>"
            ],
            "Effect": "Allow",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "<IP>/32"
                    ]
                }
            }
        }
    ]
}
  3. Test connectivity between Monte Carlo's Service and the Agent (e.g. via the health CLI command or the "test" button on the UI).

You can also do this before registering an agent if you prefer. If you do so you can skip step #3 as reachability is automatically validated during registration.
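If you generate the policy from step #2 programmatically, the Python sketch below shows one way to do so while validating the IP address first. It mirrors the policy document shown in the steps above; the function name is hypothetical, and the values passed in are placeholders for your agent function ARN and the IP address supplied by Monte Carlo.

```python
import ipaddress
import json


def build_ip_restricted_invoke_policy(function_arn: str, ip: str) -> dict:
    """Build an invoke policy limited to a single source IP address."""
    # ip_address() raises ValueError on malformed input.
    address = ipaddress.ip_address(ip)
    # Single-host CIDR: /32 for IPv4, /128 for IPv6.
    cidr = f"{address}/{address.max_prefixlen}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
                "Effect": "Allow",
                "Condition": {"IpAddress": {"aws:SourceIp": [cidr]}},
            }
        ],
    }


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address used as a placeholder.
    print(json.dumps(
        build_ip_restricted_invoke_policy("<AGENT_FUNCTION>", "203.0.113.10"),
        indent=4,
    ))
```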

For more information on connectivity, please refer to our Network Connectivity documentation.

Can I use private endpoints to configure inbound access (ingress) to the Agent?

For all deployments on the V2 Platform or newer, Monte Carlo will use VPC endpoints to communicate with the AWS agent by default. Additional details and limitations can be found here.

You can update the IAM invoke policy permissions to include an aws:SourceVpce condition with the Agent VPC endpoint ID provided in the Account information page for your desired region. If the agent was deployed via CloudFormation, the IAM role containing the policy will have a logical ID of InvocationRole. For example, the updated policy could be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "<AGENT_FUNCTION>"
            ],
            "Effect": "Allow",
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpce": "<VPCE_ID>"
                }
            }
        }
    ]
}

Note that this condition cannot be used with the source IP condition outlined here as requests will not be sourced from that IP address and instead will use the AWS backbone.
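Because the conditions inside a single IAM statement are combined with AND, a statement that carries both aws:SourceIp and aws:SourceVpce would reject requests arriving through the VPC endpoint. The following Python sketch, with a hypothetical function name, flags such statements in a policy document before you apply it. It is a simple lint under that assumption, not a full IAM policy evaluator.

```python
def conflicting_statements(policy: dict) -> list:
    """Return indexes of statements that combine both source conditions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also allows a single statement object
        statements = [statements]
    conflicts = []
    for index, statement in enumerate(statements):
        keys = set()
        # Condition maps operator -> {condition key: value(s)}.
        for operator_block in statement.get("Condition", {}).values():
            # Condition key names are case-insensitive in IAM.
            keys.update(key.lower() for key in operator_block)
        if {"aws:sourceip", "aws:sourcevpce"} <= keys:
            conflicts.append(index)
    return conflicts
```

An empty list means no statement mixes the two conditions.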


For more information on connectivity, please refer to our Network Connectivity documentation.

Can I further constrain outbound access (egress) from the Agent?

If you are connecting to a VPC with limited or no outbound internet connection, you will need to include VPC endpoints for the following services:

  • S3
  • CloudWatch
  • ECR*
  • Lambda*
  • CloudFormation**

*Not necessary if remote upgrades are disabled.
**Not necessary if remote upgrades are disabled or if using Terraform.
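The footnotes above amount to a small decision rule, sketched below in Python. The function name and parameter names are hypothetical, and the returned strings are the service labels used in this list rather than AWS endpoint service names.

```python
def required_vpc_endpoints(remote_upgrades_enabled: bool, uses_terraform: bool) -> list:
    """Return the services that need VPC endpoints, per the list above."""
    services = ["S3", "CloudWatch"]  # always required
    if remote_upgrades_enabled:
        services += ["ECR", "Lambda"]  # * only needed for remote upgrades
        if not uses_terraform:
            # ** only needed for remote upgrades on CloudFormation deployments
            services.append("CloudFormation")
    return services
```

For example, a Terraform deployment with remote upgrades enabled needs endpoints for S3, CloudWatch, ECR, and Lambda.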

Absolutely! As with any Lambda-based service, you can connect to a VPC to enable connectivity to your resources.

Some scenarios where you might want to do this can include:

  1. You want to IP allowlist connectivity between the agent and your resource.
  2. You want to deploy the agent in a new VPC and peer and/or set up PrivateLink between services.
  3. You want to deploy the agent in your existing VPC.

CloudFormation

With CloudFormation deployments you can specify a VPC ID and at least two private subnets in that VPC as parameters.

AWS Parameters Wizard Example

To assist with scenarios 1-2, after review, you can use this (source) template to create a new VPC.

Then, specify the generated VPC and private subnets when deploying the agent.

This (source) example template further demonstrates how you can deploy an agent with a connected VPC in one CloudFormation stack by leveraging nesting:

After deploying, the outputs of this stack include a PublicIP, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up a privatelink (e.g., via VPC endpoint service).

Terraform

With Terraform deployments you can specify at least two private subnets in the private_subnets variable.

To assist with scenarios 1-2, after review, you can use this module to connect an agent to a VPC. Then, specify the generated private subnets when deploying the agent.

The following example template demonstrates how you can deploy an agent with a connected VPC with Terraform:

data "aws_region" "current" {}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name               = "apollo-vpc"
  cidr               = "10.0.0.0/16"
  azs                = formatlist("${data.aws_region.current.name}%s", ["a", "b"])
  private_subnets    = ["10.0.0.0/24", "10.0.1.0/24"]
  public_subnets     = ["10.0.2.0/24", "10.0.3.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id          = module.vpc.vpc_id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.s3"
  route_table_ids = concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids)
}

module "apollo" {
  source           = "monte-carlo-data/mcd-agent/aws"
  cloud_account_id = "<COLLECTION_AWS_ACCOUNT_ID>" # Collection AWS account ID from the Account information page
  region           = data.aws_region.current.name
  private_subnets  = module.vpc.private_subnets
  external_id      = "<EXTERNAL_ID_FROM_STEP_1>"
}

output "function_arn" {
  value       = module.apollo.mcd_agent_function_arn
  description = "Agent Function ARN. To be used in registering."
}

output "invoker_role_arn" {
  value       = module.apollo.mcd_agent_invoker_role_arn
  description = "Assumable role ARN. To be used in registering."
}

output "public_ip" {
  value       = module.vpc.nat_public_ips
  description = "IP address from which agent resources access the Internet (e.g. for IP whitelisting)."
}

After deploying, the outputs include a public_ip, which can then be used for allowlisting. You can similarly output the VPC ID and other networking resources for peering and/or setting up PrivateLink (e.g. via VPC endpoint service).
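If you choose your own CIDR ranges instead of the ones in the example, a quick check like the Python sketch below can catch layout mistakes before terraform apply. It verifies only what this guide states (at least two private subnets) plus two general networking constraints (each subnet lies inside the VPC range and subnets do not overlap). The function name is hypothetical.

```python
import ipaddress


def check_private_subnets(vpc_cidr: str, private_subnets: list) -> None:
    """Raise ValueError if the subnet layout is unusable for the agent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in private_subnets]
    if len(subnets) < 2:
        raise ValueError("at least two private subnets are required")
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is not inside {vpc}")
    # Compare every pair once to detect overlapping ranges.
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{first} overlaps {second}")


if __name__ == "__main__":
    # Values from the example configuration above.
    check_private_subnets("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"])
    print("subnet layout looks consistent")
```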

How do I check the reachability between Monte Carlo and the Agent?

Please refer to the documentation here.

How do I debug connectivity between the Agent and my integration?

Please refer to the documentation here.