Agent Observability Platform: Installation
Configure and apply the Terraform module to deploy the platform
Prerequisites
Complete the Prerequisites first: tooling, AWS permissions, domains, and chart registry access.
Overview
The terraform-aws-ao-data-platform module provisions the EKS cluster and, by default (helm.deploy_charts = true), also deploys the ao-data-platform Helm chart (ClickHouse, the OpenTelemetry Collector, and the LLM worker) in the same terraform apply.
The kubernetes and helm providers are configured from the module's outputs, which lets Terraform defer the Kubernetes/Helm resources until after the EKS cluster exists. This is what enables a single-pass apply.
Work through the steps below in order. The required inputs are the same for both cluster paths: region, otel_collector_domain, clickhouse_domain, helm.chart_registry, and helm.chart_version.
The examples below install the public artifacts: the Terraform module from the Terraform Registry (monte-carlo-data/ao-data-platform/aws), and the ao-data-platform Helm chart and ao-llm-worker image from Docker Hub (see Prerequisites). Pulling them requires no registry authentication.
1. Configure the providers
Your root module must configure the aws, kubernetes, and helm providers. The kubernetes and helm providers are wired from the module's outputs. Substitute your region for us-east-1:
terraform {
  required_providers {
    aws        = { source = "hashicorp/aws", version = "~> 6.0" }
    kubernetes = { source = "hashicorp/kubernetes", version = "~> 2.0" }
    helm       = { source = "hashicorp/helm", version = "~> 2.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "kubernetes" {
  host                   = module.ao_data_platform.eks_cluster_endpoint
  cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", "us-east-1"]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.ao_data_platform.eks_cluster_endpoint
    cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", "us-east-1"]
    }
  }
}

2. Configure the module
Choose the tab for your deployment path.
On the new-cluster path, the module creates the VPC and EKS cluster with sensible defaults (cluster name monte-carlo):
module "ao_data_platform" {
  source  = "monte-carlo-data/ao-data-platform/aws"
  version = "1.0.0"

  region                = "us-east-1"
  otel_collector_domain = "otel.acme.com"
  clickhouse_domain     = "clickhouse.acme.com"
  hosted_zone_id        = "Z1234567890ABC"

  helm = {
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "1.5.0"
    llm_worker = {
      image_tag = "1.0.0"
    }
  }
}

See examples/new_cluster/ for a complete, copy-paste starting point including the provider block.
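The region appears in four places in the configuration above: the aws provider, both exec argument lists, and the module's region input. One optional way to keep them in sync is a single local value. This is a sketch, not something the module requires; the name local.aws_region is hypothetical, and the helm provider's exec block would take the same change as the kubernetes one shown here.

```hcl
locals {
  # Hypothetical local: the one place the region is written down.
  aws_region = "us-east-1"
}

provider "aws" {
  region = local.aws_region
}

provider "kubernetes" {
  host                   = module.ao_data_platform.eks_cluster_endpoint
  cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # Same arguments as before, with the literal region replaced by the local.
    args = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", local.aws_region]
  }
}

module "ao_data_platform" {
  source  = "monte-carlo-data/ao-data-platform/aws"
  version = "1.0.0"

  region = local.aws_region
  # ... remaining inputs exactly as in the module block above ...
}
```

A local (or an input variable) works here because its value is known at plan time, unlike the module outputs that the providers also consume.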
helm.chart_registry is the registry prefix only: oci://registry-1.docker.io/montecarlodata, with no chart name. The module appends /ao-data-platform itself, so adding it here (.../montecarlodata/ao-data-platform) makes terraform apply fail to pull the chart. The artifacts table in Prerequisites lists the full chart path because that's the chart's location, but as a module input, pass only the prefix.
3. Initialize Terraform
terraform init

This downloads the module and the aws, kubernetes, and helm providers.
4. Review the plan
terraform plan

Review the planned changes before applying. On the new-cluster path the plan creates a VPC, the EKS cluster and node groups, IAM/IRSA roles, a KMS key, Secrets Manager secrets, ACM certificates, and the Helm releases.
Check the ClickHouse node group's Availability Zone. On the new-cluster path, confirm the clickhouse_node_group.availability_zone value in the plan output. EBS volumes are AZ-locked, so this must match the AZ of the ClickHouse persistent volume. See Dedicated ClickHouse node group below.
5. Apply
terraform apply

Review the plan once more, then confirm. The apply provisions the AWS infrastructure and (with helm.deploy_charts = true) deploys the chart in one pass. A full apply on the new-cluster path typically takes 15 to 25 minutes, most of it waiting on the EKS cluster and node groups.
apply modifies your ~/.kube/config. To enable the single-pass deploy, the module runs aws eks update-kubeconfig inside local-exec provisioners: it has to kubectl wait for the External Secrets Operator CRDs and apply a ClusterSecretStore, which the native Terraform kubernetes/helm providers can't do for resources created in the same apply. This adds or refreshes the cluster's context in the ~/.kube/config of the machine running Terraform and makes it the current context. It happens on every apply, on both cluster paths.
To manage the Helm release yourself instead, set helm.deploy_charts = false and follow the self-managed Helm install.
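As a rough sketch of that opt-out, the flag sits inside the same helm object used in step 2. The chart_registry and chart_version values are kept here because the required inputs are described as the same on both paths; whether the module still reads them when charts are not deployed is an assumption.

```hcl
module "ao_data_platform" {
  source  = "monte-carlo-data/ao-data-platform/aws"
  version = "1.0.0"

  region                = "us-east-1"
  otel_collector_domain = "otel.acme.com"
  clickhouse_domain     = "clickhouse.acme.com"
  hosted_zone_id        = "Z1234567890ABC"

  helm = {
    # Provision the AWS infrastructure only; the Helm release is installed separately.
    deploy_charts  = false
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "1.5.0"
  }
}
```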
6. Confirm what was created
When the apply completes, review the outputs:
terraform output

You should see (among others):
| Output | What it is |
|---|---|
| eks_cluster_name | The cluster name; use it to configure kubectl next |
| montecarlo_namespace | The namespace (montecarlo) all components run in |
| clickhouse_otel_credentials_secret_arn | Secrets Manager ARN for the ClickHouse otel user |
| otel_collector_irsa_role_arn / llm_worker_irsa_role_arn | IRSA roles for the workloads |
| clickhouse_node_group | The dedicated ClickHouse node group (new-cluster path), including its resolved availability_zone |
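terraform output prints only the outputs declared in the root module, so the values above appear only if your root configuration re-exports them from the ao_data_platform module. If yours does not, a minimal sketch looks like the following. The root-level output names are a choice made here to mirror the module's names, not a requirement.

```hcl
# Re-export selected module outputs so that `terraform output` shows them.
output "eks_cluster_name" {
  value = module.ao_data_platform.eks_cluster_name
}

output "montecarlo_namespace" {
  value = module.ao_data_platform.montecarlo_namespace
}

output "clickhouse_otel_credentials_secret_arn" {
  value = module.ao_data_platform.clickhouse_otel_credentials_secret_arn
}

output "clickhouse_node_group" {
  # Includes the resolved availability_zone on the new-cluster path.
  value = module.ao_data_platform.clickhouse_node_group
}
```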
The AWS infrastructure is now provisioned and the chart is deploying. Verifying that the in-cluster components (ClickHouse, the Collector, the LLM worker) came up healthy is the first step of Deploy the agent and connect to Monte Carlo.
Dedicated ClickHouse node group
On the new-cluster path, the module automatically creates a dedicated single-AZ EKS managed node group for ClickHouse (when helm.deploy_charts = true). It is a single node, tainted dedicated=clickhouse:NoSchedule, and the module wires the ClickHouse pod's nodeSelector/tolerations to target it, so no manual configuration is required. This isolates ClickHouse from the OpenTelemetry Collector and other workloads, which run on the main node group.
This dedicated node group (and the matching nodeSelector/tolerations wiring) requires ao-data-platform chart version >= 1.3.0, which is the module's minimum chart version.
EBS volumes are AZ-locked. The dedicated node group must live in the same Availability Zone as the ClickHouse persistent volume. It defaults to the region's first AZ (alphabetically); override with clickhouse_node_group.availability_zone if your volume is elsewhere. Always check the clickhouse_node_group.availability_zone output during plan/apply review.
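A sketch of that override, assuming the input is an object with an availability_zone attribute (the shape is inferred from the dotted name and is not confirmed here; check the Configuration reference). The AZ value is only an example.

```hcl
module "ao_data_platform" {
  # ... source, version, and the required inputs as shown in step 2 ...

  clickhouse_node_group = {
    # Must be the AZ that holds the ClickHouse persistent volume.
    availability_zone = "us-east-1b"
  }
}
```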
No high availability. ClickHouse runs as a single replica on the single-node group, so node-drain operations (EKS AMI rolls, node-group resizing, manual drains) cause brief ClickHouse downtime (~30-90s) while the pod restarts. This is expected for the single-AZ, single-replica design.
On the existing-cluster path, the module does not manage this node group. Attach a tainted single-AZ node group yourself and pass matching clickhouse.nodeSelector/clickhouse.tolerations via your Helm values.
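For the existing-cluster path, one possible shape for such a node group, using the AWS provider's aws_eks_node_group resource, is sketched below. All three variables, the resource and node group names, and the dedicated = "clickhouse" label are hypothetical. The taint mirrors the one the module applies on the new-cluster path, and the label only needs to match whatever you set in clickhouse.nodeSelector.

```hcl
variable "cluster_name" {
  type        = string
  description = "Name of the existing EKS cluster (hypothetical variable)."
}

variable "clickhouse_node_role_arn" {
  type        = string
  description = "IAM role ARN for the node group's instances (hypothetical variable)."
}

variable "clickhouse_subnet_id" {
  type        = string
  description = "One subnet, in the AZ of the ClickHouse persistent volume (hypothetical variable)."
}

resource "aws_eks_node_group" "clickhouse" {
  cluster_name    = var.cluster_name
  node_group_name = "clickhouse"
  node_role_arn   = var.clickhouse_node_role_arn

  # A single subnet keeps the node group in a single Availability Zone.
  subnet_ids = [var.clickhouse_subnet_id]

  # Single node, matching the single-replica design.
  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  # Label for clickhouse.nodeSelector to target (assumed key and value).
  labels = {
    dedicated = "clickhouse"
  }

  # Equivalent to the Kubernetes taint dedicated=clickhouse:NoSchedule.
  taint {
    key    = "dedicated"
    value  = "clickhouse"
    effect = "NO_SCHEDULE"
  }
}
```

Instance type is left at the provider default in this sketch; size it for your ClickHouse workload.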
Harden network access
The ClickHouse and OpenTelemetry Collector NLBs are internal, but their allowed-source ranges (clickhouse_nlb_allowed_source_ranges / otel_collector_nlb_allowed_source_ranges) default to unrestricted. Before relying on the deployment, scope them to the networks that should reach each endpoint. See Network access in the Configuration reference.
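A sketch of what scoping might look like, assuming both inputs take a list of CIDR strings (the type is not stated here; confirm it in the Configuration reference). The CIDR blocks are placeholders for your own networks.

```hcl
module "ao_data_platform" {
  # ... source, version, and the required inputs as shown in step 2 ...

  # Only the network that queries ClickHouse (placeholder CIDR).
  clickhouse_nlb_allowed_source_ranges = ["10.0.0.0/16"]

  # Only the networks whose agents send telemetry to the Collector (placeholder CIDRs).
  otel_collector_nlb_allowed_source_ranges = ["10.0.0.0/16", "10.20.0.0/16"]
}
```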
Next steps
Continue to Deploy the agent and connect to Monte Carlo.
