📝
Prerequisites
Complete the Prerequisites first — tooling, AWS permissions, domains, and chart registry access.

Overview

The terraform-aws-ao-data-platform module provisions the EKS cluster and, by default (helm.deploy_charts = true), also deploys the ao-data-platform Helm chart — ClickHouse, the OpenTelemetry Collector, and the LLM worker — in the same terraform apply.

The kubernetes and helm providers are configured from the module's outputs, which lets Terraform defer the Kubernetes/Helm resources until after the EKS cluster exists. This is what enables a single-pass apply.

Work through the steps below in order. The required inputs are the same for both cluster paths: region, otel_collector_domain, clickhouse_domain, helm.chart_registry, and helm.chart_version.

📘
The examples below install the public artifacts: the Terraform module from the Terraform Registry (monte-carlo-data/ao-data-platform/aws), and the ao-data-platform Helm chart and ao-llm-worker image from Docker Hub (see Prerequisites). Pulling them requires no registry authentication.

1. Configure the providers

Your root module must configure the aws, kubernetes, and helm providers. The kubernetes and helm providers are wired from the module's outputs. Replace us-east-1 with your region in all three provider blocks:

terraform {
  required_providers {
    aws        = { source = "hashicorp/aws", version = "~> 6.0" }
    kubernetes = { source = "hashicorp/kubernetes", version = "~> 2.0" }
    helm       = { source = "hashicorp/helm", version = "~> 2.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "kubernetes" {
  host                   = module.ao_data_platform.eks_cluster_endpoint
  cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", "us-east-1"]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.ao_data_platform.eks_cluster_endpoint
    cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", "us-east-1"]
    }
  }
}
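
Because the region appears in all three provider blocks, one way to keep the copies in sync is a local value. This is a minimal sketch, not taken from the module's examples: the local name region is an assumption, and the helm provider's exec block would take the same change to its args.

```hcl
locals {
  # Assumed name; single place to set the deployment region.
  region = "us-east-1"
}

provider "aws" {
  region = local.region
}

provider "kubernetes" {
  host                   = module.ao_data_platform.eks_cluster_endpoint
  cluster_ca_certificate = module.ao_data_platform.eks_cluster_ca_certificate
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # Same token command as above, with the region read from the local.
    args        = ["eks", "get-token", "--cluster-name", module.ao_data_platform.eks_cluster_name, "--region", local.region]
  }
}
```

The same local can feed the module's region input (region = local.region), so the cluster and the token command cannot drift apart.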

2. Configure the module

Pick the configuration that matches your deployment path: a new cluster created by the module, or an existing cluster.

On the new-cluster path, the module creates the VPC and EKS cluster with sensible defaults (cluster name monte-carlo):

module "ao_data_platform" {
  source  = "monte-carlo-data/ao-data-platform/aws"
  version = "2.0.0"

  region                = "us-east-1"
  otel_collector_domain = "otel.acme.com"
  clickhouse_domain     = "clickhouse.acme.com"
  hosted_zone_id        = "Z1234567890ABC"

  helm = {
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "2.0.0"

    llm_worker = {
      image_tag = "1.0.1"
    }
  }
}

See examples/new_cluster/ for a complete, copy-paste starting point, including the provider blocks.

On the existing-cluster path, set cluster.create = false and networking.create_vpc = false, and provide your cluster name, VPC ID, and at least two private subnet IDs in different AZs:

module "ao_data_platform" {
  source  = "monte-carlo-data/ao-data-platform/aws"
  version = "2.0.0"

  region = "us-east-1"

  cluster = {
    create                = false
    existing_cluster_name = "my-cluster"
  }

  networking = {
    create_vpc                  = false
    existing_vpc_id             = "vpc-0abc123"
    existing_private_subnet_ids = ["subnet-aaa", "subnet-bbb"]
  }

  otel_collector_domain = "otel.acme.com"
  clickhouse_domain     = "clickhouse.acme.com"
  hosted_zone_id        = "Z1234567890ABC"

  helm = {
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "2.0.0"

    llm_worker = {
      image_tag = "1.0.1"
    }

    # Set any controller already present in the cluster to false:
    # install_aws_load_balancer_controller = false
    # install_cert_manager                 = false
    # install_external_secrets_operator    = false
    # install_external_dns                 = false
  }
}

See examples/existing_cluster/ for the full configuration. If the cluster's OIDC provider was created outside Terraform, import it into Terraform state before you run terraform apply.
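
Terraform 1.5 and later can express an import declaratively with an import block in the root module. The sketch below is illustrative only. The resource address in to is hypothetical (the module's real address is not given here and may carry an index), and the account ID and issuer ID in the ARN are placeholders. The aws_iam_openid_connect_provider resource type is imported by its ARN.

```hcl
import {
  # HYPOTHETICAL address: confirm the real one from the module source
  # or from the matching "will be created" entry in terraform plan.
  to = module.ao_data_platform.aws_iam_openid_connect_provider.this

  # Placeholder ARN of the IAM OIDC provider that already exists for the cluster.
  id = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789ABCDEF"
}
```

With the block in place, terraform plan should report the provider as imported rather than created. If it still shows a create, the address is probably wrong.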

📘
helm.chart_registry is the registry prefix only — oci://registry-1.docker.io/montecarlodata, with no chart name. The module appends /ao-data-platform itself, so adding it here (.../montecarlodata/ao-data-platform) makes terraform apply fail to pull the chart. The artifacts table in Prerequisites lists the full chart path because that's the chart's location — but as a module input, pass only the prefix.
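
As a quick side-by-side, the fragment below shows the prefix-only form next to the form that the note above says fails. It reuses the values from step 2, and the elided inputs are whatever your configuration already sets.

```hcl
module "ao_data_platform" {
  # ... source, version, and other inputs as in step 2 ...

  helm = {
    # Registry prefix only; the module appends /ao-data-platform itself.
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "2.0.0"

    # Chart name included, so the chart pull fails during apply:
    # chart_registry = "oci://registry-1.docker.io/montecarlodata/ao-data-platform"
  }
}
```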

3. Initialize Terraform

terraform init

This downloads the module and the aws, kubernetes, and helm providers.

4. Review the plan

terraform plan

Review the planned changes before applying. On the new-cluster path the plan creates a VPC, the EKS cluster and node groups, IAM/IRSA roles, a KMS key, Secrets Manager secrets, ACM certificates, and the Helm releases.

❗️
Check the ClickHouse node group's Availability Zone. On the new-cluster path, confirm the clickhouse_node_group.availability_zone value in the plan output. EBS volumes are AZ-locked, so this must match the AZ of the ClickHouse persistent volume — see Dedicated ClickHouse node group below.

5. Apply

terraform apply

Review the plan once more, then confirm. The apply provisions the AWS infrastructure and (with helm.deploy_charts = true) deploys the chart in one pass. A full apply on the new-cluster path typically takes 15–25 minutes, most of it waiting on the EKS cluster and node groups.

📝
terraform apply modifies your ~/.kube/config. To support the single-pass deploy, the module runs aws eks update-kubeconfig inside local-exec provisioners. It needs kubectl to wait for the External Secrets Operator CRDs and then apply a ClusterSecretStore, which the native Terraform kubernetes and helm providers can't do for resources created in the same apply. As a result, every apply, on both cluster paths, adds or refreshes the cluster's context in the ~/.kube/config of the machine running Terraform and makes it the current context.

📘
To manage the Helm release yourself instead, set helm.deploy_charts = false and follow the self-managed Helm install.
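
In module terms, that switch sits inside the helm object. This is a minimal sketch, assuming the other inputs stay as in step 2. Whether chart_registry and chart_version are still required when deploy_charts is false is not stated here, so they are left in place.

```hcl
module "ao_data_platform" {
  # ... source, version, region, and domain inputs as in step 2 ...

  helm = {
    # Provision the AWS infrastructure only; install the chart yourself.
    deploy_charts = false

    # Kept from step 2 (assumption: harmless when charts are not deployed).
    chart_registry = "oci://registry-1.docker.io/montecarlodata"
    chart_version  = "2.0.0"
  }
}
```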

6. Confirm what was created

When the apply completes, review the outputs:

terraform output

You should see (among others):

| Output | What it is |
| --- | --- |
| `eks_cluster_name` | The cluster name; use it to configure `kubectl` next |
| `montecarlo_namespace` | The namespace (`montecarlo`) all components run in |
| `clickhouse_monte_carlo_credentials_secret_arn` | Secrets Manager ARN for the `monte_carlo` user, the credential you hand to Monte Carlo |
| `clickhouse_otel_credentials_secret_arn` | ARN for the `otel` ingest user (OpenTelemetry Collector) |
| `clickhouse_schema_owner_credentials_secret_arn` / `clickhouse_llm_worker_credentials_secret_arn` | ARNs for the `schema_owner` (migrations / MV owner) and `llm_worker` users |
| `clickhouse_admin_credentials_secret_arn` / `clickhouse_readonly_user_credentials_secret_arn` | ARNs for the optional `admin` and `readonly_user`; `null` when those users are disabled |
| `otel_collector_irsa_role_arn` / `llm_worker_irsa_role_arn` | IRSA roles for the workloads |
| `clickhouse_node_group` | The dedicated ClickHouse node group (new-cluster path), including its resolved `availability_zone` |
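
terraform output lists only the outputs declared in the root module, so a root module assembled by hand from the snippets in step 2 must re-export the module outputs it wants to see. A minimal sketch, assuming your module block is named ao_data_platform as above (the bundled examples may already declare these):

```hcl
# Re-export selected module outputs from the root module.
output "eks_cluster_name" {
  description = "EKS cluster name, used to configure kubectl."
  value       = module.ao_data_platform.eks_cluster_name
}

output "clickhouse_monte_carlo_credentials_secret_arn" {
  description = "Secrets Manager ARN for the monte_carlo ClickHouse user."
  value       = module.ao_data_platform.clickhouse_monte_carlo_credentials_secret_arn
}

output "clickhouse_node_group" {
  description = "Dedicated ClickHouse node group, including its availability_zone."
  value       = module.ao_data_platform.clickhouse_node_group
}
```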

The AWS infrastructure is now provisioned and the chart is deploying. Verifying that the in-cluster components (ClickHouse, the Collector, the LLM worker) came up healthy is the first step of Deploy the agent and connect to Monte Carlo.

Dedicated ClickHouse node group

On the new-cluster path, the module automatically creates a dedicated single-AZ EKS managed node group for ClickHouse (when helm.deploy_charts = true). It is a single node, tainted dedicated=clickhouse:NoSchedule, and the module wires the ClickHouse pod's nodeSelector/tolerations to target it — no manual configuration required. This isolates ClickHouse from the OpenTelemetry Collector and other workloads, which run on the main node group.

📘
This dedicated node group (and the matching nodeSelector/tolerations wiring) requires ao-data-platform chart version >= 1.3.0 — comfortably met by the module's current minimum of 2.0.0.

❗️
EBS volumes are AZ-locked. The dedicated node group must live in the same Availability Zone as the ClickHouse persistent volume. It defaults to the region's first AZ (alphabetically); override with clickhouse_node_group.availability_zone if your volume is elsewhere. Always check the clickhouse_node_group.availability_zone output during plan/apply review.
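
A sketch of that override follows. It assumes the input is an object named clickhouse_node_group with an availability_zone attribute, following the dotted notation used for the other inputs on this page. The AZ value is an example, so check the Configuration reference for the exact shape.

```hcl
module "ao_data_platform" {
  # ... source, version, and other inputs as in step 2 ...

  clickhouse_node_group = {
    # Example value: set this to the AZ that holds the ClickHouse EBS volume.
    availability_zone = "us-east-1b"
  }
}
```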

⚠️
No high availability. ClickHouse runs as a single replica on the single-node group, so node-drain operations (EKS AMI rolls, node-group resizing, manual drains) cause brief ClickHouse downtime (~30–90s) while the pod restarts. This is expected for the single-AZ, single-replica design.

On the existing-cluster path, the module does not manage this node group — attach a tainted single-AZ node group yourself and pass matching clickhouse.nodeSelector/clickhouse.tolerations via your Helm values.

Harden network access

The ClickHouse and OpenTelemetry Collector NLBs are internal, but their allowed-source ranges (clickhouse_nlb_allowed_source_ranges / otel_collector_nlb_allowed_source_ranges) default to unrestricted. Before relying on the deployment, scope them to the networks that should reach each endpoint — see Network access in the Configuration reference.
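
As an illustration of scoping those ranges, the fragment below assumes both inputs are top-level module variables that take a list of CIDR strings. Neither the nesting nor the type is confirmed on this page, and the CIDRs are placeholders, so treat the Configuration reference as authoritative.

```hcl
module "ao_data_platform" {
  # ... source, version, and other inputs as in step 2 ...

  # Placeholder CIDR: networks allowed to query ClickHouse.
  clickhouse_nlb_allowed_source_ranges = ["10.20.0.0/16"]

  # Placeholder CIDR: networks allowed to send telemetry to the Collector.
  otel_collector_nlb_allowed_source_ranges = ["10.30.0.0/16"]
}
```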

Next steps

Continue to Deploy the agent and connect to Monte Carlo.