Agent Observability Platform: Configuration Reference

Optional configuration: TLS, retention, ClickHouse users, and sizing

All settings on this page are optional; the defaults produce a working deployment. Set them as inputs to the Terraform module.

Data retention (TTL)

ClickHouse retains trace data for a fixed window, after which old data is dropped automatically.

Input | Default | Description
clickhouse_ttl_days | 30 | Days of trace data to retain.
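
As a rough illustration, extending retention would mean setting the input alongside the module's other inputs. The value 90 is an arbitrary example, not a recommendation.

```hcl
# Illustrative value only: keep 90 days of trace data instead of the default 30.
clickhouse_ttl_days = 90
```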

ClickHouse users

The platform provisions these ClickHouse SQL users:

User | Access | Used by
otel | Read/write | The OpenTelemetry Collector (writing traces), the LLM worker, the schema-migration job, and, currently, the Monte Carlo Agent (Trace Exploration, agent monitors)
readonly_user | SELECT-only (optional) | External SQL clients (e.g. DataGrip)

Each user's password is generated by the module and stored in AWS Secrets Manager (KMS-encrypted), then synced into the cluster by the External Secrets Operator.

Note: The stock ClickHouse default superuser is removed during deployment as a hardening measure; every client authenticates as one of the users above.

Read-only user

To provision the optional readonly_user, enable it in the helm input. The user is SELECT-only and uses the readonly_settings profile with readonly = 2, which permits SELECT plus per-session SET, so JDBC clients such as DataGrip connect cleanly:

helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"

  clickhouse = {
    readonly_user = {
      enabled = true
      # password = "..."  # optional; omit to auto-generate
    }
  }
}

Its password ARN is exposed as the clickhouse_readonly_user_credentials_secret_arn output.
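
A minimal sketch of surfacing that ARN from the calling configuration follows. The module label agent_observability and the output name clickhouse_readonly_secret_arn are hypothetical; only the module output name comes from this page. The format of the secret payload is not described here, so the sketch passes the ARN along without reading the secret.

```hcl
# Assumption: the module is instantiated as module "agent_observability".
# Re-export the ARN so operators can locate the read-only credentials
# in AWS Secrets Manager.
output "clickhouse_readonly_secret_arn" {
  value = module.agent_observability.clickhouse_readonly_user_credentials_secret_arn
}
```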

Note: The module requires ao-data-platform chart version >= 1.3.0. The read-only user itself has been available since chart 1.2.0; on chart versions older than that the Secrets Manager secret is created but no SQL user is provisioned.

Resource sizing

Each managed workload exposes optional Kubernetes resource requests/limits via helm.<workload>.resources. Omit a workload (or a requests/limits map) to use the chart defaults. Start modest in development and tune up for production.

helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"

  clickhouse = {
    resources = {
      requests = { cpu = "2", memory = "8Gi" }
      limits   = { cpu = "4", memory = "16Gi" }
    }
  }
  opentelemetry_collector = {
    resources = {
      requests = { memory = "2Gi" }
      limits   = { memory = "6Gi" }
    }
  }
  llm_worker = {
    resources = {
      requests = { cpu = "500m", memory = "1Gi" }
      limits   = { cpu = "2", memory = "4Gi" }
    }
  }
}

requests and limits are maps keyed by Kubernetes resource name (cpu, memory, ephemeral-storage, etc.); either can be omitted independently.

Node groups (new-cluster path)

On the new-cluster path, you can size the node groups the workloads run on:

Input | Default | Description
clickhouse_node_group.instance_type | r5.xlarge | Instance type for the dedicated ClickHouse node.
cluster.node_instance_type | t3.large | Instance type for the main node group (Collector, LLM worker, controllers).
cluster.main_node_group_size | 2 | Desired/minimum size of the main node group (set 1 for a cost-optimized single node; max 10).
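
The node-group inputs above might be combined as in the sketch below. It assumes the dotted names map to attributes of object-typed inputs named clickhouse_node_group and cluster; those objects may carry other attributes not listed on this page. The r5.2xlarge instance type is an illustrative choice.

```hcl
# Assumption: dotted input names are attributes of object-typed inputs.
clickhouse_node_group = {
  instance_type = "r5.2xlarge" # illustrative; the default is r5.xlarge
}

cluster = {
  node_instance_type   = "t3.large" # the default, shown for clarity
  main_node_group_size = 1          # cost-optimized single node
}
```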

Storage

ClickHouse data is stored on an EBS-backed persistent volume.

Input | Default | Description
helm.clickhouse.storage_size | 500Gi | Size of the ClickHouse persistent volume.
clickhouse_storage_class | clickhouse-gp3 | StorageClass for the ClickHouse volume. The module creates a dedicated clickhouse-gp3 class by default; set to an existing class name to use your own.
storage_class_clickhouse_gp3.iops | 3000 | Provisioned IOPS for the clickhouse-gp3 class (min 3000, max 16000).
storage_class_clickhouse_gp3.throughput | 125 | Throughput in MB/s for the clickhouse-gp3 class (min 125, max 1000; must be ≤ iops × 0.25).
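
The throughput ceiling depends on the provisioned IOPS: at the default 3000 IOPS the cap is 3000 × 0.25 = 750 MB/s, so reaching the 1000 MB/s maximum requires at least 4000 IOPS. A sketch under that constraint, assuming storage_class_clickhouse_gp3 is an object-typed input and using an illustrative volume size:

```hcl
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"

  clickhouse = {
    storage_size = "1000Gi" # illustrative; the default is 500Gi
  }
}

# Assumption: iops and throughput are attributes of one object-typed input.
storage_class_clickhouse_gp3 = {
  iops       = 4000 # within the 3000 to 16000 range
  throughput = 1000 # MB/s; 4000 x 0.25 = 1000, the ceiling at this IOPS level
}
```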

Network access

By default the ClickHouse and OpenTelemetry Collector Network Load Balancers are internal, but accept traffic from any source that can route to them. Restrict the source ranges to harden access:

Input | Default | Description
clickhouse_nlb_allowed_source_ranges | null (unrestricted) | CIDR ranges permitted to reach the ClickHouse NLB. [] restricts to the VPC.
otel_collector_nlb_allowed_source_ranges | null (unrestricted) | CIDR ranges permitted to reach the OpenTelemetry Collector NLB. [] restricts to the VPC.

Security note: These NLB source ranges are the primary network control for reaching ClickHouse, because the otel user and the optional readonly_user accept connections from any source at the ClickHouse layer. Leaving the default unrestricted means anyone who can route to the internal NLB can attempt to connect. Set clickhouse_nlb_allowed_source_ranges to your VPC or a specific CIDR list, especially before enabling readonly_user for external SQL clients.
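
A sketch of a hardened configuration, using the two inputs from the table above. The CIDR blocks are placeholders; substitute the ranges your clients actually connect from.

```hcl
# Empty list: only sources inside the VPC can reach the ClickHouse NLB.
clickhouse_nlb_allowed_source_ranges = []

# Explicit list: placeholder CIDRs for the networks that send traces.
otel_collector_nlb_allowed_source_ranges = ["10.0.0.0/16", "10.1.0.0/16"]
```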

TLS

TLS is enabled by default. The OpenTelemetry Collector and ClickHouse NLB endpoints are TLS-terminated using ACM certificates that the module provisions for your otel_collector_domain and clickhouse_domain. Traffic between the Collector and ClickHouse inside the cluster is secured with certificates issued by cert-manager.
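
For orientation, the two domains referenced above might be supplied as follows. This assumes both are top-level string inputs to the module, which this page does not state explicitly; the hostnames are placeholders.

```hcl
# Assumption: both domains are top-level string inputs; hostnames are placeholders.
otel_collector_domain = "otel.example.com"
clickhouse_domain     = "clickhouse.example.com"
```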

Evaluation (Amazon Bedrock)

The LLM worker invokes models through Amazon Bedrock. By default it targets your deployment region.

Input | Default | Description
helm.llm_worker.bedrock_region | var.region | AWS region for Bedrock API calls.
helm.llm_worker.image_repository | derived from chart_registry (e.g. registry-1.docker.io/montecarlodata/ao-llm-worker) | Override the LLM-worker image repository.
helm.llm_worker.image_tag | latest | LLM-worker image tag. Pin to a published version; the current release is 1.0.0.
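
A sketch of pinning the LLM-worker image and pointing Bedrock calls at a different region. The region us-east-1 is an illustrative choice; the tag is the release named in the table above.

```hcl
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"

  llm_worker = {
    bedrock_region = "us-east-1" # illustrative; defaults to var.region
    image_tag      = "1.0.0"     # pinned release instead of latest
  }
}
```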