Agent Observability Platform: Configuration Reference
Optional configuration: TLS, retention, ClickHouse users, and sizing
All settings on this page are optional; the defaults produce a working deployment. Set them as inputs to the Terraform module.
Data retention (TTL)
ClickHouse retains trace data for a fixed window, after which old data is dropped automatically.
| Input | Default | Description |
|---|---|---|
| clickhouse_ttl_days | 30 | Days of trace data to retain. |
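For illustration, a longer retention window could be set as shown here. This is a minimal sketch: the line belongs inside your existing module block for this platform, and 90 is an arbitrary example value rather than a recommendation.

```hcl
# Inside your existing module block for this platform.
# Keep 90 days of trace data instead of the default 30.
clickhouse_ttl_days = 90
```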
ClickHouse users
The platform provisions these ClickHouse SQL users:
| User | Access | Used by |
|---|---|---|
| otel | Read/write | The OpenTelemetry Collector (writing traces), the LLM worker, the schema-migration job, and, currently, the Monte Carlo Agent (Trace Exploration, agent monitors) |
| readonly_user | SELECT-only (optional) | External SQL clients (e.g. DataGrip) |
Each user's password is generated by the module and stored in AWS Secrets Manager (KMS-encrypted), then synced into the cluster by the External Secrets Operator.
The stock ClickHouse default superuser is removed during deployment as a hardening measure, so every client authenticates as one of the users above.
Read-only user
To provision the optional readonly_user (SELECT-only; it uses the readonly_settings profile with readonly = 2, which permits SELECT plus per-session SET so that JDBC clients such as DataGrip connect cleanly):
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"
  clickhouse = {
    readonly_user = {
      enabled = true
      # password = "..." # optional; omit to auto-generate
    }
  }
}
Its password ARN is exposed as the clickhouse_readonly_user_credentials_secret_arn output.
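To consume that output elsewhere in your Terraform configuration, something like the following may work. It is a sketch only: the module label agent_observability and the output name readonly_user_secret_arn are hypothetical, so substitute the label you gave the module.

```hcl
# Re-export the secret ARN from the root module.
# "agent_observability" is a hypothetical module label; use your own.
output "readonly_user_secret_arn" {
  description = "Secrets Manager ARN holding the readonly_user credentials"
  value       = module.agent_observability.clickhouse_readonly_user_credentials_secret_arn
}
```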
The module requires ao-data-platform chart version >= 1.3.0. The read-only user itself has been available since chart 1.2.0; on chart versions older than that, the Secrets Manager secret is created but no SQL user is provisioned.
Resource sizing
Each managed workload exposes optional Kubernetes resource requests/limits via helm.<workload>.resources. Omit a workload (or a requests/limits map) to use the chart defaults. Start modest in development and tune up for production.
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"
  clickhouse = {
    resources = {
      requests = { cpu = "2", memory = "8Gi" }
      limits   = { cpu = "4", memory = "16Gi" }
    }
  }
  opentelemetry_collector = {
    resources = {
      requests = { memory = "2Gi" }
      limits   = { memory = "6Gi" }
    }
  }
  llm_worker = {
    resources = {
      requests = { cpu = "500m", memory = "1Gi" }
      limits   = { cpu = "2", memory = "4Gi" }
    }
  }
}
requests and limits are maps keyed by Kubernetes resource name (cpu, memory, ephemeral-storage, etc.); either can be omitted independently.
Node groups (new-cluster path)
On the new-cluster path, you can size the node groups the workloads run on:
| Input | Default | Description |
|---|---|---|
| clickhouse_node_group.instance_type | r5.xlarge | Instance type for the dedicated ClickHouse node. |
| cluster.node_instance_type | t3.large | Instance type for the main node group (Collector, LLM worker, controllers). |
| cluster.main_node_group_size | 2 | Desired/minimum size of the main node group (set 1 for a cost-optimized single node; max 10). |
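A possible way to set these inputs is sketched here. It assumes the dotted names in the table map to object attributes, the same way helm.<workload>.resources does in the examples on this page. The instance types and size are example values only, and the cluster object may require attributes that this page does not list.

```hcl
# Inputs in your module block; values are examples, not recommendations.
clickhouse_node_group = {
  instance_type = "r5.2xlarge" # default: r5.xlarge
}

cluster = {
  node_instance_type   = "t3.xlarge" # default: t3.large
  main_node_group_size = 3           # default: 2, maximum 10
}
```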
Storage
ClickHouse data is stored on an EBS-backed persistent volume.
| Input | Default | Description |
|---|---|---|
| helm.clickhouse.storage_size | 500Gi | Size of the ClickHouse persistent volume. |
| clickhouse_storage_class | clickhouse-gp3 | StorageClass for the ClickHouse volume. The module creates a dedicated clickhouse-gp3 class by default; set to an existing class name to use your own. |
| storage_class_clickhouse_gp3.iops | 3000 | Provisioned IOPS for the clickhouse-gp3 class (min 3000, max 16000). |
| storage_class_clickhouse_gp3.throughput | 125 | Throughput in MB/s for the clickhouse-gp3 class (min 125, max 1000; must be ≤ iops × 0.25). |
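The throughput constraint is easiest to see with numbers. At the default 3000 IOPS the ceiling is 3000 × 0.25 = 750 MB/s, so reaching the 1000 MB/s maximum needs at least 4000 IOPS. The sketch below picks arbitrary example values that satisfy the rule, and it assumes storage_class_clickhouse_gp3 is an object input with iops and throughput attributes, as the dotted names in the table suggest.

```hcl
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"
  clickhouse = {
    storage_size = "1Ti" # example value; default is 500Gi
  }
}

storage_class_clickhouse_gp3 = {
  iops       = 6000
  throughput = 500 # limit is 6000 x 0.25 = 1500 MB/s, so 500 is valid
}
```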
Network access
By default the ClickHouse and OpenTelemetry Collector Network Load Balancers are internal, but accept traffic from any source that can route to them. Restrict the source ranges to harden access:
| Input | Default | Description |
|---|---|---|
| clickhouse_nlb_allowed_source_ranges | null (unrestricted) | CIDR ranges permitted to reach the ClickHouse NLB. [] restricts to the VPC. |
| otel_collector_nlb_allowed_source_ranges | null (unrestricted) | CIDR ranges permitted to reach the OpenTelemetry Collector NLB. [] restricts to the VPC. |
These NLB source ranges are the primary network control for reaching ClickHouse: the otel user (and the optional readonly_user) accept connections from any source at the ClickHouse layer. Leaving the default unrestricted means anyone who can route to the internal NLB can attempt to connect. Set clickhouse_nlb_allowed_source_ranges to your VPC or a specific CIDR list, especially before enabling readonly_user for external SQL clients.
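As a sketch of what a restricted setup might look like, the inputs below limit the ClickHouse NLB to one CIDR range and the Collector NLB to the VPC. The CIDR 10.20.0.0/16 is a placeholder, not a value from this page; replace it with your own ranges.

```hcl
# Only these CIDR ranges may reach the ClickHouse NLB.
# 10.20.0.0/16 is a placeholder; use your own VPC or client ranges.
clickhouse_nlb_allowed_source_ranges = ["10.20.0.0/16"]

# An empty list restricts the Collector NLB to the VPC.
otel_collector_nlb_allowed_source_ranges = []
```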
TLS
TLS is enabled by default. The OpenTelemetry Collector and ClickHouse NLB endpoints are TLS-terminated using ACM certificates that the module provisions for your otel_collector_domain and clickhouse_domain. Traffic between the Collector and ClickHouse inside the cluster is secured with certificates issued by cert-manager.
Evaluation (Amazon Bedrock)
The LLM worker invokes models through Amazon Bedrock. By default it targets your deployment region.
| Input | Default | Description |
|---|---|---|
| helm.llm_worker.bedrock_region | var.region | AWS region for Bedrock API calls. |
| helm.llm_worker.image_repository | derived from chart_registry (e.g. registry-1.docker.io/montecarlodata/ao-llm-worker) | Override the LLM-worker image repository. |
| helm.llm_worker.image_tag | latest | LLM-worker image tag. Pin to a published version; the current release is 1.0.0. |
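The following sketch pins the LLM-worker image to the release named above and points Bedrock calls at a different region. The region us-west-2 is only an example; choose one where the Bedrock models you need are available to your account.

```hcl
helm = {
  chart_registry = "oci://registry-1.docker.io/montecarlodata"
  chart_version  = "1.5.0"
  llm_worker = {
    bedrock_region = "us-west-2" # example region; defaults to var.region
    image_tag      = "1.0.0"     # pinned instead of the default "latest"
  }
}
```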
