Agent Observability Data Platform
Self-hosted data platform for Agent Observability: OpenTelemetry Collector and ClickHouse on EKS, deployed with Terraform
Overview
The Agent Observability data platform is a self-hosted pipeline that ingests OpenTelemetry trace data from your AI agents and stores it in ClickHouse for Monte Carlo to query. It runs entirely in your own AWS account and is deployed with a single Terraform module that provisions everything from the OpenTelemetry Collector through to ClickHouse on an EKS cluster.
Once deployed, Monte Carlo connects to the platform to power Trace Exploration and agent monitors, all queried through the Monte Carlo Agent.
This is a new deployment option that stores trace data directly in ClickHouse. It is distinct from the warehouse-based ingestion path described in Agent with OpenTelemetry Collector, which routes traces into a data warehouse (Snowflake, Databricks, BigQuery, or Athena).
AWS only. This self-hosted platform runs on Amazon EKS and is available for AWS accounts only. If your agents run on Azure or GCP, use the warehouse-based Agent with OpenTelemetry Collector path (Snowflake, Databricks, or BigQuery), or contact your Monte Carlo representative to discuss options.
Public artifacts. The platform is deployed from the `terraform-aws-ao-data-platform` module on the Terraform Registry, with the `ao-data-platform` Helm chart and `ao-llm-worker` image on Docker Hub. Pulling them requires no registry credentials; see Prerequisites.
Architecture
The platform has two independent data flows:
- Ingestion: your instrumented agents send OpenTelemetry traces to the OpenTelemetry Collector, which writes them directly into ClickHouse.
- Query: the Monte Carlo platform sends all of its SQL through the Monte Carlo Agent (a Lambda function running in the same VPC), which queries ClickHouse for Trace Exploration and agent monitors.
```mermaid
flowchart TB
  classDef ext fill:#F8FAFC,stroke:#64748B,color:#0F172A;
  classDef node fill:#FEF2F2,stroke:#DC2626,color:#7F1D1D;
  classDef net fill:#FFF7ED,stroke:#EA580C,color:#7C2D12;
  classDef agent fill:#EEF2FF,stroke:#4F46E5,color:#1E1B4B;
  APPS["Instrumented agents / apps"]:::ext
  MC["Monte Carlo platform"]:::ext
  BR["Amazon Bedrock"]:::ext
  subgraph ACCT["Your AWS account"]
    AGENT["Monte Carlo Agent · Lambda, in-VPC<br/>customer-deployed, outside EKS"]:::agent
    NLB1["Internal NLB · OTLP"]:::net
    NLB2["Internal NLB · ClickHouse"]:::net
    subgraph EKS["EKS cluster · montecarlo namespace"]
      OTEL["OpenTelemetry Collector"]:::node
      LLM["LLM worker"]:::node
      CH[("ClickHouse")]:::node
    end
  end
  APPS -->|"OTLP 4317 / 4318 · TLS"| NLB1
  NLB1 --> OTEL
  OTEL -->|"write traces"| CH
  MC -->|"SQL · Monte Carlo initiates"| AGENT
  AGENT -->|"forwards SQL · 8443 TLS · otel user"| NLB2
  NLB2 --> CH
  LLM -->|"read / write"| CH
  LLM -->|"evaluations · InvokeModel"| BR
  style ACCT fill:#FFFBEB,stroke:#EA580C,color:#7C2D12
  style EKS fill:#ECFEFF,stroke:#0891B2,color:#164E63
```
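To make the query path concrete, here is a minimal Python sketch of the kind of request a client inside the VPC could build against ClickHouse. It assumes that port 8443 exposes ClickHouse's HTTPS interface, which accepts SQL in the request body and credentials in the `X-ClickHouse-User` and `X-ClickHouse-Key` headers. The host name and password are placeholders, and this is not a description of how the Monte Carlo Agent itself is implemented.

```python
import urllib.parse
import urllib.request


def build_clickhouse_request(host: str, user: str, password: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request to ClickHouse's HTTP interface."""
    # Port 8443 and the otel user come from the architecture diagram;
    # the host is a placeholder for the internal NLB's DNS name.
    query_string = urllib.parse.urlencode({"default_format": "JSONEachRow"})
    return urllib.request.Request(
        f"https://{host}:8443/?{query_string}",
        data=sql.encode("utf-8"),  # the SQL statement travels in the POST body
        method="POST",
        headers={
            "X-ClickHouse-User": user,
            "X-ClickHouse-Key": password,
        },
    )


# Placeholder values; "SELECT 1" avoids assuming any table names.
req = build_clickhouse_request("clickhouse.example.internal", "otel", "<password>", "SELECT 1")
```

Passing `req` to `urllib.request.urlopen(req, timeout=10)` from a host that can reach the internal NLB would send the query, which can serve as a basic connectivity check.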
Components
The Terraform module deploys the following into your AWS account:
| Component | Description |
|---|---|
| EKS cluster + VPC | A new cluster and VPC, or your existing ones. Hosts the data-plane workloads below. |
| ClickHouse | The trace data store. Deployed and managed by the Altinity ClickHouse Operator. |
| OpenTelemetry Collector | Receives OTLP traces and writes them to ClickHouse. |
| LLM worker | Runs evaluations against trace data using Amazon Bedrock (see Evaluation). |
| Cluster controllers | AWS Load Balancer Controller, cert-manager, External Secrets Operator, and external-dns. |
| Supporting AWS resources | ACM certificates, Route 53 records, IAM/IRSA roles, and Secrets Manager secrets. |
The Monte Carlo Agent (Lambda) is not part of this module; it is deployed separately and pointed at ClickHouse. See Deploy the agent and connect to Monte Carlo.
Ingestion
Instrumented agents send traces to the OpenTelemetry Collector over OTLP: gRPC on port 4317 or HTTP on port 4318, both TLS-terminated at a Network Load Balancer (NLB). The Collector batches incoming spans and writes them directly into ClickHouse using the ClickHouse exporter; there is no intermediate object storage or data warehouse.
The Collector can be hosted in-cluster (deployed by this module) or, where supported, hosted by Monte Carlo with only the data store running in your account.
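As an illustration of the OTLP/HTTP ingestion path, the following Python sketch builds a single-span export request using only the standard library. It follows the OTLP JSON encoding (hex-encoded trace and span IDs, nanosecond timestamps as strings) and the standard `/v1/traces` path. The Collector host name, service name, and span name are hypothetical, and any authentication your deployment requires is not shown. In practice the Monte Carlo SDK emits traces for you; this is only a sketch of what travels over port 4318.

```python
import json
import os
import time
import urllib.request


def build_otlp_http_request(collector_host: str, span_name: str) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON export request with one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the span lasted 5 ms
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "demo-agent"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 16 random bytes, hex-encoded
                    "spanId": os.urandom(8).hex(),    # 8 random bytes, hex-encoded
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    return urllib.request.Request(
        f"https://{collector_host}:4318/v1/traces",  # OTLP/HTTP port from this guide
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and span name, for illustration only.
req = build_otlp_http_request("otel.example.internal", "plan-step")
```

Sending the request with `urllib.request.urlopen(req, timeout=10)` from a network that can reach the OTLP load balancer would deliver one test span to the Collector.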
Evaluation
The platform includes an LLM worker that runs evaluations against trace data. It reads pending work from ClickHouse, invokes a model through Amazon Bedrock in your account, and writes the results back to ClickHouse. The Bedrock region defaults to your deployment region and is configurable.
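The read, invoke, write-back cycle described above can be sketched in a few lines of Python. This is a conceptual illustration only, not the actual `ao-llm-worker` code: the function name, the `id` and `prompt` fields, and the in-memory stand-ins for ClickHouse and Bedrock are all hypothetical.

```python
from typing import Callable, Iterable


def run_evaluation_pass(
    fetch_pending: Callable[[], Iterable[dict]],
    invoke_model: Callable[[str], str],
    write_result: Callable[[str, str], None],
) -> int:
    """Process every pending item once: read it, invoke the model, write the result back."""
    processed = 0
    for item in fetch_pending():
        verdict = invoke_model(item["prompt"])  # stands in for a Bedrock model call
        write_result(item["id"], verdict)       # stands in for an insert into ClickHouse
        processed += 1
    return processed


# In-memory stand-ins so the sketch runs without ClickHouse or Bedrock.
pending = [{"id": "span-1", "prompt": "Was the answer grounded?"}]
results = {}

count = run_evaluation_pass(
    fetch_pending=lambda: list(pending),
    invoke_model=lambda prompt: f"evaluated: {prompt}",
    write_result=results.__setitem__,
)
```

Keeping the three steps behind separate callables mirrors the separation in the diagram: the data store and the model provider are independent dependencies of the worker.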
End-to-end setup
Getting agent traces into Monte Carlo is a four-step journey. This guide covers steps 2 and 3 (deploy and connect); the links below cover instrumenting your agents and creating monitors.
| Step | What you do | Where |
|---|---|---|
| 1. Instrument | Add the Monte Carlo OpenTelemetry SDK to your agents so they emit OTLP traces. | Monte Carlo SDK; the instrument-agent skill in the Agent Toolkit |
| 2. Deploy | Provision the OpenTelemetry Collector and ClickHouse in your AWS account with Terraform. | Prerequisites → Installation |
| 3. Connect | Deploy the Monte Carlo Agent and hand off the ClickHouse connection. | Connect to Monte Carlo |
| 4. Monitor | Explore traces and create agent monitors in Monte Carlo. | Agent Monitors Overview |
Deploying the platform
Work through these pages in order:
| Step | Page |
|---|---|
| 1 | Prerequisites: tooling, AWS account and permissions, domains, and chart access |
| 2 | Installation: configure and apply the Terraform module |
| 3 | Deploy the agent and connect to Monte Carlo: deploy the Monte Carlo Agent and hand off credentials |
| - | Configuration reference: TLS, retention, ClickHouse users, and resource sizing |
| - | Self-managed Helm install: advanced, for managing the Helm release yourself |
| - | Troubleshooting & FAQ: common installation and runtime issues |
