AI Architecture & Data Handling
Monte Carlo takes a feature-by-feature approach to AI and data handling. Rather than applying a one-size-fits-all model, each AI capability is designed around the minimum data it needs to deliver value while protecting customer data.
AI features work on top of the data Monte Carlo already collects for core observability: metadata, query logs, lineage, and, optionally, data samples. When you trigger an AI feature, Monte Carlo determines what information that specific feature needs:
- Metadata only: Some features only need schema, table names, query patterns, and lineage. No data values required.
- Aggregated information: Some features work with statistics (row counts, null percentages, distributions) rather than individual records.
- Sample data: Some features require seeing actual data values to detect patterns, understand formats, or identify quality issues.
Key principle: Each feature uses the minimum data types necessary.
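As a concrete illustration of this principle, each feature could declare the minimum data tier it is allowed to read. The sketch below is in Python; the feature names and tier assignments are hypothetical, not Monte Carlo's actual catalog:

```python
from enum import IntEnum

class DataTier(IntEnum):
    """Escalating levels of data access an AI feature may require."""
    METADATA = 1    # schemas, table names, query patterns, lineage
    AGGREGATES = 2  # row counts, null percentages, distributions
    SAMPLES = 3     # actual data values

# Hypothetical feature catalog: each entry is the minimum tier that feature needs.
FEATURE_DATA_TIERS = {
    "incident_summary": DataTier.METADATA,
    "anomaly_explanation": DataTier.AGGREGATES,
    "data_quality_suggestions": DataTier.SAMPLES,
}

def tier_for_feature(feature: str) -> DataTier:
    """Resolve a feature's data tier, defaulting to the least-privileged option."""
    return FEATURE_DATA_TIERS.get(feature, DataTier.METADATA)
```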
How Data Flows Through AI Features
```mermaid
sequenceDiagram
    autonumber
    participant Platform as Core MC Platform/UI
    participant Storage as Object storage
    participant Job as MC Internal Job(s)
    participant Prompt as Prompt
    participant Memory as Agent memory
    participant Bedrock as AWS Bedrock (LLM)
    participant Obs as Observability Tooling
    Platform-->>Job: User triggers LLM feature
    Job-->>Platform: Check for relevant data
    Storage-->>Job: Read & retrieve <br/>customer data <br/>(where applicable)
    Job-->>Prompt: Add relevant <br/>customer data to prompt <br/>(where applicable)
    Prompt->>Memory: Prompt sent to agent
    Prompt->>Obs: Prompt trace <br/>(redacted when needed)
    Memory-->>Bedrock: Agent prompts LLM
    Bedrock->>Memory: LLM output sent <br/>back to agent
    Memory->>Obs: Output trace <br/>(redacted when needed)
    Memory-->>Platform: Results rendered to UI
```
When you trigger an AI feature:
1. Monte Carlo prepares context: gathers the specific data the feature needs (metadata, aggregated statistics, or samples) from internal systems and object storage.
2. Prompt construction: builds a prompt from feature-specific instructions, the prepared data, and conversation history where applicable.
3. LLM processes the request: the prompt is sent to AWS Bedrock (Anthropic Claude models) for analysis and recommendations (see the sketch after this list).
4. Data handling safeguards:
   - Samples sent to the LLM are discarded immediately after processing and excluded from traces.
   - Only prompts limited to aggregates or metadata receive full trace logging.
   - Nothing is persistently stored in AWS Bedrock.
5. Response delivered: AI-generated insights are returned to you in the Monte Carlo UI.
6. Conversation memory (for conversational features): context is kept for up to 30 days in Monte Carlo's AWS environment to support follow-up questions.
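The following is a minimal sketch of steps 1–5, assuming Python with boto3 and the Anthropic Claude messages format on AWS Bedrock. The model ID, the `log_trace` helper, and the redaction logic are illustrative assumptions, not Monte Carlo's published implementation:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def log_trace(prompt_record: str, output_record: str) -> None:
    """Hypothetical observability hook; stands in for real trace tooling."""
    print({"prompt": prompt_record, "output": output_record})

def run_ai_feature(instructions: str, context: str, contains_samples: bool) -> str:
    # Step 2: prompt construction -- feature-specific instructions plus prepared data.
    prompt = f"{instructions}\n\n<context>\n{context}\n</context>"

    # Step 3: the LLM processes the request. The Bedrock invocation is stateless;
    # nothing from the request persists on the Bedrock side.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    output = json.loads(response["body"].read())["content"][0]["text"]

    # Step 4: safeguard -- traces for prompts containing sample values are
    # redacted; metadata/aggregate-only prompts can be logged in full.
    log_trace("<redacted>" if contains_samples else prompt, output)

    # Step 5: the response is delivered to the UI; the sample data in `context`
    # is not retained beyond this call.
    return output
```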
Data Storage & Handling
| Data Type | Where It Lives | Duration | Used For |
|---|---|---|---|
| Metadata (schemas, table names) | Monte Carlo platform | Persistent | Core observability, AI analysis |
| Query logs | Monte Carlo platform | Persistent | Lineage, usage analytics, AI context |
| Data samples | Object storage | Configurable | Data quality monitoring, AI features |
| AI conversation history | Monte Carlo AWS environment | Up to 30 days | Conversational features |
| LLM processing | AWS Bedrock | Transient only | Real-time AI inference |
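Monte Carlo has not published how the 30-day conversation window is enforced. As one concrete illustration, time-to-live expiry on a key-value store could implement it; in the sketch below, the DynamoDB table name, key schema, and attribute names are all hypothetical:

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")
THIRTY_DAYS = 30 * 24 * 60 * 60  # retention window from the table above, in seconds

def save_conversation_turn(conversation_id: str, turn: str) -> None:
    now = int(time.time())
    dynamodb.put_item(
        TableName="ai_conversation_history",  # hypothetical table
        Item={
            "conversation_id": {"S": conversation_id},
            "created_at": {"N": str(now)},
            "turn": {"S": turn},
            # With DynamoDB TTL enabled on `expires_at`, the item is deleted
            # automatically once the 30-day window elapses.
            "expires_at": {"N": str(now + THIRTY_DAYS)},
        },
    )
```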
Important:
- AI features don't collect new data; they use what's already gathered for core observability
- Data samples are only collected when data sampling is enabled
- Warehouse credentials are stored encrypted in Monte Carlo and are never exposed to AI features
- AWS Bedrock processes requests with no persistent storage
