Troubleshooting Agent Technical Overview

Note: This feature is in public preview.

Overview

Monte Carlo's Troubleshooting Agent is an internally developed application that uses licensed models from Anthropic (accessed via Amazon Bedrock) to identify root causes of data + AI reliability issues.

This section provides a technical overview of the Troubleshooting Agent and answers frequently asked questions. For technical specs, such as model types and versions, click here.

Note: Data Sampling functionality can be enabled in order to enhance the Troubleshooting Agent.

Primary & Intended Use

The Troubleshooting Agent was built to accelerate time-to-resolution for incidents by automatically investigating telemetry and surfacing likely root causes. It works best on data warehouses and lakehouses that have data sampling enabled, full lineage instrumentation, query history available, and integrations with GitHub, GitLab, dbt, and Airflow.

The data sampling capabilities of the feature will not operate if data sampling is not enabled, and performance degrades when data sources or lineage inputs are incomplete.

How the Troubleshooting Agent Works

The Troubleshooting Agent is designed to help users quickly investigate and resolve incidents by providing real-time, interactive support. When an incident occurs, the system checks if there’s already an existing conversation (or “thread”) about that incident. If not, it starts a new one. Users interact with the agent through a web interface, where they can ask follow-up questions and receive continuous updates as the investigation progresses. The agent accesses internal data—like incident history, anomalies, and code changes—by securely connecting to the organization’s servers. Each user sees only their own incident threads.
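The thread lookup described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `Thread` and `ThreadStore` names, the in-memory dictionary, and the choice to key threads by user and incident are assumptions made for illustration, not a description of Monte Carlo's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One conversation about one incident (hypothetical structure)."""
    incident_id: str
    user_id: str
    messages: list[str] = field(default_factory=list)


class ThreadStore:
    """Hypothetical in-memory store, used only to illustrate the flow."""

    def __init__(self) -> None:
        # Assumption: threads are keyed by (user, incident), so a user
        # only ever reaches threads they own.
        self._threads: dict[tuple[str, str], Thread] = {}

    def get_or_create(self, user_id: str, incident_id: str) -> Thread:
        """Reuse the existing thread for this incident, or start a new one."""
        key = (user_id, incident_id)
        if key not in self._threads:
            self._threads[key] = Thread(incident_id=incident_id, user_id=user_id)
        return self._threads[key]

    def threads_for(self, user_id: str) -> list[Thread]:
        """Return only the threads that belong to the given user."""
        return [t for (uid, _), t in self._threads.items() if uid == user_id]
```

In this sketch, calling `get_or_create` twice with the same user and incident returns the same thread, so follow-up questions land in the existing conversation rather than starting a new one.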

Data Privacy & Data Retention

The Troubleshooting Agent relies on AI models provided through Amazon Bedrock (within AWS) and on LangSmith, a monitoring application provided by LangChain. Both are listed on our Subprocessors page.

Processing Details

As noted above, the Troubleshooting Agent needs historical data in order to provide more intelligent insights.

If the Troubleshooting Agent is used with data sampling enabled, some aggregated row-level data will be retrieved from the customer data store and processed. This data is also stored within a database in the Monte Carlo AWS environment for up to 30 days.
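As a rough illustration of what a 30-day retention window means in practice, the sketch below filters out records older than the window. Only the 30-day figure comes from the text above. The function names, the record layout, and the purge approach are assumptions, and the source does not describe how retention is actually enforced.

```python
from datetime import datetime, timedelta

# From the text above: sampled data is stored "for up to 30 days".
RETENTION = timedelta(days=30)


def is_expired(stored_at: datetime, now: datetime) -> bool:
    """True once a record has been held for the full retention window."""
    return now - stored_at >= RETENTION


def purge(records: dict[str, datetime], now: datetime) -> dict[str, datetime]:
    """Return only the records still inside the retention window.

    `records` maps a hypothetical record ID to the time it was stored.
    """
    return {rid: ts for rid, ts in records.items() if not is_expired(ts, now)}
```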

See Privacy Considerations for the Troubleshooting Agent for additional information.
