Troubleshooting Agent Technical Overview
Note: This feature is in public preview.
Overview
Monte Carlo's Troubleshooting Agent is an internally developed application that uses licensed models from Anthropic (accessed via Amazon Bedrock) to identify root causes of data + AI reliability issues.
We've created this section to provide a technical overview of the Troubleshooting Agent and answer frequently asked questions. For technical specs, such as model types and versions, click here.
Note: Data Sampling functionality can be enabled in order to enhance the Troubleshooting Agent.
Primary & Intended Use
The Troubleshooting Agent was built to accelerate time-to-resolution for incidents by automatically investigating telemetry and surfacing likely root causes. It performs best on data warehouses and lakehouses that have data sampling enabled, full lineage instrumentation, query history available, and integrations with GitHub, GitLab, dbt, and Airflow.
The feature's data sampling capabilities will not operate if data sampling is not enabled, and performance degrades when data sources or lineage inputs are incomplete.
How the Troubleshooting Agent Works
The Troubleshooting Agent is designed to help users quickly investigate and resolve incidents by providing real-time, interactive support. When an incident occurs, the system checks if there's already an existing conversation (or "thread") about that incident. If not, it starts a new one. Users interact with the agent through a web interface, where they can ask follow-up questions and receive continuous updates as the investigation progresses. The agent accesses internal data, such as incident history, anomalies, and code changes, by securely connecting to the organization's servers. Each user sees only their own incident threads.
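The thread lookup described above can be illustrated with a minimal sketch. This is not Monte Carlo's implementation: the `Thread` and `ThreadStore` names, the in-memory dictionary, and the choice to key threads by user and incident are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Thread:
    """Hypothetical conversation record for one user and one incident."""
    incident_id: str
    user_id: str
    messages: List[str] = field(default_factory=list)


class ThreadStore:
    """Hypothetical in-memory store; a real service would persist threads."""

    def __init__(self) -> None:
        # Assumption: threads are keyed by (user_id, incident_id) so that
        # each user only ever reaches their own incident threads.
        self._threads: Dict[Tuple[str, str], Thread] = {}

    def get_or_create(self, user_id: str, incident_id: str) -> Thread:
        # Reuse the existing thread for this incident, or start a new one.
        key = (user_id, incident_id)
        if key not in self._threads:
            self._threads[key] = Thread(incident_id=incident_id, user_id=user_id)
        return self._threads[key]

    def threads_for(self, user_id: str) -> List[Thread]:
        # Return only the threads that belong to the requesting user.
        return [t for (uid, _), t in self._threads.items() if uid == user_id]
```

Under these assumptions, calling `get_or_create` twice with the same user and incident returns the same thread, so follow-up questions continue the existing conversation instead of starting over.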
Data Privacy & Data Retention
The Troubleshooting Agent relies on AI models provided through Amazon Bedrock (within AWS) and on a monitoring application provided by LangChain (LangSmith). Both are listed on our Subprocessors page.
Processing Details
As noted above, the Troubleshooting Agent needs historical data in order to provide more intelligent insights.
If the Troubleshooting Agent is used with data sampling enabled, some customer data is retrieved from the customer data store and processed in a new job-specific table within a database for up to one hour after the agent interaction.
This database resides within the Monte Carlo US AWS environment. Upon completion of the job, the job-specific table that contains the processing information is programmatically deleted.
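The job-specific table lifecycle can be sketched as a create, process, drop pattern. The example below is an illustration only: it uses SQLite from the Python standard library, and the `job_table` helper, the `job_<uuid>` naming scheme, and the column layout are hypothetical, not details of the Monte Carlo environment. The one-hour ceiling is not modeled here; a real system would presumably enforce it separately.

```python
import sqlite3
import uuid
from contextlib import contextmanager


@contextmanager
def job_table(conn, columns_sql):
    """Create a uniquely named table for one job and drop it afterwards."""
    # Hypothetical naming scheme: one table per job, named with a random UUID.
    name = f"job_{uuid.uuid4().hex}"
    conn.execute(f'CREATE TABLE "{name}" ({columns_sql})')
    try:
        yield name
    finally:
        # The table is dropped whether the job succeeds or raises.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.commit()


def table_exists(conn, name):
    """Return True if a table with this name is present in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    with job_table(conn, "id INTEGER, value TEXT") as name:
        conn.execute(f'INSERT INTO "{name}" VALUES (1, ?)', ("sampled row",))
        print(table_exists(conn, name))
    print(table_exists(conn, name))
```

Wrapping the table in a context manager ties deletion to the end of the job, so cleanup does not depend on the processing code remembering to do it.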
See Privacy Considerations for the Troubleshooting Agent for additional information.