Troubleshooting agent (public preview)
This feature is in preview. See the Integration & Feature Lifecycles documentation for more information on what this means.
When you receive an alert from Monte Carlo, the troubleshooting agent can automatically work through hundreds of hypotheses and highlight the ones most likely to have caused the issue. When analyzing an alert, it considers changes in the data, system issues (e.g. Airflow or dbt failures), and code changes, and it automatically traverses lineage to identify the root cause.
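To make the idea of traversing lineage toward a root cause concrete, here is a minimal sketch. It is not the agent's actual implementation: the `UPSTREAM` map, the `ANOMALOUS` set, the table names, and the `furthest_upstream_anomalies` function are all hypothetical and exist only for illustration. The sketch walks upstream from the alerting table, follows only parents that also look anomalous, and reports the tables where the trail ends.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it reads from.
UPSTREAM = {
    "analytics.daily_revenue": ["staging.orders", "staging.refunds"],
    "staging.orders": ["raw.orders"],
    "staging.refunds": ["raw.refunds"],
    "raw.orders": [],
    "raw.refunds": [],
}

# Hypothetical set of tables that currently show an anomaly of their own.
ANOMALOUS = {"analytics.daily_revenue", "staging.orders", "raw.orders"}


def furthest_upstream_anomalies(start, upstream, anomalous):
    """Breadth-first walk upstream through anomalous tables only.

    Returns the visited tables that have no anomalous parent, i.e. the
    points where the anomaly appears to originate.
    """
    roots = []
    seen = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        bad_parents = [p for p in upstream.get(table, []) if p in anomalous]
        if not bad_parents:
            roots.append(table)
        for parent in bad_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


candidates = furthest_upstream_anomalies("analytics.daily_revenue", UPSTREAM, ANOMALOUS)
print(candidates)
```

With the sample data above, the walk follows `staging.orders` to `raw.orders` and stops there, because `raw.orders` has no anomalous parent. A real investigation would then check that table against the other hypotheses (failed jobs, changed queries, recent pull requests).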

Agent context
The agent relies on the same metadata, query logs, and metrics that are collected from various integrations for monitoring, which lets it quickly rule out or further investigate many root-cause hypotheses. The agent works best on data warehouses and lakehouses with data sampling enabled, full lineage instrumentation, query history, and active integrations such as GitHub, GitLab, dbt, Databricks Workflows, and Airflow.
Agent investigations & hypotheses
Below are a few examples of investigation paths that the agent can perform.
Type | Hypothesis investigated |
---|---|
Row count changes | Have there been similar row count changes upstream? |
Query changes | Was there a query that usually runs that was modified? |
Job failure | Has a dbt model, Databricks job, or Airflow DAG failed? |
Failed queries | Did a query that usually runs fail or return an error? |
Missing query | Is a query that usually runs missing in the logs? |
Non-writing query | Is a query that usually writes or updates data writing zero updates? |
Additional query | Did a query that hasn't been seen before run? |
Validation failure | Has a validation on this asset recently failed? |
Pull requests | Was there a recent pull request? |
Data analysis | Are there underlying correlations in the data of affected records? (Currently only available for Cloud deployments) |
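Several of the query-related hypotheses in the table amount to comparing recent query activity against a baseline of what usually runs. The sketch below illustrates that idea only; it does not reflect how the agent is built. The `QueryRun` record, its fields, the `check_query_hypotheses` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRun:
    fingerprint: str   # hypothetical stable ID for a normalized query
    rows_written: int  # rows inserted or updated by this run
    succeeded: bool    # False if the run failed or returned an error


def check_query_hypotheses(baseline, recent):
    """Compare recent runs with baseline runs, one finding list per hypothesis."""
    usual = {run.fingerprint for run in baseline}
    usually_writes = {run.fingerprint for run in baseline if run.rows_written > 0}
    seen = {run.fingerprint for run in recent}
    return {
        # A query that usually runs is absent from the recent logs.
        "missing_query": sorted(usual - seen),
        # A query that has not been seen before ran.
        "additional_query": sorted(seen - usual),
        # A query that usually runs failed.
        "failed_queries": sorted(
            {r.fingerprint for r in recent if r.fingerprint in usual and not r.succeeded}
        ),
        # A query that usually writes data succeeded but wrote nothing.
        "non_writing_query": sorted(
            {
                r.fingerprint
                for r in recent
                if r.succeeded and r.fingerprint in usually_writes and r.rows_written == 0
            }
        ),
    }


baseline = [
    QueryRun("load_orders", 1200, True),
    QueryRun("load_refunds", 40, True),
    QueryRun("refresh_dim", 300, True),
]
recent = [
    QueryRun("load_orders", 0, True),
    QueryRun("refresh_dim", 0, False),
    QueryRun("backfill_orders", 5000, True),
]

findings = check_query_hypotheses(baseline, recent)
for hypothesis, queries in findings.items():
    print(hypothesis, queries)
```

In this sample, `load_refunds` is flagged as missing, `backfill_orders` as additional, `refresh_dim` as failed, and `load_orders` as non-writing. Each finding is only a lead to investigate further, not a confirmed root cause.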
Preview caveats
During the preview phase, the following are not yet supported:
- Data analysis using Data Sampling:
  - For Hybrid deployments and Cloud deployments hosted in the EU
  - For Metric Monitor alerts
- Some monitor alert types (Custom SQL, Comparison, and merged alerts)
Security & data privacy
No customer data will be used for current or future model training.
The Troubleshooting Agent relies on AI models provided by Amazon Bedrock and on LangSmith, a monitoring application provided by LangChain. Both vendors are listed on our Subprocessors page.
For more detail on agent security and data privacy, see the AI Features and Technical Information documentation.
Feedback
Have feedback or requests on the troubleshooting agent? Reach out to Monte Carlo support or [email protected].