Troubleshooting agent (public preview)

In preview

This feature is in preview. See the Integration & Feature Lifecycles documentation for more information on what this means.

When you receive an alert from Monte Carlo, the troubleshooting agent can automatically work through hundreds of hypotheses and highlight the ones most likely to have caused the issue. When analyzing an alert, it considers changes in the data, system issues (e.g. Airflow or dbt failures), and code changes, and it automatically traverses lineage to identify the root cause.
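To make the idea of traversing lineage more concrete, here is a minimal illustrative sketch. It is not Monte Carlo's implementation: the function `likely_origins`, the shape of the `lineage` mapping, and the table names are all hypothetical. It assumes you already know each table's upstream parents and which tables currently show an anomaly, and it walks upstream from the alerted table to find the furthest-upstream anomalous tables.

```python
from collections import deque


def likely_origins(lineage, anomalous, alerted):
    """Walk upstream from `alerted` through anomalous tables only.

    lineage:   dict mapping a table name to a list of its upstream parents
               (hypothetical structure, for illustration only)
    anomalous: set of table names that currently show an anomaly
    alerted:   the table the alert fired on

    Returns the tables reached that have no anomalous parent, i.e. the
    furthest-upstream points where the problem is still visible.
    """
    origins = []
    seen = {alerted}
    queue = deque([alerted])
    while queue:
        table = queue.popleft()
        bad_parents = [p for p in lineage.get(table, []) if p in anomalous]
        if not bad_parents:
            # Nothing upstream looks wrong, so the issue likely starts here.
            origins.append(table)
        for parent in bad_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return origins


# Hypothetical lineage graph and anomaly set.
lineage = {
    "analytics.daily_revenue": ["staging.orders", "staging.customers"],
    "staging.orders": ["raw.orders"],
    "staging.customers": ["raw.customers"],
}
anomalous = {"analytics.daily_revenue", "staging.orders", "raw.orders"}

print(likely_origins(lineage, anomalous, "analytics.daily_revenue"))
# prints ['raw.orders']
```

In this toy example the anomaly on `analytics.daily_revenue` is traced back through `staging.orders` to `raw.orders`, while the healthy `staging.customers` branch is ruled out without further inspection.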

Agent context

The agent uses the same metadata, query logs, and metrics that Monte Carlo collects from your integrations for monitoring, which lets it quickly rule out or further investigate many root-cause hypotheses. The agent performs best on data warehouses and lakehouses with data sampling enabled, full lineage instrumentation, query history, and active integrations such as GitHub, GitLab, dbt, Databricks Workflows, and Airflow.

Agent investigations & hypotheses

Below are a few examples of investigation paths that the agent can follow.

| Type | Hypothesis investigated |
| --- | --- |
| Row count changes | Have there been similar row count changes upstream? |
| Query changes | Was a query that usually runs modified? |
| Job failure | Has a dbt model, Databricks job, or Airflow DAG failed? |
| Failed queries | Did a query that usually runs fail or error? |
| Missing query | Is a query that usually runs missing from the logs? |
| Non-writing query | Is a query that usually writes or updates data writing zero updates? |
| Additional query | Did a query that hasn't been seen before run? |
| Validation failure | Has a validation on this asset recently failed? |
| Pull requests | Was there a recent pull request? |
| Data analysis | Are there underlying correlations in the data of affected records? (Currently only available for cloud deployments) |
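
As a rough illustration of what the query-based hypotheses above mean, the sketch below compares the queries that usually run against what appeared in a recent window. This is an assumption-laden example, not the agent's actual logic: the function `compare_query_runs`, the query names, and the `"success"`/`"failed"` status strings are all hypothetical.

```python
def compare_query_runs(baseline, recent):
    """Classify queries for the missing / additional / failed hypotheses.

    baseline: set of query identifiers that usually run (hypothetical)
    recent:   dict mapping query identifier -> status string for the
              recent window, e.g. "success" or "failed" (hypothetical)
    """
    observed = set(recent)
    return {
        # Usually runs, but did not appear in the recent logs.
        "missing": sorted(baseline - observed),
        # Appeared recently, but is not part of the usual set.
        "additional": sorted(observed - baseline),
        # Usually runs, appeared recently, and ended in failure.
        "failed": sorted(q for q in baseline & observed if recent[q] == "failed"),
    }


# Hypothetical data for illustration.
baseline = {"load_orders", "load_customers", "refresh_revenue"}
recent = {
    "load_orders": "failed",
    "refresh_revenue": "success",
    "backfill_orders": "success",
}

print(compare_query_runs(baseline, recent))
# prints {'missing': ['load_customers'], 'additional': ['backfill_orders'], 'failed': ['load_orders']}
```

Each non-empty bucket corresponds to one row of the table: a hypothesis that warrants further investigation rather than a confirmed root cause.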

Preview caveats

During the preview phase, the following situations are not yet supported:

Security & data privacy

No customer data will be used for current or future model training.

The troubleshooting agent relies on AI models provided by Amazon Bedrock, in addition to a monitoring application provided by LangChain (LangSmith). These vendors are listed on our Subprocessors page.

For more detail on agent security and data privacy, see the AI Features and Technical Information documentation.

Feedback

Have feedback or requests on the troubleshooting agent? Reach out to Monte Carlo support or [email protected].