Triage Agent

The Triage Agent automatically prioritizes and scores Monte Carlo alerts, helping data teams cut through alert noise and focus on the incidents that matter most. It evaluates each alert on two dimensions: incident likelihood (how likely is it that this is a real problem?) and alert impact (how large is the downstream blast radius?). Each dimension receives a score of HIGH, MEDIUM, or LOW.

How it works

When triggered, the Triage Agent runs a multi-step pipeline for each alert:

  1. Assessment — The agent gathers context about the alert in parallel: anomaly data, failed or futile queries, downstream impact via lineage, 60-day alert history, business criticality of affected assets, recent query changes, orchestrator issues (Airflow, dbt, Databricks), and monitor configuration details.

  2. Scoring — Using the gathered context, the agent scores the alert on two axes:

    • Incident likelihood — Is this anomaly likely to be a real data quality issue? (HIGH / MEDIUM / LOW)
    • Alert impact — If this is a real issue, how large is the downstream blast radius? (HIGH / MEDIUM / LOW)

  3. Classification — The agent classifies the alert into one of several categories: intentional change, natural variation, possible incident, resolved, verified ongoing, or other.

  4. Conditional deep investigation — If either score is HIGH, or both scores are MEDIUM, the agent automatically invokes the Troubleshooting Agent for deeper root cause analysis. Lower-scored alerts skip this step.

  5. Findings — The agent creates findings that appear in the Agentic Platform findings feed with the assessment, scoring rationale, and classification for each alert.
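The escalation rule in step 4 can be written as a small predicate. The following is a minimal sketch for illustration only, not Monte Carlo's implementation; the Score enum and the needs_deep_investigation function are hypothetical names introduced here.

```python
from enum import Enum


class Score(Enum):
    """The three score levels the Triage Agent assigns on each axis."""
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def needs_deep_investigation(likelihood: Score, impact: Score) -> bool:
    """Return True when the step 4 rule would invoke the Troubleshooting Agent.

    Hypothetical helper: escalate if either score is HIGH,
    or if both scores are MEDIUM.
    """
    either_high = Score.HIGH in (likelihood, impact)
    both_medium = likelihood is Score.MEDIUM and impact is Score.MEDIUM
    return either_high or both_medium
```

Under this reading of the rule, six of the nine possible score pairs trigger the deeper investigation; only MEDIUM/LOW, LOW/MEDIUM, and LOW/LOW skip it.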

Accessing the Triage Agent

In the Monte Carlo UI

There are two ways to trigger the Triage Agent from the UI:

  • Alerts page — Click the Triage alerts button to run the agent on your currently filtered alerts (up to 50 at a time). The agent processes alerts in batch and buckets them into high, medium, and low priority groups. Each triaged alert gets a View triage reasoning icon showing the agent's scoring rationale.
  • Operations Agent — Ask the Operations Agent to triage your alerts in natural language, and it routes to the Triage Agent automatically.

Results appear in the Agentic Platform findings feed, where you can filter for Triage Agent findings. In Slack, each triaged alert receives a colored indicator on its title (🔴 high, 🟠 medium, 🔵 low), and a brief summary is posted as a threaded comment.

Note: When used from the UI or MCP, the Triage Agent scores and classifies alerts but does not take automated actions (such as updating alert statuses or assigning owners). Automated actions are available when the Triage Agent runs as part of a configured automated workflow.

Via the Agent Toolkit (MCP)

The automated-triage skill in the Agent Toolkit exposes triage through MCP tools in AI coding agents like Claude Code and Cursor. You can trigger it with natural language:

  • "Triage my alerts"
  • "What alerts are firing?"
  • "Triage alert <uuid>"

The skill provides these MCP tools: get_alerts, alert_assessment, run_troubleshooting_agent, get_troubleshooting_agent_results, update_alert, set_alert_owner, create_or_update_alert_comment, and mark_event_as_normal.
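AI coding agents normally issue these tool calls for you, so the sketch below is only meant to show what an invocation looks like on the wire. It builds a generic MCP tools/call request (JSON-RPC 2.0) for one of the tools named above. The build_tool_call helper and the alert_id argument name are assumptions made for illustration; the real argument names come from each tool's published input schema, which this page does not list.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this illustrative client.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message.

    Hypothetical helper: a real MCP client library would normally
    construct and send this message for you.
    """
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "alert_id" is an assumed argument name, not a documented one.
print(build_tool_call("alert_assessment", {"alert_id": "<uuid>"}))
```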

Automated workflow actions

When the Triage Agent runs as part of an automated workflow, it can take additional actions based on its assessment:

Action                     Description
Update alert status        Set status to NO_ACTION_NEEDED, EXPECTED, FIXED, or ACKNOWLEDGED
Declare incident severity  Escalate to SEV-1 through SEV-4
Assign alert owners        Route the alert to the appropriate team member
Post triage comments       Add assessment reasoning directly on the alert
Mark events as normal      Recalibrate ML thresholds when an anomaly is expected behavior
Post Slack summaries       Share triage results in your configured Slack channels
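If you keep workflow settings in your own tooling, it can help to check status and severity values against the ones listed above before using them. The following is a minimal sketch under that assumption; validate_action is a hypothetical helper, not part of any Monte Carlo SDK, and the only values it knows about are those named on this page.

```python
from typing import List, Optional

# Values taken from the actions listed above.
ALLOWED_STATUSES = {"NO_ACTION_NEEDED", "EXPECTED", "FIXED", "ACKNOWLEDGED"}
ALLOWED_SEVERITIES = {f"SEV-{level}" for level in range(1, 5)}  # SEV-1 .. SEV-4


def validate_action(status: Optional[str] = None,
                    severity: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems
```

For example, validate_action(status="FIXED", severity="SEV-2") returns an empty list, while a severity of "SEV-5" is reported as unknown.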

Security & data privacy

For more detail on security and data privacy, see the AI Features and Technical Information documentation.