The troubleshooting agent now presents supporting evidence in a structured timeline format, including PR diffs, making it easier to follow the reasoning behind each recommendation.

You can also point to any specific piece of evidence and ask the agent to reconsider it, with the option to include additional context or clarification. This gives you more control over the troubleshooting process and helps you arrive at the answer you need faster.


Agent evaluation monitors and metric monitor prompt configurations now support custom LLM model names.

You can type any model name directly into the model selector instead of being limited to a predefined list, making it easy to use the latest models as soon as they are available.

The dropdown still shows all predefined options for convenience, with custom values clearly labeled. This gives your team the flexibility to route LLM requests to any model identifier your environment requires.

You can now create production-ready LLM-as-a-judge evaluations by simply describing what you want to measure. Type a short description of the dimension you care about, hit Generate, and get a complete eval prompt ready for production.

Starter templates are included for common evaluation dimensions like answer relevance, helpfulness, task completion, language match, clarity, prompt adherence, and semantic similarity. Advanced controls let you fine-tune scoring criteria and strictness levels to match your specific requirements.
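Conceptually, a generated judge prompt combines the dimension you describe with scoring criteria and a strictness level. Here is a minimal illustrative sketch in Python; the `build_judge_prompt` helper and the prompt wording are hypothetical, not the product's actual generated output:

```python
def build_judge_prompt(dimension: str, criteria: list[str],
                       strictness: str = "standard") -> str:
    """Assemble a simple LLM-as-a-judge prompt.

    Illustrative only: the real eval prompts are produced by the
    Generate step, not by this function.
    """
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        f"You are an evaluator judging responses for: {dimension}.\n"
        f"Apply these criteria ({strictness} strictness):\n"
        f"{bullet_list}\n"
        "Return PASS or FAIL with a one-sentence justification."
    )

prompt = build_judge_prompt(
    "answer relevance",
    ["directly addresses the user's question", "no off-topic content"],
    strictness="strict",
)
```

The same shape applies to any starter dimension: swap in a different description and criteria list, and tighten or relax strictness as needed.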


You can now track and manage individual breached rows from Custom SQL and Validation monitors directly in Monte Carlo. When a primary key is configured on a monitor, breached rows are tracked across runs in a new Exceptions tab. From there, you can assign an owner, set a resolution status, add comments, take bulk actions, and track how long each exception has been open.

Learn more here: https://docs.getmontecarlo.com/docs/exception-management

Agent assets now include out-of-the-box dashboards showing trace volume, latency distributions across P50/P95/P99, token consumption trends, and error rates, all with automatic period-over-period comparisons. No configuration is required: connect your OpenTelemetry traces and the views are ready.

Whether you need to spot spikes in token usage, catch latency drifting upward, or confirm your agents are behaving as expected, these dashboards give your team immediate visibility from day one with a natural path to production-grade alerting as your agents mature.

You can now define FireHydrant Incident Tags on any FireHydrant notification channel within an Audience. Configure key/value or key-only tags, and they are automatically included on every alert sent through that channel.

Incident tags flow through automatically, ensuring incidents arrive in FireHydrant with the right routing and categorization without any extra manual steps.

Learn more here: https://docs.getmontecarlo.com/docs/firehydrant#firehydrant-incident-tags


Agent monitors can now be cloned: duplicate an existing monitor's configuration and point it at a different span.

When you're monitoring multiple spans with similar setups, cloning saves you from repetitive configuration and lets you scale monitor coverage across your agents faster.


Metric monitors now support custom prompt-based evaluations on your tables — no external pipelines, no SQL hacks.

Write a prompt and plug in any table fields as variables (e.g., "Is {{SUMMARY}} an accurate summary of {{TRANSCRIPTION}}?"), pick your output type (string, numeric, or boolean), choose your LLM, and configure sampling. Monte Carlo handles the rest.
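Conceptually, the templating behaves like placeholder substitution over a table row. A minimal sketch in Python, assuming a row is available as a dict; the `render_prompt` and `parse_boolean` helpers are illustrative, not part of Monte Carlo's API, which performs this work for you:

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Replace {{FIELD}} placeholders with values from a table row."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

def parse_boolean(llm_output: str) -> bool:
    """Map a boolean-output-type eval response onto True/False."""
    return llm_output.strip().lower().startswith(("true", "yes"))

template = "Is {{SUMMARY}} an accurate summary of {{TRANSCRIPTION}}?"
row = {
    "SUMMARY": "Customer asked for a refund; agent approved it.",
    "TRANSCRIPTION": "Caller requested a refund and the agent approved it.",
}
prompt = render_prompt(template, row)
```

A sampled subset of rows is rendered this way, sent to the configured LLM, and the responses are parsed into the chosen output type for the metric.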

Whether your warehouse holds AI-generated summaries, extracted features, model scores, or anything in between — you have a native way to catch bad outputs in your tables before they become bad decisions. Available for Snowflake, Databricks, and BigQuery.

Learn more here: https://docs.getmontecarlo.com/docs/metric-monitors#configuring-a-prompt-transformation


Propose monitors, collaborate on the right configuration, and review before anything goes live — now available for Multi-table Metric Monitors.

We recently launched Draft Monitors to give teams more control over the monitor creation process. Viewers can propose monitors and share them with teammates for feedback. Editors, Admins, and Owners can review the configuration, make changes if needed, and enable the monitors when the time is right.