Reducing noise from Monitors
When the conditions of a SQL or Validation Monitor are breached, the monitor creates an Incident. The Incident triggers a notification according to the configuration in Settings > Notifications.
Some monitors remain in a ‘breached’ state for an extended period. This can produce repetitive notifications, especially when the monitor runs on a frequent schedule, and lead to alert fatigue.
Monte Carlo offers ways to minimize repetitive notifications in these scenarios. The options are available in the Send notifications section when you create a SQL or Validation Monitor.
For SQL and Validation Monitors
Select one of the following options:
While the threshold stays violated, send a notification and then:
- Reduce noise: send another notification every [ X ] runs of the monitor
- Reduce noise: send another notification only if the value or count of breached rows changes
- Notify every time
Note: In SQL Rules, these options are only available when an Absolute threshold is selected. They are not available for rules with Automatic or Relative thresholds, as those are less prone to repetitive notifications.
Here is an example of how these options play out over 25 runs of a hypothetical SQL Rule whose breach condition is count > 0. In this scenario, Option 1 (send another notification every [ X ] runs) is set with X = 4, so another notification is sent every 4 consecutive breached runs.
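The following Python sketch simulates the same kind of scenario. The per-run counts, the function name notified_runs, and the mode names ("every_n", "on_change", "every_time") are hypothetical. It also assumes that a breach streak ends, and its counters reset, as soon as a run does not breach, and that "value changes" is judged against the previous breached run. This is an illustration of the described behavior, not Monte Carlo's implementation.

```python
from typing import Optional

# Hypothetical count of breached rows for each of 25 runs (run 1 is first).
HYPOTHETICAL_COUNTS = [0, 3, 3, 3, 3, 3, 5, 5, 5, 0,
                       0, 2, 2, 2, 2, 2, 2, 2, 2, 7,
                       7, 0, 0, 1, 1]


def notified_runs(counts: list[int], mode: str, every_n: int = 4) -> list[int]:
    """Return the 1-based run numbers that would send a notification.

    Assumption: a non-breaching run ends the current streak of breaches.
    """
    notified = []
    streak = 0  # consecutive breached runs seen so far in the current streak
    last_value: Optional[int] = None  # count at the previous breached run
    for run, count in enumerate(counts, start=1):
        if not count > 0:  # the example's breach condition is count > 0
            streak = 0
            last_value = None
            continue
        if mode == "every_time":
            send = True
        elif mode == "every_n":
            # Notify on the first breach, then every `every_n` breached runs.
            send = streak % every_n == 0
        elif mode == "on_change":
            # Notify on the first breach, then only when the count changes.
            send = streak == 0 or count != last_value
        else:
            raise ValueError(f"unknown mode: {mode}")
        if send:
            notified.append(run)
        streak += 1
        last_value = count
    return notified


if __name__ == "__main__":
    for mode in ("every_n", "on_change", "every_time"):
        print(mode, notified_runs(HYPOTHETICAL_COUNTS, mode))
```

With these hypothetical counts, "every_time" notifies on all 20 breached runs, "every_n" with X = 4 notifies on runs 2, 6, 12, 16, 20 and 24, and "on_change" notifies on runs 2, 7, 12, 20 and 24.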
In Monitors as Code
The configuration options for Reduce Noise are called event_rollup_count and event_rollup_until_changed. Visit the Monitors as Code documentation for full details.
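As a rough aid for reading a monitor definition, the Python sketch below maps the two settings to the equivalent UI option. The helper describe_rollup is hypothetical, and it assumes that event_rollup_count is a positive integer, that event_rollup_until_changed is a boolean, that leaving both unset means "Notify every time", and that the two are not combined (since the UI lets you select only one option). Check the Monitors as Code documentation for the actual field types and validation rules.

```python
from typing import Optional


def describe_rollup(event_rollup_count: Optional[int] = None,
                    event_rollup_until_changed: bool = False) -> str:
    """Describe which Reduce Noise option a pair of settings corresponds to.

    Hypothetical helper; field types and the mutual exclusivity of the two
    settings are assumptions, not confirmed Monte Carlo behavior.
    """
    if event_rollup_count is not None and event_rollup_until_changed:
        raise ValueError(
            "choose either event_rollup_count or event_rollup_until_changed")
    if event_rollup_count is not None:
        if event_rollup_count < 1:
            raise ValueError("event_rollup_count must be a positive integer")
        return ("Reduce noise: send another notification every "
                f"{event_rollup_count} runs")
    if event_rollup_until_changed:
        return ("Reduce noise: send another notification only if the value "
                "or count of breached rows changes")
    return "Notify every time"
```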
Additional notes
Regardless of the configuration you select, Monte Carlo does not group more than 100 successive breaches of a monitor together. This prevents a monitor from breaching silently for weeks on end when it may be wiser to disable it and stop accumulating compute costs. Once 100 successive breaches of a monitor have been grouped together, the next breach generates a new incident.
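The 100-breach cap can be sketched in Python as follows. The function assign_incidents and its numbering of incidents are hypothetical, and it assumes that a non-breaching run ends the current group, so the next breach after it starts a new incident. Only the cap of 100 comes from this page.

```python
from typing import Optional

MAX_GROUPED_BREACHES = 100  # cap on successive breaches grouped together


def assign_incidents(breached: list[bool],
                     cap: int = MAX_GROUPED_BREACHES) -> list[Optional[int]]:
    """Return an incident number per run, or None for non-breaching runs."""
    incident_ids: list[Optional[int]] = []
    current_id = 0
    grouped = 0  # breaches grouped into the current incident so far
    for is_breach in breached:
        if not is_breach:
            grouped = 0  # assumption: a passing run closes the current group
            incident_ids.append(None)
            continue
        if grouped == 0 or grouped >= cap:
            # First breach of a streak, or the cap was reached: new incident.
            current_id += 1
            grouped = 0
        grouped += 1
        incident_ids.append(current_id)
    return incident_ids
```

For example, 250 consecutive breaches would be split into three incidents of 100, 100 and 50 breaches.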