Marking and tracking incidents is best practice for data teams looking to improve data quality and trust across the business. Incidents should be communicated out to stakeholders when appropriate and reviewed on a monthly cadence to determine where gaps in data quality may lie. Furthermore, severity levels of incidents are useful for understanding impact quickly and setting priorities for data teams.

This change is gradually rolling out to Monte Carlo workspaces over the next week. Learn more about marking incidents and severity here in the docs!

Didn't everything used to be called an incident in Monte Carlo? Yes! See more about that recent change under Introducing: Alerts.

The "Investigating" Alert Status has been renamed to "Acknowledged." This is a minor change ahead of the Incident management changes coming soon. Read more about incident management changes on Introducing: Alerts.

Note: For now, this change applies to the UI and to Slack / MS Teams. The Monte Carlo API and Insight Reports will still show the value as Investigating. This will change to Acknowledged when new APIs are released.
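
If you have scripts that read the status from the API and want them to match what users see in the UI, a small translation step can bridge the gap until the new APIs are released. The following is a minimal sketch only: the function and dictionary names are hypothetical, and the exact status strings returned by the API are an assumption based on the note above.

```python
# Hypothetical mapping from API status values to the labels shown in the UI.
# Only the Investigating -> Acknowledged rename is described in this changelog;
# the exact strings and casing returned by the API are assumptions.
API_TO_UI_STATUS = {"Investigating": "Acknowledged"}


def ui_status_label(api_status: str) -> str:
    """Return the UI label for a status value read from the API.

    Statuses without a known rename are passed through unchanged.
    """
    return API_TO_UI_STATUS.get(api_status, api_status)
```

Keeping the rename in one dictionary means the shim can be deleted in a single place once the API itself reports Acknowledged.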

Automated freshness, volume, and schema change monitoring is now available in beta for SQL Server. Please reach out to the MC team if you are interested in getting access.

For existing SQL Server connections, the MC service account users need an extra permission, GRANT VIEW SERVER STATE, per the docs here.

Tags from dbt models (incl. meta configs) are now shown in dbt alerts!

Our lineage now shades all unmonitored tables gray. This can help highlight where you might have coverage gaps in your monitoring strategy.

Assets which are "lineage only" -- those that cannot be actively monitored, such as BI Reports -- will also show up with a white background.

To add monitoring coverage, you have a few options:

  • Click on the unmonitored table node, and our recommendation engine might suggest the best way to monitor, such as with a Data Product
  • Create a Data Product that includes the unmonitored table, and monitor that table
  • Or go straight to our usage page, and add custom rules for monitoring configuration

We have added an option to create volume rules from the asset page!

  • Our asset page now includes three widgets: Time since last update, Time since last row count change, and Change in row count, with their respective graphs.
  • We've simplified the process of creating volume rules by enabling users to set manual thresholds through the two new widgets: Time since last row count change and Change in row count.
  • We've improved the graphs related to these metrics.

More detail on these changes can be found at Volume monitor

We've said goodbye to the "Uptime" charts on the Data Reliability and Data Product Dashboards. We're working on something new instead, so let us know if you have needs around this type of reporting!

Progressive rollout in the coming weeks

These changes will be rolled out to Workspaces in several phases over the coming weeks. Emails will be sent to all users in the Account Owners, Domain Managers, and Editors roles with additional details on when these changes will take place in their Workspace and any actions required.

New Behavior for Unmonitored Tables

Custom monitors configured to run against unmonitored or muted tables will now fail with an "Error" status. This change ensures that monitoring is consistently applied to all relevant tables, improving the accuracy and effectiveness of your custom monitors.

Enhanced User Interface

When creating new monitors, unmonitored tables will be clearly identified in the table selection process. A modal will also be displayed when creating SQL monitors to highlight any unmonitored tables being referenced.


Viewing monitors with unmonitored tables

Prior to these changes being enabled in your Workspace, we recommend you address any monitors with Unmonitored or Muted tables.

To view Monitors with Unmonitored or Muted tables:

  1. Navigate to Monitors

  2. There will be a new column labeled "Unmonitored tables" or "Muted tables".

  3. This provides a sortable column of the number of tables referenced by that monitor that are currently not monitored. We recommend you either enable monitoring on these tables or disable the monitor if it is no longer in use. There are several ways to resolve this:

    1. If the monitor is no longer useful, disable or delete the monitor. This is done by going to the three dot menu under Actions and clicking "Disable" or "Delete."

    2. If the monitor is still useful and you want to ensure you continue to receive any alerts raised by the monitor, enable monitoring for the table(s) referenced by this monitor.

      1. Click into the monitor to verify what table(s) are "Not monitored" or "Muted".
      2. Navigate to the necessary tables to either enable monitoring on them or un-mute them. Refer to Recommended monitoring strategies for tables.
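
The review steps above can also be sketched in code if you keep your own inventory of monitors and table states. This is an illustrative sketch only, not a Monte Carlo API: the `Monitor` class, the status strings "monitored", "unmonitored", and "muted", and the sample names in the usage note are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical status strings; tables in these states would cause a custom
# monitor to fail with an "Error" status under the new behavior.
NEEDS_ATTENTION = {"unmonitored", "muted"}


@dataclass
class Monitor:
    name: str
    tables: list[str] = field(default_factory=list)


def tables_needing_attention(monitor: Monitor, table_status: dict[str, str]) -> list[str]:
    """Return the tables referenced by the monitor that are unmonitored or muted.

    Tables missing from table_status are treated as unmonitored (an assumption).
    """
    return [
        table
        for table in monitor.tables
        if table_status.get(table, "unmonitored") in NEEDS_ATTENTION
    ]


def triage(monitors: list[Monitor], table_status: dict[str, str]) -> list[tuple[str, list[str]]]:
    """List affected monitors with their problem tables, most affected first."""
    flagged = [(m.name, tables_needing_attention(m, table_status)) for m in monitors]
    flagged = [(name, tables) for name, tables in flagged if tables]
    return sorted(flagged, key=lambda item: len(item[1]), reverse=True)
```

For example, `triage([Monitor("orders_check", ["orders", "customers"])], {"orders": "monitored", "customers": "muted"})` returns the one monitor together with the `customers` table, mirroring the sortable column described in step 3.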

This update reinforces our commitment to providing comprehensive monitoring capabilities and encourages users to enable monitoring for all relevant tables by either un-muting tables or enabling monitoring for them through the Usage UI.

We heard clearly from our customers that not every alert/notification from MC is an "Incident." We also know that reporting and retrospectives on "alerts" are not always necessary, but these activities are critical to the success of data teams when applied to true data incidents.

We're better aligning Monte Carlo to accepted industry tooling and terminology: triage alerts and escalate them to incidents where it makes sense, report on those incidents, and communicate them out to stakeholders if appropriate.

What was previously called “Incidents” in Monte Carlo is now “Alerts.” The use of “Severity” has also been split: “Priority” now applies to Monitors and Alerts, while “Severity” remains the way to mark an Alert as an Incident.

  • "Incidents" today become "Alerts" going forward.
  • Custom Monitors can have a pre-set "Priority," replacing the pre-set "Severity" today.
  • "Priority" from a Custom Monitor is inherited by an Alert, but is not changeable on the Alert.
  • "Alerts" can be marked as "Incidents" by utilizing "Severity." This workflow will be further improved in the coming weeks.
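
The relationship between these terms can be sketched as a small data model. This is an illustration of the rules in the list above, not Monte Carlo's implementation: the class names, the `mark_as_incident` method, and the values "P2" and "SEV-1" in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomMonitor:
    name: str
    # Pre-set priority on the monitor; the value format is an assumption.
    priority: Optional[str] = None


@dataclass
class Alert:
    monitor: CustomMonitor
    # Severity is only set once the alert has been marked as an incident.
    severity: Optional[str] = None

    @property
    def priority(self) -> Optional[str]:
        """Priority is inherited from the monitor and is read-only on the alert."""
        return self.monitor.priority

    @property
    def is_incident(self) -> bool:
        """An alert counts as an incident once a severity has been assigned."""
        return self.severity is not None

    def mark_as_incident(self, severity: str) -> None:
        """Escalate the alert to an incident by assigning a severity."""
        self.severity = severity
```

For example, an `Alert(CustomMonitor("orders_volume", priority="P2"))` reports `priority == "P2"` and `is_incident == False` until `mark_as_incident("SEV-1")` is called. Attempting to assign `alert.priority` raises an `AttributeError`, which models the rule that priority cannot be changed on the alert.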

More detail on these changes can be found at Introducing: Alerts.