Interacting with Incidents

Discover and troubleshoot anomalous events happening in the data assets within your data ecosystem from the Incidents pages in Monte Carlo.

Exploring the Incidents Feed

The Incidents page shows a feed of all incidents, past and ongoing. Use the filters on the left to narrow the feed by:

  • Status
  • Incident Type
  • Owner
  • Table
  • Dataset
  • Tag
  • Severity

Each row in the feed summarizes an incident and provides a few quick-access incident management tools that allow you to:

  • Assign an owner to the incident
  • Classify severity
  • Manage investigation status

Each row can be expanded to see the details of an incident.

Incidents feed with filters applied and one incident expanded

For information on the various types of incidents found in the Incidents feed, refer to the Intro to monitors section.

Select multiple incidents by checking the box at the start of each row. This allows you to assign an owner, severity, and status to multiple incidents at the same time.

Incidents feed with multiple incidents selected

Using Incident IQ

The Incident IQ page is accessible from the Incidents feed by clicking the incident title. Clicking the title takes you to the Summary page in Incident IQ.

Incident IQ Page

On the left-hand side, there is an incident timeline that lists the tables involved in the incident and their anomalous events. Incidents are grouped together if they are potentially related, so you can see the full impact of an incident.

Incident Grouping

Incidents from different tables are grouped:

  • If they are in the same schema and occurred in a 5-hour window
  • If they are connected by lineage and occurred in a 5-hour window, even if they are in different schemas
  • If you opted into grouping repetitive dbt model errors into the same incident
  • If you opted into grouping repetitive dbt test failures into the same incident

Note that operational incidents are only grouped with operational incidents, and data incidents are only grouped with data incidents. Grouped operational incidents include freshness anomalies, volume not updating, and dbt and Airflow failures. Grouped data incidents include volume changes and field anomalies.
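The schema, lineage, and incident-category rules above can be sketched as a simple pairwise check. This is an illustration only, not Monte Carlo's implementation: the `Event` type, the `should_group` function, and the `lineage` set are hypothetical names, "occurred in a 5-hour window" is assumed to mean the two detection times are at most 5 hours apart, and lineage is assumed to be a set of directly connected table pairs. The opt-in dbt grouping rules are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "a 5-hour window" means detection times at most 5 hours apart.
WINDOW = timedelta(hours=5)


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an anomalous event on one table."""
    table: str             # fully qualified table name
    schema: str            # schema the table belongs to
    kind: str              # "operational" or "data"
    detected_at: datetime  # when the anomaly was detected


def should_group(a: Event, b: Event, lineage: set[frozenset[str]]) -> bool:
    """Return True if two events would fall into the same incident
    under the schema and lineage rules described above."""
    # Operational incidents only group with operational, data with data.
    if a.kind != b.kind:
        return False
    # Both grouping rules require the events to be close in time.
    if abs(a.detected_at - b.detected_at) > WINDOW:
        return False
    same_schema = a.schema == b.schema
    # Assumption: lineage holds unordered pairs of directly linked tables.
    linked = frozenset((a.table, b.table)) in lineage
    return same_schema or linked
```

For example, two data events in different schemas detected one hour apart are grouped only if their tables appear as a pair in `lineage`.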

Summary

The Summary menu is the landing page of Incident IQ and contains a quick high-level view of the incident details.

Incident Summary

Here, the following information is provided:

  • Tables - a list of tables involved in the incident. Clicking on a table in this list will take you to the Catalog page for that table.
  • Notification Channels - a list of Notification Channels which were alerted to this incident. Clicking on a Slack Notification Channel will take you to that channel in your Slack instance.
  • Linked Issues - Jira or ServiceNow tickets created for the incident.
  • Downstream Reports - a list of potentially impacted downstream BI reports and their users.

Potentially Impacted Reports

Incident Management

From Incident IQ, there are several features available to aid in incident management.

Note that each of these features is also accessible from the incident feed.

Owner & Severity

Assign an owner to make clear who is responsible for investigating the incident, and a severity to classify the incident.

Assigning Severity

Status

Update the status of an incident to track the progress of the investigation. Status updates are also useful for analytics and reporting, and can help in defining and meeting SLAs.

Status Update

Comments & Activity Log

Add comments to the incident to track notes and findings. Any severity, status, or owner updates are logged here as well.

Add comments on the incident

Incident feedback

Provide feedback about the incident to help Monte Carlo better serve you.

Share feedback on incidents

Clicking either of the feedback options helps in two ways:

  • Feedback is funneled directly to the Monte Carlo Product & Engineering teams who use it to improve the product.
  • If you choose to do so, the machine learning models working in your environment will be tuned accordingly. For example, clicking the positive feedback icon presents the following menu:

There are other ways you can tune the ML models in your environment. For more information, please refer to the following sections:

Incident Chart

Under each event from the incident timeline is a graph that provides visual insight into why the incident was raised.

Graph on Summary menu of Incident IQ

Chart on an anomaly where the table is not changing in size

In this example, the incident originated from an automated (out-of-the-box) monitor that tracks patterns in Volume change:

  • The blue line on the graph represents the change in Volume over the previous week.
  • The section highlighted in red indicates a deviation from the normal pattern.

In this case, the deviation is a halt in Volume changes. In other words, the table has not changed in size as expected based on historical trends.
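To make the idea of a halt in Volume changes more concrete, here is a minimal sketch of a rule that flags a table whose size has stayed flat for longer than its history suggests. It is a simple illustrative heuristic, not the machine learning model Monte Carlo uses; the function names, the `tolerance` parameter, and the assumption that row counts are sampled at regular intervals are all hypothetical.

```python
from statistics import median


def unchanged_streak(row_counts: list[int]) -> int:
    """Count trailing observations whose row count equals the previous one."""
    streak = 0
    pairs = zip(reversed(row_counts[:-1]), reversed(row_counts[1:]))
    for prev, cur in pairs:
        if cur != prev:
            break
        streak += 1
    return streak


def is_volume_halt(row_counts: list[int], tolerance: float = 2.0) -> bool:
    """Flag a halt when the current flat streak is much longer than the
    flat runs that typically separated size changes in the past."""
    streak = unchanged_streak(row_counts)
    # History is everything before the current flat streak began.
    history = row_counts[: len(row_counts) - streak]
    gaps, run = [], 0
    for prev, cur in zip(history, history[1:]):
        if cur == prev:
            run += 1          # table size stayed flat
        else:
            gaps.append(run)  # a change ended the flat run
            run = 0
    typical = median(gaps) if gaps else 0
    # Hypothetical threshold: `tolerance` times the usual update interval.
    return streak > tolerance * (typical + 1)
```

With this rule, a table that normally grows at every observation is flagged after three unchanged observations in a row, while a table that normally changes every other observation is given more slack.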


Frequently Asked Questions

We see flags on some of our incidents that reference correlation or query insights. What are these?

Correlation and query insights represent automated findings that Monte Carlo produced to facilitate the discovery of the root cause of a particular data incident. Click here to learn more about this great feature.

The dbt menu item was not mentioned here. What is it?

If you have a dbt integration set up, you can access information about the dbt model related to the affected table within the dbt menu. Click here to learn more about our dbt integration.