
Despite the short month, the team continued to ship a bunch of great features across the app!

What's new

  • SQL Rule Circuit Breakers: To help prevent data issues, you can now use Monte Carlo circuit breakers to stop pipelines when a SQL Rule check fails. We developed multiple mechanisms to integrate seamlessly with your pipelines, including an Airflow provider, a Python operator, and direct API support. See docs here.
  • Expanded Reproducing and Sample query support: To speed up incident investigations, we now show queries to reproduce an anomaly and to sample anomalous data in the Incident cards and in Incident IQ.
  • Importance score & Key Asset Notification filters: To improve the relevance of notifications, you can now filter notifications by Monte Carlo's computed importance score for tables.
  • Automatic thresholding for SQL Rules: Instead of explicitly defining a threshold for each SQL Rule, we now offer an ML-based threshold detector that will notify you of abnormal activity in the number of rows returned by the SQL Rule.
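
To illustrate the idea behind automatic thresholding, here is a minimal sketch that flags an abnormal row count using a simple mean and standard deviation band. This is not Monte Carlo's ML-based detector, whose internals are not described here; the function name `is_abnormal` and the parameter `k` are hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, latest, k=3.0):
    """Return True if `latest` lies outside mean +/- k * stdev of `history`.

    `history` is a list of past row counts returned by a SQL rule.
    This is a simple statistical stand-in, not an ML model.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past value was identical, so any change counts as abnormal.
        return latest != mu
    return abs(latest - mu) > k * sigma


if __name__ == "__main__":
    past_counts = [10, 12, 11, 9, 10, 11, 10]
    print(is_abnormal(past_counts, 50))
    print(is_abnormal(past_counts, 11))
```

The point of the sketch is that the threshold is derived from past behavior rather than typed in by hand for each rule.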

Improvements and fixes

  • Weekly seasonality support for volume detectors: When a field health (FH) monitor is applied to a table, we also deploy volume monitors that look for more detailed volume changes. Those monitors now incorporate weekly seasonality to provide more reliable detections.
  • Add support for editing integrations: Users can more easily edit Tableau and Looker integration settings from the UI
  • Fix new Impact Radius module rendering issues: Fixed a bug that was causing issues rendering the Impact Radius module for some customers
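
As a rough illustration of why weekly seasonality helps volume detection, the sketch below compares a day's row count only against past observations from the same weekday, so a quiet weekend is not flagged just because weekdays are busy. The function `is_volume_anomaly` and its parameters are hypothetical and do not describe Monte Carlo's actual detectors.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def is_volume_anomaly(observations, target_day, value, k=3.0):
    """Flag `value` if it deviates from the same-weekday history.

    `observations` is a list of (date, row_count) pairs.
    Only observations that share `target_day`'s weekday are used.
    """
    same_weekday = [v for d, v in observations if d.weekday() == target_day.weekday()]
    if len(same_weekday) < 2:
        return False  # not enough same-weekday history
    mu = mean(same_weekday)
    sigma = stdev(same_weekday)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


if __name__ == "__main__":
    start = date(2022, 1, 3)  # a Monday
    history = []
    for i in range(28):
        day = start + timedelta(days=i)
        base = 100 if day.weekday() >= 5 else 1000  # weekends are quieter
        history.append((day, base + i % 3))
    # A Saturday with 100 rows is normal; a Monday with 100 rows is not.
    print(is_volume_anomaly(history, date(2022, 2, 5), 100))
    print(is_volume_anomaly(history, date(2022, 1, 31), 100))
```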

What's next

  • Airflow error logs: Integrate Airflow task error logs into Incident IQ to help users investigate pipeline issues for incidents

We just launched a series of new features in Incident IQ to help customers quickly evaluate the priorities of incidents and investigate them!

What's new

  • Impact radius: we now display an impact radius diagram for each incident on the Incident IQ page. The diagram aggregates stats on users, queries, and reports affected by the incident to help users determine the priorities of incidents (explainer video here). Users can see any dbt errors, warnings, or failed tests for tables involved in incidents. Users can also check dbt model and test run logs from the UI (see dbt integration setup instructions here and explainer video here).
  • Sampling queries: queries to sample the anomalous rows are provided on the Incident IQ page as well as on incident cards for field health anomalies with metrics including % unique, % null, % negative, and % zero. Sampled records are provided in Incident IQ for freshness anomalies and volume anomalies (explainer video for the feature here).
  • Reproducing queries: queries to reproduce anomalies are now provided in Incident IQ for 2 incident types: field health anomalies and dimension tracking anomalies. Users can run the provided queries in their warehouse to reproduce the anomalous metrics that Monte Carlo caught (explainer video for the feature here).

Improvements and fixes

  • IAM policy automation: IAM policies can now be auto-generated via our CLI with the relevant values and permissions derived. Athena and Glue are supported by the policy generator. See doc here.
  • IAM role creation with CloudFormation template: released a command that derives and auto-builds a CloudFormation template to create an IAM role compatible with the MC Data Collector. See docs here.
  • Rule notes Slack tip: added a tip to SQL rule notes creation in the monitor creation view on how to tag Slack users in rule notes.
  • Lineage in Incident IQ: added a table lineage view to the Incident IQ page so users can quickly identify the immediate upstream and downstream dependencies for each table involved in an incident.
  • Runbook removal: removed the runbook tab in Incident IQ.
  • Rule notes in root cause analysis: SQL rule notes are added to the Root Cause Analysis module in Incident IQ for SQL rule breaches so users can easily reference the rule notes for context during incident investigation.
  • High correlation insights in root cause analysis: high correlations between volume anomalies and field dimensions are included as insights in the Root Cause Analysis module in Incident IQ for volume anomalies (see explainer video here).
  • Query log in root cause analysis: query logs are added to the Root Cause Analysis module in Incident IQ so users can check for any query changes that can provide clues for incident investigations.

What's next

  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.
  • Airflow error logs: integrate Airflow task error logs into Incident IQ to help users investigate pipeline issues for incidents.
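
The circuit breaker pattern described above can be sketched generically as follows: run a data quality check first, and refuse to run the downstream job if the check fails. This is a plain-Python illustration only. It does not use the Monte Carlo API or the Airflow provider, and the names `CircuitBreakerTripped` and `run_with_circuit_breaker` are hypothetical.

```python
class CircuitBreakerTripped(Exception):
    """Raised when a data quality check fails and downstream work must stop."""


def run_with_circuit_breaker(check, downstream_job):
    """Run `downstream_job` only if `check` passes.

    `check` is a zero-argument callable returning True when the data is healthy.
    `downstream_job` is a zero-argument callable that moves data downstream.
    """
    if not check():
        raise CircuitBreakerTripped("data quality check failed; downstream job skipped")
    return downstream_job()


if __name__ == "__main__":
    try:
        run_with_circuit_breaker(
            check=lambda: False,  # stand-in for a failing data quality check
            downstream_job=lambda: print("loading data downstream"),
        )
    except CircuitBreakerTripped as exc:
        print(f"pipeline halted: {exc}")
```

Raising an exception, rather than returning a flag, lets an orchestrator mark the task as failed and skip everything that depends on it.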

What's new

  • dbt metadata in Catalog: the catalog page now imports dbt information including models, tags, and descriptions, so users can manage all metadata centrally in Monte Carlo. See dbt integration setup instructions here.
  • Rule notes: customers can now create notes for each SQL rule monitor and receive those notes in rule breach notifications, so teams can refer to them for context.
  • Heavy Queries insight report: released a new insight report that shows the heaviest queries from each warehouse/user every week, so customers can preemptively stop data issues and reduce warehouse load.
  • Network recommender for onboarding: added CLI capability to help with onboarding by analyzing resource and collector configurations and making step-by-step recommendations on how to connect. See docs here.

Improvements and fixes

  • Deteriorating Queries insight: added visualized trends in query execution time as an HTML file in the "Deteriorating Queries" insight report.
  • Monitor as Code bug fix: shipped a bug fix so that when a new configuration is applied via monitor as code, any monitors previously defined via code are no longer displayed as "UPDATED" as long as no attributes have changed for those monitors.
  • Key assets in Slack: key assets in Slack notifications now have a star emoji next to them so they can be more easily spotted, helping users prioritize incidents.
  • Incident IQ CTA: redesigned the Incident IQ button on the incident card to make that CTA more obvious and clear to users.
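
The Monitor as Code fix above comes down to comparing attributes before reporting a change. A minimal sketch of that comparison is shown below, assuming monitors are represented as dictionaries keyed by name. The function `classify_monitors` and the "NEW" and "UNCHANGED" labels are hypothetical; only "UPDATED" appears in the product.

```python
def classify_monitors(previous, desired):
    """Label each monitor in `desired` relative to the `previous` configuration.

    Both arguments map a monitor name to a dict of its attributes.
    A monitor is "UPDATED" only when at least one attribute differs.
    """
    statuses = {}
    for name, attrs in desired.items():
        if name not in previous:
            statuses[name] = "NEW"
        elif previous[name] != attrs:
            statuses[name] = "UPDATED"
        else:
            statuses[name] = "UNCHANGED"
    return statuses


if __name__ == "__main__":
    previous = {"orders_null_check": {"field": "order_id", "schedule": "daily"}}
    desired = {
        "orders_null_check": {"field": "order_id", "schedule": "daily"},
        "orders_volume": {"table": "orders"},
    }
    print(classify_monitors(previous, desired))
```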

What's next

  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Root Cause Analysis module: summary of incident investigation pointers in Incident IQ to help users check and eliminate possible root causes for given data incidents.
  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • "PyCarlo" - Monte Carlo's Python SDK: we just released an alpha Python SDK! All queries and mutations available via the APIs today will be supported via the SDK. This will be the foundation for customers to easily access Monte Carlo capabilities programmatically going forward.
  • Schema changes daily digest: users can now receive notifications on schema changes in the form of a daily digest via email. Slack support will follow shortly. This can be configured in the notification settings under "Delivery Cadence".

Improvements and fixes

  • Pipelines chart improvement: The pipelines chart can now load nodes with up to 100,000 upstream or downstream dependencies, significantly expanding on the previous limit of 1,000 nodes.
  • Monitor status: on the monitors page, added a new Monitor Status column, which shows whether each monitor is in error, training, etc. Users can filter monitors by status and can easily see if any monitors are misconfigured.
  • Network connectivity test: customers can now test the data collector's network connectivity separately from other connection problems (e.g., timeouts or permissions). Network testing is available both in the onboarding wizard and under integrations settings.
  • Email group UX fix: email notifications created with multiple recipients are now treated as a single notification, so users no longer have to edit the notification settings for each individual email separately.

What's next

  • Rule notes: users will soon be able to add notes to each SQL rule monitor, so that when rule breaches happen teams can reference the notes, e.g., for troubleshooting.
  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • Anomalous rows in SQL rule breach: when SQL rule breaches happen, Monte Carlo now displays the rows in breach to help customers investigate the incidents. Such data is stored in the data collector only and not in Monte Carlo's cloud services.
  • RCA Insight in BigQuery: This feature identifies high correlations between volume anomalies and field dimensions, which are shown as insights to provide clues for incident investigations. Previously available for Snowflake and Redshift, it is now also available to BigQuery customers.

Improvements and fixes

  • User invite: users can now invite other users who have existing Monte Carlo accounts to join their organization accounts.
  • Incident details rendering fix: domain filters are now removed from the incident details page, so users will no longer run into an unavailable incident details page when a different domain is selected.
  • WHERE filter edit bug fix: the WHERE filter edit button in the monitor details view now links to the correct editing screen.

What's next

  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Circuit breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

... insight reports via Snowflake data share. This capability is also supported in multiple regions across cloud providers.

  • SQL rule monitor warehouse selection fix: in the edit view of a SQL rule monitor, the corresponding warehouse now remains selected.
  • Freshness thresholds display fix: thresholds for freshness issues are now only shown in the catalog view, not in the incident chart.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached the SQL rule monitor condition, in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • Incident History Insight Report: includes incidents detected by Monte Carlo over the last 6 months to help customers report incident status, track response time, and spot trends.
  • BI Dashboard Analytics Insight Report: provides dashboard importance score based on number of report views and access days.
  • UI Access during Onboarding: new customers that finish technical setup in onboarding can now access the MC UI right away, starting with the monitors page and the catalog page.

Improvements and fixes

  • Incident feed filter bug: fixed a bug on dataset filters for incidents from SQL rule breaches. SQL rule breaches were previously not listed when a dataset was selected and now they are correctly filtered.
  • User login case insensitivity: enabled input case insensitivity for non-SSO user logins.
  • Volume SLI setup summary text error: corrected the summary text on the volume SLI absolute monitor setup page. The text previously described thresholds on volume delta in error; it now correctly describes total volume thresholds.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached the SQL rule monitor condition, in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

Ho ho ho... two big gifts from Santa just in time for Xmas: custom monitor bulk recommendations and interactive features via MS Teams channels.

What's new

  • Custom monitors bulk creation: users can now bulk create field health and dimension tracking monitors in the UI from a list of recommendations.
  • Status updates & snooze via MS Teams channels: users can now update incident status and snooze incidents from MS Teams channels.

Improvements and fixes

  • Lineage node deletion API: released an API mutation to delete lineage nodes and their connected edges.
  • Distribution anomaly description fix: more accurately describe what the distribution percentages typically are as compared to anomaly percentages for dimension tracking monitors.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached the SQL rule monitor condition, in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

Spark lineage available in beta, UI improvements across notifications and onboarding flows.

What's new

  • Spark Lineage Beta: beta version of Spark lineage is now ready for testing! Please reach out to your point of contact at Monte Carlo to get access.
  • Delete integrations via UI: users can now remove connections from the integrations settings page.

Improvements and fixes

  • User invite during onboarding: the user invite feature can now be accessed from the top navigation bar, so customers can invite others during the onboarding process as well.
  • Slack channel refresh: in the notification settings, added the ability to refresh the list of Slack channels so customers with the Slack integration can access any newly added or updated Slack channels from Monte Carlo.
  • Dataset selector in notifications settings: in the notification settings, updated the dataset selector so that it is disabled when rules is the only incident type selected, and enabled when other incident types are selected along with rules.
  • Affected reports loading bug: fixed a bug so that a loading indicator is shown while affected reports are loading in the catalog, instead of "no affected reports".
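
The dataset selector rule above can be expressed as a small predicate. The sketch below is illustrative only: the function name `dataset_selector_enabled` and the incident type strings are hypothetical, and treating an empty selection as disabled is an assumption the release note does not cover.

```python
def dataset_selector_enabled(selected_incident_types):
    """Return True when the dataset selector should be enabled.

    The selector is disabled when "rules" is the only selected incident type
    and enabled when any other incident type is selected, with or without "rules".
    An empty selection returns False (assumption, not stated in the source).
    """
    selected = set(selected_incident_types)
    return bool(selected - {"rules"})


if __name__ == "__main__":
    print(dataset_selector_enabled(["rules"]))
    print(dataset_selector_enabled(["rules", "freshness"]))
```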

What's next

  • MS Teams integration - status updates: in addition to alert routing, we will soon let users update incident status and snooze incidents from MS Teams channels.
  • Custom Monitors Bulk Creation: users will soon be able to bulk create field health and dimension tracking monitors in the UI from a list of recommendations.

Improvements and fixes

  • Monitor as Code source control: disabled editing and removal of monitors from the UI if the monitors were created via code.
  • Pipeline download improvement: moved the download button for Pipelines to the header so users can download Pipeline files even if the graph could not be rendered.
  • Data lake volume chart display fix: fixed the incident chart display in the scenario where an anomaly lasts for more than 3 days.
  • Notifications dataset selection bug: fixed a bug in the notification settings so that the custom dataset/tag selection from a previous session no longer persists to the next session.

What's next

  • MS Teams integration - status updates: in addition to alert routing, we will soon let users update incident status and snooze incidents from MS Teams channels.
  • Spark lineage: For our Spark customers, we're working on surfacing the lineage between tables/fields from Spark jobs.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.