What's new

  • Dashboard tab redesign with Incident graph: Completely redesigned the Dashboard tab to help analyze incident trends and evaluate MC coverage
  • Editable Key Assets: Manage which tables are considered Key Assets, view table authors and importance score
  • Alation integration: Enable the Alation integration to automatically send Monte Carlo Incidents into Alation to display on the table's resource page
  • Incident feed tag filters: Added support for filtering the Incident feed with tags

Improvements and fixes

  • Differentiate between SQL Rules and SLI Incidents: Incident cards now differentiate between SQL Rule breaches and Volume or Freshness SLI breaches
  • Monitor link from Incident IQ: View the custom monitor that generated an incident event right from Incident IQ
  • Add table counts to Schema Muting UI: When muting schemas, we now show the count of tables within the selected schema that will be muted
  • Looker lineage improvements: Improved support for variable replacement in LookML files and user attributes in the lineage parser
  • Field-level cleanup suggestions: A new downloadable Insight report lists fields that have no downstream activity and thus are good candidates for deletion
  • Added timezone to all timestamps: All timestamps across the app now specify which timezone is being used

What's next

  • Domains-based Access Controls: Define user groups to assign read only or write access per domain
  • Airflow integration: View Airflow task logs from within Monte Carlo to support investigations and orchestration context

You can now set up the dbt Cloud integration in 10 minutes!

What's new

  • dbt Cloud Integration: Released a new dbt Cloud integration which allows customers to use the data collector to automate the integration process. Customers simply need to use the Monte Carlo CLI to perform a one-time installation process, and the data collector will automatically sync data from the dbt Cloud accounts to Monte Carlo. This will also ensure that the data is comprehensive and current from dbt. Docs can be found here.
  • Compare Queries: Users can now compare two queries side by side to analyze the differences. To access this feature, go to the Query Log section in either Catalog or Incident IQ, toggle on "compare multiple queries", click the query graph for each query of interest, and select "Compare Queries".
  • Filter by domains: Customers who have set up domains can now use those as dataset filters to route notifications.

Improvements and fixes

  • Bulk muting: On the "Mute Datasets" and "Mute Tables" tabs under notifications settings, users can now search by regex and mute or unmute all visible search results in bulk. 
  • Looker report links: The "Metadata" section on the Catalog page for Looker dashboards, Looks, and Explores now includes links to those reports in Looker.
  • Monitor label improvement: Updated the color labelling for each monitor type on the monitors page as well as on the Incident IQ page under "Active monitors".
  • DW Users improvement: The details tables for the DW Users metric in the incident Impact Radius are now organized by data warehouse username.
  • Insights on docs site: Added a docs page for each Insight report from Dashboard. The docs include a definition for every field in each Insight report.
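
The regex search behind bulk muting can be pictured as a filter over table names. The following is a minimal sketch only: the `select_for_mute` helper and the sample table names are hypothetical and not part of Monte Carlo's API, and matching anywhere in the name (rather than requiring a full match) is an assumption about how the search behaves.

```python
import re

def select_for_mute(table_names, pattern):
    """Return the table names that match the regex, in input order.

    Hypothetical helper for illustration; not a Monte Carlo function.
    """
    compiled = re.compile(pattern)
    # search() matches anywhere in the name; this is an assumption.
    return [name for name in table_names if compiled.search(name)]

# Made-up table names for the example.
tables = ["analytics.tmp_orders", "analytics.orders", "staging.tmp_users"]
muted = select_for_mute(tables, r"\.tmp_")
print(muted)  # ['analytics.tmp_orders', 'staging.tmp_users']
```

Previewing the matched names this way before muting or unmuting in bulk helps catch a pattern that is broader than intended.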

What's next

  • Edit Key Asset: Soon users will be able to override the automatically designated Key Assets to customize which tables are treated as Key Assets.
  • Airflow error logs: Integrate Airflow task error logs into Incident IQ to help users investigate pipeline issues for incidents

Despite the short month, the team continued to ship a bunch of great features across the app!

What's new

  • SQL Rule Circuit Breakers: To help prevent data issues, you can now use Monte Carlo circuit breakers to stop pipelines when a SQL Rule check fails. We developed multiple mechanisms to integrate seamlessly with your pipelines, including an Airflow provider, Python operator, and direct API support. See docs here
  • Expanded Reproducing and Sample query support: To speed up incident investigations, we now show queries to reproduce an anomaly and to sample anomalous data in the Incident cards and in Incident IQ
  • Importance score & Key Asset Notification filters: To improve the relevance of notifications, you can now filter notifications by Monte Carlo's computed importance score for tables
  • Automatic thresholding for SQL Rules: Instead of explicitly defining a threshold for each SQL Rule, we now offer an ML-based threshold detector that will notify you of abnormal activity in the number of rows returned by the SQL Rule
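
The circuit breaker idea above (stop the pipeline when a rule check fails) can be sketched generically. This is an illustration of the pattern only, not the Airflow provider, Python operator, or API mentioned above: `run_with_circuit_breaker`, `CircuitBreakerTripped`, and the `check_rule` callable are all hypothetical names.

```python
class CircuitBreakerTripped(Exception):
    """Raised to halt the pipeline when the rule check reports a breach."""

def run_with_circuit_breaker(check_rule, downstream_step):
    """Run downstream_step only if check_rule reports no breach.

    check_rule is a hypothetical callable returning True on a breach;
    in practice it would wrap whatever triggers the SQL Rule check.
    """
    if check_rule():
        raise CircuitBreakerTripped("SQL Rule breached; downstream step skipped")
    return downstream_step()

# Example: a passing check lets the load step run.
result = run_with_circuit_breaker(lambda: False, lambda: "loaded")
print(result)  # loaded
```

Raising an exception rather than returning a flag means an orchestrator that treats exceptions as task failures will also skip dependent tasks.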

Improvements and fixes

  • Weekly seasonality support for volume detectors: When a Field Health (FH) monitor is applied to a table, we also deploy volume monitors that look for more detailed volume changes. Those monitors now incorporate weekly seasonality to provide more reliable detections.
  • Add support for editing integrations: Users can more easily edit Tableau and Looker integration settings from the UI
  • Fix new Impact Radius module rendering issues: Fixed a bug that was causing issues rendering the Impact Radius module for some customers
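
To illustrate why weekly seasonality makes volume detection more reliable, here is a minimal sketch that compares a row count only against history from the same weekday. It is not Monte Carlo's detector: the `is_volume_anomaly` function, the z-score rule, the threshold of 3.0, and the sample data are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def is_volume_anomaly(history, weekday, value, z_threshold=3.0):
    """Flag value as anomalous relative to the same weekday's history.

    history: list of (weekday, row_count) pairs, weekday in 0..6.
    Hypothetical helper; the z-score rule is an illustrative choice.
    """
    same_day = [count for day, count in history if day == weekday]
    if len(same_day) < 2:
        return False  # not enough history for this weekday
    mu = mean(same_day)
    sigma = stdev(same_day)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold

# Four weeks of made-up data: about 1000 rows on days 0-4, about 100 on days 5-6.
history = []
for d in range(28):
    day = d % 7
    count = 100 + (d % 3) if day >= 5 else 1000 + (d % 5)
    history.append((day, count))

print(is_volume_anomaly(history, 5, 101))   # False: normal for day 5
print(is_volume_anomaly(history, 5, 1000))  # True: far above day 5's usual volume
```

A detector that pooled all days together would see a very wide spread in this data and could miss a low-volume day that suddenly receives a high-volume load.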

What's next

  • Airflow error logs: Integrate Airflow task error logs into Incident IQ to help users investigate pipeline issues for incidents

We just launched a series of new features in Incident IQ to help customers quickly evaluate the priorities of incidents and investigate them!

What's new

  • Impact radius: we now display an impact radius diagram for each incident on the Incident IQ page. The diagram aggregates stats on users, queries, and reports affected by the incident to help users determine the priorities of incidents (explainer video here). Users can see any dbt error, warning, or failed tests for tables involved in incidents. Users can also check dbt model and test run logs from the UI (see dbt integration setup instructions here and explainer video here).
  • Sampling queries: queries to sample the anomalous rows are provided on the Incident IQ page as well as on incident cards for field health anomalies with metrics including % unique, % null, % negative, and % zero. Sampled records are provided in Incident IQ for freshness anomalies and volume anomalies (explainer video for the feature here).
  • Reproducing queries: queries to reproduce anomalies are now provided in Incident IQ for 2 incident types: field health anomalies and dimension tracking anomalies. Users can run the provided queries in their warehouse to reproduce the anomalous metrics that Monte Carlo caught (explainer video for the feature here).

Improvements and fixes

  • IAM policy automation: IAM policies can now be auto-generated via our CLI with the relevant values and permissions derived. Athena and Glue are supported by the policy generator. See doc here. 
  • IAM role creation with CloudFormation template: released a command that derives and auto-builds a CloudFormation template to create an IAM role compatible with MC Data Collector. See docs here.
  • Rule notes Slack tip: added a tip on how to tag Slack users in rule notes to the SQL rule notes step of the monitor creation view.
  • Lineage in Incident IQ: added a table lineage view to the Incident IQ page so users can quickly identify the immediate upstream and downstream dependencies for each table involved in an incident.
  • Runbook removal: removed the runbook tab in Incident IQ.
  • Rule notes in root cause analysis: SQL rule notes are added to the Root Cause Analysis module in Incident IQ for SQL rule breaches so users can easily reference them for context during incident investigation.
  • High correlation insights in root cause analysis: high correlations between volume anomalies and field dimensions are included as insights in the Root Cause Analysis module in Incident IQ for volume anomalies (see explainer video here).
  • Query log in root cause analysis: query logs are added to the Root Cause Analysis module in Incident IQ so users can check for any query changes that can provide clues for incident investigations.

What's next

  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.
  • Airflow error logs: integrate Airflow task error logs into Incident IQ to help users investigate pipeline issues for incidents.

What's new

  • dbt metadata in Catalog: the catalog page now imports dbt information, including models, tags, and descriptions, so users can manage all metadata centrally in Monte Carlo. See dbt integration setup instructions here.
  • Rule notes: customers are now able to create notes for each SQL rule monitor and receive such notes in rule breach notifications, so teams can refer to such notes for context. 
  • Heavy Queries insight report: released a new insight report that shows the heaviest queries from each warehouse/user every week, so customers can preemptively stop data issues and reduce warehouse load.
  • Network recommender for onboarding: added CLI capability to help with onboarding by analyzing resource and collector configurations and making step-by-step recommendations on how to connect. See docs here.

Improvements and fixes

  • Deteriorating Queries insight: added visualized trends in query execution time as an HTML file in the "Deteriorating Queries" insight report.
  • Monitor as Code bug fix: shipped a bug fix so that when a new configuration is applied via monitor as code, any monitors previously defined via code are no longer displayed as "UPDATED" as long as no attributes are changed for those monitors.
  • Key assets in Slack: key assets in Slack notifications now have a star emoji next to them so they can be more easily spotted, helping users prioritize incidents.
  • Incident IQ CTA: redesigned the Incident IQ button on the incident card to make that CTA more obvious and clear to users.

What's next

  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Root Cause Analysis module: summary of incident investigation pointers in Incident IQ to help users check and eliminate possible root causes for given data incidents.
  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • "PyCarlo" - Monte Carlo's Python SDK: we just released an alpha Python SDK! All available queries and mutations via the APIs today will be supported via the SDK. This will be the foundation for customers to easily access Monte Carlo capabilities programmatically going forward. 
  • Schema changes daily digest: Users are now able to receive notifications on schema changes in the form of a daily digest via email. Slack support will follow shortly. This can be configured in notification settings under "Delivery Cadence".

Improvements and fixes

  • Pipelines chart improvement: The pipelines chart can now load nodes with up to 100,000 upstream or downstream dependencies, significantly expanding on the previous limit of 1,000 nodes.
  • Monitor status: on the monitors page, added a new Monitor Status column, which shows whether each monitor is in error, training, etc. Users can filter monitors by status and can easily see if any monitors are misconfigured.
  • Network connectivity test: customers can now test the data collector's network connectivity separately from other connection problems (e.g., timeouts or permissions). Network testing is available both in the onboarding wizard and under integrations settings.
  • Email group UX fix: email notifications created with multiple recipients are now treated as a single notification, so that users no longer have to edit notification settings for each individual email separately.

What's next

  • Rule notes: users will soon be able to add notes to each SQL rule monitor, so that when rule breaches happen teams can reference the notes, e.g., for troubleshooting.
  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Circuit breakers: trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • Anomalous rows in SQL rule breach: when SQL rule breaches happen, Monte Carlo now displays the rows in breach to help customers investigate the incidents. Such data is stored in the data collector only and not in Monte Carlo's cloud services.
  • RCA Insight in BigQuery: This feature identifies high correlations between volume anomalies and field dimensions, which are shown as insights to provide clues for incident investigations. Previously available for Snowflake and Redshift, this feature is now also available to BigQuery customers.

Improvements and fixes

  • User invite: users can now invite other users who have existing Monte Carlo accounts to join their organization's account.
  • Incident details rendering fix: domain filters are now removed from the incident details page, so that users will not run into an unavailable incident details page when a different domain is selected.
  • WHERE filter edit bug fix: the WHERE filter edit button in the monitor details view now links to the correct editing screen.

What's next

  • Impact radius: help customers assess the impact of each incident by aggregating metrics on relevant users, queries, and downstream dashboards.
  • Circuit breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

undefined insight reports via Snowflake data share. This capability is also supported in multiple regions across cloud providers.

  • SQL rule monitor warehouse selection fix: in the edit view of a SQL rule monitor, the corresponding warehouse now remains selected.
  • Freshness thresholds display fix: thresholds for freshness issues are now only shown in catalog view, not in incident chart.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached SQL rule monitor condition in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

What's new

  • Incident History Insight Report: includes incidents detected by Monte Carlo over the last 6 months to help customers report incident status, track response time, and spot trends.
  • BI Dashboard Analytics Insight Report: provides dashboard importance score based on number of report views and access days.
  • UI Access during Onboarding: new customers that finish technical setup in onboarding can now access the MC UI right away, initially limited to the monitors page and the catalog page.

Improvements and fixes

  • Incident feed filter bug: fixed a bug on dataset filters for incidents from SQL rule breaches. SQL rule breaches were previously not listed when a dataset was selected and now they are correctly filtered.
  • User login case insensitivity: enabled input case insensitivity for non-SSO user logins.
  • Volume SLI setup summary text error: corrected the summary text on the volume SLI absolute monitor setup page. The text previously described thresholds on volume delta in error; it now describes total volume thresholds.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached SQL rule monitor condition in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.

Ho ho ho... two big gifts from Santa just in time for Xmas: custom monitor bulk recommendations and interactive features via MS Teams channels.

What's new

  • Custom monitors bulk creation: users can now bulk create field health and dimension tracking monitors in the UI from a list of recommendations.
  • Status updates & snooze via MS Teams channels: users can now update incident status and snooze incidents from MS Teams channels.

Improvements and fixes

  • Lineage node deletion API: released an API mutation to delete lineage nodes and their connected edges.
  • Distribution anomaly description fix: the description for dimension tracking monitors now more accurately states the typical distribution percentages alongside the anomalous percentages.

What's next

  • SQL Rule monitor breached rows: users will soon be able to see which rows breached SQL rule monitor condition in addition to just the number of rows.
  • Circuit Breakers: Trigger Monte Carlo data quality checks and validate incidents with code to stop problematic jobs before they pass data downstream.