Performance
Surface and troubleshoot slow performance of queries and ETL jobs.
How to use the dashboard to solve performance issues
The dashboard can help users proactively identify performance issues or investigate a known one. It aims to help answer questions such as:
- "One of my Snowflake warehouses' cost increased this past week, I want to know what caused it."
- "I want to identify any slow dbt models and optimize the ones that are bottlenecking my dbt job."
- "I want to understand the experience of data product consumers and identify if any dashboard queries need to be optimized to improve the end consumers' experience."
- "I want to know if any Airflow DAG is running slow, running slower overtime, or frequently failing. And I want to know why."
Example 1: Investigate high-cost queries
To illustrate how to leverage the dashboard to answer some of these questions, let's take an example: "I noticed higher costs for my Snowflake warehouse TRANSFORMING_2 in mid-June, and I wonder what caused it."
First, select the one-week time period covering the cost increase, filter for queries in the TRANSFORMING_2 warehouse, set the metrics toggle to "total" (the hourly aggregation metric), and then look for the slowest queries. You notice runtime spikes across numerous queries on 6/15.
To understand why that happened, click into the slowest query to see its details. In the Details view, the synced volume chart shows no obvious volume increase around that date and the table volume trend looks relatively smooth, so the slowdown is likely not caused by rows added to the table.
However, the runtime chart breakdown shows that queue time took up most of the total runtime on that day, so the slowness comes from queries on that warehouse being queued up.
You can also find the peak run of the query and dive into its metadata details to check detailed queue time, cache scan, and partition information for other insights.
You can also drill into a few of the other slowest queries in this warehouse to see whether they are all affected by queue time, then figure out whether it requires upsizing the cluster or moving some of the queries to other clusters. This process helps you drill into the details of a performance issue, find its causes, and then optimize.
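If you want to cross-check these findings outside the dashboard, the same breakdown can be pulled from the warehouse itself. The sketch below assumes Snowflake and queries the ACCOUNT_USAGE.QUERY_HISTORY view for the slowest queries in TRANSFORMING_2 on the spike date; the connection parameters and the date are placeholders.

```python
# A minimal cross-check outside the dashboard, assuming Snowflake: pull the slowest queries on the
# spike date and break their runtime into queued vs. executed time, plus cache/partition stats.
# Account, user, password, and the spike date are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
)

SQL = """
SELECT query_id,
       total_elapsed_time / 1000 AS total_s,
       (queued_overload_time + queued_provisioning_time + queued_repair_time) / 1000 AS queued_s,
       execution_time / 1000     AS execution_s,
       percentage_scanned_from_cache,
       partitions_scanned,
       partitions_total
FROM snowflake.account_usage.query_history
WHERE warehouse_name = 'TRANSFORMING_2'
  AND start_time >= '2024-06-15'   -- placeholder spike date
  AND start_time <  '2024-06-16'
ORDER BY total_elapsed_time DESC
LIMIT 20
"""

# Time columns in QUERY_HISTORY are in milliseconds; a large queued_s relative to total_s
# points at warehouse queuing rather than the query itself.
for row in conn.cursor().execute(SQL):
    print(row)
```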
Example 2: Investigate query performance in context of dbt job
You might have noticed that a query for a dbt model has slowed down recently, and you want to know whether it's bottlenecking the dbt job the model is part of. You can click into the query details drawer to identify the dbt job and the specific dbt job run the query belongs to, then click into the dbt run to investigate.

That takes you to a Gantt view for the dbt job run, where you can easily see the upstream and downstream dependencies of the slow model. You can then determine whether any upstream models are blocking the model of interest, or whether any downstream models are bottlenecked by it, and decide whether you need to optimize the model's runtime.
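If you prefer to check this outside the dashboard, the same question can be answered from the artifacts a dbt job run produces. The sketch below is only an illustration: it reads run_results.json and manifest.json from a run's target/ directory (the paths and the model's unique_id are placeholders) and prints the runtimes of the model's direct upstream and downstream models.

```python
# Illustrative sketch, not a product feature: inspect a dbt run's artifacts to see whether a
# model's direct upstream/downstream models are the slow ones. Paths and MODEL_ID are placeholders.
import json

RUN_RESULTS = "target/run_results.json"   # produced by the dbt job run
MANIFEST = "target/manifest.json"         # contains the model dependency graph
MODEL_ID = "model.my_project.slow_model"  # hypothetical unique_id of the model of interest

with open(RUN_RESULTS) as f:
    run_results = json.load(f)
with open(MANIFEST) as f:
    manifest = json.load(f)

# Execution time (seconds) per node that ran in this job run.
exec_times = {r["unique_id"]: r.get("execution_time") or 0.0 for r in run_results["results"]}

nodes = manifest["nodes"]
upstream = [n for n in nodes.get(MODEL_ID, {}).get("depends_on", {}).get("nodes", [])
            if n in exec_times]
downstream = [node_id for node_id, node in nodes.items()
              if MODEL_ID in node.get("depends_on", {}).get("nodes", []) and node_id in exec_times]

print(f"{MODEL_ID}: {exec_times.get(MODEL_ID, 0.0):.1f}s")
print("Direct upstream models (could block it):")
for n in sorted(upstream, key=exec_times.get, reverse=True):
    print(f"  {n}: {exec_times[n]:.1f}s")
print("Direct downstream models (blocked until it finishes):")
for n in sorted(downstream, key=exec_times.get, reverse=True):
    print(f"  {n}: {exec_times[n]:.1f}s")
```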

The Gantt chart on the asset page is available as long as you have the dbt integration set up. Linking queries to dbt job runs requires query tagging, per the instructions here for dbt Core and here for dbt Cloud.
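The linked instructions describe the supported dbt-side configuration; as a rough, hypothetical illustration of the mechanism query tagging relies on when the warehouse is Snowflake, a session-level QUERY_TAG attached to a job's queries is what later allows them to be matched back to a job run:

```python
# Hypothetical illustration only; follow the linked dbt Core / dbt Cloud instructions for the
# supported setup. On Snowflake, a session-level QUERY_TAG shows up in QUERY_HISTORY.QUERY_TAG,
# which is what makes it possible to tie individual queries back to a job run.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",  # placeholder credentials
    user="my_user",
    password="...",
    # Every query on this session carries the tag below; dbt sets an equivalent tag when
    # query tagging is configured.
    session_parameters={"QUERY_TAG": '{"job": "nightly_build", "run_id": "1234"}'},
)
conn.cursor().execute("SELECT 1")  # this query is recorded with the tag in query_history
```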
Example 3: Understand end consumer experience
If I want to know whether my Looker users are experiencing long wait times when they refresh dashboards, I can leverage the Read Queries (SELECT queries) tab to identify slow queries under the "Looker" User filter. In the example below, I see a spike on Sept 13th where the 99th percentile query ran for over 500 seconds and the 95th percentile ran for over 300 seconds, which means users were waiting 5+ minutes for their Looker dashboards to refresh. The table below lists the slowest queries, and I can expand each row to investigate and optimize as needed.
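To sanity-check these numbers outside the dashboard, the same percentiles can be computed from the warehouse's query history. The sketch below assumes Snowflake and a dedicated 'LOOKER' service user (both assumptions; your warehouse and user name may differ), with placeholder connection parameters:

```python
# Minimal sketch, assuming Snowflake and a 'LOOKER' service user: daily p95/p99 runtimes for
# SELECT queries issued by Looker over the last 28 days. Connection parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")

SQL = """
SELECT DATE_TRUNC('day', start_time) AS day,
       COUNT(*) AS queries,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time) / 1000 AS p95_s,
       PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY total_elapsed_time) / 1000 AS p99_s
FROM snowflake.account_usage.query_history
WHERE user_name = 'LOOKER'   -- hypothetical service user that Looker connects with
  AND query_type = 'SELECT'
  AND start_time >= DATEADD('day', -28, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1
"""

for day, queries, p95_s, p99_s in conn.cursor().execute(SQL):
    print(day.date(), queries, f"p95={p95_s:.0f}s", f"p99={p99_s:.0f}s")
```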

Example 4: Find slowing Airflow DAGs and understand why
The job performance page (currently in beta) gives you a bird's-eye view of the health of your ETL jobs, so you can find troublesome jobs and investigate them.
If you want to find the slowest-running Airflow DAGs, sort the table by the "average run time" column to see the longest-running DAGs. You can then click on a DAG to open a drawer that lets you investigate why it is running long: the runs tab shows a Gantt view for each run, so you can identify whether the DAG is slow because one or more tasks are slow.
If you want to find Airflow DAGs that are running slower over time, sort the table by "Trend (7d)", which compares the average run time of the past 7 days to the 7 days prior (completed runs only), to see the fastest-deteriorating DAGs week over week. You can then click on a DAG to investigate: the tasks tab tells you whether the DAG is deteriorating because all tasks are deteriorating or just one.
If you want to find frequently failing jobs, sort the table by "failure rate" to see the jobs that fail most often.
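For clarity on what these two columns capture, here is an illustrative calculation (not the product's implementation) of "Trend (7d)" and "failure rate" over a small, made-up set of job runs:

```python
# Illustrative only: how "Trend (7d)" and "failure rate" can be read, computed with pandas over
# a hypothetical set of runs (job_id, started_at, ended_at, state) exported from an orchestrator.
import pandas as pd

runs = pd.DataFrame({
    "job_id": ["daily_load"] * 4,
    "started_at": pd.to_datetime(["2024-06-01 02:00", "2024-06-05 02:00",
                                  "2024-06-09 02:00", "2024-06-13 02:00"]),
    "ended_at": pd.to_datetime(["2024-06-01 02:20", "2024-06-05 02:25",
                                "2024-06-09 02:40", "2024-06-13 02:55"]),
    "state": ["success", "success", "failed", "success"],
})

now = runs["started_at"].max() + pd.Timedelta(hours=1)
runs["runtime_min"] = (runs["ended_at"] - runs["started_at"]).dt.total_seconds() / 60

# "Completed runs only" is taken here to mean runs that have finished (have an end time).
last_7d = runs[runs["started_at"] >= now - pd.Timedelta(days=7)]
prior_7d = runs[(runs["started_at"] >= now - pd.Timedelta(days=14))
                & (runs["started_at"] < now - pd.Timedelta(days=7))]

# Trend (7d): average run time of the past 7 days vs the 7 days before that.
trend = last_7d["runtime_min"].mean() / prior_7d["runtime_min"].mean() - 1

# Failure rate: share of runs that failed over the window shown.
failure_rate = (runs["state"] == "failed").mean()

print(f"Trend (7d): {trend:+.0%}, failure rate: {failure_rate:.0%}")
```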
This view currently includes Airflow DAGs, dbt jobs, Databricks Workflows, and ADF pipelines.
Access
The Performance Dashboard is accessible to users with the View Query Performance action. These are users with the Account Owner, Domain Manager, Editor, Responder, or Viewer role. Learn more about these roles under Managed Roles and Groups.
Current Limitations
- The Writes and Reads performance dashboards currently support the Snowflake, BigQuery, Redshift, Databricks, and Teradata integrations. Other integrations and data lakes are not supported.
- The queries are not filtered by domains.
- Access to the Performance Dashboard can be restricted to only those with "All data" for Allowed Domains using the API request togglePerformanceDashboardForDomainRestrictedUsers
- Only the most recent 4 weeks of queries are currently available.
- The Job Performance dashboard supports Airflow DAGs, dbt jobs, Databricks Workflows, and Azure Data Factory pipelines.