Query Performance Monitor
Monitor query runtime performance and alert on regressions.
Overview
Catch query performance regressions before they blow your SLAs or warehouse budget. The query performance monitor tracks SQL execution times and alerts on runtime changes relative to a baseline, individual slow queries, or consecutive threshold breaches.
Reference scopeThis page covers MaC YAML configuration. For how query performance monitoring works, see Query Performance.
Query performance monitors have no schedule field β they run on Monte Carlo's internal collection cycle. MaC key: query_performance. This monitor does not target a specific table β it uses a selection array to filter which queries are monitored by database, schema, or table.
A query performance monitor requires exactly threealert_conditionsβ one each forruntime,runtime_change, andconsecutive_breach_count. You tune (or effectively disable, by setting a high threshold) each metric, but all three must be present orapplyfails withThere must be exactly 3 comparisons.
Quick Start
montecarlo:
query_performance:
- name: core_model_runtime
description: Alert when core dbt model queries slow down significantly
warehouse: prod-snowflake
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- CORE
alert_conditions:
- metric: runtime
threshold: 300000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 3
domains:
- my-domainConfiguration
array of objects Β· required
Each entry defines a performance metric and the threshold that triggers an alert. You must supply exactly three entries β one each for runtime, runtime_change, and consecutive_breach_count. This schema uses additionalProperties: false β only the fields listed below are accepted.
alert_conditions:
- metric: runtime
threshold: 300000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 3Properties
metric β performance metric to evaluate
enum Β· required
Accepted values: runtime Β· runtime_change Β· consecutive_breach_count (lowercase)
Each of the three values must appear exactly once across the alert_conditions array. See Available Metrics for threshold meanings and required companion fields.
metric: runtime_changethreshold β threshold value
number Β· required
Meaning depends on the metric: multiplier for runtime_change, milliseconds for runtime, count for consecutive_breach_count. Must be greater than 0.
threshold: 2.0baseline_agg_function β baseline aggregation function
enum Β· required, for runtime_change
Accepted values: MEDIAN Β· AVG (uppercase)
Aggregation function for the historical baseline.
baseline_agg_function: MEDIANthreshold_lookback_minutes β lookback window in minutes
integer Β· one of threshold_lookback_minutes or threshold_lookback_data_points is required for runtime_change
Historical lookback window in minutes for computing the baseline. Mutually exclusive with threshold_lookback_data_points β set exactly one.
threshold_lookback_minutes: 10080threshold_lookback_data_points β lookback window in data points
integer Β· optional
Number of historical data points for the baseline. Mutually exclusive with threshold_lookback_minutes.
threshold_lookback_data_points: 100comparison_type β comparison type
string Β· optional
Use QUERY_PERFORMANCE if set.
comparison_type: QUERY_PERFORMANCEoperator β comparison operator
enum Β· optional
Accepted values: GT
Only GT is supported.
operator: GTarray of objects Β· optional
Filters which queries are included in performance monitoring. Each entry narrows the scope by a specific dimension. Multiple entries are combined with AND logic.
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- COREProperties
field_name β dimension to filter on
string Β· required
Common values: database, schema, table.
field_name: databasevalues β allowed values for the dimension
array of strings Β· optional
Queries matching any value are included.
values:
- ANALYTICS
- PRODUCTIONoperator β filter operator
enum Β· optional Β· default: include
Accepted values: include Β· exclude (lowercase)
Whether matching queries are included in or excluded from the monitored scope.
operator: includearray of strings (exactly one entry) Β· required, on all accounts created after January 2025
Set default_domain in montecarlo.yml to avoid repeating it on every monitor.
domains:
- my-domainstring Β· optional (yes if multiple warehouses)
Warehouse UUID or name. Overrides default_resource from montecarlo.yml.
warehouse: prod-snowflakestring Β· required
Required for monitors created after Jan 29, 2024 (existing monitors keep working). Changing the name creates a new monitor and deletes the old one β incident history does not transfer.
name: core_model_runtime_regressionstring Β· required
Max 512 characters.
description: Detect runtime regressions in core analytics modelsstring Β· optional
Visible in the Monte Carlo UI. Not included in notifications.
notes: Owned by the analytics team. Reviewed quarterly.array of strings Β· optional
Audience names linking this monitor to channels defined in Notifications as Code.
audiences:
- data-engineering-alerts
- etl-opsarray of strings Β· optional
Separate audiences for run-failure notifications. Falls back to audiences if not set.
failure_audiences:
- oncall-alertsboolean Β· optional
notify_run_failure: trueenum Β· optional
Accepted values: P1 Β· P2 Β· P3 Β· P4 Β· P5
priority: P2array of objects Β· optional
| Property | Type | Required | Description |
|---|---|---|---|
name | string | yes | Tag key |
value | string | no | Tag value |
tags:
- name: team
value: analytics
- name: environment
value: productionenum Β· optional
Accepted values: ACCURACY Β· COMPLETENESS Β· CONSISTENCY Β· TIMELINESS Β· UNIQUENESS Β· VALIDITY
data_quality_dimension: TIMELINESSboolean Β· optional Β· default: false
Creates the monitor in a paused state. Omitting this on a later update resets to false (active) due to PUT semantics β always include it if you want the monitor to stay in draft.
is_draft: truestring Β· optional
Include the UUID of an existing monitor to update it instead of creating a new one.
uuid: 0dae7702-0950-45c7-909c-7e183bddca19Deprecated fields
| Field | Use instead |
|---|---|
resource | warehouse |
domain | domains |
domain_uuids | domains |
labels | audiences |
notify_rule_run_failure | notify_run_failure |
Available Metrics
All three metrics must be present in every monitor (see the note above). Tune each threshold to the scope you care about.
| Metric | Description | Required Fields | Threshold Meaning |
|---|---|---|---|
runtime_change | Detects queries whose runtime has increased relative to a historical baseline. | baseline_agg_function, plus either threshold_lookback_minutes or threshold_lookback_data_points | Multiplier. A threshold of 2.0 means "alert when runtime exceeds 2x the baseline." |
runtime | Alerts when any query execution exceeds a fixed duration. | None beyond threshold | Milliseconds. A threshold of 300000 means "alert on queries running longer than 5 minutes." |
consecutive_breach_count | Alerts when a query breaches its threshold for N consecutive runs. | None beyond threshold | Count. A threshold of 3 means "alert after 3 consecutive breaches." |
Examples
Runtime change detection with median baseline
Tuned to flag queries in the CORE schema that take more than 2x their median runtime over the past week. All three metrics are configured (required); the runtime and consecutive_breach_count thresholds are set permissively so runtime_change drives the alerting.
montecarlo:
query_performance:
- name: core_model_runtime_regression
description: Detect runtime regressions in core analytics models
warehouse: prod-snowflake
selection:
- field_name: database
values:
- ANALYTICS
- field_name: schema
values:
- CORE
alert_conditions:
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: runtime
threshold: 600000
- metric: consecutive_breach_count
threshold: 5
audiences:
- data-engineering-alerts
priority: P2
data_quality_dimension: TIMELINESS
tags:
- name: team
value: analytics
domains:
- my-domainAbsolute runtime threshold
Tuned to flag any query in the billing schema that takes longer than 10 minutes (600000 ms). The runtime_change and consecutive_breach_count conditions are still required, so they are included with permissive thresholds.
montecarlo:
query_performance:
- name: billing_slow_queries
description: Alert on billing queries exceeding 10 minutes
warehouse: prod-snowflake
selection:
- field_name: database
values:
- PRODUCTION
- field_name: schema
values:
- BILLING
alert_conditions:
- metric: runtime
threshold: 600000
- metric: runtime_change
threshold: 3.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
- metric: consecutive_breach_count
threshold: 5
priority: P3
domains:
- my-domainConsecutive breach detection
Tuned to alert after a query breaches its runtime threshold 3 times in a row, reducing noise from one-off spikes. As always, all three metrics are present.
montecarlo:
query_performance:
- name: etl_consecutive_slowdowns
description: Alert after 3 consecutive slow runs in ETL pipeline queries
warehouse: prod-snowflake
selection:
- field_name: database
values:
- RAW
alert_conditions:
- metric: consecutive_breach_count
threshold: 3
- metric: runtime
threshold: 600000
- metric: runtime_change
threshold: 2.0
baseline_agg_function: MEDIAN
threshold_lookback_minutes: 10080
audiences:
- etl-ops
priority: P2
domains:
- my-domainTroubleshooting
Scheduling and scope
- Adding a
schedulefield. Query performance monitors run on Monte Carlo's internal collection cycle. There is no schedule field. Adding one causes a validation error. - Using
tableinstead ofselection. Query performance monitors do not have atablefield. Useselectionwithfield_name: tableto target specific tables.
Alert conditions and metrics
- Supplying fewer than three
alert_conditions. A query performance monitor must define exactly three conditions β one each forruntime,runtime_change, andconsecutive_breach_count. Any other count fails withThere must be exactly 3 comparisons. Tune thresholds permissively for metrics you don't want to drive alerts. - Using uppercase metric names or lowercase
baseline_agg_function. Metric values are lowercase (runtime,runtime_change,consecutive_breach_count);baseline_agg_functionvalues are uppercase (MEDIAN,AVG). Getting the case wrong fails validation. - Omitting
baseline_agg_functionforruntime_change. This metric requires a baseline aggregation function (MEDIANorAVG). Omitting it causes a validation error. It also requires one ofthreshold_lookback_minutesorthreshold_lookback_data_points. - Setting both
threshold_lookback_minutesandthreshold_lookback_data_points. These are mutually exclusive. Use one or the other to define the baseline window. - Passing extra fields in
alert_conditions. The alert conditions schema for query performance monitors hasadditionalProperties: false. Including a field that belongs to other monitor types (likefields) causes a validation error. Note thatoperatoris a recognized field but only acceptsGThere β supplyingoperator: AUTOfails with an enum error, not anadditionalPropertiesone. - Setting
thresholdto 0 or negative. All query performance metrics require a threshold greater than 0.
Other fields and update semantics
- Wrong
tagsformat. Tags must be objects withnameand optionalvaluekeys. Writingtags: ["my-tag"]fails validation. - Forgetting PUT semantics on updates. When updating a monitor by including
uuid, every field you omit reverts to its default β it is not left unchanged. Always specify the complete desired configuration.
Updated about 2 hours ago
