Snowflake
What is Snowflake?
Snowflake is a cloud-based data warehouse tool that offers organizations a flexible and scalable storage system. It is great for hosting data that can then be queried and accessed by a business intelligence solution.
Snowflake is provided as a self-managed service that runs completely on cloud infrastructure. This means that all three layers of Snowflake’s architecture (storage, compute, and cloud services) are deployed and managed entirely on a selected cloud platform.
Why Connect Snowflake to Monte Carlo?
Connecting your Snowflake account to Monte Carlo, and giving us access to the data within is essential for Monte Carlo to function. Monte Carlo needs to be able to collect the metadata of your tables in snowflake to enable our OOTB Freshness and Volume Monitors. With access to the query logs of those tables, we will be able to build table and field lineage. With the ability to run SQL queries on those tables, Monte Carlo can run advanced monitors to analyze the data in those tables.
Monte Carlo is a Data Observability tool after all, so the biggest step in getting value from it is letting it observe your data!
After connecting Snowflake here is what you can expect to see Out of the Box:
Freshness and Volume:
Our freshness and volume monitors are active out of the box. By collecting metadata on your tables, Monte Carlo’s machine learning algorithm can set automatic thresholds to alert when a table has gone too long without an update, and when an abnormal amount of rows are added or deleted.
Query Logs:
When giving access to query logs Monte Carlo is able to show the amount of read and write queries on a table, and give you access to the details of those queries. Monte Carlo can even highlight when a query appears to have failed.
Table Lineage:
Monte Carlo is able to map the connections between your tables, and can show upstream and downstream connections. We can even connect anomalies that are detected in upstream tables to anomalies in downstream tables.
Tag Ingestion
Monte Carlo has the ability to ingest tags that are placed on Snowflake tables. Tags are displayed on the tables General Information page. By default tag ingestion is not enabled. Please reach out to Monte Carlo Support if you would like to enable tag ingestion.
Connecting Snowflake
Now that we know the value connecting Snowflake to Monte Carlo can bring, here are the necessary steps to set up the integration.
Admin credentials required
To make the read only service user for Monte Carlo, you will need permissions to use the
ACCOUNTADMIN
role on Snowflake.
This guide explains how to create a read-only service user for Monte Carlo on Snowflake.
To review all steps necessary to integrate a data warehouse with Monte Carlo, please see here.
Creating a service user
Please use the SQL snippet below to create a service user for Monte Carlo. Before using the snippet, you will need to populate the following:
<your_database>
: Snowflake database to be monitored by Monte Carlo.
Please note:
- This script will create a Service User that requires a private key for authentication. After running this script to create the user and permissions, you will need to create a private and public key then modify the user to assign the public key.
- If you need to monitor multiple databases in your Snowflake account, please run the snippet multiple times, changing the value of
<your_database>
on each run. - Monte Carlo will use an XS warehouse by default. If you expect heavy monitoring workloads, you may consider a larger warehouse, or assigning Monte Carlo's service user to an existing larger warehouse in your account. This setting can be modified later if necessary.
- We recommend using Snowflake's worksheet interface to run the snippet. Please copy the code, make any necessary modifications and use the All Queries option to run it in full.
Schema vs Database Future Grants
Applying schema level future grants in a Snowflake account where there are only database future grants can break existing roles! From Snowflake's docs:
"When future grants are defined at both the database and schema level, the schema level grants take precedence over the database level grants, and the database level grants are ignored. An important point to note here is that as long as there is a SCHEMA level future grants, ALL DATABASE levels will be ignored, even for the roles that are NOT defined in the SCHEMA level future grants."
The below script checks if there are any SCHEMA level future grants before creating new SCHEMA level grants. If there aren't any we assume you're using DATABASE level future grants, and create the new grants on the DATABASE level instead.
Please see here for more information: https://community.snowflake.com/s/article/DB-Level-Future-Grants-Overridden-by-Schema-Level-Future-Grants
-- Configuration
set mc_username='MONTE_CARLO';
set mc_warehouse_size='XSMALL';
set mc_warehouse_name='MONTE_CARLO_WH';
set mc_role_name='MONTE_CARLO_ROLE';
set database_to_monitor='<your_database>';
-- Set role for grants
USE ROLE ACCOUNTADMIN;
-- Create warehouse for Monte Carlo's monitoring workload
CREATE WAREHOUSE IF NOT EXISTS identifier($mc_warehouse_name) WAREHOUSE_SIZE=$mc_warehouse_size INITIALLY_SUSPENDED=TRUE
AUTO_SUSPEND = 5 AUTO_RESUME = TRUE;
-- Create the role Monte Carlo will use
CREATE ROLE IF NOT EXISTS identifier($mc_role_name);
-- Create Monte Carlo's service user and grant access to role
CREATE USER IF NOT EXISTS identifier($mc_username) DEFAULT_ROLE=$mc_role_name TYPE=SERVICE;
GRANT ROLE identifier($mc_role_name) TO USER identifier($mc_username);
-- Grant permissions to use the new warehouse
GRANT OPERATE, USAGE, MONITOR ON WAREHOUSE identifier($mc_warehouse_name) TO ROLE identifier($mc_role_name);
-- Grant privileges to allow access to query history
GRANT IMPORTED PRIVILEGES ON DATABASE "SNOWFLAKE" TO ROLE identifier($mc_role_name);
-- Grant metadata privileges to database to be monitored
GRANT USAGE,MONITOR ON DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT USAGE,MONITOR ON ALL SCHEMAS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
USE DATABASE identifier($database_to_monitor);
CREATE OR REPLACE PROCEDURE GRANT_REFERENCES_TO_MONTE_CARLO()
RETURNS VARCHAR
LANGUAGE javascript
EXECUTE AS CALLER
AS
$$
// If a Snowflake account has only database future grants, applying schema level future grants can break existing roles!
//
// "When future grants are defined at both the database and schema level, the schema level grants take precedence over
// the database level grants, and the database level grants are ignored. An important point to note here is that as
// long as there is a SCHEMA level future grants, ALL DATABASE levels will be ignored, even for the roles that are
// NOT defined in the SCHEMA level future grants."
// See: https://docs.snowflake.com/en/sql-reference/sql/grant-privilege.html#considerations
//
// This is why the following script checks if there are any SCHEMA level future grants before creating new SCHEMA level
// grants. If there aren't any we assume you're using DATABASE level future grants, and create the new grants on the
// DATABASE level instead.
//
// Please see here for more information: https://community.snowflake.com/s/article/DB-Level-Future-Grants-Overridden-by-Schema-Level-Future-Grants
//
snowflake.createStatement({sqlText: `use database identifier($database_to_monitor)`}).execute();
var show_future_grants = snowflake.createStatement({sqlText: `SHOW FUTURE GRANTS IN DATABASE identifier($database_to_monitor)`}).execute();
var schema_future_grants = snowflake.createStatement({sqlText: `select * from TABLE(RESULT_SCAN('${show_future_grants.getQueryId()}')) where "grant_on" = 'SCHEMA'`}).execute();
if (schema_future_grants.getRowCount() > 0) {
var schemas_to_grant = snowflake.createStatement({ sqlText:`select * from information_schema.SCHEMATA where SCHEMA_NAME <> 'INFORMATION_SCHEMA'`}).execute();
var granted_schemas = "";
while(schemas_to_grant.next()) {
table_schema = schemas_to_grant.getColumnValue("SCHEMA_NAME");
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON ALL TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON ALL VIEWS IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON ALL EXTERNAL TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT MONITOR ON ALL DYNAMIC TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON FUTURE TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON FUTURE VIEWS IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT MONITOR ON FUTURE DYNAMIC TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
granted_schemas += table_schema + "; "
}
return `Granted references for schemas ${granted_schemas}`;
}
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON ALL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON ALL VIEWS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT MONITOR ON ALL DYNAMIC TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON FUTURE TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON FUTURE VIEWS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText: `GRANT MONITOR ON FUTURE DYNAMIC TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
return `Granted references for database`;
$$;
CALL GRANT_REFERENCES_TO_MONTE_CARLO();
-- Grant read-only privileges to database to be monitored
GRANT SELECT ON ALL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT SELECT ON ALL VIEWS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT SELECT ON ALL EXTERNAL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT SELECT ON ALL STREAMS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT SELECT ON ALL DYNAMIC TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
GRANT USAGE,MONITOR ON FUTURE SCHEMAS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name);
USE DATABASE identifier($database_to_monitor);
CREATE OR REPLACE PROCEDURE GRANT_SELECT_FUTURES_TO_MONTE_CARLO()
RETURNS VARCHAR
LANGUAGE javascript
EXECUTE AS CALLER
AS
$$
// If a Snowflake account has only database future grants, applying schema level future grants can break existing roles!
//
// "When future grants are defined at both the database and schema level, the schema level grants take precedence over
// the database level grants, and the database level grants are ignored. An important point to note here is that as
// long as there is a SCHEMA level future grants, ALL DATABASE levels will be ignored, even for the roles that are
// NOT defined in the SCHEMA level future grants."
// See: https://docs.snowflake.com/en/sql-reference/sql/grant-privilege.html#considerations
//
// This is why the following script checks if there are any SCHEMA level future grants before creating new SCHEMA level
// grants. If there aren't any we assume you're using DATABASE level future grants, and create the new grants on the
// DATABASE level instead.
//
// Please see here for more information: https://community.snowflake.com/s/article/DB-Level-Future-Grants-Overridden-by-Schema-Level-Future-Grants
//
snowflake.createStatement({sqlText: `use database identifier($database_to_monitor)`}).execute();
var show_future_grants = snowflake.createStatement({sqlText: `SHOW FUTURE GRANTS IN DATABASE identifier($database_to_monitor)`}).execute();
var schema_future_grants = snowflake.createStatement({sqlText: `select * from TABLE(RESULT_SCAN('${show_future_grants.getQueryId()}')) where "grant_on" = 'SCHEMA'`}).execute();
if (schema_future_grants.getRowCount() > 0) {
var schemas_to_grant = snowflake.createStatement({ sqlText:`select SCHEMA_NAME from information_schema.SCHEMATA where SCHEMA_NAME <> 'INFORMATION_SCHEMA'`}).execute();
var granted_schemas = "";
while(schemas_to_grant.next()) {
table_schema = schemas_to_grant.getColumnValue("SCHEMA_NAME");
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE VIEWS IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE EXTERNAL TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE STREAMS IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE DYNAMIC TABLES IN SCHEMA ${table_schema} TO ROLE identifier($mc_role_name)`}).execute();
granted_schemas += table_schema + ";"
}
return `Granted future select for schemas ${granted_schemas}`;
}
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE VIEWS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE EXTERNAL TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE STREAMS IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
snowflake.createStatement({ sqlText:`GRANT SELECT ON FUTURE DYNAMIC TABLES IN DATABASE identifier($database_to_monitor) TO ROLE identifier($mc_role_name)`}).execute();
return `Granted future select for database`;
$$;
CALL GRANT_SELECT_FUTURES_TO_MONTE_CARLO();
Creating a key-pair for the service user
Follow Snowflake's key-pair authentication guide to create a private and public key for the service user you just created.
- The private key will be needed during onboarding in the Monte Carlo UI.
- Monte Carlo supports using an encrypted or unencrypted private key.
- The public key will be assigned to the service user in Snowflake
Use this SQL command to assign the public key to the service user in Snowflake:
ALTER USER MONTE_CARLO SET RSA_PUBLIC_KEY='<public key>';
Exclude the public key delimiters in the SQL statement
When you generate the public key the output will be in PEM format and look like:
-----BEGIN PUBLIC KEY----- MIIBIj... -----END PUBLIC KEY-----
In the SQL command do not include the delimiters, it will look like:
ALTER USER MONTE_CARLO SET RSA_PUBLIC_KEY='MIIBIj...';
Monitoring multiple databases with Monte Carlo
To monitor multiple databases, you will grant Monte Carlo's service user the necessary permissions to each of them. Monte Carlo will automatically monitor all databases to which it has access.
To grant permissions, please run the snippet in the previous section for each of the databases you would like to monitor. You will modify the database_to_monitor parameter in each run.
Monitor & Lineage Support
Below are the monitors and lineage support for the Snowflake integration today. Please reach out to your Monte Carlo representative if you have requirements for additional monitors.
Category | Monitor / Lineage Capabilities | Support |
---|---|---|
Table Monitor | Freshness | ✅ |
Table Monitor | Volume | ✅ |
Table Monitor | Schema Changes | ✅ |
Metric Monitor | Metric | ✅ |
Validation Monitor | Custom SQL | ✅ |
Validation Monitor | Comparison | ✅ |
Validation Monitor | Validation | ✅ |
Job Monitor | Query performance | ✅ |
Lineage | Lineage | ✅ |
More information on monitors in Monte Carlo.
*We do not support custom monitors on fields and tables named with Snowflake reserved words.
Providing service user credentials during onboarding
When using Monte Carlo's onboarding wizard, please provide the following details to validate and complete the integration:
- User: The name of the service user created, MONTE_CARLO by default.
- Private key and optional Private key passphrase for the encrypted private key.
- Account: This would be the part that precedes snowflakecomputing.com in your Snowflake console URL. For example, if you use the URL https://xyz12345.us-east-1.snowflakecomputing.com/ to access Snowflake, your account name will be xyz12345.us-east-1.
- Warehouse: Please provide the name of the warehouse you chose while creating the service user, MONTE_CARLO_WH by default.
Removing Permissions
If you want to remove permissions from a subset of warehouses, schemas, or tables, run the following commands in the appropriate database, replacing <DATABASE>
with the database name and <SCHEMA>
with the schema name.
To revoke permissions for a schema:
REVOKE ALL PRIVILEGES ON SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON ALL VIEWS IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE VIEWS IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
As this script removes the USAGE
permission from the schema, that's enough to prevent our service user from accessing any objects (tables, views, etc.) in the schema.
To revoke permissions to specific table:
REVOKE ALL PRIVILEGES ON TABLE <DATABASE>.<SCHEMA>.<TABLE> FROM MONTE_CARLO_ROLE
To revoke access to the entire database:
REVOKE ALL PRIVILEGES ON DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON ALL TABLES IN DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON ALL VIEWS IN DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE TABLES IN DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE VIEWS IN DATABASE <DATABASE> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
REVOKE ALL PRIVILEGES ON FUTURE VIEWS IN SCHEMA <DATABASE>.<SCHEMA> FROM MONTE_CARLO_ROLE;
MONTE_CARLO_ROLE
is usually the role assigned to the user used by Monte Carlo but it might need to be updated too
Updating the Service User To Use Key-Pair Authentication
Follow these steps if you need to switch your Monte Carlo service user from user/password to key-pair authentication.
- Follow Snowflake's key-pair authentication guide to create a private and public key for the service user to use.
- Generating a private key
- The private key will be needed during onboarding in the Monte Carlo UI.
- Monte Carlo supports using an encrypted or unencrypted private key.
- Generating a public key
- The public key will be assigned to the service user in Snowflake
- Generating a private key
- Using the public key created above, alter the service user in snowflake with this command:
ALTER USER MONTE_CARLO SET TYPE = SERVICE, RSA_PUBLIC_KEY='MIIBIj...';
- Edit your Snowflake connection through the UI
Go to Settings -> Integration, find the snowflake connection, click on the three dots per docs here. Under "authentication type", switch the toggle from "Service account" to "Key pair authentication", then upload the private key you created in step 1.
How Do I Know it Worked? (Snowflake Connection Validation Tests)
After the initial integration of Snowflake and Monte Carlo, you should see your assets in Monte Carlo within a few minutes. If they are still missing you can run a validation test on your Snowflake connection by going to the Integration tab on the Settings page, and selecting Test from the menu next to your Snowflake Connection:
When running the tests we expect to see all green checkmarks as below. Any error that is encountered should come with steps to fix it:
If assets from a specific database or schema are missing you can use the Data Asset validation to test our ability to access them:
Support for observing Snowflake components
Views: Monte Carlo supports Views -- they show up as assets, have opt-in automatic and custom monitors, and are in lineage
External Tables: Monte Carlo supports External Tables -- they show up as assets, have both automatic and custom monitors, and are in lineage
Dynamic Tables: Monte Carlo supports Snowflake Dynamic Tables -- they show up as assets, have both automatic and custom monitors, and are in lineage
Streams: Monte Carlo supports Snowflake Streams -- they show up as assets, have schema-change and custom monitors, and are in lineage.
Tasks: Monte Carlo does not currently collect or represent Snowflake Tasks. Please reach out for further information.
Adding a new query-engine connection
Querying connections enable the creation of custom monitors that leverage multiple different connections. For example, you can now use a larger Snowflake warehouse for resource-intensive field health checks or SQL queries without increasing overall costs or placing additional load on other queries. By default, both the dashboard and APIs will use the primary query engine (i.e., the first query engine added to a warehouse) for all operations.
To add a new querying engine to a Snowflake warehouse, navigate to the Settings > Integrations page, click the actions menu next to the target warehouse, and select "Add querying connection".
A drawer will open, prompting you to input details for setting up the querying connection. This process is similar to setting up a new connection, so refer to the documentation above for more information.
Updated 2 days ago