ClickHouse (public preview)
What is ClickHouse
ClickHouse is an open-source column-oriented database management system that is designed for online analytical processing (OLAP) and real-time analytics on large datasets. It is well known for having extremely fast query performance, making it ideal for data warehousing, business intelligence, and real-time analytics applications.
Monitor & Lineage Support
Integrating Monte Carlo with ClickHouse allows you to monitor your data assets through capabilities listed below, which can be created in either the UI wizard and/or programmatically via monitors as code (API/SDK too). Note that Monte Carlo will add the capabilities below over time. Please reach out to your Monte Carlo representative if you have requirements for additional monitors.
Category | Monitor / Lineage Capabilities | Support |
---|---|---|
Table Monitor | Freshness | |
Table Monitor | Volume | |
Table Monitor | Schema Changes | ✅ |
Metric Monitor | Metric | |
Metric Monitor | Comparison | |
Validation Monitor | SQL Custom | ✅ |
Validation Monitor | Validation | |
Job Monitor | Query performance | |
Lineage | Lineage |
Connecting to ClickHouse
Here are the necessary steps to set up the integration.
PrerequisitesTo create a service account, you will need admin credentials to your ClickHouse database.
This guide explains how to create a read-only service account for Monte Carlo in your ClickHouse database.
To review all steps necessary to integrate a data warehouse with Monte Carlo, please see here.
Create a read-only service account
Please run the SQL snippet below on your ClickHouse database to create a read-only service account for Monte Carlo.
-- Create a user for Monte Carlo.
CREATE USER montecarlo IDENTIFIED BY '<PASSWORD>'
-- Create a read-only role for Monte Carlo.
CREATE ROLE montecarlo_read_only;
-- Grant access to system tables necessary for metadata collection.
GRANT SELECT ON system.tables TO montecarlo_read_only
GRANT SELECT ON system.parts TO montecarlo_read_only
-- Grant access to information_schema views necessary for metadata collection.
GRANT SELECT ON information_schema.schemata TO montecarlo_read_only;
GRANT SELECT ON information_schema.tables TO montecarlo_read_only;
GRANT SELECT ON information_schema.columns TO montecarlo_read_only;
-- Grant access to all tables. Repeat for all relevant databases (e.g. default, etc.).
-- Select permissions are necessary to create assets and custom SQL monitors.
--
-- To get a list of all databases available:
-- SELECT name FROM system.databases;
--
-- Use the wildcard (*) to grant access to all tables in a database, including
-- tables created after the grant is applied.
GRANT SELECT ON <database>.* TO montecarlo_read_only;
-- Grant montecarlo_read_only role to montecarlo user.
GRANT montecarlo_read_only TO montecarlo;
Provide service account credentials to Monte Carlo
In the settings page of the Monte Carlo UI, go to tile "Integrations", then click "Add" integration, select "Clickhouse" from "Warehouse and lakes". Then provide the following details to validate and complete the integration:
- Integration Name: Friendly name for your new connection, which will be referenced and shown throughout the Monte Carlo UI. For example,
clickhouse_prod
. - User and Password: Please provide the credentials you chose while creating the service account.
- Host: The endpoint of your instance.
- Port: The HTTP or HTTPS port for your instance. (Not the Native Protocol, or TCP, port.)
- Database: Please provide the name of the database.
Alternatively, we also support setup through our CLI:
- Please follow this guide to install and configure the CLI.
- Please use the command
montecarlo integrations add-clickhouse
to connect. For reference, see help here.
montecarlo integrations add-clickhouse \
--name clickhouse_prod \
--user montecarlo \
--password -1 \
--host clickhouse.example.com \
--port 8443 \
--database default
How can I validate Monte Carlo has access to my assets?
After completing the Clickhouse integration setup, your Clickhouse assets should appear in the Monte Carlo Assets page between a few minutes and one hour. If you don't see your assets after this time period, you can run a validation test to troubleshoot the connection:
- Navigate to Settings → Integrations
- Find your Clickhouse integration and click on it
- Select the specific connection you want to test
- Click the Test button from the connection menu
This validation test will help identify any configuration issues that might be preventing your assets from appearing.
When running the validation test, you should see all green checkmarks. If any errors are encountered, the system will provide specific steps to resolve them.
Excluded Databases
The following system created databases are excluded from our metadata collection:
- system
- INFORMATION_SCHEMA
FAQs
Is ClickHouse supported in both self-hosted and managed deployments?
Yes, Monte Carlo supports both self-hosted ClickHouse and managed services like ClickHouse Cloud. The setup process is the same for both deployment types.
If I am using ClickHouse Cloud, will monitors work while my service is idle?
Yes, the monitor query will wake the service automatically and the query will complete before the timeout expires. The slight delay while the service wakes should not cause any issues.
If I am using ClickHouse Cloud, can I add a connection while my service is idle?
If your service is idle while setting up the integration, the initial connection test may time out. If that happens, please wake the service and try adding the connection again.
If I use a full table identifier like<database>.<schema>.<table_name>
in an SQL query, why does the query fail?
ClickHouse can identify tables using either <database>.<table_name>
or <table_name>
. It does not support the <database>.<schema>.<table_name>
format.
Updated 7 days ago