ClickHouse (public preview)
What is ClickHouse
ClickHouse is an open-source column-oriented database management system that is designed for online analytical processing (OLAP) and real-time analytics on large datasets. It is well known for having extremely fast query performance, making it ideal for data warehousing, business intelligence, and real-time analytics applications.
Monitor & Lineage Support
Integrating Monte Carlo with ClickHouse allows you to monitor your data assets through capabilities listed below, which can be created in either the UI wizard and/or programmatically via monitors as code (API/SDK too). Note that Monte Carlo will add the capabilities below over time. Please reach out to your Monte Carlo representative if you have requirements for additional monitors.
Category | Monitor / Lineage Capabilities | Support |
---|---|---|
Table Monitor | Freshness | |
Table Monitor | Volume | |
Table Monitor | Schema Changes | ✅ |
Metric Monitor | Metric | |
Metric Monitor | Comparison | |
Validation Monitor | SQL Custom | ✅ |
Validation Monitor | Validation | |
Job Monitor | Query performance | |
Lineage | Lineage |
Connecting to ClickHouse
Here are the necessary steps to set up the integration.
PrerequisitesTo create a service account, you will need admin credentials to your ClickHouse database.
This guide explains how to create a read-only service account for Monte Carlo in your ClickHouse database.
To review all steps necessary to integrate a data warehouse with Monte Carlo, please see here.
Create a read-only service account
Please run the SQL snippet below on your ClickHouse database to create a read-only service account for Monte Carlo.
-- Create a user for Monte Carlo.
CREATE USER montecarlo IDENTIFIED BY '<PASSWORD>'
-- Create a read-only role for Monte Carlo.
CREATE ROLE montecarlo_read_only;
-- Grant access to system tables necessary for metadata collection.
GRANT SELECT ON system.tables TO montecarlo_read_only
GRANT SELECT ON system.parts TO montecarlo_read_only
-- Grant access to information_schema views necessary for metadata collection.
GRANT SELECT ON information_schema.schemata TO montecarlo_read_only;
GRANT SELECT ON information_schema.tables TO montecarlo_read_only;
GRANT SELECT ON information_schema.columns TO montecarlo_read_only;
-- Grant access to all tables. Repeat for all relevant databases (e.g. default, etc.).
-- Select permissions are necessary to create assets and custom SQL monitors.
--
-- To get a list of all databases available:
-- SELECT name FROM system.databases;
--
-- Use the wildcard (*) to grant access to all tables in a database, including
-- tables created after the grant is applied.
GRANT SELECT ON <database>.* TO montecarlo_read_only;
-- Grant montecarlo_read_only role to montecarlo user.
GRANT montecarlo_read_only TO montecarlo;
Provide service account credentials to Monte Carlo
In the settings page of the Monte Carlo UI, go to tile "Integrations", then click "Add" integration, select "Clickhouse" from "Warehouse and lakes". Then provide the following details to validate and complete the integration:
- Integration Name: Friendly name for your new connection, which will be referenced and shown throughout the Monte Carlo UI. For example,
clickhouse_prod
. - User and Password: Please provide the credentials you chose while creating the service account.
- Host: The endpoint of your instance.
- Port: The HTTP or HTTPS port for your instance. (Not the Native Protocol, or TCP, port.)
- Database: Please provide the name of the database.
Alternatively, we also support setup through our CLI:
- Please follow this guide to install and configure the CLI.
- Please use the command
montecarlo integrations add-clickhouse
to connect. For reference, see help here.
montecarlo integrations add-clickhouse \
--name clickhouse_prod \
--user montecarlo \
--password -1 \
--host clickhouse.example.com \
--port 8443 \
--database default
FAQs
Does Monte Carlo support cloud-based, managed ClickHouse services, like ClickHouse Cloud?
Yes, Monte Carlo supports ClickHouse Cloud, and other cloud-based, managed ClickHouse services. The setup steps are the same as for self-managed ClickHouse.
If I am using ClickHouse Cloud, will monitors work while my service is idle?
Yes, the monitor query will wake the service automatically and the query will complete before the timeout expires. The slight delay while the service wakes should not cause any issues.
If I am using ClickHouse Cloud, can I add a connection while my service is idle?
If your service is idle while setting up the integration, the initial connection test may time out. If that happens, please wake the service and try adding the connection again.
If I use a full table identifier like<database>.<schema>.<table_name>
in an SQL query, why does the query fail?
ClickHouse can identify tables using either <database>.<table_name>
or <table_name>
. It does not support the <database>.<schema>.<table_name>
format.
Updated 1 day ago