ClickHouse (public preview)

What is ClickHouse

ClickHouse is an open-source column-oriented database management system that is designed for online analytical processing (OLAP) and real-time analytics on large datasets. It is well known for having extremely fast query performance, making it ideal for data warehousing, business intelligence, and real-time analytics applications.

Monitor & Lineage Support

Integrating Monte Carlo with ClickHouse allows you to monitor your data assets through capabilities listed below, which can be created in either the UI wizard and/or programmatically via monitors as code (API/SDK too). Note that Monte Carlo will add the capabilities below over time. Please reach out to your Monte Carlo representative if you have requirements for additional monitors.

CategoryMonitor / Lineage CapabilitiesSupport
Table MonitorFreshness
Table MonitorVolume
Table MonitorSchema Changes
Metric MonitorMetric
Metric MonitorComparison
Validation MonitorSQL Custom
Validation MonitorValidation
Job MonitorQuery performance
LineageLineage

Connecting to ClickHouse

Here are the necessary steps to set up the integration.

📘

Prerequisites

To create a service account, you will need admin credentials to your ClickHouse database.

This guide explains how to create a read-only service account for Monte Carlo in your ClickHouse database.

To review all steps necessary to integrate a data warehouse with Monte Carlo, please see here.

Create a read-only service account

Please run the SQL snippet below on your ClickHouse database to create a read-only service account for Monte Carlo.

-- Create a user for Monte Carlo.
CREATE USER montecarlo IDENTIFIED BY '<PASSWORD>'

-- Create a read-only role for Monte Carlo.
CREATE ROLE montecarlo_read_only;

-- Grant access to system tables necessary for metadata collection.
GRANT SELECT ON system.tables TO montecarlo_read_only
GRANT SELECT ON system.parts TO montecarlo_read_only

-- Grant access to information_schema views necessary for metadata collection.
GRANT SELECT ON information_schema.schemata TO montecarlo_read_only;
GRANT SELECT ON information_schema.tables TO montecarlo_read_only;
GRANT SELECT ON information_schema.columns TO montecarlo_read_only;

-- Grant access to all tables. Repeat for all relevant databases (e.g. default, etc.).
-- Select permissions are necessary to create assets and custom SQL monitors.
--
-- To get a list of all databases available:
--     SELECT name FROM system.databases;
--
-- Use the wildcard (*) to grant access to all tables in a database, including
-- tables created after the grant is applied.
GRANT SELECT ON <database>.* TO montecarlo_read_only;

-- Grant montecarlo_read_only role to montecarlo user.
GRANT montecarlo_read_only TO montecarlo;

Provide service account credentials to Monte Carlo

In the settings page of the Monte Carlo UI, go to tile "Integrations", then click "Add" integration, select "Clickhouse" from "Warehouse and lakes". Then provide the following details to validate and complete the integration:

  • Integration Name: Friendly name for your new connection, which will be referenced and shown throughout the Monte Carlo UI. For example, clickhouse_prod.
  • User and Password: Please provide the credentials you chose while creating the service account.
  • Host: The endpoint of your instance.
  • Port: The HTTP or HTTPS port for your instance. (Not the Native Protocol, or TCP, port.)
  • Database: Please provide the name of the database.

Alternatively, we also support setup through our CLI:

  1. Please follow this guide to install and configure the CLI.
  2. Please use the command montecarlo integrations add-clickhouse to connect. For reference, see help here.
montecarlo integrations add-clickhouse \
    --name clickhouse_prod \
    --user montecarlo \
    --password -1 \
    --host clickhouse.example.com \
    --port 8443 \
    --database default

How can I validate Monte Carlo has access to my assets?

After completing the Clickhouse integration setup, your Clickhouse assets should appear in the Monte Carlo Assets page between a few minutes and one hour. If you don't see your assets after this time period, you can run a validation test to troubleshoot the connection:

  1. Navigate to Settings → Integrations
  2. Find your Clickhouse integration and click on it
  3. Select the specific connection you want to test
  4. Click the Test button from the connection menu

This validation test will help identify any configuration issues that might be preventing your assets from appearing.

When running the validation test, you should see all green checkmarks. If any errors are encountered, the system will provide specific steps to resolve them.

Excluded Databases

The following system created databases are excluded from our metadata collection:

  • system
  • INFORMATION_SCHEMA

FAQs

Is ClickHouse supported in both self-hosted and managed deployments?

Yes, Monte Carlo supports both self-hosted ClickHouse and managed services like ClickHouse Cloud. The setup process is the same for both deployment types.

If I am using ClickHouse Cloud, will monitors work while my service is idle?

Yes, the monitor query will wake the service automatically and the query will complete before the timeout expires. The slight delay while the service wakes should not cause any issues.

If I am using ClickHouse Cloud, can I add a connection while my service is idle?

If your service is idle while setting up the integration, the initial connection test may time out. If that happens, please wake the service and try adding the connection again.

If I use a full table identifier like<database>.<schema>.<table_name> in an SQL query, why does the query fail?

ClickHouse can identify tables using either <database>.<table_name> or <table_name>. It does not support the <database>.<schema>.<table_name> format.