Hive SQL
To connect Monte Carlo to a Hive to run data health SQL queries, follow these steps:
- Create a Hive cluster for Monte Carlo's data health queries. You can use AWS EMR, GCP Dataproc, or any existing HiveServer2-compatible cluster in your environment.
- Ensure that Monte Carlo's data collector has network connectivity to the cluster (VPC peering is required in most cases).
- Create a service account on Hive if necessary.
- Provide service account credentials to Monte Carlo.
Prerequisites
- To complete this guide, you will need permissions to create users on Hive (if authentication is enabled).
- Hive version must be 2.x.
Authentication modes
Monte Carlo supports two authentication modes when connecting to HiveServer2:
| CLI flag | Protocol | When to use |
|---|---|---|
--auth-mode SASL (default) | PLAIN (username/password) | Use when HiveServer2 requires PLAIN authentication, such as GCP Dataproc with username/password or AWS EMR with LDAP. |
--auth-mode NOSASL | No authentication | Use when HiveServer2 has authentication disabled. |
We recommend using a dedicated service account with the appropriate permissions for the Monte Carlo integration.
Providing account credentials to Monte Carlo
You will provide connection details for Hive (SQL) using Monte Carlo's CLI:
- Please follow this guide to install and configure the CLI.
- Please use the command
montecarlo integrations add-hiveto set up Hive connectivity. For reference, see help for this command below:
$ montecarlo integrations add-hive --host ec2-18-188-136-250.us-east-2.compute.amazonaws.com --port 10000 --database default --auth-mode NOSASL --help
Usage: montecarlo integrations add-hive [OPTIONS]
Setup a Hive SQL integration. For health queries.
Options:
--host TEXT Hostname. [required]
--database TEXT Name of database.
--port INTEGER HTTP port. [default: 10000]
--user TEXT Username with access to hive. [required]
--auth-mode [SASL|NOSASL] Hive authentication mode. [default: SASL]
--name TEXT Friendly name of the warehouse which the
connection will belong to.
--collector-id UUID ID for the data collector. To disambiguate
accounts with multiple collectors.
--skip-validation Skip all connection tests. This option cannot be
used with 'validate-only'.
--validate-only Run connection tests without adding. This option
cannot be used with 'skip-validation'.
--auto-yes Skip any interactive approval.
--option-file FILE Read configuration from FILE.
--help Show this message and exit.Updated 5 days ago
