Example: IOMETE
Overview
This guide explains how to set up an IOMETE integration with Monte Carlo.
IOMETE is a data lakehouse platform built on Apache Spark and Apache Iceberg. It provides a unified environment for data engineering, analytics, and AI workloads, with support for SQL querying via a Spark-compatible Thrift server.
Monte Carlo's IOMETE integration uses a hybrid approach. You push metadata and query logs to Monte Carlo via the Push Ingest API, giving you full control over what observability data is shared. SQL monitors (custom SQL, field health, etc.) run directly against IOMETE through a native Spark Thrift connection. Lineage is automatically inferred from the query logs you push. No persistent catalog access is required. Specifically:
- Metadata is pushed to Monte Carlo via the Push Ingest API — table schemas, columns, row counts, freshness timestamps
- Query logs are pushed via the Push Ingest API — SQL query history with timing and row counts
- Lineage is automatically inferred from pushed query logs — Monte Carlo parses the SQL to identify source and destination tables. You can also push Lineage if you prefer!
- SQL monitors (custom SQL, field health, etc.) run directly against IOMETE through a native Spark Thrift connection
Feature Support
| Category | Monitor / Lineage Capabilities | Support |
|---|---|---|
| Table Monitor | Freshness (via opt-in volume monitor) | ✅ |
| Table Monitor | Volume (opt-in) | ✅ |
| Table Monitor | Schema Changes | ✅ |
| Table Monitor | JSON Schema Changes | ❌ |
| Metric Monitor | Metric | ✅ |
| Metric Monitor | Comparison | ✅ |
| Validation Monitor | Custom SQL | ✅ |
| Validation Monitor | Validation | ❌ |
| Job Monitor | Query performance | ❌ |
| Lineage | Lineage | ✅* |
*Lineage is inferred automatically from pushed query logs. Monte Carlo parses the SQL statements to identify source and destination tables. You can also push Lineage directly!
For more information, see Monitors in Monte Carlo.
Prerequisites
Before setting up the IOMETE integration, ensure you have:
- A Monte Carlo account with permissions to add integrations
- A Monte Carlo Ingestion Key (scope: "Ingestion") for pushing metadata and query logs — this is separate from your standard API key. See Push Ingest API for how to create one.
- Access to the IOMETE Spark Thrift server (for SQL monitors)
- Network connectivity between Monte Carlo's service (or agent) and the IOMETE Thrift endpoint
- Monte Carlo SDK (`pycarlo` v0.12.251+) installed for pushing metadata and query logs
Permissions
Monte Carlo requires:
- For SQL monitors: A Spark user with read access to the databases and tables you want to monitor via the Thrift server
- For metadata and query log push: A Monte Carlo Ingestion Key (not a standard API key). Create one via the `createIntegrationKey` GraphQL mutation or CLI (see below). You will also need the warehouse UUID returned when creating the integration.
Notes / Recommendations
- We recommend creating a dedicated service account in IOMETE for Monte Carlo rather than using personal credentials.
- If deploying behind a firewall or private network, ensure Monte Carlo has network access to the IOMETE Thrift server endpoint. See IP Allowlisting for the IP addresses to allowlist for your deployment. You may also prefer to use a collection agent. Learn more about the deployment options here.
Installation
Setting up IOMETE with Monte Carlo involves three parts:
- Create the integration using the `createOrUpdateCustomIntegration` GraphQL mutation
- Push metadata using the Monte Carlo SDK
- Push query logs using the Monte Carlo SDK
**AI-Assisted Setup:** Monte Carlo provides AI skills that can help you build and run the collection scripts for pushing metadata, query logs, and lineage. The push ingestion skill generates warehouse-specific collection scripts, pushes data to Monte Carlo, and validates the results, all from your editor.
Install the mc-agent-toolkit plugin and use commands like `/mc-build-metadata-collector` and `/mc-build-lineage-collector` to automate the push workflow. See the push ingestion skill documentation for details.
**How do I use the API?** Visit the API Explorer in the Monte Carlo UI (learn more about the API Explorer here).
Alternatively, you can generate an API key and use tools such as cURL or Postman to make API calls.
Step 1: Create the Custom Integration
IOMETE integrations use the custom integration framework (CUSTOM_INTEGRATION warehouse type). This creates a warehouse where each capability (metadata, query logs, lineage, monitors) can be independently configured as "collect" (via a native connection), "reuse" (share another capability's connection), or left unconfigured (push via API).
For IOMETE, only monitors use a collect connection (Spark); everything else is pushed.
1. Test and save Spark credentials
First, test the connection to your IOMETE Spark Thrift server and save the credentials using testSparkCredentials. IOMETE uses HTTP mode for its Thrift server:
```graphql
mutation testSparkCredentials {
  testSparkCredentials(
    httpMode: {
      url: "http://iomete-lakehouse.example.com:10000/cliservice"
      username: "monte_carlo_service"
      password: "<your-service-account-password>"
    }
    connectionOptions: {
      dcId: "<your-deployment-uuid>"
      skipValidation: false
      skipPermissionTests: false
    }
  ) {
    key
    success
  }
}
```
The `url` should point to your IOMETE Thrift server's HTTP endpoint (default port 10000, path `cliservice`). For example: `http://localhost:10000/cliservice`.
If the test succeeds, the response returns a key — this is a temporary credentials key you will use in the next step.
**Binary Mode:** If your IOMETE Thrift server uses binary transport instead of HTTP, use `binaryMode` with `host`, `port`, `username`, `password`, and `database` fields instead of `httpMode`.
2. Create the custom integration
Use the credentials key from the previous step to create the integration using createOrUpdateCustomIntegration:
```graphql
mutation {
  createOrUpdateCustomIntegration(
    name: "IOMETE Production"
    monitors: {
      mode: COLLECT
      connectionType: "spark"
      credentialsKey: "<key-from-testSparkCredentials>"
      deploymentId: "<data-collector-uuid>"
    }
  ) {
    result {
      warehouseUuid
      connections {
        capability
        connection {
          uuid
          type
          jobTypes
          name
        }
      }
    }
  }
}
```
In this configuration:
- metadata: Not specified (pushed via API)
- queryLogs: Not specified (pushed via API)
- lineage: Automatically inferred from pushed query logs
- monitors: Uses the Spark Thrift connection for running SQL queries
Save the warehouseUuid from the response — you will need it as the warehouse UUID for pushing metadata and query logs.
Create an Ingestion Key
Push API calls require an Ingestion Key, which is different from a standard API key. Create one using the GraphQL API or CLI using createIntegrationKey:
GraphQL:
```graphql
mutation {
  createIntegrationKey(
    description: "IOMETE push ingestion key"
    scope: Ingestion
    warehouseIds: ["<warehouse-uuid-from-step-1>"]
  ) {
    key { id secret }
  }
}
```
CLI:
```shell
montecarlo integrations create-key \
  --scope Ingestion \
  --description "IOMETE push ingestion key"
```
**Save Credentials Immediately:** The key secret is shown only once, at creation time. Save both the ID and secret to a secure secrets manager before proceeding.
Set the credentials as environment variables for the push scripts:
```shell
export MCD_INGEST_ID=<your-ingestion-key-id>
export MCD_INGEST_TOKEN=<your-ingestion-key-secret>
```
For more details, see the Push Ingest API documentation.
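To avoid hardcoding credentials in your push scripts, you can read these environment variables when building the `pycarlo` session. A minimal sketch, assuming the `Session` constructor shown in Step 2 (the helper function name here is illustrative, not part of the SDK):

```python
import os

def ingestion_session_kwargs() -> dict:
    """Build Session keyword arguments from the MCD_INGEST_* environment
    variables exported above. Fails fast if either variable is missing."""
    try:
        return {
            "mcd_id": os.environ["MCD_INGEST_ID"],
            "mcd_token": os.environ["MCD_INGEST_TOKEN"],
            "scope": "Ingestion",  # Ingestion keys require this scope
        }
    except KeyError as exc:
        raise RuntimeError(f"Missing environment variable: {exc}") from exc

# Usage (with pycarlo installed):
#   client = Client(session=Session(**ingestion_session_kwargs()))
```

Failing fast on a missing variable surfaces misconfiguration at startup rather than as an authentication error mid-run.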
Step 2: Push Metadata
Metadata tells Monte Carlo about your IOMETE tables — their schemas, columns, row counts, and freshness timestamps. You push metadata to the POST /ingest/v1/metadata endpoint using either the pycarlo SDK or direct HTTP calls.
For full details on the metadata push API, payload format, and authentication, see the Push Ingest API documentation.
Push metadata using the SDK
```python
from datetime import datetime, timezone

from pycarlo.core import Client, Session
from pycarlo.features.ingestion import IngestionService
from pycarlo.features.ingestion.models import (
    AssetField,
    AssetFreshness,
    AssetMetadata,
    AssetVolume,
    RelationalAsset,
)

# Authenticate with an Ingestion-scoped key (not a standard API key)
client = Client(
    session=Session(
        mcd_id="<ingestion-key-id>",
        mcd_token="<ingestion-key-secret>",
        scope="Ingestion",
    )
)
service = IngestionService(mc_client=client)

events = [
    RelationalAsset(
        type="TABLE",
        metadata=AssetMetadata(
            name="my_table",
            database="my_database",
            schema="default",
        ),
        fields=[
            AssetField(name="id", type="INTEGER"),
            AssetField(name="name", type="VARCHAR(255)"),
            AssetField(name="created_at", type="TIMESTAMP"),
        ],
        volume=AssetVolume(row_count=100_000),
        freshness=AssetFreshness(
            last_update_time=datetime.now(timezone.utc).isoformat(),
        ),
    ),
]

result = service.send_metadata(
    resource_uuid="<warehouse-uuid-from-step-1>",
    resource_type="spark",
    events=events,
)
invocation_id = service.extract_invocation_id(result)
print(f"Invocation ID: {invocation_id}")
```
Repeat this for each table in your IOMETE catalog. You can push multiple tables in a single call by adding more `RelationalAsset` entries to the `events` list.
What to push
Connect to IOMETE's Spark Thrift server, query the Spark catalog metadata (databases, tables, columns), and push the results to Monte Carlo. Each metadata push should include:
- Database and table names (using the two-part `database.table` naming convention)
- Column definitions (name, type, description)
- Row counts and byte counts (for volume monitoring)
- Freshness timestamps (for freshness monitoring)
The warehouse UUID from Step 1 identifies which integration receives the metadata.
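As a sketch of the collection step, the rows returned by Spark's `DESCRIBE TABLE` can be converted into column definitions before constructing the `AssetField` objects from Step 2. This helper is illustrative (not part of pycarlo) and assumes rows shaped like Spark's `(col_name, data_type, comment)` output, where a `# Partition Information` section follows the regular columns on partitioned tables:

```python
def describe_rows_to_fields(rows):
    """Convert (col_name, data_type, comment) rows from Spark's
    DESCRIBE TABLE into plain field dicts, stopping at the blank
    separator / '# Partition Information' section Spark appends."""
    fields = []
    for name, data_type, comment in rows:
        if not name.strip() or name.startswith("#"):
            break  # reached the partition-info / detail section
        fields.append({"name": name, "type": data_type, "description": comment or None})
    return fields

rows = [
    ("id", "int", None),
    ("created_at", "timestamp", "event time"),
    ("", "", ""),
    ("# Partition Information", "", ""),
    ("dt", "string", None),
]
print(describe_rows_to_fields(rows))  # → two field dicts; partition section skipped
```

The resulting dicts map directly onto the `AssetField(name=..., type=...)` arguments used in the metadata push.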
**Scheduling:** We recommend scheduling metadata pushes on a recurring basis (e.g., hourly or daily) to keep Monte Carlo's catalog up to date with changes in your IOMETE environment. Push freshness and volume data at least once per hour for reliable anomaly detection.
**Anomaly Detection Training:** Freshness detectors require approximately 7 samples with changed timestamps (~2 weeks) before they activate. Volume detectors require 10–48 samples (~42 days). Plan accordingly when first setting up the integration.
Step 3: Push Query Logs
Query logs enable lineage inference in Monte Carlo. When you push SQL query history, Monte Carlo parses the SQL statements to identify source and destination tables, building table-level and field-level lineage automatically.
Push query logs to the POST /ingest/v1/querylogs endpoint. For full details, see the Push Ingest API documentation.
```python
from datetime import datetime, timezone

from pycarlo.features.ingestion import IngestionService
from pycarlo.features.ingestion.models import QueryLogEntry

logs = [
    QueryLogEntry(
        query_text="SELECT * FROM my_database.my_table",
        start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end_time=datetime(2024, 1, 1, 0, 0, 1, 500000, tzinfo=timezone.utc),  # 1.5s later
        returned_rows=100,
    )
]

# Reuse the Ingestion-scoped client created in Step 2
service = IngestionService(mc_client=client)
result = service.send_query_logs(
    resource_uuid="<warehouse-uuid-from-step-1>",
    log_type="databricks-metastore-sql-warehouse",
    events=logs,
)
invocation_id = service.extract_invocation_id(result)
print(f"Invocation ID: {invocation_id}")
```
Each query log entry includes the SQL statement, start time, end time, and optional row count. The API returns an `invocation_id` you can use to track processing.
**Log Type:** Use `databricks-metastore-sql-warehouse` as the `log_type` for IOMETE query logs. This routes the logs through the Spark SQL-compatible normalizer, which correctly parses the SQL and extracts lineage.
**Batching:** For large volumes of query logs, batch your pushes into groups of approximately 500 events. Compressed request bodies must not exceed 1 MB. Query log processing typically completes within 1 hour.
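The batching guidance above can be sketched as a simple chunking helper (illustrative, not part of the SDK), with each chunk passed to a separate `send_query_logs` call:

```python
def batches(events, size=500):
    """Yield successive chunks of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]

# Example: 1,200 query log entries become pushes of 500, 500, and 200
chunks = list(batches(list(range(1200))))
print([len(c) for c in chunks])  # → [500, 500, 200]
```

If a batch of 500 still exceeds the 1 MB compressed limit (e.g., very long SQL statements), reduce the `size` argument accordingly.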
Step 4: Configure Monitors (Optional)
Once metadata is pushed and the Spark connection is active, you can configure monitors on your IOMETE tables:
- Navigate to the table you want to monitor in Monte Carlo
- Click Monitors
- Click Enable to set up freshness and volume monitoring
- Configure additional monitors (Custom SQL, Field Health, etc.) as needed
Freshness and volume monitoring for IOMETE requires opt-in SQL monitors, similar to other Spark-based integrations.
For detailed instructions, see SQL Rules documentation.
Connection Details
| Field | Description | Example |
|---|---|---|
| Host | IOMETE Spark Thrift server hostname | iomete-lakehouse.example.com |
| Port | Thrift server port | 10000 |
| Username | Spark user with read access | monte_carlo_service |
| Password | Password for the Spark user | <your-service-account-password> |
| HTTP Path | Thrift HTTP transport path | cliservice |
FAQs
What is the push model?
In most integrations, Monte Carlo pulls metadata on a schedule using a data collector or agent. The push model inverts this: you send metadata, query logs, and lineage data directly to Monte Carlo via the Push Ingest API.
The push model exists to cover gaps where the pull model cannot operate — for example, when Monte Carlo cannot directly access the metadata catalog, when native collection is not supported for certain artifacts, or when you already have the data available in your own systems and want full control over what is shared.
For IOMETE, the push model handles metadata and query logs, while a native Spark connection handles SQL monitors.
How does lineage work with IOMETE?
Lineage is automatically inferred from the query logs you push. Monte Carlo's SQL parser identifies source and destination tables in each query, building table-level and field-level lineage graphs. You can also push lineage directly via the POST /ingest/v1/lineage endpoint if you have authoritative lineage data available (see Push Ingest API).
The more complete your query log coverage, the more comprehensive your lineage will be. Note that column lineage pushed via the API expires after 10 days and must be re-pushed to persist.
What Spark SQL syntax is supported?
IOMETE uses Apache Spark SQL, which follows a two-part naming convention: database.table (e.g., my_database.my_table). The default catalog (spark_catalog) is implicit and should not be included in queries.
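When assembling metadata or query log payloads, it can help to normalize any three-part names down to the two-part convention by stripping the implicit default catalog. A minimal sketch (the function name is illustrative):

```python
def to_two_part_name(table_name, default_catalog="spark_catalog"):
    """Strip the implicit default catalog prefix from a fully qualified
    name, leaving the two-part database.table form unchanged otherwise."""
    prefix = default_catalog + "."
    if table_name.startswith(prefix):
        return table_name[len(prefix):]
    return table_name

print(to_two_part_name("spark_catalog.my_database.my_table"))  # → my_database.my_table
print(to_two_part_name("my_database.my_table"))                # → my_database.my_table
```

Pushing consistent two-part names keeps lineage nodes from splitting into duplicate assets for the same table.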
Can I use an agent instead of the cloud connection?
Yes. If your IOMETE instance runs in a private network, you can deploy a Monte Carlo agent in the same network. The agent handles the Spark Thrift connection for SQL monitors, and you continue to push metadata and query logs via the SDK.
Are there any known limitations?
- Query performance monitoring is not supported.
- Metadata and query logs must be pushed externally — Monte Carlo does not pull them from IOMETE directly.