Example: IOMETE

Overview

This guide explains how to set up an IOMETE integration with Monte Carlo.

IOMETE is a data lakehouse platform built on Apache Spark and Apache Iceberg. It provides a unified environment for data engineering, analytics, and AI workloads, with support for SQL querying via a Spark-compatible Thrift server.

Monte Carlo's IOMETE integration uses a hybrid approach. You push metadata and query logs to Monte Carlo via the Push Ingest API, giving you full control over what observability data is shared. SQL monitors (custom SQL, field health, etc.) run directly against IOMETE through a native Spark Thrift connection. Lineage is automatically inferred from the query logs you push. No persistent catalog access is required. Specifically:

  • Metadata is pushed to Monte Carlo via the Push Ingest API — table schemas, columns, row counts, freshness timestamps
  • Query logs are pushed via the Push Ingest API — SQL query history with timing and row counts
  • Lineage is automatically inferred from pushed query logs — Monte Carlo parses the SQL to identify source and destination tables. You can also push Lineage if you prefer!
  • SQL monitors (custom SQL, field health, etc.) run directly against IOMETE through a native Spark Thrift connection

Feature Support

Category | Monitor / Lineage Capabilities | Support
Table Monitor | Freshness (via opt-in volume monitor) | ✅
Table Monitor | Volume (opt-in) | ✅
Table Monitor | Schema Changes | ✅
Table Monitor | JSON Schema Changes | ✅
Metric Monitor | Metric | ✅
Metric Monitor | Comparison | ✅
Validation Monitor | Custom SQL | ✅
Validation Monitor | Validation | ✅
Job Monitor | Query performance | ❌
Lineage | Lineage | ✅*

*Lineage is inferred automatically from pushed query logs. Monte Carlo parses the SQL statements to identify source and destination tables. You can also push Lineage directly!

See the Monte Carlo documentation for more information on monitors.

Prerequisites

Before setting up the IOMETE integration, ensure you have:

  • A Monte Carlo account with permissions to add integrations
  • A Monte Carlo Ingestion Key (scope: "Ingestion") for pushing metadata and query logs — this is separate from your standard API key. See Push Ingest API for how to create one.
  • Access to the IOMETE Spark Thrift server (for SQL monitors)
  • Network connectivity between Monte Carlo's service (or agent) and the IOMETE Thrift endpoint
  • Monte Carlo SDK (pycarlo v0.12.251+) installed for pushing metadata and query logs

Permissions

Monte Carlo requires:

  • For SQL monitors: A Spark user with read access to the databases and tables you want to monitor via the Thrift server
  • For metadata and query log push: A Monte Carlo Ingestion Key (not a standard API key). Create one via the createIntegrationKey GraphQL mutation or CLI — see below. You will also need the warehouse UUID returned when creating the integration.

Notes / Recommendations

  • We recommend creating a dedicated service account in IOMETE for Monte Carlo rather than using personal credentials.
  • If deploying behind a firewall or private network, ensure Monte Carlo has network access to the IOMETE Thrift server endpoint. See IP Allowlisting for the IP addresses to allowlist for your deployment. You may also prefer to use a collection agent. Learn more about the deployment options here.

Installation

Setting up IOMETE with Monte Carlo involves three parts:

  1. Create the integration using the createOrUpdateCustomIntegration GraphQL mutation
  2. Push metadata using the Monte Carlo SDK
  3. Push query logs using the Monte Carlo SDK

👍

AI-Assisted Setup

Monte Carlo provides AI skills that can help you build and run the collection scripts for pushing metadata, query logs, and lineage. The push ingestion skill generates warehouse-specific collection scripts, pushes data to Monte Carlo, and validates the results — all from your editor.

Install the mc-agent-toolkit plugin and use commands like /mc-build-metadata-collector and /mc-build-lineage-collector to automate the push workflow. See the push ingestion skill documentation for details.

👍

How do I use the API?

Visit the API Explorer in the Monte Carlo UI (learn more about the API Explorer here).

Alternatively, you can generate an API key and use tools such as cURL or Postman to make API calls.

Step 1: Create the Custom Integration

IOMETE integrations use the custom integration framework (CUSTOM_INTEGRATION warehouse type). This creates a warehouse where each capability (metadata, query logs, lineage, monitors) can be independently configured as "collect" (via a native connection), "reuse" (share another capability's connection), or left unconfigured (push via API).

For IOMETE, only monitors use a collect connection (Spark); everything else is pushed.

1. Test and save Spark credentials

First, test the connection to your IOMETE Spark Thrift server and save the credentials using testSparkCredentials. IOMETE uses HTTP mode for its Thrift server:

mutation testSparkCredentials {
  testSparkCredentials(
    httpMode: {
      url: "http://iomete-lakehouse.example.com:10000/cliservice"
      username: "monte_carlo_service"
      password: "<your-service-account-password>"
    }
    connectionOptions: {
      dcId: "<your-deployment-uuid>"
      skipValidation: false
      skipPermissionTests: false
    }
  ) {
    key
    success
  }
}

The url should point to your IOMETE Thrift server's HTTP endpoint (default port 10000, path cliservice). For example: http://localhost:10000/cliservice.

If the test succeeds, the response returns a key — this is a temporary credentials key you will use in the next step.

📘

Binary Mode

If your IOMETE Thrift server uses binary transport instead of HTTP, use binaryMode with host, port, username, password, and database fields instead of httpMode.

2. Create the custom integration

Use the credentials key from the previous step to create the integration using createOrUpdateCustomIntegration:

mutation {
  createOrUpdateCustomIntegration(
    name: "IOMETE Production"
    monitors: {
      mode: COLLECT
      connectionType: "spark"
      credentialsKey: "<key-from-testSparkCredentials>"
      deploymentId: "<data-collector-uuid>"
    }
  ) {
    result {
      warehouseUuid
      connections {
        capability
        connection {
          uuid
          type
          jobTypes
          name
        }
      }
    }
  }
}

In this configuration:

  • metadata: Not specified (pushed via API)
  • queryLogs: Not specified (pushed via API)
  • lineage: Automatically inferred from pushed query logs
  • monitors: Uses the Spark Thrift connection for running SQL queries

Save the warehouseUuid from the response — you will need it as the warehouse UUID for pushing metadata and query logs.

Create an Ingestion Key

Push API calls require an Ingestion Key, which is different from a standard API key. Create one with the createIntegrationKey GraphQL mutation or the Monte Carlo CLI:

GraphQL:

mutation {
  createIntegrationKey(
    description: "IOMETE push ingestion key"
    scope: Ingestion
    warehouseIds: ["<warehouse-uuid-from-step-1>"]
  ) {
    key { id secret }
  }
}

CLI:

montecarlo integrations create-key \
  --scope Ingestion \
  --description "IOMETE push ingestion key"

⚠️

Save Credentials Immediately

The key secret is shown only once at creation time. Save both the ID and secret to a secure secrets manager before proceeding.

Set the credentials as environment variables for the push scripts:

export MCD_INGEST_ID=<your-ingestion-key-id>
export MCD_INGEST_TOKEN=<your-ingestion-key-secret>

For more details, see the Push Ingest API documentation.
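
In the push scripts below, the pycarlo client can be constructed from these environment variables. A minimal sketch, assuming MCD_INGEST_ID and MCD_INGEST_TOKEN are exported as shown above:

import os

from pycarlo.core import Client, Session

# Build an Ingestion-scoped client from the environment variables set above.
client = Client(
    session=Session(
        mcd_id=os.environ["MCD_INGEST_ID"],
        mcd_token=os.environ["MCD_INGEST_TOKEN"],
        scope="Ingestion",
    )
)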

Step 2: Push Metadata

Metadata tells Monte Carlo about your IOMETE tables — their schemas, columns, row counts, and freshness timestamps. You push metadata to the POST /ingest/v1/metadata endpoint using either the pycarlo SDK or direct HTTP calls.

For full details on the metadata push API, payload format, and authentication, see the Push Ingest API documentation.

Push metadata using the SDK

from datetime import datetime, timezone

from pycarlo.core import Client, Session
from pycarlo.features.ingestion import IngestionService
from pycarlo.features.ingestion.models import (
    AssetField,
    AssetFreshness,
    AssetMetadata,
    AssetVolume,
    RelationalAsset,
)

client = Client(
    session=Session(
        mcd_id="<ingestion-key-id>",
        mcd_token="<ingestion-key-secret>",
        scope="Ingestion",
    )
)
service = IngestionService(mc_client=client)

events = [
    RelationalAsset(
        type="TABLE",
        metadata=AssetMetadata(
            name="my_table",
            database="my_database",
            schema="default",
        ),
        fields=[
            AssetField(name="id", type="INTEGER"),
            AssetField(name="name", type="VARCHAR(255)"),
            AssetField(name="created_at", type="TIMESTAMP"),
        ],
        volume=AssetVolume(row_count=100_000),
        freshness=AssetFreshness(
            last_update_time=datetime.now(timezone.utc).isoformat(),
        ),
    ),
]

result = service.send_metadata(
    resource_uuid="<warehouse-uuid-from-step-1>",
    resource_type="spark",
    events=events,
)
invocation_id = service.extract_invocation_id(result)
print(f"Invocation ID: {invocation_id}")

Repeat this for each table in your IOMETE catalog. You can push multiple tables in a single call by adding more RelationalAsset entries to the events list.

What to push

Connect to IOMETE's Spark Thrift server, query the Spark catalog metadata (databases, tables, columns), and push the results to Monte Carlo. Each metadata push should include:

  • Database and table names (using the two-part database.table naming convention)
  • Column definitions (name, type, description)
  • Row counts and byte counts (for volume monitoring)
  • Freshness timestamps (for freshness monitoring)

The warehouse UUID from Step 1 identifies which integration receives the metadata.
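
As a sketch of what such a collection script might look like — assuming PyHive as the Thrift client (any Spark SQL client works), an illustrative database name of my_database, and the AssetField / AssetMetadata / RelationalAsset models from the snippet above; authentication options depend on your deployment:

from pyhive import hive

from pycarlo.features.ingestion.models import (
    AssetField,
    AssetMetadata,
    RelationalAsset,
)

# Connect to the IOMETE Spark Thrift server (binary transport shown here;
# adjust host, port, and auth for your environment).
conn = hive.connect(
    host="iomete-lakehouse.example.com",
    port=10000,
    username="monte_carlo_service",
)
cursor = conn.cursor()

# Enumerate tables in a database and describe each one.
events = []
cursor.execute("SHOW TABLES IN my_database")
for _database, table_name, _is_temporary in cursor.fetchall():
    cursor.execute(f"DESCRIBE my_database.{table_name}")
    fields = [
        AssetField(name=col_name, type=col_type)
        for col_name, col_type, _comment in cursor.fetchall()
        if col_name and not col_name.startswith("#")  # skip partition-info rows
    ]
    events.append(
        RelationalAsset(
            type="TABLE",
            metadata=AssetMetadata(
                name=table_name,
                database="my_database",
                schema="default",
            ),
            fields=fields,
        )
    )

# Push the collected events with the IngestionService from the previous snippet.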

❗️

Scheduling

We recommend scheduling metadata pushes on a recurring basis (e.g., hourly or daily) to keep Monte Carlo's catalog up to date with changes in your IOMETE environment.

Push freshness and volume data at least once per hour for reliable anomaly detection.

⚠️

Anomaly Detection Training

Freshness detectors require approximately 7 samples with changed timestamps (~2 weeks) before they activate. Volume detectors require 10–48 samples (~42 days). Plan accordingly when first setting up the integration.

Step 3: Push Query Logs

Query logs enable lineage inference in Monte Carlo. When you push SQL query history, Monte Carlo parses the SQL statements to identify source and destination tables, building table-level and field-level lineage automatically.

Push query logs to the POST /ingest/v1/querylogs endpoint. For full details, see the Push Ingest API documentation.

from datetime import datetime, timezone

from pycarlo.features.ingestion import IngestionService
from pycarlo.features.ingestion.models import QueryLogEntry

logs = [
    QueryLogEntry(
        query_text="SELECT * FROM my_database.my_table",
        start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end_time=datetime(2024, 1, 1, 0, 0, 1, 500000, tzinfo=timezone.utc),  # 1.5s later
        returned_rows=100,
    )
]

service = IngestionService(mc_client=client)  # reuse the Ingestion-scoped client from Step 2
result = service.send_query_logs(
    resource_uuid="<warehouse-uuid-from-step-1>",
    log_type="databricks-metastore-sql-warehouse",
    events=logs,
)
invocation_id = service.extract_invocation_id(result)
print(f"Invocation ID: {invocation_id}")

Each query log entry includes the SQL statement, start time, end time, and optional row count. The API returns an invocation_id you can use to track processing.

📘

Log Type

Use databricks-metastore-sql-warehouse as the log_type for IOMETE query logs. This routes the logs through the Spark SQL-compatible normalizer, which correctly parses the SQL and extracts lineage.

📘

Batching

For large volumes of query logs, batch your pushes into groups of approximately 500 events. Compressed request bodies must not exceed 1 MB. Query log processing typically completes within 1 hour.
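
As a sketch of that batching, chunking the logs list from the example above into groups of 500 (the placeholders are illustrative):

BATCH_SIZE = 500

for start in range(0, len(logs), BATCH_SIZE):
    batch = logs[start:start + BATCH_SIZE]
    result = service.send_query_logs(
        resource_uuid="<warehouse-uuid-from-step-1>",
        log_type="databricks-metastore-sql-warehouse",
        events=batch,
    )
    print(f"Pushed {len(batch)} query logs, invocation: {service.extract_invocation_id(result)}")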

Step 4: Configure Monitors (Optional)

Once metadata is pushed and the Spark connection is active, you can configure monitors on your IOMETE tables:

  1. Navigate to the table you want to monitor in Monte Carlo
  2. Click Monitors
  3. Click Enable to set up freshness and volume monitoring
  4. Configure additional monitors (Custom SQL, Field Health, etc.) as needed

Freshness and volume monitoring for IOMETE requires opt-in SQL monitors, similar to other Spark-based integrations.

For detailed instructions, see SQL Rules documentation.

Connection Details

Field | Description | Example
Host | IOMETE Spark Thrift server hostname | iomete-lakehouse.example.com
Port | Thrift server port | 10000
Username | Spark user with read access | monte_carlo_service
Password | Password for the Spark user | <your-service-account-password>
HTTP Path | Thrift HTTP transport path | cliservice

FAQs

What is the push model?

In most integrations, Monte Carlo pulls metadata on a schedule using a data collector or agent. The push model inverts this: you send metadata, query logs, and lineage data directly to Monte Carlo via the Push Ingest API.

The push model exists to cover gaps where the pull model cannot operate — for example, when Monte Carlo cannot directly access the metadata catalog, when native collection is not supported for certain artifacts, or when you already have the data available in your own systems and want full control over what is shared.

For IOMETE, the push model handles metadata and query logs, while a native Spark connection handles SQL monitors.

How does lineage work with IOMETE?

Lineage is automatically inferred from the query logs you push. Monte Carlo's SQL parser identifies source and destination tables in each query, building table-level and field-level lineage graphs. You can also push lineage directly via the POST /ingest/v1/lineage endpoint if you have authoritative lineage data available (see Push Ingest API).

The more complete your query log coverage, the more comprehensive your lineage will be. Note that column lineage pushed via the API expires after 10 days and must be re-pushed to persist.

What Spark SQL syntax is supported?

IOMETE uses Apache Spark SQL, which follows a two-part naming convention: database.table (e.g., my_database.my_table). The default catalog (spark_catalog) is implicit and should not be included in queries.

Can I use an agent instead of the cloud connection?

Yes. If your IOMETE instance runs in a private network, you can deploy a Monte Carlo agent in the same network. The agent handles the Spark Thrift connection for SQL monitors, and you continue to push metadata and query logs via the SDK.

Are there any known limitations?

  • Query performance monitoring is not supported.
  • Metadata and query logs must be pushed externally — Monte Carlo does not pull them from IOMETE directly.