Overview

Modern data architectures rely heavily on object storage—such as Amazon S3, Azure Blob Storage, or Google Cloud Storage—to store raw, semi-structured, or historical data. These file-based data lakes serve as the foundation for analytics, machine learning, and operational reporting, providing a cost-effective and flexible storage layer for massive volumes of data. While object storage is ideal for durability and scale, it can be challenging to monitor: file-level changes, schema evolution, and data quality issues are often hidden until they impact downstream analytics.

To provide visibility into file-based data lakes, Monte Carlo can monitor external tables that represent files in object storage. By mapping raw files into structured external tables, Monte Carlo can track schema changes, volume, freshness, and quality metrics—giving teams actionable observability over their file storage.

When you’re ready to get started, refer to the examples and guides linked at the bottom of this page. You can also access them from the sidebar.

👍
Does Monte Carlo require direct connectivity to object storage?
No. Monte Carlo monitors files through external tables in your cloud data warehouse. Those external tables are the structured “mapping” that points to files in object storage. This eliminates the need to maintain metadata about files on object storage in multiple systems and ensures consistency between your production pipelines and your data observability checks.

👍
What if I don’t use external tables for object storage files?
That’s okay! This guide includes examples you can reference to set up external tables in various warehouse and lake environments. If needed, you can create a cost-efficient lake dedicated to this purpose and scale it appropriately.

Feature Support

Category	Capability	Support
Table Monitor	Freshness (via opt-in volume monitor)	✅
Table Monitor	Volume (opt-in)	✅
Table Monitor	Schema Changes	✅
Table Monitor	JSON Schema Changes	✅*
Metric Monitor	Metric	✅
Metric Monitor	Comparison	✅
Validation Monitor	Custom SQL	✅
Validation Monitor	Validation	✅
Job Monitor	Query performance	❌
Lineage	Lineage	❌

*JSON Schema monitors are only supported in our AWS Redshift, Snowflake and GCP BigQuery integrations.

Object Storage Support

Depending on your cloud object storage provider, there are multiple cloud warehouses or lakes you can use to monitor your files. Monte Carlo integrates with all of the warehouses listed below.

Object Storage	Databricks	Snowflake External Tables	AWS Redshift Spectrum External Tables	Azure Synapse (Dedicated SQL Pool) External Tables	GCP BigQuery BigLake External Tables	GCP BigQuery Non-BigLake External Tables	AWS Glue and Athena
AWS S3	✅	✅	✅	❌	✅	❌	✅
Azure Blob Storage	✅	✅	❌	✅	✅	❌	❌
Azure Data Lake Storage Gen2	✅	✅	❌	✅	✅	❌	❌
Azure General-purpose v2	✅	✅	❌	✅	✅	❌	❌
Google Cloud Storage	✅	✅	❌	❌	✅	✅	❌
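
As an illustration of how to read the matrix above, the following sketch encodes it as a small Python lookup. The names `WAREHOUSES`, `STORAGE_SUPPORT`, and `warehouses_for` are hypothetical helpers written for this example, not part of any Monte Carlo API; the values are copied from the table.

```python
# Column order matches the support table above.
WAREHOUSES = (
    "Databricks",
    "Snowflake External Tables",
    "AWS Redshift Spectrum External Tables",
    "Azure Synapse (Dedicated SQL Pool) External Tables",
    "GCP BigQuery BigLake External Tables",
    "GCP BigQuery Non-BigLake External Tables",
    "AWS Glue and Athena",
)

# The three Azure storage options share the same support row in the table.
_AZURE_ROW = (True, True, False, True, True, False, False)

# One row per object storage provider; True means supported.
STORAGE_SUPPORT = {
    "AWS S3": (True, True, True, False, True, False, True),
    "Azure Blob Storage": _AZURE_ROW,
    "Azure Data Lake Storage Gen2": _AZURE_ROW,
    "Azure General-purpose v2": _AZURE_ROW,
    "Google Cloud Storage": (True, True, False, False, True, True, False),
}


def warehouses_for(storage: str) -> list[str]:
    """Return the warehouses the table marks as supported for a storage provider."""
    flags = STORAGE_SUPPORT[storage]
    return [name for name, ok in zip(WAREHOUSES, flags) if ok]


if __name__ == "__main__":
    for name in warehouses_for("Google Cloud Storage"):
        print(name)
```

For example, `warehouses_for("Google Cloud Storage")` returns Databricks, Snowflake, and both BigQuery external table types.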

File Type Support

Each cloud warehouse supports different file formats when creating external tables. Refer to the table below to see which file formats are supported as an external table source for each warehouse.

File format support is determined by the warehouse provider (for example, Snowflake, Redshift, Databricks, or BigQuery). Monte Carlo monitors the data exposed through these external tables, but does not control which file formats are supported. If you need support for additional file types, contact your warehouse provider.

File Type	Databricks	Snowflake	AWS Redshift	Azure Synapse (Dedicated SQL Pool)	GCP BigQuery BigLake	GCP BigQuery Non-BigLake	AWS Glue
Delta	✅	✅	✅	❌	✅	❌	✅
Iceberg	✅	✅	✅	❌	✅	❌	✅
CSV	✅	✅	✅	✅ (Hadoop only)	✅	✅	✅
JSON	✅	✅	✅	✅ (Hadoop only)	✅	✅	✅
Parquet	✅	✅	✅	✅ (Hadoop and Native)	✅	✅	✅
Avro	✅	✅	✅	❌	✅	✅	✅
ORC	✅	✅	✅	✅ (Hadoop only)	✅	✅	✅
XML	❌	❌	❌	❌	❌	❌	✅
Ion	❌	❌	❌	❌	❌	❌	✅
grokLog	❌	❌	❌	❌	❌	❌	✅
Hive RCFile (Record Columnar File)	❌	❌	❌	✅ (Hadoop only)	❌	❌	❌
RCFile (Record Columnar File)	❌	❌	❌	✅ (Hadoop only)	❌	❌	❌
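
If your lake holds more than one file format, it can help to check which warehouses cover all of them. The sketch below is one way to do that from the table above. `FORMAT_SUPPORT` and `warehouses_supporting_all` are hypothetical names for this example; an empty string means unsupported, and any other value is the table's note for that cell.

```python
WAREHOUSES = (
    "Databricks",
    "Snowflake",
    "AWS Redshift",
    "Azure Synapse (Dedicated SQL Pool)",
    "GCP BigQuery BigLake",
    "GCP BigQuery Non-BigLake",
    "AWS Glue",
)

Y = "supported"
_TABLE_FORMAT = (Y, Y, Y, "", Y, "", Y)            # Delta, Iceberg
_HADOOP_ONLY = (Y, Y, Y, "Hadoop only", Y, Y, Y)   # CSV, JSON, ORC
_GLUE_ONLY = ("", "", "", "", "", "", Y)           # XML, Ion, grokLog
_SYNAPSE_ONLY = ("", "", "", "Hadoop only", "", "", "")

# Values copied from the file type table; column order follows WAREHOUSES.
FORMAT_SUPPORT = {
    "Delta": _TABLE_FORMAT,
    "Iceberg": _TABLE_FORMAT,
    "CSV": _HADOOP_ONLY,
    "JSON": _HADOOP_ONLY,
    "Parquet": (Y, Y, Y, "Hadoop and Native", Y, Y, Y),
    "Avro": (Y, Y, Y, "", Y, Y, Y),
    "ORC": _HADOOP_ONLY,
    "XML": _GLUE_ONLY,
    "Ion": _GLUE_ONLY,
    "grokLog": _GLUE_ONLY,
    "Hive RCFile (Record Columnar File)": _SYNAPSE_ONLY,
    "RCFile (Record Columnar File)": _SYNAPSE_ONLY,
}


def warehouses_supporting_all(formats: list[str]) -> list[str]:
    """Return warehouses whose column is non-empty for every requested format."""
    return [
        warehouse
        for i, warehouse in enumerate(WAREHOUSES)
        if all(FORMAT_SUPPORT[fmt][i] for fmt in formats)
    ]


if __name__ == "__main__":
    print(warehouses_supporting_all(["Delta", "Avro"]))
```

Because support is determined by the warehouse provider, treat a lookup like this as a snapshot of the table rather than an authoritative source.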

FAQs

What are external tables and why would I use them for monitoring?

External tables let you query data stored in object storage (such as S3, ADLS, or GCS) without loading it into a warehouse. They provide a consistent, structured schema layer over raw files—making it possible to monitor schema changes, freshness, volume, and other data quality signals over time. They represent a convenient way to store and manage metadata about where particular tables are located, what formats are used, what schema they should have, etc.
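
To make the idea concrete, here is a minimal sketch that assembles a Snowflake-style `CREATE EXTERNAL TABLE` statement for Parquet files. It only builds the SQL text and does not connect to anything. The table name, stage path, and column names are hypothetical, identifiers are not validated or quoted, and other warehouses use different DDL, so follow your warehouse's documentation for the exact syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str      # field name as it appears in the files (assumed)
    sql_type: str  # warehouse column type, e.g. NUMBER


def snowflake_external_table_ddl(
    table: str,
    stage_path: str,
    columns: list[Column],
    file_format: str = "PARQUET",
) -> str:
    """Build a Snowflake-style external table statement as a string.

    Each column is derived from the VALUE variant that Snowflake exposes
    for external table rows. Inputs are trusted; nothing is escaped.
    """
    column_lines = ",\n".join(
        f"  {c.name} {c.sql_type} AS (VALUE:{c.name}::{c.sql_type})"
        for c in columns
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"LOCATION = @{stage_path}\n"
        f"FILE_FORMAT = (TYPE = {file_format});"
    )


# Hypothetical table, stage, and columns for illustration only.
ddl = snowflake_external_table_ddl(
    table="analytics.raw.orders_ext",
    stage_path="orders_stage/orders/",
    columns=[Column("order_id", "NUMBER"), Column("order_ts", "TIMESTAMP_NTZ")],
)

if __name__ == "__main__":
    print(ddl)
```

The resulting table carries the location, file format, and expected schema described above, which is the metadata a monitor can then track over time.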

Do I need to ingest or copy my data into a warehouse to allow monitoring?

No. All monitoring is performed against metadata exposed through external tables. Your data remains in your object storage, and the warehouse simply provides a query interface.

Do I need to already have one of these cloud warehouses deployed to get started?

Not necessarily. If you don’t already use a cloud warehouse, you can set one up specifically for external table monitoring and scale it to fit your workload.

Is there extra cost associated with monitoring external tables?

No. In Monte Carlo, monitoring an external table is priced the same as monitoring a permanent table—there’s no separate external-table fee. However, your cloud warehouse may incur compute and query costs to create and query external tables. Refer to your warehouse vendor’s pricing documentation for details.