Data Collection: Details per Integration

The tables below describe where and how Monte Carlo collects data for each integration.

Data Warehouses

| Integration | Metadata | Query Logs | Freshness | Volume |
| --- | --- | --- | --- | --- |
| Redshift | Collected every hour from SVV views | Collected every 10 minutes from STL views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
| Snowflake | Collected every hour from information schema views | Collected every hour from the QUERY_HISTORY view | Taken from metadata information every hour | Taken from metadata information every hour |
| BigQuery | Collected every hour from a combination of metadata views and the BigQuery API | Collected every hour from the BigQuery API | Taken from metadata information every hour | Taken from metadata information every hour |
| Databricks | Collected every hour from the metastore | Collected every hour from the query history system table | Taken from metadata information every hour | Row volume collected with metadata every hour |
| Teradata | Collected every hour from DBC views | Collected every 15 minutes from DBC views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
| Data Lakes on S3 | Collected every hour from the metastore (Glue/Hive) | Collected every hour from Hive logs (on S3), Presto logs (on S3), or Athena logs | Taken from metadata information every hour | Taken from metadata information every hour |
| Azure Synapse | Collected every hour from SYS tables | Not supported | Not supported | Taken from metadata information every hour |
| MotherDuck | Collected every 12 hours from the information schema | Not supported | Not supported | Not supported |
| Dremio | Collected every 12 hours from the information schema | Not supported | Not supported | Not supported |
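
To make the "taken from metadata information" pattern above concrete, here is a minimal illustrative sketch, not Monte Carlo's actual collector, of an hourly freshness/volume poll against Snowflake's documented INFORMATION_SCHEMA.TABLES view. All connection parameters are placeholders.

```python
import snowflake.connector

# Placeholder credentials; a real deployment would use a dedicated read-only
# monitoring user and warehouse.
conn = snowflake.connector.connect(
    account="my_account",    # placeholder
    user="monitoring_user",  # placeholder
    password="...",          # placeholder
    warehouse="monitor_wh",  # placeholder
    database="ANALYTICS",    # placeholder
)

# LAST_ALTERED supports freshness; ROW_COUNT and BYTES support volume.
METADATA_SQL = """
    SELECT table_schema,
           table_name,
           row_count,
           bytes,
           last_altered
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
"""

cur = conn.cursor()
for schema, table, rows, size, altered in cur.execute(METADATA_SQL):
    print(f"{schema}.{table}: rows={rows}, bytes={size}, last_altered={altered}")
```

Because this reads table-level metadata rather than scanning table contents, a poll like this stays cheap even at hourly intervals, which is why metadata is the default source for freshness and volume where the warehouse exposes it.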

Transactional Databases

| Integration | Metadata | Freshness | Volume |
| --- | --- | --- | --- |
| SQL Server | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
| SAP HANA | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
| Azure SQL Database | Collected every hour from SYS tables | Not supported | Taken from metadata information every hour |
| MySQL | Collected every 12 hours from the information schema | Not supported | Not supported |
| Oracle DB | Collected every 12 hours from ALL_ALL_TABLES | Not supported | Not supported |
| Postgres | Collected every 12 hours from the information schema | Not supported | Not supported |
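
As an illustration of the SYS-table polling described above for SQL Server, here is a hedged sketch using pyodbc and the standard catalog views sys.tables, sys.schemas, and sys.partitions. The connection string is a placeholder, and this is not Monte Carlo's implementation.

```python
import pyodbc

# Placeholder connection string.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-server.example.com;DATABASE=mydb;"  # placeholders
    "UID=monitor;PWD=..."                          # placeholders
)

# Row counts come from sys.partitions; filtering on index_id 0 or 1 covers
# heaps and clustered indexes so each table is counted exactly once.
ROW_COUNT_SQL = """
    SELECT s.name AS schema_name,
           t.name AS table_name,
           SUM(p.rows) AS row_count
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.partitions AS p ON p.object_id = t.object_id
    WHERE p.index_id IN (0, 1)
    GROUP BY s.name, t.name
"""

for row in conn.cursor().execute(ROW_COUNT_SQL):
    print(row.schema_name, row.table_name, row.row_count)
```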

Vector Databases

| Integration | Metadata |
| --- | --- |
| Pinecone | Collected every hour from the Pinecone API |
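
For a sense of what an hourly Pinecone API pull could look like, here is an illustrative sketch against Pinecone's public control-plane endpoint (GET https://api.pinecone.io/indexes). The API key is a placeholder, the exact response fields may vary by API version, and this is not Monte Carlo's collector.

```python
import requests

resp = requests.get(
    "https://api.pinecone.io/indexes",
    headers={"Api-Key": "..."},  # placeholder API key
    timeout=30,
)
resp.raise_for_status()

# List each index with its dimension and readiness state.
for index in resp.json()["indexes"]:
    print(index["name"], index["dimension"], index["status"]["state"])
```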

Orchestration & Transformation

| Integration | Metadata |
| --- | --- |
| dbt Cloud | Collected every hour from the dbt Cloud API |
| dbt Core | No set interval; metadata is pushed to Monte Carlo when the CLI command is run |
| Airflow | No set interval; metadata is pushed to Monte Carlo when DAGs run |
| Fivetran | Collected every hour from the Fivetran API |
| Informatica | Collected every 12 hours from the Informatica API |
| Kafka Cluster | Collected every hour from the Confluent Cloud API |
| Kafka Connect Cluster | Collected every hour from the Kafka Connect API |
| Prefect | No set interval; metadata is pushed to Monte Carlo when a Prefect flow is run |
| Azure Data Factory | Collected every hour from the Azure API |
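
As an example of the polling pattern for these APIs, here is a minimal sketch of an hourly dbt Cloud pull using the documented v2 "list runs" endpoint. The account ID and token are placeholders, and this is illustrative rather than Monte Carlo's implementation.

```python
import requests

ACCOUNT_ID = 12345  # placeholder
TOKEN = "..."       # placeholder service token

# Fetch the most recently finished runs for the account.
resp = requests.get(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/runs/",
    headers={"Authorization": f"Token {TOKEN}"},
    params={"order_by": "-finished_at", "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json()["data"]:
    print(run["id"], run["status"], run["finished_at"])
```

Push-based integrations (dbt Core, Airflow, Prefect) invert this flow: instead of a poll on a schedule, the orchestrator sends run metadata to Monte Carlo whenever a job executes, so there is no fixed collection interval.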

Business Intelligence

| Integration | Metadata |
| --- | --- |
| Tableau | Collected every 12 hours from the Tableau Metadata API |
| Looker (git) | Collected every 12 hours from cloud-hosted repos |
| Looker (API) | Collected every 4 days from the Looker API (the 4-day interval is due to Looker API limits) |
| PowerBI | Collected every 12 hours from the PowerBI API |
| Mode BI | Metadata is extracted from the query logs of your data warehouse/lake integration |
| Periscope/Sisense | Metadata is extracted from the query logs of your data warehouse/lake integration |
| Sigma | Metadata is extracted from the query logs of your data warehouse/lake integration |
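
To illustrate the Tableau Metadata API collection named above, here is a hedged GraphQL sketch against the /api/metadata/graphql endpoint. The server URL and the auth token (obtained via a Tableau REST API sign-in) are placeholders; the fields queried are part of Tableau's published Metadata API schema, but this is not Monte Carlo's collector.

```python
import requests

SERVER = "https://tableau.example.com"  # placeholder
AUTH_TOKEN = "..."                      # placeholder X-Tableau-Auth token

# Pull workbook names, last-updated timestamps, and upstream tables,
# which is the kind of lineage metadata a BI integration relies on.
QUERY = """
{
  workbooks {
    name
    updatedAt
    upstreamTables { name }
  }
}
"""

resp = requests.post(
    f"{SERVER}/api/metadata/graphql",
    headers={"X-Tableau-Auth": AUTH_TOKEN},
    json={"query": QUERY},
    timeout=30,
)
resp.raise_for_status()

for wb in resp.json()["data"]["workbooks"]:
    print(wb["name"], "updated", wb["updatedAt"])
```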