# Data Collection: Details per Integration
See below for details on where and how Monte Carlo collects data for each integration.
## Data Warehouses
Integration | Metadata | Query Logs | Freshness | Volume |
---|---|---|---|---|
Redshift | Collected every hour from SVV views | Collected every 10 minutes from STL views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
Snowflake | Collected every hour from information schema views | Collected every hour from the QUERY_HISTORY view | Taken from metadata information every hour | Taken from metadata information every hour |
BigQuery | Collected every hour from a combination of metadata views and the BigQuery API | Collected every hour from the BigQuery API | Taken from metadata information every hour | Taken from metadata information every hour |
Databricks | Collected every hour from the metastore | Collected every hour from the query history system table | Taken from metadata information every hour | Row counts taken from metadata information every hour |
Teradata | Collected every hour from DBC Views | Collected every 15 minutes from DBC Views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
Data Lakes on S3 | Collected every hour from metastore (Glue/Hive) | Collected every hour from Hive logs (on S3), Presto logs (on S3), or Athena logs | Taken from metadata information every hour | Taken from metadata information every hour |
Azure Synapse | Collected every hour from SYS tables | Not supported | Not supported | Taken from metadata information every hour |
MotherDuck | Collected every 12 hours from information_schema | Not supported | Not supported | Not supported |
Dremio | Collected every 12 hours from information_schema | Not supported | Not supported | Not supported |
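To make the query-log rows above concrete, here is a minimal sketch of the kind of hourly read described for Snowflake, assuming the snowflake-connector-python package and a role with access to the ACCOUNT_USAGE share. Connection parameters are placeholders, and the query shape is illustrative only, not Monte Carlo's implementation.

```python
# Illustrative sketch only; not Monte Carlo's implementation.
# Reads the last hour of query history from Snowflake's ACCOUNT_USAGE share.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",      # placeholder
    user="your_user",            # placeholder
    password="your_password",    # placeholder
    warehouse="your_warehouse",  # placeholder
)

cur = conn.cursor()
cur.execute(
    """
    SELECT query_id, query_text, start_time, total_elapsed_time
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
    ORDER BY start_time
    """
)
for query_id, query_text, start_time, elapsed_ms in cur:
    print(query_id, start_time, elapsed_ms, query_text[:80])

cur.close()
conn.close()
```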
## Transactional Databases
Integration | Metadata | Freshness | Volume |
---|---|---|---|
SQL Server | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
SAP HANA | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
Azure SQL Database | Collected every hour from SYS tables | Not supported | Taken from metadata information every hour |
MySQL | Collected every 12 hours from information_schema | Not supported | Not supported |
Oracle DB | Collected every 12 hours from ALL_ALL_TABLES | Not supported | Not supported |
Postgres | Collected every 12 hours from information_schema | Not supported | Not supported |
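For the transactional databases above, metadata collection amounts to a periodic read of system catalog views. A minimal sketch of such a read against Postgres' information_schema, assuming the psycopg2 package and a placeholder DSN; this is illustrative only, not Monte Carlo's collector.

```python
# Illustrative sketch only; not Monte Carlo's collector.
# Takes a column-level metadata snapshot from Postgres' information_schema.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=readonly host=localhost")  # placeholder DSN

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT table_schema, table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
        ORDER BY table_schema, table_name, ordinal_position
        """
    )
    for schema, table, column, data_type in cur.fetchall():
        print(f"{schema}.{table}.{column}: {data_type}")

conn.close()
```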
## Vector Databases
Integration | Metadata |
---|---|
Pinecone | Collected every hour from the Pinecone API |
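A minimal sketch of the kind of index metadata the Pinecone API exposes, assuming the current pinecone Python SDK; the API key and index name are placeholders, and this is illustrative only, not Monte Carlo's collector.

```python
# Illustrative sketch only; not Monte Carlo's collector.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")   # placeholder API key
index = pc.Index("your-index-name")     # placeholder index name

# describe_index_stats returns vector counts, dimension, and per-namespace totals
print(index.describe_index_stats())
```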
## Orchestration & Transformation
Integration | Metadata |
---|---|
dbt Cloud | Collected every hour from the dbt Cloud API |
dbt Core | No set interval - metadata is pushed to Monte Carlo when the CLI command is run |
Airflow | No set interval - metadata is pushed to Monte Carlo when DAGs run |
Fivetran | Collected every hour from the Fivetran API |
Informatica | Collected every 12 hours from the Informatica API |
Kafka Cluster | Collected every hour from the Confluent Cloud API |
Kafka Connect Cluster | Collected every hour from the Kafka Connect API |
Prefect | No set interval - metadata is pushed to Monte Carlo when a Prefect flow is run |
Azure Data Factory | Collected every hour from the Azure API |
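As an illustration of the API-based collection in the table above (for example, the hourly Fivetran pull), here is a minimal sketch against the Fivetran REST API, assuming the requests package; the API key and secret are placeholders, the field names reflect Fivetran's connector listing endpoints as documented, and none of this is Monte Carlo's implementation.

```python
# Illustrative sketch only; not Monte Carlo's collector.
# Lists connectors per group via the Fivetran REST API using basic auth.
import requests

AUTH = ("FIVETRAN_API_KEY", "FIVETRAN_API_SECRET")  # placeholders
BASE = "https://api.fivetran.com/v1"

groups = requests.get(f"{BASE}/groups", auth=AUTH, timeout=30).json()
for group in groups["data"]["items"]:
    connectors = requests.get(
        f"{BASE}/groups/{group['id']}/connectors", auth=AUTH, timeout=30
    ).json()
    for connector in connectors["data"]["items"]:
        print(group["name"], connector["service"], connector["schema"])
```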
## Business Intelligence
Integration | Metadata |
---|---|
Tableau | Collected every 12 hours from the Tableau Metadata API |
Looker (git) | Collected every 12 hours from cloud-hosted repos |
Looker (API) | Collected every 4 days from the Looker API. Note: the 4-day collection interval is due to Looker API limits |
PowerBI | Collected every 12 hours from the PowerBI API |
Mode BI | Metadata is extracted from the query logs of your data warehouse/lake integration |
Periscope/Sisense | Metadata is extracted from the query logs of your data warehouse/lake integration |
Sigma | Metadata is extracted from the query logs of your data warehouse/lake integration |
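To illustrate the Tableau row above, a minimal sketch of a Tableau Metadata API (GraphQL) query, assuming the requests package; the server URL and auth token are placeholders, obtaining the token via the Tableau REST API sign-in is omitted, and this is illustrative only, not Monte Carlo's collector.

```python
# Illustrative sketch only; not Monte Carlo's collector.
# Queries the Tableau Metadata API (GraphQL) for workbooks and upstream tables.
import requests

SERVER = "https://tableau.example.com"  # placeholder server URL
TOKEN = "YOUR_TABLEAU_AUTH_TOKEN"       # placeholder; obtained via the REST API sign-in

query = """
{
  workbooks {
    name
    projectName
    upstreamTables { name }
  }
}
"""

resp = requests.post(
    f"{SERVER}/api/metadata/graphql",
    json={"query": query},
    headers={"X-Tableau-Auth": TOKEN},
    timeout=30,
)
for workbook in resp.json()["data"]["workbooks"]:
    print(workbook["projectName"], workbook["name"])
```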