Data Collection: Details per Integration

The tables below describe where and how Monte Carlo collects data for each integration.

Data Warehouses

| Integration | Metadata | Query Logs | Freshness | Volume |
| --- | --- | --- | --- | --- |
| Redshift | Collected every hour from SVV views | Collected every 10 minutes from STL views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
| Snowflake | Collected every hour from information schema views | Collected every hour from the QUERY_HISTORY view | Taken from metadata information every hour | Taken from metadata information every hour |
| BigQuery | Collected every hour from a combination of metadata views and the BigQuery API | Collected every hour from the BigQuery API | Taken from metadata information every hour | Taken from metadata information every hour |
| Databricks | Collected every hour from the metastore | Collected every hour from the query history system table | Taken from metadata information every hour | Row volume collected with metadata every hour |
| Teradata | Collected every hour from DBC views | Collected every 15 minutes from DBC views | Calculated from query logs, based on queries that are deemed to update tables | Taken from metadata information every hour |
| Data Lakes on S3 | Collected every hour from the metastore (Glue/Hive) | Collected every hour from Hive logs (on S3), Presto logs (on S3), or Athena logs | Taken from metadata information every hour | Taken from metadata information every hour |
| Azure Synapse | Collected every hour from SYS tables | Not supported | Not supported | Taken from metadata information every hour |
| MotherDuck | Collected every 12 hours from the information schema | Not supported | Not supported | Not supported |
| Dremio | Collected every 12 hours from the information schema | Not supported | Not supported | Not supported |
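
To make the "taken from metadata information" pattern above concrete, here is a minimal illustrative sketch, not Monte Carlo's actual collector, of an hourly freshness/volume poll against Snowflake's documented INFORMATION_SCHEMA.TABLES view. All connection parameters are placeholders.

```python
import snowflake.connector

# Placeholder credentials; a real deployment would use a dedicated read-only
# monitoring user and warehouse.
conn = snowflake.connector.connect(
    account="my_account",    # placeholder
    user="monitoring_user",  # placeholder
    password="...",          # placeholder
    warehouse="monitor_wh",  # placeholder
    database="ANALYTICS",    # placeholder
)

# LAST_ALTERED supports freshness; ROW_COUNT and BYTES support volume.
METADATA_SQL = """
    SELECT table_schema,
           table_name,
           row_count,
           bytes,
           last_altered
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
"""

cur = conn.cursor()
for schema, table, rows, size, altered in cur.execute(METADATA_SQL):
    print(f"{schema}.{table}: rows={rows}, bytes={size}, last_altered={altered}")
```

Because this reads table-level metadata rather than scanning table contents, a poll like this stays cheap even at hourly intervals, which is why metadata is the default source for freshness and volume where the warehouse exposes it.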

Transactional Databases

| Integration | Metadata | Freshness | Volume |
| --- | --- | --- | --- |
| SQL Server | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
| SAP HANA | Collected every hour from SYS tables | Taken from metadata information every hour | Taken from metadata information every hour |
| Azure SQL Database | Collected every hour from SYS tables | Not supported | Taken from metadata information every hour |
| MySQL | Collected every 12 hours from the information schema | Not supported | Not supported |
| Oracle DB | Collected every 12 hours from ALL_ALL_TABLES | Not supported | Not supported |
| Postgres | Collected every 12 hours from the information schema | Not supported | Not supported |
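
As an illustration of the SYS-table polling described above for SQL Server, here is a hedged sketch using pyodbc and the standard catalog views sys.tables, sys.schemas, and sys.partitions. The connection string is a placeholder, and this is not Monte Carlo's implementation.

```python
import pyodbc

# Placeholder connection string.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-server.example.com;DATABASE=mydb;"  # placeholders
    "UID=monitor;PWD=..."                          # placeholders
)

# Row counts come from sys.partitions; filtering on index_id 0 or 1 covers
# heaps and clustered indexes so each table is counted exactly once.
ROW_COUNT_SQL = """
    SELECT s.name AS schema_name,
           t.name AS table_name,
           SUM(p.rows) AS row_count
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.partitions AS p ON p.object_id = t.object_id
    WHERE p.index_id IN (0, 1)
    GROUP BY s.name, t.name
"""

for row in conn.cursor().execute(ROW_COUNT_SQL):
    print(row.schema_name, row.table_name, row.row_count)
```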

Vector Databases

| Integration | Metadata |
| --- | --- |
| Pinecone | Collected every hour from the Pinecone API |
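
For a sense of what an hourly Pinecone API pull could look like, here is an illustrative sketch against Pinecone's public control-plane endpoint (GET https://api.pinecone.io/indexes). The API key is a placeholder, the exact response fields may vary by API version, and this is not Monte Carlo's collector.

```python
import requests

resp = requests.get(
    "https://api.pinecone.io/indexes",
    headers={"Api-Key": "..."},  # placeholder API key
    timeout=30,
)
resp.raise_for_status()

# List each index with its dimension and readiness state.
for index in resp.json()["indexes"]:
    print(index["name"], index["dimension"], index["status"]["state"])
```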

Orchestration & Transformation

| Integration | Metadata |
| --- | --- |
| dbt Cloud | Collected every hour from the dbt Cloud API |
| dbt Core | No set interval; metadata is pushed to Monte Carlo when the CLI command is run |
| Airflow | No set interval; metadata is pushed to Monte Carlo when DAGs run |
| Fivetran | Collected every hour from the Fivetran API |
| Informatica | Collected every 12 hours from the Informatica API |
| Kafka Cluster | Collected every hour from the Confluent Cloud API |
| Kafka Connect Cluster | Collected every hour from the Kafka Connect API |
| Prefect | No set interval; metadata is pushed to Monte Carlo when a Prefect flow is run |
| Azure Data Factory | Collected every hour from the Azure API |
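
As an example of the polling pattern for these APIs, here is a minimal sketch of an hourly dbt Cloud pull using the documented v2 "list runs" endpoint. The account ID and token are placeholders, and this is illustrative rather than Monte Carlo's implementation.

```python
import requests

ACCOUNT_ID = 12345  # placeholder
TOKEN = "..."       # placeholder service token

# Fetch the most recently finished runs for the account.
resp = requests.get(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/runs/",
    headers={"Authorization": f"Token {TOKEN}"},
    params={"order_by": "-finished_at", "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json()["data"]:
    print(run["id"], run["status"], run["finished_at"])
```

Push-based integrations (dbt Core, Airflow, Prefect) invert this flow: instead of a poll on a schedule, the orchestrator sends run metadata to Monte Carlo whenever a job executes, so there is no fixed collection interval.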

Business Intelligence

| Integration | Metadata |
| --- | --- |
| Tableau | Collected every 12 hours from the Tableau Metadata API |
| Looker (git) | Collected every 12 hours from cloud-hosted repos |
| Looker (API) | Collected every 4 days from the Looker API (the 4-day interval is due to Looker API limits) |
| PowerBI | Collected every 12 hours from the PowerBI API |
| Mode BI | Metadata is extracted from the query logs of your data warehouse/lake integration |
| Periscope/Sisense | Metadata is extracted from the query logs of your data warehouse/lake integration |
| Sigma | Metadata is extracted from the query logs of your data warehouse/lake integration |
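
To illustrate the Tableau Metadata API collection named above, here is a hedged GraphQL sketch against the /api/metadata/graphql endpoint. The server URL and the auth token (obtained via a Tableau REST API sign-in) are placeholders; the fields queried are part of Tableau's published Metadata API schema, but this is not Monte Carlo's collector.

```python
import requests

SERVER = "https://tableau.example.com"  # placeholder
AUTH_TOKEN = "..."                      # placeholder X-Tableau-Auth token

# Pull workbook names, last-updated timestamps, and upstream tables,
# which is the kind of lineage metadata a BI integration relies on.
QUERY = """
{
  workbooks {
    name
    updatedAt
    upstreamTables { name }
  }
}
"""

resp = requests.post(
    f"{SERVER}/api/metadata/graphql",
    headers={"X-Tableau-Auth": AUTH_TOKEN},
    json={"query": QUERY},
    timeout=30,
)
resp.raise_for_status()

for wb in resp.json()["data"]["workbooks"]:
    print(wb["name"], "updated", wb["updatedAt"])
```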