Monte Carlo uses a data collector to connect to data warehouses, data lakes and BI tools in order to extract metadata, logs and statistics. This section outlines the architecture and key components of the data collector.
Monte Carlo has designed its service such that:
- Your data remains in your environment and never leaves it for Monte Carlo's cloud. Only metadata, logs and statistics are sent, and no individual data records or PII are ever exposed to Monte Carlo.
- Your warehouses, lakes and BI tools are never exposed outside your environment, including to Monte Carlo's environment.
To accomplish these goals, Monte Carlo uses the following architecture, which lets customers install the data collector in their own AWS environment:
- The data collector is deployed using a CloudFormation template within your AWS account. To have the data collector deployed for you in a Monte Carlo-operated AWS account, please contact your representative.
- An AWS admin is typically required to create the CloudFormation stack for the collector via the AWS console. Terraform deployments are supported through the aws_cloudformation_stack resource.
- The data collector can only be deployed in select AWS regions. The full list is available in the UI when setting up the data collector. If you would like support for a region that is not listed, please let us know!
- A VPC along with public/private subnets and other networking components. This VPC contains all components of the data collector. All outbound communication is routed through a single Elastic IP.
- An API Gateway that accepts API calls from Monte Carlo's cloud for configuration and management purposes. API calls are made over a private connection between Monte Carlo’s VPC and the collector's VPC. The gateway is configured to only accept calls coming from Monte Carlo’s environment.
- An AWS Lambda function to handle API calls.
- An AWS Lambda function that executes distributed collection jobs. It connects to data warehouses, data lakes and BI tools to collect metadata, logs and metrics. This information is streamed back to Monte Carlo’s cloud via a secure Kinesis stream (using HTTPS calls).
- An S3 bucket that contains configuration and any other data required during processing.
- SQS queues to handle events in data lake environments.
- A cross-account IAM role to allow Monte Carlo to occasionally upgrade collector code as new versions are released. The role’s permissions are restricted to specific resources in the collector's CloudFormation template, and it cannot access or make changes to other resources in your AWS account.
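The cross-account role in the last item can be pictured as two IAM policy documents: a trust policy naming the account allowed to assume the role, and a permissions policy scoped to explicit resource ARNs rather than wildcards. The sketch below builds both in Python and adds a small check for wildcard resources. The account IDs, function ARNs and the choice of the lambda:UpdateFunctionCode and lambda:GetFunction actions are hypothetical placeholders, not the contents of Monte Carlo's actual CloudFormation template.

```python
from __future__ import annotations

import json

# Hypothetical placeholder values; the real account ID, region and function
# names would come from Monte Carlo and the collector's CloudFormation template.
MONTE_CARLO_ACCOUNT_ID = "111111111111"
COLLECTOR_FUNCTION_ARNS = [
    "arn:aws:lambda:us-east-1:222222222222:function:example-collector-api",
    "arn:aws:lambda:us-east-1:222222222222:function:example-collector-jobs",
]


def trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in one other account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def upgrade_permissions(function_arns: list[str]) -> dict:
    """Permissions limited to the collector's own Lambda functions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:UpdateFunctionCode", "lambda:GetFunction"],
                # Explicit ARNs (no wildcards) keep the role away from
                # every other resource in the account.
                "Resource": list(function_arns),
            }
        ],
    }


def has_wildcard_resource(policy: dict) -> bool:
    """Return True if any statement grants access to a wildcard resource."""
    for statement in policy["Statement"]:
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if any("*" in resource for resource in resources):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(trust_policy(MONTE_CARLO_ACCOUNT_ID), indent=2))
    print(json.dumps(upgrade_permissions(COLLECTOR_FUNCTION_ARNS), indent=2))
```

A check like has_wildcard_resource is one simple way to review that a role's permissions stay restricted to the resources it is meant to manage.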
From cloud to collector. Monte Carlo's cloud service occasionally makes API calls to the data collector to configure and control its functionality. These calls are made over a private API endpoint and are routed through AWS’s private infrastructure. The collector’s API gateway is not exposed to the Internet and is configured with a resource-based security policy that only allows API requests from Monte Carlo’s cloud VPC. This architecture guarantees that only Monte Carlo’s cloud environment can make API requests to the data collector, and that the collector's API is never reachable from the public Internet.
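A resource-based policy of this kind typically pairs an explicit Deny for any caller outside the allowed VPC with a general Allow, relying on the IAM rule that an explicit Deny always wins. The sketch below builds such a policy for an API Gateway private API using the aws:SourceVpc condition key, plus a toy evaluator to show its effect. The VPC ID is a hypothetical placeholder, and the evaluator only understands this one policy shape; it is neither Monte Carlo's actual policy nor a real IAM engine.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical VPC ID standing in for Monte Carlo's cloud VPC.
MONTE_CARLO_VPC_ID = "vpc-0123456789abcdef0"


def private_api_policy(allowed_vpc_id: str) -> dict:
    """Resource policy for a private API: deny every caller outside one VPC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {"StringNotEquals": {"aws:SourceVpc": allowed_vpc_id}},
            },
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


def is_invocation_allowed(policy: dict, source_vpc: Optional[str]) -> bool:
    """Toy evaluator: an explicit Deny beats any Allow.

    Only handles the StringNotEquals / aws:SourceVpc condition used above.
    A missing source VPC (None) makes the negated condition true, so the
    Deny applies, mirroring IAM's handling of absent keys.
    """
    allowed = False
    for statement in policy["Statement"]:
        condition = statement.get("Condition", {}).get("StringNotEquals", {})
        expected = condition.get("aws:SourceVpc")
        applies = expected is None or source_vpc != expected
        if not applies:
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed
```

With this policy, a request arriving from the allowed VPC matches only the Allow statement, while a request from any other VPC, or from outside a VPC, is stopped by the Deny.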
From collector to cloud. The data collector sends back metadata, logs and metrics to Monte Carlo’s cloud service through a collection of Kinesis streams. The streams are hosted in Monte Carlo’s cloud environment, and the data collector uses a dedicated IAM role to write records via a secure HTTPS connection. The data collector will only send records to streams that are configured by inbound API calls, which are guaranteed to come from Monte Carlo’s cloud service.
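The rule that records only go to streams named by inbound configuration amounts to an allowlist check before every write. Here is a minimal sketch, assuming a client object with the put_record(StreamName=..., Data=..., PartitionKey=...) signature of boto3's Kinesis client; the CollectorStreamWriter class, its methods and the JSON encoding of records are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


class CollectorStreamWriter:
    """Writes collected records only to streams named by inbound configuration.

    `client` must expose put_record(StreamName=..., Data=..., PartitionKey=...),
    the signature used by boto3's Kinesis client.
    """

    def __init__(self, client: Any) -> None:
        self._client = client
        self._allowed_streams: set[str] = set()

    def configure(self, stream_names: Iterable[str]) -> None:
        """Replace the allowlist with the streams named in an inbound API call."""
        self._allowed_streams = set(stream_names)

    def send(self, stream_name: str, record: dict, partition_key: str) -> None:
        """Serialize one record and write it, refusing unconfigured streams."""
        if stream_name not in self._allowed_streams:
            raise PermissionError(f"stream not configured: {stream_name}")
        self._client.put_record(
            StreamName=stream_name,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=partition_key,
        )
```

In practice the client could be boto3.client("kinesis") created with credentials from the collector's dedicated IAM role; boto3 uses HTTPS endpoints by default. Injecting the client also makes the allowlist easy to test with a stub.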
The data collector requires little to no operational work once deployed. Occasionally, Monte Carlo will release fixes, improvements and other upgrades. Most upgrades only include code changes and are performed fully automatically by the Monte Carlo team using the collector's cross-account IAM role. In the uncommon case that infrastructure upgrades are required, the Monte Carlo team will reach out with precise instructions for a quick deployment by one of your AWS admins.
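For code-only upgrades, the flow through the cross-account role can be sketched as two steps: assume the role with STS, then update the collector's Lambda code using the temporary credentials. This sketch assumes the new code is published as a zip in S3 and deployed to the collector's Lambda functions; the function names, role ARN, bucket, key and session name are hypothetical, and it is not Monte Carlo's actual upgrade tooling. The STS client and Lambda client factory are injected so the logic runs without AWS access.

```python
from __future__ import annotations

from typing import Any, Callable


def upgrade_collector_code(
    sts_client: Any,
    make_lambda_client: Callable[[dict], Any],
    role_arn: str,
    function_names: list[str],
    bucket: str,
    key: str,
) -> list[str]:
    """Assume the cross-account role, then point each function at new code."""
    response = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName="collector-code-upgrade"
    )
    # Temporary credentials, limited by the role's permissions policy.
    credentials = response["Credentials"]
    lambda_client = make_lambda_client(credentials)
    updated = []
    for name in function_names:
        lambda_client.update_function_code(
            FunctionName=name, S3Bucket=bucket, S3Key=key
        )
        updated.append(name)
    return updated


def boto3_lambda_factory(region: str) -> Callable[[dict], Any]:
    """Build real Lambda clients from STS credentials (requires boto3)."""

    def make(credentials: dict) -> Any:
        import boto3  # imported lazily so the sketch loads without boto3

        return boto3.client(
            "lambda",
            region_name=region,
            aws_access_key_id=credentials["AccessKeyId"],
            aws_secret_access_key=credentials["SecretAccessKey"],
            aws_session_token=credentials["SessionToken"],
        )

    return make
```

Because every call goes through the assumed role, a request for any resource outside the role's permissions policy would be rejected by AWS rather than relying on the upgrade code to behave.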