Network Connectivity

❗️
Deprecated
As of January 2024, the Monte Carlo Data Collector Deployment Model has been deprecated in favor of the Agent and Object Storage Deployment models. Please see Architecture & Deployment Options for more information.

To integrate Monte Carlo with data warehouses, lakes and BI tools, you will need to enable network connectivity between Monte Carlo's data collector and your non-public resources.

Most setups will use one of the following methods to establish network connectivity: IP filtering or VPC peering, both described below.

👍
If you are integrating with a Redshift cluster, our Alpha Network Recommender can help!

Identifying your collector's IP address

Many networking configurations require you to know the source IP used by Monte Carlo's data collector. If Monte Carlo is hosting your data collector in its environment, please reach out to your representative to obtain your dedicated source IP.

If you are hosting the data collector in your own AWS account, please follow these steps to identify the collector's source IP address:

Sign in to the AWS console in the account where the data collector is deployed.
Go to CloudFormation > Stacks and click on the data collector's stack. The stack will typically be named "monte-carlo".
Click the "Outputs" tab and identify the key "PublicIP".

Finding your data collector's source IP address
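The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and that your credentials point at the account hosting the data collector. The helper names `find_stack_output` and `fetch_collector_ip` are hypothetical, and the sample response uses a documentation-range IP rather than a real value.

```python
def find_stack_output(describe_stacks_response, output_key="PublicIP"):
    """Return the value of `output_key` from a describe_stacks response, or None."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def fetch_collector_ip(stack_name="monte-carlo"):
    """Look up the collector's source IP (assumes the stack name used in your account)."""
    import boto3  # imported here so the parsing helper works without boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stacks(StackName=stack_name)
    return find_stack_output(response)


# Illustrative response shape only; 203.0.113.10 is a placeholder address.
sample_response = {
    "Stacks": [
        {
            "StackName": "monte-carlo",
            "Outputs": [{"OutputKey": "PublicIP", "OutputValue": "203.0.113.10"}],
        }
    ]
}
print(find_stack_output(sample_response))
```

If your stack uses a different name, pass it to `fetch_collector_ip` instead of relying on the default.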

IP filtering

If you govern access to your data resources using IP filtering (e.g. using a firewall, AWS security groups or Snowflake network policies), please add the data collector's source IP address to your whitelist.

If your IP filtering policies specify protocol and port ranges, please make sure to whitelist the protocol and port used by your data resource (e.g. Redshift typically requires TCP over port 5439).
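If the filter in question is an AWS security group, a rule of this kind might be built as follows. This is a sketch under assumptions: the function names are hypothetical, the IP is a placeholder, and port 5439 is only the Redshift default mentioned above. The `allow_collector` helper needs boto3 and permission to modify the target security group.

```python
import ipaddress


def build_ingress_permission(source_ip, port=5439, protocol="tcp"):
    """Build one IpPermissions entry allowing a single IPv4 address on one port."""
    address = ipaddress.IPv4Address(source_ip)  # raises ValueError if malformed
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": f"{address}/32", "Description": "Monte Carlo data collector"}
        ],
    }


def allow_collector(resource_group_id, source_ip, port=5439):
    """Add the rule to the security group protecting your data resource."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=resource_group_id,
        IpPermissions=[build_ingress_permission(source_ip, port)],
    )


# Placeholder IP from the documentation range, not a real collector address.
print(build_ingress_permission("203.0.113.10"))
```

Using a `/32` range keeps the rule scoped to the collector's single source IP rather than a wider block.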

VPC peering

👍
Don't want to worry about overlapping CIDR blocks or connecting VPCs?
Consider using PrivateLink instead! This is done by creating a Network Load Balancer in your VPC as the service front end and then configuring a VPC endpoint service.
See details on how to set up an endpoint service using CloudFormation here.

See here for CloudFormation templates that can be used to automate setting up peering and to manage resources as code. Otherwise, to set up peering manually, please follow these steps:

Identify the VPC in which your data collector resources are hosted (see screenshot below), and the VPC in which your data resource is hosted.
Follow AWS's peering instructions to peer your VPCs. You may need to update your routing tables to enable communication between the two VPCs.

🚧
CIDR block overlaps
VPC peering is not possible when the peered VPCs use overlapping CIDR blocks. If your VPCs overlap, you may choose to use a custom CIDR block for your Monte Carlo data collector. See here for details.
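Before requesting a peering connection, you can check for an overlap with Python's standard `ipaddress` module. The sketch below uses a hypothetical helper name and example CIDR blocks. Substitute the blocks of your collector VPC and your data resource VPC.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Example blocks only; replace with the CIDRs of the two VPCs you plan to peer.
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/20"))  # True: the /20 sits inside the /16
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))    # False: disjoint ranges
```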

If your data resource is protected by a security group, you will need to enable access from the data collector. Typically, you can do this by finding the data collector's security group (search for AWS::EC2::SecurityGroup in the stack resources) and whitelisting it for the appropriate protocol and port in your resource's security group. See here for additional details.
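The lookup described above could be scripted roughly as follows. This is a sketch, assuming boto3 and a stack named "monte-carlo". The helper names and sample IDs are hypothetical. If the two VPCs live in different AWS accounts, the group pair may also need the owning account ID (`UserId`), which is not shown here.

```python
def find_security_group_ids(describe_stack_resources_response):
    """Return physical IDs of all AWS::EC2::SecurityGroup resources in a stack."""
    return [
        resource["PhysicalResourceId"]
        for resource in describe_stack_resources_response.get("StackResources", [])
        if resource.get("ResourceType") == "AWS::EC2::SecurityGroup"
        and "PhysicalResourceId" in resource
    ]


def build_group_ingress_permission(source_group_id, port=5439, protocol="tcp"):
    """Build an IpPermissions entry that admits traffic from another security group."""
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def fetch_collector_security_groups(stack_name="monte-carlo"):
    """Query CloudFormation for the collector stack's security groups."""
    import boto3  # imported here so the helpers above work without boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stack_resources(StackName=stack_name)
    return find_security_group_ids(response)


# Illustrative response shape with made-up IDs.
sample_resources = {
    "StackResources": [
        {"ResourceType": "AWS::EC2::VPC", "PhysicalResourceId": "vpc-0abc"},
        {"ResourceType": "AWS::EC2::SecurityGroup", "PhysicalResourceId": "sg-0123"},
    ]
}
group_ids = find_security_group_ids(sample_resources)
print(build_group_ingress_permission(group_ids[0]))
```

The resulting entry can be passed in the `IpPermissions` list of the EC2 `authorize_security_group_ingress` call for your resource's security group.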

Finding your data collector's VPC ID