To integrate Monte Carlo with data warehouses, lakes and BI tools, you will need to enable network connectivity between Monte Carlo's data collector and your non-public resources.
Most setups will use one of the following methods to establish network connectivity:
- VPC peering [recommended]
- IP filtering
- Data collector deployment into an existing VPC with pre-configured network access to data resources
Many networking configurations require knowing the source IP address used by Monte Carlo's data collector. If Monte Carlo hosts your data collector in its own environment, please reach out to your Monte Carlo representative to obtain your dedicated source IP.
If you are hosting the data collector in your own AWS account, please follow these steps to identify the collector's source IP address:
- Sign in to the AWS console in the account where the data collector is deployed.
- Go to CloudFormation > Stacks and click on the data collector's stack. The stack will typically be named "monte-carlo".
- Click the "Outputs" tab and identify the key "PublicIP".
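The same lookup can be scripted. The following is a minimal sketch, assuming the stack exposes an output with the key "PublicIP" as described above. The helper name `find_stack_output` and the sample response are hypothetical; the response is only shaped like the result of a CloudFormation `describe-stacks` call, and the IP address is a placeholder from a documentation range.

```python
from typing import Optional


def find_stack_output(describe_stacks_response: dict, key: str = "PublicIP") -> Optional[str]:
    """Return the value of the stack output named `key`, or None if it is absent."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None


# Hypothetical response shaped like the result of `describe-stacks`.
# The stack name and IP address below are placeholders, not real values.
sample_response = {
    "Stacks": [
        {
            "StackName": "monte-carlo",
            "Outputs": [
                {"OutputKey": "PublicIP", "OutputValue": "203.0.113.10"},
            ],
        }
    ]
}

if __name__ == "__main__":
    print(find_stack_output(sample_response))
```

If boto3 is installed and AWS credentials are configured, the response could instead be obtained with `boto3.client("cloudformation").describe_stacks(StackName="monte-carlo")`, assuming your stack uses that name.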
See here for CloudFormation templates that can be used to automate this process and help manage resources as code.
If your data resource (e.g. Redshift, Tableau) resides in a VPC on AWS, the simplest and recommended way to establish connectivity is via VPC peering.
To set up peering, please follow these steps:
- Identify the VPC in which your data collector resources are hosted (see screenshot below), and the VPC in which your data resource is hosted.
- Follow AWS's peering instructions to peer your VPCs. You may need to update your routing tables to enable communication between the two VPCs.
CIDR block overlaps
VPC peering is not possible when the peered VPCs use overlapping CIDR blocks. If this occurs, you may choose to use a custom CIDR block for your Monte Carlo data collector. See here for details.
- If your data resource is protected by a security group, you will need to enable access from the data collector. This can typically be done by finding the data collector's security group (search for AWS::EC2::SecurityGroup in the stack resources) and whitelisting it for the appropriate protocol and port in your resource's security group. See here for additional details.
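Before peering, it can help to confirm that the two VPCs do not use overlapping CIDR blocks. The following is a minimal sketch using Python's standard `ipaddress` module. The helper name `cidr_blocks_overlap` and the example CIDR blocks are illustrative assumptions, so substitute the blocks of your own VPCs.

```python
import ipaddress


def cidr_blocks_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a, strict=True)
    net_b = ipaddress.ip_network(cidr_b, strict=True)
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    # Example values only: replace with the CIDR blocks of the collector VPC
    # and the VPC that hosts your data resource.
    collector_vpc_cidr = "10.0.0.0/16"
    resource_vpc_cidr = "10.0.128.0/20"
    if cidr_blocks_overlap(collector_vpc_cidr, resource_vpc_cidr):
        print("CIDR blocks overlap; peering will not be possible as-is.")
    else:
        print("CIDR blocks do not overlap.")
```

Passing `strict=True` makes `ip_network` raise a `ValueError` if a block has host bits set (for example `10.0.0.1/16`), which catches a common typo early.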
If you govern access to your data resources using IP filtering (e.g. using a firewall, AWS security groups or Snowflake network policies), please add the data collector's source IP address to your whitelist.
If your IP filtering policies specify protocol and port ranges, please make sure to whitelist the protocol and port used by your data resource (e.g. Redshift typically requires TCP over port 5439).
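Once the whitelist is updated, a quick TCP reachability check can confirm that the protocol and port are open. The following is a minimal sketch using Python's standard `socket` module. The helper name `can_connect` and the hostname are hypothetical, and the check is only meaningful when run from a machine that shares the data collector's network path and source IP.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical Redshift endpoint: replace with your own host and port.
    host = "example-cluster.example.com"
    port = 5439
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

A successful connection only shows that the port is reachable at the network level; it does not validate credentials or database permissions.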
If you plan to place Monte Carlo's data collector resources in the same VPC as your data resource, you may follow the instructions here to do so. With this setup, Monte Carlo's collector stack will not create a VPC, subnets, or other networking resources, and will instead use existing resources in your AWS account. This allows Monte Carlo to use existing network configurations and resources to connect to data warehouses, lakes and BI tools.
Additional networking configuration may be necessary
Even when deploying the data collector in an existing VPC, additional routing table and security group changes may still be necessary to enable connectivity between data collector resources and data resources.