Migrating from a Remote Data Collector

👍

How do I know if I currently have a remote Data Collector?

See this FAQ for details. If the returned type is REMOTE_V1, you have a remote Data Collector.

About

Monte Carlo is upgrading its collection platform architecture and will be phasing out the Data Collector.

This guide focuses on migrating from a remote Data Collector to one of the v2 deployment options.

To learn more about platform migrations, please see the documentation here. The documentation includes definitions and terminology, FAQs, timelines, and instructions for other deployment types.

Migration

The following steps outline how to migrate a remote Data Collector to one of the current deployment options. This migration should not affect existing monitors, assets, or other configurations. It primarily changes how Monte Carlo connects to your integrations and how and where sampling data is stored.

It's highly recommended that you review all steps before getting started. If you have any questions, please see our FAQs and contact details. We are always happy to assist you!

Steps

  1. Review Options
    Review the current deployment options to determine which best fits your use case.

    The "Customer-hosted Agent & Data Store Deployment" is most similar to the remote Data Collector, offering many advantages, such as:

    • Multiple deployment, region, and cloud options. Now available on AWS, Azure, and GCP with Terraform.
    • A much more lightweight, faster, and digestible package (e.g., reduced from 58 resources to 6 resources for AWS CloudFormation).
    • More transparency: the Agent is publicly accessible in Docker Hub, as well as the code on our public repos.
    • The flexibility to apply additional constraints to deployments if necessary.
    • A log of all agent operations and a changelog for every release.
    • Upgrades and management through the UI and CLI, with less frequent updates.
    • Improved onboarding (setup) validation with a lot more documentation.
    • And much more...

    That said, Monte Carlo also recommends taking a look at the new "Cloud with Customer-hosted Data Store Deployment" option and suggests using PrivateLink to connect with your integrations if they are on a supported cloud and vendor tier.

  2. Contact Monte Carlo
    Contact us at [email protected] to request a Data Collector migration. Be sure to include the following:

    • Your Monte Carlo Account ID (which can be found here).
    • The deployment type you wish to use.
    • Which Data Collector to migrate, if you have more than one (which can be found here).
    • Whether you wish to use PrivateLink and, if so, for which integrations (applies to a cloud or cloud with data store deployment).

      🕒

      Monte Carlo will then provision resources in your workspace.

      Note: Do not proceed until you receive confirmation from Monte Carlo.

      Typically, you will receive a response within 72 hours (US business days).

  3. Optionally, deploy resources
    If you are deploying a customer-hosted agent or data store, please follow the corresponding guide specific to your cloud provider to deploy and register it with Monte Carlo.

    Please note that if you opt to use the agent, the Tableau integration requires using the connected app authentication flow. Additionally, Data Lake Query Logs from S3 Buckets are not supported. Other integrations are supported without any changes, provided connectivity is established first (refer to step 4).

  4. Update networking
    Ensure that your new deployment (whether cloud, agent, etc.) can connect to your existing integrations. This guide is a good starting point for reviewing networking, such as which IPs to allowlist for a cloud deployment.

    Note that even if you are currently on a v1 deployment, you should refer to the v2 deployment material, as that is the deployment type you are migrating to and need to configure.

    Once you have completed the setup, the networking tests are a simple way to validate that Monte Carlo can still connect to your integrations. It is important to select the correct deployment. When using the "Test Network" button in the UI, agents are listed under the "Remote Agent" category, while all other deployments, such as cloud or cloud with data store, are listed under the "Other" category with a corresponding stack ARN.

    Your legacy Data Collector is also still listed under the "Other" category. Do not remove connectivity for the legacy Data Collector until the migration is complete; otherwise, jobs will start failing.

    Example of Selecting an Agent Deployment on the Monte Carlo UI

    Example of Selecting a Cloud Deployment on the Monte Carlo UI

  5. Optionally, copy existing sampling data
    If opting for a remote agent or data store, you can choose to non-destructively copy (sync) existing data from the data collector's S3 bucket into the new blob store. We do not support migrating this data from a data collector to a cloud deployment. Either way, this step is not strictly required, but it is necessary if you want to retain sampling data after the migration is complete. Also, note that copying data might reset its TTL.

    For instance, on AWS you can use the aws s3 sync command to achieve this.

    For an agent:

    aws s3 sync s3://<OLD_BUCKET_NAME> s3://<NEW_BUCKET_NAME>/mcd

    For a data store:

    aws s3 sync s3://<OLD_BUCKET_NAME> s3://<NEW_BUCKET_NAME>

    Notice that the prefix structure has slightly changed with the agent; all files are now inside a root "directory" called mcd. This is not the case for the data store or data collector.
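
The prefix difference described above can be captured in a small helper, so the right destination is chosen for each target. This is a minimal sketch, not an official Monte Carlo script: the function names `sync_destination` and `sync_command` and the target labels `agent` and `datastore` are hypothetical and chosen for illustration only. The helper prints the command instead of running it, so you can review it first. `--dryrun` is a standard `aws s3 sync` flag that previews the copy without writing anything.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the destination URI for the copy in this step.
# An agent expects files under the mcd/ root prefix; a data store does not.
sync_destination() {
  local target="$1" bucket="$2"
  case "$target" in
    agent)     printf 's3://%s/mcd\n' "$bucket" ;;
    datastore) printf 's3://%s\n' "$bucket" ;;
    *)
      echo "unknown target: $target (expected agent or datastore)" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print the aws s3 sync command for review.
# An optional fourth argument (for example --dryrun) is appended as-is.
sync_command() {
  local target="$1" old_bucket="$2" new_bucket="$3" extra="${4:-}"
  local dest
  dest="$(sync_destination "$target" "$new_bucket")" || return 1
  printf 'aws s3 sync s3://%s %s%s\n' "$old_bucket" "$dest" "${extra:+ $extra}"
}

# Example with placeholder bucket names: preview the copy into an agent's blob store.
sync_command agent old-collector-bucket new-agent-bucket --dryrun
```

Once the printed command looks right, run it with `--dryrun` first, then again without the flag to perform the copy.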

  6. Contact Monte Carlo
    Contact us at [email protected] to let us know you are ready to continue with the migration. Please be sure to include your Monte Carlo Account ID (which can be found here) and specify the Data Collector you want to migrate along with its destination deployment.

    At this point, we will validate the migration and move jobs to use your new deployment. There will be a brief period during the transition when in-flight jobs might continue to use the Data Collector, while newly scheduled ones will use the new resources.

    🕑

    Note: Do not proceed until you receive confirmation from Monte Carlo.

    Typically, you will receive a response within 72 hours (US business days).

  7. Optionally, copy existing sampling data again
    As there might have been some in-flight jobs or jobs executed between step #5 and step #6, you can choose to repeat step #5 to sync the delta.
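
Before repeating the copy, it can be useful to see how large the delta is. The sketch below assumes the AWS CLI's usual dry-run output, where each pending object appears as a line starting with `(dryrun) copy:`. The function name `count_pending_copies` is hypothetical, and the bucket names are the same placeholders used in step #5.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the objects a dry-run sync reports it would copy.
# Reads `aws s3 sync --dryrun` output on stdin and prints the number of copy lines.
count_pending_copies() {
  # grep -c exits non-zero when nothing matches; still print 0 and succeed.
  grep -c '^(dryrun) copy:' || true
}

# Usage (requires AWS credentials, so it is left commented out here):
#   aws s3 sync --dryrun s3://<OLD_BUCKET_NAME> s3://<NEW_BUCKET_NAME>/mcd | count_pending_copies
```

A result of 0 suggests nothing new was written to the old bucket since the first copy, in which case repeating step #5 is unnecessary.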

  8. Clean up resources
    After waiting for at least one week following Monte Carlo's response in step #6, you can delete your data collector and associated resources. This is because, during the first 72 hours of this migration, Monte Carlo will monitor jobs to confirm connectivity and might need to roll back if there is an issue. The additional time allows for any necessary coordination or changes.

FAQs

See FAQs here.

Resources and support

If you have any questions, please don't hesitate to reach out to us at [email protected].