MSK (Beta)

If you are using Amazon Managed Streaming for Apache Kafka (MSK), Monte Carlo can collect metadata to construct lineage between tables populated via Kafka Connect.

CLI

During the private beta, MSK integrations can only be set up using our CLI. Please see this guide for installation instructions.

Create Streaming System

You will first need to create a "streaming system" in Monte Carlo for your MSK cluster(s).

Use the CLI to create a streaming system:

% montecarlo integrations add-streaming-system --streaming-system-type msk --streaming-system-name MSK
Successfully created the MSK streaming system MSK Kafka with uuid: b1b7a853-3b60-43c3-b798-722583aee686.
  • --streaming-system-type should be msk
  • --streaming-system-name should be a descriptive name used to identify the streaming system

Make a note of the id generated for the new streaming system (the uuid in the output above); you will need it to add MSK Kafka and Connect integrations to the system.
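If you are scripting the setup, the id can be captured rather than copied by hand. The following is a minimal sketch, assuming the CLI writes its success message to standard output in the format shown above; `extract_uuid` is a hypothetical helper, not part of the Monte Carlo CLI.

```shell
# Hypothetical helper: print the first UUID found in the text on stdin,
# e.g. the "Successfully created ... with uuid: ..." message from the CLI.
extract_uuid() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Usage (assumes the success message goes to stdout):
# STREAMING_SYSTEM_ID=$(montecarlo integrations add-streaming-system \
#   --streaming-system-type msk --streaming-system-name MSK | extract_uuid)
# echo "$STREAMING_SYSTEM_ID"
```

The variable can then be passed as --streaming-system-id in the commands that follow.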

MSK Kafka

Monte Carlo relies on a REST proxy to collect topic metadata from your MSK clusters. We currently support the Confluent REST Proxy.

Please let your Monte Carlo representative know, or contact [email protected], if you would prefer to use another REST proxy.

Use the CLI to create an MSK Kafka integration:

% montecarlo integrations add-streaming-cluster-connection --connection-type msk-kafka --new-cluster-id ji1g8x0vRAmYtUwKc5XyWg --new-cluster-name "MSK Kafka" --url http://rest-proxy:8084 --auth-type NO_AUTH --streaming-system-id b1b7a853-3b60-43c3-b798-722583aee686
  • --connection-type should be msk-kafka
  • --new-cluster-id should be the Kafka broker cluster identifier from the REST proxy
  • --new-cluster-name should be a descriptive name representing the Kafka broker cluster
  • --url should be the base URL of the REST proxy; this URL must be accessible from your data collector or agent
  • --auth-type should identify the authorization used for the REST proxy; available options are NO_AUTH, BASIC, or BEARER (the latter two require you to provide --auth-token)
  • --streaming-system-id should be the id of the streaming system (the id that was generated above)
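One way to find the value for --new-cluster-id is to ask the REST proxy itself. The sketch below assumes the Confluent REST Proxy v3 API, where `GET /v3/clusters` returns a JSON list whose entries carry a `cluster_id` field. The helper names are hypothetical, and the optional Authorization header value is an assumption about how your proxy is secured.

```shell
# Hypothetical helper: print every "cluster_id" value found in a
# Confluent REST Proxy /v3/clusters JSON response read from stdin.
extract_cluster_ids() {
  grep -oE '"cluster_id"[[:space:]]*:[[:space:]]*"[^"]+"' | sed -E 's/.*"([^"]+)"$/\1/'
}

# Hypothetical helper: query a REST proxy and list its cluster ids.
# $1 = base URL of the REST proxy
# $2 = optional Authorization header value (assumption: e.g. "Bearer <token>")
list_cluster_ids() {
  if [ -n "${2:-}" ]; then
    curl -fsS -H "Authorization: $2" "$1/v3/clusters" | extract_cluster_ids
  else
    curl -fsS "$1/v3/clusters" | extract_cluster_ids
  fi
}

# Usage, run from a host that can reach the proxy:
# list_cluster_ids http://rest-proxy:8084
```

Running this from the same network as your data collector or agent also confirms that the URL is reachable from there.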

MSK Connect

Monte Carlo uses AWS APIs to collect connector metadata from MSK. This requires an IAM role in the AWS account hosting your MSK clusters. The role must be assumable by the AWS agent and must grant permission to call these read-only APIs.

Create IAM role

Use the CLI to generate (and review) the required IAM policy statements:

% montecarlo discovery msk-policy-gen --resource-aws-profile prod
  • --resource-aws-profile should be an AWS profile for the account that is hosting your MSK clusters

You can then create an IAM role with these permissions that the AWS agent can assume.
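As an illustration, the role could be created with the AWS CLI along the following lines. This is a sketch under several assumptions: the principal ARN of your AWS agent, the role and policy names, and the file names are all placeholders, and it assumes you have saved the reviewed output of the policy generator as a complete IAM policy document. `trust_policy` is a hypothetical helper.

```shell
# Hypothetical helper: print an IAM trust policy that lets one principal
# assume the role, but only when it presents the given external id.
# $1 = ARN of the principal the AWS agent runs as (placeholder; check your agent deployment)
# $2 = external id
trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "$1" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "$2" } }
    }
  ]
}
EOF
}

# Usage sketch (all names and file paths below are placeholders):
# trust_policy "<agent-principal-arn>" "<external-id>" > trust.json
# aws iam create-role --profile prod --role-name mc-msk-integration \
#   --assume-role-policy-document file://trust.json
# aws iam put-role-policy --profile prod --role-name mc-msk-integration \
#   --policy-name mc-msk-read-only --policy-document file://msk-policy.json
```

The external id condition is what ties the role to the --external-id value supplied when the integration is created.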

Create MSK Connect Integration

Use the CLI to create an MSK Connect integration:

% montecarlo integrations add-streaming-cluster-connection --connection-type msk-kafka-connect --new-cluster-id connect --new-cluster-name "MSK Connect" --cluster-arn arn:aws:kafka:us-east-1:1234567890:cluster/msk-cluster/5d0bddae-a77f-48de-be47-d79145c9dfec-20 --iam-role-arn arn:aws:iam::1234567890:role/mc-msk-integration --external-id 78654d9d-1c7f-4cfd-abaf-503d5de7bcec --streaming-system-id b1b7a853-3b60-43c3-b798-722583aee686
  • --connection-type should be msk-kafka-connect
  • --new-cluster-id is intended to be an internal identifier for the Kafka Connect cluster, but for MSK its value does not matter (we may remove this as a required input in the future)
  • --new-cluster-name should be a descriptive name representing the Kafka Connect cluster
  • --cluster-arn should be the ARN of the MSK cluster that connectors will be connecting to
  • --iam-role-arn should be the ARN of the IAM role to be used by the AWS agent
  • --external-id should be the external id that allows the AWS agent to assume the IAM role
  • --streaming-system-id should be the id of the streaming system
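Because the two ARN flags are easy to swap or mistype, a quick shape check before running the command can help. The helpers below are hypothetical and only check the general ARN layout; they do not confirm that the cluster or role exists. The commented `aws kafka list-clusters` call is one way to look up cluster ARNs, assuming the `prod` profile used earlier.

```shell
# Hypothetical helper: succeed if $1 has the shape of an MSK cluster ARN:
# arn:<partition>:kafka:<region>:<account>:cluster/<name>/<id>
is_msk_cluster_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws[a-z-]*:kafka:[a-z0-9-]+:[0-9]+:cluster/[^/]+/[^/]+$'
}

# Hypothetical helper: succeed if $1 has the shape of an IAM role ARN:
# arn:<partition>:iam::<account>:role/<name>
is_iam_role_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws[a-z-]*:iam::[0-9]+:role/.+$'
}

# Look up cluster names and ARNs in the account that hosts your MSK clusters:
# aws kafka list-clusters --profile prod \
#   --query 'ClusterInfoList[].[ClusterName,ClusterArn]' --output text

# Usage:
# is_msk_cluster_arn "$CLUSTER_ARN" || echo "unexpected --cluster-arn value" >&2
# is_iam_role_arn "$IAM_ROLE_ARN" || echo "unexpected --iam-role-arn value" >&2
```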