MSK (public preview)
If you are using Amazon Managed Streaming for Apache Kafka (MSK), Monte Carlo can collect metadata to construct lineage between tables populated via Kafka Connect.
CLI
During the public preview, MSK integrations can only be set up using our CLI. Please see this guide for installation instructions.
Create Streaming System
You will first need to create a "streaming system" in Monte Carlo for your MSK cluster(s).
Use the CLI to create a streaming system:
% montecarlo integrations add-streaming-system --streaming-system-type msk --streaming-system-name MSK
Successfully created the MSK streaming system MSK Kafka with uuid: b1b7a853-3b60-43c3-b798-722583aee686.
--streaming-system-type should be msk
--streaming-system-name should be a descriptive name used to identify the streaming system
Make note of the id generated for the new streaming system; you will need it to add MSK Kafka and Connect integrations to the system.
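When the setup is scripted, the id can be captured from the CLI output instead of being copied by hand. The following is a minimal sketch, assuming the success message keeps the format shown above. `extract_uuid` and `STREAMING_SYSTEM_ID` are hypothetical names and are not part of the Monte Carlo CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first UUID found on stdin.
# Assumes the uuid is printed in the usual 8-4-4-4-12 lowercase hex form.
extract_uuid() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Example using the sample output from above; in practice, pipe the CLI output in.
sample='Successfully created the MSK streaming system MSK Kafka with uuid: b1b7a853-3b60-43c3-b798-722583aee686.'
STREAMING_SYSTEM_ID="$(printf '%s\n' "$sample" | extract_uuid)"
echo "$STREAMING_SYSTEM_ID"
```

In a real run, the sample string would be replaced by the output of the add-streaming-system command piped into extract_uuid.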
MSK Kafka
Monte Carlo relies on a REST proxy to collect topic metadata from your MSK clusters. We currently support the Confluent REST Proxy.
Please let your Monte Carlo representative know, or contact Monte Carlo support, if you would prefer to use another REST proxy.
Use the CLI to create an MSK Kafka integration:
% montecarlo integrations add-streaming-cluster-connection --connection-type msk-kafka --new-cluster-id ji1g8x0vRAmYtUwKc5XyWg --new-cluster-name "MSK Kafka" --url http://rest-proxy:8084 --auth-type NO_AUTH --streaming-system-id b1b7a853-3b60-43c3-b798-722583aee686
--connection-type should be msk-kafka
--new-cluster-id should be the Kafka broker cluster identifier from the REST proxy
--new-cluster-name should be a descriptive name representing the Kafka broker cluster
--url should be the base URL of the REST proxy; this URL should be accessible from your data collector or agent
--auth-type should identify the authorization used for the REST proxy; available options are NO_AUTH, BASIC, or BEARER (the latter two will require you to provide --auth-token)
--streaming-system-id should be the id of the streaming system (the id that was generated above)
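The value for --new-cluster-id can be read from the REST proxy itself. The following is a minimal sketch, assuming the Confluent REST Proxy v3 API is enabled (its GET /v3/clusters endpoint lists clusters with a cluster_id field) and that the proxy accepts unauthenticated requests. `parse_cluster_ids` and `list_cluster_ids` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract every "cluster_id" value from JSON on stdin.
# Uses grep/sed to avoid extra dependencies; jq -r '.data[].cluster_id'
# is a more robust alternative if jq is installed.
parse_cluster_ids() {
  grep -oE '"cluster_id"[[:space:]]*:[[:space:]]*"[^"]+"' \
    | sed -E 's/.*"([^"]+)"$/\1/'
}

# Hypothetical helper: ask the REST proxy (base URL in $1) for its clusters.
list_cluster_ids() {
  curl -fsS "$1/v3/clusters" | parse_cluster_ids
}
```

For example, `list_cluster_ids http://rest-proxy:8084` would print the identifier to pass as --new-cluster-id, provided the proxy is reachable from where the command runs.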
MSK Connect
Monte Carlo uses AWS APIs to collect connector metadata from MSK. This requires an IAM role that can be assumed by an AWS agent and that includes permissions to execute these read-only APIs in the AWS account that is hosting your MSK clusters.
Create IAM role
Use the CLI to generate (and review) the required IAM policy statements:
% montecarlo discovery msk-policy-gen --resource-aws-profile prod
--resource-aws-profile should be an AWS profile for the account that is hosting your MSK clusters
You can then create an IAM role that includes these permissions and is assumable by the AWS agent.
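As an illustration, the role could be created with the AWS CLI. This is a sketch under stated assumptions, not a prescribed procedure: the principal ARN of the AWS agent, the role name, the policy name mc-msk-read-only, and the permissions file path are all placeholders, and the permissions file is assumed to hold a complete policy document built from the generated statements. `trust_policy` and `create_msk_role` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a trust policy that lets the principal in $1
# assume the role only when it presents the external id in $2.
trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "$1" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "$2" } }
    }
  ]
}
EOF
}

# Hypothetical helper: create the role and attach the permissions.
# $1 = role name, $2 = agent principal ARN (assumption), $3 = external id,
# $4 = path to the permissions policy document (assumption).
create_msk_role() {
  aws iam create-role --role-name "$1" \
    --assume-role-policy-document "$(trust_policy "$2" "$3")"
  aws iam put-role-policy --role-name "$1" \
    --policy-name mc-msk-read-only \
    --policy-document "file://$4"
}
```

The external id used in the trust policy is the same value later passed as --external-id when creating the MSK Connect integration.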
Create MSK Connect Integration
Use the CLI to create an MSK Connect integration:
% montecarlo integrations add-streaming-cluster-connection --connection-type msk-kafka-connect --new-cluster-id connect --new-cluster-name "MSK Connect" --cluster-arn arn:aws:kafka:us-east-1:1234567890:cluster/msk-cluster/5d0bddae-a77f-48de-be47-d79145c9dfec-20 --iam-role-arn arn:aws:iam::1234567890:role/mc-msk-integration --external-id 78654d9d-1c7f-4cfd-abaf-503d5de7bcec --streaming-system-id b1b7a853-3b60-43c3-b798-722583aee686
--connection-type should be msk-kafka-connect
--new-cluster-id is intended to be an internal identifier for the Kafka Connect cluster; for MSK its value does not currently matter (we may remove this as a required input in the future)
--new-cluster-name should be a descriptive name representing the Kafka Connect cluster
--cluster-arn should be the ARN of the MSK cluster that connectors will be connecting to
--iam-role-arn should be the ARN of the IAM role to be used by the AWS agent
--external-id should be the external id that allows the AWS agent to assume the IAM role
--streaming-system-id should be the id of the streaming system
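Before running the command, it can help to sanity-check the two ARNs. The sketch below is illustrative only. It assumes the IAM role lives in the same AWS account as the MSK cluster, and that the assume-role check is run with the credentials of the principal the role trusts. `arn_account` and `check_assume_role` are hypothetical helper names, and mc-msk-check is a placeholder session name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the account id (5th colon-separated field) of an ARN.
arn_account() {
  printf '%s\n' "$1" | cut -d: -f5
}

# Hypothetical helper: confirm the role in $1 can be assumed with external id $2.
# Prints the assumed-role ARN on success; fails if the trust policy rejects the call.
check_assume_role() {
  aws sts assume-role --role-arn "$1" --external-id "$2" \
    --role-session-name mc-msk-check \
    --query 'AssumedRoleUser.Arn' --output text
}

cluster_arn='arn:aws:kafka:us-east-1:1234567890:cluster/msk-cluster/5d0bddae-a77f-48de-be47-d79145c9dfec-20'
role_arn='arn:aws:iam::1234567890:role/mc-msk-integration'

# Warn if the two ARNs point at different accounts.
if [ "$(arn_account "$cluster_arn")" != "$(arn_account "$role_arn")" ]; then
  echo "warning: cluster and role are in different AWS accounts" >&2
fi
```

The ARNs here are the sample values from the command above; substitute your own before running check_assume_role.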