Contents
Short description
Strimzi, practically the most extensive Kubernetes operator for Kafka, is used here in a step-by-step guide to deploying Kafka Connect on Kubernetes, together with solutions for the problems you may run into during deployment. Strimzi is based on Apache Kafka rather than the Confluent Platform, so you will most likely need to add Confluent artifacts such as the Confluent Avro Converter to get the most out of it. The article covers the basic configuration, authentication, monitoring, plugins and artifacts, and credentials files. It also shows how to expose metrics to an external Prometheus and how to configure Kafka authentication with SASL/PLAINTEXT and SCRAM-SHA-512.
In this article, we deploy Kafka Connect on Kubernetes using the Strimzi operator.
Strimzi is practically the most extensive Kubernetes operator for Kafka: it can deploy Apache Kafka itself as well as other components such as Kafka Connect, Kafka MirrorMaker, and more. In this article, we will walk you through a Kafka Connect deployment on Kubernetes step by step, touch on the problems you may encounter along the way, and show how to solve them.
Please note that Strimzi is based on Apache Kafka, not on the Confluent Platform. This is why you will most likely need to add some Confluent artifacts, such as the Confluent Avro Converter, to get the most out of it.
The article is based on Strimzi v0.29.0, which determines the versions of Kafka (and therefore of Kafka Connect) you can install; the examples below use Kafka 3.2.0.
Note: you can map a Confluent Platform version to the corresponding Apache Kafka version, and vice versa, using the table here.
Installation
OpenShift GUI and Kubernetes CLI
If you are using OpenShift, go to Operators > Installed Operators > Strimzi > Kafka Connect.
You will be presented with a form containing the Kafka Connect configuration. To get the equivalent YAML, click YAML View; any change made in the form view is reflected in the YAML view on the fly. However, don't use this view to create the instance directly: convert the desired configuration into a YAML file and deploy it with kubectl apply. To summarize:
- Enter the configuration in the form.
- Click YAML View.
- Copy its contents to a YAML file on your local machine (for example, kafka-connect.yaml).
- Run: kubectl apply -f kafka-connect.yaml.
Kafka Connect is now deployed or updated. The deployed resources include a Deployment and its pods, a Service, ConfigMaps, and Secrets.
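To check what the operator created, you can list these resources by label; a minimal sketch, assuming the KafkaConnect resource is named my-connect-cluster as in the example below:

# Strimzi labels the resources it manages with strimzi.io/cluster=<cluster-name>
kubectl get kafkaconnects -n <YOUR_PROJECT_NAME>
kubectl get deployment,pods,svc,configmap,secret -n <YOUR_PROJECT_NAME> \
  -l strimzi.io/cluster=my-connect-cluster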
Let's start with the minimal configuration and make it more advanced step by step.
Minimal configuration
To deploy the simplest minimal configuration of Kafka Connect, you can use the YAML below:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  namespace: <YOUR_PROJECT_NAME>
spec:
  config:
    config.storage.replication.factor: -1
    config.storage.topic: okd4-connect-cluster-configs
    group.id: okd4-connect-cluster
    offset.storage.replication.factor: -1
    offset.storage.topic: okd4-connect-cluster-offsets
    status.storage.replication.factor: -1
    status.storage.topic: okd4-connect-cluster-status
  bootstrapServers: 'kafka1:9092, kafka2:9092'
  version: 3.2.0
  replicas: 1
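After applying it, you can wait for the operator to reconcile the resource and report readiness; a sketch of such a check (kubectl wait works with any resource that exposes status conditions, which Strimzi's KafkaConnect does):

# Wait until the KafkaConnect resource reports the Ready condition
kubectl wait kafkaconnect/my-connect-cluster -n <YOUR_PROJECT_NAME> \
  --for=condition=Ready --timeout=300s
# Inspect the full status if something goes wrong
kubectl get kafkaconnect/my-connect-cluster -n <YOUR_PROJECT_NAME> -o yaml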
The Kafka Connect REST API is also available on port 8083, exposed on the pod. Keep it on a private or internal network, for example by defining a Route to it in OKD/OpenShift.
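For a quick smoke test from your workstation, you can port-forward to the REST API and query it; a sketch, assuming the service follows Strimzi's usual <cluster-name>-connect-api naming:

# Forward the Connect REST API locally (service name is an assumption
# based on the <cluster-name>-connect-api convention)
kubectl port-forward svc/my-connect-cluster-connect-api 8083:8083 -n <YOUR_PROJECT_NAME>
# In another terminal: list installed plugins and running connectors
curl -s http://localhost:8083/connector-plugins
curl -s http://localhost:8083/connectors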
REST API authentication
You can add authentication to the Kafka Connect REST API using the configuration described here. The catch is that it won't work with the Strimzi operator. Therefore, to secure Kafka Connect, you have two options:
- Use the KafkaConnector custom resource. The Strimzi operator lets you define connectors declaratively in YAML (see the sketch after this list). However, this may not be practical in cases where you need to update, start, and stop connectors via the REST API.
- Place your insecure REST API behind an authenticating API gateway such as Apache APISIX. Any other tool, or a solution of your own, will also do.
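For the first option, a minimal sketch of a declaratively managed connector might look like the one below; note that the KafkaConnect resource has to carry the strimzi.io/use-connector-resources: "true" annotation for the operator to manage such resources, and the connector class shown is only illustrative:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector
  namespace: kafka-connect
  labels:
    # must match the name of the KafkaConnect cluster
    strimzi.io/cluster: kafka-connect
spec:
  class: io.debezium.connector.mysql.MySqlConnector   # illustrative
  tasksMax: 1
  config: {}   # connector-specific options go here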
JMX Prometheus metrics
To expose JMX Prometheus metrics, which are useful for monitoring connector statuses in Grafana, add the following configuration:
metricsConfig:
  type: jmxPrometheusExporter
  valueFrom:
    configMapKeyRef:
      key: jmx-prometheus
      name: configs
jmxOptions: {}
It references the Prometheus JMX exporter configuration stored in a ConfigMap. You can use the following exporter configuration:
startDelaySeconds: 0
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
rules:
  - pattern: "kafka.connect<type=connect-worker-metrics>([^:]+):"
    name: "kafka_connect_connect_worker_metrics_$1"
  - pattern: "kafka.connect<type=connect-metrics, client-id=([^:]+)><>([^:]+)"
    name: "kafka_connect_connect_metrics_$2"
    labels:
      client: "$1"
  - pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^,]+), key=([^>]+)><>RowsScanned"
    name: "debezium_metrics_RowsScanned"
    labels:
      plugin: "$1"
      name: "$3"
      context: "$2"
      table: "$4"
  - pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^>]+)>([^:]+)"
    name: "debezium_metrics_$4"
    labels:
      plugin: "$1"
      name: "$3"
      context: "$2"
Service for external Prometheus
If you plan to deploy Prometheus together with Strimzi to collect metrics, follow the official instructions. But keep in mind that with an external Prometheus the story unfolds differently.
The Strimzi operator creates a port mapping in the service only for the REST API port (8083). Unfortunately, it does not create a mapping for port 9404, the HTTP port of the Prometheus exporter, so we have to create the service ourselves:
kind: Service
apiVersion: v1
metadata:
  name: kafka-connect-jmx-prometheus
  namespace: kafka-connect
  labels:
    app.kubernetes.io/instance: kafka-connect
    app.kubernetes.io/managed-by: strimzi-cluster-operator
    app.kubernetes.io/name: kafka-connect
    app.kubernetes.io/part-of: strimzi-kafka-connect
    strimzi.io/cluster: kafka-connect
    strimzi.io/kind: KafkaConnect
spec:
  ports:
    - name: tcp-prometheus
      protocol: TCP
      port: 9404
      targetPort: 9404
  type: ClusterIP
  selector:
    strimzi.io/cluster: kafka-connect
    strimzi.io/kind: KafkaConnect
    strimzi.io/name: kafka-connect-connect
status:
  loadBalancer: {}
Note: this method only works for single-pod deployments. You have to define a Route for the service, and even with a headless service the Route resolves to only one pod IP at a time, so Prometheus cannot scrape the metrics of all pods. This is why it is recommended to use a PodMonitor with a Prometheus running inside the cluster.
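A sketch of such a PodMonitor, assuming the Prometheus Operator CRDs are available and that the metrics container port is named tcp-prometheus, as in Strimzi's example manifests:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-connect-metrics
  namespace: kafka-connect
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaConnect
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus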
Plugins and artifacts
There are two ways to add plugins and artifacts.
Operator build section
To add plugins, use the build section of the operator. It takes the addresses of plugins or artifacts, downloads them during the build phase (the operator automatically creates the build configuration), and adds them to the plugins directory of the image.
It supports the jar, tgz, zip, and maven artifact types. In the case of maven, however, a multi-stage Dockerfile is generated, which is problematic on OpenShift and fails at the build stage. So stick to the types that don't need a compile stage, such as jar, zip, and tgz; you will then end up with a single-stage Dockerfile.
For example, to add the Debezium MySQL plugin, you can use the following configuration:
spec:
  build:
    output:
      image: 'kafkaconnect:1.0'
      type: imagestream
    plugins:
      - artifacts:
          - type: tgz
            url: >-
              https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.1.4.Final/debezium-connector-mysql-2.1.4.Final-plugin.tar.gz
        name: debezium-connector-mysql
Important: the Strimzi operator can only download artifacts that are reachable without authentication. Therefore, if you need to load a private artifact that is not accessible to Kubernetes, abandon this method and use the next one.
Changing the image
The operator can use an image of your choice instead of the default one, so you can add the required artifacts and plugins by building the image manually or via CI/CD. You may need this because Strimzi uses the Apache Kafka image, not the Confluent Platform, which means compatible packages such as the Confluent Avro Converter are not included in the deployment. You therefore have to add them to your own image and configure the operator to use that Docker image.
For example, to add a custom Debezium MySQL connector plugin from the GitLab generic package registry and the Confluent Avro Converter package on top of the base image, first use this Dockerfile:
ARG CONFLUENT_VERSION=7.2.4
# Install confluent avro converter
FROM confluentinc/cp-kafka-connect:${CONFLUENT_VERSION} as cp
# Reassign version
ARG CONFLUENT_VERSION
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-avro-converter:${CONFLUENT_VERSION}
# Copy previous artifacts to the main Strimzi Kafka image
FROM quay.io/strimzi/kafka:0.29.0-kafka-3.2.0
ARG GITLAB_TOKEN
ARG CI_API_V4_URL=https://gitlab.snapp.ir/api/v4
ARG CI_PROJECT_ID=3873
ARG DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION=1.0
USER root:root
# Copy Confluent packages from previous stage
RUN mkdir -p /opt/kafka/plugins/avro/
COPY --from=cp /usr/share/confluent-hub-components/confluentinc-kafka-connect-avro-converter/lib /opt/kafka/plugins/avro/
# Connector plugin debezium-connector-mysql
RUN mkdir -p /opt/kafka/plugins/debezium-connector-mysql \
    && curl --header "${GITLAB_TOKEN}" -f -L \
       --output /opt/kafka/plugins/debezium-connector-mysql.tgz \
       ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/debezium-customized/${DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION}/debezium-connector-mysql-customized.tar.gz \
    && tar xvfz /opt/kafka/plugins/debezium-connector-mysql.tgz -C /opt/kafka/plugins/debezium-connector-mysql \
    && rm -vf /opt/kafka/plugins/debezium-connector-mysql.tgz
USER 1001
Build the image, push it to an ImageStream or any other Docker registry, and configure the operator to use it by adding the lines below:
spec:
  image: image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0
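A build-and-push sketch for that image; the token format and registry login are assumptions, so adjust them to your environment (the Dockerfile above passes GITLAB_TOKEN straight to curl --header, so it should contain the full header string):

# Build the customized image, passing the GitLab token used by the Dockerfile
docker build \
  --build-arg GITLAB_TOKEN="PRIVATE-TOKEN: <YOUR_GITLAB_TOKEN>" \
  -t image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0 .
# Push it to the registry referenced in spec.image (requires prior docker login)
docker push image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0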
Kafka authentication
Depending on the type of the Kafka listener, different configurations are used to add Kafka authentication. Here is the configuration for Kafka with a SASL/PLAINTEXT listener and the SCRAM-SHA-512 mechanism:
spec:
  authentication:
    passwordSecret:
      password: kafka-password
      secretName: mysecrets
    type: scram-sha-512
    username: myuser
Next, you need to put the password into a Secret named mysecrets under the key kafka-password.
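A sketch of that Secret, using stringData so the password does not have to be base64-encoded by hand (the value is a placeholder):

kind: Secret
apiVersion: v1
metadata:
  name: mysecrets
  namespace: kafka-connect
stringData:
  kafka-password: <YOUR_KAFKA_PASSWORD>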
Handling file credentials
Because connectors require credentials to access databases, you have to define them as Secrets and access them via environment variables. However, if there are too many of them, you can put all the credentials into a single file and reference them in the connector configuration with the ${file:...} config provider.
- Put all the credentials as key-value pairs into a file and store it under a key named credentials in a Secret.
Credentials file:
USERNAME_DB_1=user1
PASSWORD_DB_1=pass1
USERNAME_DB_2=user2
PASSWORD_DB_2=pass2
Secret:
kind: Secret
apiVersion: v1
metadata:
  name: mysecrets
  namespace: kafka-connect
data:
  credentials: <BASE64 YOUR DATA>
- Configure the operator to mount the Secret as a volume:
spec:
  config:
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: database_credentials
        secret:
          items:
            - key: credentials
              path: credentials
          optional: false
          secretName: mysecrets
- Now in the connector configuration you can access PASSWORD_DB_1 with the following expression:
"${file:/opt/kafka/external-configuration/database_credentials/credentials:PASSWORD_DB_1}"
Putting it all together
If we put all the configurations together, we get the following Kafka Connect configuration:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect
  namespace: kafka-connect
spec:
  authentication:
    passwordSecret:
      password: kafka-password
      secretName: mysecrets
    type: scram-sha-512
    username: myuser
  config:
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
    config.storage.replication.factor: -1
    config.storage.topic: okd4-connect-cluster-configs
    group.id: okd4-connect-cluster
    offset.storage.replication.factor: -1
    offset.storage.topic: okd4-connect-cluster-offsets
    status.storage.replication.factor: -1
    status.storage.topic: okd4-connect-cluster-status
  bootstrapServers: 'kafka1:9092, kafka2:9092'
  metricsConfig:
    type: jmxPrometheusExporter
    valueFrom:
      configMapKeyRef:
        key: jmx-prometheus
        name: configs
  resources:
    limits:
      memory: 1Gi
    requests:
      memory: 1Gi
  readinessProbe:
    failureThreshold: 10
    initialDelaySeconds: 60
    periodSeconds: 20
  jmxOptions: {}
  livenessProbe:
    failureThreshold: 10
    initialDelaySeconds: 60
    periodSeconds: 20
  image: image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0
  version: 3.2.0
  replicas: 2
  externalConfiguration:
    volumes:
      - name: database_credentials
        secret:
          items:
            - key: credentials
              path: credentials
          optional: false
          secretName: mysecrets
Note: the Service, Route, and build configuration are not included here; they are described earlier in the article.
Conclusions
Deploying Kafka Connect using the Strimzi operator can be a powerful and efficient way to manage data integration in your organization. By leveraging Kafka’s flexibility and scalability, along with the ease of use and automation provided by Strimzi, you can optimize your data pipelines and improve data-driven decision-making.
To learn how to deploy Kafka and use this tool in your work, we invite you to the Apache Kafka for Developers course. Training starts May 12, 2023. In it, we will analyze:
- improper use of Kafka and missing commits;
- your own problem cases with Apache Kafka;
- the experience of building a ~80 TB Data Lake with Apache Kafka;
- the specifics of running Kafka with a retention of 99999999.
In this article, we covered the key steps involved in deploying Kafka Connect using the Strimzi operator, including creating a minimal custom resource, the REST API authentication challenge, Kafka authentication, JMX Prometheus metrics, plugins and artifacts, and file credentials. By following these steps, you can easily customize your Kafka Connect deployment to meet your specific needs.