Kafka Connect CLI Tutorial
Learn how to use Kafka Connect in standalone mode to write data into Apache Kafka.
Kafka Connect deserves its own in-depth section, but in this tutorial we will learn how to leverage Kafka Connect connectors with connect-standalone to write data into Kafka.
In this tutorial, we will stream the changes of Wikipedia into a Kafka topic.
You can see the live stream of Wikipedia changes here
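Before setting anything up, you can preview the raw Server-Sent Events stream yourself with curl (a sketch assuming curl is installed and you have network access; output is a live stream, so we only keep the first few lines):

```shell
# Preview the Wikimedia recent-changes SSE stream.
# -N disables buffering so events print as they arrive;
# head stops the stream after 20 lines.
curl -N https://stream.wikimedia.org/v2/stream/recentchange | head -n 20
```

Each event arrives as an `event:`/`data:` pair, where the `data:` line carries a JSON payload describing one Wikipedia change. This is exactly the data our connector will push into Kafka.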
In order to get data into Apache Kafka, we have seen that we need to leverage Kafka producers. Over time, it became clear that many companies shared the same data source types (databases, messaging systems, etc.), so writing standardized open-source integration code could benefit everyone. The same reasoning applies to Kafka consumers.
Kafka Connect is a tool that allows us to integrate popular systems with Kafka. It allows us to re-use existing components to source data into Kafka and sink data out from Kafka into other data stores.
Examples of popular Kafka connectors include:
Kafka Connect Source Connectors (producers): databases (through the Debezium connector), JDBC, Couchbase, GoldenGate, SAP HANA, Blockchain, Cassandra, DynamoDB, FTP, IoT, MongoDB, MQTT, RethinkDB, Salesforce, Solr, SQS, Twitter, etc.
Kafka Connect Sink Connectors (consumers): S3, Elasticsearch, HDFS, JDBC, SAP HANA, DocumentDB, Cassandra, DynamoDB, HBase, MongoDB, Redis, Solr, Splunk, Twitter
To use Kafka Connect in standalone mode, we need to:
Download a Kafka Connect connector, either from GitHub or from Confluent Hub
Create a configuration file for our connector
Use the connect-standalone.sh CLI to start the connector
Create the Kafka topic wikipedia.recentchange with 3 partitions:
kafka-topics.sh --bootstrap-server localhost:9092 --topic wikipedia.recentchange --create --partitions 3 --replication-factor 1
As well as the dead letter queue topic wikipedia.dlq, for catching any errors:
kafka-topics.sh --bootstrap-server localhost:9092 --topic wikipedia.dlq --create --partitions 3 --replication-factor 1
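To confirm both topics exist before wiring up the connector, you can list and describe them (this assumes your broker is running at localhost:9092, as in the commands above):

```shell
# List all topics; both wikipedia.recentchange and wikipedia.dlq should appear
kafka-topics.sh --bootstrap-server localhost:9092 --list

# Show partition count and replication factor for the main topic
kafka-topics.sh --bootstrap-server localhost:9092 --topic wikipedia.recentchange --describe
```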
Download the release JAR and configuration from here and unzip the archive on your computer at ~/kafka_2.13-3.0.0/connectors:

ls -R kafka-connect-sse
./kafka-connect-sse:
connector.properties
kafka-connect-sse-1.0-jar-with-dependencies.jar
Edit the configuration file
connectors/kafka-connect-sse/connector.properties with the following properties:
name=sse-source-connector
tasks.max=1
connector.class=com.github.cjmatta.kafka.connect.sse.ServerSentEventsSourceConnector
topic=wikipedia.recentchange
sse.uri=https://stream.wikimedia.org/v2/stream/recentchange
errors.tolerance=all
errors.deadletterqueue.topic.name=wikipedia.dlq
Look into your Kafka installation directory (where your config folder is) and edit the content of the config/connect-standalone.properties file:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.flush.interval.ms=10000
# EDIT BELOW IF NEEDED
bootstrap.servers=localhost:9092
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/Users/stephanemaarek/kafka_2.13-3.0.0/connectors
The last three lines are the most important to make everything work.
In particular, the plugin.path config: this is where you indicate the folder containing the Kafka connectors you downloaded before.
This must be an absolute path (not relative, and no shortcut with ~) to your connectors folder.
If you miss this step, Kafka Connect will fail shortly after starting.
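A quick way to sanity-check the plugin.path setting is to list the directory it points to and confirm the connector JAR is there (the path below is the example value from the config above; replace it with your own absolute path):

```shell
# The JAR downloaded earlier should be listed here
ls /Users/stephanemaarek/kafka_2.13-3.0.0/connectors/kafka-connect-sse
```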
Next, we can start our Kafka Connect standalone connector:
connect-standalone.sh ~/kafka_2.13-3.0.0/config/connect-standalone.properties ~/kafka_2.13-3.0.0/connectors/kafka-connect-sse/connector.properties
And as we can see, the data is flowing into our wikipedia.recentchange topic.
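You can verify this yourself by reading the topic with the console consumer in another terminal (assuming the same localhost:9092 broker):

```shell
# Read the Wikipedia change events as the connector writes them
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wikipedia.recentchange --from-beginning
```

If something goes wrong, point the same command at wikipedia.dlq instead to inspect the records that failed.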
There is a lot more to explore in Kafka Connect; this part was meant as an introduction.
Hope you liked it!
Conduktor & Kafka Connect
You've seen how to perform all these tasks using the CLI, but there's no need to keep using such a painful method.
Conduktor Platform can manage your Kafka Connect clusters and configure/start/stop connectors directly from there with just a single click. Try it now