Kafka Connect CLI tutorial
Kafka Connect CLI walkthrough in standalone mode: connect to Kafka with a connector using real-time Wikipedia changes, worker properties and the output topic.
Learn how to use Kafka Connect in standalone mode
Kafka Connect provides a scalable way to move data between Kafka and external systems. This tutorial demonstrates running a connector in standalone mode using real-time Wikipedia changes.
What you'll learn:
- What Kafka Connect is and when to use it
- How to configure a standalone connector
- How to set up the required properties files
- How to verify data flowing into Kafka
Live data stream
This tutorial uses the Wikipedia recent changes stream - a real-time feed of edits happening across Wikipedia.
What is Kafka Connect?
Kafka Connect is a framework for streaming data between Kafka and external systems using reusable connectors. Instead of writing custom producer/consumer code for common integrations, you can use pre-built connectors.
| Connector type | Direction | Examples |
|---|---|---|
| Source | External → Kafka | Debezium, JDBC, S3, MongoDB, Twitter |
| Sink | Kafka → External | Elasticsearch, S3, JDBC, HDFS, Splunk |
How to use Kafka Connect in standalone mode?
To use Kafka Connect in standalone mode, follow these steps:
- Download a Kafka Connect connector, either from GitHub or Confluent Hub
- Create a configuration file for your connector
- Use the connect-standalone.sh CLI to start the connector
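The last step takes one worker properties file followed by one or more connector properties files. As a minimal sketch of that argument order, the function below checks that each file exists and prints the command line it would run. The name standalone_command is hypothetical and not part of Kafka; it only illustrates the shape of the call.

```shell
# Hypothetical wrapper (not shipped with Kafka): checks that every properties
# file exists, then prints the connect-standalone.sh command it would run.
standalone_command() {
  if [ "$#" -lt 2 ]; then
    echo "usage: standalone_command <worker.properties> <connector.properties>..." >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing properties file: $f" >&2
      return 1
    fi
  done
  # First argument is the worker config; the rest are connector configs.
  echo "connect-standalone.sh $*"
}
```

Printing the command instead of running it keeps the sketch safe to try before Kafka is installed.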
Example: Kafka Connect standalone with Wikipedia data
Create the Kafka topic wikipedia.recentchange with 3 partitions:
kafka-topics --bootstrap-server localhost:9092 --topic wikipedia.recentchange --create --partitions 3 --replication-factor 1
Also create the dead letter queue topic wikipedia.dlq, to catch any errors:
kafka-topics --bootstrap-server localhost:9092 --topic wikipedia.dlq --create --partitions 3 --replication-factor 1
Download the release JAR and configuration from here and unzip the archive on your computer at kafka_2.13-2.8.1/connectors/kafka-connect-sse:
~/kafka_2.13-2.8.1/connectors ls -R
kafka-connect-sse
./kafka-connect-sse:
connector.properties kafka-connect-sse-1.0-jar-with-dependencies.jar
Edit the configuration file connectors/kafka-connect-sse/connector.properties with the following properties:
name=sse-source-connector
tasks.max=1
connector.class=com.github.cjmatta.kafka.connect.sse.ServerSentEventsSourceConnector
topic=wikipedia.recentchange
sse.uri=https://stream.wikimedia.org/v2/stream/recentchange
errors.tolerance=all
errors.deadletterqueue.topic.name=wikipedia.dlq
Next, go to your Kafka installation directory (where your bin and config folders are) and edit the config/connect-standalone.properties file so that it contains:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.flush.interval.ms=10000
# EDIT BELOW IF NEEDED
bootstrap.servers=localhost:9092
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/Users/stephanemaarek/kafka_2.13-2.8.1/connectors
The last three lines are the most important to make everything work.
In particular, plugin.path tells the worker which folder holds the Kafka connectors you downloaded earlier.
This must be an absolute path (not relative, and no ~ shortcut) to your connectors directory.
If this path is wrong, Kafka Connect cannot load the connector class and stops shortly after starting.
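Because a wrong plugin.path only shows up once the worker is already starting, it can help to check it beforehand. The function below is a minimal sketch of such a check. The name check_plugin_path is hypothetical and not part of Kafka, and it assumes plugin.path holds a single directory (the setting also accepts a comma-separated list, which this sketch does not handle).

```shell
# Hypothetical helper (not shipped with Kafka): checks that plugin.path in a
# worker properties file is an absolute path to an existing directory.
# Assumes a single directory, not a comma-separated list.
check_plugin_path() {
  props_file="$1"
  # Strip trailing whitespace, then print the value of the last plugin.path line.
  path=$(sed -n -e 's/[[:space:]]*$//' \
                -e 's/^[[:space:]]*plugin\.path[[:space:]]*=[[:space:]]*//p' \
                "$props_file" | tail -n 1)
  case "$path" in
    "") echo "plugin.path is not set"; return 1 ;;
    /*) ;;  # absolute path, keep going
    *)  echo "plugin.path must be absolute: $path"; return 1 ;;
  esac
  if [ ! -d "$path" ]; then
    echo "plugin.path directory not found: $path"
    return 1
  fi
  echo "plugin.path OK: $path"
}
```

It can be called with the worker file used in this tutorial, for example check_plugin_path ~/kafka_2.13-2.8.1/config/connect-standalone.properties. A value such as ~/connectors is rejected because the worker reads the ~ literally rather than expanding it.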
Next, we can start our Kafka Connect standalone connector:
connect-standalone ~/kafka_2.13-2.8.1/config/connect-standalone.properties ~/kafka_2.13-2.8.1/connectors/kafka-connect-sse/connector.properties
Once the worker is running, we can confirm that data is flowing into our wikipedia.recentchange topic:
kafka-console-consumer --bootstrap-server localhost:9092 --topic wikipedia.recentchange
Standalone vs distributed mode
| Mode | Use case | Scalability |
|---|---|---|
| Standalone | Development, testing, single tasks | Single worker |
| Distributed | Production, high availability | Multiple workers |
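One practical difference between the two modes is how connectors are submitted. In standalone mode the connector properties file is passed on the command line, while in distributed mode connectors are created through the Kafka Connect REST API with a JSON body. The sketch below converts a connector properties file into that JSON shape. The function name props_to_connector_json is hypothetical, it assumes simple key=value lines, and it does not escape quotes or backslashes in values. The commented curl call assumes a worker listening on the default REST port 8083.

```shell
# Hypothetical helper (not shipped with Kafka): turns a connector .properties
# file into the {"name": ..., "config": {...}} body used by POST /connectors.
# Assumes simple key=value lines; quotes and backslashes are not escaped.
props_to_connector_json() {
  awk -F= '
    /^[ \t]*#/ { next }                       # skip comment lines
    /^[ \t]*$/ { next }                       # skip blank lines
    {
      key = $1
      val = substr($0, index($0, "=") + 1)    # keep any "=" inside the value
      if (key == "name") { name = val; next }
      config = config sep "\"" key "\":\"" val "\""
      sep = ","
    }
    END { printf "{\"name\":\"%s\",\"config\":{%s}}\n", name, config }
  ' "$1"
}

# Example submission to a distributed worker (assumed to run on localhost:8083):
# curl -s -X POST -H "Content-Type: application/json" \
#   --data "$(props_to_connector_json connector.properties)" \
#   http://localhost:8083/connectors
```

The curl call is left commented out so the snippet can be tried without a running Connect cluster.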
See it in practice with Conduktor
Conduktor Console provides a visual interface for managing Kafka Connect clusters, deploying connectors and monitoring connector health and throughput.
Next steps
- Start programming with Kafka to build producers and consumers in code
- Monitor Kafka clusters including Connect workers
- Explore Confluent Hub for more available connectors