Managing your Kafka Topics with Conduktor
A look at how Conduktor helps you manage the Topics on your Kafka cluster.
What is Apache Kafka and Conduktor?
Before we look at how Conduktor helps you manage the topics on your Kafka cluster, let's first look at what a topic is in Apache Kafka.
Apache Kafka is a pub-sub system composed of a network of machines called brokers. The brokers accept “events” from producers and retain them according to time or size limits, allowing consumers to read and process them at will.
You can find lots more information about Apache Kafka on Conduktor Kafkademy, our free learning resource.
What is a Topic?
A topic in Apache Kafka can be compared to a table in a relational database or a folder in a file system. Topics can additionally be broken into several partitions, which we cover in more detail later; you can also find more information here.
A topic keeps a log of all messages you send to it and is identified by its name. The topic naming strategy can be very specific to your own setup and can take several factors into account, including where your data comes from, how many topics you plan to have and the environment. You can find some useful links on topic naming strategies here and here.
In summary, topics are the high-level structure that holds events together:
They are comparable to a table or a folder.
They can have a specific naming strategy.
They can house many partitions, as discussed below.
Topics are typically broken down into multiple parts, or “partitions”. A topic can have a single partition, but it is common to see topics with several. Shown above is a topic with three partitions.
Partitions are how Kafka achieves scalability, elasticity and redundancy across a number of brokers. A topic's partitions are spread across brokers to increase performance, and each partition can be replicated to several brokers so that its data remains available if a broker fails.
The number of partitions of a topic is specified at the time of creation; however, Conduktor allows you to easily add new partitions. For more in-depth information on partitioning, check out this page on Conduktor Kafkademy.
In summary, partitions are the smallest storage unit that holds a subset of events owned by a topic:
A new event always goes on the end.
They can only be read by seeking to an offset and scanning forward from there.
Events are immutable.
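The three properties above can be sketched with a small in-memory model. This is only an illustration of the append-only idea, not how Kafka stores data on disk; the `PartitionLog` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy in-memory model of a single partition (illustrative only)."""

    _events: List[bytes] = field(default_factory=list)

    def append(self, event: bytes) -> int:
        """Add an event at the end of the log and return its offset."""
        self._events.append(bytes(event))  # bytes objects cannot be modified later
        return len(self._events) - 1

    def read_from(self, offset: int, max_events: int = 10) -> List[Tuple[int, bytes]]:
        """Seek to `offset` and scan forward, returning (offset, event) pairs."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        window = self._events[offset:offset + max_events]
        return [(offset + i, event) for i, event in enumerate(window)]


log = PartitionLog()
first = log.append(b"order-created")
second = log.append(b"order-paid")
third = log.append(b"order-shipped")

# A consumer that resumes at offset 1 sees everything from there onwards.
replayed = log.read_from(1)
```

Note that there is no method to update or insert in the middle: the only write operation is an append, and the only read operation is a forward scan from an offset.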
How does Conduktor improve Topic Management?
Understanding what topics are and how they function in Apache Kafka is one thing, but how can Conduktor help you manage them?
Conduktor helps you in three major ways:
1. Navigation
Navigating your topics and finding the one you are looking for can be time-consuming, especially if you have 50, 100 or even 1,000 topics. Conduktor makes this easier in a number of ways: it gives you a graphical view of all topics in your cluster, condenses topic names through a feature called “Smart Groups”, lets you tag certain topics as “Favourites” and shows how recently each topic has been active.
Let's say you have a problem in production and you want to carry out some data inspection on a particular topic. With dozens or potentially hundreds of topics involved, and depending on your topic naming strategy, this can be easier said than done. “Smart Groups” in Conduktor can make your life much simpler in this regard.
“Smart Groups” save you time when navigating the topics on your cluster and surface extra information about each topic, such as its “area” and “environment” in this case.
You describe your naming strategy, or several naming strategies, with a regular expression (regex) and then turn the grouping on or off with the click of a button.
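To give a feel for how a regex can pull an “area” and an “environment” out of a topic name, here is a minimal sketch in plain Python. The `<area>.<environment>.<name>` naming convention and the `group_topics` helper are assumptions made for illustration; this is not Conduktor's configuration format.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <area>.<environment>.<name>
TOPIC_PATTERN = re.compile(
    r"^(?P<area>[a-z]+)\.(?P<environment>dev|staging|prod)\.(?P<name>[a-z0-9-]+)$"
)


def group_topics(topic_names):
    """Group topic names by (area, environment); unmatched names go under None."""
    groups = defaultdict(list)
    for topic in topic_names:
        match = TOPIC_PATTERN.match(topic)
        if match:
            groups[(match["area"], match["environment"])].append(match["name"])
        else:
            groups[None].append(topic)
    return dict(groups)


topics = [
    "payments.prod.invoices",
    "payments.prod.refunds",
    "shipping.dev.labels",
    "__consumer_offsets",  # does not follow the convention, so it stays ungrouped
]
grouped = group_topics(topics)
```

Topics that share a prefix collapse into one group with short names, while anything that does not match the pattern is left as it is.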
For an in-depth look into the “Smart Groups” feature please read our blog here.
Maybe you have a few topics that you always use, or just a few that are always needed to carry out some troubleshooting. You can save yourself a lot of time by adding a favourite tag to them in Conduktor. They will have a blue star beside them, so you can filter the list down to just these topics and keep track of them.
It can be difficult to know which topics you use most regularly, or which topics you can get rid of if you are running low on disk space. The activity tab in Conduktor shows which topics are regularly used, so you can easily add those to your favourites and remove any that are not.
2. Cluster Health
Overall Cluster Level
Conduktor provides a view of all topics on your cluster, including:
Overall number of partitions
Under Replicated Partitions (URP)
Partitions without a Leader (No Leader)
Partitions that are below the required minimum in-sync replicas (<Min ISR)
Think of these metrics on your Kafka environment as the warning lights in your car: if they light up in red there may not be anything drastically wrong, but it’s a good idea to check them out. For a detailed look at in-sync replicas and replication in Kafka, read the Conduktor blog here.
If you configure the metrics through JMX or Jolokia, as shown here, the throughput in messages per second and bytes per second will also be visible. Finally, if a consumer group is reading from the topic, that will also be shown.
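The three warning indicators can be derived from a partition's leader, its assigned replicas and its in-sync replicas. The sketch below shows the usual definitions; the `PartitionState` type and `health_flags` function are hypothetical names for illustration, not part of Conduktor or the Kafka client API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    leader: Optional[int]  # broker id, or None when no leader is elected
    replicas: List[int]    # broker ids assigned to hold a copy
    isr: List[int]         # broker ids whose copy is currently in sync


def health_flags(state: PartitionState, min_isr: int) -> List[str]:
    """Return the warning labels that apply to one partition."""
    flags = []
    if state.leader is None:
        flags.append("No Leader")
    if len(state.isr) < len(state.replicas):
        flags.append("URP")  # some assigned replica has fallen behind
    if len(state.isr) < min_isr:
        flags.append("<Min ISR")
    return flags


healthy = PartitionState("orders", 0, leader=1, replicas=[1, 2, 3], isr=[1, 2, 3])
degraded = PartitionState("orders", 1, leader=1, replicas=[1, 2, 3], isr=[1])
offline = PartitionState("orders", 2, leader=None, replicas=[1, 2, 3], isr=[])

healthy_flags = health_flags(healthy, min_isr=2)
degraded_flags = health_flags(degraded, min_isr=2)
offline_flags = health_flags(offline, min_isr=2)
```

A partition can raise several flags at once: losing two of three replicas makes it under-replicated and, with a minimum of two in-sync replicas, also below the minimum ISR.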
Individual Topic Level
At the topic level you can see information including the number of events, the number of partitions and the replication factor.
You can tell at a glance whether there is a problem with a topic by looking at “All have a leader”, “All ISR’s” and “All ISR’s at least at 1”. These tell you instantly whether every partition has a leader and the correct number of in-sync replicas. You want to see a green tick beside each; if any appear in red, there may be a problem.
At the topic level you can also check whether a schema is attached, which consumer groups read from the topic and the lag of each group. You can see which broker hosts the leader of each partition, carry out data inspection on the topic through a single consumer and import data via CSV.
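Consumer group lag is commonly understood as the gap between the latest offset in each partition and the offset the group has committed. A minimal sketch of that arithmetic follows; the `consumer_lag` helper is hypothetical, and treating a partition with no commit as starting from offset 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return (per-partition lag, total lag) for one consumer group on one topic.

    Both arguments map partition number -> offset. A partition with no
    committed offset is treated as starting from 0 (a simplification).
    """
    per_partition = {
        partition: max(end_offsets[partition] - committed_offsets.get(partition, 0), 0)
        for partition in end_offsets
    }
    return per_partition, sum(per_partition.values())


end_offsets = {0: 120, 1: 95, 2: 40}
committed = {0: 100, 1: 95}  # nothing committed yet for partition 2
per_partition, total = consumer_lag(end_offsets, committed)
```

Here partition 1 is fully caught up, while partitions 0 and 2 still have events waiting to be processed.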
If a Problem Occurs
If a problem occurs on your cluster, such as a broker going down, you will be able to see any Under Replicated Partitions (URP), Partitions without a Leader (No Leader) or partitions below the required minimum in-sync replicas (<Min ISR) at both the cluster level and the individual topic level, as shown in the images below.
Alerts appear in red so you can look into specific topics and see the effect on each individually.
3. Cluster Administration
Conduktor has a number of advanced topic features that, while most useful to those carrying out Kafka administration, are helpful to know and understand regardless.
Topic configurations depend heavily on how you use your Kafka cluster. Maybe you need a topic to retain data much longer than the default, or perhaps a topic receives a far higher number of events per second and needs a different compression setting or number of partitions from the default.
You can change a topic’s configuration directly in Conduktor. The parameters you set affect topic performance and behaviour. A few important examples of topic configurations include:
Log Cleanup Policy
Min Insync Replicas
For a more detailed look at the advanced concepts of topics check out our advanced Kafka Topic tutorials on Conduktor Kafkademy.
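As a rough illustration of what such overrides look like, the sketch below builds a set of topic-level settings using Kafka's standard configuration keys (`cleanup.policy`, `retention.ms`, `min.insync.replicas`, `compression.type`). The chosen values and the helper functions are assumptions for the example, not recommendations.

```python
from datetime import timedelta


def retention_ms(days: int) -> str:
    """Convert a retention period in days to the millisecond string Kafka expects."""
    return str(int(timedelta(days=days).total_seconds() * 1000))


# Example overrides for one topic (values are illustrative).
topic_overrides = {
    "cleanup.policy": "delete",        # or "compact" to keep the latest event per key
    "retention.ms": retention_ms(30),  # keep data for 30 days
    "min.insync.replicas": "2",
    "compression.type": "lz4",
}


def check_min_isr(config: dict, replication_factor: int) -> bool:
    """Return True when min.insync.replicas is between 1 and the replication factor."""
    return 1 <= int(config["min.insync.replicas"]) <= replication_factor


fits_three_replicas = check_min_isr(topic_overrides, replication_factor=3)
fits_one_replica = check_min_isr(topic_overrides, replication_factor=1)
```

The sanity check reflects a common pitfall: a minimum in-sync replica count higher than the replication factor can never be satisfied.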
Add partitions on a Topic
While you will have made calculations on the number of partitions required for a topic at the creation stage, applications evolve and you may need to add more, usually to improve throughput.
In general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve. You can add partitions on the fly in Conduktor, and it’s a straightforward procedure.
One thing to keep in mind is to be careful if messages are produced with keys. When publishing a keyed message, Kafka provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, new messages for an existing key may be routed to a different partition from earlier ones, so such a guarantee may no longer hold.
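The effect is easy to see with a simplified model of key-based routing, where the partition is the key's hash modulo the partition count. This sketch uses `zlib.crc32` as a stand-in hash purely for illustration (Kafka's default partitioner uses a murmur2 hash), and the `partition_for` function is hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the key's hash (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


keys = [f"customer-{i}".encode() for i in range(1000)]

# With a fixed partition count, a key always lands on the same partition.
stable = all(partition_for(k, 3) == partition_for(k, 3) for k in keys)

# After growing the topic from 3 to 4 partitions, many keys map elsewhere.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
print(f"{len(moved)} of {len(keys)} keys change partition")
```

Events already written stay where they are, so after the change a consumer may find older and newer events for the same key in different partitions, with no ordering guarantee between them.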
Emptying a topic
Emptying a topic allows you to get rid of the messages on the topic but keep the topic.
You may want to empty a topic if the data it contains is corrupted but you still want to keep the topic itself for future use.
You can do this by clicking into a topic and choosing the “Advanced” button at the top right of the screen above.
Performing administrative tasks on a Kafka cluster, such as managing and monitoring topics, used to be a manual and error-prone task that was done using CLI tools.
Conduktor helps you manage the topics on your cluster more easily and efficiently, covering navigation, health and cluster operations through an intuitive UI.
If you want to see the benefits for yourself, download Conduktor now.