The big data movement has led to exponential growth in the data captured by every business. This data stretches from high-level information like customer surveys and market movements to finer, product-level information like user trails, feature clicks, and metrics (business and application).
The business requirements around data are served by many open-source data-streaming platforms such as Apache Kafka. But generating data is only the first step toward data-driven decision-making (DDDM).
Data-driven decision-making (DDDM) is defined as making decisions based on hard data as opposed to intuition, observation, or guesswork. The value of data-driven decisions is dependent on the quality of the data and its analysis and interpretation.
The need for data visualization
Data is a powerful asset when you can access it and visualize the value it captures.
Data visualization solutions provide ways to access and filter the most relevant data and present it in different formats. Unlike a data-streaming platform, which is used by developers and ops teams, a data visualization platform is used by business analysts and stakeholders. It must offer features suited to non-technical users: intuitive, fast, and beautiful!
Apache Kafka as a leading data platform
Apache Kafka has been the cornerstone of all the data streaming transformations which yield real-time data collection, processing, storage, and analysis.
It has been adopted by over 60% of Fortune 100 companies. This open-source software is an underlying component in most streaming architectures. There is a high probability that your business has already adopted Kafka, or is looking to use it, to help transform its IT and lead its digital data transformation.
To make the best use of the data flowing through Apache Kafka, we need a data visualization solution with proper integration with Apache Kafka, one that visualizes data in near real time, not tomorrow or next week.
Also, due to the vast choice of data formats these days (more or less specialized), a data visualization platform must be able to understand data published in different formats (CSV, XML, JSON, Apache Avro, Google Protobuf, MessagePack, etc.) to be efficient and avoid needless data-conversion and preparation jobs.
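As an illustration (not Conduktor's implementation), here is the same hypothetical event serialized to two of the formats above, using only Python's standard library:

```python
import csv
import io
import json

# A made-up analytics event for the example
event = {"user_id": "u-42", "action": "feature_click", "feature": "export-csv"}

# JSON: self-describing and human-readable
as_json = json.dumps(event)

# CSV: compact tabular form, but it needs an agreed-upon column order
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=list(event)).writerow(event)
as_csv = buf.getvalue().strip()

print(as_json)
print(as_csv)
```

The same record, two very different wire shapes; a visualization platform has to decode both without asking the producer to convert anything.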
Apache Kafka (or any other data-streaming solution) is often deployed across diverse infrastructure platforms like cloud, on-premise VMs, containers, etc. A successful data visualization solution must support connectivity to all of the various deployments while enforcing the required security and governance.
The solution we’re looking for must be focused on business users rather than developers. The platform must be easy to operate by data analysts and stakeholders without needing a developer.
A data streaming platform typically processes a large number of events at any given instant. After a bit of tuning and with the proper infrastructure, Apache Kafka can quickly process 2 million records per second. It should be easy and intuitive to display aggregated values in real time, to provide instant insights and act upon them quickly. Looking at the whole history at a glance (day, week, year) is also a must to see evolution and trends.
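To make the aggregation idea concrete, here is a minimal Python sketch that counts events per type in tumbling one-minute windows; the event stream and field names are invented for the example:

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group events into fixed-size windows by timestamp and count per type."""
    windows = defaultdict(Counter)
    for event in events:
        # Align each event to the start of its window
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        windows[window_start][event["type"]] += 1
    return dict(windows)

events = [
    {"timestamp": 0, "type": "signup"},
    {"timestamp": 30, "type": "click"},
    {"timestamp": 65, "type": "click"},
]
print(tumbling_window_counts(events))
```

A real streaming aggregation would run continuously over the Kafka consumer feed, but the windowing arithmetic is the same.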
Conduktor: an intuitive interface to peek into your data
We provide a desktop application with a friendly user interface to work with Apache Kafka and its extensions (Schema Registry, Kafka Connect, Kafka Streams, ksqlDB, etc.).
Conduktor enables you to look into the data published to your Apache Kafka topics. It is compatible with most data formats, from the basics to the most complex.
You can search, analyze, and export data published in real time or in the past. This is often a requirement for developers, QA, and data analysts.
Working with Conduktor
Let’s walk through how to work with Conduktor. We need to make sure to have the following:
- Access to a running Apache Kafka cluster.
- Conduktor downloaded and installed on your own machine.
Sign up or log in to Conduktor; you will then be able to synchronize your clusters with your team or create new ones:
Connect Conduktor to your Apache Kafka cluster:
Conduktor supports on-premise installations, Kubernetes custom installations, Cloud-managed installations, as well as service providers like Confluent Cloud, Aiven, CloudKarafka, you name it.
You can validate connection details before saving them and iterate easily in case of trouble. You can also connect Conduktor to other features outside of our scope here: SSH, Schema Registry, Kafka Connect, ksqlDB, etc. These are not immediately applicable to business stakeholders but are relevant for operational needs.
Once connected, you’re presented with a dashboard giving an overview of your Apache Kafka cluster and the important metrics to watch. Conduktor will warn you if it detects anything wrong (misconfiguration, topic failures, Kafka Streams applications down, etc.).
Apache Kafka organizes its data in topics. It’s quite normal to have hundreds of them in a small business, and thousands in larger ones. Topics are named according to the data they serve, their level of security and privacy, their business units, etc.
A few examples of classic analytics data published into Apache Kafka are: user signups, notifications, and product feature clicks.
You can explore the data published on any topic by clicking the search icon associated with it. In the above cluster, let’s look into the notifications topic.
Data Lookup and Selection
Conduktor provides a powerful data lookup capability.
It offers the flexibility to look up data across various timeframes: the current data, the last hour, yesterday, since the beginning, etc. It supports various data formats such as JSON, Apache Avro (with many variations), Protobuf, JSON Schema, and binary data. Each topic can have a different format, depending on its specialization, the programming language used, or developer experience.
Here, a topic using the JSON format:
You can start building insights by filtering data based on different criteria: similarity, equality, containment, specific field, regular expressions. This is available for data and metadata of the records.
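Outside of Conduktor, this kind of field-based filtering can be sketched in a few lines of Python; the record shape below is hypothetical:

```python
import re

# Hypothetical records as Conduktor might display them from a topic
records = [
    {"key": "u1", "value": {"notification": {"type": "email", "name": "welcome"}}},
    {"key": "u2", "value": {"notification": {"type": "sms", "name": "otp"}}},
    {"key": "u3", "value": {"notification": {"type": "email", "name": "digest"}}},
]

# Equality on a specific field
emails = [r for r in records if r["value"]["notification"]["type"] == "email"]

# Regular-expression match on a field
welcomes = [r for r in records if re.search(r"^wel", r["value"]["notification"]["name"])]

print(len(emails), len(welcomes))
```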
Here, we’re looking up only the “email” notification types:
Conduktor provides multiple views to see the data and will keep expanding to provide more perspectives of the same data.
By default, the view is clear and simple, letting you see the data at a glance. Conduktor also provides a tabular view to display data alongside its other attributes: key, value, timestamp, and headers (deconstructed). Columns can be selected to reduce noise and stay focused on the useful bits of data.
It’s also possible to “project” the data to extract only what’s necessary.
Here, we focus on the field notification.name by using a “Field Selection”:
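A field selection is essentially a projection along a nested path. A minimal sketch of the idea (the select_field helper is ours, not Conduktor’s API):

```python
def select_field(record, path):
    """Walk a dotted path like 'notification.name' into a nested dict."""
    value = record
    for part in path.split("."):
        value = value.get(part)
        if value is None:
            return None  # path does not exist in this record
    return value

record = {"notification": {"name": "welcome-email", "type": "email"}}
print(select_field(record, "notification.name"))  # → welcome-email
```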
Data features are essential (real-time aggregations, keeping only the latest occurrence per key, etc.), and we keep adding more (charts, histograms, pivot tables)! Feel free to tell us what you’d like to see.
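Keeping only the latest occurrence per key mirrors Kafka’s log-compaction semantics. A plain-Python sketch of the idea, with an assumed record shape:

```python
def latest_by_key(records):
    """Keep only the last value seen for each key, in arrival order."""
    latest = {}
    for record in records:
        latest[record["key"]] = record["value"]  # later records overwrite earlier ones
    return latest

records = [
    {"key": "user-1", "value": {"status": "pending"}},
    {"key": "user-2", "value": {"status": "active"}},
    {"key": "user-1", "value": {"status": "active"}},  # supersedes the first user-1
]
print(latest_by_key(records))
```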
Once we have captured some data, we can export it for offline analysis and share it internally.
Conduktor can export data in Excel-compatible CSV or in JSON for developers. It’s possible to export thousands of records in real time, or just a small slice. This allows us to build visualizations like this:
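As a sketch of what such an export could look like downstream, here is an Excel-friendly CSV written with Python’s standard library; the columns are illustrative, not Conduktor’s exact export layout:

```python
import csv
import io

# Hypothetical captured records, flattened for tabular export
records = [
    {"key": "u1", "timestamp": 1633036800, "type": "email", "name": "welcome"},
    {"key": "u2", "timestamp": 1633036860, "type": "sms", "name": "otp"},
]

buffer = io.StringIO()  # a real export would write to a file instead
writer = csv.DictWriter(buffer, fieldnames=["key", "timestamp", "type", "name"])
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```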