The big data movement has led to exponential growth in the data captured by every business. This data stretches from high-level information like customer surveys and market movements to finer, product-level information like user trails, feature clicks, and metrics (business and application).
The business requirements around data are served by many open-source data-streaming platforms such as Apache Kafka. But generating data is only the first step toward data-driven decision-making (DDDM).
Data-driven decision-making (DDDM) is defined as making decisions based on hard data as opposed to intuition, observation, or guesswork. The value of data-driven decisions is dependent on the quality of the data and its analysis and interpretation.
The need for data visualization
Data is a powerful asset when you can access it and visualize the value it captures.
Data visualization solutions provide ways to access and filter the most relevant data and present it in different formats. Unlike a data-streaming platform, which is used by developers and ops teams, a data visualization platform is used by business analysts and stakeholders. It must offer features suited to non-technical users: intuitive, fast, and beautiful!
Apache Kafka as a leading data platform
Apache Kafka has been the cornerstone of all the data streaming transformations which yield real-time data collection, processing, storage, and analysis.
It has been adopted by over 60% of Fortune 100 companies. This open-source software is an underlying component in most streaming architectures. There is a high probability that your business has already adopted Kafka, or is looking to use it, to help transform its IT and lead its digital data transformation.
To make the best use of the data flowing through Apache Kafka, we need a data visualization solution with proper integration with Apache Kafka, one that visualizes data in near real time, not tomorrow or next week.
Also, due to the vast choice of data formats these days (more or less specialized), a data visualization platform must be able to understand data published in different formats (CSV, XML, JSON, Apache Avro, Google Protobuf, MessagePack, etc.) to be efficient and avoid needless data-conversion and preparation jobs.
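As an illustration (not Conduktor's implementation), here is the same hypothetical event serialized to two of the formats above, using only Python's standard library:

```python
import csv
import io
import json

# A made-up analytics event for the example
event = {"user_id": "u-42", "action": "feature_click", "feature": "export-csv"}

# JSON: self-describing and human-readable
as_json = json.dumps(event)

# CSV: compact tabular form, but it needs an agreed-upon column order
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=list(event)).writerow(event)
as_csv = buf.getvalue().strip()

print(as_json)
print(as_csv)
```

The same record, two very different wire shapes; a visualization platform has to decode both without asking the producer to convert anything.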
Apache Kafka (or any other data-streaming solution) is often deployed across diverse infrastructure platforms like cloud, on-premise VMs, containers, etc. A successful data visualization solution must support connectivity to all of the various deployments while enforcing the required security and governance.
The solution we’re looking for must be focused on business users rather than developers. The platform must be easy to operate by data analysts and stakeholders without needing a developer.
A data streaming platform typically processes a large number of events at any given instant. After a bit of tuning and with the proper infrastructure, Apache Kafka can quickly process 2 million records per second. It should be easy and intuitive to display aggregated values in real time, to provide instant insights and act upon them quickly. Looking at the whole history at a glance (day, week, year) is also a must to see evolution and trends.
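To make the aggregation idea concrete, here is a minimal Python sketch that counts events per type in tumbling one-minute windows; the event stream and field names are invented for the example:

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group events into fixed-size windows by timestamp and count per type."""
    windows = defaultdict(Counter)
    for event in events:
        # Align each event to the start of its window
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        windows[window_start][event["type"]] += 1
    return dict(windows)

events = [
    {"timestamp": 0, "type": "signup"},
    {"timestamp": 30, "type": "click"},
    {"timestamp": 65, "type": "click"},
]
print(tumbling_window_counts(events))
```

A real streaming aggregation would run continuously over the Kafka consumer feed, but the windowing arithmetic is the same.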
Conduktor: an intuitive interface to peek into your data
We provide a desktop application with a friendly user interface to work with Apache Kafka and its extensions (Schema Registry, Kafka Connect, Kafka Streams, ksqlDB, etc.).
Conduktor enables you to look into the data published to your Apache Kafka topics. It is compatible with most data formats, from the basics to the most complex.
You can search, analyze, and export data published in real time or in the past. This is often a requirement for developers, QA, and data analysts.
Working with Conduktor
Let’s walk through how to work with Conduktor. We need to make sure to have the following:
- Access to a running Apache Kafka cluster.
- Conduktor downloaded and installed on your own machine.
Sign up or log in to Conduktor; you will then be able to synchronize your clusters with your team or create new ones:
Connect Conduktor to your Apache Kafka cluster:
Conduktor supports on-premise installations, Kubernetes custom installations, Cloud-managed installations, as well as service providers like Confluent Cloud, Aiven, CloudKarafka, you name it.
You can validate connection details before saving them and iterate easily in case of trouble. You can also connect Conduktor to other features outside of our scope here: SSH, Schema Registry, Kafka Connect, ksqlDB, etc. These are not immediately applicable to business stakeholders but are relevant for operational needs.
Once connected, you’re presented with a dashboard giving an overview of your Apache Kafka cluster and the important metrics to watch. Conduktor will warn you if it detects anything wrong (misconfiguration, topic failures, Kafka Streams applications down, etc.).
Apache Kafka organizes its data in topics. It’s quite normal to have hundreds of them in a small business, and thousands in larger ones. Topics are named according to the data they serve, their level of security and privacy, their business units, etc.
A few examples of classic analytics data published into Apache Kafka are: user signups, notifications, and product feature clicks.
You can explore the data published on any topic by clicking the search icon associated with it. In the above cluster, let’s look into the notifications topic.
Data Lookup and Selection
Conduktor provides a powerful data lookup capability.
It offers the flexibility to look up data across various timeframes: the current data, the last hour, yesterday, since the beginning, etc. It supports various data formats such as JSON, Apache Avro (with many variations), Protobuf, JSON Schema, and binary data. Each topic can have a different format, depending on its specialization, the programming language used, or developer experience.
Here, a topic using the JSON format:
You can start building insights by filtering data based on different criteria: similarity, equality, containment, specific field, regular expressions. This is available for data and metadata of the records.
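Outside of Conduktor, this kind of field-based filtering can be sketched in a few lines of Python; the record shape below is hypothetical:

```python
import re

# Hypothetical records as Conduktor might display them from a topic
records = [
    {"key": "u1", "value": {"notification": {"type": "email", "name": "welcome"}}},
    {"key": "u2", "value": {"notification": {"type": "sms", "name": "otp"}}},
    {"key": "u3", "value": {"notification": {"type": "email", "name": "digest"}}},
]

# Equality on a specific field
emails = [r for r in records if r["value"]["notification"]["type"] == "email"]

# Regular-expression match on a field
welcomes = [r for r in records if re.search(r"^wel", r["value"]["notification"]["name"])]

print(len(emails), len(welcomes))
```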
Here, we’re looking up only the “email” notification types:
Conduktor provides multiple views to see the data and will keep expanding to provide more perspectives of the same data.
By default, the view is clear and simple, letting you see the data at a glance. Conduktor also provides a tabular view to display data alongside its other attributes: key, value, timestamp, and headers (deconstructed). Columns can be selected to reduce noise and stay focused on the useful bits of data.
It’s also possible to “project” the data to extract only what’s necessary.
Here, we focus on the field notification.name by using a “Field Selection”:
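A field selection is essentially a projection along a nested path. A minimal sketch of the idea (the select_field helper is ours, not Conduktor’s API):

```python
def select_field(record, path):
    """Walk a dotted path like 'notification.name' into a nested dict."""
    value = record
    for part in path.split("."):
        value = value.get(part)
        if value is None:
            return None  # path does not exist in this record
    return value

record = {"notification": {"name": "welcome-email", "type": "email"}}
print(select_field(record, "notification.name"))  # → welcome-email
```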
Data features are essential (real-time aggregations, keeping only the latest occurrence per key, etc.), and we keep adding more (charts, histograms, pivot tables)! Feel free to tell us what you’d like to see.
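Keeping only the latest occurrence per key mirrors Kafka’s log-compaction semantics. A plain-Python sketch of the idea, with an assumed record shape:

```python
def latest_by_key(records):
    """Keep only the last value seen for each key, in arrival order."""
    latest = {}
    for record in records:
        latest[record["key"]] = record["value"]  # later records overwrite earlier ones
    return latest

records = [
    {"key": "user-1", "value": {"status": "pending"}},
    {"key": "user-2", "value": {"status": "active"}},
    {"key": "user-1", "value": {"status": "active"}},  # supersedes the first user-1
]
print(latest_by_key(records))
```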
Once we have captured some data, we can export it for offline analysis and share it internally.
Conduktor can export data in Excel-compatible CSV or in JSON for developers. It’s possible to export thousands of records in real time, or just a small slice. This allows us to build visualizations like this:
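As a sketch of what such an export could look like downstream, here is an Excel-friendly CSV written with Python’s standard library; the columns are illustrative, not Conduktor’s exact export layout:

```python
import csv
import io

# Hypothetical captured records, flattened for tabular export
records = [
    {"key": "u1", "timestamp": 1633036800, "type": "email", "name": "welcome"},
    {"key": "u2", "timestamp": 1633036860, "type": "sms", "name": "otp"},
]

buffer = io.StringIO()  # a real export would write to a file instead
writer = csv.DictWriter(buffer, fieldnames=["key", "timestamp", "type", "name"])
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```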