Validating Event-Driven Architectures~ NaN minutes
Demonstrating how to use our latest product to validate workflows and test data integrity at each stage of your system architecture.
Event-driven architectures have become common ground for developing modern applications. Their loosely coupled, low-latency, and scalable nature provide organizations with the agility required to be competitive in any market. Up until now, navigating how to test them was the less glamorous side of real-time decision making and reactive applications.
With Conduktor Testing, you can empower QAs with limited Kafka expertise, write end-to-end tests quickly, and automate their execution in any CI/CD environment.
Ultimately, we’ll help you focus on what to test (business scenarios that matter), not how to test it (implementing an async testing framework).
Case Study | Event-Driven Retail System
In this post, we will demonstrate how to compose an end-to-end Test Scenario for an event-driven system composed of multiple topics. This tutorial will demonstrate:
Testing the workflow to validate the integrations
Are messages correctly broadcast to Kafka topics from the external applications?
Does the workflow complete within an agreed SLA?
Testing the data correctness at each step:
Was our enrichment successful, based on the userId present in the message?
The pre-requisite to creating tests are cluster configurations, which are defined in your workspace during setup. This involves specifying the bootstrap servers, schema registry URL (if using it), and any additional broker configurations. The Conduktor Testing Agent allows you to reach any public, private or local cluster that your host has access to.
For the purpose of this tutorial, we will demonstrate a retail system consisting of multiple stream processing steps. The architecture and message flows are demonstrated below.
The event flow can be described as:
Message is produced into the pageviews topic
User service consumes the message => enriches it with additional data => then produces an enriched message into the pageviews_enriched topic
The promotions service consumes the enriched message and => assuming certain business rules are met => produces another message into the upsell_events topic
Visually representing the Test Scenario
There are 3 Kafka topics involved in the system architecture. This means there are multiple stages at which it’s possible to check the data and validate if the intermediary applications (user service, promotions service) have fulfilled their responsibilities.
To build the scenario in Conduktor Testing, we use nodes on a canvas to represent tasks relating to Kafka. An example task could be producing or consuming from a Kafka topic. We’re also working to add support for other protocols such as HTTP.
It’s worth noting that the external applications (user service, promotions service) are not represented on the graph. Though, they must be available for a valid test execution as they’re responsible for intermediate processing steps. In each case, the services consume a message and do something, before publishing a new message into another Kafka topic. It’s the resultant output in Kafka that we are testing at each stage.
Breaking it down
The first node on the canvas is a producer task. It’s responsible for publishing a message into the pageviews topic. The relevant cluster and topic are configured on the General tab. Then, a JSON value is set as the message value on the Data tab. You can also configure randomly generated data for more dynamic test conditions.
The user service will take care of enrichment based on the userId presented in this message. It’s possible to check the RecordMetadata (offset, partition, topic, etc.) associated with a published message, but in this case we will not configure a producer check.
Consumer Task - Check Enriched
This task is used to consume the enriched message that we expect to be emitted from the user service (assuming it’s working correctly).
In the Data tab, the deserialization format for consuming the enriched message is set. In this case, we expect a JSON value. It’s also necessary to configure lifecycle rules that determine when the task should stop consuming, or when it should be considered to have failed.
In this case, we expect 1 enriched message for each message produced. Therefore, the stop condition is set to 1 record. If messages were continuously being produced into the topic, you could use a filter to intercept the correct one.
The final step relating to this task is to create a check on the consumed message. Below demonstrates the expected format of the message post-enrichment.
To test data inside the message, it’s possible to use jq to access the attributes for the purpose of an expression. In this case, I will validate that the planType relevant to this user was correctly enriched by the user service.
Below demonstrates an expression that asserts the planType for this user to be Free. If any other string value is received, this test would be considered a failure.
Consumer Task - Check Upsell
The final task is used to ensure that the business logic for the promotions service is working correctly. The business rule assumes that Free members who visit the checkout page should be upsold promotional products. The expectation is a message produced into the upsell_events topic.
From a business perspective, it’s important this happens within an agreed SLA. For the purpose of this example, let’s assume that the SLA is set at 500ms.
This can be addressed via the lifecycle rules configured on the Data tab. The fail condition is set to an elapsed time of 500ms. Meaning, the test will be considered a failure if no message is received within that interval.
Now that the workflow and checks have been configured, the only remaining step is to execute the scenario and observe the system behavior.
Manually execute the test scenario using the Run button.
In the above case, we can see that the first 2 tasks passed. However, the end-to-end test scenario has failed due to the final task. Hovering over the execution events will provide more detail on the reasons why.
Consumed reached the Timeout fail condition
This mean the business SLA of 500ms, associated with the final task, was not met
You can also preview the data associated with messages produced and consumed events. Below demonstrates previewing of the ‘Record consumed’ event, associated with the ‘Check Enriched’ task.
Navigating to the Checks tab will detail the result of any test checks. In this case, the check on the enriched message was successful. Meaning, the user was correctly attributed to a Free plan by the user service.
Great! To recap, we have proven how to:
Visually architect a business scenario we want to test
Test the business data that are output into Kafka from the user service
Test whether the workflow executes within an agreed business SLA
Of course, everything that’s been demonstrated in this blog post has involved the UI and manual test executions. To take this further and enable continuous, automated testing, it’s possible to export the test configuration for integration in any CI/CD environment.
You can see an example Github action here, but we’ll save a deeper dive on that for a future post!
Sign up and start using Conduktor Testing today! Cheers!