Distributed Systems

Nearly everything a modern business does involves software in some way, but the days of isolated mainframes running applications are long gone. Products and applications are now too complex, with too many requirements, to run on a single machine. The majority run on distributed systems that spread the burden. In this article, we will look at what a distributed system is, how it works, and the positives and negatives of such systems.

What is a Distributed System?

As the name implies, a distributed system is a computing system in which the various components or nodes are spread out across a network of computers (or virtual machines, containers, or any node that can connect and handle basic tasks). Though physically separated, the nodes are linked together and pool their resources to maximize efficiency when running a program.

Practically every form of computer network can function as a distributed system in some sense, but early distributed systems were difficult to set up and maintain. Modern distributed systems are typically cloud-based and operate over the internet. Most SaaS applications function as distributed systems: end users work with an application that appears to be a single interface running in one place, but that relies on cloud computing power to process tasks. When you use a ridesharing app like Uber or Lyft, your only interaction with the system is an app on your phone, but hundreds if not thousands of components elsewhere work together to deliver rides.

Parts of a Distributed System

Distributed System Architectures

The most common form of distributed system is the client-server model, in which multiple clients make requests of a central server, which apportions resources among them. Peer-to-peer systems are also common. In a peer-to-peer system, each node is an equal participant, with work shared among the nodes and each node capable of acting as either a client or a server.
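
The difference between the two roles can be sketched in a few lines of Python. This is only an in-memory illustration: the class names, method names, and file names are hypothetical, and a real system would communicate over a network rather than through direct method calls.

```python
from typing import Dict, List, Optional


class Server:
    """Client-server: one central node owns the resources."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def handle(self, filename: str) -> Optional[str]:
        return self.files.get(filename)


class Peer:
    """Peer-to-peer: every node can both serve and request."""

    def __init__(self, name: str, files: Dict[str, str]) -> None:
        self.name = name
        self.files = dict(files)

    def serve(self, filename: str) -> Optional[str]:
        # Acting as a server for another peer.
        return self.files.get(filename)

    def fetch(self, filename: str, others: List["Peer"]) -> Optional[str]:
        # Acting as a client: ask each known peer in turn.
        for peer in others:
            data = peer.serve(filename)
            if data is not None:
                self.files[filename] = data  # this peer can now serve it too
                return data
        return None


# Client-server: every client asks the same central server.
server = Server({"map.png": "map-bytes"})
from_server = server.handle("map.png")

# Peer-to-peer: alice fetches from bob, then serves carol herself.
alice = Peer("alice", {})
bob = Peer("bob", {"song.mp3": "song-bytes"})
carol = Peer("carol", {})
alice.fetch("song.mp3", [bob])
from_alice = carol.fetch("song.mp3", [alice])
```

In the peer-to-peer half, carol never talks to bob: alice acted as a client first and as a server afterwards, which is the dual role described above.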

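In addition to the two well-known architectures above, there are three-tier distributed systems, which separate presentation, application logic, and data storage into distinct layers, and n-tier (or multi-tier) distributed systems, which split those layers further.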

Key Features of Distributed Systems

Concurrency: In a distributed system, nodes need to be able to process tasks at the same time, agreeing on the order of events where it matters without relying on a single global clock.
Scalability: The purpose of distributed systems is to be able to handle more complex applications, which means having more power. This power is achieved by scaling the size of the network, adding more components when needed.
Fault tolerance: Distributed systems need to be robust to individual node failures, continuing to function even in the event of a localized fault.
Openness: The software running on a distributed system needs to be open to sharing and capable of distributing workload according to available resources.
Transparency: The system as a whole should appear to its users as a single, capable device, hiding the fact that the work is spread across many nodes. Meanwhile, the nodes themselves need to have a certain level of access to other nodes in the system for communication.
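
To make the concurrency point concrete, one common way to order events without a global clock is a logical clock. The following is a minimal sketch of a Lamport clock in Python; the class, method, and variable names are illustrative assumptions and do not come from any system described in this article.

```python
class LamportClock:
    """Logical clock: orders events without a shared physical clock."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp an outgoing message (sending counts as an event)."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Merge the sender's timestamp, then count the receive event."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Two hypothetical nodes exchanging one message.
node_a = LamportClock()
node_b = LamportClock()

node_a.tick()                          # local event on A
stamp = node_a.send()                  # A sends a message to B
received_at = node_b.receive(stamp)    # B's clock jumps past A's timestamp
```

The receive event always gets a larger timestamp than the matching send, so every node sees cause before effect even though no two nodes share a clock.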

Types of Distributed System

Distributed systems appear almost everywhere, so there are thousands of possible examples. Practically every internet-enabled application today is built as a distributed system.

The earliest examples were the simple computer networks of the 1970s and 1980s. Local area networks functioned as distributed systems, and many of the technologies developed for LANs contributed to the development of the internet on which most distributed systems are built today.

Some of the common internet-powered distributed systems you may see include cryptocurrency processing, distributed compute systems that power scientific research, peer-to-peer file sharing, online video games, distributed databases, and real-time distributed systems such as logistics tracking, ride-hailing applications, and flight control systems.

Away from the internet, most telecommunications networks are also distributed systems. Arguably, early telephone networks were the original distributed systems, preceding LANs by several decades, with mobile networks arriving at around the same time as LANs.

Benefits of Distributed Systems

There are a number of reasons to build applications that utilize a distributed system. They include:

Flexibility - Distributed systems are easy to scale, with additional nodes being added as and when they are needed.

Fault Tolerance - A single point of failure is no longer an issue when you can run multiple instances of your application across multiple data centers and cloud providers.

Low Latency - A distributed system can place servers and storage closer to users, which shortens the distance data has to travel and can bring network latency down from tens or hundreds of milliseconds to just a few milliseconds.
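
To illustrate the fault tolerance benefit above, here is a minimal failover sketch in Python. It assumes each replica can be modeled as a callable that either returns a response or raises `ConnectionError`; the function names, the exception class, and the request string are hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and try the next replica
    raise AllReplicasFailed(f"{len(errors)} replicas failed")


def broken_replica(request: str) -> str:
    # Stands in for an instance in an unreachable data center.
    raise ConnectionError("data center unreachable")


def healthy_replica(request: str) -> str:
    return f"handled {request}"


response = call_with_failover([broken_replica, healthy_replica], "GET /rides")
```

The caller gets a normal response even though the first instance is down. The request only fails if every replica is unavailable at once.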

Disadvantages of Distributed Systems

Distributed systems do have their drawbacks, as explained below:

Data Consistency - In a distributed system, it is difficult to maintain data consistency. Different nodes may hold copies of the same data that are not in sync with each other. This can cause problems when data is read from one node and then used on another node that holds a newer or older version.

Network Failure - If your network fails, you will lose connectivity between your nodes. This can cause serious issues for your application, because clients may have no way to access the services provided by the servers.

Security - Since distributed systems introduce many new points of access, there are also many additional vulnerabilities. Keeping the system and any associated data secure is more challenging, with modern systems needing to safely provide the right access to multiple users across locations and devices.

Scheduling - Tasks being run on a distributed system need to be scheduled in a certain order, at certain times and in certain locations. While there are mature scheduling solutions, all of them still have some inefficiencies which lead to wastage of resources.

Monitoring Overhead - Keeping track of data across a distributed system is difficult, requiring a lot of resources. It’s not just a matter of making sure that application data is being used correctly either, as monitoring will also need to make sure of things like hardware usage and load balancing.
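
The data consistency drawback above can be shown with a small simulation. The sketch below assumes each record carries a version number and that a reader asks more than one replica and keeps the newest copy (a simplified quorum read); the class names, key, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int


class Replica:
    """One node's local copy of the data."""

    def __init__(self) -> None:
        self.store: Dict[str, Versioned] = {}

    def write(self, key: str, record: Versioned) -> None:
        current = self.store.get(key)
        # Keep only the newest version seen by this replica.
        if current is None or record.version > current.version:
            self.store[key] = record

    def read(self, key: str) -> Optional[Versioned]:
        return self.store.get(key)


def quorum_read(replicas: List[Replica], key: str) -> Optional[Versioned]:
    """Read from several replicas and return the highest version found."""
    found = [rec for rec in (r.read(key) for r in replicas) if rec is not None]
    return max(found, key=lambda rec: rec.version, default=None)


a, b, c = Replica(), Replica(), Replica()
for replica in (a, b, c):
    replica.write("seat", Versioned("available", 1))

# The update reaches a and b, but c misses it (for example, a dropped message).
for replica in (a, b):
    replica.write("seat", Versioned("booked", 2))

stale = c.read("seat")                 # out-of-date copy from a single node
fresh = quorum_read([b, c], "seat")    # b's newer version wins
```

Reading from `c` alone returns the old value, while reading from any two of the three replicas is guaranteed to include at least one that saw the update, because the update was written to two of them.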

Enterprise Use Cases for Distributed Systems

Enterprise use of distributed systems is an easy topic to discuss, as practically every major company uses them in some form. Much like Apache Kafka, they are both incredibly useful and ubiquitous. Nonetheless, we will endeavor to list some of the prominent examples and possibilities below:

Enterprise Examples

Uber

Uber has utilized a microservice architecture to build applications for different purposes. These microservices are deployed across Uber’s vast distributed computing power, enabling the company to deal with thousands of requests and scale rapidly across new markets.

Amazon

“The moment we added our second server, distributed systems became the way of life at Amazon”

Amazon has made use of distributed computing from the very beginning of the company. In an extensive blog post, Jacob Gabrielson describes how distributed computing evolved at Amazon over the years. Amazon identifies three different types of distributed system in use: offline, soft real-time, and hard real-time.

Enterprise Use Cases

Looking more broadly, here are some potential use cases:

Parallel Processing

In the past, parallel computing and distributed systems were two separate things. Nowadays, thanks to modern operating systems, processors, and cloud services, distributed systems are widely used to enable parallel processing applications, splitting a job across many nodes to finish it faster.
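
The basic pattern, splitting a job into chunks, processing them independently, and combining the partial results, can be sketched with Python's standard `concurrent.futures` module. In this sketch, threads on one machine stand in for separate nodes, which is an assumption made to keep the example self-contained; the function names and sample text are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_words(chunk: str) -> int:
    """Work done independently on one chunk of the input."""
    return len(chunk.split())


def parallel_word_count(chunks: List[str], workers: int = 3) -> int:
    """Fan the chunks out to a pool of workers, then combine partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


documents = [
    "distributed systems spread the burden",
    "nodes pool their resources",
    "each node handles part of the work",
]
total = parallel_word_count(documents)  # 5 + 4 + 7 = 16
```

Because each chunk is processed without reference to the others, the same structure carries over when the workers are real machines rather than threads.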

Distributed Artificial Intelligence

Modern machine learning algorithms require vast sets of training data to learn from, as well as massive amounts of processing power. Distributed AI uses distributed systems to spread training across multiple agents, so that models can learn from very large data sets more quickly.

Distributed Database Systems

Distributed databases are databases that are located across multiple servers, possibly across multiple physical locations. Distributed databases can be either homogeneous or heterogeneous. A homogeneous distributed database will share the same DBMS and data model across every node in the network. A heterogeneous distributed database can have multiple data models and DBMSs.
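
One common way to spread a database across servers is to partition records by key. The sketch below is a toy, in-memory illustration of hash partitioning, with Python dictionaries standing in for servers; `shard_for`, `ShardedStore`, and the sample keys are hypothetical names, and real distributed databases add replication, rebalancing, and much more.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """A toy key-value store split across several in-memory 'servers'."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> str:
        return self.shards[shard_for(key, len(self.shards))][key]


store = ShardedStore(shard_count=3)
store.put("user:42", "Ada")
store.put("user:43", "Grace")
name = store.get("user:42")
```

Every shard here uses the same data model, which corresponds to the homogeneous case described above. Any client can compute the same shard for a key, so no central lookup table is needed.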

Apache Kafka can be deployed as a platform to help enable a distributed database, and can even function as a database itself if needed.

Cloud Computing vs Distributed Systems

Cloud computing is a popular buzzword in business circles, and it may seem that cloud computing is just another type of distributed system. However, the two are technically different. With a distributed system, resources are pooled from all the different nodes and tasks are shared amongst the pool; the aim of such a system is to provide more computing power and support more tasks.

Cloud computing simply implies a network-hosted server being used for any task. It is very easy to use cloud computing to build a distributed system, since you just need to add more servers to whatever application is running or task is being performed. But that doesn’t imply that every use of cloud computing must be a distributed system. The purpose of cloud computing is to offer on-demand environments, not the power or scalability of a distributed system.

Conclusion

Without distributed systems, the modern world of computing wouldn’t be possible. Telecommunications and the internet could not function without them.

As the business world continues to develop and embrace more technology, more compute, and more cloud computing, distributed systems will continue to rise in popularity. There simply isn’t a path for the modern enterprise that doesn’t involve some kind of distributed network, data store or compute. After all, this is why Apache Kafka has become so widely used, as the platform is essential for the management of data streams across distributed systems.

Given the complexity that is already present in distributed systems, businesses need any advantage they can get when it comes to building and maintaining their systems. Apache Kafka is a difficult and complex platform that can have a steep learning curve, which is why we developed Conduktor UI to take the pain out of Apache Kafka. Start a free trial of the system right now to see it in action and learn how easy it can be to get a Kafka deployment up and running.

