Understanding Change Data Capture

In this eBook, we look at why Change Data Capture systems have become popular, discussing how CDC works and how to scale it. We explore potential use cases for CDC, along with popular technology choices to build a CDC system in production environments.

What's in the eBook?

  • 01
    ACME.com: A case in point

    An example use case using an e-commerce store demonstrates the need for real-time data replication across databases

  • 02
    Change Data Capture (CDC)

    What is Change Data Capture and what are the methods of implementing a CDC system?

  • 03
    Designing a production-grade real-time CDC system

    Taking CDC from theory to production, with a discussion of architecture and a sample Debezium code snippet

  • 04
    Why CDC is better than traditional batch-based ETL

    The advantages that a CDC system has over the more traditional Extract Transform Load pipeline

  • 05
    CDC use cases

    Where and when should you use CDC? We look at 6 potential use cases

  • 06
    CDC with Debezium and Kafka

    How to implement a CDC system using the open-source platform Debezium and Apache Kafka

Introduction

Change Data Capture (CDC) is an alternative to batch ETL that enables real-time data replication across databases. CDC detects, captures, and moves data from a source database as the data changes, allowing real-time data synchronization across downstream systems. The first few sections of this document walk you through a fictitious online store use case, explain the shortcomings of ETL, and introduce you to CDC concepts. Then we discuss how CDC works, its benefits, and how to implement a CDC system at scale. We conclude the document by discussing potential use cases for CDC, along with popular technology choices to build a CDC system in production environments.
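
To make the detect, capture, and move idea concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not how any particular CDC product works: the names SourceTable, ChangeEvent, and apply_changes are hypothetical, and a plain list stands in for the database transaction log or message stream that a real system would read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One captured change (hypothetical event shape, for illustration only)."""
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row state; None for deletes


class SourceTable:
    """Toy source table that appends every change to a log as it happens."""

    def __init__(self):
        self.rows = {}
        self.log = []  # stand-in for a transaction log or event stream

    def upsert(self, key, row):
        op = "update" if key in self.rows else "insert"
        self.rows[key] = dict(row)
        self.log.append(ChangeEvent(op, key, dict(row)))

    def delete(self, key):
        if key in self.rows:
            del self.rows[key]
            self.log.append(ChangeEvent("delete", key, None))


def apply_changes(replica, events):
    """Replay captured events, in order, against a downstream replica."""
    for event in events:
        if event.op == "delete":
            replica.pop(event.key, None)
        else:
            replica[event.key] = dict(event.row)
    return replica


def demo():
    source = SourceTable()
    source.upsert(1, {"item": "book", "qty": 1})
    source.upsert(1, {"item": "book", "qty": 2})
    source.upsert(2, {"item": "pen", "qty": 5})
    source.delete(2)
    replica = apply_changes({}, source.log)
    return source, replica
```

Calling demo() replays four captured events (insert, update, insert, delete) and leaves the replica with the same rows as the source. A batch ETL job would instead copy the whole table on a schedule, so the replica would lag behind the source between runs.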