CDC with Ascend
Introduction
Change Data Capture (CDC) is crucial for efficiently replicating and synchronizing database changes. Ascend.io offers two distinct approaches for implementing CDC: a traditional multi-component method and a more streamlined MySQL CDC method.
CDC with Ascend
This method involves a few components for data replication. You'll need two Read Connectors and a Merge Transform to complete the process. This approach works for all databases except MySQL. The MySQL Read Connector has CDC replication modes.
Implementing CDC with Ascend
Step 1: Initial Data Backfill: Use a full sync Read Connector for initial data replication from the source.
Step 2: Incremental Changes: Create a read connector for your changed data capture using an incremental replication strategy.
Step 3: Data Normalization and Merging: Normalize incoming data and use a merge transform to integrate the full sync with the changed data.
Step 4: Synchronization: Ensure proper sequencing between initial and incremental syncs to avoid data gaps.
We're continuing to develop CDC strategies with Ascend and improve connections for a more streamlined process. Reach out to Ascend Support to see if your particular connection has CDC ingestion enabled.
MySQL CDC
MySQL CDC in Ascend introduces current and history modes specifically for MySQL databases, simplifying the replication process in two steps.
MySQL CDC Replication Modes
Current Mode and History Mode present two dynamic approaches for managing MySQL tables, each tailored to specific needs—from obtaining the latest data snapshot to tracking detailed historical changes, including schema adaptations and update and deletion strategies. Depending on the mode, the schema will include source columns with additional metadata for change tracking.
Current Mode
- What It Offers: This mode replicates the latest state of your MySQL table as of the last refresh cycle.
- What You See: All the standard columns from your table, plus a few extra for metadata. These include timestamps for when each row was first seen and last modified. And if a row gets deleted, it’s marked with a timestamp too.
- Ideal For: Times when you're interested in the most recent data snapshot.
- Added Columns: Three columns are added to the output table:
__ascend_ingested_timestamp
- when the record for a given primary key was first ingested.__ascend_modified_timestamp
- the last modified timestamp for the record for the given primary key__ascend_deleted_timestamp
- when the record for the given primary key was deleted from source
History Mode
- What It Offers: It captures the evolving story of your data, tracking changes over time.
- What You See: Similar to Current Mode, but with additional columns for 'record version start' and 'end', plus deletion timestamps. This gives you a historical view of each record’s lifecycle. In history mode, records without an 'end' timestamp are currently active.
- Ideal For: Situations where tracking the historical changes of data is crucial.
- Added Columns: Three columns are added to the output table:
__ascend_record_version_start
- from what point in time the record version is valid.__ascend_record_version_end
- up to what time the record version is valid. If this is NULL, then that record version is currently valid.__ascend_deleted_timestamp
- when the record was deleted.
Implementing MySQL CDC Replication
Step 1: Configure Read Connector: Select Change Data Capture as the replication strategy.
Step 2: Select Replication Mode: Choose between current and history modes.
Step 3: Key Columns: Indicate the column or columns to use for merging incremental change records with the output table of either replication mode. Comma-separate the column names if you're using more than one. This is often the primary key of the source table.
MySQL Configuration
Generating Schema: For the initial configuration, do not select GENERATE SCHEMA when configuring your MySQL Read Connection for CDC replication.
When switching from History to Current, always select GENERATE SCHEMA again. The additional timestamp metadata columns added by Ascend is different between the two and regenerating the schema enables Ascend to transition the columns.
Managing Deletions
MySQL CDC has "Hard Deletes," where data is permanently removed, and "Soft Deletes," where data is marked but retained.
- Hard Deletes: This means completely removing a row from the table. In CDC mode, this is detected and recorded.
- Soft Deletes: In this scenario, the row is marked as deleted in the table, useful for keeping track of deletions without removing data. Deletion timestamps in the schema indicate when a row was removed.
The timestamps in both modes (Current and History) are based on ingestion timestamp, and not event timestamp. This is to reconcile the fact that the backfill has no event timestamps, and with incremental replication, which doesn't have exact event timestamps
Partitioning and Output
- Simplified Output: We’ve condensed everything into a single output partition, making your data easier to manage.
Summary
Ascend's CDC capabilities facilitate efficient data replication, with traditional and MySQL-specific methods. The MySQL CDC, with its current and history modes, offers a more straightforward approach to replicating and managing data changes in MySQL databases.
Updated 10 months ago