Synthetic CDC

DataZen has a built-in data differential engine that can quickly identify net changes in source data over time. This feature is designed to extract changes from source systems that do not offer a change stream or that have no mechanism for keeping a high watermark pointer.

DataZen applies its synthetic CDC logic automatically, depending on the type of source system and the options selected for the job. In general, the differential engine stores a signature of each source record in an internal staging database and compares these signatures over time as records are extracted from the source system.

The synthetic CDC engine is bypassed in the following scenarios:

  • the source system is a CDC data source itself (such as SQL Server CDC or Change Tracking)
  • the Job Reader does not specify a Key Column
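
The two bypass rules above can be sketched as a small predicate. This is only an illustration of the documented rules, not DataZen code; the function name and parameters are hypothetical.

```python
from typing import Sequence


def uses_synthetic_cdc(source_is_native_cdc: bool, key_columns: Sequence[str]) -> bool:
    """Return True when the synthetic CDC engine would apply (hypothetical helper)."""
    # Rule 1: a source that is already a CDC stream bypasses the engine.
    if source_is_native_cdc:
        return False
    # Rule 2: without at least one Key Column there is nothing to track.
    if not key_columns:
        return False
    return True
```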

Unless special measures are taken, a Job Reader creates a Change Log that contains all the source records; this file is also referred to as an Initial Sync file. A Resync operation also creates a Change Log with a complete set of records from the source system.

When the Synthetic CDC option is bypassed, all records detected from the source system are forwarded to the Sync File, unless the Job Reader has a High Watermark defined (Timestamp, DateTime or a Long value).
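
As a rough sketch of the bypassed path, the function below forwards every record when no High Watermark is defined and otherwise forwards only records newer than the last stored value. The function name, the record layout, and the strict "greater than" comparison are assumptions made for illustration; the text above does not specify how DataZen compares watermark values.

```python
from typing import Any, Iterable, List, Optional, Tuple


def forward_records(
    records: Iterable[dict],
    watermark_column: Optional[str] = None,
    last_watermark: Any = None,
) -> Tuple[List[dict], Any]:
    """Return (records to forward, new watermark value). Hypothetical helper."""
    rows = list(records)
    if watermark_column is None:
        # No High Watermark defined: every record is forwarded.
        return rows, None
    if last_watermark is not None:
        # Assumption: only records strictly newer than the stored value qualify.
        rows = [r for r in rows if r[watermark_column] > last_watermark]
    # Advance the pointer to the highest value forwarded, if any.
    new_mark = max((r[watermark_column] for r in rows), default=last_watermark)
    return rows, new_mark
```

The watermark value can be any comparable type, which matches the Timestamp, DateTime, or Long options listed above.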

CDC State Table

When one or more Key Columns are specified in a Job Reader, the data read from the key columns is hashed and stored separately in a database table to keep state information. In addition to the hashed key fields, the following information is also stored for each record:

  • __enzo_idcols_hashbytes: the hash value of all the source column names
  • __enzo_created: the datetime when the record was inserted in this table
  • __enzo_updated: the datetime when the record was last modified in this table
  • __enzo_hashbytes: the hash value of the complete source record
  • __enzo_status: internal use only
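
To make the layout of a state row concrete, here is a minimal sketch that builds one from a source record. The use of SHA-256, the JSON serialization, the `key_hash` field name, and the helper names are all assumptions; the actual hash algorithm and storage format used by DataZen are not described here.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256(value) -> bytes:
    # Serialize deterministically so equal values always produce equal hashes.
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).digest()


def build_state_row(record: dict, key_columns: list, now=None) -> dict:
    """Build an illustrative CDC state row for one source record."""
    now = now or datetime.now(timezone.utc)
    return {
        # Hypothetical column name for the hashed key fields.
        "key_hash": _sha256([record[c] for c in key_columns]),
        # Hash of all the source column names.
        "__enzo_idcols_hashbytes": _sha256(sorted(record.keys())),
        "__enzo_created": now,
        "__enzo_updated": now,
        # Hash of the complete source record.
        "__enzo_hashbytes": _sha256(record),
        # Internal use only; left empty because its meaning is not documented.
        "__enzo_status": None,
    }
```

With this layout, a changed value alters `__enzo_hashbytes` while the key hash stays stable, and an added or removed column alters `__enzo_idcols_hashbytes`.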

The CDC state table is used to detect any changes made to the source records, including data updates and column changes (including data type changes). Although this state table contains limited information (mostly hash values), the number of records stored in it may be very large, depending on the source system.

The Synthetic CDC engine identifies "net" changes between two points in time, including deleted records when that option is selected. The outcome of this differential analysis is stored in the Change Log.
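
The differential analysis can be approximated with a signature comparison like the one below. This is a simplified sketch, not DataZen's implementation: the state is an in-memory dictionary keyed by the raw key value rather than a hashed key in a database table, and the function names and SHA-256 signature are assumptions.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def signature(record: dict) -> str:
    # Deterministic signature of the complete record.
    payload = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def net_changes(
    state: Dict, current_records: List[dict], key_column: str, detect_deletes: bool = True
) -> Tuple[List[dict], List]:
    """Compare the current extract with stored signatures.

    Returns (upserted records, deleted keys) and updates `state` in place.
    """
    upserted, seen = [], set()
    for rec in current_records:
        key = rec[key_column]
        seen.add(key)
        sig = signature(rec)
        # A new key or a changed signature both count as an upsert.
        if state.get(key) != sig:
            upserted.append(rec)
            state[key] = sig
    deleted = []
    if detect_deletes:
        # Keys tracked earlier but absent from this extract were deleted.
        deleted = [k for k in state if k not in seen]
        for k in deleted:
            del state[k]
    return upserted, deleted
```

On the first run the state is empty, so every record is reported as upserted, which corresponds to a complete initial extract. A record that changes several times between two runs is reported only once, with its latest values.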

Because DataZen determines only the "net" changes, it may not be possible to distinguish an inserted record from an updated one. As a result, the Change Log contains two types of changes: upserted and deleted. It is up to the target system to perform an insert operation if the record is missing, or an update operation if the record is already present.
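
On the target side, applying such a Change Log could look like the following sketch, which uses an in-memory SQLite database. The `customers` table, its columns, and the `apply_change_log` function are hypothetical and stand in for whatever target system is actually used.

```python
import sqlite3


def apply_change_log(conn, upserted, deleted_keys):
    """Apply upserted and deleted changes to a hypothetical `customers` table."""
    cur = conn.cursor()
    for rec in upserted:
        # Insert when the record is missing, update when it is already present.
        cur.execute("SELECT 1 FROM customers WHERE id = ?", (rec["id"],))
        if cur.fetchone() is None:
            cur.execute(
                "INSERT INTO customers (id, name) VALUES (?, ?)",
                (rec["id"], rec["name"]),
            )
        else:
            cur.execute(
                "UPDATE customers SET name = ? WHERE id = ?",
                (rec["name"], rec["id"]),
            )
    for key in deleted_keys:
        cur.execute("DELETE FROM customers WHERE id = ?", (key,))
    conn.commit()


# Small demonstration with one update, one insert, and one delete.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'old'), (2, 'gone')")
apply_change_log(conn, [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}], [2])
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

Many databases offer a single-statement upsert or merge that replaces the explicit existence check; the two-step form is used here only because it mirrors the insert-or-update decision described above.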

© 2023 - Enzo Unified