High Watermark / Pointers
Some jobs support the ability to track the "last highest value" of a field from the data source so that future calls can retrieve only the data that changed. Normally, this value is a DateTime or Timestamp data type, or an integer (or long) value. For example, a database system may have a timestamp field that can be used for a High Watermark. Twitter offers an id field that contains a numeric value that keeps growing. A SharePoint List contains a LastModified field that can be used for this purpose.
Generally speaking, a High Watermark is an optimization technique that limits how future data is retrieved so that only the changes are extracted. High Watermark values are usually not necessary when the source system is itself a CDC stream or a messaging platform.
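The idea can be illustrated outside of DataZen with a minimal sketch. It assumes a hypothetical SQLite table named items with a last_modified column. The table, the column names, and the read_changes helper are made up for illustration and are not part of the product. Each call returns only the rows changed since the previous High Watermark, along with the new value to store for the next call.

```python
import sqlite3

def read_changes(conn, last_read):
    """Return (rows, new_watermark) for rows modified after last_read.

    A last_read of None means no pointer is set, so all rows are read.
    """
    if last_read is None:
        cursor = conn.execute(
            "SELECT id, name, last_modified FROM items ORDER BY last_modified"
        )
    else:
        cursor = conn.execute(
            "SELECT id, name, last_modified FROM items "
            "WHERE last_modified > ? ORDER BY last_modified",
            (last_read,),
        )
    rows = cursor.fetchall()
    # The new watermark is the highest value seen; keep the old one if nothing changed.
    new_watermark = rows[-1][2] if rows else last_read
    return rows, new_watermark

# Hypothetical source table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "a", "2024-01-01 10:00:00.000"),
        (2, "b", "2024-01-02 10:00:00.000"),
    ],
)

# First run: no pointer yet, so every row is read.
first_rows, mark = read_changes(conn, None)

# A new row arrives; the second run only returns that row.
conn.execute("INSERT INTO items VALUES (3, 'c', '2024-01-03 10:00:00.000')")
second_rows, mark2 = read_changes(conn, mark)
```

The timestamps are stored as fixed-width text here so that string comparison matches chronological order. A real source system would typically compare native date/time or numeric values.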
Specifying a Job Reader High Watermark
When creating or updating a job reader, you can specify the High Watermark field to use
in the Timestamp / DateTime field in the Replication Settings tab.
The initial values can be set manually if the intent is to avoid reading all source
records the first time the job runs. To do so, click on the
Set initial pointers (high value)... link.
This screen shows how to define the Last Read Value setting of a job. However, some jobs can also hold a last "deleted" pointer, separately from the last "read" pointer. See the Database Job Reader section for more information.
Specifying an HTTP Job Reader High Watermark
When creating an HTTP job reader, you can specify the High Watermark field to use
in the High Watermark field in the Capture Strategy tab.
In order to set this field, you need to first set a Capture Strategy that
uses a WINDOW operation (WINDOW READ or WINDOW READ + CDC).
The initial values can be set manually if the intent is to avoid reading all source
records the first time the job runs. To do so, click on the
set initial values... link.
When editing an HTTP Job Reader, the screen looks a bit different; however, the same fields are visible on the screen and can be modified the same way.
Viewing/Editing High Watermark Values
To edit High Watermark values (last read or last deleted), select the desired job
from the list of jobs in DataZen Manager. Shortly after clicking on it, the right panel shows most job settings, including
the current High Watermark value in Last Read Pointer (in this example, a DateTime).
If the job holds a High Watermark, the Edit Pointers button
on the right panel will be enabled; click on it.
This screen shows both the Last Read Pointer and Last Delete Pointer when available. You can manually edit either value using one of the following options:
- Reset (null): resets the value to NULL; all available data will be read again
- Date/Time Value: select a date/time value from a date picker
- Numeric Value: enter a numeric value
- Custom Value: enter free-form text
In some cases, this setting may be an array when a job holds multiple pointers. When entering a date as free-form text, use the following notation: YYYY-MM-DD hh:mm:ss.nnn
If the value is not set correctly, the job may fail.
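To reduce the risk of a malformed free-form date, the value can be generated programmatically before pasting it in. The sketch below is only an illustration: the format_pointer helper is hypothetical, and it assumes that hh is a 24-hour value and that nnn means milliseconds (three fractional digits), which the notation suggests but does not state explicitly.

```python
from datetime import datetime

def format_pointer(value: datetime) -> str:
    """Format a datetime as YYYY-MM-DD hh:mm:ss.nnn (nnn assumed to be milliseconds)."""
    # strftime's %f yields six digits (microseconds), so build the
    # three-digit millisecond part by hand instead.
    millis = value.microsecond // 1000
    return value.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"

example = format_pointer(datetime(2024, 3, 5, 14, 7, 9, 250000))

# Parsing the text back confirms it is well formed; %f accepts 1 to 6 digits.
parsed = datetime.strptime(example, "%Y-%m-%d %H:%M:%S.%f")
```

Parsing the string back before using it is a cheap way to catch typos such as a missing digit or a swapped month and day separator.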