Data Pipeline

Jobs can define a Data Pipeline that translates or transforms the incoming dataset before it is saved to the Sync File or sent to the target system.

The Data Pipeline executes its components in the order in which they are listed. You can reorder components using the up/down arrows.

To preview the data pipeline, click the Apply button. Because the Target System consumes the output of the data pipeline (if one is defined), execute the pipeline before configuring the target system options.

When defining a Direct Job, a single Data Pipeline can be defined. It is applied as the data is being read (before saving to the intermediate Sync File).

When creating a Job Writer, a secondary Data Pipeline can be defined specific to the target system. Doing so allows data engineers to separate the translation of the source data that should apply to all future consumers (such as encrypting a sensitive field) from a target-specific translation/transformation.

Pipeline Components

A number of built-in pipeline components are available in DataZen:

  • Data Filtering: Applies a secondary filter to the data by adding a SQL Where clause as a Data Filter.
  • Data Hashing: Applies a hash algorithm to a selected data column (must be a string data type); supported hashing algorithms are MD5, SHA1, SHA256, SHA384 and SHA512.
  • Data Masking: Applies masking logic to a selected data column, such as credit card number or a phone number. Supports generating random numbers, free-form masking, and generic / full masking.
  • Data Quality: Validates simple rules against a data column and throws a warning or an error if the data doesn’t match the specified rules.
  • Data Dynamic Column: Adds a new column to the output as a constant value, a simple SQL formula, or an Enzo Function.
  • Remove Columns: Removes unwanted columns from the final output.
  • Staging: Sink Data: Allows you to save the entire dataset into a staging table in a SQL Server database, with the option to recreate the table if it already exists.
  • Staging: Custom SQL: Allows you to execute a command on a SQL Server database, and optionally replace the current Data Pipeline DataSet entirely with the returned result. This option is often used with the Sink Data operation.
  • Custom .NET DLL: Calls an external .NET DLL and uses the returned DataTable as the new recordset.

Example: Dynamic Column

This example shows how to add a new column, calc, with a float data type to the source dataset. The formula is a SQL expression:

rental_rate * 20

See Microsoft's documentation on DataColumn.Expression Property for more information on the expected syntax.
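Since the dynamic column follows the DataColumn.Expression syntax, its behavior can be reproduced with a plain ADO.NET computed column. The sketch below is illustrative only; the sample value for rental_rate is hypothetical.

```csharp
using System;
using System.Data;

class DynamicColumnDemo
{
    static void Main()
    {
        // Build a small sample table; the rental_rate value is hypothetical.
        var table = new DataTable();
        table.Columns.Add("rental_rate", typeof(double));
        table.Rows.Add(2.99);

        // The dynamic column uses the DataColumn.Expression syntax.
        var calc = new DataColumn("calc", typeof(float));
        calc.Expression = "rental_rate * 20";
        table.Columns.Add(calc);

        Console.WriteLine(table.Rows[0]["calc"]); // 2.99 * 20 = 59.8
    }
}
```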

When using a DataZen Function, you can use additional data transformation options.

Example: Custom .NET DLL

It is possible to call an external .NET DLL to perform more advanced transformations using the full capabilities of the .NET Framework. Once the DLL has been provided in the DLL Path, you can view the list of available public classes and methods.

The method must be named Apply and have one of the following signatures:
  public DataTable Apply(DataTable table)
  public DataTable Apply(DataTable table, dynamic pipeline)

When the method accepts a second parameter, the connection strings used by other components in the pipeline are provided as a serialized string.

The .NET DLL must be manually deployed on the server where the Sync Agent is running, and the DLL Path provided should be the same on both the client machine and the server. The DLL itself is loaded into memory when the job starts and unloaded upon completion.
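A minimal class implementing the first supported signature might look like the following. The class name, namespace, and the amount column used in the filter are all hypothetical; only the Apply method name and signature come from the documentation above.

```csharp
using System.Data;

namespace MyPipeline
{
    public class Transformer
    {
        // Matches the first supported signature; the returned
        // DataTable becomes the pipeline's new recordset.
        public DataTable Apply(DataTable table)
        {
            // Hypothetical transformation: keep only rows where
            // the (assumed) amount column is positive.
            var filtered = table.Clone();
            foreach (DataRow row in table.Select("amount > 0"))
                filtered.ImportRow(row);
            return filtered;
        }
    }
}
```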

Example: Staging Data

Staging data in a SQL Server database offers a quick way to temporarily copy source data into a table to debug or troubleshoot data issues, audit incoming data, or take further actions in the database. It can also be a way to execute logic on incoming data by leveraging the full power of the SQL syntax in order to run more advanced data transformation logic.

In this example, the Data Pipeline first sinks the data into a staging table (created automatically) called stagedata, then executes a custom SQL command to transform the data and select different columns. By choosing the Replace dataset with this result option, the Data Pipeline can completely transform or replace the dataset sent to the target system.
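The custom SQL command might look like the following sketch. The column names and transformation are hypothetical; only the stagedata table name comes from the example above.

```sql
-- Hypothetical transformation run against the staged data; the returned
-- result replaces the pipeline dataset when the
-- "Replace dataset with this result" option is selected.
SELECT customer_id,
       UPPER(last_name) AS last_name,
       rental_rate * 20 AS calc
  FROM dbo.stagedata
 WHERE rental_rate > 0;
```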
