Data Pipeline

Jobs can define a Data Pipeline that translates or transforms the incoming dataset before it is saved to the Sync File or sent to the target system.

The Data Pipeline executes its components in the order in which they are listed. You can reorder a component using the up/down arrows.

To preview the output of the data pipeline, click the Apply button. Because the Target System consumes the output of the data pipeline when one is defined, it is important to execute the pipeline before configuring the target system options.

A Direct Job can define a single Data Pipeline. It is applied as the data is being read (before it is saved to the intermediate Sync File).

A Job Writer can define a secondary Data Pipeline specific to the target system. This allows data engineers to separate translations of the source data that should apply to all future consumers (such as encrypting a sensitive field) from target-specific translations and transformations.

Pipeline Components

A number of built-in pipeline components are available in DataZen:

  • Data Filtering: Applies a secondary filter to the data using a SQL WHERE clause as a Data Filter.
  • Data Hashing: Applies a hash algorithm to a selected data column (must be a string data type); supported hashing algorithms are MD5, SHA1, SHA256, SHA384, and SHA512 (see the sketch after this list).
  • Data Masking: Applies masking logic to a selected data column, such as a credit card or phone number. Supports generating random numbers, free-form masking, and generic/full masking.
  • Data Quality: Validates simple rules against a data column and raises a warning or an error if the data doesn't match the specified rules.
  • Data Dynamic Column: Adds a new column to the output as a constant value, a simple SQL formula, or an Enzo Function.
  • Remove Columns: Removes unwanted columns from the final output.
  • Custom .NET DLL: Calls an external .NET DLL and uses the returned DataTable as the new recordset.
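
For illustration, the following minimal C# sketch shows what the Data Hashing component amounts to conceptually. This is not DataZen's implementation; in particular, the hex encoding of the digest is an assumption:

  using System;
  using System.Data;
  using System.Security.Cryptography;
  using System.Text;

  public static class DataHashingSketch
  {
      // Conceptual sketch only: replaces each value of a string column
      // with its SHA256 hash (hex encoding is an assumption).
      public static void HashColumn(DataTable table, string columnName)
      {
          using (var sha = SHA256.Create())
          {
              foreach (DataRow row in table.Rows)
              {
                  var value = row[columnName] as string;
                  if (value == null) continue;
                  var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(value));
                  row[columnName] = BitConverter.ToString(hash).Replace("-", "");
              }
          }
      }
  }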

Example: Dynamic Column

This example shows how to add a new column named calc, with a float data type, to the source dataset. The formula is a SQL expression:

rental_rate * 20

See Microsoft's documentation on DataColumn.Expression Property for more information on the expected syntax.
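
Conceptually, this is equivalent to adding an ADO.NET expression column. The following self-contained C# sketch (the sample table and its rows are hypothetical) applies the same formula through the DataColumn.Expression property:

  using System;
  using System.Data;

  class ExpressionDemo
  {
      static void Main()
      {
          // Hypothetical sample data standing in for the source dataset.
          var table = new DataTable("film");
          table.Columns.Add("title", typeof(string));
          table.Columns.Add("rental_rate", typeof(float));
          table.Rows.Add("Alpha", 2.99f);
          table.Rows.Add("Beta", 4.99f);

          // Equivalent of the Dynamic Column: name "calc", float data
          // type, SQL-style expression.
          var calc = new DataColumn("calc", typeof(float))
          {
              Expression = "rental_rate * 20"
          };
          table.Columns.Add(calc);

          foreach (DataRow row in table.Rows)
              Console.WriteLine($"{row["title"]}: {row["calc"]}");
      }
  }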

When using a DataZen Function instead, additional data transformation options become available.

Example: Custom .NET DLL

It is possible to call an external .NET DLL to perform more advanced transformations using the full capabilities of the .NET Framework. Once the DLL has been provided in the DLL Path, you can view the list of available public classes and methods.

The method must be named Apply and have one of the following signatures:
  public DataTable Apply(DataTable table)
  public DataTable Apply(DataTable table, dynamic pipeline)

When the method accepts the second parameter, the connection strings used by the pipeline's other components are provided to it as a serialized string.
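
Below is a minimal sketch of such a DLL; the class name and the transformation it performs are hypothetical, since only the Apply method name and signature are prescribed:

  using System.Data;

  public class MyPipelineStep // hypothetical class name
  {
      // Second supported signature: 'pipeline' receives the connection
      // strings used by the pipeline's other components as a serialized string.
      public DataTable Apply(DataTable table, dynamic pipeline)
      {
          // Hypothetical transformation: upper-case a 'title' column if present.
          if (table.Columns.Contains("title"))
          {
              foreach (DataRow row in table.Rows)
              {
                  if (row["title"] is string s)
                      row["title"] = s.ToUpperInvariant();
              }
          }
          return table; // the returned DataTable becomes the new recordset
      }
  }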

The .NET DLL must be deployed manually on the server where the Sync Agent is running; the DLL Path provided should be the same on both the client machine and the server. The DLL itself is loaded into memory when the job starts and unloaded upon completion.
