/howto/sharding/Overview

Sharding Overview

This section introduces you to the Shard adapter which allows you to query (and optioanlly update) multiple heterogeneous data sources in parallel. The Shard adapter allows you to merge data from multiple sources, including SaaS platforms, flat files, nosql and relational databases, and present the output as a combined result set (rows and columns), or as a XML/JSON document through HTTP/S calls.

The Shard adapter aggregates data from multiple sources in real-time, regardless of their storage format, communication protocol, or authentication mechanism, and exposes a metadata layer that is accessible using SQL or REST commands. The conceptual diagram below shows a Shard View that aggregates data from from SalesForce, a flat file, and SharePoint Online. The aggregation mechanism is performed in parallel by Enzo, and the WHERE clause of the original request (if any) is forwarded to the underlying system whenever possible for maximum performance.

The Shard adapter provides advanced capabilities, such as:

Multithreaded Data Aggregation: aggregates data from multiple sources using parallelism
Heterogeneous Aggregation: merges data from disparate data sources into a single, uniform output
Filter Logic Propagation: pushes WHERE clause logic to downstream systems for higher performance when possible
Metadata Layer: creates a View definition that offers strong schema binding
SQL and HTTP/S: exposes a shard through SQL and HTTP/S calls
Hierarchical views: supports shard layering to query subsets of a shard

This article provides a simple use case of the Shard adapter: SQL Server table sharding.

The example provided here is the most basic form of sharding, and does not require advanced configuration settings. The Shard adapter is the only one required to follow the sample commands.

Database Setup

Let's first setup a few things before we can start working with the Shard adapter.

SQL Server Database

The example provided in this article uses two separate SQL Server databases, and a single table in each database. For simplicity each database will be stored on the same SQL Server instance, however this is not a requirement for the adapter to work; the Shard adapter can merge data from any number of tables stored on any number of database servers (including from different RDBMS engines).

Create Two Databases and Tables

Let's create two databases in the same SQL Server instance; we will also create a table in each, and add a few records for testing purposes. Note that the schemas of each table are slightly different to show how you can merge data sources with different column names. Open SSMS, connect to your SQL Server instance and run the following T-SQL batch:

The SQL commands in this section must be executed against a SQL Server database directly.

CREATE DATABASE [CRM-FL]
GO

CREATE DATABASE [CRM-CA]
GO

CREATE TABLE [CRM-FL].dbo.[Customers] (
	ID int identity(1,1) NOT NULL,
	[Name] nvarchar(50) NOT NULL,
	[UserId] nvarchar(100) NOT NULL,
	Active bit NULL
)

CREATE TABLE [CRM-CA].dbo.[PastCustomers] (
	RecordId int identity(1,1) NOT NULL,
	[FullName] nvarchar(50) NOT NULL,
	[UserId] nvarchar(100) NOT NULL,
	CustomerSince int NULL
)

INSERT INTO [CRM-FL].dbo.Customers VALUES
	('user1', 'user1', 1),
	('user2', 'user2', 1),
	('user3', 'user3', 1),
	('user4', 'user4', 1),
	('user5', 'user5', 1)
INSERT INTO [CRM-CA].dbo.PastCustomers VALUES
	('pastuser1', 'pastuser1', 2019),
	('pastuser2', 'pastuser2', 2019),
	('pastuser3', 'pastuser3', 2018),
	('pastuser4', 'pastuser4', 2017),
	('pastuser5', 'pastuser5', 2015)

The above script will create a few records in two different tables. We can now view the records with the following command:

SELECT * FROM [CRM-FL].dbo.Customers
SELECT * FROM [CRM-CA].dbo.PastCustomers

Enzo Configuration

Create Connection Strings

Now that the two databases have been created, let's create a Connection Strings in Enzo that points to the SQL Server instance that holds the two databases. The simplest way is to use Enzo Manager to create the Connection String as explained in this documentation, and name the connection string sqlserverdata.

Alternatively, you can open SSMS, connect to Enzo directly and execute this command (be sure to set your database connection string correctly):

The SQL commands in this section must be executed against Enzo directly. See the Quick Start section for details on how to connect to Enzo.

exec Sys.Vault.ConnectionStringCreate 'sqlserverdata', 'server=localhost;uid=sa;pwd=YOUR_PWD'

Configure the Shard Adapter

Finally let's configure the Shard adapter. This can be done from the Enzo Manager or using SQL commands. Let's use SQL commands to configure our shard. Open another connection to Enzo using SSMS (make sure you connect to Enzo directly), and run the following script:

EXEC shard._configCreate 
	'crmfl',	-- the name of this data pointer
	0,		-- not a default configuration setting
	'data',		-- a data pointer
	'sqlserverdata',	-- name of the connection string
	'SELECT ID, Name, UserId FROM [CRM-FL].dbo.Customers',  -- the actual SQL command to execute
	'breadcrumb'	-- request a breadcrumb to be added automatically (column: __source__)

The above script creates a pointer to the first table, and is called crmfl; this is not yet the shard per se. We have defined a data entry point (a.k.a. data pointer) that fetches data from the [CRM-FL] database. The breadcrumb option adds an extra column called __source__; it will contain the name of the shard pointer so you can identify which data source the data is coming from.

Let's do the same using the second database, except that we will select columns with an alias so that the second output matches column names from the first table; we will name this data pointer crmca.

EXEC shard._configCreate 
	'crmca',	-- the name of this data pointer
	0,		-- not a default configuration setting
	'data',		-- a data pointer
	'sqlserverdata',	-- name of the connection string
	'SELECT RecordId as ID, FullName as [Name], UserId FROM [CRM-CA].dbo.PastCustomers',  -- the actual SQL command to execute
	'breadcrumb'	-- request a breadcrumb to be added automatically (column: __source__)

We now have two data pointers; we can query each pointer individually using EXEC commands at this time to make sure we obtain the right data:

EXEC shard.ExecShard 'crmfl'
EXEC shard.ExecShard 'crmca'

Create the Shard

The next step is to create the actual shard, which will use the previously created data pointers. The main difference is the connection string: instead of specifying the name of a database Connection String, we are listing the data pointers separated by a comma, and specify this entry as a shard.

EXEC shard._configCreate 
	'crm',	-- the name of this shard pointer
	0,			-- not a default configuration setting
	'shard',		-- shard entry point
	'crmfl,crmca',
	null,
	'breadcrumb' -- keep the breadcrumb

We can now fetch data from both data pointers, through the crm shard created:

EXEC shard.execshard 'crm'

Create and Use an Enzo View

Last but not least, to make this shard easier to query we will create an Enzo View that will point to the crm shard we created previously. The main objective of the Enzo View is to define a firm metadata schema on the shard, and make it available to SELECT commands (including Linked Server connections). The name of the columns is specified in a C# notation below; the string data type is assumed when the data type is missing. Last but not least, creating an Enzo View requires passing the actual command we want to execute (in this case, we want to execute the ExecShard command we just ran above).

EXEC shard._createVirtualTable@ShardTable 'vCRM', 'crm', 'int ID,Name,UserId,__source__', 'EXEC shard.execshard ''crm'''

Run the Shard

Now that an Enzo View has been created, you can run SELECT commands against the Shard directly. The Enzo View called vCRM runs the ExecShard command behind the scenes.

SELECT * FROM shard.vCRM

You can also specify a WHERE clause to filter your data. Because each data pointer runs a SQL command against a relational database, the WHERE clause if forwarded to each data pointed to filter the data at the source for higher performance.

SELECT * FROM shard.vCRM WHERE userId like '%1'

Ad-Hoc Shard Commands

We saw how to build a shard structure with data pointers and a shard pointer and how to execute the default SQL command against the shard. In addition to executing the shard using its configured SQL commands, it is possible to execute an ad-hoc SQL command as long as each data source supports the command format. In this example, the data pointers are all SQL Server databases, so we can send an ad-hoc T-SQL command to the shard. Let's execute a generic command against the shard:

EXEC shard.execshardsql 'crm', 'SELECT * FROM sys.tables'

The above SQL command will return all available tables from each database.

Sharding Overview

Database Setup

SQL Server Database

Create Two Databases and Tables

Enzo Configuration

Create Connection Strings

Configure the Shard Adapter

Create the Shard

Create and Use an Enzo View

Run the Shard

Ad-Hoc Shard Commands

PLATFORM

SOLUTIONS

RESOURCES

COMPANY

BLOGS AND VIDEOS

IN THE NEWS

OTHER

Sharding Overview

Database Setup

SQL Server Database

Create Two Databases and Tables

Enzo Configuration

Create Connection Strings

Configure the Shard Adapter

Create the Shard

Create and Use an Enzo View

Run the Shard

Ad-Hoc Shard Commands

PLATFORM

SOLUTIONS

RESOURCES

COMPANY