LAB: Import Parquet Files into a SQL Server Table

March 2021
Enzo Server
Lab





This hands-on lab will show you step-by-step how you can import one or more multiple Parquet files into a SQL Server database using SQL commands from SQL Server Management Studio (SSMS). Importing Parquet files into SQL Server can be useful to inspect content and integrate with reporting tools.


Although Enzo provides significant flexibility to access ad-hoc data sources, use DATAZEN to import and export Parquet files with automatic ongoing replication and automatic Change Data Capture into any target system, including SQL Server, HTTP Endpoints, Azure Event Bus, AWS SQS, or any database system using ODBC drivers.

  LEARN MORE

Pre-requisites
In order to complete this lab, you will need:
  • A computer (or virtual machine) with Windows 10 Pro or higher
  • SQL Server Express Edition 2016 or higher
  • SQL Server Management Studio
  • Enzo Server installed and running
  • An Azure subscription and an Azure Storage account

The lab will drive you the installation and configuration of the Parquet and AzureStorage adapters for Enzo Server, download and deploy test Parquet files in an Azure Container, and finally import these files into a SQL Server database using SSMS.

This lab uses Azure Storage as the source of parquet files; however Parquet files can also be stored on local drives and AWS S3 Buckets.

To follow this lab, you will need to install Enzo Server on your machine. See this page to download Enzo Server and to review installation instructions.

STEP 1: Parquet and AzureStorage Adapters Setup



This lab requires the Parquet Adapter; however you can still follow parts of this lab without downloading/configuring the AzureStorage adapter; Parquet files will be downloaded first to your local computer, so you will be able to query Parquet files locally.

Let's start by configuring Enzo Server so that you can access Azure Storage, and read Parquet files using SQL commands. This step requires that you have installed Enzo Server and that the Enzo service is running. To begin, start Enzo Manager (the management application for Enzo Server).

Download Parquet and AzureStorage Adapters

From Enzo Manager, open the Adapter Marketplace (under the Tools menu). Find the Parquet adapter and click on the Install button. A screen may appear asking you to enter a license key; click on Generate free 30-day license to generate a trial license key and click OK. The Parquet adapter is now queued for installation. Find the Azure Adapter from the list of adapters, and click on the Install button. Once both adapters are queued for installation, click on Close & Install/Update.

At this point, both adapters have been downloaded. A window will ask you to restart Enzo Server. Once Enzo Server has restarted, you will see both Azure Storage and Parquet adapters listed in Enzo Manager.





Enzo Marketplace - Parquet and AzureStorage adapters

Create and Locate Azure Storage Keys

If you already have an Azure subscription and already have an Azure Storage Account with Access Keys, you can skipt his section. Before you can configure the AzureStorage adapter, you will need to have a Microsoft Azure subscription. If you do not have an Azure account, you can sign-up by visiting the https://portal.azure.com website. Once you have an Azure subscription, follow the steps from the Azure documentation to create an Azure Storage service.

Once you have created the Azure Storage service in Azure, locate the Access Keys of your Storage Account. They can be found in the Azure Portal. Follow the Microsoft documentation for instructions on how to create and find the Access Keys that are needed in the following section.

Configure the AzureStorage Adapter

Back in Enzo Manager, click on the AzureStorage adapter, and click on the New Config... button. This screen will require the following information:

  • Config Name: Enter a friendly name for this setting (ex: default)
  • account: The name of the Azure Storage Account you created above
  • key: The Access Key for the Azure Storage Account
  • useSSL: Check the box to use an encrypted connection
  • retries: Enter 1 for the retry count

Check the box indicating this will be the default setting, and click OK. The following screen should now show your settings for the AzureStorage adapter.

Make sure the configuration setting is marked as the Default setting. A star (*) should be visible next the configuration name specified. If you do not see the star, click on the Set as Default button.





Sample AzureStorage configuration setting in Enzo Manager

To validate the configuration settings provided, open the Data Explorer window in Enzo Manager (Tools -> New Data Explorer Window...). Enter the following command and click on Execute.

SELECT * FROM AzureStorage.BlobContainers

Let's create a new Blob Container in which we will download the Parquet files later.

EXEC AzureStorage.CreateBlobContainer 'parquet'



Sample AzureStorage configuration setting in Enzo Manager

Configure the Parquet Adapter

Click on the Parquet adapter, and click on the New Config... button. This screen will require the following information:

  • Config Name: Enter a friendly name for this setting (ex: config1)
  • defaultFilePath: Leave this setting empty
  • schemaFile: Leave this setting empty

Check the box indicating this will be the default setting, and click OK. The following screen should now show your settings for the AzureStorage adapter.





Sample Parquet configuration setting in Enzo Manager

This lab doesn't require default settings for the Parquet adapter, so the above settings can be left empty.

Download the following Zip file (https://www.enzounified.com/downloads/userdataparquet.zip) and unzip its content so that all the files can be found under you local C:\ drive under the c:\tmp\parquet\userdata\ directory (you will need to create the directory structure as needed).

To test the Parquet adapter, go back to the Data Explorer window and run the following command:

SELECT * FROM Parquet.rows WHERE fileName='c:\tmp\parquet\userdata\userdata*'

The fileName parameter allows the use of wildcards; when a wildcard is identified, the Parquet adapter will execute the query on all the files that match the wildcard and return a combined resultset. This is referred to as automatic "sharding". You can also query a single file by providing the filename without wildcard (for example: userdata1.parquet).

Reading multiple files in parallel (sharding) is available with a paid edition of Enzo Server.

STEP 2: Copy Parquet Files in Azure



In this section, we are going to upload the Parquet files into the Azure Storage container created in Step 1 using SQL commands. You can also upload the files manually from the Azure Portal; however using SQL commands allows you to automate data movement in/out of Azure.

Execute the following commands from the Data Explorer window (see Step 1 on how to open Data Explorer in Enzo Manager). This uploads the parquet files in Azure.

EXEC AzureStorage.SaveFileBlob 'parquet', 'userdata1.parquet', 'c:\tmp\parquet\userdata\userdata1.parquet'
EXEC AzureStorage.SaveFileBlob 'parquet', 'userdata2.parquet', 'c:\tmp\parquet\userdata\userdata2.parquet'
EXEC AzureStorage.SaveFileBlob 'parquet', 'userdata3.parquet', 'c:\tmp\parquet\userdata\userdata3.parquet'
EXEC AzureStorage.SaveFileBlob 'parquet', 'userdata4.parquet', 'c:\tmp\parquet\userdata\userdata4.parquet'
EXEC AzureStorage.SaveFileBlob 'parquet', 'userdata5.parquet', 'c:\tmp\parquet\userdata\userdata5.parquet'
EXEC AzureStorage.SaveFileBlob 'parquet', 'headersonly.parquet', 'c:\tmp\parquet\userdata\headersonly.parquet'

To verify the files have been downloaded successfully, execute the following command:

SELECT * FROM AzureStorage.Blobs WHERE container='parquet'





Upload Parquet files in an Azure Storage Container

STEP 3: Query Parquet Files in Azure from SSMS



Now that the Parquet files are in Azure, let's use SQL Server Management Studio (SSMS) to query the files directly. As we saw previously, it is possible to query Parquet files stored in a local directory; in this section we will query Parquet files stored in Azure.

Let's first connect to Enzo Server using SSMS.

For instructions on how to connect to Enzo Server from SSMS, please see the Quick Start section in the Enzo Server online documentation.



Connect to Enzo Server from SSMS

At this point, SSMS is connected to Enzo Server; you can query Enzo Server using SQL commands. Let's query the Parquet files as we did previously, except that the connection will be made to Azure. To retrieve data from Parquet files in Azure, execute the following command from SSMS:

SELECT * FROM parquet.rows WHERE 
	filename='azurestorage@parquet://userdata?.parquet'
	AND country='Italy'

You can also access Parquet files stored in S3 buckets; this requires installing the S3 adapter. The SQL command looks similar: SELECT * FROM Parquet.Rows WHERE filename='s3@bucket://filename.parquet'

Reading multiple files in parallel (sharding) is available with a paid edition of Enzo Server.



Query Parquet Files in Azure from SSMS

STEP 4: Import Parquet Files in SQL Server



Now that we can query Parquet files stored in Azure, let's import the data into a SQL Server table. We first need to setup a Linked Server from SQL Server to Enzo Server so that we can import the data into SQL Server directly.

To configure a Linked Server to Enzo Server, see the online documentation here. Once completed, open a new Query Window in SSMS, and connect directly to SQL Server (not Enzo Server), and create a test database with a target table:

CREATE DATABASE testload
GO
USE testload
GO
CREATE TABLE tbl ([registration_dttm] datetime, [id] int, first_name nvarchar(100), 
	last_name nvarchar(100), email nvarchar(255), gender nvarchar(25), 
	ip_address nvarchar(50), cc nvarchar(50), country nvarchar(50), birthdate nvarchar(50), 
	salary decimal(10,2), title nvarchar(50), comments nvarchar(255))

To explore a simpler and more consistent way of importing data into any database platform, including the ability to only import changes into the target table, see Enzo DataZen.

The simplest way to import data into SQL Server is to use an EXEC command against Enzo Server. The following command will import all the rows returned found in the Parquet files in Azure into the local SQL Server table. Note: this command assumes that the Linked Server connection is called 'localhost,9550'.

INSERT INTO tbl 
EXEC [localhost,9550].BSC.parquet.readrows  
	null,
	'azurestorage@parquet://userdata?.parquet'

SELECT * FROM tbl 



Import Parquet Files in SQL Server with Enzo Server

At this point, all available records found in Parquet files has been loaded into a local SQL Server table, which can be queries using the full power of the SQL language.



Conclusion

In this blog, we reviewed how we can explore and query Parquet files using Enzo Server, and how to import Parquet files stored in Azure into a local SQL Server database table. For more information, including how to automate data ingestion and how to create Parquet files with Enzo Server, please contact us.









Want to see DataZen in action?





NEW !!! INTRODUCING DATAZEN

Secured Corporate, Cloud and B2B Data Replication

  LEARN MORE   CONTACT US





Any Source

Combined with Enzo Unified, DataZen allows you to replicate data from virtually any source system, including Twitter, SharePoint Online lists, flat files, NoSQL data, or any ODBC data sources.



Flexible Cross Database Replication

Replicate tables between Oracle, MySQL, SQL Server, Teradata, DB2 or any other relational database, either located on the same network, or across the globe.




Secured B2B Data Replication

Securely share specific tables/views from any internal database with business partners using PGP encryption and cloud drives.



Cloud Adoption

By detecting changes at the source, DataZen forwards only new, updated and deleted records to the target systems, saving bandwidth and helping with cloud adoption.







To learn more about configuration options, and to learn about the capabilities of DataZen, download the User Guide now.

  USER GUIDE