The Future of Data Integration: What Every CTO Should Be Aware Of

August 2019

As corporations continue to adopt cloud computing and leverage external platforms, the need to access disparate, disconnected, and heterogeneous data has never been greater. Cloud computing is forcing the rapid introduction of new technologies at a speed that very few predicted, ranging from new data stores to microservice architectures and scalable serverless platforms. This rate of change is also visible in more traditional services that are increasingly becoming available “as a Service”, such as MongoDB’s Atlas platform, IBM MQ on Cloud, and Azure SQL Database, to name a few.

In addition to the faster rate of technology change, corporations are facing a growing number of Software as a Service (SaaS) providers, such as Salesforce, that have emerged over the years and are disrupting the data landscape at its very core. Even older platforms with deep roots in the enterprise are being re-platformed to support the SaaS model, such as Office 365 / SharePoint Online and the Oracle Cloud. With the shift to a SaaS model, and to cloud hosting in general, information stored in those systems becomes harder to integrate because the data is made available only through API layers instead of common data stores.

This shift in data and service adoption is creating a landscape that is increasingly difficult to manage and predict. Companies adopting new cloud services and SaaS vendors are investing time and effort in building integration solutions that bridge the natural gaps across their data ecosystem. As a result, organizations need a new way to rapidly integrate systems hosted both on premises and in the cloud, and to create a seamless information landscape regardless of the shape and location of their data.

As decision makers prepare for the future, they must deal with the reality that many of today’s technologies are becoming outdated. In fact, three technologies in particular will almost certainly be replaced in the next 7-10 years:

Extract, Transform, Load (ETL)

While ETL processes have been invaluable over the years, allowing organizations to move data from one system to the next, most ETL processes contain a large amount of business logic as part of the transformation phase. The challenge of storing business logic in an ETL platform lies in the rigidity of the environment; it isn’t code that can become part of modern deployment automation and testing. As a result, the industry is moving towards an ELT model, in which the ‘T’ of transformation happens outside of the integration tool, where it is easier to modify, so organizations can react to change more quickly. In addition, ETL processes are meant for batch processing and are not designed to participate in increasingly real-time, event-driven environments.
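To make the ELT idea concrete, here is a minimal sketch, assuming a tiny in-memory pipeline. The names `Order`, `extract_and_load`, and `transform`, along with the sample business rules, are hypothetical and exist only for illustration; the point is simply that the transformation lives in ordinary code that can be versioned, tested, and deployed like any other code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int
    country: str


def extract_and_load(raw_rows):
    """E and L: copy source rows unchanged into a staging list (stand-in for a staging table)."""
    return list(raw_rows)


def transform(staged_rows):
    """T: business logic kept as plain code, outside the integration tool."""
    orders = []
    for row in staged_rows:
        # Hypothetical rule: rows without an amount are skipped.
        if row.get("amount") is None:
            continue
        orders.append(
            Order(
                order_id=str(row["id"]),
                amount_cents=round(float(row["amount"]) * 100),
                # Hypothetical rule: country codes are trimmed and upper-cased.
                country=row.get("country", "").strip().upper(),
            )
        )
    return orders


staged = extract_and_load(
    [
        {"id": 1, "amount": "19.99", "country": " us "},
        {"id": 2, "amount": None},
    ]
)
orders = transform(staged)
```

Because `transform` is a plain function, a change to a business rule can be covered by a unit test and shipped through the same deployment automation as the rest of the codebase.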

ODBC Drivers

The first ODBC drivers were built in the early 90s, allowing organizations to tap into complex data sources using SQL commands and unleashing the ability to stitch information together rapidly. While many corporations continue to use ODBC drivers to abstract modern systems and services, their point-to-point deployment model, outdated security capabilities, and rigid deployment architecture make them hard to leverage in modern pipeline-oriented deployment infrastructure. More importantly, ODBC drivers are fundamentally limited in their ability to deliver data beyond relational data sets, forcing organizations to solve their data integration challenges using other solutions. ODBC drivers will be replaced by more modern Data as a Service platforms that can leverage the latest security standards, address traditional reporting needs through a native relational interface, and integrate seamlessly with modern platforms through HTTP/REST.

Real-Time Data Warehouse

Many organizations currently have a large number of data warehouses and data marts in use, leveraging relational or NoSQL databases. These organizations are realizing that their data warehouses suffer from two primary issues: data shape and timeliness. Both of these issues stem directly from the increase in technology adoption rate and increasingly diverse SaaS hosting models. Absorbing new data models, or changing existing ones, becomes exceedingly complex, and the ability to react quickly to customer behavior, driven partly by social media interaction, is luring organizations to embark on the real-time data warehouse model. However, real-time data warehouses are likely to lose traction as more real-time streaming capabilities emerge, replacing the need to go to a data store to detect near-real-time business events. Indeed, messaging and event streaming platforms are removing the need for complex data stores to achieve business agility.
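As a rough illustration of detecting a business event directly on a stream rather than by querying a data store, consider the sketch below. It assumes events arrive in time order as simple `(timestamp_seconds, customer_id)` pairs; the function name `detect_bursts`, the threshold, and the window size are hypothetical choices for the example, and a plain Python iterable stands in for a real messaging or event streaming platform.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold=3, window_seconds=60):
    """Yield (customer_id, timestamp) when a customer reaches `threshold`
    events within `window_seconds`.

    `events` is any iterable of (timestamp_seconds, customer_id) pairs in
    time order, standing in for an event stream.
    """
    recent = defaultdict(deque)  # customer_id -> timestamps still inside the window
    for timestamp, customer in events:
        window = recent[customer]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield customer, timestamp
            window.clear()  # reset so one burst produces one alert


stream = [(0, "a"), (10, "b"), (20, "a"), (30, "a"), (200, "b"), (210, "b")]
alerts = list(detect_bursts(stream))
```

The detector keeps only a small per-customer window in memory and reacts as each event arrives, so no warehouse query is needed to notice the burst.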

Modern Business Demands More Flexibility

Today’s consumers have high expectations. The drivers that push organizations towards more decentralized information services, and the rate of change that cloud computing and hosted services introduce, will continue to force organizations to rethink how data is ingested, distributed, and turned into useful information. Time-to-market expectations will only become more compressed and will continue to drive innovation in the data integration space.

Will your organization still be building integrations with outdated technology, while others are unifying around a more flexible approach?

Now is the time to reconsider traditional ETL processes, data services, and real-time processing capabilities.