Big SQL: SQL on Apache Hadoop Across the Enterprise

Storage Architecture 2017-10-12

Why Big SQL?

Enterprise Data Warehousing (EDW) emerged as the logical home for all enterprise data, capturing the essence of every enterprise system. In recent years, however, there has been an explosion of data captured from social media, sensors, and other sources. This rapid growth has put tremendous pressure on traditional systems, which quickly run out of space. Businesses are challenged to handle this growth effectively and efficiently without relinquishing the ability to derive existing insights, while also enhancing existing business logic to identify new opportunities.

One way to handle the rapid data growth is to offload data to Hadoop and free up space in an existing data warehouse, or to load the raw data directly into Hadoop. Hadoop is a highly scalable, low-cost storage platform where jobs run in parallel across all the servers (nodes) in the cluster. When these options are discussed with the business at this juncture of exploding data warehouses, the immediate concern is: can we query data that is distributed across relational databases and Hadoop? This perfectly sets the stage to introduce IBM Big SQL.

What is Big SQL?

IBM has invested decades of research in building a robust engine that can efficiently execute even complex queries against relational data. Big SQL leverages that very engine, adapted to handle Hadoop data. Among the strengths Big SQL inherits are an advanced SQL compiler and a cost-based optimizer for efficient query execution. Combining these strengths with a massively parallel processing (MPP) engine distributes query execution across the nodes in a cluster.

Why is IBM Big SQL an attractive option for data on Hadoop?

IBM Big SQL brings advanced SQL query engine capabilities to the Hadoop ecosystem that until now were typically available only in relational databases. Some of the core strengths of Big SQL are: SQL compatibility, ANSI SQL compliance, federation, high performance, high concurrency, data security, automatic workload management, automatic memory management, application portability, and more.

Many SQL engines on Hadoop claim to be ANSI SQL compliant. But why is SQL compatibility important? All relational databases follow the ANSI SQL standard, but each also adds dialect-specific SQL that differentiates it from other relational databases. In data warehouse offload use cases, the data comes from Oracle, Db2, Netezza, or similar relational data warehouses. Businesses have invested in developing applications that generate reports or insights against those warehouses. When you offload that data to Hadoop, what happens to those applications? Can they be re-used?
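As a small illustration of what such dialect differences look like, the first query below uses Oracle's NVL and DECODE functions, which are not part of ANSI SQL; a dialect-compatible engine can run it unchanged, while a strictly ANSI engine would require the COALESCE/CASE rewrite shown after it. (The table and column names are hypothetical.)

```sql
-- Oracle-flavored SQL (hypothetical SALES table):
SELECT region,
       NVL(discount, 0)                 AS discount,
       DECODE(status, 'A', 'Active',
                      'I', 'Inactive',
                           'Unknown')   AS status_text
FROM   sales;

-- Equivalent ANSI SQL that a strictly compliant engine would require:
SELECT region,
       COALESCE(discount, 0)            AS discount,
       CASE status WHEN 'A' THEN 'Active'
                   WHEN 'I' THEN 'Inactive'
                   ELSE 'Unknown' END   AS status_text
FROM   sales;
```

Multiply this by thousands of report queries and the cost of a manual rewrite becomes clear.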

Offload data from Oracle, Db2 or Netezza

To make the differences between relational SQL dialects obscure and seamless, Big SQL understands not only generic ANSI SQL but also the SQL dialects specific to Oracle, Db2, and Netezza. Therefore, when you offload data from Oracle, Db2, or Netezza, the applications can be ported easily, without changes. This simplifies the planning and execution of data warehouse offload use cases and reduces the time and money spent re-writing applications to work on Hadoop. Another advantage is that the SQL skills engineers already possess can be applied against Hadoop.
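A minimal sketch of what landing offloaded data can look like in Big SQL, assuming a hypothetical SALES table exported from the warehouse and a hypothetical HDFS path; the CREATE HADOOP TABLE statement stores the table in an open format on HDFS so other Hadoop tools can also read it:

```sql
-- Define a Big SQL table over HDFS storage in Parquet format.
CREATE HADOOP TABLE sales (
    sale_id   BIGINT,
    region    VARCHAR(32),
    amount    DECIMAL(12,2),
    sale_date DATE
)
STORED AS PARQUET;

-- Load previously exported delimited files into the table
-- (the file URL and delimiter are assumptions for this example):
LOAD HADOOP USING FILE URL '/landing/sales/export.csv'
    WITH SOURCE PROPERTIES ('field.delimiter' = ',')
    INTO TABLE sales;
```

Once loaded, the warehouse applications' existing SQL can be pointed at this table with little or no change.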

Federation capability

In addition, with Big SQL you can not only efficiently query data on Hadoop but also combine it with data spread across different enterprise data warehouses. The federation capability of Big SQL lets you query remote sources and join their data with Hadoop data, and it can also push down predicates to the remote source. Therefore, not all the data moves back and forth between the systems; only the rows that satisfy the predicates are sent back to be combined with Hadoop data.
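A sketch of how federated access is typically set up in Big SQL's Db2-style federation, assuming a remote Oracle warehouse; the server, node, schema, and table names are hypothetical, and the filter on the nickname is a candidate for pushdown so that only matching rows leave the remote system:

```sql
-- Register the remote source and expose one of its tables as a nickname.
CREATE WRAPPER NET8;  -- Oracle wrapper, per Db2 federation conventions
CREATE SERVER ora_dw TYPE ORACLE VERSION 12 WRAPPER NET8
    OPTIONS (NODE 'oradw_tns');
CREATE NICKNAME dw.customers FOR ora_dw."DW"."CUSTOMERS";

-- Join remote warehouse data with a local Hadoop table; the predicate
-- on c.segment can be pushed down to the Oracle side.
SELECT c.cust_name, SUM(s.amount) AS total
FROM   dw.customers c
JOIN   sales s ON s.cust_id = c.cust_id
WHERE  c.segment = 'ENTERPRISE'
GROUP  BY c.cust_name;
```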

SQL compatibility and ANSI SQL compliance

This feature is a big plus for businesses that have invested a lot of time and money in building a comprehensive enterprise data warehouse, because that investment does not become obsolete. The applications and SQL skills carry over to data on Hadoop, so adding a Hadoop warehouse or data lake to the enterprise does not demand additional time and money. The SQL-compatible, ANSI-compliant engine in Big SQL enables a seamless transfer of applications and SQL skills, including the execution of SQL and PL/SQL statements.
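As a small illustration of the PL/SQL point, an Oracle-style stored procedure like the sketch below (the procedure and table names are hypothetical) can be kept as-is on an engine with PL/SQL compatibility, rather than being rewritten in another procedural dialect:

```sql
-- Oracle-style PL/SQL procedure using VARCHAR2 and SYSDATE,
-- neither of which is ANSI SQL:
CREATE OR REPLACE PROCEDURE log_refresh(p_table IN VARCHAR2) AS
  v_now DATE := SYSDATE;
BEGIN
  INSERT INTO refresh_log (table_name, refreshed_at)
  VALUES (p_table, v_now);
  COMMIT;
END;
```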

Through the Hortonworks partnership, IBM Big SQL is tightly integrated with the Hortonworks Data Platform to provide businesses with a robust, reliable, and resilient environment for maximizing existing business and identifying new opportunities.

Some useful links:

A short video on IBM Big SQL:

To learn more about HDP and IBM Big SQL and also try the Sandbox with tutorials, go to


