Sybase IQ

SAP IQ (also known as SAP Sybase IQ or Sybase IQ) is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited[1] with pioneering the commercialization of column-store technology.

At the foundation of SAP IQ lies a column store technology that allows for speed compression and ad-hoc analysis. SAP IQ has an open interface approach towards its ecosystem. SAP IQ is also integrated with SAP’s Business Intelligence portfolio of products to form an end-to-end business analytics software stack, and is an integral component of SAP's In-Memory Data Fabric Architecture and Data Management Platform.

History

In the early 1990s, Waltham, Massachusetts-based Expressway Technologies, Inc. developed the Expressway 103, a column-based, engine optimized for analytics, that would eventually become Sybase IQ. Sybase acquired Expressway and re-introduced the product in 1995 as IQ Accelerator, then renamed it shortly thereafter to Sybase IQ, giving it version number 11.0.[2]

By offering the IQ product as part of a collection of related technologies often found in a data warehouse (including Sybase Adaptive Server Enterprise, Replication Server, PowerDesigner PowerDesigner, and SQL Anywhere), Sybase became one of the first mainstream companies to acknowledge the need for specialized products for the data warehouse market.[3]

With version 12.0, Sybase replaced the loosely coupled query interface from Adaptive Server Enterprise with a tight coupling with SQL Anywhere.

Version 16 brings a re-engineered column store for extreme, petabyte scale, data volumes, and more extreme data compression.[4]

In 2014, SAP HANA, together with partners BMMsoft, HP, Intel, NetApp, and Red Hat announced the world’s largest data warehouse. A team of engineers from SAP, BMMsoft, HP, Intel, NetApp, and Red Hat, built the data warehouse using SAP HANA and SAP IQ 16, with BMMsoft Federated EDMT running on HP DL580 servers using Intel Xeon E7-4870 processors under Red Hat Enterprise Linux 6 and NetApp FAS6290 and E5460 storage. The development and testing of the 12.1PB data warehouse was conducted by the SAP/Intel Petascale lab in Santa Clara, Calif., and audited by InfoSizing, an independent Transaction Processing Council certified auditor.[5]

Version history

With the release of SP08, the version numbers have been changed to align with and match SAP HANA’s version numbers to reflect the product’s continuous integration with SAP HANA. The actual release title SP03 is a follow-on to SP02, covering all platforms not affected by the release.

In-memory data fabric

SAP’s new approach streamlines and simplifies Data Warehousing into an In-Memory Data Fabric.[6]

SAP's In-Memory Data Fabric Architecture

SAP IQ with SAP HANA

With the advent of big data, SAP IQ has coupled with SAP HANA to deliver a distributed in-memory analytics platform. There are three main applications and use cases which try to capitalize on SAP IQ’s strengths concerning scalability and performance as an EDW and big data processor, while leveraging SAP HANA’s in-memory speed for operational reporting:

SAP HANA for operational reporting with SAP IQ for big data processing (NLS)

In this scenario, SAP Enterprise Resource Planning (ERP) data goes into SAP HANA which acts as an operational data store for immediate analysis. Once the data is analyzed it is integrated into SAP IQ via Near-line storage mechanisms (as described above). Here SAP IQ acts as an enterprise data warehouse that receives data from a variety of traditional sources (such as OLTP Databases and files systems), and SAP HANA Operational Data Store(ODS)[7]

SAP IQ as an Enterprise Data Warehouse (EDW) with SAP HANA as Agile Data Mart

When SAP IQ is used as an EDW, it can also be augmented with HANA’s in-memory technology. Common uses include planning and analysis reports where simultaneous OLTP processing is needed. In this case, data flows from SAP IQ to SAP HANA.[7] SAP BusinessObjects BI can be used to achieve visibility across both platforms.

Technology

SAP IQ 16 Engine

To a user, SAP IQ looks just like any relational DBMS with a SQL-based language layer accessible via ODBC/JDBC drivers. However, inside, Sybase IQ is a column-oriented DBMS, which stores data tables as sections of columns of data rather than as rows of data like most transactional databases.

Column-Store Architecture

Column-orientation has a number of advantages.[8] If a search is being done for items matching a particular value in a column of data, only the storage objects corresponding to that data column within the table need to be accessed. A. A traditional row-based database would have to read the whole table, top to bottom. Another advantage is that when indexed correctly, a value that would have to be stored once in each row of data in a traditional database is stored only once, and in SAP IQ, an n-bitindex is used to access the data.[9] Nbit and tiered indexing is used to allow for increased compression and fast, incremental batch loads.

Additionally, the column-based storage enables SAP IQ to compress data efficiently on the fly.[10]

Indexing Technology

Prior to SAP IQ 16, each data page was structured as an array of cells of a fixed size, so all values have the same data type. While this storage approach is efficient for structured and fixed length data, this does not hold for the more unstructured and variable sized data that is seen today. To combat storage inefficiency and store variable sized data with minimal wasted space, each page is composed of cells of a variable size that are packed tightly together; the column store architecture supports a variable number of cells per page and various page formats within a column. SAP IQ also applies Lempel-Ziv-Welch ( LZW ) compression algorithms[11] to each data page when it is written to disk, to significantly reduce data volume.[12]

Bitmaps are used for secondary indexes.[11]

Massively Parallel Processing Framework

SAP IQ has a massively parallel processing (MPP) framework based on a shared-everything environment that supports distributed query processing. Most other products capable of MPP tend to be based on shared-nothing environments. The benefit of shared-everything is that it's more flexible in terms of the variety of queries that can be optimized—especially for balancing the needs of many concurrent users. The downside is that in extreme cases, competition among processors to access a shared pool of storage (usually a storage-area network), can lead to I/O contention, which affects query performance.[12]

However, the aforementioned storage architecture of SAP IQ allows compute and storage layers to scale out independently of each other and also allows these resources to be provisioned on-demand for better utilization without restructuring the underlying database.

Multiplex Architecture

SAP IQ uses a clustered grid architecture, which is made up of clusters of SAP IQ servers, or Multiplex. These clusters are used to scale performance for large numbers of concurrent queries or queries that are great in complexity. This is built upon a shared-everything architecture where all compute nodes interact with the same shared storage and queries have the ability to distribute across all compute nodes. The Multiplex has a coordinator node which manages the database catalog and coordinates transactional writes to the store. Other nodes can be reader only nodes, or readers and writes, like the coordinator node. The storage fabric can be implemented with numerous technologies that allow sharing amongst the multiplex nodes.

This architecture has multiple uses, including workload balancing and elastic virtual data marts. Workload balancing is achieved by the SAP IQ query engine through dynamically increasing/decreasing parallelism in response to changes in server activity. There is automatic failover if a node stops participating in a query, and other nodes will pick up work originally assigned to the failed node so the query can complete. On the client side, compatibility with external load balances ensures that queries are initiated on physical servers in a balanced fashion to eliminate bottlenecks. Physical nodes in the Multiplex can be grouped together into “logical servers” which allow workloads to be isolated from each other (for security or resource balancing purposes); machines can be added to these as demand changes. The aim of the grid architecture is to enable resiliency even during global transactions.

Loading Engine

The SAP IQ Loading Engine can be used for incremental batch, low latency, concurrent loading, and bulk loading (with both client and server data files). The bulk loading process allows for multiple load process to occur simultaneously, if the loads are of different tables. Data can be loaded from other databases as well as files. Page-level snapshot versioning allows concurrent loads and queries, with locking occurring at the table level only. With SAP Replication Server, now enhanced to optimize loads into SAP IQ, transactions are compiled into the fewest set of operations, and then bulk micro-batch loads into SAP IQ are performed, which gives the appearance of real-time, continuous loads.

The bulk loader now performs all operations in parallel to make full use of all server cores, remove bottlenecks, and keep all threads productive, instead of serializing the process. The loading process remains a two-phase process, first reading raw data and creating FP indexes, and second creating secondary indexes, but everything is executed in parallel. High Group indexes, which the query optimizer relies on for information about which columns/rows contain which data values, are now structured as a set of tiers, increasing as you move down the pyramid.

Lastly, SAP IQ introduces a write-optimized, Row Level Versioned (RLV) Delta store which enables high-velocity data loads and fast availability of data to users. This store is minimally indexes and compressed, with row-level locking for concurrent write, and its own transaction log and is append only, and acts as a companion to the main store, with data being loaded at high speed to the RLV store, and migrating to the main store later, merging to it periodically. To the user it does not appear as though there are two separate entities at work and queries operate transparently across the two stores. To make use of this the users can specify particular “hot” database tables as RLV tables.

Framework and Client APIs

SAP IQ offers query APIs based on pure ANSI SQL standards (with few restrictions), that include OLAP and full-text search support. Stored procedures are supported in both ANSI SQL and Transact-SQL dialects, and can be executed on a scheduled or immediate basis. As well there are database drivers for a variety of programming languages such as JAVA, C/C++m PHP, PERL, Python, Ruby, and ADO.Net.

Handling of Unstructured Data

SAP IQ is an analytics engine that can query both structured and unstructured data and join the results together. SAP IQ introduced a new text index and an SQL “contains” clause to facilitate this by searching for terms within a blob of unstructured text; SAP Sybase’s partnerships with vendors allow for various binary forms of text files to be ingested into SAP IQ and text indexes created for them; these text indexes get the data ready for higher level text analysis applications to perform full-text searches within SAP IQ via SELECT statements. SELECT syntax can be used by applications performing tokenization, categorization and further text analysis.

In-Database Analytics / Extensibility Framework

In-database analytics are built upon the fundamental concept of keeping analytics algorithms close to the data for higher performance. The extensibility framework, called “in-database analytics” enables embedding of analytic functions inside the database engine of SAP IQ, moving analytics into the database, instead of to a specialized environment out of the database, a process which is error prone and slower. Pre-built functions are available natively and via partners of SAP IQ providing specialized statistical and data mining libraries that plug into SAP IQ. This framework increases SAP IQ’s power to do advance processing and analysis as the data does not have to be moved out of the database into a specialized environment for analytics. All data and results obtained can be shared though the DBMS and can be easily acquired through an SQL interface. Through user defined functions (UDFS) partners can extend the DBMS with custom computations, by providing a specialized statistical and data mining libraries that plug right into SAP IQ to enhance its performance of advanced processing and analysis.

Security

SAP IQ provides several features, both included in the base product and licensable separately, to help protect the security of the user’s data. A new feature introduced in IQ 16 is the Role Based Access Control (RBAC) which enables for a separation of duties and upholds the principle of least privilege, by allowing the breakdown of privileged operations into fine-grained sets that can be individually granted to users. Included as part of the base product are: users, groups and permissions, database administration authorities, user login policies, database encryption, transport-layer security, IPV6, role-based access control, and database auditing. Additional features are part of a licensable option called the advanced security option: FIPS encryption, Kerberos authentication, LDAP authentication, and Database column encryption.

Information Lifecycle Management (ILM)

As part of ILM, SAP IQ allows users to create multiple user DBSpaces (logical units of storage/containers for database objects) for organizing data. This can be used to separate structured or unstructured data, group it together according to age and value, or to partition table data. DBSpaces can also be marked as read-only to enable one-time consistency checking and back-up. Another application of ILM is the ability to partition tables, and place moving portions along the storage fabric and backup capabilities; this enables a storage management process where data cycles through tiered storage, moving from faster more expensive storage to slower, cheaper storage as it ages, partitioning data according to value.

High availability and Disaster Recovery

The multiplex set-up provides scalability and High Availability for compute nodes because a multiplex coordinator node can failover to an alternate coordinator node.

The SAP IQ Virtual Backup also allows users to quickly backup data, and along with storage replication technology, data is continuously copied so backups can occur quickly and “behind the scenes”. Once virtual backups are completed they can be verified through test and restore; enterprise data can be copied for development and testing. Then all that’s left is to complete the backup at a transactionally consistent point in time. SAP claims that disaster recovery is easier with a shared everything approach to MPP. The SAP Sybase PowerDesigner modelling tool enables users to build an ILM model that can be deployed with SAP IQ. Storage types, DBSpaces, and lifecycle phases can be defined in an ILM model and the tool can be used to generate reports and create partition creation and movement scripts.

Hadoop Integration

SAP IQ provides federation with the Hadoop distributed file system (HDFS), a very popular framework for big data, so that enterprise users can continue to store data in Hadoop and utilize its benefits. Integration is achieved in four different ways, depending on the user’s needs, through client-side federation, ETL, data, and query federation. Client-side federation joins data from IQ and Hadoop at a client application level while ETL federation lets the user load Hadoop data into the column store schemas of IQ. HDFS data can also be joined with IQ data on the fly through SQL queries from IQ, and finally results of MapReduce jobs can be combined with IQ data, also on the fly.

SAP Control Center

SAP Control Center replaces Sybase Central as a Web-Based graphical tool for administration and monitoring. SAP Control Center can be used for monitoring SAP Sybase servers and resources (node, multiplex) from any location, as well as monitoring performance, and spotting usage trends, The web application has a multi-tiered plug-in architecture which is made up of a server and product-based agents that bring SAP Sybase performance back to the Control Center Server.

Web-Enabled Analytics

SAP IQ comes with web-enabled application drivers facilitating access to SAP IQ from a variety of Web 2.0 programming and execution environments (Python, Perl, PHP, .Net, Ruby). Through query federation with other databases, developers can build applications that interact with multiple data sources at the same time (as well as with database platforms from other vendors). Federation proxy tables that map to tables in external databases can be created; these are materialized as in-memory tables but that can be interacted with as though they resided within SAP IQ. This way data sources can be combined in a unified view.

Supported Platforms

SAP IQ also supports plugging in external algorithms written in C++ and Java. SQL queries can call these algorithms, allowing for the execution of in-database analytics, which provides better performance and scalability. Additionally, Sybase IQ also provides drivers for access via languages such as PHP, Perl, Python, and Ruby on Rails.

SAP IQ supports most major operating system platforms, including:

Customers

Sybase claims that Sybase IQ is currently installed in over 2000 customer sites. Notable customers include comScore Inc.,[13] CoreLogic, Investment Technology Group (ITG),[14] and the U.S. Internal Revenue Service (IRS).[15]

While Sybase IQ has been widely used for focused, data-mart-style deployments,[16] it has also been deployed as an enterprise data warehouse.

Competitors

SAP IQ competes against “RDBMS platforms used for enterprise data warehouses (Oracle, IBM, Microsoft) and specialty analytics platforms (Teradata, Oracle Exadata, IBM/Netezza).”

References

  1. C-Store: A column-oriented DBMS, Stonebraker et al., Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
  2. Cole, Barb (November 7, 1994), Sybase makes a data warehousing play via acquisition, Network World
  3. Moore, Trevor (2010), Sybase IQ Survival Guide, p. 16, ISBN 978-1-4466-5758-4
  4. http://scn.sap.com/community/developer-center/analytic-server/blog/2013/04/01/sap-sybase-iq-16-for-xldb-analytics-now-available
  5. "SAP and Partners Set New Record for World's Largest Data Warehouse". Press Release. SAP. March 5, 2014. Retrieved August 19, 2016.
  6. http://www.sap.com/pc/tech/data-warehousing.html
  7. 1 2 http://scn.sap.com/community/bi-platform/businessobjects-bi-for-sap/blog/2013/11/27/how-does-all-this-work-together-bw-bw-on-hana-suite-on-hana-hana-live-part-8
  8. MacNicol, Roger; French, Blaine (August 2004), Sybase IQ Multiplex – Designed For Analytics (PDF), Proceedings of the 31st VLDB Conference, Trondheim, Norway
  9. Sybase IQ#cite note-Moore-1
  10. http://www.sap.com/bin/sapcom/en_us/downloadasset.2013-09-sep-23-13.achieving-analytics-that-are-both-cost-effective-and-optimized-for-performance-pdf.html
  11. 1 2 http://blasthemy.com/sap/TechEd13/1_Session_PDFs/RDP/RDP107/RDP107.pdf
  12. http://www.doblerconsulting.com/wp-site/wp-content/uploads/SAPSybaseIQWhatsNewin16SeptemberWebinarfinal.pdf
  13. Henschen, Doug (November 24, 2010), ComScore’s Big Data Deployment In Detail, Information Week
  14. Clark, Don (November 18, 2007), Start-Ups Mine Database Field — Nimble Software Helps Make Sense Of Information Tide (PDF), Wall Street Journal
  15. Lai, Eric (March 22, 2008), Been audited lately? Blame the IRS's massive, superfast data warehouse, ComputerWorld
  16. Henschen, Doug (July 12, 2011), Sybase IQ Gains Beefier Analysis Capabilities, Information Week

External links

This article is issued from Wikipedia - version of the 11/1/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.