We wanted to know how companies viewed the changing data warehousing landscape, so we surveyed 200 businesses to learn more about the issues they faced. In "Delivering the Best of All Worlds for Today's Analytics" we compare the technology, present the options, and provide findings from our survey. We also discuss the latest column store techniques and open source technology to provide both enterprise class performance and affordability.
What You’ll Learn
This white paper examines the issues facing companies that are evaluating their
data warehousing and analytics architectures and options. It draws on analyst
and market research, including a survey of 200 enterprises on their attitudes
towards, and concerns about, the adoption of newer data warehousing
technologies such as open source databases and appliances.
The Data Warehousing Landscape
Today’s data growth is nothing short of explosive. Data comes from all directions, all devices and in more
volume than ever before. Organizations are drowning in it but ironically they still don’t have the information
they need to meet all their business goals.
Just as data is proliferating faster than ever, so is user demand for access to that data. Internal users,
empowered by powerful, easy-to-use PC and browser-based tools, are becoming increasingly savvy and
they are frustrated by IT’s inability to keep pace with their analytics needs. And data consumers are no
longer confined to internal users. Today’s customers and business partners are requiring secure query
portals through which to view and analyze their business.
In this context, it should come as no surprise that a 2007 Gartner Group survey [1] found that CIOs identified
Business Intelligence (BI) as their number one technology priority, up from number ten just four years earlier.
The good – and bad – news is that there are more options than ever before to help address these issues.
What Are Your Data Warehousing Options?
Option 1: Software Options: Traditional, Open Source and Analytic Databases
Traditional Databases
The issues associated with this option are known only too well, so we will not dwell on them in this white
paper. The traditional database vendors, in cooperation with their hardware partners, can provide the
performance, scalability and support that are required but at a premium price that is often out of alignment
with the business benefits, especially on smaller projects. In addition to the direct and indirect costs
associated with these solutions, there is the issue of vendor lock-in. The deeper enterprises get into the
specific feature sets of these databases and the more reliant they become on the IT teams and partner
resources that are tightly aligned with the vendors of these databases, the more difficult it becomes to
migrate to lower-cost, more easily implemented alternatives.
That said, solutions from traditional database vendors remain the most commonly deployed data
warehousing solutions for most enterprises. A poll conducted by Kickfire at a recent TDWI event suggests
that almost 60% of enterprises still rely on either IBM or Oracle-based data warehousing solutions.
However, the ascendancy of these vendors and their solutions is being increasingly challenged by
new database and appliance options as we will discuss in the following sections of this white paper.
Open Source Databases
Many enterprises view open source databases as very attractive alternatives to traditional databases from an
initial and total cost-of-ownership perspective. In fact, an April 2008 Kickfire survey of 200 such companies
indicates that 66% want to be able to use open source databases, specifically MySQL, for data warehousing.
However, many of these enterprises expressed concerns about the ability of open source databases to
support their data warehousing needs. The most common of these concerns were:
Figure 1: MySQL Data Warehousing Issues and Concerns (bar chart; vertical axis: % Users, 0 to 100)
• Performance - reports / queries too slow
• Scalability - won’t scale beyond 100 GB
• Functionality - doesn’t support ad hoc queries
• Tuning - will need constant tuning
• Hardware Build-out - scaling only possible by adding servers
Although these concerns are valid, open source databases have come a long way in a very short period of
time. The ability to enhance solutions quickly, supported by the huge open source ecosystem, is one of
their most attractive benefits. There are thousands of open source developers around the world, ranging
from the IT teams in global enterprises, to developers working for software vendors, to the archetypal
lone-wolf open source experts. Between them, they have contributed millions of man-hours to the development
of these systems, far more development time than even the largest database vendor can bring to
its own system.
Most analysts and industry watchers are united on three key issues regarding open source databases:
• There are more than 11 million active implementations and over 50,000 downloads per day of MySQL [2]
alone, highlighting the fact that open source databases are here to stay.
• Open source databases are ready for the enterprise. As Noel Yuhanna, principal analyst at Forrester,
stated in his July 2008 Market Update: Open Source Databases [3], “Sun Microsystems’ acquisition of
MySQL further validated the open source database market’s worthiness, and enterprises can now expect
even more reliability and improved support in the coming years.” In an excerpt from the same report he
adds, “Every enterprise should now consider open source databases as part of its overall DBMS strategy,
as doing this will deliver cost savings, especially when supporting small to midsized applications.”
This is not to say that open source databases will immediately replace traditional databases. The heavy
financial, application and IT skills investment that most enterprises have in traditional databases ensures
that they will continue to be part of the enterprise software landscape for the foreseeable future.
However, we believe that where a feature/function fit can be assured between the application and an open
source database, many enterprises will choose to deploy the lower cost, more rapidly implemented option.
The future for most enterprises will not be choosing between traditional and open source databases; it will be
developing a co-existence strategy between them and establishing guidelines for users on which database
is most suitable for which type of applications and workloads.
Analytics Databases
Many of the brightest data warehousing designers have come to believe that traditional databases, no
matter what hardware they run on and how well tuned, cannot keep pace with the enterprise’s demand for
faster and cheaper analytics. To that end, several companies have launched database products that are
wholly optimized for analytics applications. These products typically stand the traditional row-based data
storage model on its head and drive data access more efficiently from the column rather than the row.
The best of these column-based analytics databases deliver truly dazzling performance. Academic
research, in the form of the 2008 Yale/MIT paper Column-Stores vs. Row-Stores:
How Different Are They Really? [4], has confirmed that, for analytics applications, it is extremely difficult for
a row-based database to perform at the same level as a column store database.
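To make the row-versus-column distinction concrete, the following is a minimal sketch rather than a description of any vendor's implementation. The toy table, its column names and its values are hypothetical. It counts how many stored fields each layout has to read in order to total a single column, which is the typical shape of an analytics query.

```python
# Illustrative only: a toy table stored two ways. The table, column names
# and values are hypothetical and are not taken from any product.

rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    """Sum one field, reading every full row to get at it."""
    fields_touched = 0
    total = 0.0
    for row in table:
        fields_touched += len(row)  # the whole row is read
        total += row["amount"]
    return total, fields_touched


def total_amount_column_store(cols):
    """Sum one field by reading only that column."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


row_total, row_touched = total_amount_row_store(rows)
col_total, col_touched = total_amount_column_store(columns)
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts return the same total, but the row layout reads all three fields of every row while the column layout reads only the one column the query needs. On wide tables with many rows, that difference in data read is the intuition behind the column store advantage for analytics.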
However, the greatest strength of this analytics-based approach is also its greatest weakness – analytics
performance is optimized at the expense of transactional performance. For most enterprises, this means
that the analytics database is a complementary option to their existing traditional databases, not a potential
replacement. Analytics databases are still expensive and can require many CPUs to build out sufficient
parallelism to achieve their performance targets. Many of the enterprises that we have surveyed are not
ready or willing to take on another proprietary data warehousing vendor, much less take on the substantial
incremental costs associated with this complementary strategy.
Option 2: Cloud Computing and Software-as-a-Service Options
Two of the hottest current topics in IT are the role of Cloud computing and Software-as-a-Service (SaaS) in
the enterprise. There has been a lot of press focus around these concepts due to the meteoric growth of
one or two of the SaaS vendors and because household brand names have entered the Cloud computing
arena. In reality, these options are more delivery-based options than technology options since the services
rely on the vendor’s ability to host either software or appliance-based systems and to deliver cost-effective,
managed services rather than the vendor’s ability to innovate and bring new technology to the market.
Cloud Computing
Cloud computing-based options are relatively new and, although some enterprises are piloting these
solutions, there are few, if any, examples of multi-terabyte data warehouses deployed using this model.
However, it is certainly possible to see how this option might play a role in proof-of-concept projects, projects
that will have a limited production life, or projects that need to be deployed extremely quickly.
The chief concerns around Cloud computing are performance and security. A lot of thought and planning is
required before moving large volumes of sensitive and mission critical data across the public Internet to be
processed on a shared storage and computing infrastructure that is processing a mixed workload of online
and analytics applications. Issues may include compliance; the speed and logistics of loading terabytes of
data over the Internet; and predicting performance on a mixed workload platform when so many variables
are in play.
Surveying the market, there appear to be no Cloud-based “pure-plays”. Thus, even those people who are
trumpeting the benefits of the Cloud-based architecture are hedging their bets and supporting this as one
of several deployment models. Typically, these vendors point out that this model is well-suited to trial
deployments but they will push for an on-premise model when it comes to large-scale and/or production
deployments.
SaaS Options
Several Software-as-a-Service (SaaS) vendors have recently launched in the data warehousing/Business
Intelligence market. Almost all of these companies are really Business Intelligence tool vendors going to
market through a SaaS model. They offer very intuitive web-based user functionality but do little, if
anything, at the database level to impact performance. In addition, they share many of the same issues,
such as security and predictable performance, that apply to the Cloud-based option.
Option 3: Appliance Options
Just as open source databases are here to stay, so are data warehousing or analytics appliances. As
James Kobielus of Forrester Research wrote in his April 2008 report Appliance Power: Crunching Data
Warehousing Workloads Faster and Cheaper than Ever [5], “Appliances are taking up permanent residence
in the heart of the enterprise data center – the data warehouse (DW). DW appliances – in all their bewildering
proliferation – are moving into the mainstream.”
Data warehousing and analytics appliances have been with us for many years and are proven architectures
in global enterprises. They are particularly well-suited to sectors such as finance, retail, consumer
packaged goods and travel where the need for high-performance and massive scalability is aligned with
the ability to spend hundreds of thousands, if not millions, of dollars on appliance-based solutions. These
appliances have proven the concept that a purpose-built device can compete with - and outperform - the
database plus commodity server solutions that have dominated the data warehousing landscape for so
long.
We can look at the evolution of data warehousing appliances in a number of different ways. However,
looking back over the last 10-15 years, it is clear that there have been three waves of innovation, whether
true technology innovation or marketing-led innovation.
First Generation Appliances: Proprietary Appliances
The pioneering vendors who developed the first wave of these appliances had to architect their solutions
around proprietary hardware and software architectures in order to deliver the performance to establish
themselves as a viable alternative to traditional database solutions. Moreover, because these appliance
vendors were targeting the highest level of the enterprise market and were competing with database
solutions that cost millions of dollars, they built pricing models very similar to those of the database
vendors with entry-level price-points in the high hundreds of thousands of dollars.
The first generation vendors proved that appliances could deliver superior analytics power at a lower
price-point than the traditional database plus server solutions. However, their proprietary solutions were
almost as expensive as traditional options and required highly specialized teams to design, develop,
deploy and maintain them.
Second Generation Appliances: Virtual or Bundled Appliances
Seeing the success of the early innovators, a second wave of vendors came to market. Although some of
these companies brought new technology innovations to market, the majority were more marketing plays
based on virtual appliances or loosely coupled bundles of software and hardware components. Many of
these second generation appliance vendors are hardware companies that have acquired data warehousing
business units as a go-to-market mechanism for their core hardware solutions. Others are niche software
players looking to benefit from the lower support costs inherent in the appliance model and the reduced
total cost-of-ownership economics that they can offer to their customers.
The good news is that this second generation brought significant marketing spend to the table and
educated many enterprises on the benefits of an appliance-based approach to data warehousing. The
competition from having multiple, similar vendors in the market also brought prices down to where entry-
level price points were typically at the $100,000 level.
The marketing spend and activity also piqued the interest of many VCs and entrepreneurs who saw the
business opportunity for analytics appliances and believed they could bring new innovation to the market.
This investment and start-up activity is now giving rise to a third generation of analytics appliances.
Third Generation Appliances: Open Source Appliances
As open source databases become more feature rich and better supported and as enterprises adopt such
databases in increasing numbers, it is inevitable that appliance vendors will use standard or modified
open source databases as one of the key building blocks of their solutions in preference to the expensive
options from the traditional database vendors. However, while these various third-generation appliance
vendors use the common building blocks of commodity hardware and open source software, the way in
which these vendors have configured these systems is, to repeat Forrester’s word, “bewildering”.
Kickfire: The Best of All Worlds?
Notable in the third generation category is Kickfire. Kickfire’s vision is to develop a “best of all worlds”
solution: an open source data warehousing solution supporting the key features of an analytics database
architecture, deployed on a dedicated appliance that can deliver enterprise-level performance and functionality
at an affordable price-point.
It is no secret that general purpose CPUs have major bottlenecks that need to be eliminated rather than
mitigated. Typically, vendors and users seek to ease these bottlenecks through proven techniques such as
parallelism, disk striping and advanced tuning. However, these are stopgap measures that do not address
the fundamental issue that general purpose computers are not optimized to move and analyze large
amounts of data in a short period of time.
Kickfire’s appliance-based solution addresses this fundamental issue in a new and innovative way: it is a
purpose-built appliance optimized for data warehousing performance that deploys the latest analytic
database features on a standard, open source database. In line with industry trends towards open source,
Kickfire selected Sun’s MySQL database as its database engine and developed a Storage Engine that
plugs directly into MySQL’s core architecture. So, what’s different about Kickfire?
• World’s First SQL Chip - at the core of the Kickfire Database Appliance is the world's first SQL chip
that uses parallel, pipelined data flow to deliver the power of tens of high-end, general purpose CPUs on a
single processor. Backed by large amounts of directly addressable memory, this provides blazing raw
performance to power the Kickfire software. This architecture, called Dataflow Architecture, has been the
basis of many high-performance military, scientific and research systems going back to the mid-1980s and
is well-proven.
• Enterprise Class Data Warehousing Software Features - at the software level, Kickfire brings
to MySQL the now proven concept of storing analytics data by columns, not rows. This minimizes read
access times and, combined with Kickfire’s highly-efficient data compression and hardware-based search
indexes, guarantees predictable and scalable performance without the traditional need for constant tuning
or adding more and faster hardware.
• Open Source Standards – unlike other appliances, Kickfire uses the standard Linux Operating
System and runs the standard MySQL database. This means that the ever growing range of open source
business intelligence tools and utilities for data loading, backup and restore can be leveraged with Kickfire.
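The pairing of column storage with compression in the second bullet can be illustrated with a small sketch. This paper does not say which compression scheme Kickfire uses, so run-length encoding, a technique commonly associated with column stores, is used here purely as an assumed stand-in. The function names and the sample column are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Hypothetical, sorted "region" column from a fact table.
region_column = ["east"] * 4 + ["north"] * 2 + ["west"] * 3

encoded = rle_encode(region_column)
print(encoded)  # [('east', 4), ('north', 2), ('west', 3)]
```

Nine stored values shrink to three pairs because a single column tends to hold long runs of similar values, something a row layout, which interleaves unrelated fields, does not offer. Less stored data generally means less data to read per query.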
Kickfire’s appliance delivers enterprise-class performance more efficiently than any other data warehousing
architecture. Customers therefore benefit from record-breaking performance delivered with the
simplicity of an appliance in a cost-effective, low TCO package and the ease of installation and management
of a standard database.
Figure 2: Kickfire: Open Source and Industry Standard Architecture (diagram; labeled components: SQL Chip, Standard Database, Standard Server, Standard Storage, External Storage)
Because the Kickfire appliance is a true appliance and not a bundle of loosely coupled hardware components,
it has a small form factor and needs minimal power and cooling, in sharp contrast to the racks of
commodity servers and disk arrays that are typical of other high-end analytics solutions. In fact, relative to
the typical server configurations needed to power traditional terabyte plus data warehousing solutions,
Kickfire’s appliance needs less than 10% of the rack space and consumes less than 650 watts – about the
same as a typical microwave oven.
Unlike many other solutions, Kickfire is targeted towards the data marts and medium-sized data warehouses
that most forward-thinking enterprises are implementing in preference to the monolithic, complex
data warehouses architected to house every shred of information within the enterprise.
Kickfire is targeting data sizes from tens of gigabytes to the low tens of terabytes and is packaged
as a purpose-built appliance that is quick to deploy, requires minimal tuning and maintenance, takes up
less space and power in the data center and, in many cases, pays for itself with the first project. Kickfire
believes that the appliance-driven commoditization of enterprise applications is here to stay and should not
be the exclusive preserve of enterprises willing and able to spend millions of dollars on data warehousing
solutions.
TPC-H Benchmarks: the Proof is in the Benchmarks
In May 2008 Kickfire published newly audited results based on the Transaction Processing Performance
Council’s TPC-H benchmarks that shocked many of the traditional data warehousing vendors. An unknown
company, Kickfire, had broken the performance record [6] in the non-clustered category and the
price-performance record on the rigorous industry-standard TPC-H 300 GB benchmark, delivering a
record-breaking 54,895 queries per hour on a 300 GB database. Not only did Kickfire set a new performance
record, it did so at an unheard-of cost: the Kickfire appliance that was tested cost less than $50,000,
roughly a quarter the cost of traditional solutions from the database and hardware giants.
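The price/performance figure quoted in the TPC-H footnote can be checked with simple arithmetic. TPC-H expresses price/performance as total system price divided by the QphH result. The short sketch below uses only the two numbers reported in that footnote; the variable names are ours.

```python
# Figures as reported in this paper's TPC-H footnote.
total_system_cost_usd = 48_790  # 3-year total system cost
queries_per_hour = 54_895       # QphH@300GB

# Price/performance: dollars per unit of hourly query throughput.
price_per_qphh = total_system_cost_usd / queries_per_hour
print(f"${price_per_qphh:.2f}/QphH@300GB")  # $0.89/QphH@300GB
```

The result agrees with the $0.89/QphH@300GB quoted in the footnote.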
What does this mean for my business?
Business in the 21st century is driven by the web, and the architectural basis for the new web economy is
the LAMP [7] stack. A key component of LAMP is MySQL, which is emerging as the primary repository of
online information worldwide. As data volumes grow, the ability to rapidly analyze this information breaks
down because MySQL is architected and optimized to support transactional systems. Kickfire is the first
and only analytic appliance for MySQL, enabling businesses that depend on MySQL analytics to:
• Improve profit margins
• Deploy services faster
• Offer new self-service, high-performance information applications
• Consolidate servers and data to reduce cost
• Achieve 10-100X analytic and reporting performance improvements
• Scale operations as data volumes grow
Kickfire customers are active participants in the web economy. While there are many uses of Kickfire for
data analysis, two representative use cases are:
Marketing Analytics
The movement of services, communications and commerce online presents marketers with compelling
opportunities. In the new online world, click-stream data and session history reflect actual customer
behavior at a level of detail never captured before. Both B2B and B2C marketers can replace sampling
techniques based on focus groups, opinion surveys and shopping observers with real data that reflects the
entire population of prospective buyers. Today, customer analytics distinguishes the leaders from the
laggards. Leaders leverage customer analytics to optimize campaigns, dictate contact and advertising
strategies, segment markets and improve the bottom line.
Retailers, marketing service providers, e-commerce companies, mobility service providers, telecommunications
service providers, government organizations and others are working with Kickfire to better understand
their data. For one marketing service provider, Kickfire has transformed hour-long queries, which were already
hand-coded and optimized, into queries that run in less than thirty seconds. With Kickfire, this marketing
service provider can allow more users access to their information and deliver new chargeable services to
customers, confident in the performance of the query system.
Network and Security Management Data Analysis
Today’s network and security management tools monitor devices and generate ever increasing amounts of
as-polled network data. But the systems to analyze this data fall short in both performance and data
scalability. Often businesses are forced to analyze less data or buy multiple systems to scale to the
desired amount of data. By removing these barriers and enabling analysis of historical data for longer
periods of time, managers can more effectively evaluate application availability, plan capacity, determine
appropriate thresholds to deliver users timely and relevant alerts and manage service-level agreements.
Network and security management tools providers and corporations across many industries are working
with Kickfire to deliver greater value from their network data. One network management company has
achieved 600X query performance improvements from Kickfire, enabling them to offer trend analysis on a
full year of data rather than the present maximum of 30 days of data. And because higher performance means less
hardware is required, this same customer will be able to consolidate 5-10 network analysis systems into
one Kickfire appliance when fully deployed.
Summary
In this white paper we have examined the issues and options facing companies that are evaluating their
data warehousing and analytics needs. We have highlighted concerns and issues that many enterprises
have about the adoption of newer data warehousing technologies such as open source databases and
appliances.
The available evidence appears to support Gartner’s prediction that by 2011, at least 80% of commercial
software will contain significant amounts of open source code [8]. Combined with the trend towards appliances
as the lowest cost and most efficient vehicle for data warehousing, the case for an open source and,
specifically, a MySQL-based data warehousing appliance such as Kickfire is compelling.
For data warehousing users who are interested in the cost-saving potential of open source databases but
concerned over performance and scalability, Kickfire presents a “best of all worlds” option. Combined with
MySQL’s low cost-of-ownership and deserved reputation for ease-of-use, Kickfire’s column-based processing,
indexing and compression software and the raw power of the Kickfire SQL chip bring enterprise-class
performance and data warehousing functionality to MySQL.
For MySQL users looking at how to use MySQL for data warehousing and analytics, the case for considering
the Kickfire appliance is an obvious one: Kickfire brings processing power that general purpose Linux
servers simply can’t match and data warehousing-specific software features not available in any other
MySQL storage engine or application.
As the analyst community rightly points out, data warehousing and analytics appliances are now part of the
mainstream. Kickfire recognizes this and is leading the way towards making this technology simple to
implement, easy to use and affordable for companies of all sizes.
[1] Gartner Inc. press release, “Gartner EXP Survey of More Than 1,400 CIOs Shows CIOs Must Create
Leverage to Remain Relevant to the Business,” January 23, 2008.
[2] There are more than 11 million active implementations and over 50,000 downloads per day of
MySQL according to Sun Microsystems/MySQL.
[3] Noel Yuhanna, Market Update: Open Source Databases, Forrester Research, July 2008.
[4] Daniel J. Abadi, Samuel R. Madden, Nabil Hachem, Column-Stores vs. Row-Stores: How Different Are
They Really? In Proceedings of SIGMOD 2008.
[5] James G. Kobielus, Appliance Power: Crunching Data Warehousing Workloads Faster and Cheaper Than Ever,
Forrester Research, April 2008.
[6] As of August 31st, 2008, the Kickfire Database Appliance Series 2400 delivers 54,895 QphH@300GB
(Queries per hour on the TPC-H benchmark), propelling Kickfire to world leadership in query performance
(non-clustered systems) on the 300GB TPC-H benchmark. Kickfire is also number one in
price/performance at $0.89/QphH@300GB USD on the 300GB benchmark. Moreover, Kickfire delivers
this record-breaking performance with a 3 year total system cost of only $48,790 USD. Kickfire’s price
performance metric can be found at http://www.tpc.org/tpch/results/tpch_price_perf_results.asp. The
Kickfire Database Appliance is in beta and will be available October 14, 2008. TPCH, QphH and $/QphH
are trademarks of the TPC. For additional information on the TPCH benchmark, please visit the
Transaction Processing Performance Council's Web site at http://www.tpc.org/.
[7] The LAMP “stack” software bundle is the open source web platform consisting of Linux, Apache, MySQL
and Perl/PHP/Python.
[8] http://www.networkworld.com/news/2007/092007-open-source-unavoidable.html