Expert IT analyst groups like Wikibon forecast that NoSQL database usage will grow at a compound rate of 60% each year for the next five years, and Gartner says NoSQL databases are one of the top trends impacting information management in 2013. But is NoSQL right for your business? How do you know which business applications will benefit from NoSQL and which won't? What questions do you need to ask in order to make such decisions?
If you're wondering what NoSQL is and whether your business can benefit from NoSQL technology, join DataStax for the webinar "How to Tell if Your Business Needs NoSQL". This to-the-point presentation provides practical litmus tests to help you understand whether NoSQL is right for your use case, and supplies examples of NoSQL technology in action at leading businesses that demonstrate how and where NoSQL databases can have the greatest impact.
Speaker: Robin Schumacher, Vice President of Products at DataStax
Robin Schumacher has spent the last 20 years working with databases and big data. He comes to DataStax from EnterpriseDB, where he built and led a market-driven product management group. Previously, Robin started and led the product management team at MySQL for three years before the company was acquired by Sun (the largest open source acquisition in history), and then by Oracle. He also started and led the product management team at Embarcadero Technologies, which was the #1 IPO in 2000. Robin is the author of three database performance books and a frequent speaker at industry events. Robin holds BS, MA, and Ph.D. degrees from various universities.
1. How to Tell if Your Business Needs NoSQL
Robin Schumacher
VP Products
2. Overview of DataStax
• Founded in April 2010
• The Apache Cassandra™ company
• Home to Apache Cassandra Chair & most committers
• Cassandra is a massively scalable NoSQL database
• Provide enterprise-class big data platform based on Cassandra
• 270+ customers
• Headquartered in San Francisco Bay area
• Funded by prominent venture firms
4. Leading in Performance
Netflix Cloud Benchmark…
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
“In terms of scalability, there is a clear winner throughout
our experiments. Cassandra achieves the highest
throughput for the maximum number of nodes in all
experiments with a linear increasing throughput.”
Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2013, p. 10. Benchmark paper presented at the Very Large Database Conference, 2013.
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf
End Point Independent NoSQL Benchmark
Highest in throughput…
Lowest in latency…
5. NoSQL Momentum
“According to analysis by Wikibon’s David Floyer (and highlighted in the Wall Street Journal), the NoSQL database market is expected to grow at a compound annual growth rate of nearly 60% between 2011 and 2017. The SQL slice of the Big Data market, in contrast, will grow at just a 26% CAGR during that same time period.”
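To put those growth rates in perspective, the compounding arithmetic can be sketched as follows. This is only an illustration: it assumes exactly 60% and 26% annual growth and six compounding periods (2011 to 2017), and the function name `growth_multiple` is hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a market is after compounding annually."""
    return (1.0 + cagr) ** years


# Assumption: 2011 -> 2017 is treated as six full compounding periods.
YEARS = 2017 - 2011

nosql_multiple = growth_multiple(0.60, YEARS)  # "nearly 60%" rounded to 60%
sql_multiple = growth_multiple(0.26, YEARS)

print(f"NoSQL market multiple: {nosql_multiple:.1f}x")
print(f"SQL market multiple:   {sql_multiple:.1f}x")
```

Under these assumptions the NoSQL market would end the period roughly 17 times its starting size, against roughly 4 times for the SQL slice.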
7. But Does My Business Need NoSQL…?
Just because a technology is seeing strong adoption in the market doesn’t mean it’s right for your business…
8. What is NoSQL…?
• Progressive data management engines
• Go beyond legacy relational databases
• Flexible data model
• Horizontal scalability
• Distributed architectures
• Use of languages and interfaces that are “not only” SQL
9. NoSQL Example – Apache Cassandra
Apache Cassandra is a massively scalable NoSQL database that
offers continuous availability and easy data distribution.
10. NoSQL Example – Apache Cassandra
“Cassandra stands at the front of the NoSQL pack when it
comes to supporting real-time, big data applications.”
– Wikibon
12. NoSQL Business Considerations
• Need scale-out (vs. scale-up)?
• Manage different types of data like social media?
• Lots of data coming in (and fast)?
• Have non-RDBMS, non-ACID transactions?
• Must keep large data volumes online?
• Continuous uptime necessary?
• Wide-scale data distribution needed?
• Need to integrate different systems?
• Cost a factor?
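The checklist above can be treated as a set of yes/no litmus tests. The sketch below is one possible way to record the answers for a workload; the field names, the `Workload` class, and the threshold of three "yes" answers are all illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Workload:
    """One boolean per question on the considerations slide (names are hypothetical)."""
    needs_scale_out: bool = False
    mixed_data_types: bool = False
    high_velocity_writes: bool = False
    non_acid_transactions_ok: bool = False
    large_volumes_online: bool = False
    continuous_uptime: bool = False
    wide_distribution: bool = False
    integrates_systems: bool = False
    cost_sensitive: bool = False


def yes_answers(workload: Workload) -> List[str]:
    """List the considerations answered 'yes', in slide order."""
    return [f.name for f in fields(workload) if getattr(workload, f.name)]


def suggests_nosql(workload: Workload, threshold: int = 3) -> bool:
    """Arbitrary rule of thumb: enough 'yes' answers make NoSQL worth evaluating."""
    return len(yes_answers(workload)) >= threshold


example = Workload(needs_scale_out=True, high_velocity_writes=True, continuous_uptime=True)
print(yes_answers(example))
print(suggests_nosql(example))
```

A single decisive requirement (for example, continuous uptime) may matter more than the count, so the threshold should be read as a conversation starter rather than a rule.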
13. Need Scale-Out (vs. Scale-Up)?
No
• Application does not require multiple machines
• Can scale-up and meet the application’s current and future needs
Yes
• Application demands divide-and-conquer
• Capacity expansion is best/can only be handled via new machines
Key takeaway: If your applications can easily run on one machine, fit all your
data in RAM or can easily expand via new cores/more drives to fulfill current
and future requirements, you may not need NoSQL…
14. NoSQL Case Study
Ooyala distributes and analyzes video content for companies like
ESPN, Rolling Stone and others. They track about one quarter of all
online video viewers each day and generate 1-2 billion events that
stream in real time through their system.
15. Manage Different Types of Data?
No
• No unstructured data (all or mostly rigid formats)
• E.g., no social media data
Yes
• All types of data (structured, semi, and unstructured)
• Social media data
Key takeaway: If all your data systems deal with standard RDBMS structured
data and that won’t be changing, then you may not need NoSQL…
16. NoSQL Case Study
HealthCare Anytime needs to analyze doctors’ notes and other types
of difficult data to properly bill back Medicare / Medicaid.
17. NoSQL Case Study
“Cassandra’s NoSQL data model allows us to insert and query data much more
naturally than what we had previously. The analysts who routinely use this data were
impressed with the flexibility and speed at which the queries came back.”
– CSC/NASA
18. Lots of Data Coming In (and Fast)?
No
• No high velocity data (e.g. device, sensors, web streaming, etc.)
• No multiple locations
• Little/no concern about write speed
Yes
• High velocity, write intensive
• Multiple locations sending data
• Must consume data as quickly as possible
Key takeaway: Business applications involving rapid time series data, device
“exhaust”, web or financial streaming data make good use cases for
NoSQL…
19. NoSQL Case Study
Gnip takes in huge volumes of social media data at high rates of
speed (e.g. 20,000 Tweets per second).
20. Non-RDBMS, Non-ACID transactions?
No
• Standard RDBMS, nested, ACID transactions required
• Complex transactions requiring rollbacks, savepoints, etc.
Yes
• “Big Data” transactions OK or are necessary
• Atomic, Isolated, Durable (AID), but eventual or tunable consistency
allowed
Key takeaway: NoSQL databases do transactions, but since they don’t
support joins or foreign keys, consistency follows the trade-offs of the CAP
theorem rather than RDBMS ACID-style consistency…
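Tunable consistency is often explained with the general quorum-overlap rule: if the number of replicas that must acknowledge a write (W) plus the number consulted on a read (R) exceeds the replication factor (N), every read overlaps the latest write. The sketch below illustrates that rule only; the function names are hypothetical and it does not model any particular database's behavior.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(n: int, w: int, r: int) -> bool:
    """Quorum-overlap rule: reads see the latest write when R + W > N."""
    return r + w > n


N = 3  # assumed replication factor for the example
q = quorum(N)

# Quorum writes + quorum reads: strong consistency for that operation pair.
print(read_overlaps_latest_write(N, w=q, r=q))

# Single-replica writes and reads: faster, but only eventually consistent.
print(read_overlaps_latest_write(N, w=1, r=1))
```

The point of "tunable" is that W and R can be chosen per operation, trading latency against the consistency guarantee.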
21. NoSQL Case Study
eBay does transactions, but does not want the overhead of RDBMS
ACID-type transactions.
23. Must Keep Large Data Volumes Online?
No
• No application requirement to keep large volumes of data
• System typically purges data older than certain time period
Yes
• Must keep large volumes of data online and available to customers
• Retain both hot and cold data
Key takeaway: Some NoSQL databases like Cassandra can excel over
typical RDBMSs when it comes to maintaining large volumes of data online
and meeting stringent performance SLAs…
24. NoSQL Case Study
Easou is the #1 mobile search firm in China. One of their Cassandra
applications stores online video images for retrieval / viewing and is
300TB in size.
25. Continuous Uptime Necessary?
No
• Applications have no need for constant uptime
• Unplanned downtime can be handled via traditional failover
Yes
• Applications cannot tolerate any downtime
• Standard log shipping, failover, and hot backups won’t do
Key takeaway: Some NoSQL databases like Cassandra are able to
guarantee no downtime because of their architectures…
26. NoSQL Case Study
Netflix systems are run in the cloud across multiple availability zones
with Cassandra and sport constant uptime.
27. NoSQL Case Study
Commenting on the Amazon outage in Oct 2012: “We configure all our clusters
to use a replication factor of three, with each replica located in a different
Availability Zone. This allowed Cassandra to handle the outage remarkably
well. When a single zone became unavailable, we didn't need to do
anything. Cassandra routed requests around the unavailable zone and when
it recovered, the ring was repaired.”
– Netflix Tech Blog
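The setup described in the quote (replication factor of three, one replica per Availability Zone) can be illustrated with a toy model. The zone names, the helper functions, and the requirement of two live replicas per request are assumptions made for this sketch; the quote does not state which consistency level was used.

```python
from typing import List, Set

ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical Availability Zone names


def place_replicas(zones: List[str], replication_factor: int) -> List[str]:
    """Spread replicas round-robin so each zone holds at most its fair share."""
    return [zones[i % len(zones)] for i in range(replication_factor)]


def live_replicas(placement: List[str], down_zones: Set[str]) -> List[str]:
    """Replicas whose zone is still reachable."""
    return [zone for zone in placement if zone not in down_zones]


def can_serve(placement: List[str], down_zones: Set[str], required: int) -> bool:
    """A request succeeds if enough replicas are still reachable."""
    return len(live_replicas(placement, down_zones)) >= required


placement = place_replicas(ZONES, replication_factor=3)

# One zone lost: two replicas remain, so requests needing two replicas succeed.
print(can_serve(placement, {"zone-b"}, required=2))

# Two zones lost: only one replica remains, so the same requests would fail.
print(can_serve(placement, {"zone-a", "zone-b"}, required=2))
```

In this model a single-zone outage needs no operator action because the remaining replicas still satisfy the request, which matches the behavior the quote reports.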
28. Wide-Scale Data Distribution Needed?
No
• Application’s data needs are single site only
• No need to distribute data in other locales for any reason
Yes
• Application serves customers in multiple locations
• Data is distributed across multiple data centers / cloud zones for
latency/performance or disaster recovery reasons
Key takeaway: Cassandra is the gold standard among NoSQL databases for
multi-data center, data distribution use cases…
29. NoSQL Case Study
RightScale keeps its customers in contact with each other all over the
world via Cassandra clusters in 5+ global data centers.
30. Need to Integrate Different Systems?
No
• Applications use siloed databases
• No need for different data systems to interact with each other
Yes
• Application has different database workloads
• Multiple data domains serve single application
Key takeaway: ETL and simple connectors oftentimes do not do the job.
Instead, what’s needed is something like DataStax Enterprise, which
provides one database that serves multiple database workloads…
31. NoSQL Case Study
Datafiniti, which is a search engine for data, needs to consume lots
of data in real time and provide fast search on top of the same data.
32. Cost a Factor?
No
• Application is small and not cost intensive to operate
• Software license costs not a factor
Yes
• Large scale business applications
• Traditional RDBMS software costs a significant concern
Key takeaway: NoSQL database costs can oftentimes be 70-80% less than
legacy RDBMS software. Further, large operations staffs are not required to
manage NoSQL systems.
33. NoSQL Case Study
Constant Contact found that scaling out with NoSQL vs. an RDBMS
saved them 90% in software costs, and was implemented in 1/3 the
time...
35. NoSQL Implementation Strategies
New
• New big data applications
• Legacy systems keep old databases
Hybrid
• NoSQL database used for heavy lifting / big data management
• Legacy RDBMS maintains smaller parts of database
Replacement
• Legacy RDBMS cannot meet demands of new or evolving big data system
• Data models and data are migrated
36. DataStax Enterprise – NoSQL for the Enterprise
DataStax Enterprise is a complete big data platform, built on Cassandra, that
is architected to manage real-time, analytic, and enterprise search data all
in the same database cluster.
37. What You Get With DataStax Enterprise
1. DataStax Enterprise Database Server
2. OpsCenter Enterprise Management solution
3. Expert 24x7 support
38. Use Cases Handled By DataStax Enterprise
Managed by Cassandra
• Time series data
• Device/Sensor/Data “exhaust” systems
• Distributed applications
• Media streaming
• Online Web retail (transactional, shopping carts, etc.)
• Real-time data analytics
• Social media capture and analysis
• Web click-stream analysis
• Write-intensive transactional systems
Managed by Hadoop
• Buyer behavior analytics
• Compliance/regulatory analysis
• Customer recommendation output
• Fraud detection
• Risk analysis
• Sales program campaign analysis
• Supply chain analytics
• Batch Web clickstream analysis
Managed by Solr
• General Web search
• Web retail faceted (categorization) search
• Search/hit prioritization and highlighting
• Application log search and analysis
• Document (PDF, MS Word, etc.) search and analysis
• Geospatial search
• Real estate location and property search
• Social media match ups
39. Next Steps
Download DataStax Enterprise and try it in your own environment.
• Go to www.datastax.com/download
• Download a copy of DataStax Enterprise
• Installs and configures in minutes
• Completely free for development use; subscription required for production deployments