Hadoop Summit Keynote 2014
1. Waiting for Hadoop
(Apologies to Samuel Beckett)
© 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written
permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from
sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This
publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may
include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms
and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research
organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."
Merv Adrian
Research Vice President, Information Management
@merv
Blogs.gartner.com/merv-adrian
2. What Is "Big Data"?
"Big data" is high-volume, high-velocity and high-variety
information assets that demand cost-effective,
innovative forms of information processing
for enhanced insight and decision making.
Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
3. Let's go. We can't.
Waiting for Hadoop…
4. Why not? Let's wait till we know exactly how we stand.
Waiting for Hadoop…
6. Investments Are on the Rise, And Deployments Are Beginning
Response                               2012 (N = 473)   2013 (N = 720)
Have invested in big data technology   27%              30%
Plan to within next year               15%              19%
Plan to within 2 years                 16%              15%
No plans at this time                  31%              31%
Don't know                             11%              5%
Investing or Planning                  58%              64%
Source: Gartner Research Circle Surveys, 2012, 2013
7. But They Know the Leading Opportunities
[Bar chart comparing 2013 and 2014 responses on a 0% to 35% scale; categories:]
Monetizing Data (Directly/Indirectly)
Marketing & Sales Growth
New Products & Services
Innovation
Risk & Fraud Detection
Operational & Financial Performance
8. I wouldn't even know him if I saw him.
Who is he?
9. So We Search… on Gartner.com, 2nd Highest Term
[Chart: monthly Gartner.com searches for "Big Data + Hadoop" and "Magic Quadrant", January to March, on a 0 to 1600 scale]
Over 1000 searches per month
10. Starting With What You Need to Do,
We See Pieces of a Solution…
Analyze
Compute
Persist
Ingest
Monitor, Administer
Describe
12. It's the start that's difficult.
You can start from anything.
13. Yes, but you have to decide.
19. The Complexity of Stack Composition Is Rising
Ingest/Propagate
Describe, Develop
Compute, Search
Persist
Monitor, Administer
Analytics, Machine Learning
20. And Usage Moves - From Pilot to Production
[Pie chart with slices of 10%, 15%, 57%, 14% and 4%; categories:]
Piloting on premise
Piloting in the cloud
Production on premise with cluster
Production on premise with appliance
Production in the cloud
Source: Gartner Webinar n=127
21. And “Production” Means Growth
[Pie chart with slices of 1%, 15%, 4%, 18% and 62%; categories:]
None – haven’t started yet
Fewer than 10 nodes
Between 11 and 50 nodes
Between 51 and 100 nodes
Over 100 nodes
Source: Gartner Webinar, April 2014 n=145
22. What is Your Secondary Processing Mode for Hadoop?
[Pie chart with slices of 18%, 14%, 53%, 6% and 9%; categories:]
Stream processing
Interactive analytics
Graph applications
Database Management Systems
Search
Source: Gartner Webinar, April 2014 n=120
23. Then all we have to do is wait on here.
It’s not certain.
24. No, nothing is certain...
25. So, After Batch, What’s Next?
26. YARN Changes the Game – It All Starts Here
[Diagram: processing engines running on YARN, with HDFS underneath]
Engines: Batch; In-Memory; Search; SQL/Interactive; Streaming and events; DBMSs (Graph, others)
YARN: Cluster Resource Management
HDFS: Distributed Storage
27. SQL-on-Hadoop Is The Most Typical Addition in 2014
Which SQL-on-Hadoop Approach are You MOST Likely to Use in 2014?
[Pie chart with slices of 9%, 27%, 23%, 9% and 32%; categories:]
Creating your own SQL queries via Hive
Using a distribution-specific SQL solution (e.g., Cloudera Impala, Pivotal HAWQ)
Using interfaces to HDFS/HBase from analytics tool providers (e.g., Cognos, SAS, Tableau)
Using Hadoop BI specialists (e.g., Platfora, Datameer)
Getting to HDFS/HBase data from your DBMS’ external table capability (e.g., Kognitio HDFS Connector, Teradata SQL-H)
Source: Gartner Webinars 2014 n=164
28. HBase Is The Default “Hadoop Database,” But Not Alone
• In every distribution
• Not just the Valleybase anymore: Bloomberg, Nielsen, others adopt
• Becoming more secure: cell level is coming
• But there are alternatives:
- NoSQL (Accumulo, Apache Cassandra, MongoDB...)
- RDBMS on cluster and off
29. Let’s go.
We can’t.
30. Why not?
We’re waiting for [Hadoop].
31. Spark Powers Machine Learning,
Other Iterative Uses in-Memory
Spark
• Unifies batch, streaming, interactive computation
• Easy to build sophisticated applications
» Supports iterative, graph-parallel algorithms
» Powerful APIs in Scala, Python, Java
• In-memory execution engine (richer alternative to MapReduce) for multiple reuse of data to support
• Iterative algorithms (machine learning, graphs)
• Interactive data mining
• Directed acyclic graphs, function pipelining, partition aware (minimize shuffle)
• Used with HDFS, HBase
• Streaming applications
[Diagram: Spark with Spark Streaming, Shark SQL, BlinkDB, GraphX and MLlib; workload labels: Streaming; Batch, Interactive; Interactive; Data-parallel, Iterative; Sophisticated algos.]
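To see why "multiple reuse of data" matters for iterative algorithms, here is a toy illustration in plain Python. It does not use the Spark API: `CountingSource`, `fit_slope`, `fit_rereading` and `fit_cached` are hypothetical names, the storage read is simulated, and the gradient-descent fit is only a stand-in for an iterative workload.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


class CountingSource:
    """Simulated storage (assumption): counts full reads of the dataset."""

    def __init__(self, points: List[Point]) -> None:
        self._points = points
        self.reads = 0

    def load_points(self) -> List[Point]:
        self.reads += 1
        return list(self._points)


def fit_slope(get_points: Callable[[], List[Point]],
              iterations: int = 50, lr: float = 0.05) -> float:
    """Gradient descent for y = w * x; asks for the data on every pass."""
    w = 0.0
    for _ in range(iterations):
        points = get_points()
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


def fit_rereading(source: CountingSource, iterations: int = 50) -> float:
    # Every iteration goes back to storage, as a chain of
    # independent batch jobs would.
    return fit_slope(source.load_points, iterations)


def fit_cached(source: CountingSource, iterations: int = 50) -> float:
    # Read once, keep the data in memory, reuse it on every iteration.
    cached = source.load_points()
    return fit_slope(lambda: cached, iterations)
```

Both functions compute the same slope; the difference is that `fit_rereading` touches the simulated storage once per iteration, while `fit_cached` touches it once in total.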
32. Storm: Do-It-Yourself Stream Processing
• Storm processes streams
• Spouts emit tuples: k/v tuples representing events
• Bolts consume tuples and pass them through rest of topology
• Logic & topology is up to you
• Apache: Incubating
[Diagram: two spouts feeding a topology of six bolts]
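The spout-and-bolt model can be sketched in plain Python. This is a single-process toy and does not use the Storm API; `sensor_spout`, `parse_bolt`, `threshold_bolt`, `count_bolt`, `run_topology` and the event fields are hypothetical names chosen for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

RawEvent = Tuple[str, str]       # (key, value) tuple emitted by the spout
ParsedEvent = Tuple[str, float]  # same tuple after the value is parsed


def sensor_spout(raw_events: Iterable[RawEvent]) -> Iterator[RawEvent]:
    """Spout: emits one key/value tuple per event."""
    for key, value in raw_events:
        yield (key, value)


def parse_bolt(stream: Iterable[RawEvent]) -> Iterator[ParsedEvent]:
    """Bolt: consumes tuples, parses the value, passes the tuple on."""
    for key, value in stream:
        yield (key, float(value))


def threshold_bolt(stream: Iterable[ParsedEvent],
                   limit: float) -> Iterator[ParsedEvent]:
    """Bolt: forwards only tuples whose value exceeds the limit."""
    for key, value in stream:
        if value > limit:
            yield (key, value)


def count_bolt(stream: Iterable[ParsedEvent]) -> Dict[str, int]:
    """Terminal bolt: counts the tuples it receives, per key."""
    counts: Dict[str, int] = {}
    for key, _ in stream:
        counts[key] = counts.get(key, 0) + 1
    return counts


def run_topology(raw_events: Iterable[RawEvent],
                 limit: float = 30.0) -> Dict[str, int]:
    # The wiring is up to the developer: spout -> parse -> threshold -> count.
    return count_bolt(
        threshold_bolt(parse_bolt(sensor_spout(raw_events)), limit)
    )
```

For example, `run_topology([("a", "10"), ("a", "35.5"), ("b", "40")])` counts only the two events above the limit. A real topology would distribute spouts and bolts across a cluster; here the chain of generators only mirrors the data flow.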
33. Tackling the Limitations of Search
Finding Stuff
Shifting Schemas
On-the-fly Aggregations
Distributed Computing
• Iterating over a large number of results
• Doing calculations on field values for lots of documents
• Joining values from multiple indexes
• Does not do complex analytic chains well
• You must precalculate answers to facilitate responsiveness
• If new data changes stored answers, you must reindex
• Indexes are HUGE
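The precalculation and reindexing points can be made concrete with a small plain-Python sketch. It models no particular search engine; `PrecomputedIndex` and its methods are hypothetical, and the documents are reduced to (category, amount) pairs for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Doc = Tuple[str, float]  # (category, amount)


class PrecomputedIndex:
    """Keeps per-category totals that are computed ahead of query time."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = list(docs)
        self._totals: Dict[str, float] = {}
        self.stale = False
        self.reindex()

    def reindex(self) -> None:
        # Full rebuild: scans every stored document again.
        totals: Dict[str, float] = defaultdict(float)
        for category, amount in self._docs:
            totals[category] += amount
        self._totals = dict(totals)
        self.stale = False

    def add(self, doc: Doc) -> None:
        # New data makes the stored answers stale until the next reindex.
        self._docs.append(doc)
        self.stale = True

    def total(self, category: str) -> float:
        # Fast lookup, but only as fresh as the last reindex.
        return self._totals.get(category, 0.0)
```

Queries stay fast because `total` is a dictionary lookup, but after `add` the stored answer is out of date until `reindex` rescans everything, which is the trade-off the bullets describe.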
34. Hadoop to the Rescue? Maybe…
• Scalable, reliable, fault-tolerant data processing
• Very good for batch processing of lots of data
• Can do very complex analysis
• Can work on data from multiple records at once
• But it’s hands-on. Much assembly required.
35. ...we'll come back tomorrow. And then the day after tomorrow. And so on.
He should be here. And if he doesn't come?
36. So Now We Wait…
For What’s Next. But First…
37. Securing HDFS – There’s No DBMS There
[Diagram: security controls surrounding HDFS data; labels:]
Supported Distribution
Access Restriction (Physical and Logical)
Configuration & Vulnerability Management
Identity & Access Management
Network traffic encryption
Audit & Protection
Data masking
Tokenization, encryption
Data Protection
Monitoring For Sensitive Data
Data Anonymization
Admin. Privilege Management
Change Management
Log Management
Operations Hygiene
HDFS Data
38. Data lake… …or reservoir?
39. "Big Data Replacing the Data Warehouse?"
Not a Relevant Notion. It Joins the Warehouse.
Data warehouses are collections of data — not technology platforms.
A data warehouse can be made out of anything that manages data.
The key point is that when we find value, it is indeed managed.
What Are Organizations Planning for Their DWs?
Source: MQ DW DBMS Survey, Nov. 2013 and Nov. 2012
40. A reservoir contains water that is…
Managed
Transformed
Filtered
Secured (somewhat)
Portable
Potable (fit for consumption)
41. And it’s not over.
Apparently not.
43. The Journey from Pilot… to Production… to Platform Begins here.
Thank you!
http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/