Past, Present and Future of Data Processing in Apache Hadoop
1. Data Processing with Hadoop
Looking Back, Looking Ahead
Arun C. Murthy
Founder & Architect
@acmurthy (@hortonworks)
2. Hello!
• Founder/Architect at Hortonworks Inc.
– Lead - MapReduce/YARN/Tez
– Formerly, Architect, Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k-node footprint)
• Apache Hadoop, ASF
– Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
– Long-term Committer/PMC member (full time for 7 years)
– Release Manager for hadoop-2.x
© Hortonworks Inc. 2013 Page 2
3. Once upon a time …
… long, long ago, there was a kingdom we shall call
Apache Hadoop
http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
4. Hadoop begat …
… a two-headed monster on every node in the kingdom;
each belonged to a different clan and answered to a
different master
http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
5. Knights of Bytes - HDFS
… stored data uncompromisingly in directories/files, with nary a
care about their contents
http://whoiscraigmoser.com/Images/identity/knight.png
6. Prince of Processing - MapReduce
He ruled with an iron fist by mapping,
and then by mercilessly reducing data
http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
7. Peace Reigned
… for a while with the odd change in the direction of the wind
http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
8. Slowly, but surely …
Human beings define reality through misery and suffering.
- Agent Smith
http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
10. Slowly, but surely …
… people of the kingdom clamored for more.
A palpable sense of greed & expectation.
http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
11. Signs of Distress
SQL said some, others said Machine Learning,
still others said Real-Time Event Processing
http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
12. A Meeting at the Summit
MapReduce is dead!
Err… not quite.
We need more options! We need more!
True…
http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
13. A Meeting at the Summit
A common thread, YARN, running through all applications…
Long live the King!
http://whipup.net/wp-content/images/2008/08/yarn.gif
14. The Edict
Henceforth, in the Kingdom of King YARN…
MapReduce has been relegated to the status
of, merely, one of the applications!
http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
15. Reign of King YARN
King YARN came to the throne with promises to return power to all applications equally, and to lower the taxes on performance and resource management…
http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
16. Oh, the Shame!
Well, at least Prince MapReduce still had powerful allies like Highness Hive, Powerful Pig and Cheery Cascading…
http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
17. Things get worse before they get better
Unfortunately, things got a lot worse for Prince MapReduce…
http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
18. Knight Tez
He did MapReduce, and so much more…
Smartly aligned himself with Kingdom YARN.
http://twomorrows.com/alterego/media/08shiningknight.gif
19. Knight Tez
Long-term alliances of MapReduce with Hive, Pig, Cascading, etc. broke up…
… they decided to throw in their lot with Knight Tez!
http://informatica.upg-ploiesti.ro/62689/img/partners.jpg
http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png
21. On a more serious note…
22. Every season has a flavor…
SQL-on-Hadoop is the new black!
SQL-on-Hadoop will be solved within
the existing ecosystem
23. Looking ahead
What will it be next year?
Real-time event processing?
Machine Learning?
24. Play to our strengths
Invest in the Apache Hadoop platform
and the ecosystem (Hive et al).
27. Hadoop MapReduce – The Paradigm
[Figure: map tasks m0 to m4 feeding reduce tasks r0, r1 and r2]
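The paradigm in the figure, where each map task's output is partitioned across a fixed set of reduce tasks, can be sketched in a single Python process. This is a minimal toy, assuming a word-count job, five input splits to mirror m0 to m4, and three reducers to mirror r0 to r2; `map_fn`, `reduce_fn`, `partition` and `run_job` are hypothetical names, not Hadoop's Java API.

```python
from collections import defaultdict

# Toy, single-process sketch of the MapReduce paradigm.
# Assumption: a word-count job; NUM_REDUCERS mirrors r0..r2 in the figure.
NUM_REDUCERS = 3


def map_fn(line):
    """Map: emit (word, 1) for every word in an input record."""
    for word in line.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: sum all counts emitted for one key."""
    return key, sum(values)


def partition(key, num_reducers):
    """Pick a reducer for a key; a stand-in for hash partitioning.

    Python's built-in str hash is randomized per process, so this uses
    a deterministic byte sum instead.
    """
    return sum(key.encode()) % num_reducers


def run_job(splits):
    # Shuffle buffers: one {key: [values]} dict per reducer.
    shuffled = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for split in splits:                      # one map task per split (m0..m4)
        for line in split:
            for key, value in map_fn(line):
                shuffled[partition(key, NUM_REDUCERS)][key].append(value)
    results = {}
    for bucket in shuffled:                   # one reduce task per partition
        for key in sorted(bucket):            # each reducer sees its keys sorted
            k, v = reduce_fn(key, bucket[key])
            results[k] = v
    return results


splits = [["a b"], ["b c"], ["c a"], ["a"], ["d"]]  # five splits -> m0..m4
print(run_job(splits))
```

Because any map task may emit keys destined for any reducer, a reducer cannot produce its final answer until every map task has finished; that all-to-all shuffle is the shape the figure draws.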
28. Hadoop YARN
[Figure: YARN architecture showing Clients, a Resource Manager, Node Managers, App Masters (App Mstr) and Containers, with arrows labeled Job Submission, Node Status, Resource Request and MapReduce Status]
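The control flow in the YARN diagram (NodeManagers reporting node status, an application master sending resource requests to the ResourceManager and receiving containers) can be illustrated with a toy in-memory model. Everything below is a hypothetical sketch: the class names, the first-fit placement and the memory-only resource model are simplifying assumptions, not YARN's actual protocols or schedulers.

```python
from dataclasses import dataclass, field

# Toy model of the YARN control flow. All names are hypothetical; the real
# ResourceManager/ApplicationMaster interaction is a Java RPC protocol.


@dataclass
class NodeManager:
    name: str
    free_mb: int  # capacity the node reports via "Node Status"


@dataclass
class Container:
    node: str
    memory_mb: int


@dataclass
class ResourceManager:
    nodes: list = field(default_factory=list)

    def node_status(self, nm: NodeManager) -> None:
        """Node Status: a NodeManager registers and reports its free memory."""
        self.nodes.append(nm)

    def allocate(self, memory_mb: int, count: int) -> list:
        """Resource Request: grant up to `count` containers of `memory_mb` each."""
        granted = []
        for _ in range(count):
            # First-fit placement (an assumption; real schedulers are pluggable).
            node = next((n for n in self.nodes if n.free_mb >= memory_mb), None)
            if node is None:
                break  # cluster is full; the request stays partly unmet
            node.free_mb -= memory_mb
            granted.append(Container(node.name, memory_mb))
        return granted


rm = ResourceManager()
rm.node_status(NodeManager("node1", 4096))
rm.node_status(NodeManager("node2", 2048))

# An application master asks for eight 1 GB containers for its tasks.
containers = rm.allocate(memory_mb=1024, count=8)
print([c.node for c in containers])
```

With 6 GB of free memory in total, only six of the eight requested containers fit; in this toy the unmet part of the request is simply dropped, whereas a real application master would keep asking on later heartbeats.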
29. Tez - Core Ideas
[Figure: a Task composed of an Input, a Processor and an Output]
Tez Task - <Input, Processor, Output>
YARN ApplicationMaster to run a DAG of Tasks
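The two core ideas above, a task built from an Input, a Processor and an Output, and a single ApplicationMaster driving a whole DAG of such tasks, can be sketched as follows. This is a toy assuming in-memory lists as the channel between tasks; `Task` and `run_dag` are hypothetical names, not the Tez API. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to order the tasks.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Toy model: each Task is <Input, Processor, Output>, and run_dag plays the
# role of the ApplicationMaster that runs the whole DAG. Names are hypothetical.


class Task:
    def __init__(self, name, processor, inputs=()):
        self.name = name
        self.processor = processor  # function: list of input records -> output records
        self.inputs = list(inputs)  # upstream tasks whose Output feeds this Input


def run_dag(tasks):
    """Run tasks in dependency order, wiring each Output to downstream Inputs."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.inputs) for t in tasks}  # node -> its predecessors
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        # Input: concatenate the outputs of all upstream tasks, in declared order.
        records = [r for up in task.inputs for r in outputs[up]]
        outputs[name] = list(task.processor(records))  # Processor -> Output
    return outputs


dag = [
    Task("read_a", lambda _: [1, 2, 3]),
    Task("read_b", lambda _: [10, 20]),
    Task("combine", lambda rs: rs, inputs=["read_a", "read_b"]),
    Task("total", lambda rs: [sum(rs)], inputs=["combine"]),
]
print(run_dag(dag)["total"])  # [36]
```

Keeping Input, Processor and Output separate means the same Processor can be reused with different data channels, which is the flexibility the slide attributes to the Tez task model.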
30. Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*)
FROM a JOIN b ON (a.id = b.id)
GROUP BY a.state
[Figure: Pig/Hive - MR (I/O synchronization barrier) versus Pig/Hive - Tez (I/O pipelining)]
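The difference between the barrier and pipelining can be illustrated with the slide's query over tiny in-memory tables. This is a sketch under stated assumptions: the rows are made up, a Python list stands in for output materialized to HDFS between MR jobs, and a generator stands in for a pipelined Tez edge. Both plans return the same answer; they differ only in when intermediate rows become available downstream.

```python
from collections import Counter

# Toy contrast for:
#   SELECT a.state, COUNT(*) FROM a JOIN b ON (a.id = b.id) GROUP BY a.state
# Tables are made-up sample data.
a = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NY"}, {"id": 3, "state": "CA"}]
b = [{"id": 1}, {"id": 3}, {"id": 3}, {"id": 2}]


def join(left, right):
    """Hash join on id: count matching `right` rows per id, then probe with `left`."""
    index = Counter(row["id"] for row in right)  # id -> number of matching rows
    for row in left:
        for _ in range(index[row["id"]]):
            yield row  # only `left` columns are needed for this query


def group_count(rows):
    return dict(Counter(row["state"] for row in rows))


def run_mr_style():
    # Job 1 must finish and materialize its whole output (the barrier) ...
    materialized = list(join(a, b))
    # ... before Job 2 can start reading it.
    return group_count(materialized)


def run_tez_style():
    # Joined rows stream straight into the aggregation; nothing is materialized.
    return group_count(join(a, b))


print(run_mr_style(), run_tez_style())
```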
31. Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
[Figure: Pig/Hive - MR runs the query as Job 1 to Job 4 with I/O synchronization barriers between them, versus Pig/Hive - Tez running it as a single job]
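The slide's three-way query can likewise be evaluated as one chain of streaming operators, echoing the single Tez job in the figure. This is a toy sketch: the sample rows are made up, and `hash_join` and `group_by_state` are hypothetical helpers, not Hive or Tez internals.

```python
from collections import defaultdict

# Toy, single-pass evaluation of:
#   SELECT a.state, COUNT(*), AVG(c.price)
#   FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId)
#   GROUP BY a.state
# Sample rows are made up.
a = [{"id": 1, "itemId": 100, "state": "CA"},
     {"id": 2, "itemId": 200, "state": "NY"},
     {"id": 3, "itemId": 100, "state": "CA"}]
b = [{"id": 1}, {"id": 2}, {"id": 3}]
c = [{"itemId": 100, "price": 4.0}, {"itemId": 200, "price": 10.0}]


def hash_join(left, right, key):
    """Inner equi-join: index `right` by key, then stream `left` through it."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**match, **row}  # merged row; `left` columns win on clashes


def group_by_state(rows):
    """Aggregate COUNT(*) and AVG(price) per state."""
    counts, sums = defaultdict(int), defaultdict(float)
    for row in rows:
        counts[row["state"]] += 1
        sums[row["state"]] += row["price"]
    return {s: (counts[s], sums[s] / counts[s]) for s in counts}


# Both joins and the aggregation form one pipeline of generators.
pipeline = hash_join(hash_join(a, b, "id"), c, "itemId")
print(group_by_state(pipeline))
```

Each join hands rows to the next operator as soon as they match, so neither the a/b join result nor the fully joined rows are written out before the aggregation consumes them.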