2. Do you know?
The total volume of electronic data stored is approximately 2 zettabytes
(1 zettabyte = 1 billion TB)
Do you know how many photos Facebook hosts?
About 10 billion photos, nearly 1 PB
Do you know how much data the Internet Archive stores?
Around 2 PB of data, growing at a rate of 20 TB per month
15 million smart meters (US) generate data at a rate of 3 GB per second
Events collected through user interaction on websites are generated at a rate of
1.5 GB per second
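To put the per-second rates above in perspective, they can be converted to daily volumes. The short sketch below does that arithmetic, assuming decimal units (1 TB = 1,000 GB) and a constant rate around the clock. The function name `gb_per_second_to_tb_per_day` is our own, not part of any library.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def gb_per_second_to_tb_per_day(gb_per_second):
    """Convert a steady GB/s rate to TB/day, assuming 1 TB = 1,000 GB."""
    return gb_per_second * SECONDS_PER_DAY / 1000

smart_meters = gb_per_second_to_tb_per_day(3)    # US smart meters
site_events = gb_per_second_to_tb_per_day(1.5)   # user-interaction events

print(f"Smart meters: {smart_meters:.1f} TB/day")
print(f"Site events:  {site_events:.1f} TB/day")
```

Under these assumptions the smart meters would produce roughly 259 TB per day and the site events roughly 130 TB per day.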
3. What is BIG DATA?
Volume
Velocity
Variety
5. Who is using Hadoop?
• Amazon
• Facebook
• Google
• IBM
• Yahoo!
• Last.fm
• New York Times
• PowerSet
• Veoh
6. What makes Hadoop special?
• No high-end or expensive systems are required
  – Built on commodity hardware
• Can run on Linux, Mac OS X, Windows, Solaris
• Fault-tolerant system
  – Job execution continues even if nodes fail
• Highly reliable and efficient storage system
• Built-in intelligence to speed up applications
  – Speculative execution
• Fits a wide range of applications:
  – Web log processing
  – Page indexing, page ranking
  – Complex event processing
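Web log processing, the first application listed above, maps naturally onto the MapReduce model. The following is a minimal single-process sketch in plain Python, not Hadoop code. The sample log lines, their simplified layout, and the function names `map_hit`, `shuffle`, and `reduce_hits` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sample lines: "<client-ip> <method> <path> <status>"
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 GET /index.html 200",
    "10.0.0.3 GET /missing.html 404",
]

def map_hit(line):
    """Map phase: emit (path, 1) for each log line."""
    _ip, _method, path, _status = line.split()
    yield path, 1

def shuffle(pairs):
    """Shuffle phase: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_hits(path, counts):
    """Reduce phase: sum the counts for one path."""
    return path, sum(counts)

mapped = [pair for line in LOG_LINES for pair in map_hit(line)]
hits = dict(reduce_hits(path, counts) for path, counts in shuffle(mapped).items())
print(hits)
```

On a real cluster the map and reduce calls would run in parallel on different nodes, and the framework itself would perform the shuffle between them.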
9. Does Hadoop solve everyone's problem?
• I am a DB guy. I am proficient in writing SQL and try very
hard to optimize my queries, but I am still not able to.
Moreover, I am not a Java geek. Will Hadoop solve my problem?
Use Hive/HBase
• Hadoop is written in Java, and I come purely from a C++
background. How can I use Hadoop for my big data problems?
Use Hadoop Pipes
• I am a statistician and I know only R. How can I write MR
jobs in R?
Use the RHIPE package
• What about Python, Scala, Ruby, etc. programmers?
Does Hadoop support all of these?
Use Hadoop Streaming
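Hadoop Streaming lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. As a rough illustration, here is a word-count sketch in Python. The function names, the `map`/`reduce` command-line convention, and the `run_locally` helper (which only imitates the sort step between the two phases) are assumptions made for this example.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, for testing only."""
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    role = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if role == "map":
        output = mapper(sys.stdin)
    elif role == "reduce":
        output = reducer(sys.stdin)
    else:
        output = run_locally(["Hadoop streaming", "hadoop pipes"])
    for out_line in output:
        print(out_line)
```

With Hadoop Streaming, a script like this would be supplied through the streaming jar's `-mapper` and `-reducer` options, together with `-input` and `-output` paths. The jar location and any further options depend on the installation.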
10. Training Links
Course Details:
http://onlinetraining2011.blogspot.com/2012/12/apache-hadoop-and-aws-mapreduce-training.html
Sample Session:
Hadoop installation lab (3000+ YouTube hits):
http://www.youtube.com/watch?v=i9yckEduQBE
Hadoop HDFS File system Lab:
http://www.youtube.com/watch?v=Pp8SV50S9HM
Case Study:
http://www.linkedin.com/groups/Insurance-Company-Case-Study-Hadoop4838165.S.256068004?qid=19f108a9-f563-4f99-9287-c19a1375ecf4&trk=groups_most_recent-0-bttl&goback=%2Egmr_4838165
LinkedIn group (real-time discussion)
Please join the LinkedIn group for regular updates on my learning and real-world work in Hadoop / Big Data.
http://www.linkedin.com/groups/Online-Hadoop-Training-4838165
11. Course Material
Recordings – All sessions - 40 Hours
Exercises – 30+ Fully solved
Certification questions – 2 sets
Resumes – 2 sets
Online Case Study – Insurance Domain
Virtual Machine – Red Hat OS (Oracle VM VirtualBox Manager)
LinkedIn group discussion – Online Hadoop Learning