This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data.
This course is available on hadoop-skills.com for free!
This course builds a basic, fundamental understanding of Big Data problems and of Hadoop as a solution. This course takes you through:
• An understanding of Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn’t even named Hadoop and was called Nutch.
• The Hadoop “magic” that makes it so unique and powerful.
• The difference between data science and data engineering, one of the big points of confusion when choosing a career or understanding a job role.
• And most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks by understanding what each offers.
Facebook, Twitter and Google generate petabytes of data every day.
The Large Hadron Collider project discards large amounts of data because it cannot all be analysed, hoping that nothing valuable has been thrown away.
Interesting facts, but why is Big Data important?
Let’s understand via an example.
3rd-party surveys, expert debates
Organisations behaving like a biological nervous system
Mobile alert with
International Data Corporation’s (IDC) 6th annual study:
From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes.
That is more than 5,200 gigabytes for every man, woman, and child in 2020.
From now until 2020, the digital universe will roughly double every two years.
33% of digital data might be valuable if analysed, compared with 25% today.
4.4 million IT jobs globally to support Big Data by 2015.
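As a quick arithmetic check (my own, not from the IDC study), the quoted growth figures line up with the “about double every two years” claim:

```python
import math

# Sanity check of the quoted IDC figures: 130 EB in 2005 -> 40,000 EB in 2020.
factor = 40_000 / 130                 # overall growth, roughly 308x ("a factor of 300")
doublings = math.log2(factor)         # about 8.3 doublings over the 15-year span
years_per_doubling = 15 / doublings   # about 1.8 years, i.e. "about double every two years"
```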
1996-2000: Big Data problem faced by all search engines
2003-04: Google File System and MapReduce papers
2005-06: Hadoop spawns off (from Nutch)
2010: 0.xx releases of Hadoop
2013: Next-generation Hadoop
1. Clusters use commodity hardware, cheaper than one expensive server.
2. The software license is free.
Google File System
[Diagram: many parallel map tasks feed into a reduce step; the same program runs unchanged on a small dataset or a large dataset.]
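The map/reduce flow above can be sketched in plain Python. This is a hypothetical word-count illustration of the programming model, not Hadoop’s actual API: each map call emits (word, 1) pairs, the framework groups the pairs by key (the shuffle), and reduce sums the counts per word.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Reduce: sum all counts collected for one word."""
    return (word, sum(counts))

def mapreduce(lines):
    # Shuffle/sort step: group intermediate pairs by key.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            grouped[word].append(count)
    # Reduce step: one call per distinct key.
    return dict(reduce_phase(w, c) for w, c in grouped.items())

result = mapreduce(["big data big hadoop", "hadoop big"])
# result == {"big": 3, "data": 1, "hadoop": 2}
```

In real Hadoop the map tasks run in parallel across cluster nodes and the grouped pairs are shipped over the network to reducers, which is what lets the same program scale from a small dataset to a large one.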
1. Complex algorithms need to be correctly sensitive to weak
2. Complex algorithms are thus difficult to code and design.
Data Engineer vs Data Scientist
Data Engineer: to engineer software solutions. More programming and technical skills, and the ability to architect technical solutions.
Data Scientist: to solve business problems. Strong mathematical skills and an understanding of statistical methods.
-> Plain Apache Hadoop: all the ecosystem tools need to be additionally installed.
-> Cloudera: important ecosystem tools come integrated with Hadoop; a few proprietary tools like Enterprise Manager.
-> MapR: proprietary Hadoop code written in C.
-> Hortonworks: based purely on Apache code; supports the .NET framework.
-> Pivotal: launches its own Hadoop distribution, Pivotal HD.