Hadoop 2 : New and Noteworthy
Sujee Maniyam, ElephantScale
sujee@ElephantScale.com
http://ElephantScale.com
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.

SNIA Legal Notice
• The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
• Member companies and individual members may use this material in presentations and literature under the following conditions:
  ◦ Any slide or slides used must be reproduced in their entirety without modification
  ◦ The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
• This presentation is a project of the SNIA Education Committee.
• Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
• The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

Abstract
• Hadoop 2 : New And Noteworthy Features
  ◦ This session will appeal to data center managers, development managers, and anyone looking for an overview of what's new in the Hadoop 2 platform. The session will highlight some of the notable features in Hadoop 2.

Quick Poll
• How many of you are NEW to Hadoop?
• How many of you are USING Hadoop?

Hadoop Timeline

Hadoop Versions ☺

Hadoop Versions – Simplified
Hadoop 1            Hadoop 2
1.2.1 (Aug 2013)    2.2.0 (Oct 2013)

Feature Matrix
Component    Feature                       v1   v2
HDFS         NameNode High Availability         X
             NameNode federation                X
             Snapshots                          X
             NFS v3 access to HDFS              X
             Improved IO                        X
Processing   MapReduce v1                  X
             YARN (MapReduce v2)                X
Other        Kerberos security             X    X

NEXT
• NameNode High Availability
• Federation
• Snapshots
• NFS
• Improved IO

HDFS Architecture (V1)
(Diagram: a single Name Node managing many Data Nodes)

Name Node High Availability
• HDFS has (had) a one-NameNode / many-DataNode design
• This makes the NameNode a single point of failure (SPOF)

NameNode Is Very Important In A Cluster

Is Hadoop NN Failure A Big Deal?
• Yahoo study
  ◦ 18-month study
  ◦ 22 failures on 25 clusters
  ◦ 0.58 failures per cluster per year
  ◦ Only a minority of them (roughly 40%) would have benefited from HA
  ◦ → 0.23 failures / year / cluster
• http://www.slideshare.net/Hadoop_Summit/hdfs-namenode-high-availability
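The per-cluster rate follows from the study's raw counts. A quick check, assuming the 18-month window means 1.5 years of observation per cluster and taking the 0.23 figure as quoted:

```python
# Back-of-the-envelope check of the Yahoo NameNode failure figures quoted above.
failures = 22          # NameNode failures observed
clusters = 25          # clusters in the study
years = 18 / 12        # 18-month observation window, in years

rate = failures / (clusters * years)   # failures per cluster per year
ha_rate = 0.23                         # rate that HA would have helped with, as quoted

print(f"overall rate: {rate:.3f} failures/cluster/year")   # overall rate: 0.587 failures/cluster/year
print(f"share helped by HA: {ha_rate / rate:.0%}")          # share helped by HA: 39%
```

The exact rate is about 0.587, so the slide's 0.58 is the same value truncated rather than rounded.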

Still Needs To Be Fixed
• Downtime may be acceptable for batch workloads
• But it is not acceptable for real-time workloads, like HBase, that depend on HDFS
  ◦ Downtime (even minutes) is not acceptable
• Make Hadoop more enterprise-friendly

How Do We Fix A Single NameNode Failure?
• Have two NameNodes!
  ◦ One ACTIVE and another PASSIVE
• When the active NN fails, the passive one takes over
  ◦ Failover can be automated

NameNode HA (V2)
(Diagram: Name Node 1 (active) and Name Node 2 (passive), both attached to shared storage and serving the same Data Nodes)

NameNode HA : Shared Storage
(c) ElephantScale.com, 2014
(Diagram: Name Node 1 (active) and Name Node 2 (passive) sharing a filer, with the Data Nodes below)
Option 1) external filer
Option 2) Quorum Journal

NameNode HA
• NameNode metadata is written to shared storage (an external filer or the Quorum Journal Manager)
• Only ONE active NN can write to the shared storage
• The passive NN reads and replays the metadata from the shared storage
• When the active NN fails, the passive NN is promoted to active
  ◦ Failover can be manual or automatic
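The "only ONE active NN can write" rule is what the Quorum Journal option enforces. Below is a simplified model of that idea, not Hadoop's code: an edit counts only when a majority of JournalNodes accept it, and a newly promoted NameNode claims a higher epoch number so the old active one is fenced out. The class names, the three-node setup and the edit strings are hypothetical.

```python
# Simplified model of Quorum Journal Manager writes (not the real Hadoop code).
# Assumptions: 3 JournalNodes, integer "epochs" used for fencing.

class JournalNode:
    def __init__(self):
        self.promised_epoch = 0   # highest writer epoch this node has accepted
        self.edits = []           # metadata edit-log entries

    def new_epoch(self, epoch):
        # A NameNode becoming active asks for a higher epoch; older writers lose access.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, edit):
        # Reject writes from a writer whose epoch has been superseded.
        if epoch < self.promised_epoch:
            return False
        self.edits.append(edit)
        return True


def quorum(acks, n):
    return acks > n // 2   # strict majority


class NameNodeWriter:
    def __init__(self, name, journals):
        self.name, self.journals, self.epoch = name, journals, None

    def become_active(self, epoch):
        acks = sum(j.new_epoch(epoch) for j in self.journals)
        if quorum(acks, len(self.journals)):
            self.epoch = epoch
            return True
        return False

    def log_edit(self, edit):
        acks = sum(j.journal(self.epoch, edit) for j in self.journals)
        return quorum(acks, len(self.journals))


jns = [JournalNode() for _ in range(3)]
nn1, nn2 = NameNodeWriter("nn1", jns), NameNodeWriter("nn2", jns)

print(nn1.become_active(1), nn1.log_edit("mkdir /data"))   # True True
print(nn2.become_active(2))                                 # True: failover to nn2
print(nn1.log_edit("rm /data"))                             # False: nn1 is fenced out
```

A majority (2 of 3) is enough, so one JournalNode can be down without blocking writes; the epoch check is what keeps a stale active NameNode from corrupting the shared log.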

NameNode HA Setup

NameNode Federation
• The NameNode stores metadata in memory
• For large (very large) clusters, the NN could exhaust its memory
• Spread the metadata over multiple NameNodes
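To see why memory becomes the limit, here is a rough heap estimate. It rests on an assumption: a widely quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file, directory or block). Treat the constant as illustrative, not as a figure from Hadoop itself.

```python
# Rough NameNode heap estimate (rule of thumb, not an exact Hadoop formula).
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory or block).
BYTES_PER_OBJECT = 150

def namenode_heap_gb(files, blocks_per_file=1, directories=0):
    """Estimate heap (decimal GB) needed to hold the namespace in memory."""
    objects = files + files * blocks_per_file + directories
    return objects * BYTES_PER_OBJECT / 1e9

# 100 million single-block files -> 200 million objects
print(f"{namenode_heap_gb(100_000_000):.0f} GB")   # 30 GB
```

Under this assumption, splitting the same namespace evenly across three federated NameNodes would cut each one's share to roughly a third.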

HDFS Federation
• Now the namespace is divided:
  ◦ /hbase → NN1
  ◦ /user → NN2
  ◦ /hive → NN3

HDFS Federation
• Each NameNode's namespace has its own 'block pool'
• DataNodes are shared across the cluster
  ◦ They store blocks for different pools
  ◦ DataNodes send heartbeats to all NNs
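The shared-DataNode arrangement can be modelled as a DataNode that keeps its blocks grouped by block pool and reports to every NameNode, each report covering only that NameNode's pool. This is a toy model, not Hadoop's API; the block pool ids `BP-1`/`BP-2` and block numbers are hypothetical (real pool ids are longer generated strings).

```python
# Simplified model of a DataNode in a federated cluster (illustrative, not Hadoop's API).
from collections import defaultdict

class DataNode:
    def __init__(self, name, namenodes):
        self.name = name
        self.namenodes = namenodes          # NameNode -> its block pool id
        self.blocks = defaultdict(set)      # block pool id -> block ids stored here

    def store(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)

    def reports(self):
        # The DataNode talks to every NameNode; each report lists only that NameNode's pool.
        return {nn: sorted(self.blocks.get(pool, set()))
                for nn, pool in self.namenodes.items()}

dn = DataNode("dn1", {"nn1": "BP-1", "nn2": "BP-2"})
dn.store("BP-1", 101)
dn.store("BP-2", 201)
dn.store("BP-2", 202)
print(dn.reports())   # {'nn1': [101], 'nn2': [201, 202]}
```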

HDFS Snapshots
• Wait, doesn't HDFS make replicas?
  ◦ Yes
• But replicas don't save you from:
  hdfs dfs -rm -r /data
• The 'Trash' feature only works for CLI utilities
  ◦ You can delete files using the API... poof, gone
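The difference between the two kinds of delete can be shown with a toy model. It assumes trash is enabled (`fs.trash.interval` > 0) and a user named `alice`; the trash layout is simplified, and the function names are hypothetical stand-ins for the shell command and a program calling the FileSystem delete API.

```python
# Toy model: shell deletes go to trash, direct API deletes do not.
# ASSUMPTIONS: trash is enabled (fs.trash.interval > 0) and the user is "alice";
# the layout is simplified from HDFS's /user/<name>/.Trash/Current/... directory.
def _matches(path, prefix):
    return path == prefix or path.startswith(prefix + "/")

def shell_rm(files, trash, prefix, user="alice"):
    """Roughly what `hdfs dfs -rm -r <prefix>` does: move matches into the user's trash."""
    for path in [p for p in files if _matches(p, prefix)]:
        trash[f"/user/{user}/.Trash/Current{path}"] = files.pop(path)

def api_delete(files, prefix):
    """Roughly what a program calling the delete API does: no trash, the data is gone."""
    for path in [p for p in files if _matches(p, prefix)]:
        del files[path]

files = {"/data/a": "...", "/data/b": "..."}
trash = {}
shell_rm(files, trash, "/data")
print(sorted(trash))   # ['/user/alice/.Trash/Current/data/a', '/user/alice/.Trash/Current/data/b']

files2 = {"/data/a": "..."}
api_delete(files2, "/data")
print(files2)          # {}
```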

HDFS Snapshots
• Recover from user errors and other disasters
• Periodic snapshots
  ◦ E.g.: daily backups… keep them for 15 days
• Snapshotting is
  ◦ Efficient (no data duplication, copy on write)
  ◦ Fast
  ◦ Able to snapshot part of the file system (not the whole thing)
• http://cdn.oreillystatic.com/en/assets/1/event/100/HDFS%20Snapshots%20and%20Beyond%20Presentation.pdf
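"No data duplication, copy on write" can be illustrated with per-snapshot diffs: taking a snapshot copies nothing, and the old value of a file is saved only the first time it changes afterwards. This is a toy model under that assumption; HDFS also records changes in diff lists, but its data structures are different, and the class and method names here are hypothetical.

```python
# Toy copy-on-write snapshots using per-snapshot diffs (illustrative only).
_ABSENT = object()   # marks "file did not exist when the snapshot was taken"

class SnapshottableDir:
    def __init__(self):
        self.current = {}   # file name -> contents
        self.snaps = []     # list of (snapshot name, diff dict), oldest first

    def create_snapshot(self, name):
        # Taking a snapshot copies nothing: it starts with an empty diff.
        self.snaps.append((name, {}))

    def _preserve(self, fname):
        # Copy-on-write: before the first change after the latest snapshot,
        # save the old value into that snapshot's diff.
        if self.snaps:
            diff = self.snaps[-1][1]
            if fname not in diff:
                diff[fname] = self.current.get(fname, _ABSENT)

    def write(self, fname, data):
        self._preserve(fname)
        self.current[fname] = data

    def delete(self, fname):
        self._preserve(fname)
        self.current.pop(fname, None)

    def read_snapshot(self, name):
        idx = [n for n, _ in self.snaps].index(name)
        view = dict(self.current)
        # Apply diffs from the newest snapshot back to the requested one, so older diffs win.
        for _, diff in reversed(self.snaps[idx:]):
            for fname, old in diff.items():
                if old is _ABSENT:
                    view.pop(fname, None)
                else:
                    view[fname] = old
        return view

d = SnapshottableDir()
d.write("a.txt", "v1")
d.create_snapshot("s1")
d.write("a.txt", "v2")
d.write("b.txt", "new")
d.create_snapshot("s2")
d.delete("a.txt")
print(d.read_snapshot("s1"))   # {'a.txt': 'v1'}
print(d.read_snapshot("s2"))   # {'b.txt': 'new', 'a.txt': 'v2'}
print(d.current)               # {'b.txt': 'new'}
```

Snapshot cost is proportional to what changes after it is taken, not to the size of the directory, which is why creating one is fast.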

NFS Access to HDFS
• HDFS is a userland file system
  ◦ Not a kernel file system
• So most Linux programs cannot read or write HDFS data directly
• We use the 'hdfs' command-line utilities instead

NFS Access to HDFS
• HDFS supports the NFS (v3) protocol starting with Hadoop 2
• NFS access goes through a gateway machine

HDFS Improved IO
• Lots of performance fixes from v1 → v2
• Quick comparison
  ◦ Multi-threaded random read
  ◦ HDFS v1: 264 MB/sec
  ◦ HDFS v2: 1395 MB/sec (5x!)
Source: http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum
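The "5x" figure is the ratio of the two throughputs. The snippet checks it and, purely as an illustration, converts both rates into the time needed to read 1 TB, assuming decimal units (1 TB = 1,000,000 MB) and that the benchmark rate holds for the whole scan.

```python
# Speedup implied by the benchmark figures above.
v1_mb_s, v2_mb_s = 264, 1395

speedup = v2_mb_s / v1_mb_s
print(f"speedup: {speedup:.1f}x")   # speedup: 5.3x

# Illustration: time to scan 1 TB (assuming 1 TB = 1,000,000 MB) at each rate.
for label, rate in (("v1", v1_mb_s), ("v2", v2_mb_s)):
    print(f"{label}: {1_000_000 / rate / 60:.0f} min")   # v1: 63 min, then v2: 12 min
```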

V2 Features
• HDFS
• Processing
  ◦ YARN

MapReduce V1
• MRv1 proved itself as a reliable batch-processing framework!
• One JobTracker (master) and many TaskTrackers (workers)

MapReduce Architecture
(Diagram: a single Job Tracker managing many Task Trackers)

MRv1 Limitations
• Only supports one programming paradigm
  ◦ Batch processing
• Alternate processing models are hard (or impossible) to implement on top of MRv1
  ◦ Real-time processing
  ◦ In-memory data

MRv1 Limitations
• Single JobTracker (JT) → single point of failure
  ◦ A JT failure kills all running (and queued) jobs
• The JT started hitting scalability limits on very large clusters
  ◦ Around 4,000 nodes

Looking Ahead
(Diagram: in Hadoop v1, MRv1 runs on HDFS and handles both 1) processing and 2) resource management; in Hadoop v2, YARN (resource management) runs on HDFS, with MapReduce and other frameworks on top)

YARN
• MRv1 did both
  ◦ Resource management
  ◦ And processing
• Separate the two
  ◦ YARN for resource management
  ◦ MapReduce / other frameworks for processing
• Now MapReduce is 'just another app'

YARN Architecture
• ResourceManager: manages the resources of the entire cluster
• NodeManager: manages the resources of a single node
• Containers: resource buckets (e.g. 2 CPUs + 8 GB RAM)
• ApplicationMasters: one for each application
  ◦ Batch MapReduce, Storm, etc.
  ◦ Manage application scheduling and execution
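The division of labour above can be sketched as a toy ResourceManager that hands out fixed-size containers from NodeManager capacity when an ApplicationMaster asks. First-fit placement is chosen only for brevity; real YARN schedulers (such as the Capacity and Fair schedulers) also weigh queues, locality and fairness. Node names, capacities and the app id are hypothetical.

```python
# Toy ResourceManager placing container requests on NodeManagers
# (illustrative only; real YARN schedulers are far more involved).
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    name: str
    free_vcores: int
    free_mem_gb: int
    containers: list = field(default_factory=list)

@dataclass
class ResourceManager:
    nodes: list

    def allocate(self, app_id, vcores, mem_gb):
        """Grant one container of the requested size on the first node that fits."""
        for nm in self.nodes:
            if nm.free_vcores >= vcores and nm.free_mem_gb >= mem_gb:
                nm.free_vcores -= vcores
                nm.free_mem_gb -= mem_gb
                container = (app_id, nm.name, vcores, mem_gb)
                nm.containers.append(container)
                return container
        return None   # no capacity: in real YARN the request would wait

rm = ResourceManager([NodeManager("node1", 8, 32), NodeManager("node2", 8, 32)])

# An ApplicationMaster asks for containers shaped like the slide's example (2 CPU + 8 GB).
grants = [rm.allocate("app-1", 2, 8) for _ in range(5)]
print([g[1] for g in grants])   # ['node1', 'node1', 'node1', 'node1', 'node2']
```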

Adoption of YARN
• Standard in Hadoop v2
• Already running at Yahoo at scale
• Many applications are already moving to the YARN architecture

Apps on YARN
(Diagram: HDFS at the bottom, YARN above it, and on top: batch (MapReduce), streaming (Storm, S4), in-memory (Spark), graph (Giraph), real-time (HBase))

Apps on YARN
• Storm: real-time event processing
• Giraph: graph processing (in memory)
• Spark: in-memory, iterative processing
• HBase

MapReduce on YARN
• MapReduce is NOT going anywhere
  ◦ Works very well for batch processing
  ◦ Proven
  ◦ Lots of code out there
• No more single JobTracker
  ◦ Each MapReduce job runs its own ApplicationMaster
  ◦ So the failure of one ApplicationMaster only causes that job to fail
  ◦ Other jobs are insulated
• Better performance
  ◦ MR jobs scale and utilize the cluster better on YARN (1.5x to 2x)
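The isolation argument comes down to blast radius: in MRv1 every job depends on the one JobTracker, while on YARN each job depends only on its own ApplicationMaster. The toy comparison below models just that dependency; job names and the `am-<job>` naming are hypothetical, and it ignores that real YARN can also retry a failed ApplicationMaster.

```python
# Toy comparison of failure blast radius: one shared JobTracker (MRv1)
# versus one ApplicationMaster per job (MapReduce on YARN). Illustrative only.
jobs = ["job-a", "job-b", "job-c"]

def survivors_mrv1(jobs, failed_master):
    # Every job depends on the single JobTracker.
    return [] if failed_master == "jobtracker" else list(jobs)

def survivors_yarn(jobs, failed_master):
    # Each job depends only on its own ApplicationMaster, named "am-<job>".
    return [j for j in jobs if f"am-{j}" != failed_master]

print(survivors_mrv1(jobs, "jobtracker"))   # []
print(survivors_yarn(jobs, "am-job-b"))     # ['job-a', 'job-c']
```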

Writing A YARN Application
• http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

So Which Hadoop Should I Use?
• If you are starting now...
  ◦ Hadoop 2
• Already using Hadoop 1?
  ◦ Worth the upgrade (new features / performance)
• How do I migrate?
  ◦ Recommended: stand up a separate v2 cluster and migrate the data over
  ◦ In-place upgrade? (yikes!)

Hadoop Distributions
Distribution   Hadoop v1            Hadoop v2
Cloudera       CDH 3.x / CDH 4.x    CDH 5.x
Hortonworks    HDP 1.x              HDP 2.x
Pivotal HD

Future...
• HDFS
  ◦ Mirroring across data centers
  ◦ Working well with SSDs (solid-state / flash drives)
• YARN
  ◦ Better containers (not just JVMs)
  ◦ Performance
  ◦ Make the ResourceManager highly available (HA)

Thanks & Questions?

Attribution & Feedback
Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org
The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.
Authorship History
Sujee Maniyam (Sept 2014)
Additional Contributors
Joseph White : Review & Feedback

Backup Slides

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Hadoop2 new and noteworthy SNIA conf

  • 1. Hadoop 2 : New and Noteworthy
    Sujee Maniyam, ElephantScale
    sujee@ElephantScale.com
    http://ElephantScale.com
  • 2. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved.
    SNIA Legal Notice
    • The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
    • Member companies and individual members may use this material in presentations and literature under the following conditions:
      • Any slide or slides used must be reproduced in their entirety without modification
      • The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
    • This presentation is a project of the SNIA Education Committee.
    • Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
    • The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
    NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
  • 3. Abstract
    • Hadoop 2 : New And Noteworthy Features
    • This session will appeal to Data Center Managers, Development Managers, and anyone looking for an overview of ‘what’s new’ in the Hadoop 2 platform. The session will highlight some of the notable features in Hadoop 2.
  • 4. Quick Poll
    • How many of you are NEW to Hadoop?
    • How many of you are USING Hadoop?
  • 5. Hadoop Timeline
  • 6. Hadoop Versions ☺
  • 7. Hadoop Versions – Simplified
    • Hadoop 1: 1.2.1 (Aug 2013)
    • Hadoop 2: 2.2.0 (Oct 2013)
  • 8. Feature Matrix (Hadoop version that has each feature)
    • HDFS
      • NameNode High Availability: v2
      • NameNode federation: v2
      • Snapshots: v2
      • NFS v3 access to HDFS: v2
      • Improved IO: v2
    • Processing
      • MapReduce v1: v1
      • YARN (MapReduce v2): v2
    • Other
      • Kerberos security: v1 and v2
  • 9. NEXT
    • NameNode High Availability
    • Federation
    • Snapshots
    • NFS
    • Improved IO
  • 10. HDFS Architecture (v1)
    [diagram: one NameNode and four DataNodes]
  • 11. NameNode High Availability
    • HDFS has (had) a design with ONE NameNode and many DataNodes
    • This makes the NameNode a ‘Single Point of Failure’ (SPOF)
  • 12. NameNode Is Very Important In A Cluster
  • 13. Is Hadoop NN Failure A Big Deal?
    • Yahoo study
      • 18-month study
      • 22 failures on 25 clusters
      • 0.58 failures per cluster per year
      • Only a subset of them (roughly 40%) would have benefited from HA → 0.23 failures / year / cluster
    • http://www.slideshare.net/Hadoop_Summit/hdfs-namenode-high-availability
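The annualized rate on this slide follows directly from the study figures. The short Python sketch below reproduces it and also shows what fraction of all failures the cited 0.23 figure represents. The inputs are exactly the slide's numbers; nothing else is assumed.

```python
# Inputs copied from the slide: 22 NameNode failures on 25 clusters
# over an 18-month study.
failures = 22
clusters = 25
study_years = 18 / 12  # 18 months = 1.5 years

# Annualized failure rate per cluster.
rate = failures / clusters / study_years
print(f"all NN failures: {rate:.3f} per cluster per year")

# The slide's figure for the failures that HA would have helped with.
ha_rate = 0.23
print(f"share HA would have helped: {ha_rate / rate:.0%}")
```

This prints a rate of 0.587 (the slide truncates it to 0.58) and a share of 39%, so somewhat under half of the failures were ones HA would have addressed.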
  • 14. Still Needs To Be Fixed
    • Downtime may be acceptable for batch workloads
    • But not for real-time workloads, like HBase, that depend on HDFS
      • Downtime (even minutes) is not acceptable
    • Make Hadoop more enterprise-friendly
  • 15. How Do We Fix A Single NameNode Failure?
    • Have two NameNodes!
      • One ACTIVE and one PASSIVE
    • When the active NN fails, the passive one takes over
    • Failover can be automated
  • 16. HDFS Architecture (v1)
    [diagram: one NameNode and four DataNodes]
  • 17. NameNode HA (v2)
    [diagram: NameNode 1 (active) and NameNode 2 (passive) connected through shared storage, with four DataNodes]
  • 18. NameNode HA : Shared Storage
    [diagram: NameNode 1 (active) and NameNode 2 (passive) share one storage location, with four DataNodes]
    • Option 1) external filer
    • Option 2) Quorum Journal
    (c) ElephantScale.com, 2014
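The Quorum Journal option can be illustrated with a toy model. The class and method names below are hypothetical and are not the Hadoop API. The model keeps only two ideas. First, an edit counts as durable once a majority of JournalNodes acknowledge it. Second, each newly active NameNode claims a higher epoch, so the JournalNodes reject writes from the old one. Log recovery after failover is left out.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy JournalNode: stores edits and remembers the highest epoch it has promised."""
    name: str
    up: bool = True
    promised_epoch: int = 0
    edits: list = field(default_factory=list)

    def new_epoch(self, epoch: int) -> bool:
        # Accept a new writer only if its epoch beats every epoch seen so far.
        if not self.up or epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch: int, txid: int, op: str) -> bool:
        # Refuse writes while down, or from a writer with a stale epoch.
        if not self.up or epoch < self.promised_epoch:
            return False
        self.edits.append((txid, op))
        return True


def majority(n: int) -> int:
    return n // 2 + 1


class QuorumWriter:
    """Toy active-NameNode side of the journal."""

    def __init__(self, journals, epoch):
        self.journals = journals
        self.epoch = epoch
        self.txid = 0
        # Becoming the writer requires a majority to accept the new epoch.
        promised = sum(j.new_epoch(epoch) for j in journals)
        if promised < majority(len(journals)):
            raise RuntimeError("no quorum for this epoch")

    def log_edit(self, op: str) -> bool:
        # An edit is committed only if a majority of JournalNodes stored it.
        self.txid += 1
        acks = sum(j.journal(self.epoch, self.txid, op) for j in self.journals)
        return acks >= majority(len(self.journals))


jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]
nn1 = QuorumWriter(jns, epoch=1)
print(nn1.log_edit("mkdir /data"))      # True: all three JournalNodes ack
jns[2].up = False
print(nn1.log_edit("create /data/a"))   # True: two of three is still a majority
nn2 = QuorumWriter(jns, epoch=2)        # failover: new writer, higher epoch
print(nn1.log_edit("delete /data"))     # False: the old writer is fenced off
```

The majority rule is why JournalNodes are usually deployed in odd numbers. Three nodes tolerate one failure, and five tolerate two.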
  • 19. NameNode HA
    • NameNode metadata is written to shared storage (an external filer or the Quorum Journal Manager)
    • Only ONE active NN can write to the shared storage
    • The passive NN reads and replays metadata from the shared storage
    • When the active NN fails, the passive NN is promoted to active
      • This can be manual or automatic
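Here is a minimal sketch of the active/passive flow described on this slide. It assumes a toy namespace (a set of paths) and an in-memory shared log. All names are hypothetical. The shared log refuses writes from anyone but the current writer, which stands in for the fencing a real deployment needs. ZooKeeper-driven automatic failover is not modelled.

```python
class SharedEditLog:
    """Toy shared storage: only the current writer may append edits."""

    def __init__(self):
        self.edits = []
        self.writer = None

    def append(self, nn_name, op):
        if nn_name != self.writer:
            raise PermissionError(f"{nn_name} is not the active NameNode")
        self.edits.append(op)


class NameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.namespace = set()  # toy namespace: the set of existing paths
        self.applied = 0        # how many log entries have been replayed
        self.state = "standby"

    def apply(self, op):
        action, path = op
        if action == "create":
            self.namespace.add(path)
        elif action == "delete":
            self.namespace.discard(path)

    def catch_up(self):
        # The passive NameNode replays edits it has not seen yet.
        for op in self.log.edits[self.applied:]:
            self.apply(op)
        self.applied = len(self.log.edits)

    def become_active(self):
        self.catch_up()               # finish replaying before serving clients
        self.log.writer = self.name   # take over the single-writer role
        self.state = "active"

    def write(self, op):
        self.log.append(self.name, op)  # raises if this node is not the writer
        self.apply(op)
        self.applied += 1


def failover(old, new):
    """Demote the failed active NameNode and promote the passive one."""
    old.state = "standby"
    new.become_active()


log = SharedEditLog()
nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
nn1.become_active()
nn1.write(("create", "/data/a"))
nn1.write(("create", "/data/b"))
nn2.catch_up()                    # the passive node stays warm
failover(nn1, nn2)                # e.g. nn1 has crashed
nn2.write(("delete", "/data/a"))
print(sorted(nn2.namespace))      # ['/data/b']
```

Because the passive node keeps replaying the log, promotion only has to apply the few edits it has not yet seen.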
  • 20. NameNode HA Setup
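This setup slide is only a picture in the original deck. To illustrate what a QJM-based setup usually involves, the Python helper below renders the core hdfs-site.xml properties from the Apache HDFS high-availability guide: dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.shared.edits.dir, and so on. The nameservice ID and hostnames are placeholders. The ports 8020, 50070 and 8485 are the Hadoop 2 defaults. A working cluster also needs settings not shown here, such as dfs.ha.fencing.ssh.private-key-files for sshfence and ha.zookeeper.quorum in core-site.xml for automatic failover.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenodes, journal_nodes):
    """Build core hdfs-site.xml properties for QJM-based NameNode HA.

    namenodes: mapping of NameNode ID -> host (placeholder hostnames).
    journal_nodes: list of JournalNode hosts.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:50070"
    # Shared edits live on a quorum of JournalNodes.
    quorum = ";".join(f"{h}:8485" for h in journal_nodes)
    props["dfs.namenode.shared.edits.dir"] = f"qjournal://{quorum}/{nameservice}"
    # Lets clients find whichever NameNode is currently active.
    props[f"dfs.client.failover.proxy.provider.{nameservice}"] = (
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    props["dfs.ha.fencing.methods"] = "sshfence"
    props["dfs.ha.automatic-failover.enabled"] = "true"
    return props


def to_hadoop_xml(props):
    """Render a dict as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = ha_properties(
    "mycluster",
    {"nn1": "master1.example.com", "nn2": "master2.example.com"},
    ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
)
print(to_hadoop_xml(props))
```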
  • 21. NEXT
    • NameNode High Availability
    • Federation
    • Snapshots
    • NFS
    • Improved IO
  • 22. NameNode Federation
    • The NameNode stores metadata in memory
    • For (very) large clusters, the NN could exhaust its memory
    • Spread metadata over multiple NameNodes
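A back-of-the-envelope estimate shows why NameNode memory becomes the limit. This sketch assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file, directory, and block object. The namespace size is hypothetical, and real memory use depends on the Hadoop version and JVM.

```python
# Rule-of-thumb cost of one namespace object (file, directory or block)
# in NameNode heap; real usage varies by Hadoop version and JVM settings.
BYTES_PER_OBJECT = 150
GIB = 1024 ** 3


def namenode_heap_gib(files, dirs, blocks_per_file):
    """Rough NameNode heap needed to keep the whole namespace in memory."""
    objects = files + dirs + files * blocks_per_file
    return objects * BYTES_PER_OBJECT / GIB


# Hypothetical namespace: 100M files, 1M directories, 1.5 blocks per file.
total = namenode_heap_gib(100_000_000, 1_000_000, 1.5)
print(f"single NameNode: ~{total:.1f} GiB")

# With federation each NameNode holds only its own slice of the namespace
# (assuming an even three-way split such as /hbase, /user, /hive).
print(f"per NameNode (3-way federation): ~{total / 3:.1f} GiB")
```

Under these assumptions, a single NameNode needs about 35.1 GiB of heap for metadata alone. A three-way federated split brings that to about 11.7 GiB each.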
  • 23. HDFS Federation
  • 24. HDFS Federation
    • Now the namespace is divided:
      • /hbase → NN1
      • /user → NN2
      • /hive → NN3
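The path-to-NameNode split on this slide behaves like a mount table. The sketch below resolves a path with longest-prefix matching. It is similar in spirit to the client-side ViewFs mount table in Hadoop 2, but the function, hostnames, and table format are hypothetical, not the ViewFs API or configuration.

```python
# Hypothetical mount table mirroring the slide's split of the namespace.
MOUNT_TABLE = {
    "/hbase": "hdfs://nn1.example.com:8020",
    "/user": "hdfs://nn2.example.com:8020",
    "/hive": "hdfs://nn3.example.com:8020",
}


def resolve(path, mounts=MOUNT_TABLE):
    """Map a path to the NameNode that owns it, using longest-prefix match."""
    best = None
    for prefix in mounts:
        # Match whole path components so "/hbase2" does not match "/hbase".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    return mounts[best] + path


print(resolve("/user/alice/logs.txt"))  # hdfs://nn2.example.com:8020/user/alice/logs.txt
print(resolve("/hive/warehouse"))       # hdfs://nn3.example.com:8020/hive/warehouse
```

Longest-prefix matching only matters when mount points are nested. With the slide's three disjoint top-level directories, at most one prefix ever matches.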
  • 25. HDFS Federation
    • Each NameNode's namespace has its own ‘block pool’
    • DataNodes are shared across the cluster
      • They store blocks for different pools
    • DataNodes send heartbeats to all NNs
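This sketch shows one shared DataNode serving several federated NameNodes. The toy DataNode files block IDs under their block pool ID and builds one status message per NameNode, covering only that NameNode's pool. Pool IDs such as "BP-1" are made up; real ones are longer generated strings. The real protocol also sends heartbeats and block reports as separate messages, which are folded together here for brevity.

```python
from collections import defaultdict


class DataNode:
    """Toy DataNode shared by several federated NameNodes."""

    def __init__(self, name):
        self.name = name
        self.blocks = defaultdict(set)  # block pool ID -> IDs of blocks stored here

    def store(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)

    def reports(self, namenodes):
        """Build one message per NameNode, covering only that NameNode's block pool.

        namenodes: mapping of NameNode name -> the block pool ID it manages.
        """
        return {
            nn: {"datanode": self.name, "pool": pool, "blocks": sorted(self.blocks[pool])}
            for nn, pool in namenodes.items()
        }


dn = DataNode("dn1")
dn.store("BP-1", 101)   # block in NN1's namespace (e.g. under /hbase)
dn.store("BP-2", 7)     # block in NN2's namespace (e.g. under /user)
dn.store("BP-1", 102)
for nn, msg in dn.reports({"nn1": "BP-1", "nn2": "BP-2"}).items():
    print(nn, msg["blocks"])
```

Each NameNode therefore sees only its own blocks, even though all of them share the same DataNode disks.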
  • 26. NEXT
    • NameNode High Availability
    • Federation
    • Snapshots
    • NFS
    • Improved IO
  • 27. HDFS Snapshots
    • Wait, doesn't HDFS make replicas?
      • Yes
      • But replicas don't save you from: hdfs dfs -rm -r /data
    • The ‘Trash’ feature only works for CLI utilities
    • Files deleted through the API... poof, gone
  • 28. HDFS Snapshots
    • Recover from user errors and other disasters
    • Periodic snapshots
      • E.g. daily backups, kept for 15 days
    • Snapshotting is
      • Efficient (no data duplication, copy on write)
      • Fast
      • Possible for part of the file system (not only the whole thing)
    • http://cdn.oreillystatic.com/en/assets/1/event/100/HDFS%20Snapshots%20and%20Beyond%20Presentation.pdf
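Copy-on-write is what makes snapshots cheap. In this toy model, a snapshot just keeps a reference to the directory's name-to-block mapping. The mapping (metadata only, never the data) is copied on the first change after a snapshot. This is a simplification: HDFS itself records per-directory differences rather than copying mappings, and all names here are hypothetical. On a real cluster, a directory is made snapshottable with hdfs dfsadmin -allowSnapshot and snapshots are taken with hdfs dfs -createSnapshot. Their contents appear under the directory's .snapshot path.

```python
class SnapshottableDir:
    """Toy directory whose snapshots share data with the live view (copy-on-write)."""

    def __init__(self):
        self.live = {}        # file name -> data block (bytes); data is never copied
        self.snapshots = {}   # snapshot name -> frozen name->block mapping
        self._shared = False  # True while the live mapping is shared with a snapshot

    def _before_change(self):
        # Copy only the small name->block mapping, and only on the first change
        # after a snapshot; the data blocks themselves stay shared.
        if self._shared:
            self.live = dict(self.live)
            self._shared = False

    def write(self, name, data):
        self._before_change()
        self.live[name] = data

    def delete(self, name):
        self._before_change()
        del self.live[name]

    def create_snapshot(self, snap_name):
        self.snapshots[snap_name] = self.live  # no copying at snapshot time
        self._shared = True

    def read_snapshot(self, snap_name, name):
        return self.snapshots[snap_name][name]


d = SnapshottableDir()
d.write("part-00000", b"important results")
d.create_snapshot("daily-1")
d.delete("part-00000")                  # the accidental delete
print("part-00000" in d.live)           # False
d.write("part-00000", d.read_snapshot("daily-1", "part-00000"))  # restore
print(d.live["part-00000"])             # b'important results'
```

Taking a snapshot costs nothing up front. The restore also reuses the very same data object rather than a copy.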
  • 29. NEXT
    • NameNode High Availability
    • Federation
    • Snapshots
    • NFS
    • Improved IO
  • 30. NFS Access to HDFS
    • HDFS is a userland file system
      • Not a kernel file system
    • So most Linux programs cannot read/write data in HDFS
      • We use the ‘hdfs’ command-line utilities instead
  • 31. NFS Access to HDFS
    • HDFS supports the NFS (v3) protocol starting with Hadoop 2
    • NFS access goes through a gateway machine
  • 32. NEXT
    • NameNode High Availability
    • Federation
    • Snapshots
    • NFS
    • Improved IO
  • 33. HDFS Improved IO
    • Lots of performance fixes from v1 → v2
    • Quick comparison: multi-threaded random read
      • HDFS v1: 264 MB/sec
      • HDFS v2: 1395 MB/sec (5x!)
    • Source: http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum
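The "5x" on this slide is a rounded ratio of the two measurements. The arithmetic below uses only the slide's numbers, plus a hypothetical 1 TB (1,000,000 MB) workload to show what the difference means in wall-clock time. The figures apply to this one benchmark (multi-threaded random read), not to all IO.

```python
# Multi-threaded random-read throughput quoted on the slide (MB/sec).
v1_mb_s = 264
v2_mb_s = 1395

speedup = v2_mb_s / v1_mb_s
print(f"speedup: {speedup:.2f}x")

# What that means for a hypothetical 1 TB (1,000,000 MB) random-read workload.
data_mb = 1_000_000
for label, mb_s in (("HDFS v1", v1_mb_s), ("HDFS v2", v2_mb_s)):
    print(f"{label}: {data_mb / mb_s / 60:.1f} minutes")
```

The ratio is 5.28, and the hypothetical workload drops from about 63.1 minutes to about 11.9 minutes.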
  • 34. V2 Features
    • HDFS
    • Processing
      • YARN
  • 35. MapReduce V1
    • MRv1 proved itself as a reliable batch processing framework!
    • One JobTracker (master) and many TaskTrackers (workers)
  • 36. MapReduce Architecture
    [diagram: one JobTracker and four TaskTrackers]
  • 37. MRv1 Limitations
    • Only supports one programming paradigm
      • Batch processing
    • Alternative processing models are hard (or impossible) to implement on top of MRv1
      • Real-time processing
      • In-memory data
  • 38. MRv1 Limitations
    • Single JobTracker (JT) → single point of failure
      • A JT failure kills all running (and queued) jobs
    • The JT started hitting scalability limits on very large clusters
      • Around 4,000 nodes
  • 39. Looking Ahead
    [diagram: Hadoop v1 runs MRv1 on HDFS, with MRv1 handling both 1) processing and 2) resource management; Hadoop v2 runs YARN (resource management) on HDFS, with MapReduce and other frameworks on top of YARN]
  • 40.
  • 41. YARN
    • MRv1 did both
      • Resource management
      • And processing
    • Separate the two
      • YARN for resource management
      • MapReduce / other frameworks for processing
    • Now MapReduce is ‘just another app’
  • 42. YARN Architecture
  • 43. YARN Architecture
    • ResourceManager: manages resources for the entire cluster
    • NodeManager: manages resources on a single node
    • Containers: resource buckets (e.g. 2 CPUs + 8 GB RAM)
    • ApplicationMasters: one for each application
      • Batch MapReduce, Storm, etc.
      • Manages application scheduling and execution
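The division of labour between these components can be sketched as a toy scheduler. The sketch makes three assumptions. It uses first-fit placement, whereas the real ResourceManager uses pluggable schedulers such as the Capacity or Fair Scheduler. It uses a hypothetical 1 vcore + 2 GB ApplicationMaster container. Task containers are sized like the slide's example bucket of 2 CPUs + 8 GB. None of the class names are the YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy NodeManager: tracks free capacity on one node."""
    host: str
    free_vcores: int
    free_mem_gb: int


@dataclass
class Container:
    app_id: str
    host: str
    vcores: int
    mem_gb: int


@dataclass
class ResourceManager:
    """Toy ResourceManager: first-fit placement of containers across NodeManagers."""
    nodes: list
    containers: list = field(default_factory=list)

    def allocate(self, app_id, vcores, mem_gb):
        for nm in self.nodes:
            if nm.free_vcores >= vcores and nm.free_mem_gb >= mem_gb:
                nm.free_vcores -= vcores
                nm.free_mem_gb -= mem_gb
                c = Container(app_id, nm.host, vcores, mem_gb)
                self.containers.append(c)
                return c
        return None  # no capacity; a real RM would queue the request

    def submit(self, app_id, tasks, vcores=2, mem_gb=8):
        """Launch one ApplicationMaster container, then one container per task."""
        am = self.allocate(app_id, 1, 2)
        if am is None:
            return None
        return am, [self.allocate(app_id, vcores, mem_gb) for _ in range(tasks)]


rm = ResourceManager([NodeManager("node1", 8, 32), NodeManager("node2", 8, 32)])
am, tasks = rm.submit("mapreduce-job-1", tasks=3)
print(am.host, [t.host for t in tasks])    # node1 ['node1', 'node1', 'node1']
am2, tasks2 = rm.submit("spark-app-1", tasks=2)
print(am2.host, [t.host for t in tasks2])  # node1 ['node2', 'node2']
```

The cluster has two nodes, each with 8 vcores and 32 GB. The first job's ApplicationMaster and three tasks all fit on node1. The second job's ApplicationMaster takes node1's last free vcore, so its tasks land on node2.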
  • 44. Adoption of YARN
    • Standard in Hadoop v2
    • Already running at Yahoo at scale
    • Lots of applications are already moving to the YARN architecture
  • 45. Apps on YARN
    [diagram: applications running on YARN on top of HDFS: Batch (MapReduce), Streaming (Storm, S4), In-memory (Spark), Graph (Giraph), Real-time (HBase)]
  • 46. Apps on YARN
    • Storm: real-time event processing
    • Giraph: graph processing (in memory)
    • Spark: in-memory, iterative processing
    • HBase
  • 47. MapReduce on YARN
    • MapReduce is NOT going anywhere
      • Works very well for batch processing
      • Proven
      • Lots of code out there
    • No more single JobTracker
      • Each MapReduce job runs its own ApplicationMaster
      • So a failure of one AppMaster only causes that job to fail
      • Other jobs are insulated
    • Better performance
      • MR jobs scale / utilize the cluster better on YARN (1.5x – 2x)
  • 48. MapReduce on YARN
  • 49. Writing A YARN Application
    • http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
  • 50. So Which Hadoop Should I Use?
    • If you are starting now...
      • Hadoop 2
    • Already using Hadoop 1?
      • Worth the upgrade (new features / performance)
    • How do I migrate?
      • Recommended: stand up a separate v2 cluster and migrate data over
      • In-place upgrade? (yikes!)
  • 51. Hadoop Distributions
    • Cloudera: CDH 3.x / CDH 4.x (Hadoop v1), CDH 5.x (Hadoop v2)
    • Hortonworks: HDP 1.x (Hadoop v1), HDP 2.x (Hadoop v2)
    • Pivotal HD
  • 52. Future...
    • HDFS
      • Mirroring across data centers
      • Working well with SSDs (solid-state / flash drives)
    • YARN
      • Better containers (not just JVMs)
      • Performance
      • ResourceManager HA
  • 53. Thanks & Questions?
  • 54. Attribution & Feedback
    • Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org
    • The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.
      • Authorship History: Sujee Maniyam (Sept 2014)
      • Additional Contributors: Joseph White (Review & Feedback)
  • 55. Backup Slides