Hadoop is a distributed processing framework for large datasets. It stores data across clusters of commodity hardware in the Hadoop Distributed File System (HDFS) and provides tools for distributed processing via MapReduce. HDFS uses a master-slave architecture: a namenode manages filesystem metadata while datanodes store the actual data blocks, which are replicated across nodes for reliability. MapReduce processes large datasets in parallel across the cluster.
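The MapReduce model described above can be sketched outside Hadoop as a plain-Python word count: a map phase emits (key, value) pairs, a shuffle phase groups values by key, and a reduce phase aggregates each group. The names `map_fn`, `shuffle`, and `reduce_fn` are illustrative, not Hadoop API names.

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reduce: aggregate the grouped values for one key.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(k, vs) for k, vs in shuffle(pairs).items())
```

In a real job, map tasks run on the nodes holding the input blocks and reduce tasks run after the shuffle; this toy version only shows the data flow of the programming model.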
3. What is Hadoop? HDFS, Hadoop Common, MapReduce, Pig, Hive, HBase, Zookeeper, Avro, Cassandra, Mahout, ...
5. Hadoop Distributed File System A distributed filesystem designed for storing very large files with streaming data access, running on clusters of commodity hardware. HDFS has been designed with MapReduce in mind. A cluster consists of machines, each performing one or more of the following roles: Namenode (only one per cluster); Secondary namenode (checkpoint node, only one per cluster); Datanodes (many per cluster)
12. Reading from HDFS The client reads block data over a direct connection to a datanode. On failure, the client moves to the next 'closest' node holding a replica of the block. Image: Hadoop: The Definitive Guide (Tom White)
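The read-path failover on this slide ("on failure, move to the next closest node") can be modelled with a small Python sketch: the client tries replica locations in closest-first order and falls back when a datanode fails. All names here are illustrative, not the HDFS client API.

```python
class DatanodeError(Exception):
    """A datanode could not serve the block (illustrative)."""

def read_block(block_id, locations, fetch):
    # locations: datanodes holding a replica, sorted closest-first
    # (the namenode returns them in this order).
    # fetch(node, block_id): stands in for the direct
    # client-to-datanode connection shown on the slide.
    failed = []
    for node in locations:
        try:
            return fetch(node, block_id)
        except DatanodeError:
            failed.append(node)  # skip this node, try the next replica
    raise IOError(f"block {block_id}: all replicas failed: {failed}")

# Toy stand-in transport: the first replica is down, the second serves.
def fetch(node, block_id):
    if node == "dn1":
        raise DatanodeError()
    return f"data-from-{node}"
```

The real client also reports failed datanodes back to the namenode; this sketch only captures the fallback order.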
13. Writing to HDFS Minimum replication for a successful write: dfs.replication.min. Files in HDFS are write-once and have strictly one writer at any time. Image: Hadoop: The Definitive Guide (Tom White)
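A minimal hdfs-site.xml sketch of the replication settings mentioned above: dfs.replication is the target replica count per block, and dfs.replication.min (renamed dfs.namenode.replication.min in newer releases) is the minimum number of replicas that must be written for the write to succeed. The values shown are illustrative, not recommendations.

```xml
<configuration>
  <!-- Target number of replicas per block (default: 3). -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Minimum replicas that must be written before a write succeeds. -->
  <property>
    <name>dfs.replication.min</name>
    <value>1</value>
  </property>
</configuration>
```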
14. Hadoop Common File system abstraction: The File System (FS) shell includes various shell-like commands (e.g. hadoop fs -ls, hadoop fs -put, hadoop fs -cat) that interact directly with HDFS as well as the other file systems Hadoop supports, such as Local FS, HFTP FS, and S3 FS. Service-level authorization: the initial authorization mechanism, ensuring that clients connecting to a particular Hadoop service have the necessary, pre-configured permissions to access it. For example, a MapReduce cluster can use this mechanism to allow only a configured list of users/groups to submit jobs.