2. Hadoop
What is HDFS?
Core components
Architecture
NameNode
Metadata
Secondary NameNode
HDFS Blocks
Limitations
File System commands
3. Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.
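The "simple programming model" is MapReduce. As an illustration only, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) counting words; real Hadoop runs these same phases in parallel across a cluster:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all counts for one word into a total.
    return (key, sum(values))

lines = ["big data big clusters", "big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 3, "data": 2, "clusters": 1}
```

The programmer writes only the map and reduce functions; the framework handles distribution, shuffling, and fault tolerance.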
4. Hadoop was designed to let applications make the most of the cluster architecture by addressing two key points:
1. Layout of data across the cluster, ensuring data is evenly distributed
2. Design of applications that benefit from data locality
This gives rise to Hadoop's two main mechanisms: HDFS and Hadoop MapReduce.
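Point 1 can be sketched with a toy placement policy: distribute a file's blocks round-robin so every node holds a roughly equal share. This is an assumption for illustration; real HDFS placement is rack-aware and considers replication, not simple round-robin:

```python
def place_blocks(num_blocks, nodes):
    # Toy policy: assign block i to node (i mod number_of_nodes),
    # so blocks spread evenly across the cluster.
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        node = nodes[block_id % len(nodes)]
        placement[node].append(block_id)
    return placement

nodes = ["node1", "node2", "node3"]
placement = place_blocks(9, nodes)
# each of the 3 nodes ends up holding 3 of the 9 blocks
```

With data spread this way, MapReduce can schedule each task on a node that already holds the block it reads (point 2, data locality), avoiding network transfer.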
7. HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
8. Features
Highly fault tolerant
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
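Support for very large files rests on HDFS splitting every file into fixed-size blocks and replicating each block. As a rough sketch, assuming the common defaults of a 128 MB block size and a replication factor of 3:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default HDFS block size
REPLICATION = 3                  # default replication factor

def num_blocks(file_size):
    # A file occupies ceil(size / block_size) blocks; the last
    # block may be smaller than BLOCK_SIZE.
    return math.ceil(file_size / BLOCK_SIZE)

file_size = 1 * 1024 ** 3        # a 1 GB file
blocks = num_blocks(file_size)   # 8 blocks of 128 MB
replicas = blocks * REPLICATION  # 24 block replicas stored cluster-wide
```

Replicating blocks across DataNodes is what makes HDFS fault tolerant: losing one commodity machine loses no data, since each of its blocks exists on other nodes.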
16. Areas where Hadoop is not a good fit today
Hadoop can handle small datasets, but they do not unleash its power: there is overhead in distributing data across the cluster, so a small dataset gains little from Hadoop.
If the dataset is small and unstructured, it is better to collate the data first.
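The small-dataset overhead can be put in rough numbers. The NameNode keeps a metadata object in memory for every file and every block; a commonly cited figure is about 150 bytes per object (that exact constant is an assumption here, used only to show the shape of the problem):

```python
BYTES_PER_OBJECT = 150           # assumed NameNode memory per metadata object
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB default block size

def namenode_bytes(num_files, avg_file_size):
    # Each file needs one file object plus one object per block it occupies.
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# The same 1 GB of data stored two ways:
one_big = namenode_bytes(1, 1024 ** 3)        # one 1 GB file: 9 objects
many_small = namenode_bytes(1_000_000, 1024)  # a million 1 KB files: 2,000,000 objects
```

A single large file costs the NameNode about a kilobyte of metadata, while the same data as a million tiny files costs hundreds of megabytes, which is why many small files are a poor fit for Hadoop.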