2. Hadoop
What is HDFS?
Core components
Architecture
NameNode
Metadata
Secondary NameNode
HDFS Blocks
Limitations
File System commands
3. Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.
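The "simple programming model" is MapReduce. As an illustration only, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) counting words; real Hadoop runs these same phases in parallel across a cluster:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all counts for one word into a total.
    return (key, sum(values))

lines = ["big data big clusters", "big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 3, "data": 2, "clusters": 1}
```

The programmer writes only the map and reduce functions; the framework handles distribution, shuffling, and fault tolerance.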
4. Hadoop was designed to let applications make the most of the cluster architecture by addressing two key points:
1. Layout of data across the cluster, ensuring data is evenly distributed
2. Design of applications that benefit from data locality
This gives rise to Hadoop's two main mechanisms: HDFS and Hadoop MapReduce.
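Point 1 can be sketched with a toy placement policy: distribute a file's blocks round-robin so every node holds a roughly equal share. This is an assumption for illustration; real HDFS placement is rack-aware and considers replication, not simple round-robin:

```python
def place_blocks(num_blocks, nodes):
    # Toy policy: assign block i to node (i mod number_of_nodes),
    # so blocks spread evenly across the cluster.
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        node = nodes[block_id % len(nodes)]
        placement[node].append(block_id)
    return placement

nodes = ["node1", "node2", "node3"]
placement = place_blocks(9, nodes)
# each of the 3 nodes ends up holding 3 of the 9 blocks
```

With data spread this way, MapReduce can schedule each task on a node that already holds the block it reads (point 2, data locality), avoiding network transfer.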
7. HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
8. Features
Highly fault tolerant
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
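Support for very large files rests on HDFS splitting every file into fixed-size blocks and replicating each block. As a rough sketch, assuming the common defaults of a 128 MB block size and a replication factor of 3:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default HDFS block size
REPLICATION = 3                  # default replication factor

def num_blocks(file_size):
    # A file occupies ceil(size / block_size) blocks; the last
    # block may be smaller than BLOCK_SIZE.
    return math.ceil(file_size / BLOCK_SIZE)

file_size = 1 * 1024 ** 3        # a 1 GB file
blocks = num_blocks(file_size)   # 8 blocks of 128 MB
replicas = blocks * REPLICATION  # 24 block replicas stored cluster-wide
```

Replicating blocks across DataNodes is what makes HDFS fault tolerant: losing one commodity machine loses no data, since each of its blocks exists on other nodes.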
16. Areas where Hadoop is not a good fit today
Hadoop can handle small datasets, but they do not unleash its power: there is overhead in distributing data across the cluster, so a small dataset gains little from Hadoop.
If the dataset is small and unstructured, it is better to collate the data first.
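The small-dataset overhead can be put in rough numbers. The NameNode keeps a metadata object in memory for every file and every block; a commonly cited figure is about 150 bytes per object (that exact constant is an assumption here, used only to show the shape of the problem):

```python
BYTES_PER_OBJECT = 150           # assumed NameNode memory per metadata object
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB default block size

def namenode_bytes(num_files, avg_file_size):
    # Each file needs one file object plus one object per block it occupies.
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# The same 1 GB of data stored two ways:
one_big = namenode_bytes(1, 1024 ** 3)        # one 1 GB file: 9 objects
many_small = namenode_bytes(1_000_000, 1024)  # a million 1 KB files: 2,000,000 objects
```

A single large file costs the NameNode about a kilobyte of metadata, while the same data as a million tiny files costs hundreds of megabytes, which is why many small files are a poor fit for Hadoop.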