3. ++
Big Data for What?
Service
CAP Theorem, Fast Response, Scale Out, Schema-Free ...
Distributor with RDBMS
NoSQL
MongoDB, HBase, CouchDB ...
Analysis
Hadoop <--- today’s topic!!!
6. ++
single master
Strong Point
simple architecture
master has global knowledge.
file and block namespace (memory and disk)
mapping from files to blocks (memory and disk)
location of each block’s replicas (memory only)
master can make sophisticated decisions.
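The three kinds of master state listed above can be sketched as a small data structure. This is a minimal illustration, not HDFS code: the class name `MasterMetadata` and its methods are hypothetical. It assumes that the memory-only replica locations are rebuilt from datanode block reports after a restart.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # Persisted state: held in memory and also written to disk.
    namespace: set = field(default_factory=set)          # file paths
    file_to_blocks: dict = field(default_factory=dict)   # path -> [block ids]
    # Volatile state: memory only (assumed to be rebuilt from block reports).
    block_locations: dict = field(default_factory=dict)  # block id -> {datanodes}

    def create_file(self, path, block_ids):
        self.namespace.add(path)
        self.file_to_blocks[path] = list(block_ids)

    def report_blocks(self, datanode, block_ids):
        """Record which blocks a datanode says it holds."""
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return [(block id, sorted datanodes)] for a file."""
        return [(b, sorted(self.block_locations.get(b, ())))
                for b in self.file_to_blocks[path]]

    def restart(self):
        """Simulate a restart: only the persisted state survives."""
        self.block_locations = {}


meta = MasterMetadata()
meta.create_file("/logs/a.txt", ["blk_1", "blk_2"])
meta.report_blocks("dn1", ["blk_1", "blk_2"])
meta.report_blocks("dn2", ["blk_1"])
print(meta.locate("/logs/a.txt"))
# [('blk_1', ['dn1', 'dn2']), ('blk_2', ['dn1'])]
```

Because one process holds all three maps, a single lookup answers "which datanodes hold this file?", which is what makes the sophisticated placement decisions possible.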
8. ++
Fast Recovery for NameNode
Secondary Namenode
fetches namenode’s operation log
maintains a copy of namenode’s data
[Diagram: NameNode, DataNodes, Secondary NameNode]
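The idea of replaying the namenode's operation log onto a saved copy of its data can be sketched as follows. This is a toy model under stated assumptions: the operation names (`create`, `add_block`, `delete`) and the dict-based image are hypothetical and do not reflect the real on-disk formats.

```python
def apply_op(namespace, op):
    """Apply one logged operation to a namespace dict (path -> block ids)."""
    kind, path, *rest = op
    if kind == "create":
        namespace[path] = []
    elif kind == "add_block":
        namespace[path].append(rest[0])
    elif kind == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {kind}")


def checkpoint(image, operation_log):
    """Return a new image with every logged operation applied."""
    new_image = {path: list(blocks) for path, blocks in image.items()}
    for op in operation_log:
        apply_op(new_image, op)
    return new_image


image = {"/a": ["blk_1"]}
log = [("create", "/b"), ("add_block", "/b", "blk_2"), ("delete", "/a")]
new_image = checkpoint(image, log)
print(new_image)  # {'/b': ['blk_2']}
```

After a checkpoint the log can be truncated, so a restarting namenode only has to load the latest image and replay a short log. That is where the fast recovery comes from.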
9. ++
HA for NameNode
active namenode
performs normal namenode operations
standby namenode
maintains namenode’s data
ready to become the active namenode
[Diagram: NameNode (active), DataNodes, NameNode (standby)]
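A failover from the active to the standby namenode might look like the sketch below. It assumes both namenodes read edits from a shared log. The slide does not say how the standby keeps its data current, so the shared log and the class and function names here are illustrative only.

```python
class NameNode:
    def __init__(self, name, role):
        self.name = name
        self.role = role      # "active" or "standby"
        self.namespace = {}   # path -> block ids
        self.applied = 0      # number of shared-log entries applied so far

    def catch_up(self, shared_log):
        """Apply the log entries this node has not seen yet."""
        for path, blocks in shared_log[self.applied:]:
            self.namespace[path] = list(blocks)
        self.applied = len(shared_log)


def failover(standby, shared_log):
    """Promote the standby once it has applied every logged edit."""
    standby.catch_up(shared_log)
    standby.role = "active"
    return standby


shared_log = [("/a", ["blk_1"]), ("/b", ["blk_2"])]
active = NameNode("nn1", "active")
standby = NameNode("nn2", "standby")
active.catch_up(shared_log)
standby.catch_up(shared_log[:1])  # the standby lags one edit behind

promoted = failover(standby, shared_log)
print(promoted.role, promoted.namespace == active.namespace)  # active True
```

Because the standby keeps applying edits while idle, promotion only has to replay the few entries it has not seen yet.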
15. ++ namespace management and locking
goal
ensure proper serialization
use read locks and write locks
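One way read and write locks can serialize namespace operations is the GFS-style scheme sketched below. Using that scheme is an assumption, since the slide does not spell out the details. An operation takes read locks on every ancestor path and a read or write lock on the full path. The function names are hypothetical.

```python
def locks_for(path, write=True):
    """Return {name: mode} for an operation on an absolute path."""
    parts = path.strip("/").split("/")
    names = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    locks = {name: "read" for name in names[:-1]}   # ancestors: read locks
    locks[names[-1]] = "write" if write else "read"  # the target itself
    return locks


def conflicts(a, b):
    """Two lock sets conflict if they share a name and either side writes it."""
    return any(name in b and "write" in (mode, b[name])
               for name, mode in a.items())


create_a = locks_for("/home/user/a")
create_b = locks_for("/home/user/b")
snapshot = locks_for("/home/user")

print(conflicts(create_a, create_b))  # False
print(conflicts(snapshot, create_a))  # True
```

Two creations in the same directory hold only read locks on the shared ancestors, so they can run concurrently. An operation that write-locks the directory itself is serialized against both.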
16. ++
block replica placement
goal
maximize data reliability and availability
maximize network bandwidth utilization
default strategy is ...
one on the writer’s local datanode.
one on another datanode in the same rack.
one on a datanode in a different rack.
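The three-replica strategy above can be sketched as a small function. Two assumptions apply: the first replica goes on the datanode the client writes from, and candidates are picked in sorted order so the result is deterministic (a real system would typically pick at random). The function name and the topology layout are hypothetical.

```python
def place_replicas(writer, topology):
    """Pick three datanodes for a new block.

    topology maps rack name -> list of datanode names; writer is the
    datanode the client is writing from.
    """
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    same_rack = sorted(n for n in topology[local_rack] if n != writer)
    other_rack = sorted(n for rack, nodes in topology.items()
                        if rack != local_rack for n in nodes)
    if not same_rack or not other_rack:
        raise ValueError("topology too small for the three-replica strategy")
    # Local copy, same-rack copy, off-rack copy.
    return [writer, same_rack[0], other_rack[0]]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas("dn1", topology))  # ['dn1', 'dn2', 'dn3']
```

The two same-rack copies keep most write traffic off the inter-rack links, while the off-rack copy survives the loss of a whole rack.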
17. ++ creation, re-replication, rebalancing
creation
client creates new files
consider
disk space utilization
number of recent creations
spread replicas
re-replication
number of available replicas falls below the replication goal
datanode down, replica corruption ...
rebalancing
move replicas for better disk space and load balancing
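Re-replication can be sketched in two steps: find blocks whose live replica count is below the goal, then pick a target datanode with low disk utilization. The ordering used here (most replicas missing first) and both function names are assumptions for illustration.

```python
def under_replicated(block_locations, live_datanodes, goal=3):
    """Return [(block id, missing count)], most urgent first."""
    live_set = set(live_datanodes)
    result = []
    for block_id, holders in block_locations.items():
        live = len(set(holders) & live_set)
        if live < goal:
            result.append((block_id, goal - live))
    return sorted(result, key=lambda item: (-item[1], item[0]))


def pick_target(disk_used, holders):
    """Choose the least-utilized datanode that does not hold the block yet."""
    candidates = {dn: used for dn, used in disk_used.items() if dn not in holders}
    return min(candidates, key=lambda dn: (candidates[dn], dn))


block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn4"},
    "blk_3": {"dn2", "dn3"},
}
live = {"dn1", "dn2", "dn3"}  # dn4 is down
print(under_replicated(block_locations, live))
# [('blk_2', 2), ('blk_3', 1)]

disk_used = {"dn1": 0.9, "dn2": 0.4, "dn3": 0.6}
print(pick_target(disk_used, holders={"dn1"}))  # dn2
```

Rebalancing can reuse the same utilization data, moving replicas from the fullest datanodes to the emptiest ones.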
18. ++
garbage collection
what’s garbage?
block not in namenode’s metadata.
mechanism
when exchanging HeartBeat with namenode,
datanode reports a subset of the blocks it has.
namenode replies with the blocks that are garbage.
datanode deletes garbage blocks.
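The heartbeat exchange above can be sketched as a set difference on the namenode side. For brevity the datanode here reports all of its blocks rather than a subset. The class and function names are hypothetical.

```python
def find_garbage(known_blocks, reported_blocks):
    """Namenode side: reported blocks that are absent from the metadata."""
    return sorted(set(reported_blocks) - set(known_blocks))


class DataNode:
    def __init__(self, blocks):
        self.blocks = set(blocks)

    def heartbeat(self, known_blocks):
        """Report held blocks, then delete whatever comes back as garbage."""
        garbage = find_garbage(known_blocks, self.blocks)  # namenode's reply
        self.blocks -= set(garbage)
        return garbage


known = {"blk_1", "blk_2"}  # blocks present in the namenode's metadata
dn = DataNode(["blk_1", "blk_2", "blk_9"])
print(dn.heartbeat(known))  # ['blk_9']
print(sorted(dn.blocks))    # ['blk_1', 'blk_2']
```

Deleting a file therefore only has to update the namenode's metadata. The orphaned blocks are reclaimed lazily on later heartbeats.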