Hadoop is an open-source software framework for the distributed storage and processing of large datasets using the MapReduce programming model. It comprises HDFS for storage and MapReduce for processing across clusters of compute nodes. HDFS provides high-throughput access to application data, making it well suited to applications with very large datasets, and achieves reliability through data replication across its distributed architecture.
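To illustrate the MapReduce model described above, the sketch below simulates the three phases of a word-count job (map, shuffle, reduce) in plain Python. This is an illustrative simulation only, not the Hadoop Java API; the function names are our own:

```python
from collections import defaultdict

# Illustrative simulation of the MapReduce model (not the Hadoop API).

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in one input line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all counts emitted for one word.
    return (key, sum(values))

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

In a real Hadoop job the map and reduce functions run in parallel on many nodes, with HDFS holding the input splits and the framework performing the shuffle over the network.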
• Hadoop documentation and installation
– http://hadoop.apache.org/
• Hadoop Wiki
– http://wiki.apache.org/hadoop/
• Google File System Paper
– http://labs.google.com/papers/gfs.html
Copyright 2009 - Trend Micro Inc.