3. Context: Exponential for Decades
Abundance of
- computing & storage
- generated data (estimated 8ZB in ’15)
- things
More data provides greater value
Traditional data systems don’t scale well
It’s time for a new approach!
Introduction to the Hadoop Ecosystem
3
4. New Hardware Approach
Traditional:
- exotic HW: big central servers, SAN, RAID
- relies on hardware reliability
- limited scalability
- expensive
Big Data:
- commodity HW: racks of pizza boxes, Ethernet, JBOD
- tolerates unreliable HW
- scales further
- cost effective
5. New Software Approach
Traditional:
- monolithic, centralized
- RDBMS, schema first
- proprietary
Big Data:
- distributed storage & compute nodes
- raw data
- open source
6. Hadoop
De facto big data industry standard (batch)
Vendor adoption
- IBM, Microsoft, Oracle, EMC, ...
A collection of projects at Apache
- HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, ...
Main components
- HDFS
- MapReduce
Cluster
Set of machines running HDFS and MapReduce
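HDFS keeps a cluster reliable despite unreliable commodity hardware by splitting files into large blocks (128 MB by default in recent versions) and replicating each block on several nodes (3 by default). The sketch below simulates that idea only; the function, node names, and round-robin placement are illustrative assumptions, not HDFS's actual rack-aware placement policy:

```python
import itertools

def place_blocks(file_size_mb, block_size_mb, nodes, replication=3):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes (simplified round-robin placement;
    real HDFS placement is rack-aware)."""
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    starts = itertools.cycle(range(len(nodes)))
    placement = {}
    for block_id in range(n_blocks):
        start = next(starts)
        placement[block_id] = [nodes[(start + i) % len(nodes)]
                               for i in range(replication)]
    return placement

# A 300 MB file with 128 MB blocks yields 3 blocks,
# each stored on 3 of the 5 nodes.
layout = place_blocks(300, 128, ["node1", "node2", "node3", "node4", "node5"])
for block_id, hosts in layout.items():
    print(f"block {block_id}: {hosts}")
```

Losing any single node still leaves two copies of every block, which is why JBOD on pizza boxes can replace SAN and RAID.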
8. MapReduce
(slides 8–10: diagram-only slides illustrating the MapReduce flow)
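The diagrams' map/shuffle/reduce flow can be sketched in plain Python. This is an illustrative word-count simulation, not the Hadoop API: the function names and the two sample lines are assumptions, and the shuffle step stands in for the grouping the framework performs between phases:

```python
from collections import defaultdict

def map_phase(records):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because mappers and reducers only see independent keys, Hadoop can run thousands of them in parallel across the cluster and rerun any that fail.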
11. Typical Adoption Pattern
An idea that’s impractical without Hadoop
Build Hadoop-based POC
Move initial application to production
Add more datasets and users
- removing data silos in organizations
- permitting easy experiments on real data
Snowballs into institution’s central repository for
- analysis
- data processing
- data service layer
12. Use Case 1: Truvo
13. Use Case 2: UZ Brussel
14. How can you use Hadoop?
What data are you ignoring?
- How can you use it?
How can you combine internal and external data?
- Business partners
- Feedback from your customers through social media
- End your data silos
- ...