2. About Me
david.engfer@gmail.com
@engfer
Meetup organizer for DFWBigData.org
> Hadoop, Cassandra, and all other things
BigData and NoSQL
> Join up!
Sr. Consultant @
> Rapidly growing national IT consulting firm
focused on career development while operating within a
local-office project model
3. What is Hadoop?
0 “framework for running [distributed] applications on
large clusters built of commodity hardware”
–from the Hadoop Wiki
0 Originally created by Doug Cutting
> Named the project after his son’s toy elephant
0 The name “Hadoop” has now evolved
to cover a family of products, but at its
core, it’s essentially just the
MapReduce programming paradigm
+ a distributed file system
7. History
Google white papers:
> Google File System • 2003
> MapReduce • 2004
> BigTable • 2006
White papers + growing pains at scale led to solutions
8. History
White papers → Hadoop Core (c. 2005)
> Google File System • 2003
> MapReduce • 2004
> BigTable • 2006
9. Hadoop Distributed
File System (HDFS)
0 OSS implementation of Google File System (bit.ly/ihXkof)
0 Master/slave architecture
0 Designed to run on commodity hardware
0 Hardware failures assumed in design
0 Fault-tolerant via replication
0 Semi-POSIX compliance; relaxed for performance
0 Unix-like permissions; ties into host’s users & groups
10. Hadoop Distributed
File System (HDFS)
0 Written in Java
0 Optimized for larger files
0 Focus on streaming data (high-throughput > low-latency)
0 Rack-aware
0 Only *nix for production env.
0 Web consoles for stats
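The replication and rack-awareness points above can be sketched roughly. Assuming a default-style placement policy (first replica on the writer's node, second and third on a different rack), a toy Python version might look like this — illustration only, not HDFS's actual Java implementation, and the node/rack names are hypothetical:

```python
import random

def place_replicas(writer_node, topology, replication=3):
    """Sketch of HDFS-style rack-aware placement: replica 1 on the
    writer's node, replicas 2 and 3 on two nodes in another rack.
    `topology` maps rack name -> list of node names (assumed layout)."""
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    first = writer_node
    # pick a rack other than the writer's for the remaining replicas
    other_racks = [r for r in topology if r != rack_of[first]]
    second_rack = random.choice(other_racks)
    second = random.choice(topology[second_rack])
    third = random.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third][:replication]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas("dn1", topology)
```

Losing one rack then still leaves at least one replica reachable, which is the point of the policy.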
11. HDFS Client APIs
0 “Shell-like” commands (hadoop dfs [cmd])
> cat, chgrp, chmod, chown, copyFromLocal, copyToLocal, cp,
du, dus, expunge, get, getmerge, ls, lsr, mkdir, moveFromLocal,
mv, put, rm, rmr, setrep, stat, tail, test, text, touchz
0 Native Java API
0 APIs for other languages (http://bit.ly/fLgCJC)
> C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa,
Smalltalk, and OCaml
12. Other HDFS Admin Tools
0 hadoop dfsadmin [opts]
> Basic admin utilities for the DFS cluster
> Change file-level replication factors, set quotas, upgrade,
safemode, reporting, etc
0 hadoop fsck [opts]
> Runs distributed file system checking and fixing utility
0 hadoop balancer
> Utility that rebalances block storage across the nodes
13. HDFS Node Types
Master: NameNode
0 Single node responsible for:
> Filesystem metadata operations on the cluster
> Replication and locations of file blocks
0 SPOF =( (see backups below)
CheckpointNode or BackupNode
0 Nodes responsible for:
> NameNode backup mechanisms
Slaves: DataNodes
0 Nodes responsible for:
> Storage of file blocks
> Serving actual file data to clients
14. HDFS Architecture
Clients issue FS/namespace/meta ops to the NameNode
NameNode → BackupNode (namespace backups)
NameNode ↔ DataNodes (heartbeats, balancing, replication, etc.)
DataNodes serve file data to clients and write blocks to local disk
25. Fault Tolerance?
When a DataNode is lost, its blocks are auto-replicated on the
remaining nodes to satisfy the replication factor
28. Fault Tolerance?
NameNode loss = FAIL (requires manual intervention)
Not an EPIC fail, because you have the BackupNode to replay
any FS operations
**automatic failover is in the works
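The re-replication behavior described in these frames can be sketched with hypothetical block and node names (the real logic lives inside the NameNode):

```python
def rereplicate(block_locations, dead_node, live_nodes, factor=3):
    """Sketch: when a DataNode dies, find blocks that fell below the
    replication factor and schedule new copies on surviving nodes.
    block_locations: block id -> set of nodes holding a replica."""
    for block, nodes in block_locations.items():
        nodes.discard(dead_node)                      # node is gone
        # pick replacement targets that don't already hold the block
        candidates = [n for n in live_nodes if n not in nodes]
        while len(nodes) < factor and candidates:
            nodes.add(candidates.pop())
    return block_locations

blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn4", "dn5"}}
healed = rereplicate(blocks, dead_node="dn2",
                     live_nodes=["dn1", "dn3", "dn4", "dn5"])
```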
29. Live horizontal scaling and rebalancing
The NameNode detects that a new DataNode has been added
to the cluster
30. Live horizontal scaling and rebalancing
Blocks are re-balanced and re-distributed across the
DataNodes
33. Live horizontal scaling and rebalancing
Once the replication factor is satisfied, extra replicas are
removed
35. Other HDFS Utils
0 HDFS RAID (http://bit.ly/fqnzs5)
> Uses distributed RAID instead of replication
(useful at petabyte scale)
0 Flume/Scribe/Chukwa
> Log collection and aggregation frameworks that support
streaming log data to HDFS
> Flume = Cloudera (http://bit.ly/gX8LeO)
> Scribe = Facebook (http://bit.ly/dIh3If)
36. MapReduce
0 Distributed programming paradigm and framework that is
the OSS implementation of Google’s MapReduce
(http://bit.ly/gXZbsk)
0 Modeled on the map() and reduce() operations from
functional programming
> Distributed across as many nodes as you would like
0 2-phase process:
> map( ): sub-divide & conquer
> reduce( ): combine & reduce cardinality
37. MapReduce ABC’s
0 Essentially, it’s…
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output from all sub-problems
0 Ex: Searching
1. Take a large problem and divide it into sub-problems
# Different groups of rows in DB; different parts of files; 1 user from a list of
users; etc.
2. Perform the same function on all sub-problems
# Search for a key in the given partition of data for the sub-problem; count
words; etc.
3. Combine the output from all sub-problems
# Combine the results into a result-set and return to the client
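The three steps above can be sketched in plain Python as a single-process word-count simulation (this is the idea only, not the Hadoop API):

```python
from itertools import groupby

def map_phase(chunk):
    # step 2: run the same function on every sub-problem -- emit (word, 1)
    for word in chunk.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # step 3: combine the output for one key
    return (word, sum(counts))

chunks = ["the quick brown fox", "the lazy dog", "the fox"]  # step 1: sub-problems
pairs = sorted(kv for c in chunks for kv in map_phase(c))    # shuffle/sort by key
result = dict(reduce_phase(k, [v for _, v in g])
              for k, g in groupby(pairs, key=lambda kv: kv[0]))
```

In real Hadoop the chunks would be HDFS input splits and the sort/group step happens in the framework's shuffle phase.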
38. M/R Facts
0 M/R is excellent for problems where the “sub-problems”
are not interdependent
> For example, the output of one “mapper” should not depend
on the output or communication with another “mapper”
0 The reduce phase does not begin execution until all
mappers have finished
0 Failed map and reduce tasks get auto-restarted
0 Rack/HDFS-aware
41. Hadoop’s MapReduce
0 MapReduce tasks are submitted as a “job”
> Jobs can be assigned to a specified “queue” of jobs
# By default, jobs are submitted to the “default” queue
> Job submission is controlled by ACLs for each queue
0 Rack-aware and HDFS-aware
> The JobTracker communicates with the HDFS NameNode
and schedules map/reduce operations using input data
locality on HDFS DataNodes
42. M/R Nodes
Master: JobTracker
0 Single node responsible for:
> Coordinating all M/R tasks & events
> Managing job queues and scheduling
> Maintaining and controlling TaskTrackers
> Moving/restarting map/reduce tasks if needed
0 SPOF =(
> Uses “checkpointing” to combat this
Slaves: TaskTrackers
0 Worker nodes responsible for:
> Executing individual map and reduce tasks as assigned
by the JobTracker (in a separate JVM)
43. Conceptual Overview
The JobTracker controls and heartbeats the TaskTracker nodes
TaskTrackers store temporary data on HDFS
44. Job Submission
M/R clients submit jobs to the JobTracker, where jobs get
queued
map()’s are assigned to TaskTrackers
(HDFS DataNode locality aware)
Mappers are spawned in a separate JVM and execute;
mappers store their results on HDFS (temporary data)
45. Job Submission
The reduce phase then begins: reducers are assigned to
TaskTrackers and read the temporary (mapper) data from HDFS
46. MapReduce Tips
0 Keys and values can be any type of object
> Can specify custom data splitters, partitioners, combiners,
InputFormat’s, and OutputFormat’s
0 Use ToolRunner.run(Tool) to run your Java jobs…
> Will use GenericOptionsParser and DistributedCache so that the
-files, -libjars, & -archives options are available to distribute
your mappers, reducers, and any other utilities
> Without this, your mappers, reducers, and other utilities will
not be propagated and added to the classpath of the other
nodes (ClassNotFoundException)
48. Other M/R Utils
0 $HADOOP_HOME/contrib/*
> PriorityScheduler & FairScheduler
> HOD (Hadoop On Demand)
# Uses TORQUE resource manager to dynamically allocate, use,
and destroy MapReduce clusters on an as-needed basis
# Great for development and testing
> Hadoop Streaming (next slide...)
0 Amazon’s Elastic MapReduce (EMR)
> Essentially production HOD for EC2 data/clusters
49. Hadoop Streaming
0 Allows you to write MapReduce jobs in languages other than
Java by running any command line process
> Input data is partitioned and given to the standard input (STDIN) of
the command line mappers and reducers specified
> Output (STDOUT) from the command line mappers and reducers
get combined into the M/R pipeline
0 Can specify custom partitioners and combiners
0 Can specify files & archives to propagate to all nodes and
unpack on local file system (-archives & -file)
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
-D mapred.job.name="Foo bar" \
-input "/foo/bar/input.txt" \
-mapper splitz.py \
-reducer /bin/wc \
-output "/foo/baz/out" \
-archives 'hdfs://hadoop1/foo/bar/cachedir.jar' \
-file ~/scripts/splitz.py
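`splitz.py` in the command above is a hypothetical mapper. A minimal streaming-style mapper/reducer pair could look like the sketch below — in a real job each function would read STDIN and write STDOUT line by line; here they take and return iterables so the pipeline can be simulated in-process:

```python
def mapper(lines):
    # streaming mapper contract: read raw input lines,
    # emit tab-separated "key\tvalue" pairs
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # streaming reducers see mapper output already sorted by key
    current, total = None, 0
    for line in lines:
        key, value = line.rsplit("\t", 1)
        if current is not None and key != current:
            yield f"{current}\t{total}"   # key changed: emit the count
            total = 0
        current = key
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

# simulate the framework's shuffle/sort between the two phases
mapped = sorted(mapper(["hello world", "hello hadoop"]))
counts = list(reducer(mapped))
```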
50. Pig
0 Framework and language (Pig Latin) for creating and
submitting Hadoop MapReduce jobs
0 Common data operations (not supported by POJO-M/R)
like join, group, filter, sort, select, etc. are provided
0 Don’t need to know Java
0 Removes boilerplate aspect from M/R
> ~200 lines in Java → ~15 lines in Pig!
0 Relational qualities (reads and feels SQL-ish)
51. Pig
0 Fact from Wiki: 40% of Yahoo’s M/R jobs are in Pig
0 Interactive shell (grunt) exists
0 User Defined Functions (UDFs)
> Allow you to specify Java code where the logic may be too
complex for Pig Latin
> UDFs can be part of almost every operation in Pig Latin
> Great for loading and storing custom formats as well as
transforming data
52. Pig Relational Operations
COGROUP, CROSS, DISTINCT, FILTER, FOREACH, GROUP,
JOIN, LIMIT, LOAD, MAPREDUCE, ORDER BY, SAMPLE,
SPLIT, STORE, STREAM, UNION
most of these are pretty self-explanatory
53. Example Pig Script
Taken from the Pig tutorial on the Pig wiki: The Temporal Query Phrase Popularity script processes a
search query log file from the Excite search engine and compares the frequency of occurrence of
search phrases across two time periods separated by twelve hours.
01: REGISTER ./tutorial.jar;
02: raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
03: clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
04: clean2 = FOREACH clean1 GENERATE user, time,
org.apache.pig.tutorial.ToLower(query) as query;
05: houred = FOREACH clean2 GENERATE user,
org.apache.pig.tutorial.ExtractHour(time) as hour, query;
06: ngramed1 = FOREACH houred GENERATE user, hour,
flatten(org.apache.pig.tutorial.NGramGenerator(query)) as ngram;
07: ngramed2 = DISTINCT ngramed1;
08: hour_frequency1 = GROUP ngramed2 BY (ngram, hour);
09: hour_frequency2 = FOREACH hour_frequency1 GENERATE flatten($0),
COUNT($1) AS count;
10: hour_frequency3 = FOREACH hour_frequency2 GENERATE $0 as ngram,
$1 as hour, $2 as count;
11: hour00 = FILTER hour_frequency2 BY hour eq '00';
12: hour12 = FILTER hour_frequency3 BY hour eq '12';
13: same = JOIN hour00 BY $0, hour12 BY $0;
14: same1 = FOREACH same GENERATE hour_frequency2::hour00::group::ngram as
ngram, $2 as count00, $5 as count12;
15: STORE same1 INTO '/tmp/tutorial-join-results' USING PigStorage();
54. Example Pig Script
Same script as the previous slide, annotated: the
org.apache.pig.tutorial.* calls on lines 03–06 are UDFs.
Now... imagine this equivalent in Java...
55. ZooKeeper
0 Centralized coordination service for use by distributed
applications
> Configuration, naming, synchronization (locks), ownership (master
election), etc.
ZooKeeper Service: one elected Leader among several Servers;
many Clients, each connected to one Server
0 Important system guarantees:
> Sequential consistency (great for locking)
> Atomicity – all or nothing at all
> Data consistency – all clients view same system state regardless of
the server it connects to
56. ZooKeeper
0 Hierarchical namespace of “znodes” (like directories)
0 Operations:
> create a node at a location in the tree
> delete a node
> exists - tests if a node exists at a location
> get data from a node
> set data on a node
> get children from a node
> sync - waits for data to be propagated
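The operations above can be modeled with a toy in-memory znode tree — purely illustrative; real clients use a ZooKeeper client library, and this sketch ignores watches, ephemeral/sequential nodes, and the replicated consensus that makes ZooKeeper useful:

```python
class ZNodeTree:
    """Toy hierarchical namespace of znodes (path -> data bytes)."""
    def __init__(self):
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        assert parent in self.nodes, "parent znode must exist"
        self.nodes[path] = data

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        return self.nodes[path]

    def set_data(self, path, data):
        self.nodes[path] = data

    def get_children(self, path):
        # children are znodes whose parent path equals `path`
        return sorted(p.rsplit("/", 1)[1] for p in self.nodes
                      if p != "/" and (p.rsplit("/", 1)[0] or "/") == path)

    def delete(self, path):
        assert not self.get_children(path), "znode must have no children"
        del self.nodes[path]

zk = ZNodeTree()
zk.create("/app")
zk.create("/app/lock-1", b"owner=worker-a")   # e.g. a lock/ownership marker
```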
57. HBase
0 Sparse, non-relational, column-oriented distributed database
built on top of Hadoop Core (HDFS + MapReduce)
0 Modeled after Google’s BigTable (http://bit.ly/fQ1NMA)
0 NoSQL = “Not Only SQL”... not “SQL is terrible”
0 HBase also has:
> Strong consistency model
> In-memory operation
> LZO compression (optional)
> Live migrations
> MapReduce support for querying
58. What HBase Is…
0 Good at fast/streaming writes
0 Fault tolerant
0 Good at linear horizontal scalability
0 Very efficient at managing billions of rows and millions of
columns
0 Good at keeping row history
0 Good at auto-balancing
0 A complement to a SQL DB/warehouse
0 Great with non-normalized data
59. What HBase Is NOT…
0 Made for table joins
0 Made for splitting into normalized tables (see previous)
0 A complete replacement for a SQL relational database
0 A complete replacement for a SQL data warehouse
0 Great for storing small amounts of data
0 Great for storing gobs of large binary data
0 The best way to do OLTP
0 The best way to do live ad hoc querying of any column
0 A replacement for a proper caching mechanism
0 ACID compliant (http://bit.ly/hhFXCS)
60. HBase Facts
0 Written in Java
0 Uses ZooKeeper to store metadata and -ROOT- region
0 Column-oriented store = flexible schema
> Can alter the schema simply by adding the column name and
data on insert (“put”)
> No schema migrations!
0 Every column has a timestamp associated with it
> Same column with most recent timestamp wins
0 Can export metrics for use with Ganglia, or as JMX
0 hbase hbck
> Check for errors and fix them (like HDFS fsck)
61. HBase Client APIs
0 jRuby interactive shell (hbase shell)
> DDL/DML commands
> Admin commands
> Cluster commands
0 Java API (http://bit.ly/ij0MgF)
0 REST API
> Provided using Stargate
0 API for other languages (http://bit.ly/fLgCJC)
62. Column-Oriented?
0 Traditional RDBMS use row-oriented storage, which stores
entire rows sequentially on disk:
[Row 1 – Cols 1-3] [Row 2 – Cols 1-3] [Row 3 – Cols 1-3]
0 Whereas column-oriented storage stores the columns of
each row (or column-families) sequentially on disk:
[Row 1 – Col 1] [Row 2 – Col 1] [Row 3 – Col 1]
[Row 1 – Col 2] [Row 2 – Col 2] [Row 3 – Col 2]
[Row 1 – Col 3] [Row 3 – Col 3]
Where’s Row 2 – Col 3? Not needed: because columns are stored
sequentially, rows have a flexible schema!
63. Think of HBase Tables As…
0 More like JSON, and less like spreadsheets:
{
  "1" : {                                  // row id
    "A" : { v: "x", ts: 4282 },            // columns
    "B" : { v: "z", ts: 4282 }
  },
  "aaaaa" : {
    "A" : { v: "y", ts: 4282 }
  },
  "xyz" : {
    "address" : {                          // column families allow grouping
      "line1" : { v: "hello", ts: 4282 },  //   of columns (faster retrieval)
      "line2" : { v: "there", ts: 4282 },  // recent TS = default col value
      "line2" : { v: "there", ts: 1234 }   // old TS
    },
    "fooo" : { v: "wow!", ts: 4282 }       // flexible schema
  },
  "zzzzz" : {
    "A" : { v: "woot", ts: 4282 },         // value & timestamp (TS)
    "B" : { v: "1337", ts: 4282 }
  }
}
Modified from http://bit.ly/hbGWIG
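The "most recent timestamp wins" rule behind those versioned cells can be sketched as follows — an illustrative structure with made-up values, not HBase's client API:

```python
# row -> (family, column) -> list of timestamped versions
table = {
    "xyz": {
        ("address", "line2"): [
            {"v": "there", "ts": 4282},   # newest version
            {"v": "here",  "ts": 1234},   # older version kept as history
        ],
    },
}

def get(row, family, column):
    # a plain read returns the version with the newest timestamp
    versions = table[row][(family, column)]
    return max(versions, key=lambda cell: cell["ts"])["v"]

def get_as_of(row, family, column, ts):
    # "time-travel" read: newest version at or before `ts`
    versions = [c for c in table[row][(family, column)] if c["ts"] <= ts]
    return max(versions, key=lambda cell: cell["ts"])["v"]
```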
64. HBase Overview
The Master server keeps track of the metadata for
RegionServers and their Regions and stores it in ZooKeeper
The HBase client communicates with the ZooKeeper cluster
only to get Region information; no data is sent through the
Master
The actual row “data” (bytes) is sent directly to and from the
RegionServers
Therefore, neither the Master server nor the ZooKeeper cluster
serves as a data bottleneck
Pretty diagrams from Lars George:
http://goo.gl/wRLJP & http://goo.gl/6ehnV
65. HBase Overview
All HBase data (HLog and HFiles) is stored on HDFS
HDFS breaks files into 64MB chunks and replicates the chunks
N times (3 by default) onto “actual” disk (giving HBase its fault
tolerance)
Pretty diagrams from Lars George: http://goo.gl/wRLJP
66. Understanding HBase
Tables are split into groups of ~100 rows (configurable) called
Regions
Regions are assigned to particular RegionServers by the Master
server. The Master only contains region-location metadata and
no “real” row data.
Pretty diagrams from Lars George:
http://goo.gl/wRLJP & http://goo.gl/6ehnV
67. Writing to HBase
1) The HBase client gets the assigned RegionServers (and
regions) from the Master server for the particular keys (rows)
in question and sends commands/data
2) The transaction is written to the write-ahead-log on HDFS
(disk) first
3) The same data is written to the in-memory store for the
assigned region (row group)
4) The in-memory store is periodically flushed to HDFS (disk)
when its size reaches a threshold
Pretty diagrams from Lars George:
http://goo.gl/wRLJP & http://goo.gl/6ehnV
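Steps 2–4 of that write path can be sketched as follows. All names here are illustrative (the real structures are HBase's HLog and memstore), and the flush trigger is simplified to a count instead of a byte size:

```python
class RegionWriter:
    """Sketch of the write path: WAL first, memstore second,
    periodic flush to immutable files on 'disk'."""
    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log (would live on HDFS)
        self.memstore = {}     # in-memory store for this region
        self.flushed = []      # immutable files flushed to HDFS
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))      # 2) durable log entry first
        self.memstore[row] = value         # 3) then the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                   # 4) flush once past threshold

    def flush(self):
        self.flushed.append(dict(self.memstore))
        self.memstore.clear()

w = RegionWriter()
for row in ("r1", "r2", "r3", "r4"):
    w.put(row, b"data")
```

Because every put hits the WAL before the memstore, a crash loses no acknowledged writes: the log can be replayed to rebuild the in-memory state.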
68. HBase Scalability
Additional RegionServers can be added to the live system.
The Master server will then rebalance the cluster to migrate
Regions onto the new RegionServers.
Moreover, additional HDFS DataNodes can be added to give
more disk space to the HDFS cluster.
Pretty diagrams from Lars George:
http://goo.gl/wRLJP & http://goo.gl/6ehnV
70. Hive
0 Data warehouse infrastructure on top of Hadoop Core
> Stores data on HDFS
> Allows you to add custom MapReduce plugins
0 HiveQL
> SQL-like language pretty close to ANSI SQL
# Supports joins
> JDBC driver exists
0 Has interactive shell (like MySQL & PostgreSQL) to run
interactive queries
71. Hive
0 When running a HiveQL query/script, in the background Hive
creates and runs a series of MapReduce jobs
> BigData means it can take a long time to run queries
0 Therefore, it’s good for offline BigETL, but not a good
replacement for an OLTP/OLAP data warehouse (like Oracle)
0 Learn more from the wiki: http://bit.ly/epauio

> SHOW TABLES;
> CREATE TABLE rating (
    userid INT,
    movieid INT,
    rating INT,
    unixtime STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;
> DESCRIBE rating;
72. Other useful utilities around
Hadoop
0 Sqoop (http://bit.ly/eRfVEJ)
> Load SQL data from a table into HDFS or Hive
> Generates Java classes to interact with the loaded data
0 Oozie (http://bit.ly/eNLi3B)
> Orchestrates complex workflows around multiple MapReduce jobs
0 Mahout (http://bit.ly/hCXRjL)
> Algorithm library for collaborative filtering, clustering, classifiers,
and machine learning
0 Cascading (http://bit.ly/gyZNiI)
> Data query abstraction layer similar to Pig
> Java API that sits on top of MapReduce framework
> Since it’s a Java API you can use it with any program that uses a JVM
language: Groovy, Scala, Clojure, jRuby, jython, etc.
73. What about support?
0 Community, wikis, forums, IRC
0 Cloudera provides enterprise support
> Offerings:
# Cloudera Enterprise
# Support, professional services, training, management apps
> Cloudera Distribution of Hadoop (CDH)
# Tested and hardened version of Hadoop products plus some
other goodies (oozie, flume, hue, sqoop, whirr)
~ Separate codebase, but patches are made to and from the Apache versions
# Packages: debian, redhat, EC2, VM
If you want to try Hadoop, CDH is probably the way to go.
I recommend this instead of downloading each project individually.
75. Where the heck can I use this stuff?
0 The hardest part is finding the right use-cases to apply Hadoop
(and any NoSQL system)
> SQL databases are great for data that fits on one machine
> Lots of tooling support for SQL; not as much for Hadoop (yet)
0 A few questions to think about:
> How much data are you processing?
> Are you throwing away valuable data due to space?
> Are you processing data where steps aren’t interdependent?
0 Log storage, log processing, utility data, research data,
biological data, medical records, events, mail, tweets, market
data, financial data