My Hadoop Ecosystem presentation at the 2011 BreizhCamp.
See the talk video (in French):
http://mediaserver.univ-rennes1.fr/videos/?video=MEDIA110628093346744
96. I bet you can understand the following
A = LOAD 'mydata' USING PigStorage() AS (url, time, size);
B = GROUP A BY url;
C = FOREACH B GENERATE group,
    COUNT(A),
    SUM(A.size),
    SUM(A.time)/COUNT(A);
DUMP C;
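For readers new to Pig, here is a minimal Python sketch of what the script computes. It assumes the input is a list of (url, time, size) tuples, and the sample records are made up. For each url it emits the record count, the total size and the average time. Pig does not guarantee output order, whereas this sketch keeps first-seen order.

```python
from collections import defaultdict


def summarize(records):
    """Mimic GROUP A BY url, then COUNT(A), SUM(A.size), SUM(A.time)/COUNT(A)."""
    groups = defaultdict(list)  # url -> list of (time, size) pairs
    for url, time, size in records:
        groups[url].append((time, size))
    result = []
    for url, rows in groups.items():
        count = len(rows)
        total_size = sum(size for _, size in rows)
        avg_time = sum(time for time, _ in rows) / count
        result.append((url, count, total_size, avg_time))
    return result


# Hypothetical sample records standing in for 'mydata'.
data = [
    ("/index", 120, 512),
    ("/about", 80, 256),
    ("/index", 100, 1024),
]
for row in summarize(data):
    print(row)
```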
100. Offers HiveQL, close to SQL
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
friends ARRAY<BIGINT>, properties MAP<STRING, STRING>,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;
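The following is a minimal Python sketch of how one page_view row is laid out under ROW FORMAT DELIMITED. It assumes the control-character delimiters '\001', '\002' and '\003': '\001' separates columns, '\002' separates array elements and map entries, and '\003' separates a map key from its value. encode_row and the sample values are hypothetical. The sketch does no escaping or NULL handling. It also leaves out the partition columns dt and country, which Hive encodes in the directory path rather than in each row.

```python
FIELD_SEP = "\x01"  # FIELDS TERMINATED BY '\001'
ITEM_SEP = "\x02"   # COLLECTION ITEMS TERMINATED BY '\002'
KEY_SEP = "\x03"    # MAP KEYS TERMINATED BY '\003'


def encode_row(view_time, userid, page_url, referrer_url, friends, properties, ip):
    """Serialize one page_view row (minus partition columns) to delimited text."""
    friends_text = ITEM_SEP.join(str(f) for f in friends)
    props_text = ITEM_SEP.join(f"{k}{KEY_SEP}{v}" for k, v in properties.items())
    fields = [str(view_time), str(userid), page_url, referrer_url,
              friends_text, props_text, ip]
    return FIELD_SEP.join(fields)


row = encode_row(1300000000, 42, "http://example.com/a", "http://xyz.com",
                 [7, 9], {"browser": "firefox"}, "10.0.0.1")
print(row.split(FIELD_SEP))
```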
INSERT OVERWRITE TABLE xyz_com_page_views
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'
AND page_views.referrer_url like '%xyz.com';
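The WHERE clause keeps March 2008 views whose referrer ends with xyz.com. Below is a rough Python equivalent over hypothetical dict rows. xyz_march_views is an illustrative name, and `date` is treated as an ISO 'YYYY-MM-DD' string.

```python
def xyz_march_views(page_views):
    """Rows from March 2008 whose referrer_url ends with 'xyz.com'."""
    return [
        row for row in page_views
        # ISO 'YYYY-MM-DD' strings sort the same way as the dates they encode.
        if "2008-03-01" <= row["date"] <= "2008-03-31"
        # LIKE '%xyz.com': any prefix followed by the literal text 'xyz.com'.
        and row["referrer_url"].endswith("xyz.com")
    ]


views = [
    {"date": "2008-03-15", "referrer_url": "http://www.xyz.com"},
    {"date": "2008-04-02", "referrer_url": "http://www.xyz.com"},
    {"date": "2008-03-20", "referrer_url": "http://abc.org"},
]
print(xyz_march_views(views))
```

INSERT OVERWRITE replaces whatever xyz_com_page_views previously held instead of appending. The sketch mirrors this by returning a fresh list.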
112. A znode can have data AND children
[Figure: hex dump illustrating a znode's data as raw bytes]
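To make "data AND children" concrete, here is a toy in-memory model in which each node stores an opaque byte payload and a map of named children. Znode and ZnodeTree are hypothetical classes, not the ZooKeeper client API. The sketch only shows the shape of the namespace.

```python
class Znode:
    """Toy znode: an opaque byte payload plus named children."""

    def __init__(self, data=b""):
        self.data = data    # raw bytes; the service does not interpret them
        self.children = {}  # child name -> Znode


class ZnodeTree:
    """Hypothetical in-memory tree addressed by slash-separated paths."""

    def __init__(self):
        self.root = Znode()

    def get(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]  # KeyError if a component is missing
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        parent = self.get(parent_path)
        if name in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[name] = Znode(data)


tree = ZnodeTree()
tree.create("/zk", b"\x2f\x85\x1e\x4a")  # /zk carries data...
tree.create("/zk/foo", b"hello")         # ...and also has a child
print(tree.get("/zk").data, list(tree.get("/zk").children))
```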
113. A znode has Access Control Lists
CREATE (create children)
READ (list children and get data)
WRITE (set data)
DELETE (delete children)
ADMIN (setACL)
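Each permission is one bit. The values below mirror ZooKeeper's ZooDefs.Perms constants (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). The matching logic is a simplified, hypothetical sketch: an entry is (perms, scheme, id), world:anyone matches every caller, and every other entry requires an exact (scheme, id) match. The real ip scheme also accepts address ranges. An ACL applies only to its own znode and is not inherited by children.

```python
from enum import IntFlag


class Perms(IntFlag):
    # Bit values mirror ZooKeeper's ZooDefs.Perms constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16
    ALL = 31  # READ | WRITE | CREATE | DELETE | ADMIN


def allowed(acl, identity, needed):
    """acl: list of (perms, scheme, id) entries; identity: the caller's (scheme, id)."""
    for perms, scheme, ident in acl:
        # world:anyone matches every caller; other entries need an exact match here.
        matches = (scheme, ident) == ("world", "anyone") or (scheme, ident) == identity
        if matches and (perms & needed) == needed:
            return True
    return False


acl = [(Perms.READ, "world", "anyone"), (Perms.ALL, "ip", "10.0.0.1")]
print(allowed(acl, ("ip", "10.0.0.2"), Perms.READ))   # True
print(allowed(acl, ("ip", "10.0.0.2"), Perms.WRITE))  # False
print(allowed(acl, ("ip", "10.0.0.1"), Perms.ADMIN))  # True
```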
114. Znodes can be persistent or ephemeral
Ephemeral znodes vanish with their creator's session
Persistent znodes outlive their creator's session
Ephemeral znodes cannot have children
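Here is a toy model of the three rules, assuming each node records the session that owns it (None for persistent nodes). ToyNamespace and its session ids are hypothetical. Closing a session drops exactly the ephemeral nodes that session created, and an ephemeral node refuses children.

```python
class ToyNamespace:
    """Toy flat namespace: path -> owning session id (None means persistent)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, session=None, ephemeral=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        parent = path.rsplit("/", 1)[0]
        if parent:
            if parent not in self.nodes:
                raise KeyError(f"parent of {path} does not exist")
            if self.nodes[parent] is not None:
                raise ValueError("ephemeral znodes cannot have children")
        self.nodes[path] = session if ephemeral else None

    def close_session(self, session):
        # Ephemeral znodes vanish with their creator's session; persistent ones stay.
        self.nodes = {p: owner for p, owner in self.nodes.items() if owner != session}


ns = ToyNamespace()
ns.create("/zk")                                         # persistent
ns.create("/zk/worker-a", session="s1", ephemeral=True)  # ephemeral, owned by s1
ns.close_session("s1")
print(sorted(ns.nodes))
```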
115. Znodes can be sequential
A monotonically increasing counter is appended to the znode name
/zk/foo-1
/zk/foo-3
/zk/foo-4
...
This can be used to impose a global order among direct children
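ZooKeeper keeps the counter in the parent znode and formats it as a 10-digit, zero-padded number. A sequential create of "/zk/foo-" therefore yields names such as foo-0000000007. The Python sketch below is a hypothetical single-process model of that naming. It starts its own per-parent counter at 0 and shows how sorting on the numeric suffix recovers creation order even after deletions leave gaps.

```python
from collections import defaultdict


class SequentialNamer:
    """Toy model of sequential znode names, with one counter per parent."""

    def __init__(self):
        self.counters = defaultdict(int)  # parent path -> next sequence number
        self.children = defaultdict(set)  # parent path -> current child names

    def create_sequential(self, prefix):
        parent, _, base = prefix.rpartition("/")
        seq = self.counters[parent]
        self.counters[parent] += 1
        name = f"{base}{seq:010d}"  # 10-digit zero-padded suffix
        self.children[parent].add(name)
        return f"{parent}/{name}"

    def delete(self, path):
        parent, _, name = path.rpartition("/")
        self.children[parent].discard(name)

    def ordered_children(self, parent):
        # Order by the numeric suffix, i.e. by creation order.
        return sorted(self.children[parent], key=lambda n: int(n[-10:]))


namer = SequentialNamer()
first = namer.create_sequential("/zk/foo-")
second = namer.create_sequential("/zk/foo-")
third = namer.create_sequential("/zk/foo-")
namer.delete(second)  # leaves a gap in the sequence
print(namer.ordered_children("/zk"))
```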
116. Znodes can have watches
Watches allow clients to be notified when znodes change
NodeCreated
NodeDeleted
NodeDataChanged
NodeChildrenChanged
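Classic ZooKeeper watches are one-time triggers. Once a watch fires, the client must register again to hear about later changes. The sketch below models only that behaviour. WatchTable is hypothetical, the event names follow the slide, and the enum values are arbitrary. The real service also keeps data watches and child watches separately, which this sketch does not.

```python
from collections import defaultdict
from enum import Enum, auto


class EventType(Enum):
    NodeCreated = auto()
    NodeDeleted = auto()
    NodeDataChanged = auto()
    NodeChildrenChanged = auto()


class WatchTable:
    """Toy registry of one-time watches, keyed by znode path."""

    def __init__(self):
        self.watches = defaultdict(list)  # path -> pending callbacks

    def watch(self, path, callback):
        self.watches[path].append(callback)

    def fire(self, path, event_type):
        # One-time trigger: the callbacks are removed before they run,
        # so a client must register again to hear about later changes.
        for callback in self.watches.pop(path, []):
            callback(event_type, path)


events = []
table = WatchTable()
table.watch("/zk/config", lambda kind, path: events.append((kind, path)))
table.fire("/zk/config", EventType.NodeDataChanged)
table.fire("/zk/config", EventType.NodeDataChanged)  # no watch left, nothing happens
print(events)
```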
117. ZooKeeper has a simple API
exists (can set a watch)
getData (can set a watch)
getChildren (can set a watch)
create
delete
setData
setACL
getACL
sync
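Update calls such as setData take an expected version. The change is applied only if it matches the znode's current version, and -1 means "any version". This gives clients a compare-and-set primitive. The sketch below is a hypothetical in-memory stand-in with snake_case names, not the real Java signatures. It shows a read-modify-write that fails cleanly when the version is stale.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the znode's version."""


class ToyStore:
    """Toy store showing version-checked updates; not the real client API."""

    def __init__(self):
        self.nodes = {}  # path -> [data, version]

    def create(self, path, data):
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = [data, 0]

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        data, version = self.nodes[path]
        return data, version

    def set_data(self, path, data, version=-1):
        node = self.nodes[path]
        # version == -1 skips the check; otherwise it must equal the current version.
        if version != -1 and version != node[1]:
            raise BadVersionError(f"expected {version}, found {node[1]}")
        node[0] = data
        node[1] += 1
        return node[1]


store = ToyStore()
store.create("/zk/counter", b"0")
data, version = store.get_data("/zk/counter")
store.set_data("/zk/counter", str(int(data) + 1).encode(), version)
try:
    store.set_data("/zk/counter", b"99", version)  # stale version: rejected
except BadVersionError as err:
    print("conflict, re-read and retry:", err)
```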
118. ZooKeeper consistency
Sequential Consistency
Updates from a client are applied in the order sent
Atomicity
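Updates either succeed or fail; there are no partial results
Single System Image
A client sees the same view, regardless of the server it connects to
Durability
Once an update has succeeded, it will not be undone
Timeliness
A client's view of the system is up to date within a bounded lag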
119. ZooKeeper use cases
Configuration service
Get the latest config and get notified when it changes
Lock service
Provide mutual exclusion
Leader election
There can be only one...
Group membership
Dynamically determine members of a group
Queue
Producer/Consumer paradigm
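The lock and leader-election use cases are commonly built from sequential, ephemeral znodes. Each contender creates such a child under a shared parent, and the lowest sequence number holds the lock (or is the leader). Each waiter watches only its immediate predecessor, so a release wakes one client instead of all of them. The Python sketch below simulates that recipe in a single process. ToyLock and its methods are hypothetical, and a real client would use create with an ephemeral-sequential mode plus an exists watch on the predecessor.

```python
class ToyLock:
    """Single-process toy of the sequential-ephemeral lock recipe."""

    def __init__(self):
        self.counter = 0
        self.contenders = {}  # child name -> owning session (ephemeral owner)

    @staticmethod
    def _seq(name):
        return int(name[-10:])

    def request(self, session):
        # Each contender creates a sequential, ephemeral child such as lock-0000000002.
        name = f"lock-{self.counter:010d}"
        self.counter += 1
        self.contenders[name] = session
        return name

    def holder(self):
        # The contender with the lowest sequence number holds the lock (or leads).
        return min(self.contenders, key=self._seq) if self.contenders else None

    def predecessor(self, name):
        # Each waiter watches only the child just ahead of it, not the whole list.
        earlier = [n for n in self.contenders if self._seq(n) < self._seq(name)]
        return max(earlier, key=self._seq) if earlier else None

    def session_ended(self, session):
        # Ephemeral children vanish with their session, which releases the lock.
        self.contenders = {n: s for n, s in self.contenders.items() if s != session}


lock = ToyLock()
a = lock.request("s1")
b = lock.request("s2")
c = lock.request("s3")
print(lock.holder() == a, lock.predecessor(c) == b)
lock.session_ended("s1")  # s1 crashes or disconnects
print(lock.holder() == b)
```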