4. 4
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
Peter, 1965,
IBM, 70k
Lisa, 1997,
BSIT
Column Families
€10 €5
€0€7
5. 5
HBase API
put ‘pers‘, ‘Carl‘,
‘info:born‘, ‘1982‘
put ‘pers‘, ‘Carl‘,
‘info:school‘, ‘BSIT‘
put ‘pers‘, ‘Carl‘,
‘info:school‘, ‘BUIT‘
get ‘pers‘, ‘Carl‘
19. 19
RowId info
Eve
Carl
Julia
Lisa
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary):
born cmpny salary
1965 IBM 70k
born cmpny job
1966 IBM intern
born cmpny salary
1967 IBM 80k
born school salary
1997 BSIT 1k
salsum
IBM 70k
salsum
IBM 80k
salsum
IBM 150k
21. 21
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
Peter, 1965,
IBM, 70k
Lisa, 1997,
BSIT
€10 €5
€0€7
22. 22
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
23. 23
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
24. 24
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c) <- IN._v;
25. 25
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c?(@>5))
<- IN._v;
cell predicate
26. 26
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c?(!Carl))
<- IN._v;
cell predicate
28. 28
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
RowId info
Peter born cmpny salary
1965 IBM 70k
salsum
IBM 70k
((IBM, salsum), 70k)
31. 31
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
Peter, 1965,
IBM, 70k
Lisa, 1997,
BSIT
€10 €5
€0€7
33. 33
RowId info links
Wikipedia
Twitter
Google
crawled pr
17:35 0.333
Twitter Google
- -
Google
-
Wikipedia
-
crawled pr
17:36 0.333
crawled pr
17:36 0.333
OUT._r <- IN.links._c,
OUT.incoming.$(IN._r) <- IN._v;
34. 34
RowId info links incoming
Wikipedia
Twitter
Google
crawled pr
17:35 0.333
Twitter Google
- -
Google
-
Wikipedia
-
crawled pr
17:36 0.333
crawled pr
17:36 0.333
OUT._r <- IN.links._c,
OUT.incoming.$(IN._r) <- IN._v;
Twitter Wikipedia
- -
Google
-
Wikipedia
-
Reverting
the graph
35. 35
RowId info links incoming
Wikipedia
Twitter
Google
crawled pr
17:35 0.333
Twitter Google
- -
Google
-
Wikipedia
-
crawled pr
17:36 0.333
crawled pr
17:36 0.333
OUT._r <- IN.links._c,
OUT.info.pr <-
SUM(IN.pr/COL_COUNT(links));
Twitter Wikipedia
- -
Google
-
Wikipedia
-
36. 36
RowId info links incoming
Wikipedia
Twitter
Google
crawled pr
17:35 0.333
Twitter Google
- -
Google
-
Wikipedia
-
crawled pr
17:36 0.167
crawled pr
17:36 0.5
OUT._r <- IN.links._c,
OUT.info.pr <-
SUM(IN.pr/COL_COUNT(links));
Twitter Wikipedia
- -
Google
-
Wikipedia
-
PageRank
38. 38
RowId info
Wikipedia
Twitter
crawled pr body
17:35 0.333 all information can be found here
OUT._r <- IN._r,
OUT.words
<- COUNT(IN.body.split(‘ ‘));
crawled pr body
17:36 0.167 click here for more information
39. 39
RowId info
Wikipedia
Twitter
crawled pr body words
17:35 0.333 all information can be found here 6
OUT._r <- IN._r,
OUT.words
<- COUNT(IN.body.split(‘ ‘));
crawled pr body
17:36 0.167 click here for more information 5
Word
Count
40. 40
RowId info
Wikipedia
Twitter
crawled pr body words
17:35 0.333 all information can be found here 6
OUT._r <- IN.body.split(‘ ‘),
OUT.$(IN._r) <- COUNT(*);
crawled pr body
17:36 0.167 click here for more information 5
RowId info
here
…
Wikipedia Twitter
1 1
Term Index