3. Normal Operators
   Arithmetic operators
      X = FOREACH A GENERATE f1, f2, f1 % f2;
   Boolean operators
      X = FILTER A BY (f1 == 8) OR (NOT (f2+f3 > f1));
   Cast operators
      X = FOREACH B GENERATE group, (chararray) COUNT(A) AS total;
   Comparison operators
      X = FILTER A BY (f1 matches '.*apache.*') OR (NOT (f2+f3 > f1));
   Flatten operator
      Tuple: removes one level of nesting.
      Bag: removes one level of nesting; may produce a cross product.
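The two flatten cases above can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not Pig code: the helper names `flatten_tuple` and `flatten_bag` are hypothetical, tuples stand in for Pig tuples and lists stand in for bags.

```python
def flatten_tuple(row, index):
    """Replace the nested tuple at position `index` with its fields."""
    return row[:index] + tuple(row[index]) + row[index + 1:]


def flatten_bag(row, index):
    """Emit one output row per tuple in the bag at position `index`.

    The non-bag fields are repeated for every tuple in the bag, which is
    where the cross-product behaviour comes from.
    """
    return [row[:index] + tuple(t) + row[index + 1:] for t in row[index]]


# Assumed sample rows, not taken from the slides.
tuple_row = ("a", (1, 2))
bag_row = ("a", [(1, 2), (3, 4)])

flat_tuple = flatten_tuple(tuple_row, 1)  # ("a", 1, 2)
flat_bag = flatten_bag(bag_row, 1)        # [("a", 1, 2), ("a", 3, 4)]
```

Flattening a tuple always yields exactly one row, while flattening a bag yields as many rows as the bag has tuples.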
5. Relational Operators
   LOAD: reads data into a bag of tuples
      A = LOAD 'data' [USING function] [AS schema];
   STORE: writes a relation to storage
      STORE alias INTO 'directory' [USING function];
   FOREACH: for each tuple in the bag, produces a new tuple
      A = FOREACH queries GENERATE uid, expandQuery(query);
   FILTER: filters a bag to produce a subset of it
      A = FILTER queries BY uid != 'bot' OR notBot(uid);
6. Relational Operators
   COGROUP/GROUP: groups one or more relations (at most 127)
      alias = GROUP … BY …, … BY …;
   Schema of the result:
      {group: int, A: {name: chararray,age: int,gpa: float}}
   Sample output tuple:
      (18,{(John,18,4.0F),(Joe,18,3.8F)})
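To see how a result such as (18,{(John,18,4.0F),(Joe,18,3.8F)}) comes about, here is a minimal plain-Python model of GROUP on a single relation. The function `group_by` is a hypothetical helper, and the rows for Mary and Bill are assumed sample data added so that more than one group appears.

```python
from collections import defaultdict


def group_by(relation, key_index):
    """Collect tuples that share a key into one bag per key."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[key_index]].append(t)
    # Each output tuple is (group key, bag of the original tuples).
    return [(key, bag) for key, bag in sorted(groups.items())]


# Relation A with fields (name, age, gpa).
A = [("John", 18, 4.0), ("Mary", 19, 3.8), ("Bill", 20, 3.9), ("Joe", 18, 3.8)]

grouped = group_by(A, 1)
# grouped[0] is (18, [("John", 18, 4.0), ("Joe", 18, 3.8)])
```

The first field of each output tuple corresponds to `group` in the schema above, and the second field is the bag named after the input relation.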
7. Relational Operators
   JOIN (inner/outer)
   Replicated joins
      One or more relations are small enough to fit into main memory.
   Skewed joins
      Computes a histogram of the key space and uses it to allocate reducers for a given key.
   Merge joins
      Inputs are already sorted on the join key; the join is performed in the map phase.
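The idea behind a replicated join can be sketched as an in-memory hash join: the small relation is loaded into a hash table and the large relation is streamed past it, so no shuffle on the join key is needed. The code below is a plain-Python illustration under that assumption, not Pig's implementation; `replicated_join` and the sample data are hypothetical. In Pig itself the strategy is requested with USING 'replicated' on the JOIN statement.

```python
def replicated_join(large, small, large_key, small_key):
    """Inner join that keeps only the small relation in memory."""
    # Build phase: index the small relation by its join key.
    table = {}
    for s in small:
        table.setdefault(s[small_key], []).append(s)
    # Probe phase: stream the large relation and look up matches.
    for l in large:
        for s in table.get(l[large_key], []):
            yield l + s


# Assumed sample data: (uid, action) and (uid, name).
big = [(1, "click"), (2, "view"), (1, "buy"), (3, "view")]
tiny = [(1, "alice"), (2, "bob")]

joined = list(replicated_join(big, tiny, 0, 0))
# [(1, "click", 1, "alice"), (2, "view", 2, "bob"), (1, "buy", 1, "alice")]
```

The row with uid 3 has no match in the small relation and is dropped, as expected for an inner join.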
9. Relational Operators
   ORDER: ORDER alias BY field [DESC|ASC]; the sort is not stable
   SPLIT: SPLIT alias INTO alias IF …, alias IF …;
   CROSS: cross product
      X = CROSS A, B;
   DISTINCT: removes duplicate tuples in a relation
      X = DISTINCT A;
   LIMIT: limits the number of output tuples
      X = LIMIT A 3;
   SAMPLE: selects a random sample; size is a fraction between 0 and 1
      X = SAMPLE alias size;
   IMPORT: imports another .pig file
   DEFINE: defines a Pig macro
10. Built-In Eval Functions
   AVG/MAX/MIN/SUM: operate on a single column of a bag; group the data first
   COUNT/COUNT_STAR: number of elements in a bag; COUNT_STAR also counts nulls
   CONCAT
   DIFF
   IsEmpty
   SIZE
   TOKENIZE
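The difference between COUNT and COUNT_STAR is easiest to see on a bag that contains a null. The snippet below is a plain-Python model, with `None` standing in for a Pig null; the function names are hypothetical and the bag is assumed sample data.

```python
def count(bag):
    """Model of COUNT: skip tuples whose first field is null."""
    return sum(1 for t in bag if t and t[0] is not None)


def count_star(bag):
    """Model of COUNT_STAR: count every tuple, nulls included."""
    return len(bag)


def avg(bag):
    """Model of AVG on a single column: nulls are ignored."""
    values = [t[0] for t in bag if t[0] is not None]
    return sum(values) / len(values) if values else None


# A grouped bag holding one column, with one null value.
bag = [(1,), (None,), (3,)]

n_count = count(bag)       # 2
n_star = count_star(bag)   # 3
mean = avg(bag)            # 2.0
```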
11. Other Built-In Functions
   Load/Store functions
   Math functions
   String functions
12. Map-Reduce Plan Compilation
   Compile each GROUP into a distinct Map-Reduce job.
   Push commands between LOAD and GROUP to the map side.
   Commands between subsequent GROUPs G_i and G_i+1 are pushed into the reduce side of G_i.
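The three rules above can be sketched as a small function that cuts a linear list of commands into jobs at every GROUP. This is a toy model only: real plans are graphs rather than lists, and `compile_plan` and the command strings are hypothetical.

```python
def compile_plan(commands):
    """Split a linear command list into Map-Reduce jobs at each GROUP."""
    jobs = []
    pending_map = []  # commands seen before the first GROUP
    for cmd in commands:
        if cmd.startswith("GROUP"):
            # Each GROUP becomes the shuffle of a new job.
            jobs.append({"map": pending_map, "shuffle": cmd, "reduce": []})
            pending_map = []
        elif jobs:
            # Commands after G_i go to the reduce side of G_i.
            jobs[-1]["reduce"].append(cmd)
        else:
            # Commands between LOAD and the first GROUP run on the map side.
            pending_map.append(cmd)
    if not jobs and pending_map:
        # No GROUP at all: a single map-only job.
        jobs.append({"map": pending_map, "shuffle": None, "reduce": []})
    return jobs


plan = compile_plan(
    ["LOAD", "FILTER", "GROUP by uid", "FOREACH", "GROUP by cnt", "FOREACH", "STORE"]
)
# Two jobs: the first maps LOAD and FILTER and reduces FOREACH; the second
# has an empty map side and reduces FOREACH and STORE.
```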
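14. User Defined Function
   Simple Eval Function
      import java.io.IOException;
      import org.apache.pig.EvalFunc;
      import org.apache.pig.data.Tuple;

      // An eval function takes one input tuple and returns one value.
      public class UPPER extends EvalFunc<String> {
          public String exec(Tuple input) throws IOException {
              // .......
          }
      }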
15. User Defined Function
   Aggregate Functions
   Algebraic interface
      For functions that can be computed incrementally in a distributed fashion.
   Accumulator interface
      Designed to decrease memory usage.
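What "computed incrementally" means can be shown with an average split into three stages, mirroring the initial, intermediate and final steps of Pig's Algebraic interface. The code is a plain-Python sketch of the idea, not the Java interface; the function names and the two data chunks are assumptions for illustration.

```python
def initial(value):
    """Map side: turn one value into a partial (sum, count)."""
    return (value, 1)


def intermed(partials):
    """Combiner: merge partial (sum, count) pairs into one pair."""
    return (sum(s for s, _ in partials), sum(c for _, c in partials))


def final(partials):
    """Reduce side: merge the remaining partials and produce the average."""
    total, count = intermed(partials)
    return total / count


# The same bag, split across two map tasks.
chunk1 = [2, 4]
chunk2 = [6, 8, 10]

p1 = intermed([initial(v) for v in chunk1])  # (6, 2)
p2 = intermed([initial(v) for v in chunk2])  # (24, 3)
average = final([p1, p2])                    # 6.0
```

Only the small (sum, count) pairs travel between stages, so the full bag never has to be held in one place. The Accumulator interface addresses the same memory problem differently, by handing the function its input in batches.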