Apache Pig user defined functions (UDFs)
Python UDF example 
• Motivation
– Simple tasks such as string manipulation and math computations are easier in a scripting language.
– Users can also develop custom scripting engines.
– At the time of these slides, only Python is supported, through Jython.
• Example
– Calculate the square of a column
– Write Hello World
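Python UDF
• Pig script
-- Register the script with the short engine name:
register 'test.py' using jython as myfuncs;
-- Equivalent form with the fully qualified engine class (use one or the other):
-- register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs;
b = foreach a generate myfuncs.helloworld(), myfuncs.square(3);
• test.py
# Returns a plain string, so the schema declares a single chararray
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'

# Returns a tuple, so the schema declares a tuple of (chararray, long)
@outputSchema("y:(word:chararray,num:long)")
def complex(word):
    return (str(word), long(word) * long(word))

# The output schema of square() is computed by squareSchema()
@outputSchemaFunction("squareSchema")
def square(num):
    return num * num

# The result has the same schema as the input
@schemaFunction("squareSchema")
def squareSchema(input):
    return input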
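The functions in test.py can be exercised in a plain Python interpreter before they are registered with Pig. The sketch below assumes Python 3 and defines stand-in versions of the `outputSchema`, `outputSchemaFunction`, and `schemaFunction` decorators. These stand-ins and their attribute names are assumptions for local testing only; inside Pig the Jython engine supplies the real decorators.

```python
# Stand-in decorators (assumption: local testing only; inside Pig the
# Jython script engine provides the real ones).
def outputSchema(schema):
    def wrap(func):
        func.output_schema = schema  # remember the declared schema string
        return func
    return wrap


def outputSchemaFunction(name):
    def wrap(func):
        func.output_schema_function = name  # name of the schema function
        return func
    return wrap


def schemaFunction(name):
    def wrap(func):
        func.schema_function = name
        return func
    return wrap


@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'


@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input):
    # The output schema is the same as the input schema.
    return input


if __name__ == "__main__":
    print(helloworld())
    print(square(3))
```

Keeping the UDF bodies free of Pig-specific calls is what makes this kind of local check possible.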
UDFs
• UDFs are user defined functions and are of the following types:
– EvalFunc
• Used in the FOREACH clause
– FilterFunc
• Used in the FILTER BY clause
– LoadFunc
• Used in the LOAD clause
– StoreFunc
• Used in the STORE clause
Writing a Simple EvalFunc
• An eval function is the most common type of UDF and can be used in the FOREACH statement of Pig
-- myscript.pig
REGISTER myudfs.jar;
A = LOAD 'student_data' AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE myudfs.UPPER(name);
DUMP B;
Source for UPPER UDF
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UPPER extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException
    {
        // A missing tuple, an empty tuple, or a null field yields null
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        try
        {
            String str = (String)input.get(0);
            return str.toUpperCase();
        }
        catch(Exception e)
        {
            // Chain the cause using the standard IOException constructor
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
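For comparison, the same upper-casing logic is short when written as a Python UDF, which is the kind of string manipulation the motivation slide has in mind. This is a minimal sketch, not part of the original deck: the function name `upper`, the schema string, and the stand-in `outputSchema` decorator are assumptions (inside Pig the Jython engine provides the decorator).

```python
# Stand-in decorator (assumption: local testing only).
def outputSchema(schema):
    def wrap(func):
        func.output_schema = schema
        return func
    return wrap


@outputSchema("name:chararray")
def upper(name):
    # Mirror the Java UDF: a null field stays null.
    if name is None:
        return None
    return name.upper()


if __name__ == "__main__":
    print(upper("alice"))
```

Assuming the function is saved in a hypothetical file such as string_udfs.py, it would be registered with `register 'string_udfs.py' using jython as strfuncs;` and called as `strfuncs.upper(name)` in a FOREACH statement.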
EvalFuncs Returning Complex Types
Create a jar of the UDFs (the directory must match the package name ExpectedClick.Evals)
$ ls ExpectedClick/Evals
LineAdToMatchtype.java
$ javac -cp pig.jar ExpectedClick/Evals/*.java
$ jar -cf ExpectedClick.jar ExpectedClick/Evals/*
Use your function in the Pig script
register ExpectedClick.jar;
offer = LOAD '/user/viraj/dataseta' USING Loader() AS (a,b,c);
…
offer_projected = FOREACH offer_filtered GENERATE
(chararray)a#'canon_query' AS a_canon_query,
FLATTEN(ExpectedClick.Evals.LineAdToMatchtype((chararray)a#'source')) AS matchtype, …
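EvalFuncs Returning Complex Types
package ExpectedClick.Evals;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DefaultBagFactory;
import org.apache.pig.data.DefaultTupleFactory;
import org.apache.pig.data.Tuple;

public class LineAdToMatchtype extends EvalFunc<DataBag>
{
    // Map a line-ad source code to a match type; unknown or null codes map to "0"
    private String lineAdSourceToMatchtype (String lineAdSource)
    {
        if ("0".equals(lineAdSource)) { return "1"; }
        else if ("9".equals(lineAdSource)) { return "2"; }
        else if ("13".equals(lineAdSource)) { return "3"; }
        else return "0";
    }
…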
EvalFuncs Returning Complex Types
    public DataBag exec (Tuple input) throws IOException
    {
        if (input == null || input.size() == 0)
            return null;
        String lineAdSource;
        try {
            lineAdSource = (String)input.get(0);
        } catch(Exception e) {
            System.err.println("ExpectedClick.Evals.LineAdToMatchType: Can't convert field to a string; error = " + e.getMessage());
            return null;
        }
        // Allocate one field up front; set(0, ...) fails on a zero-length tuple
        Tuple t = DefaultTupleFactory.getInstance().newTuple(1);
        try {
            t.set(0,lineAdSourceToMatchtype(lineAdSource));
        } catch(Exception e) {
            // Report the failure instead of silently emitting an empty tuple
            throw new IOException("Could not set the matchtype field", e);
        }
        // Wrap the single tuple in a bag so the caller can FLATTEN it
        DataBag output = DefaultBagFactory.getInstance().newDefaultBag();
        output.add(t);
        return output;
    }
}
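A Python UDF can return a complex type as well: with the Jython engine, a Python list maps to a Pig bag and a Python tuple maps to a Pig tuple. The sketch below restates the match-type lookup under that mapping. The function name, the schema string, the `MATCHTYPE_BY_SOURCE` table, and the stand-in decorator are assumptions made for illustration, not code from the original UDF.

```python
# Stand-in decorator (assumption: local testing only).
def outputSchema(schema):
    def wrap(func):
        func.output_schema = schema
        return func
    return wrap


# Same mapping as lineAdSourceToMatchtype in the Java UDF.
MATCHTYPE_BY_SOURCE = {"0": "1", "9": "2", "13": "3"}


@outputSchema("matchtypes:{t:(matchtype:chararray)}")
def line_ad_to_matchtype(source):
    if source is None:
        return None
    # Unknown source codes fall back to "0", as in the Java version.
    matchtype = MATCHTYPE_BY_SOURCE.get(str(source), "0")
    # A list of tuples corresponds to a bag of tuples.
    return [(matchtype,)]


if __name__ == "__main__":
    print(line_ad_to_matchtype("9"))
```

Returning a one-element list keeps the result compatible with FLATTEN, which turns each tuple in the bag into an output row.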
FilterFunc
• Filter functions are eval functions that return a boolean value
• Filter functions can be used anywhere a Boolean expression is appropriate
– FILTER operator or bincond
• Example: use a FilterFunc to implement an outer join
A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, contributions: float);
C = COGROUP A BY name, B BY name;
D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), flatten((IsEmpty(B) ? null : B));
dump D;
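The reason IsEmpty is needed here is that flattening an empty bag produces no rows, so a key present on only one side would vanish from the result. The following is a simplified plain-Python model of the COGROUP plus FLATTEN pattern above, not Pig code: relations are lists of tuples, a bag is a list, and null is `None`. All function names are hypothetical.

```python
def cogroup(a, b):
    """Group two lists of tuples by their first field, like COGROUP ... BY name."""
    groups = {}
    for row in a:
        groups.setdefault(row[0], ([], []))[0].append(row)
    for row in b:
        groups.setdefault(row[0], ([], []))[1].append(row)
    return groups


def is_empty(bag):
    return len(bag) == 0


def flatten_pair(key, bag_a, bag_b):
    # Replacing an empty bag with a single null mirrors (IsEmpty(A) ? null : A);
    # without it the cross product below would be empty and the key would be lost.
    rows_a = [None] if is_empty(bag_a) else bag_a
    rows_b = [None] if is_empty(bag_b) else bag_b
    return [(key, ra, rb) for ra in rows_a for rb in rows_b]


def full_outer_join(a, b):
    result = []
    for key, (bag_a, bag_b) in cogroup(a, b).items():
        result.extend(flatten_pair(key, bag_a, bag_b))
    return result


if __name__ == "__main__":
    students = [("alice", 20, 3.5), ("bob", 22, 3.1)]
    voters = [("alice", 20, "democrat", 100.0), ("carol", 30, "green", 5.0)]
    for row in full_outer_join(students, voters):
        print(row)
```

In this model a student without a voter record keeps a row with `None` on the voter side, and the reverse holds for a voter without a student record.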
IsEmpty FilterFunc
import java.io.IOException;
import java.util.Map;
import org.apache.pig.FilterFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataType;

public class IsEmpty extends FilterFunc
{
    public Boolean exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0) return null;
        try {
            Object values = input.get(0);
            // Only bags and maps have a meaningful notion of emptiness
            if (values instanceof DataBag)
                return ((DataBag)values).size() == 0;
            else if (values instanceof Map)
                return ((Map<?, ?>)values).size() == 0;
            else {
                throw new IOException("Cannot test a " + DataType.findTypeName(values) + " for emptiness.");
            }
        }
        catch (ExecException ee) {
            // Chain the cause using the standard IOException constructor
            throw new IOException("Caught exception processing input row ", ee);
        }
    }
}
LoadFunc
• The LoadFunc abstract class has the main methods for loading data
• 3 important interfaces
– LoadMetadata has methods to deal with metadata
– LoadPushDown has methods to push operations from the Pig runtime into loader implementations
– LoadCaster has methods to convert byte arrays to specific types
• implement this interface if your loader casts (implicit or explicit) from DataByteArray fields to other types
• Methods that must be implemented
– getInputFormat()
– setLocation()
– prepareToRead()
– getNext()
• Methods with default implementations, overridden only when needed
– setUdfContextSignature()
– relativeToAbsolutePath()
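How these methods cooperate can be pictured with a toy model in plain Python. This is only an analogy under stated assumptions, not the Pig API: the class `LineLoader`, the driver `run_loader`, and the call order it uses are hypothetical. The one behaviour taken from the loader contract is that the read method is called repeatedly and signals the end of input by returning a null value.

```python
class LineLoader:
    """Toy loader that yields one single-field tuple per input line."""

    def __init__(self):
        self.location = None
        self.reader = None

    def set_location(self, location):
        # Analogue of setLocation(): remember where the data lives.
        self.location = location

    def prepare_to_read(self, reader):
        # Analogue of prepareToRead(): keep the record reader for get_next().
        self.reader = reader

    def get_next(self):
        # Analogue of getNext(): one tuple per call, None when input is exhausted.
        line = next(self.reader, None)
        if line is None:
            return None
        return (line,)


def run_loader(loader, location, lines):
    """Hypothetical driver standing in for the runtime that calls the loader."""
    loader.set_location(location)
    loader.prepare_to_read(iter(lines))
    rows = []
    while True:
        row = loader.get_next()
        if row is None:
            break
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(run_loader(LineLoader(), "student_data", ["alice", "bob"]))
```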
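Regexp Loader Example
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.DefaultTupleFactory;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class RegexLoader extends LoadFunc {
    private LineRecordReader in = null;
    long end = Long.MAX_VALUE;
    private final Pattern pattern;

    // The regular expression is passed in from the Pig script
    public RegexLoader(String regex) {
        pattern = Pattern.compile(regex);
    }

    // Input is read line by line as plain text
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
    }

    // Keep the record reader so that getNext() can pull lines from it
    public void prepareToRead(RecordReader reader, PigSplit split)
            throws IOException {
        in = (LineRecordReader) reader;
    }

    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }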
Regexp Loader
    public Tuple getNext() throws IOException {
        Matcher matcher = pattern.matcher("");
        TupleFactory mTupleFactory = DefaultTupleFactory.getInstance();
        // Advance the reader on every pass so that a non-matching line is
        // skipped instead of being re-read forever
        while (in.nextKeyValue()) {
            Text val = in.getCurrentValue();
            if (val == null) {
                break;
            }
            String line = val.toString();
            // Drop a trailing carriage return left by Windows-style line endings
            if (line.length() > 0 && line.charAt(line.length() - 1) == '\r') {
                line = line.substring(0, line.length() - 1);
            }
            matcher = matcher.reset(line);
            if (matcher.find()) {
                // Each capture group becomes one field of the tuple
                ArrayList<DataByteArray> list = new ArrayList<DataByteArray>();
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    list.add(new DataByteArray(matcher.group(i)));
                }
                return mTupleFactory.newTuple(list);
            }
        }
        // No more input
        return null;
    }
}
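The per-line logic of getNext() can be tried out without Hadoop. The sketch below is a plain-Python restatement under assumptions: the generator name `regex_rows` and the sample pattern are hypothetical, and `re.search` plays the role of `Matcher.find()`. Lines that do not match are skipped, and each capture group becomes one field.

```python
import re


def regex_rows(lines, regex):
    """Yield one tuple of capture groups for every line that matches."""
    pattern = re.compile(regex)
    for line in lines:
        # Drop a trailing carriage return left by Windows-style line endings.
        if line.endswith("\r"):
            line = line[:-1]
        match = pattern.search(line)
        if match:
            yield match.groups()


if __name__ == "__main__":
    sample = ["alice 20\r", "not a record", "bob 22"]
    for row in regex_rows(sample, r"(\w+) (\d+)"):
        print(row)
```

Checking a pattern this way against a few sample lines is a cheap way to find out which records the loader would silently skip.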
End of session 
Day – 3: Apache Pig user defined functions (UDFs)

05 pig user defined functions (udfs)

  • 1. Apache Pig user defined functions (UDFs)
  • 2. Python UDF example • Motivation – Simple tasks like string manipulation and math computations are easier with a scripting language. – Users can also develop custom scripting engines – Currently only Python is supported due to the availability of Jython • Example – Calculate the square of a column – Write Hello World
  • 3. Python UDF • Pig script register 'test.py' using jython as myfuncs; register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs; b = foreach a generate myfuncs.helloworld(), myfuncs.square(3); • test.py @outputSchema("x:{t:(word:chararray)}") def helloworld(): return ('Hello, World’) @outputSchema("y:{t:(word:chararray,num:long)}") def complex(word): return(str(word),long(word)*long(word)) @outputSchemaFunction("squareSchema") def square(num): return ((num)*(num)) @schemaFunction("squareSchema") def squareSchema(input): return input
  • 4. UDF’s • UDF’s are user defined functions and are of the following types: – EvalFunc • Used in the FOREACH clause – FilterFunc • Used in the FILTER by clause – LoadFunc • Used in the LOAD clause – StoreFunc • Used in the STORE clause
  • 5. Writing a Simple EvalFunc
    • EvalFunc is the most common type of UDF and can be used in the FOREACH statement of Pig
        -- myscript.pig
        REGISTER myudfs.jar;
        A = LOAD 'student_data' AS (name:chararray, age:int, gpa:float);
        B = FOREACH A GENERATE myudfs.UPPER(name);
        DUMP B;
  • 6. Source for UPPER UDF
        package myudfs;

        import java.io.IOException;
        import org.apache.pig.EvalFunc;
        import org.apache.pig.data.Tuple;
        import org.apache.pig.impl.util.WrappedIOException;

        public class UPPER extends EvalFunc<String> {
            public String exec(Tuple input) throws IOException {
                if (input == null || input.size() == 0)
                    return null;
                try {
                    String str = (String) input.get(0);
                    return str.toUpperCase();
                } catch (Exception e) {
                    throw WrappedIOException.wrap("Caught exception processing input row ", e);
                }
            }
        }
  • 7. EvalFuncs returning Complex Types
    • Create a jar of the UDFs (the directory must match the package name, ExpectedClick.Evals)
        $ ls ExpectedClick/Evals
        LineAdToMatchtype.java
        $ javac -cp pig.jar ExpectedClick/Evals/*.java
        $ jar -cf ExpectedClick.jar ExpectedClick/Evals/*
    • Use your function in the Pig script
        register ExpectedClick.jar;
        offer = LOAD '/user/viraj/dataseta' USING Loader() AS (a,b,c);
        …
        offer_projected = FOREACH offer_filtered GENERATE
            (chararray)a#'canon_query' AS a_canon_query,
            FLATTEN(ExpectedClick.Evals.LineAdToMatchtype((chararray)a#'source')) AS matchtype, …
  • 8. EvalFuncs returning Complex Types
        package ExpectedClick.Evals;

        public class LineAdToMatchtype extends EvalFunc<DataBag> {
            // Maps a line-ad source code to a match type; unknown codes map to "0".
            private String lineAdSourceToMatchtype(String lineAdSource) {
                if (lineAdSource.equals("0")) {
                    return "1";
                } else if (lineAdSource.equals("9")) {
                    return "2";
                } else if (lineAdSource.equals("13")) {
                    return "3";
                } else
                    return "0";
            }
            …
  • 9. EvalFuncs returning Complex Types
        public DataBag exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0)
                return null;
            String lineAdSource;
            try {
                lineAdSource = (String) input.get(0);
            } catch (Exception e) {
                System.err.println("ExpectedClick.Evals.LineAdToMatchtype: Can't convert field to a string; error = " + e.getMessage());
                return null;
            }
            // Allocate one field up front: set(0, ...) fails on a zero-length tuple.
            Tuple t = DefaultTupleFactory.getInstance().newTuple(1);
            try {
                t.set(0, lineAdSourceToMatchtype(lineAdSource));
            } catch (Exception e) {
                // Ignored on the slide; the field stays null if the lookup fails.
            }
            // Wrap the single tuple in a bag so the caller can FLATTEN it.
            DataBag output = DefaultBagFactory.getInstance().newDefaultBag();
            output.add(t);
            return output;
        }
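For comparison with the Java version above, here is a rough Python sketch of the same lookup. It assumes the usual Jython UDF type mapping in which a Pig bag is represented as a list of tuples. The names `SOURCE_TO_MATCHTYPE` and `line_ad_to_matchtype` are hypothetical, and under Pig the function would also need an `@outputSchema` decorator, which is left out here so the block runs on its own.

```python
# Lookup table taken from the slide's lineAdSourceToMatchtype method.
SOURCE_TO_MATCHTYPE = {"0": "1", "9": "2", "13": "3"}


def line_ad_to_matchtype(line_ad_source):
    """Return a bag (list) holding one single-field tuple, or None for null input."""
    if line_ad_source is None:
        return None
    # Unknown source codes fall back to "0", as in the Java else branch.
    matchtype = SOURCE_TO_MATCHTYPE.get(str(line_ad_source), "0")
    return [(matchtype,)]
```

Returning a one-tuple bag mirrors the Java code, where the result is wrapped in a DataBag so that FLATTEN in the Pig script produces a single matchtype field.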
  • 10. FilterFunc
    • Filter functions are eval functions that return a boolean value
    • Filter functions can be used anywhere a boolean expression is appropriate
      – FILTER operator or bincond
    • Example: use a FilterFunc to implement an outer join
        A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
        B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, contributions: float);
        C = COGROUP A BY name, B BY name;
        D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), flatten((IsEmpty(B) ? null : B));
        dump D;
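The COGROUP plus IsEmpty pattern above can be hard to picture, so here is a small plain-Python simulation of what it computes. This is only an illustration under stated assumptions: `cogroup` and `outer_join` are hypothetical helper names, rows are tuples whose first field is the join key, and an empty side is padded with one None per column, which simplifies how Pig represents a flattened null.

```python
def cogroup(a_rows, b_rows):
    """Group both relations by their first field, like COGROUP A BY name, B BY name."""
    groups = {}
    for index, rows in enumerate((a_rows, b_rows)):
        for row in rows:
            # Each key maps to a pair of bags: (rows from A, rows from B).
            groups.setdefault(row[0], ([], []))[index].append(row)
    return groups


def outer_join(a_rows, b_rows, a_width, b_width):
    """Emit (group, A fields, B fields), padding whichever side has an empty bag."""
    groups = cogroup(a_rows, b_rows)
    joined = []
    for name in sorted(groups):
        a_bag, b_bag = groups[name]
        # Equivalent of (IsEmpty(A) ? null : A): an empty bag becomes a null row.
        left = a_bag or [(None,) * a_width]
        right = b_bag or [(None,) * b_width]
        for a_row in left:
            for b_row in right:
                joined.append((name,) + a_row + b_row)
    return joined
```

Without the IsEmpty test, flattening an empty bag would drop the whole row, which turns the result into an inner join.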
  • 11. IsEmpty FilterFunc
        import java.io.IOException;
        import java.util.Map;
        import org.apache.pig.FilterFunc;
        import org.apache.pig.backend.executionengine.ExecException;
        import org.apache.pig.data.DataBag;
        import org.apache.pig.data.Tuple;
        import org.apache.pig.data.DataType;
        import org.apache.pig.impl.util.WrappedIOException;

        public class IsEmpty extends FilterFunc {
            public Boolean exec(Tuple input) throws IOException {
                if (input == null || input.size() == 0)
                    return null;
                try {
                    Object values = input.get(0);
                    if (values instanceof DataBag)
                        return ((DataBag) values).size() == 0;
                    else if (values instanceof Map)
                        return ((Map) values).size() == 0;
                    else {
                        throw new IOException("Cannot test a " + DataType.findTypeName(values) + " for emptiness.");
                    }
                } catch (ExecException ee) {
                    throw WrappedIOException.wrap("Caught exception processing input row ", ee);
                }
            }
        }
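The Java code above has three distinct outcomes: null for a missing input, true or false for a bag or map, and an error for anything else. The sketch below restates that decision logic in plain Python. It assumes bags are modelled as lists and maps as dicts; `is_empty` is a hypothetical name, not a Pig function.

```python
def is_empty(input_tuple):
    """Mirror IsEmpty.exec: None for no input, a bool for bags/maps, an error otherwise."""
    if input_tuple is None or len(input_tuple) == 0:
        return None
    values = input_tuple[0]
    # Bags are modelled as lists and maps as dicts in this sketch.
    if isinstance(values, (list, dict)):
        return len(values) == 0
    raise TypeError("Cannot test a %s for emptiness." % type(values).__name__)
```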
  • 12. LoadFunc
    • The LoadFunc abstract class has the main methods for loading data
    • 3 important interfaces
      – LoadMetadata has methods to deal with metadata
      – LoadPushDown has methods to push operations from the Pig runtime into loader implementations
      – LoadCaster has methods to convert byte arrays to specific types
        • implement this interface if your loader casts (implicit or explicit) from DataByteArray fields to other types
    • Functions to be implemented
      – getInputFormat()
      – setLocation()
      – prepareToRead()
      – getNext()
      – setUdfContextSignature() (has a default implementation; override only if needed)
      – relativeToAbsolutePath() (has a default implementation; override only if needed)
  • 13. Regexp Loader Example
        public class RegexLoader extends LoadFunc {
            private LineRecordReader in = null;
            long end = Long.MAX_VALUE;
            private final Pattern pattern;

            public RegexLoader(String regex) {
                pattern = Pattern.compile(regex);
            }

            public InputFormat getInputFormat() throws IOException {
                return new TextInputFormat();
            }

            public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
                in = (LineRecordReader) reader;
            }

            public void setLocation(String location, Job job) throws IOException {
                FileInputFormat.setInputPaths(job, location);
            }
  • 14. Regexp Loader
            public Tuple getNext() throws IOException {
                Matcher matcher = pattern.matcher("");
                TupleFactory mTupleFactory = DefaultTupleFactory.getInstance();
                String line;
                // Advance the reader on every iteration so a non-matching line
                // is skipped instead of being re-read forever.
                while (in.nextKeyValue()) {
                    Text val = in.getCurrentValue();
                    if (val == null) {
                        break;
                    }
                    line = val.toString();
                    // Strip a trailing carriage return left by Windows line endings.
                    if (line.length() > 0 && line.charAt(line.length() - 1) == '\r') {
                        line = line.substring(0, line.length() - 1);
                    }
                    matcher = matcher.reset(line);
                    ArrayList<DataByteArray> list = new ArrayList<DataByteArray>();
                    if (matcher.find()) {
                        // One field per capture group, in order.
                        for (int i = 1; i <= matcher.groupCount(); i++) {
                            list.add(new DataByteArray(matcher.group(i)));
                        }
                        return mTupleFactory.newTuple(list);
                    }
                }
                // End of input: no more tuples.
                return null;
            }
        }
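The core of getNext() is independent of Hadoop: read lines, drop a trailing carriage return, and turn the capture groups of the first match into a tuple. The sketch below shows just that step in plain Python so the behaviour can be tried on sample lines. `make_line_parser` is a hypothetical helper, not part of Pig, and it returns strings where the Java loader produces DataByteArray fields.

```python
import re


def make_line_parser(regex):
    """Build a parser that maps matching lines to tuples of capture groups."""
    pattern = re.compile(regex)

    def parse(lines):
        for line in lines:
            # Strip a trailing carriage return, as the loader does.
            if line.endswith("\r"):
                line = line[:-1]
            # search() finds a match anywhere in the line, like Matcher.find().
            match = pattern.search(line)
            if match:
                yield match.groups()
            # Non-matching lines are skipped without producing a tuple.

    return parse
```

For example, a parser built from `(\d+)\s+(\w+)` yields `("12", "foo")` for the line `12 foo` and nothing for a line with no digits.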
  • 15. End of session Day – 3: Apache Pig user defined functions (UDFs)