SlideShare uma empresa Scribd logo
1 de 7
Pig 0.6 and 0.7 Alan Gates What’s New With Pig
Accumulator A = load ‘clicks’; B = group A by user; C = foreach B { 	C1 = order A by timestamp; 	generate user, sessionize(C1); } … Many aggregate operations cannot use combiner but do not need all records for a single key together New in 0.6, Accumulator interface which can be implemented by UDFs Pig calls accumulate multiple times with partial list of tuples, then when the key changes calls getValue
Also in 0.6 UDFContext, allows UDFs to pass info from frontend to backend and to access JobConf A lot of work with memory manager to reduce the number of GCOverhead and out of heap errors
New Load and Store Interfaces 0.6 and before Want to write a LoadFunc that works on files and uses standard splits?  Easy Want to write a LoadFunc that works on something other than files or uses non-standard splits?  Hard; have to write a Slicer (which mostly duplicates Hadoop’sInputFormat) Want to write a StoreFunc that works on something other than files?  Sorry 0.7 LoadFunc now sits atop InputFormat, so if you have an InputFormat for your data, writing a LoadFunc is easy StoreFunc now sits atop OutputFormat, … Not backward compatible, will require rewrite of custom Load and StoreFuncs
Also in 0.7 Moved local mode to Hadoop’sLocalJobRunner; means debugging environment much closer to runtime environment More aggressive use of Hadoop distributed cache for features such as replicated join and order by
What We Are Working On Now Runtime statistics – track what features your script used, how many records it processed, etc.  Results stored in Pig logs and job history files Adding UDFs in scripting languages (python initially) - PIG-928 Allow users to set a custom partitioner in some cases - PIG-282 Make Pig available in Maven repositories - PIG-1334 Label Interfaces for audience and stability - PIG-1311 Part of Hadoop’s compatibility plan, see the following blog posthttp://bit.ly/9yRDlH
Questions

Mais conteúdo relacionado

Destaque

August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
Yahoo Developer Network
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
Hadoop User Group
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
Yahoo Developer Network
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Yahoo Developer Network
 

Destaque (20)

Pig at Linkedin
Pig at LinkedinPig at Linkedin
Pig at Linkedin
 
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 
January 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka PresentationJanuary 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka Presentation
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
HUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - FacebookHUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - Facebook
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
 
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
 
AWS Customer Presentation - eHarmony
AWS Customer Presentation - eHarmonyAWS Customer Presentation - eHarmony
AWS Customer Presentation - eHarmony
 
Ordered Record Collection
Ordered Record CollectionOrdered Record Collection
Ordered Record Collection
 

Mais de Hadoop User Group

2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
Hadoop User Group
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
Hadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
Hadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
Hadoop User Group
 
Twitter Protobufs And Hadoop Hug 021709
Twitter Protobufs And Hadoop   Hug 021709Twitter Protobufs And Hadoop   Hug 021709
Twitter Protobufs And Hadoop Hug 021709
Hadoop User Group
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In Python
Hadoop User Group
 

Mais de Hadoop User Group (18)

Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Release Plan Feb17
Hadoop Release Plan Feb17Hadoop Release Plan Feb17
Hadoop Release Plan Feb17
 
Twitter Protobufs And Hadoop Hug 021709
Twitter Protobufs And Hadoop   Hug 021709Twitter Protobufs And Hadoop   Hug 021709
Twitter Protobufs And Hadoop Hug 021709
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Searching At Scale
Searching At ScaleSearching At Scale
Searching At Scale
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In Python
 
File Context
File ContextFile Context
File Context
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Último (20)

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Yahoo! Hadoop User Group - May 2010 Meetup - What's new with Pig? Alan Gates, Yahoo!

  • 1. Pig 0.6 and 0.7 Alan Gates What’s New With Pig
  • 2. Accumulator A = load ‘clicks’; B = group A by user; C = foreach B { C1 = order A by timestamp; generate user, sessionize(C1); } … Many aggregate operations cannot use combiner but do not need all records for a single key together New in 0.6, Accumulator interface which can be implemented by UDFs Pig calls accumulate multiple times with partial list of tuples, then when the key changes calls getValue
  • 3. Also in 0.6 UDFContext, allows UDFs to pass info from frontend to backend and to access JobConf A lot of work with memory manager to reduce the number of GCOverhead and out of heap errors
  • 4. New Load and Store Interfaces 0.6 and before Want to write a LoadFunc that works on files and uses standard splits? Easy Want to write a LoadFunc that works on something other than files or uses non-standard splits? Hard; have to write a Slicer (which mostly duplicates Hadoop’sInputFormat) Want to write a StoreFunc that works on something other than files? Sorry 0.7 LoadFunc now sits atop InputFormat, so if you have an InputFormat for your data, writing a LoadFunc is easy StoreFunc now sits atop OutputFormat, … Not backward compatible, will require rewrite of custom Load and StoreFuncs
  • 5. Also in 0.7 Moved local mode to Hadoop’sLocalJobRunner; means debugging environment much closer to runtime environment More aggressive use of Hadoop distributed cache for features such as replicated join and order by
  • 6. What We Are Working On Now Runtime statistics – track what features your script used, how many records it processed, etc. Results stored in Pig logs and job history files Adding UDFs in scripting languages (python initially) - PIG-928 Allow users to set a custom partitioner in some cases - PIG-282 Make Pig available in Maven repositories - PIG-1334 Label Interfaces for audience and stability - PIG-1311 Part of Hadoop’s compatibility plan, see the following blog posthttp://bit.ly/9yRDlH

Notas do Editor

  1. Brief description of combiner and algebraicOnly used if all UDFs in a foreach can use it