SlideShare a Scribd company logo
1 of 32
Pig at Linkedin
Chris Riccomini
9/29/10
Who?
What?
LinkedIn Analytics
Pig at LinkedIn
Why?
Production Quality
Streaming
Serialization
VoldemortStorage ~ Avro
views = LOAD '/data/awesome' USING VoldemortStorage();
Voldemort ♥ Pig
Partitioning
YYYY/MM/DD
Last N days?
views = LOAD '/data/etl/tracking/extracted/profile-view' USING
VoldemortStorage('date.range', 'num.days=90;days.ago=1’)
Some-file-YYYY-MM-DD
member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST'
USING VoldemortStorage()
Scheduling
Azkaban
type=pig
pig.script=myscript.pig
Ad hoc?
Future at LinkedIn
Wishes
Dates
Fix Data Types
JSON
Cross Platform
Questions?
• criccomini@linkedin.com
• http://www.riccomini.name
• http://www.sna-projects.com
• http://www.project-voldemort.com
• @criccomini
• LinkedIn is Hiring! Email me!

More Related Content

Similar to Pig at Linkedin

Economies of Scaling Software
Economies of Scaling SoftwareEconomies of Scaling Software
Economies of Scaling SoftwareJoshua Long
 
Solving performance issues in Django ORM
Solving performance issues in Django ORMSolving performance issues in Django ORM
Solving performance issues in Django ORMSian Lerk Lau
 
Brandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder
 
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2Chris Gates
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Amazon Web Services
 
Building Awesome API with Spring
Building Awesome API with SpringBuilding Awesome API with Spring
Building Awesome API with SpringVladimir Tsukur
 
D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web Outliers Collective
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack MattersAmazon Web Services
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data scienceTuri, Inc.
 
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays
 
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...Amazon Web Services
 
GAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeGAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeAlberto Diaz Martin
 
Empowering red and blue teams with osint c0c0n 2017
Empowering red and blue teams with osint   c0c0n 2017Empowering red and blue teams with osint   c0c0n 2017
Empowering red and blue teams with osint c0c0n 2017reconvillage
 
Bsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingBsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingDhruv Majumdar
 
Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)johnwilander
 
BrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkBrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkJosh Holmes
 

Similar to Pig at Linkedin (20)

Economies of Scaling Software
Economies of Scaling SoftwareEconomies of Scaling Software
Economies of Scaling Software
 
Solving performance issues in Django ORM
Solving performance issues in Django ORMSolving performance issues in Django ORM
Solving performance issues in Django ORM
 
Brandfolder - JSON + Postgres
Brandfolder - JSON + PostgresBrandfolder - JSON + Postgres
Brandfolder - JSON + Postgres
 
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
hackcon2013-Dirty Little Secrets They Didn't Teach You In Pentesting Class v2
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
 
Building Awesome API with Spring
Building Awesome API with SpringBuilding Awesome API with Spring
Building Awesome API with Spring
 
D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web D3.js: Data Visualization for the Web
D3.js: Data Visualization for the Web
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
Threat Modeling Using STRIDE
Threat Modeling Using STRIDEThreat Modeling Using STRIDE
Threat Modeling Using STRIDE
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack Matters
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
apidays LIVE Australia 2021 - API Horror Stories from an Unnamed Coworking Co...
 
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale...
 
GAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeGAB 2019 - Graph as a data store
GAB 2019 - Graph as a data store
 
Empowering red and blue teams with osint c0c0n 2017
Empowering red and blue teams with osint   c0c0n 2017Empowering red and blue teams with osint   c0c0n 2017
Empowering red and blue teams with osint c0c0n 2017
 
Intro GraphQL
Intro GraphQLIntro GraphQL
Intro GraphQL
 
Bsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat HuntingBsides 2019 - Intelligent Threat Hunting
Bsides 2019 - Intelligent Threat Hunting
 
Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)Application Security for Rich Internet Applicationss (Jfokus 2012)
Application Security for Rich Internet Applicationss (Jfokus 2012)
 
BrazilJS Perf Doctor Talk
BrazilJS Perf Doctor TalkBrazilJS Perf Doctor Talk
BrazilJS Perf Doctor Talk
 
Threatcrowd
ThreatcrowdThreatcrowd
Threatcrowd
 

More from Hadoop User Group

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21Hadoop User Group
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21Hadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation HadoopHadoop User Group
 

More from Hadoop User Group (20)

Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 

Recently uploaded

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 

Recently uploaded (20)

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Pig at Linkedin

Editor's Notes

  1. Chris Riccomini Senior Data Scientist at LinkedIn Involved in People you may know, Who’s viewed my profile, Avatara, and Distributed Computing at LinkedIn Worked in PayPal’s anti-fraud team as a data visualization engineer
  2. talking about linkedin’s analytic environment, motivation for pig at linkedin, how we integrated it, and pig in the future
  3. aster, hadoop, voldemort, azkaban, pig
  4. aster, hadoop, voldemort, azkaban, pig
  5. 40% of jobs run are pig # of production products that use pig pymk, ads, profile stats, jobs for you, talent match, groups you might like, browse maps, experimentation platform
  6. in early 2009 we were working on converting PYMK from Aster to Hadoop everything was java based tired of writing joins, filters, etc (glue) built and deployed pig on laptop while at a conference wrote serializer in a few days significantly sped up delivery time for PYMK
  7. motivation was not ad-hoc/sql/business analytics. motivation was product analytics, and PRODUCTION products. stability was key. reproducability was key. simplicity/understandability was key. (both the scripts and the system itself) "if it runs now, it will always run"
  8. as streaming became more popular, pig is still used as glue, but complex jobs are now just python instead of java.
  9. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  10. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  11. we use "voldemort" serialization (binary json) .. basically the same as avro not much csv (pigstorage) used some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
  12. we use pig to read from and write to voldemort all writes are currently done with read-only stores reads are done using roshan's voldemort loader func can also use roshan's voldemort store func to write directly to read-write stores
  13. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  14. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  15. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  16. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  17. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  18. one problem that we had with pig was how to handle folders partitioned by date (yyyy/mm/dd) people were querying the root directory, and filtering out only the days they needed other people were writing custom jobs that would add only the sub folders that they were interested in as input paths our solution was to add a filter parameter to voldemort views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1'); member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
  19. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  20. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  21. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  22. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  23. we use azkaban (like a very simple version of oozie) azkaban contains a "pig" job. specify type=pig, pig.script=path/to/pig/script.pig supports parameter passing between azkaban properties and pig parameters azkaban also provides resource locking and dependencies and scheduling makes it very easy to write a production pig job write pig file write job file throw pig and job file into a zip upload the zip
  24. just starting to use pig for ad hoc analysis mostly engineers using it now some business analytics are starting to use it also looking at hive
  25. pig 0.8, avro, hive, UDFs
  26. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  27. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  28. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  29. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json
  30. dates the promise of pig as a generic map reduce language (not just hadoop) fix the data structures more json