Pig at LinkedIn, by Chris Riccomini (LinkedIn)
Pig is an integral part of data analytics at LinkedIn. Learn about LinkedIn’s analytic stack, and see how Pig is used to design, develop, and deliver data products at LinkedIn. We’ll explore a successful example of Pig deployment at LinkedIn, pain points, and integration with Azkaban, Voldemort, Hadoop, and the rest of LinkedIn’s ecosystem.
Chris Riccomini
Senior Data Scientist at LinkedIn
Involved in People You May Know, Who's Viewed My Profile, Avatara, and distributed computing at LinkedIn
Previously worked on PayPal's anti-fraud team as a data visualization engineer
Talking about LinkedIn's analytics environment, the motivation for Pig at LinkedIn, how we integrated it, and Pig in the future.
The analytics stack: Aster, Hadoop, Voldemort, Azkaban, Pig.
40% of the jobs we run are Pig.
Production products that use Pig:
PYMK (People You May Know), ads, profile stats, Jobs For You, Talent Match, Groups You Might Like, browse maps, the experimentation platform.
In early 2009 we were working on converting PYMK from Aster to Hadoop.
Everything was Java based.
We were tired of writing joins, filters, etc. (glue code).
Built and deployed Pig on a laptop while at a conference.
Wrote a serializer in a few days.
It significantly sped up delivery time for PYMK.
The motivation was not ad hoc queries, SQL, or business analytics.
The motivation was product analytics, and PRODUCTION products.
Stability was key.
Reproducibility was key.
Simplicity/understandability was key (both the scripts and the system itself).
"If it runs now, it will always run."
As streaming became more popular, Pig is still used as glue, but complex jobs are now written in Python instead of Java.
we use "voldemort" serialization (binary json) .. basically the same as avro
not much csv (pigstorage) used
some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
we use "voldemort" serialization (binary json) .. basically the same as avro
not much csv (pigstorage) used
some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
we use "voldemort" serialization (binary json) .. basically the same as avro
not much csv (pigstorage) used
some pain was involved in writing/updating the serializer (0.3 interface was insufficient)
We use Pig to read from and write to Voldemort.
All writes are currently done with read-only stores.
Reads are done using Roshan's Voldemort loader func.
You can also use Roshan's Voldemort store func to write directly to read-write stores (sketch below).
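A minimal sketch of how the two funcs fit together in a script. Only VoldemortStorage appears in the talk; the VoldemortStore name, the store URL format, and the field names are assumptions for illustration:

-- read with the Voldemort loader func (assumes the loader supplies a schema with viewee_id)
views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage();
-- count views per member
grouped = GROUP views BY viewee_id;
counts = FOREACH grouped GENERATE group AS viewee_id, COUNT(views) AS num_views;
-- hypothetical store func writing directly to a read-write store
STORE counts INTO 'tcp://voldemort-host:6666/profile-view-counts' USING VoldemortStore();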
One problem we had with Pig was how to handle folders partitioned by date (yyyy/mm/dd).
Some people were querying the root directory and filtering out only the days they needed.
Other people were writing custom jobs that would add only the subfolders they were interested in as input paths.
Our solution was to add a date-filter parameter to the Voldemort loader:
views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1');
member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
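Conceptually, the first load's num.days=90;days.ago=1 filter expands the root into one dated input path per day in a rolling 90-day window ending yesterday; the dates below are illustrative:

/data/etl/tracking/extracted/profile-view/2010/06/20
/data/etl/tracking/extracted/profile-view/2010/06/19
...
/data/etl/tracking/extracted/profile-view/2010/03/23

The #LATEST token in the second load presumably resolves to the most recent dated subfolder, so the script always reads the newest snapshot.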
We use Azkaban (like a very simple version of Oozie).
Azkaban has a "pig" job type: specify type=pig and pig.script=path/to/pig/script.pig (example job file below).
It supports parameter passing between Azkaban properties and Pig parameters.
Azkaban also provides resource locking,
dependencies,
and scheduling.
This makes it very easy to write a production Pig job:
write the Pig file,
write the job file,
throw the Pig and job files into a zip,
upload the zip.
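A sketch of such a job file. Only type=pig and pig.script come from the talk; the dependencies key and the param.* convention for passing Pig parameters are assumptions about Azkaban's job-file format:

# view-counts.job (illustrative)
type=pig
pig.script=scripts/view-counts.pig
# hypothetical: run after another job in the same zip
dependencies=extract-views
# hypothetical: surfaced to the Pig script as $NUM_DAYS
param.NUM_DAYS=90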
We're just starting to use Pig for ad hoc analysis.
Mostly engineers are using it now.
Some business analysts are starting to use it.
We're also looking at Hive.
Coming up: Pig 0.8, Avro, Hive, UDFs.
Dates.
The promise of Pig as a generic MapReduce language (not just Hadoop).
Fix the data structures.
More JSON.