Hadoop Applications at Facebook


Jeff Hammerbacher
Manager, Data
May 28 - 29, 2008
Initial Hadoop Deployment
▪   Tested in mid-2006: not great performance, small community
▪   Already had Cheetah and another Hadoop-like project underway
▪   Strong resistance to Java
▪   Early adopters: Yahoo!, Powerset, Quantcast, Last.fm
▪   First serious cluster: spring 2007
    ▪   Pulled sixty web server boxes and put 3 x 500 GB SATA disks in the back
    ▪   Loaded two separate log files: clickstream and activity logs
    ▪   Clickstream was nearly 600 GB per day, activity logs around 200 GB
    ▪   Lots of difficulties just getting data into the system
    ▪   All sorts of fun learning to operate the file system
Initial Hadoop Applications
Hadoop Streaming
▪   Almost all applications at Facebook use Hadoop Streaming
▪   Mapper and Reducer take inputs from a pipe and write outputs to a pipe
▪   Facebook users write in Python, PHP, and C++ (though Hadoop Pipes would be better for C++)
▪   Allows for library reuse, faster development
▪   Eats way too much CPU
▪   More info: http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
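To illustrate the pipe-based model described above, here is a minimal word-count mapper and reducer for Hadoop Streaming in Python. It is a sketch, not Facebook code. The file name `wordcount.py`, the `map`/`reduce` command-line switch, and the tab-separated key/value layout are assumptions made for the example (tab is Streaming's default key separator). The reducer relies on Streaming delivering its input sorted by key, so equal keys arrive on consecutive lines.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run(stage):
    """Read records from stdin and write results to stdout, as Streaming expects."""
    stages = {"map": mapper, "reduce": reducer}
    if stage not in stages:
        sys.exit("usage: wordcount.py map|reduce")
    for record in stages[stage](sys.stdin):
        print(record)


# Only dispatch when invoked as `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

To try it without a cluster, a Unix pipeline can stand in for the framework: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster the same script would be handed to the Streaming jar through its `-mapper` and `-reducer` options; check the Streaming documentation for your Hadoop release for the exact invocation.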
Initial Hadoop Applications
Unstructured text analysis
▪   Intern asked to understand brand sentiment and influence
▪   First began by building an online language classifier for wall posts
▪   Ported application to Hadoop for offline processing
▪   Many tools had to be built to support his project
    ▪   Understanding serialization format of wall post logs
    ▪   Common data operations: project, filter, join, group by
    ▪   Developed using Hadoop streaming for rapid prototyping in Python
    ▪   Scheduling regular processing and recovering from failures
    ▪   Making it easy to regularly load new data
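The common data operations listed above (project, filter, join, group by) have to be hand-written when working directly on Streaming. The sketch below shows one plausible shape for them in plain Python. It uses a reduce-side join: each record is tagged with its source table, and a sort by join key (standing in for Hadoop's shuffle) brings matching records together. The field names `user_id`, `text`, and `country` and the source tags `L`/`R` are hypothetical.

```python
from itertools import groupby


def project(records, fields):
    """Project: keep only the named fields of each record."""
    for rec in records:
        yield {f: rec[f] for f in fields}


def select(records, predicate):
    """Filter: keep records for which predicate(record) is true."""
    return (rec for rec in records if predicate(rec))


def tag(records, source, key_field):
    """Map side of a join: key each record by the join column and label its source."""
    for rec in records:
        yield (rec[key_field], source, rec)


def reduce_side_join(tagged):
    """Reduce side: sort by key (standing in for the shuffle), then pair L and R records."""
    ordered = sorted(tagged, key=lambda item: item[0])
    for _key, group in groupby(ordered, key=lambda item: item[0]):
        left, right = [], []
        for _, source, rec in group:
            (left if source == "L" else right).append(rec)
        for l_rec in left:
            for r_rec in right:
                yield {**l_rec, **r_rec}


def group_count(records, field):
    """Group by: count the records that share each value of one field."""
    counts = {}
    for rec in records:
        counts[rec[field]] = counts.get(rec[field], 0) + 1
    return counts
```

For example, tagging a users table as `L` and a wall-post table as `R` on `user_id`, joining them, and then calling `group_count(joined, "country")` gives posts per country. On a real cluster, `tag` would live in the mapper and `reduce_side_join` in the reducer.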
Lexicon
Initial Hadoop Applications
Lexicon: Future Directions
▪   Further segmentation and visualization of term intensities
    ▪   Age
    ▪   Gender
    ▪   Geography
▪   TF-IDF
▪   Topic modeling
▪   Sentiment analysis
▪   Augment with data sources from around the internet
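TF-IDF, listed above as a future direction, weights a term by how often it appears in a document and discounts terms that appear in many documents. One common variant is tfidf(t, d) = tf(t, d) × log(N / df(t)). Here tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. The sketch below implements that variant with naive whitespace tokenization. Other weightings (log-scaled tf, smoothed idf) are equally common, and nothing here reflects how Lexicon actually computed term intensities.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using raw count * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenization
    n_docs = len(tokenized)
    # df(t): number of documents in which each term appears at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # tf(t, d) as a raw count
        weights.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return weights
```

A term that occurs in every document gets weight log(1) = 0, which is why common words drop out without an explicit stop-word list.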
Initial Hadoop Applications
Ensemble Learning
▪   Build many decision trees and average their predictions
    ▪   Random Forests are a combination of tree predictors such that each
        tree depends on the values of a random vector sampled independently
        and with the same distribution for all trees in the forest
    ▪   Can be used for regression or classification
    ▪   See “Random Forests” by Leo Breiman
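To make the averaging idea concrete, here is a small pure-Python random forest for regression. Each tree is grown on a bootstrap sample of the training data, each split considers only a random subset of features (`max_features`), and the forest prediction is the mean of the tree predictions. It is a teaching sketch under stated assumptions (variance-reduction splits, a depth limit, hypothetical parameter defaults), not the implementation Facebook ran on Hadoop. For classification one would take a majority vote over tree outputs instead of a mean.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean (what each split tries to reduce)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, rng, max_features, max_depth, min_leaf=2):
    """Grow a regression tree; each node tries only a random subset of features."""
    if max_depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: predict the mean target
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(X[0])), max_features):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return mean(y)
    _, f, threshold = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, max_features, max_depth - 1, min_leaf)

    return (f, threshold,
            grow([i for i, row in enumerate(X) if row[f] <= threshold]),
            grow([i for i, row in enumerate(X) if row[f] > threshold]))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def random_forest(X, y, n_trees=25, max_features=1, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, max_features, max_depth))
    return trees


def predict_forest(trees, row):
    """Average the individual tree predictions."""
    return mean(predict_tree(t, row) for t in trees)
```

Because every tree sees a different bootstrap sample and feature subset, individual trees disagree. Averaging them reduces that variance, which is the core of Breiman's argument.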
More Hadoop Applications
Insights
▪   Monitor the performance of your Facebook Ad, Page, or Application
▪   Regular aggregation of high volumes of log file data
▪   First hourly pipelines
▪   Publish data back to a MySQL tier
▪   System currently runs only partially on Hadoop
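The hourly aggregation step can be sketched as a group-by-count over log lines bucketed to the hour. The log layout assumed here (epoch seconds, object id, event name, tab-separated) is hypothetical, as is the row shape produced for loading into a MySQL summary table. Malformed lines are skipped rather than failing the whole run.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_counts(log_lines):
    """Count events per (hour, object_id, event) from 'epoch<TAB>object_id<TAB>event' lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip malformed records instead of failing the whole run
        epoch, object_id, event = parts
        hour = datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[(hour, object_id, event)] += 1
    return counts


def to_rows(counts):
    """Flatten counts into sorted (hour, object_id, event, count) tuples for a bulk load."""
    return sorted((hour, obj, event, n) for (hour, obj, event), n in counts.items())
```

Bucketing by UTC hour keeps each run's output independent of server time zones. Each hourly run only needs to append new rows to the summary table.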
Insights
More Hadoop Applications
Platform Application Reputation Scoring
▪   Users complaining about being spammed by Platform applications
▪   Now, every Platform Application has a set of quotas
    ▪   Notifications
    ▪   News Feed story insertion
    ▪   Invitations
    ▪   Emails
▪   Quotas determined by calculating a “reputation score” for the
    application
Platform Application Reputation Scoring
More Hadoop Applications
Recommendation Engines and Affinity Scores
▪   People You May Know (PYMK)
▪   Other application areas
    ▪   Pages
    ▪   Applications
    ▪   News Feed
    ▪   Search
    ▪   Ads
    ▪   Chat
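The slide does not say how PYMK candidates are scored. A widely used starting point, assumed here purely for illustration, is to rank friends-of-friends by the number of mutual friends they share with the user. The sketch runs on one machine over an in-memory adjacency map (the `friends` dict and its contents are hypothetical). On Hadoop the same counts would typically come from a job that emits candidate pairs from each user's friend list and sums them by pair.

```python
from collections import Counter


def people_you_may_know(friends, user, limit=5):
    """Rank non-friends of `user` by mutual-friend count (ties broken by id)."""
    direct = friends.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Sort explicitly so the result does not depend on set iteration order.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

A mutual-friend count is a simple affinity score. A production system would likely blend it with other signals, which this sketch does not attempt.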
More Hadoop Applications
Miscellaneous
▪   Experimentation Platform back end
    ▪   A/B Testing
    ▪   Champion/Challenger Testing
▪   Lots of internal analyses
    ▪   Export smaller data sets to R
▪   Ad targeting optimization
▪   Search index building
▪   Load testing for new storage systems
▪   Language prediction for translation targeting
(c) 2008 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
