Mahout in Action
          Part 1


    Yasmine M. Gaber
      28 February 2013
Agenda

    Meet Apache Mahout

    Part 1: Recommendation

    Part 2: Clustering

    Part 3: Classification
Meet Apache Mahout

    It is an open-source machine learning library from Apache

    It is scalable

    It is a Java library

    It can be used with Hadoop to process large-scale data
Famous Engines

  Recommender engines:

 Amazon.com

 Netflix

 Dating sites like Líbímseti

 Social networking sites like Facebook

  Clustering engines:

 Google News

 Search engines like Clusty

  Classification engines:

 Spam email filters

 Google’s Picasa

 Optical character recognition software

 Apple’s Genius feature in iTunes
Recommendations
Recommender Input

    A preference consists of a user ID, an item ID, and a value
    expressing the user's preference for the item

    The input is a .csv file with one preference per line: userID,itemID,value
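A minimal sketch of reading that input format, assuming one `userID,itemID,value` triple per line. The class `PreferenceCsvSketch` and its `Pref` type are hypothetical names for illustration; they are not Mahout classes, which handle file loading on their own.

```java
import java.util.ArrayList;
import java.util.List;

final class PreferenceCsvSketch {

    // Hypothetical value type for this sketch (not Mahout's Preference interface).
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Parses one "userID,itemID,value" line.
    static Pref parseLine(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected userID,itemID,value but got: " + line);
        }
        return new Pref(
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim()),
                Float.parseFloat(fields[2].trim()));
    }

    // Parses every non-empty line, preserving input order.
    static List<Pref> parseAll(List<String> lines) {
        List<Pref> prefs = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                prefs.add(parseLine(line));
            }
        }
        return prefs;
    }
}
```

For example, `PreferenceCsvSketch.parseLine("123,456,3.0")` yields user 123, item 456, and preference value 3.0.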
Create Recommender
Recommender Evaluation

    Average absolute difference vs. root-mean-square error
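The two scores can be sketched as follows, assuming we already have parallel arrays of actual and estimated preference values for a held-out test set. `EvaluationMetricsSketch` is a hypothetical helper, not a Mahout evaluator. Lower is better for both, and the root-mean-square score penalizes large misses more heavily.

```java
final class EvaluationMetricsSketch {

    // Mean of |actual - estimated| over all test preferences.
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        checkSameLength(actual, estimated);
        double total = 0.0;
        for (int k = 0; k < actual.length; k++) {
            total += Math.abs(actual[k] - estimated[k]);
        }
        return total / actual.length;
    }

    // Square root of the mean squared difference.
    static double rootMeanSquare(double[] actual, double[] estimated) {
        checkSameLength(actual, estimated);
        double total = 0.0;
        for (int k = 0; k < actual.length; k++) {
            double diff = actual[k] - estimated[k];
            total += diff * diff;
        }
        return Math.sqrt(total / actual.length);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

With actual values (3, 5, 4) and estimates (4, 5, 2), the differences are 1, 0 and 2, so the average absolute difference is 1.0 while the root-mean-square score is sqrt(5/3), roughly 1.29.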
Mahout RecommenderEvaluator
Precision and Recall
RecommenderIRStatsEvaluator
Representing Recommender Data

    Preference object
    −   new GenericPreference(123, 456, 3.0f)

    Preference Array

    FastByIDMap and FastIDSet
In-memory DataModels

    GenericDataModel


    File-based data


    Refreshable components


    Database-based data
Coping without preference values
User-based Recommender

    The algorithm

for every item i that u has no preference for yet
    for every other user v that has a preference for i
        compute a similarity s between u and v
        incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average
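The loop above can be sketched in plain Java. This is an illustration under stated assumptions, not Mahout's implementation: preferences are held in nested maps (user ID to item ID to value), the similarity function is supplied by the caller, and users with a non-positive or undefined similarity are skipped. The class name `UserBasedSketch` is hypothetical. The loops are swapped (users outside, items inside) so each similarity is computed once; the resulting weighted averages are the same.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

final class UserBasedSketch {

    static List<Long> recommend(
            Map<Long, Map<Long, Double>> prefs,
            long u,
            int howMany,
            ToDoubleBiFunction<Map<Long, Double>, Map<Long, Double>> similarity) {
        Map<Long, Double> mine = prefs.get(u);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();

        for (Map.Entry<Long, Map<Long, Double>> other : prefs.entrySet()) {
            if (other.getKey() == u) {
                continue; // skip the target user
            }
            double s = similarity.applyAsDouble(mine, other.getValue());
            if (Double.isNaN(s) || s <= 0.0) {
                continue; // assumption: ignore dissimilar or incomparable users
            }
            for (Map.Entry<Long, Double> p : other.getValue().entrySet()) {
                if (mine.containsKey(p.getKey())) {
                    continue; // u already has a preference for this item
                }
                weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                weightTotal.merge(p.getKey(), s, Double::sum);
            }
        }

        // Weighted average per candidate item.
        Map<Long, Double> estimate = new HashMap<>();
        for (Long item : weightedSum.keySet()) {
            estimate.put(item, weightedSum.get(item) / weightTotal.get(item));
        }

        // Highest estimate first; ties broken by item ID for determinism.
        List<Long> items = new ArrayList<>(estimate.keySet());
        items.sort(Comparator.comparingDouble((Long item) -> -estimate.get(item))
                .thenComparing(Comparator.<Long>naturalOrder()));
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

Passing the similarity as a function keeps the sketch independent of any one metric, mirroring how the recommender and the similarity metric are separate components.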
Recommender Components

    Data model, implemented via DataModel


    User-user similarity metric, implemented via
    UserSimilarity


    User neighborhood definition, implemented via
    UserNeighborhood


    Recommender engine, implemented via a
    Recommender (here, GenericUserBasedRecommender)
User Neighborhoods

    Fixed-size neighborhoods: the N most similar users

    Threshold-based neighborhoods: every user whose similarity meets or exceeds a threshold
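The two neighborhood styles can be contrasted with a short sketch. It assumes the similarities between the target user and every other user have already been computed and stored in a map; `NeighborhoodSketch` and its method names are hypothetical, not Mahout's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class NeighborhoodSketch {

    // Fixed-size neighborhood: the n users with the highest similarity.
    static List<Long> nearestN(Map<Long, Double> similarityByUser, int n) {
        List<Long> users = new ArrayList<>(similarityByUser.keySet());
        users.sort(Comparator.comparingDouble((Long v) -> -similarityByUser.get(v))
                .thenComparing(Comparator.<Long>naturalOrder()));
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    // Threshold-based neighborhood: every user at or above the threshold,
    // returned in ascending ID order.
    static List<Long> atOrAboveThreshold(Map<Long, Double> similarityByUser, double threshold) {
        List<Long> users = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityByUser.entrySet()) {
            if (e.getValue() >= threshold) {
                users.add(e.getKey());
            }
        }
        users.sort(Comparator.<Long>naturalOrder());
        return users;
    }
}
```

A fixed size always returns the same number of neighbors, even if some are only weakly similar; a threshold returns only sufficiently similar users, but possibly very few or none.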
Similarity Metrics

    Pearson correlation-based similarity
    −   A number between -1 and 1 that measures
        the tendency of two series of numbers, paired up
        one-to-one, to move together
    −   Problems:
        −   It doesn’t take into account the number of items on
            which two users’ preferences overlap, which is probably
            a weakness in the context of recommender engines
        −   If two users overlap on only one item, no correlation can
            be computed because of how the computation is
            defined
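A minimal sketch of the Pearson computation over the items two users have both rated, assuming each user's preferences are a map from item ID to value. `PearsonSketch` is a hypothetical helper. It returns NaN when fewer than two items overlap or when either user's overlapping values are all identical, because the denominator is then zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PearsonSketch {

    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        // Pair up the values for items both users have rated.
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN; // a single overlapping item gives no correlation
        }

        double meanX = 0.0;
        double meanY = 0.0;
        for (double[] p : pairs) {
            meanX += p[0];
            meanY += p[1];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (double[] p : pairs) {
            double dx = p[0] - meanX;
            double dy = p[1] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        if (varianceX == 0.0 || varianceY == 0.0) {
            return Double.NaN; // constant series: correlation is undefined
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }
}
```

Note that the result does not change with the number of overlapping items: two users who agree perfectly on two items score the same as two who agree perfectly on fifty, which is the first weakness listed above.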
Similarity Metrics

    Euclidean distance similarity
    −   1 / (1 + d), where d is the Euclidean distance between the
        two users' preference vectors

    Cosine measure similarity
    −   The cosine of the angle between the two preference vectors,
        between -1 and 1

    Tanimoto coefficient similarity
    −   The ratio of the size of the intersection to the size of
        the union of the two users' preferred items
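The three formulas can be written out directly. This sketch assumes the two users' preferences over their shared items are given as equal-length arrays (for the Euclidean and cosine measures) and that their preferred items are given as sets of item IDs (for Tanimoto). `SimilaritySketch` is a hypothetical class, and returning 0 for two empty sets is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

final class SimilaritySketch {

    // 1 / (1 + d): equals 1 when the vectors are identical, approaches 0 as they diverge.
    static double euclideanSimilarity(double[] x, double[] y) {
        double sumSquares = 0.0;
        for (int k = 0; k < x.length; k++) {
            double diff = x[k] - y[k];
            sumSquares += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Cosine of the angle between the vectors: dot product over the product of norms.
    static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            normX += x[k] * x[k];
            normY += y[k] * y[k];
        }
        return dot / Math.sqrt(normX * normY);
    }

    // |intersection| / |union| of the two item sets; preference values are ignored.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }
}
```

Because the Tanimoto coefficient looks only at which items each user has expressed a preference for, it remains usable when no preference values are available.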
Item-based recommendation

    The algorithm

for every item i that u has no preference for yet
    for every item j that u has a preference for
        compute a similarity s between i and j
        add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
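A plain-Java sketch of the item-based loop, under these assumptions: the target user's preferences are a map from item ID to value, the candidate items are given as a set, and the item-item similarity is a caller-supplied function. Non-positive or undefined similarities are skipped, which is a simplification. `ItemBasedSketch` is a hypothetical name, not Mahout's GenericItemBasedRecommender.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

final class ItemBasedSketch {

    // Estimated preference for every item the user has not rated yet.
    static Map<Long, Double> estimates(
            Map<Long, Double> userPrefs,
            Set<Long> allItems,
            ToDoubleBiFunction<Long, Long> similarity) {
        Map<Long, Double> result = new HashMap<>();
        for (Long i : allItems) {
            if (userPrefs.containsKey(i)) {
                continue; // u already has a preference for i
            }
            double weightedSum = 0.0;
            double weightTotal = 0.0;
            for (Map.Entry<Long, Double> rated : userPrefs.entrySet()) {
                double s = similarity.applyAsDouble(i, rated.getKey());
                if (Double.isNaN(s) || s <= 0.0) {
                    continue; // assumption: ignore dissimilar items
                }
                weightedSum += s * rated.getValue();
                weightTotal += s;
            }
            if (weightTotal > 0.0) {
                result.put(i, weightedSum / weightTotal);
            }
        }
        return result;
    }

    // Top items by estimate, ties broken by item ID.
    static List<Long> recommend(
            Map<Long, Double> userPrefs,
            Set<Long> allItems,
            ToDoubleBiFunction<Long, Long> similarity,
            int howMany) {
        Map<Long, Double> estimate = estimates(userPrefs, allItems, similarity);
        List<Long> items = new ArrayList<>(estimate.keySet());
        items.sort(Comparator.comparingDouble((Long item) -> -estimate.get(item))
                .thenComparing(Comparator.<Long>naturalOrder()));
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

Unlike the user-based loop, the inner loop runs over the items the user has rated rather than over other users, so the cost per recommendation depends on how many items the user has rated, not on how many users exist.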
GenericItemBasedRecommender
Slope-one recommender

    The algorithm

for every item i the user u expresses no preference for
    for every item j that user u expresses a preference for
        find the average preference difference between j and i
        add this diff to u's preference value for j
        add this to a running average
return the top items, ranked by these averages
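The slope-one steps can be sketched as below. Assumptions: preferences are nested maps (user ID to item ID to value), the average difference between two items is computed on the fly over the users who rated both, and every difference counts equally in the running average. A production implementation would precompute the differences and might weight them by how many users support each one. `SlopeOneSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SlopeOneSketch {

    // Average of (preference for i - preference for j) over users who rated both.
    static double averageDiff(Map<Long, Map<Long, Double>> prefs, long i, long j) {
        double sum = 0.0;
        int count = 0;
        for (Map<Long, Double> p : prefs.values()) {
            Double pi = p.get(i);
            Double pj = p.get(j);
            if (pi != null && pj != null) {
                sum += pi - pj;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Estimated preference of user u for every item u has not rated.
    static Map<Long, Double> estimates(Map<Long, Map<Long, Double>> prefs, long u) {
        Map<Long, Double> mine = prefs.get(u);
        Set<Long> allItems = new TreeSet<>();
        for (Map<Long, Double> p : prefs.values()) {
            allItems.addAll(p.keySet());
        }

        Map<Long, Double> result = new TreeMap<>();
        for (long i : allItems) {
            if (mine.containsKey(i)) {
                continue;
            }
            double total = 0.0;
            int count = 0;
            for (Map.Entry<Long, Double> rated : mine.entrySet()) {
                double diff = averageDiff(prefs, i, rated.getKey());
                if (Double.isNaN(diff)) {
                    continue; // no user rated both items
                }
                total += rated.getValue() + diff; // u's value for j, shifted by the diff
                count++;
            }
            if (count > 0) {
                result.put(i, total / count);
            }
        }
        return result;
    }
}
```

As a worked case, suppose user 1 rates items 101, 102, 103 as 5, 3, 2; user 2 rates 101 and 102 as 3 and 4; user 3 rates 102 and 103 as 2 and 5. For user 2 and item 103, the average difference against 101 is -3 (giving 3 - 3 = 0) and against 102 is +1 (giving 4 + 1 = 5), so the estimate is 2.5.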
Taking Recommender to Production
User-based recommenders
Thank You



               Contact at:
Email: Yasmine.Gaber@espace.com.eg
Twitter: Twitter.com/yasmine_mohamed
