Exploring content recommendation


- Mahout is an Apache project that builds scalable machine learning libraries for large datasets. It includes algorithms for classification, clustering, recommendation, and other tasks.
- The Mahout recommender uses collaborative filtering to recommend items to users based on their own preferences and the preferences of similar users. It offers item-based and user-based approaches.
- The deck walks through an example of running the Mahout recommender on Hadoop for a movie recommendation problem with the Netflix dataset. The run produced recommendations in about 16 minutes on the described hardware.


Exploring content recommendation
Felipe Besson
@fmbesson
March, 2013

Why is recommendation important?

“A lot of times, people don't know what they want until you show it to them.”
Steve Jobs

“We don't make money when we sell things; we make money when we help customers make purchase decisions.”
Jeff Bezos, Amazon

Mahout: an Apache project to build scalable machine learning libraries
● Focused on large data sets
● Adaptation of standard machine learning algorithms
● Runs on Apache Hadoop (map/reduce paradigm) … or on a non-Hadoop node

Who is using Mahout?
Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html

Supported core algorithms
● Classification
● Clustering
● Recommendation
● Pattern Mining
● Regression
● Dimension Reduction
● Evolutionary Algorithms
● Vector Similarity

Mahout Recommender
Collaborative filtering
● People often get the best recommendations from someone with similar taste
● People tend to like things that are similar to other things they like
● There are patterns in people's likes and dislikes
[Figure: John's and Bob's movie lists. Both include movie1 and movie2; the slide also shows movie42, movie4, and movie5.]
Will Bob like movie4 and movie5?
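
The idea behind the John/Bob question can be sketched in a few lines of plain Python. This is only an illustration of user-based collaborative filtering, not Mahout's implementation: the `likes` data, the choice of Jaccard similarity, and the function names are all assumptions made for the example.

```python
# Hypothetical "likes" data, loosely modeled on the John/Bob slide.
likes = {
    "john": {"movie1", "movie2", "movie4", "movie5"},
    "bob": {"movie1", "movie2"},
    "ann": {"movie7", "movie8"},
}


def jaccard(a, b):
    """Share of items two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(user, likes, top_n=5):
    """Score unseen items by the similarity of the users who liked them."""
    seen = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # ignore users with no overlap in taste
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


print(recommend_user_based("bob", likes))
```

With this toy data, Bob shares half of the combined titles with John and nothing with Ann, so only John's unseen titles are suggested.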

Mahout Recommender
Available recommenders
● Item based
● User based
Execution modes
● Taste: online but not distributed
● Hadoop: offline (batch) but distributed
Parameters
● Many coefficients to calculate user and item similarity and neighborhood
● Data model abstractions
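
To make the item-based variant and the role of the similarity coefficient more concrete, here is a minimal sketch in plain Python. It does not use Mahout's classes; the `prefs` data, the function names, and the use of cosine similarity as the pluggable coefficient are assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical preferences: user -> {item: preference value}.
prefs = {
    1: {"a": 1.0, "b": 1.0},
    2: {"a": 1.0, "b": 1.0, "c": 1.0},
    3: {"c": 1.0, "d": 1.0},
}


def item_vectors(prefs):
    """Invert user -> item preferences into item -> {user: value}."""
    vectors = {}
    for user, items in prefs.items():
        for item, value in items.items():
            vectors.setdefault(item, {})[user] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend_item_based(user, prefs, similarity=cosine, top_n=5):
    """Score each unseen item by its similarity to the items the user already has."""
    vectors = item_vectors(prefs)
    owned = prefs[user]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in owned:
            continue
        score = sum(similarity(cand_vec, vectors[item]) * value
                    for item, value in owned.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(recommend_item_based(1, prefs))
```

Passing a different function as `similarity` changes how items are compared without touching the rest of the code, which is roughly the role the similarity coefficients play as parameters.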

Mahout Recommender (Hadoop)
Input: user_id, item_id, preference_value (optional)
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
...
Output: user_id: [recommended_item, score; …]
1: [10, 0.93; 11, 0.84; … ]
2: [23, 0.72; 17, 0.60; … ]
8: [121, 0.98; 23, 0.78; … ]
17: [12, 0.89; 32, 0.56; … ]
42: [129, 0.92; 98, 0.45; … ]
...
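
As a rough illustration of the input format, the sketch below reads `user_id, item_id, preference_value` records into a nested dictionary. The function name and the default of 1.0 for a missing preference value are assumptions made for this example, not something stated on the slide.

```python
from collections import defaultdict


def parse_preferences(lines, default_value=1.0):
    """Parse 'user_id, item_id[, preference_value]' records into {user: {item: value}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) not in (2, 3):
            raise ValueError(f"expected 2 or 3 fields, got: {line!r}")
        user_id, item_id = int(fields[0]), int(fields[1])
        # The preference value is optional; fall back to the assumed default.
        value = float(fields[2]) if len(fields) == 3 else default_value
        prefs[user_id][item_id] = value
    return dict(prefs)


sample = ["1, 23, 0.9", "1, 15, 0.5", "2, 11, 0.3", "9, 10"]
parsed = parse_preferences(sample)
print(parsed)
```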

1st try!
Movie recommendation
Netflix dataset (http://www.netflixprize.com/)
● # of user tastes: 2,817,131
● # of movies: 17,770
● # of users: 472,891
Environment and performance
● Hadoop pseudo-distributed
● Computer
  ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
  ● 6 GB RAM
● Total time: ~16 minutes
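
A quick back-of-the-envelope calculation shows how sparse this data set is. The sketch only does arithmetic on the three counts from the slide; the variable names are made up for the example.

```python
# Counts taken from the slide.
num_preferences = 2_817_131
num_movies = 17_770
num_users = 472_891

# Every user could, in principle, rate every movie.
possible_pairs = num_users * num_movies
density = num_preferences / possible_pairs
avg_per_user = num_preferences / num_users

print(f"possible user-movie pairs: {possible_pairs:,}")
print(f"matrix density: {density:.4%}")
print(f"average preferences per user: {avg_per_user:.2f}")
```

Only a tiny fraction of the user-movie matrix is filled in, which is typical for recommendation data and is one reason similarity between users or items has to be estimated from very few shared entries.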

How to run?
1. Copy the input file to HDFS (Hadoop Distributed File System):
   hadoop fs -put qualifying.txt /netflix/input/data.txt
2. Run the recommender:
   hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
     -Dmapred.input.dir=/netflix/input/data.txt \
     -Dmapred.output.dir=/netflix/output \
     --numRecommendations 10 \
     --similarityClassname SIMILARITY_LOGLIKELIHOOD
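
Once the job finishes, its output files can be post-processed with a small script. The sketch below assumes each output line has the form `user_id<TAB>[item_id:score,item_id:score,...]`; that layout and the function name are assumptions for this example, so check them against the files actually written to /netflix/output.

```python
def parse_recommendations(line):
    """Parse one output line: a user id, a tab, then a bracketed item:score list."""
    user_part, _, items_part = line.strip().partition("\t")
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError(f"unexpected line format: {line!r}")
    recs = []
    body = items_part[1:-1]
    # An empty list "[]" yields no pairs at all.
    for pair in filter(None, body.split(",")):
        item_id, _, score = pair.partition(":")
        recs.append((int(item_id), float(score)))
    return int(user_part), recs


user, recs = parse_recommendations("1\t[10:0.93,11:0.84]")
print(user, recs)
```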

Results
Recommender analyzer
https://github.com/besson/recommender_analyzer
http://rec-analyzer.herokuapp.com/

References
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.

