Mahout. By: Ariel Kogan
Who’s this guy? Java Framework Team at IDI; 10 years of experience in IT; 6 years of experience in Java; Master's in Informatics Engineering, specializing in Artificial Intelligence; has a weird accent; Aliyah. Image: http://www.flickr.com/photos/triphenawong/4752510292/
Agenda: Machine Learning; Mahout; Recommender Engines; Clustering; Categorization; Hadoop
Machine Learning
Machine Learning. Whatchatalkin' 'bout, Willis?
Machine Learning. Well-known use cases for Machine Learning: Recommender Engines; Clustering; Classification
Machine Learning. Recommender Engines: Amazon
Machine Learning. Recommender Engines: Facebook
Machine Learning. Clustering: Google News
Machine Learning. Classification: Spam Detection
Machine Learning. Classification: Picasa face recognition
Machine Learning. Why learn “Machine Learning”? Because it’s interesting; because it makes money
Mahout
Mahout. What is it? Open Source project by the Apache Software Foundation; goal: to build scalable machine learning libraries; large data sets (Hadoop); commercially friendly Apache Software license; community
Mahout. What’s that name? Mahout - [muh-hout] - (mə’haʊt). A mahout is a person who keeps and drives an elephant. The name Mahout comes from the project's use of Apache Hadoop (which has a yellow elephant as its logo) for scalability and fault tolerance.
Mahout. Mahout and its related projects
Mahout. History
Mahout. History (timeline covering 2008, 2009 and 2010): the Lucene Project Management Committee announces the creation of the Mahout subproject; Taste Collaborative Filtering donates its codebase to the Mahout project; Release 0.1; Release 0.2; Release 0.3; Release 0.4; Mahout becomes an Apache top-level project; Mahout is presented at AlphaCSP’s The Edge 2010
Mahout. Mailing lists activity
Mahout. Similar Products. Yet another Framework? Weka (since 1999); 38 Java projects listed on mloss.org (Machine Learning Open Source Software)
Mahout. Machine Learning Challenges: large amount of input data; techniques work better; nature of the deploying context; must produce results quickly; the amount of input is so large that it is not feasible to process it all on one computer, even a powerful one
Mahout. Scalability: Mahout core algorithms are implemented on top of Apache Hadoop using the map/reduce paradigm.
Mahout. MapReduce: programming model introduced by Google in 2004; many real world tasks are expressible in this model (“Map-Reduce for Machine Learning on Multicore”, Stanford CS Department’s paper, 2006); provides automatic parallelization and distribution; runs on large clusters of compute nodes; highly scalable; Hadoop is Apache’s open source implementation
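The map/reduce model described on the MapReduce slide can be illustrated with a small in-memory sketch. This is not Hadoop code and not taken from the talk: the class name `MapReduceSketch`, the word-count task, and the single-process "shuffle" are all assumptions chosen only to show the shape of a map phase, a grouping step, and a reduce phase.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map/reduce programming model.
final class MapReduceSketch {

    // Map phase: turn one input record (a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: map every record, group the pairs by key (the "shuffle"), reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("the elephant", "the mahout drives the elephant")));
    }
}
```

Because each `map` call only sees one record and each `reduce` call only sees one key, a framework such as Hadoop can spread both phases over many machines; the loop in `run` stands in for that distribution here.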
Mahout
Mahout
Recommender Engines
Recommender Engines. Approaches: user based; item based; collaborative filtering vs content-based recommendation
Recommender Engines. What do we need? Data model: Users, Items, Preferences (ratings); ItemSimilarity; UserSimilarity; UserNeighborhood; Recommender
Recommender Engines. T-bone; Chocolate; Lettuce; Rump. Images: http://www.flickr.com/photos/martinimike/3770274175/ http://www.flickr.com/photos/fotoosvanrobin/3182238046/ http://www.flickr.com/photos/this_girl_daydreams/3190110968/ http://www.flickr.com/photos/19998197@N00/3238445535/
Recommender Engines. 5; -5
Recommender Engines. Kuki; The Vegan; Gilad; Ariel
Recommender Engines
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
// We create a DataModel based on the information contained in food.csv
DataModel model = new FileDataModel(new File("food.csv"));
// We use one of the several user similarity functions we have available
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Same thing with the UserNeighborhood definition
UserNeighborhood neighborhood = new NearestNUserNeighborhood(hoodSize, similarity, model);
// Finally we can build our recommender
Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
// And ask for recommendations for a specific user
List<RecommendedItem> recommendations = recommender.recommend(userId, howMany);
for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}
Available similarity implementations: CachingUserSimilarity; EuclideanDistanceSimilarity; GenericUserSimilarity; LogLikelihoodSimilarity; PearsonCorrelationSimilarity; SpearmanCorrelationSimilarity; TanimotoCoefficientSimilarity; UncenteredCosineSimilarity
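To give a feel for what a user similarity function computes, here is a minimal sketch of the textbook Pearson correlation between two users' ratings of the same items. It is an illustration only and is not Mahout's `PearsonCorrelationSimilarity` implementation; the class name `PearsonSketch` and the sample ratings are assumptions.

```java
// Hypothetical helper: textbook Pearson correlation, not Mahout's own code.
final class PearsonSketch {

    // a[i] and b[i] are the two users' ratings for the same item i.
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty rating arrays of equal length");
        }
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;

        double covariance = 0.0;
        double varianceA = 0.0;
        double varianceB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            covariance += da * db;
            varianceA += da * da;
            varianceB += db * db;
        }
        // Undefined when one user gave every item the same rating.
        if (varianceA == 0.0 || varianceB == 0.0) {
            return Double.NaN;
        }
        return covariance / Math.sqrt(varianceA * varianceB);
    }

    public static void main(String[] args) {
        // Made-up ratings for three items, on a scale from -5 to 5.
        double[] meatLover = {5, -5, 3};
        double[] vegan = {-5, 5, -3};
        System.out.println(pearson(meatLover, vegan));
    }
}
```

A value near 1 means the two users rate items alike, a value near -1 means they rate them in opposite ways; a neighborhood is then built from the users with the highest similarity.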
Recommender Engines. What would we recommend to Ariel? Recommendation for Ariel: T-bone, rating 4.0
Recommender Engines. Kuki; The Vegan; Gilad; Ariel
Recommender Engines. No initial information: 10 most popular; random selection; what other customers are looking at right now; bestsellers; best prices; nothing at all
Clustering
Clustering. Clustering is about drawing lines
Clustering. Clustering Steps
Clustering. Possible weather conditions recognition. Clustering inputs: temperature; wind direction; humidity; wind speed. Icons: http://www.icons-land.com
Clustering. Vector representation: 25 / 50 = 0.5
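The 25 / 50 = 0.5 on the vector representation slide reads as dividing a raw reading by the largest value that feature can take. A minimal sketch of that idea follows; the class name `VectorSketch`, the choice of features, and their maximum values are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: turns raw readings into a vector of scaled values.
final class VectorSketch {

    // Divides each raw reading by that feature's maximum.
    // Readings between 0 and the maximum end up between 0 and 1.
    static double[] toVector(double[] raw, double[] max) {
        if (raw.length != max.length) {
            throw new IllegalArgumentException("raw and max must have the same length");
        }
        double[] vector = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            if (max[i] <= 0) {
                throw new IllegalArgumentException("each maximum must be positive");
            }
            vector[i] = raw[i] / max[i];
        }
        return vector;
    }

    public static void main(String[] args) {
        // Assumed example: temperature 25 (maximum 50), wind direction 90 (maximum 360).
        double[] vector = toVector(new double[] {25, 90}, new double[] {50, 360});
        System.out.println(Arrays.toString(vector)); // prints [0.5, 0.25]
    }
}
```

Scaling every feature to a comparable range keeps one feature (for example wind direction in degrees) from dominating the distance between two samples.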
Clustering. Samples Generation: 300 samples, mean [0.0, 2.0], SD 0.1; 500 samples, mean [1.0, 1.0], SD 3.0; 300 samples, mean [1.0, 0.0], SD 0.5
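The three sample groups on this slide can be reproduced with a few lines of plain Java. This is a sketch using `java.util.Random`, not the code used in the talk; the class name `SampleGenerationSketch`, the fixed seed, and the use of the same standard deviation on both axes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator for the three groups of 2-D samples listed on the slide.
final class SampleGenerationSketch {

    // Draws n points around (meanX, meanY), using the same standard deviation on both axes.
    static List<double[]> generate(int n, double meanX, double meanY, double sd, Random rnd) {
        List<double[]> points = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            points.add(new double[] {
                meanX + sd * rnd.nextGaussian(),
                meanY + sd * rnd.nextGaussian()
            });
        }
        return points;
    }

    // Concatenates the three groups into one dataset, as input for a clustering run.
    static List<double[]> slideDataset(long seed) {
        Random rnd = new Random(seed);
        List<double[]> all = new ArrayList<>();
        all.addAll(generate(300, 0.0, 2.0, 0.1, rnd));
        all.addAll(generate(500, 1.0, 1.0, 3.0, rnd));
        all.addAll(generate(300, 1.0, 0.0, 0.5, rnd));
        return all;
    }

    public static void main(String[] args) {
        System.out.println(slideDataset(42L).size() + " samples generated");
    }
}
```

Knowing how the data was generated makes it possible to compare the clusters an algorithm discovers against the groups that were actually drawn.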
Clustering. Iterations with Fuzzy K-Means
Clustering. Clustering Discovery: original data generation; discovered clusters
Clustering. Distance measures: CosineDistanceMeasure; EuclideanDistanceMeasure; MahalanobisDistanceMeasure; ManhattanDistanceMeasure; SquaredEuclideanDistanceMeasure; TanimotoDistanceMeasure; WeightedDistanceMeasure; WeightedEuclideanDistanceMeasure; WeightedManhattanDistanceMeasure
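To make the list of distance measures more concrete, the sketch below implements the textbook definitions of three of them over plain `double[]` vectors. It does not use or reproduce Mahout's `DistanceMeasure` classes; the class name `DistanceSketch` and its method names are assumptions.

```java
// Hypothetical helper with textbook distance definitions; not Mahout's implementations.
final class DistanceSketch {

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    // Straight-line distance: square root of the summed squared differences.
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // City-block distance: sum of the absolute differences.
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Cosine distance: 1 minus the cosine of the angle between the vectors.
    static double cosine(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] origin = {0, 0};
        double[] point = {3, 4};
        System.out.println(euclidean(origin, point)); // prints 5.0
        System.out.println(manhattan(origin, point)); // prints 7.0
    }
}
```

The choice of measure changes which samples count as "close": cosine distance ignores vector length and compares direction only, while the Euclidean and Manhattan distances are sensitive to magnitude.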
Categorization
Categorization. Categorization Steps
Categorization. Our example: What do we want to do? Document; Classifier; Java; Sport
Categorization. Documents Preparation: labeled documents; BayesFileFormatter (Lucene’s Analyzers); output format: Label <tab> evidence1 <space> evidence2; split into Training and Test
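The `Label <tab> evidence1 <space> evidence2` layout from the documents preparation slide can be sketched in plain Java. This is not `BayesFileFormatter` and it does not use Lucene; the class name `TrainingLineSketch` and its very naive tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the formatting step; a real setup would use a Lucene analyzer.
final class TrainingLineSketch {

    // Naive tokenizer: lowercases the text and splits on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Builds one training line: the label, a tab, then the tokens separated by spaces.
    static String format(String label, String text) {
        return label + "\t" + String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(format("sport", "The match ended 2-1!"));
        System.out.println(format("java", "Generics were added in Java 5."));
    }
}
```

Each labeled document becomes one such line; the lines are then divided into a training set and a test set.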
Categorization. Using the classifier
Categorization. Categorization testing, the confusion matrix
Summary
-------------------------------------------------------
Correctly Classified Instances          :     93    93%
Incorrectly Classified Instances        :      7     7%
Total Classified Instances              :    100
=======================================================
Confusion Matrix
-------------------------------------------------------
java   sport  <--Classified as
56     3      |  59   java
4      37     |  41   sport
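The summary figures follow directly from the confusion matrix: the diagonal holds the correctly classified documents (56 + 37 = 93) and the sum of all cells is the total (100). A minimal sketch of that arithmetic, with the hypothetical class name `ConfusionMatrixSketch` and rows taken as the actual class and columns as the predicted class:

```java
// Hypothetical helper: derives the summary numbers from a confusion matrix.
final class ConfusionMatrixSketch {

    // Correctly classified instances sit on the diagonal (actual class == predicted class).
    static int correct(int[][] matrix) {
        int sum = 0;
        for (int i = 0; i < matrix.length; i++) {
            sum += matrix[i][i];
        }
        return sum;
    }

    // Total instances: the sum of every cell.
    static int total(int[][] matrix) {
        int sum = 0;
        for (int[] row : matrix) {
            for (int cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    static double accuracy(int[][] matrix) {
        return (double) correct(matrix) / total(matrix);
    }

    public static void main(String[] args) {
        // Rows: actual java, actual sport. Columns: classified as java, classified as sport.
        int[][] matrix = {
            {56, 3},
            {4, 37}
        };
        System.out.println(correct(matrix) + " of " + total(matrix) + " correct");
        System.out.println("accuracy = " + accuracy(matrix));
    }
}
```

The off-diagonal cells show where the errors are: 3 java documents were classified as sport and 4 sport documents as java, which gives the 7 incorrectly classified instances.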
Take me to the cluster
Hadoop. Why do we need distributed computing? The size of our dataset can’t be handled by a single machine (scale-up vs scale-out); we need the results in near real time.
Hadoop. Data; Hadoop Compute Cluster; Results
Hadoop. Hadoop Jobs. We need to: configure the job; submit it; control its execution; query its state. We want to: just run our machine learning algorithm!
Hadoop. Mahout’s AbstractJob and Drivers: Mahout provides an out-of-the-box AbstractJob class and several Job and Driver implementations for running machine learning algorithms on the cluster without any hassle.
Hadoop. What we need: our code, including a Job; Mahout jars; Hadoop jars; everyone’s dependency jars; resources; the dataset
Hadoop. Packaging a Job: the Maven solution. pom.xml
Hadoop. Job feeding: Job; Dataset; Hadoop Compute Cluster
Hadoop. We take the project’s dependencies
Hadoop. Using an Ant task, we pack everything together

Mahout's presentation at AlphaCSP's The Edge 2010


Editor's Notes

  1. Machine learning
  2. Make it clear that I don’t want the crowd to read the table; it’s only there to generate an overwhelming sensation
  3. Recommender systems
  4. Strictly speaking, these are examples of “collaborative filtering” -- producing recommendations based on, and only based on, knowledge of users’ relationships to items. These techniques require no knowledge of the properties of the items themselves. This is, in a way, an advantage. This recommender framework couldn’t care less whether the “items” are books, theme parks, flowers, or even other people, since nothing about their attributes enters into any of the input.
  5. UserSimilarity: way to compare users (user-based approach). ItemSimilarity: way to compare items (item-based approach). Recommender: interface for providing recommendations. UserNeighborhood: interface for computing a neighborhood of similar users that can then be used by the Recommenders.
  6. http://www.flickr.com/photos/martinimike/3770274175/ http://www.flickr.com/photos/fotoosvanrobin/3182238046/ http://www.flickr.com/photos/this_girl_daydreams/3190110968/ http://www.flickr.com/photos/19998197@N00/3238445535/
  7. Talk about RecommenderEvaluator
  8. Mean; standard deviation
  9. Explain that we are going to take all project dependencies, pack them together with Ant and include them in the package phase of Maven