Introducing Apache Mahout

 Scalable Machine Learning for All!
          Grant Ingersoll
Agenda
• What is Machine Learning?
  – Definitions
  – Types
  – Applications
• Mahout
  –   What?
  –   Why?
  –   How?
  –   Who?
What is Machine Learning?




NOT!
[Image: HAL 9000] http://en.wikipedia.org/wiki/Image:Hal-9000.jpg
Or?
[Image: Terminator] http://upload.wikimedia.org/wikipedia/en/4/49/Terminator.jpg
How about?



             Google News
Or?




      Amazon.com
Definition
• “Machine Learning is programming
  computers to optimize a performance
  criterion using example data or past
  experience”
  – Introduction to Machine Learning by E. Alpaydin
• Subset of Artificial Intelligence
  – Many other fields: comp sci., biology,
    math, psychology, etc.
Characterizations
• Lots of Data

• Identifiable Features in that Data

• Too big/costly for people to handle
  – People still can help
Types
• Supervised
  – Using labeled training data, create
    function that predicts output of unseen
    inputs
• Unsupervised
  – Using unlabeled data, create function
    that predicts output
• Semi-Supervised
  – Uses labeled and unlabeled data
Classification/Categorization
•   Spam Filtering
•   Named Entity Recognition
•   Phrase Identification
•   Sentiment Analysis
•   Classification into a Taxonomy
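
Spam filtering is the easiest of these to sketch. The example below assumes a multinomial Naive Bayes model with add-one smoothing over whitespace-split words; the training messages, the labels "spam" and "ham", and the function names `train` and `classify` are all made up for illustration and are not Mahout's API.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label. labeled_docs is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data.
training = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(classify("win a cash prize", model))
print(classify("monday team meeting", model))
```

A real filter would need far more training data and better tokenization; the point here is only the shape of supervised classification: labeled examples in, a predicting function out.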
Clustering
• Find Natural Groupings
  – Documents
  – Search Results
  – People
  – Genetic traits in groups
  – Many, many more uses
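
To make "natural groupings" concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points. It is plain Python rather than Mahout code, the sample points are invented, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: seed centroids with the first k points (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied anywhere; the grouping comes only from distances between the points.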
Collaborative Filtering
• Recommend people and products
  – User-User
    • User likes X, you might too
  – Item-Item
    • People who bought X also bought Y
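
The item-item idea ("people who bought X also bought Y") can be sketched with simple co-occurrence counts. This is an illustrative simplification, not how Taste is implemented; the baskets and the helper names `build_cooccurrence` and `also_bought` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items shares a basket."""
    cooc = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            cooc[a][b] += 1
    return cooc


def also_bought(item, cooc, n=2):
    """Items most often bought together with `item`, most frequent first."""
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(cooc[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "desk"],
    ["book", "pen", "desk"],
]
cooc = build_cooccurrence(baskets)
print(also_bought("book", cooc))
```

Raw counts favor popular items, so production systems usually normalize them with a similarity measure; the counts are kept here to show the core idea.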
Info. Retrieval
• Learning Ranking Functions

• Learning Spelling Corrections

• User Click Analysis and Tracking
Other
• Image Analysis
• Robotics
• Games
• Higher level natural language
  processing
• Many, many others
What is Apache Mahout?
• A Mahout is an elephant
  trainer/driver/keeper, hence…
  [Image not recovered] + Machine Learning (and other distributed techniques) =
What?
• Hadoop brings:
  – Map/Reduce API
  – HDFS
  – In other words, scalability and fault-tolerance
• Thus, Mahout’s Goal is:
  – Scalable Machine Learning with Apache
    License
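
For readers new to the Map/Reduce model, the sketch below imitates its three stages (map, shuffle, reduce) in a single Python process, using word counting as the task. It does not use Hadoop's API; `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names, and the real framework would spread these stages across machines and read input from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


# Hypothetical input records; each could live on a different machine.
documents = [
    "mahout rides hadoop",
    "hadoop stores data",
    "mahout learns from data",
]
emitted = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(counts)
```

Because each mapper call touches one record and each reducer call touches one key, both stages can run in parallel, which is where the scalability comes from.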
Why Mahout?
• Many Open Source ML libraries either:
  –   Lack Community
  –   Lack Documentation and Examples
  –   Lack Scalability
  –   Lack the Apache License ;-)
  –   Or are research-oriented
• Personal: Learn more ML
• Intelligent Apps are the Present and Future
  – See the Hadoop talks tomorrow and Friday!
• Goal: Overcome gaps the Apache Way!
Current Status
• Close to Initial release
   – Focused on examples, docs, bug fixes
• What’s in it:
   – Simple Matrix/Vector library
   – Taste Collaborative Filtering
   – Clustering
      • Canopy/K-Means/Fuzzy K-Means/Mean-shift
   – Classifiers
      • Naïve Bayes
      • Complementary NB
   – Evolutionary
      • Integration with Watchmaker for fitness function
How?
• Examples
  – Taste
  – Clustering
  – Classification
  – Evolutionary
Taste: Movie
       Recommendations
• Given ratings by users of movies,
  recommend other movies

• http://lucene.apache.org/mahout/taste.html#demo
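
The movie-recommendation task can be sketched as user-based collaborative filtering: find users whose ratings correlate with yours, then rank unseen movies by their similarity-weighted ratings. The code below is plain Python, not the Taste API; the users, movies, ratings, and the choice of Pearson correlation as the similarity measure are assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation over the movies both users rated (0.0 if undefined)."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[m] for m in shared) / len(shared)
    mean_b = sum(b[m] for m in shared) / len(shared)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in shared)
    var_a = sum((a[m] - mean_a) ** 2 for m in shared)
    var_b = sum((b[m] - mean_b) ** 2 for m in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend(user, ratings, n=1):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no or opposite taste
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    scored = {movie: totals[movie] / weights[movie] for movie in totals}
    return sorted(scored, key=lambda m: (-scored[m], m))[:n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"Alien": 5, "Brazil": 1, "Casablanca": 4},
    "bob": {"Alien": 5, "Brazil": 2, "Casablanca": 4, "Dune": 5, "Eraserhead": 2},
    "carol": {"Alien": 1, "Brazil": 5, "Casablanca": 2, "Dune": 1, "Eraserhead": 5},
}
print(recommend("alice", ratings, n=2))
```

Here alice's ratings track bob's and run opposite to carol's, so only bob's opinions contribute to her recommendations.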
Clustering: Synthetic Control
            Data
• http://archive.ics.uci.edu/ml/datasets/Synthetic+


• Each clustering impl. has an example
  Job for running in
  <MAHOUT_HOME>/examples
  – o.a.mahout.clustering.syntheticcontrol.*
• Outputs clusters…
Classification: NB and CNB
          Examples
• 20 Newsgroups
  – http://cwiki.apache.org/confluence/display/MA


• Wikipedia
  – http://cwiki.apache.org/confluence/display/MA
Evolutionary
• Traveling Salesman
  – http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman


• Class Discovery
  – http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery
What’s Next?
•   Release 0.1!
•   Shared Amazon Images (others?)
•   More Examples
•   Winnow/Perceptron (MAHOUT-85)
•   HBase and HAMA support
•   Normalize I/O format for data
•   Solr Integration (SOLR-769)
•   Other Algorithms: SVM, Linear Regression,
    etc.
When, Where, Who
• When? Now!
  – Mahout is growing
• Who? You!
  – We want Java programmers who:
     • Are comfortable with math
     • Like to work on large, hard problems
• Where?
  – http://lucene.apache.org/mahout
  – http://cwiki.apache.org/MAHOUT
  – mahout-{user|dev}@lucene.apache.org
Resources
• “Programming Collective Intelligence”
  by Toby Segaran
• “Data Mining - Practical Machine
  Learning Tools and Techniques” by
  Ian H. Witten and Eibe Frank
• Hadoop - http://hadoop.apache.org
• http://mloss.org/software/
