Practical Machine Learning
  A Tutorial on Apache Mahout


               Biju B
         NLP R&D Division
         365Media Pvt. Ltd.
         bijub@365Media.in

             FOSSMEET NITC,
                 Calicut


          4-6 February 2011




   Biju B & Jaganadh G   Practical Machine Learning
nlp r d $ whoweare




     Working in Natural Language Processing (NLP), Machine Learning, and
     Data Mining
     Passionate about Free and Open Source software :-)
     Teach Python in free time, blog at
     http://jaganadhg.freeflux.net/blog, and contribute to
     OpenStreetMap
     Work for 365Media Pvt. Ltd., Coimbatore, India
     Twitter handles: @jaganadhg, @bijub




Machine Learning




  Machine learning is a subfield of artificial intelligence (AI) concerned with
  algorithms that allow computers to learn.

      This talk is not intended as an introduction to machine learning
      Don't expect any mathy equations here




Machine Learning and Our Life



     Does machine learning have any impact on our lives?
     Yes
     In day-to-day life we use many tools powered by machine learning:
     Recommendation engines
     Clustering
     Classification, e.g. spam filtering
     Sentiment analysis
     Fraud detection




Mahout



  An open source project of the Apache Software Foundation
  The goal of the project is to build scalable machine learning libraries
  Mahout: a person who drives an elephant ;-)
  The name comes from the project's use of Apache Hadoop, whose logo is an
  elephant.




Why a new library?



  More than 30 Java libraries and tools are available for machine
  learning: Weka, Mallet, Classifier4J, RapidMiner, ...
      Processing large amounts of data is not an easy task
      Machine learning tools are expected to produce results quickly
      If the data set is too large, it is not easy to process on a
      single machine (even a powerful one)
      Mahout is scalable: the core algorithms in Mahout are implemented
      on top of Apache Hadoop using the map/reduce paradigm
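
The map/reduce paradigm mentioned above can be illustrated with a minimal, single-process sketch in Python. It only imitates the three stages (map, group by key, reduce) with a word count; it is not Hadoop or Mahout code, and the function names and sample sentences are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input; each string stands for one document.
counts = word_count(["mahout runs on hadoop", "hadoop scales out", "Mahout scales"])
print(counts)
```

Because each call to map_phase looks at one document only, and each call to reduce_phase looks at one key only, the work could in principle be spread over many machines, which is the property the slide refers to.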




Algorithms in Apache Mahout



     Collaborative filtering
     User-based and item-based recommenders
     K-Means and Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel frequent pattern mining
     Complementary Naive Bayes classifier
     Random forest (decision tree based) classifier




Recommendation




    Filter information based on user preferences
    Search a large set of people to find a smaller set whose tastes are
    similar to yours
    e.g. Amazon's book recommendations, Netflix's movie
    recommendations
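
The idea of finding a smaller set of people with similar tastes can be sketched as a tiny user-based recommender in plain Python. This is an illustration under assumptions, not Mahout's API: the ratings, the user and item names, and the choice of cosine similarity over co-rated items are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); not data from the talk.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=1):
    """Rank items the user has not rated, using the k most similar users."""
    others = [
        (cosine_similarity(ratings[user], prefs), name)
        for name, prefs in ratings.items()
        if name != user
    ]
    neighbours = sorted(others, reverse=True)[:k]
    scores, weights = {}, {}
    for sim, name in neighbours:
        for item, rating in ratings[name].items():
            if item in ratings[user]:
                continue  # skip items the user already knows
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average rating per candidate item, best first.
    ranked = sorted(
        ((scores[i] / weights[i], i) for i in scores if weights[i] > 0),
        reverse=True,
    )
    return [item for _, item in ranked]


suggestions = recommend("alice", ratings, k=1)
print(suggestions)
```

With k=1 only the single most similar user contributes, so the suggestions come from the person whose ratings agree most closely with the target user's.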




Document Classification




     Classify documents based on their content
     e.g. spam filtering, priority inbox
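
Content-based classification such as spam filtering can be sketched with a small multinomial Naive Bayes classifier in plain Python. This is a simplified stand-in, not the Complementary Naive Bayes implementation in Mahout; the training messages, labels and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical training messages; not data from the talk.
training = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("ham", "meeting schedule for monday"),
    ("ham", "project meeting notes"),
]


def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior of the label plus log likelihood of each word.
        score = log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training)
print(classify("free money", model))
print(classify("project meeting", model))
```

Add-one smoothing keeps a word that was never seen under a label from driving that label's probability to zero.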




Demo


       Building recommendation engines with Mahout
       Document classification with Mahout




Reference


     Mahout in Action, by Sean Owen and Robin Anil, Manning
     Publications.
     Taming Text, by Grant Ingersoll and Tom Morton, Manning
     Publications.
     Introducing Apache Mahout, by Grant Ingersoll: an introduction to
     Apache Mahout focused on clustering, classification and collaborative
     filtering.
     https://www.ibm.com/developerworks/java/library/j-mahout/index.html
     Programming Collective Intelligence: Building Smart Web 2.0
     Applications.
     http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325




Useful Resources




     Apache Mahout site: http://mahout.apache.org/
     Apache Mahout mailing list: user@mahout.apache.org
     The code used for the Mahout demo is available at
     http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
     20 Newsgroups data set:
     http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz




Questions?




Acknowledgments



  Thanks to:
      Manning Publications for a review copy of the book "Mahout in
      Action"
      Apache Mahout mailing list members
      Ted Dunning and Robin Anil for suggestions
      @chelakkandupoda for review and criticism
      Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and
      encouragement




