SlideShare uma empresa Scribd logo
1 de 33
Baixar para ler offline
Tools andTechnologies for Large Scale Data
                 Mining

                      Jaganadh G
                Project Lead NLP R&D
                  365Media Pvt. Ltd.
                 jaganadhg@gmail.com

           DRDO Sponsored National Level Seminar
                              on
      Challenging Issues on Data Mining Semantic Web,
      Sri Krishna College of Engineering and Technology,
                          Coimbatore


                     27th Jan 2012


                   Jaganadh G    Tools andTechnologies for Large Scale Data Mining
About me !!



     Software Engineer Specializing in Text Analytics Research &
     Development
     When free, teaches Python, Speaks about FOSS and blogs at
     http://jaganadhg.in
     Working as Project Lead (NLP) 365Media Pvt. Ltd.
     Coimbatore
     I am a computational linguist / Linguist and Indologist, Book
     reviewer
     Maters Degree Holder in Sanskrit from University of Kerala




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI)
  concerned with algorithms that allow computers to learn.




                           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI)
  concerned with algorithms that allow computers to learn.




                           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI)
  concerned with algorithms that allow computers to learn.

      This talk is not aimed to give introduction about Machine
      Learning




                           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI)
  concerned with algorithms that allow computers to learn.

      This talk is not aimed to give introduction about Machine
      Learning
      Dont expect some mathy equations here




                           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning and Our Life




     Do you think that Machine Learning has any impact in our life
     ??




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning and Our Life




     Do you think that Machine Learning has any impact in our life
     ??
     Yes




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning and Our Life




     Do you think that Machine Learning has any impact in our life
     ??
     Yes
     In our day to day life we may use many Machine Learning
     powered tools




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning and Our Life




     Do you think that Machine Learning has any impact in our life
     ??
     Yes
     In our day to day life we may use many Machine Learning
     powered tools
     E-mail spam filtering , product recommendations etc ..




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Machine Learning and Our Life




     Do you think that Machine Learning has any impact in our life
     ??
     Yes
     In our day to day life we may use many Machine Learning
     powered tools
     E-mail spam filtering , product recommendations etc ..
     Fraud detection




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Examples




           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Examples




           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Examples




           Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Tool for building Machine Learning powerd product/service




  Apache Mahout
  Apache Mahout is a scalable machine learning library that supports
  large data sets. Apache Mahout’s goal is to build scalable machine
  learning libraries.
      Commercially friendly licence
      Well documented
      Healthy community
      Targeted to developers


                          Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout




                   Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining
     Complementary Naive Bayes classifier




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Algorithms in Apache Mahout


     Collaborative Filtering
     User and Item based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining
     Complementary Naive Bayes classifier
     Random forest decision tree based classifier



                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Demo

       Building recommendations engines with Mahout
       Document Classification with Mahout
       Some Python stuff on Machine Learning




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Reference




            Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Reference

     Mahout in Action - Book by Sean Owen and Robin Anil,
     published by Manning Publications.
     Taming Text - By Grant Ingersoll and Tom Morton, published
     by Manning Publications.
     Introducing Apache Mahout - Grant Ingersoll - Intro to
     Apache Mahout focused on clustering, classification and
     collaborative filtering.
     https://www.ibm.com/developerworks/java/library/j-
     mahout/index.html
     Programming Collective Intelligence: Building Smart Web 2.0
     Applications
     http://www.amazon.com/Programming-Collective-
     Intelligence-Building-Applications/dp/0596529325


                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Useful Resources




      Apache Mahout Site http://mahout.apache.org/
      Apache Mahout Mailing List user@mahout.apache.org
      The code which I used for Mahout demo is available at
      http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
      Twenty News Group data set
      http://people.csail.mit.edu/jrennie/20Newsgroups/20news-
      bydate.tar.gz




                        Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Questions ??




               Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Acknowledgments



  Thanks to :
      Manning Publications for Review Copy of the book ”Mahout
      in Action”
      Apache Mahout mailing list members
      Ted Dunning and Robin Anil for suggestions
      Sreejith S and Biju B for Java help
      @chelakkandupoda for review and criticism
      Mukundhanchari R&D Director 365Media Pvt. Ltd. for
      support and encouragement




                         Jaganadh G   Tools andTechnologies for Large Scale Data Mining
Finally




          Jaganadh G   Tools andTechnologies for Large Scale Data Mining

Mais conteúdo relacionado

Semelhante a Tools andTechnologies for Large Scale Data Mining

A day in the life of a data scientist in an AI company
A day in the life of a data scientist in an AI companyA day in the life of a data scientist in an AI company
A day in the life of a data scientist in an AI companyFrancesca Lazzeri, PhD
 
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...AgileNetwork
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Building a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupBuilding a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupParul Pandey
 
Data science using r multisoft systems
Data science using r  multisoft systemsData science using r  multisoft systems
Data science using r multisoft systemsMultisoft Systems
 
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...BayaReddy M
 
Data Science Environment with R on openSUSE Leap 15.1
Data Science Environment with R on openSUSE Leap 15.1Data Science Environment with R on openSUSE Leap 15.1
Data Science Environment with R on openSUSE Leap 15.1Sabar Suwarsono
 
Ataas2016 - Big data hadoop and map reduce - new age tools for aid to test...
Ataas2016 - Big data   hadoop and map reduce  - new age tools for aid to test...Ataas2016 - Big data   hadoop and map reduce  - new age tools for aid to test...
Ataas2016 - Big data hadoop and map reduce - new age tools for aid to test...Agile Testing Alliance
 
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptx
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptxCBITSS - Empowering Tomorrow's Tech Leaders Today.pptx
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptxCbitss Technologies
 
How to program your way into data science?
How to program your way into data science?How to program your way into data science?
How to program your way into data science?DeZyre
 
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM LeaderWebinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM LeaderProduct School
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
London atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesLondon atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesRudiger Wolf
 
A great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream jobA great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream jobMamathaSharma4
 
Data Analysis - Making Big Data Work
Data Analysis - Making Big Data WorkData Analysis - Making Big Data Work
Data Analysis - Making Big Data WorkDavid Chiu
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 
Google Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryGoogle Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryCARTO
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...phdAssistance1
 
Brochure for 2022 Webinar Dr.Himadri AI.pdf
Brochure for 2022 Webinar Dr.Himadri AI.pdfBrochure for 2022 Webinar Dr.Himadri AI.pdf
Brochure for 2022 Webinar Dr.Himadri AI.pdfHIMADRI BANERJI
 

Semelhante a Tools andTechnologies for Large Scale Data Mining (20)

A day in the life of a data scientist in an AI company
A day in the life of a data scientist in an AI companyA day in the life of a data scientist in an AI company
A day in the life of a data scientist in an AI company
 
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
 
All thingspython@pivotal
All thingspython@pivotalAll thingspython@pivotal
All thingspython@pivotal
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Building a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupBuilding a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetup
 
Data science using r multisoft systems
Data science using r  multisoft systemsData science using r  multisoft systems
Data science using r multisoft systems
 
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...
Navigating the Era of Big Data Analytics: A Roadmap for Data Analyst Courses ...
 
Data Science Environment with R on openSUSE Leap 15.1
Data Science Environment with R on openSUSE Leap 15.1Data Science Environment with R on openSUSE Leap 15.1
Data Science Environment with R on openSUSE Leap 15.1
 
Ataas2016 - Big data hadoop and map reduce - new age tools for aid to test...
Ataas2016 - Big data   hadoop and map reduce  - new age tools for aid to test...Ataas2016 - Big data   hadoop and map reduce  - new age tools for aid to test...
Ataas2016 - Big data hadoop and map reduce - new age tools for aid to test...
 
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptx
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptxCBITSS - Empowering Tomorrow's Tech Leaders Today.pptx
CBITSS - Empowering Tomorrow's Tech Leaders Today.pptx
 
How to program your way into data science?
How to program your way into data science?How to program your way into data science?
How to program your way into data science?
 
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM LeaderWebinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
London atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesLondon atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slides
 
A great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream jobA great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream job
 
Data Analysis - Making Big Data Work
Data Analysis - Making Big Data WorkData Analysis - Making Big Data Work
Data Analysis - Making Big Data Work
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Google Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryGoogle Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQuery
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
 
Brochure for 2022 Webinar Dr.Himadri AI.pdf
Brochure for 2022 Webinar Dr.Himadri AI.pdfBrochure for 2022 Webinar Dr.Himadri AI.pdf
Brochure for 2022 Webinar Dr.Himadri AI.pdf
 

Mais de Jaganadh Gopinadhan

Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - IJaganadh Gopinadhan
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language ProcessingJaganadh Gopinadhan
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language ProcessingJaganadh Gopinadhan
 
Natural Language Processing with Per
Natural Language Processing with PerNatural Language Processing with Per
Natural Language Processing with PerJaganadh Gopinadhan
 
Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org Jaganadh Gopinadhan
 
Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic Jaganadh Gopinadhan
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestJaganadh Gopinadhan
 
A tutorial on Machine Translation
A tutorial on Machine TranslationA tutorial on Machine Translation
A tutorial on Machine TranslationJaganadh Gopinadhan
 
Linguistic localization framework for Ooo
Linguistic localization framework for OooLinguistic localization framework for Ooo
Linguistic localization framework for OooJaganadh Gopinadhan
 
ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands Jaganadh Gopinadhan
 
Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Jaganadh Gopinadhan
 
Introduction to Free and Open Source Software
Introduction to Free and Open Source Software Introduction to Free and Open Source Software
Introduction to Free and Open Source Software Jaganadh Gopinadhan
 
Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges Jaganadh Gopinadhan
 
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...Jaganadh Gopinadhan
 

Mais de Jaganadh Gopinadhan (20)

Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language Processing
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language Processing
 
Natural Language Processing with Per
Natural Language Processing with PerNatural Language Processing with Per
Natural Language Processing with Per
 
Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org
 
Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
 
A tutorial on Machine Translation
A tutorial on Machine TranslationA tutorial on Machine Translation
A tutorial on Machine Translation
 
Linguistic localization framework for Ooo
Linguistic localization framework for OooLinguistic localization framework for Ooo
Linguistic localization framework for Ooo
 
Ilucbe python v1.2
Ilucbe python v1.2Ilucbe python v1.2
Ilucbe python v1.2
 
Social Media Analytics
Social Media Analytics Social Media Analytics
Social Media Analytics
 
Success Factor
Success Factor Success Factor
Success Factor
 
ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands
 
Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python
 
Introduction to Free and Open Source Software
Introduction to Free and Open Source Software Introduction to Free and Open Source Software
Introduction to Free and Open Source Software
 
Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges
 
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
 
Hdfs
HdfsHdfs
Hdfs
 
Will Foss get me a Job?
Will Foss get me a Job?Will Foss get me a Job?
Will Foss get me a Job?
 

Último

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Tools andTechnologies for Large Scale Data Mining

  • 1. Tools andTechnologies for Large Scale Data Mining Jaganadh G Project Lead NLP R&D 365Media Pvt. Ltd. jaganadhg@gmail.com DRDO Sponsored National Level Seminar on Challenging Issues on Data Mining Semantic Web, Sri Krishna College of Engineering and Technology, Coimbatore 27th Jan 2012 Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 2. About me !! Software Engineer Specializing in Text Analytics Research & Development When free, teaches Python, Speaks about FOSS and blogs at http://jaganadhg.in Working as Project Lead (NLP) 365Media Pvt. Ltd. Coimbatore I am a computational linguist / Linguist and Indologist, Book reviewer Maters Degree Holder in Sanskrit from University of Kerala Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 3. Machine Learning Machine Learning Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 4. Machine Learning Machine Learning Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 5. Machine Learning Machine Learning Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. This talk is not aimed to give introduction about Machine Learning Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 6. Machine Learning Machine Learning Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. This talk is not aimed to give introduction about Machine Learning Dont expect some mathy equations here Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 7. Machine Learning and Our Life Do you think that Machine Learning has any impact in our life ?? Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 8. Machine Learning and Our Life Do you think that Machine Learning has any impact in our life ?? Yes Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 9. Machine Learning and Our Life Do you think that Machine Learning has any impact in our life ?? Yes In our day to day life we may use many Machine Learning powered tools Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 10. Machine Learning and Our Life Do you think that Machine Learning has any impact in our life ?? Yes In our day to day life we may use many Machine Learning powered tools E-mail spam filtering , product recommendations etc .. Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 11. Machine Learning and Our Life Do you think that Machine Learning has any impact in our life ?? Yes In our day to day life we may use many Machine Learning powered tools E-mail spam filtering , product recommendations etc .. Fraud detection Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 12. Examples Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 13. Examples Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 14. Examples Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 15. Tool for building Machine Learning powerd product/service Apache Mahout Apache Mahout is a scalable machine learning library that supports large data sets. Apache Mahout’s goal is to build scalable machine learning libraries. Commercially friendly licence Well documented Healthy community Targeted to developers Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 16. Algorithms in Apache Mahout Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 17. Algorithms in Apache Mahout Collaborative Filtering Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 18. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 19. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 20. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 21. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 22. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Latent Dirichlet Allocation Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 23. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Latent Dirichlet Allocation Singular value decomposition Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 24. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Latent Dirichlet Allocation Singular value decomposition Parallel Frequent Pattern mining Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 25. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Latent Dirichlet Allocation Singular value decomposition Parallel Frequent Pattern mining Complementary Naive Bayes classifier Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 26. Algorithms in Apache Mahout Collaborative Filtering User and Item based recommenders K-Means, Fuzzy K-Means clustering Mean Shift clustering Dirichlet process clustering Latent Dirichlet Allocation Singular value decomposition Parallel Frequent Pattern mining Complementary Naive Bayes classifier Random forest decision tree based classifier Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 27. Demo Building recommendations engines with Mahout Document Classification with Mahout Some Python stuff on Machine Learning Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 28. Reference Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 29. Reference Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications. Taming Text - By Grant Ingersoll and Tom Morton, published by Manning Publications. Introducing Apache Mahout - Grant Ingersoll - Intro to Apache Mahout focused on clustering, classification and collaborative filtering. https://www.ibm.com/developerworks/java/library/j- mahout/index.html Programming Collective Intelligence: Building Smart Web 2.0 Applications http://www.amazon.com/Programming-Collective- Intelligence-Building-Applications/dp/0596529325 Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 30. Useful Resources Apache Mahout Site http://mahout.apache.org/ Apache Mahout Mailing List user@mahout.apache.org The code which I used for Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/ Twenty News Group data set http://people.csail.mit.edu/jrennie/20Newsgroups/20news- bydate.tar.gz Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 31. Questions ?? Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 32. Acknowledgments Thanks to : Manning Publications for Review Copy of the book ”Mahout in Action” Apache Mahout mailing list members Ted Dunning and Robin Anil for suggestions Sreejith S and Biju B for Java help @chelakkandupoda for review and criticism Mukundhanchari R&D Director 365Media Pvt. Ltd. for support and encouragement Jaganadh G Tools andTechnologies for Large Scale Data Mining
  • 33. Finally Jaganadh G Tools andTechnologies for Large Scale Data Mining