SlideShare uma empresa Scribd logo
1 de 34
Machine Learning
con Apache Mahout
  Domingo Suarez Torres
Machine Learning (ML)
        Introduction
Definition

     • Machine learning, a branch of artificial
        intelligence, is a scientific discipline
        concerned with the design and
        development of algorithms that allow
        computers to evolve behaviors based on
        empirical data (1)


1http://en.wikipedia.org/wiki/Machine_learning
• “Machine Learning is programming
  computers to optimize a performance
  criterion using example data or past
  experience”
 • Intro. To Machine Learning by E. Alpaydin
Applications
•   Recommend friends/dates/        •   Detect anomalies in machine
    products                            output

•   Classify content into           •   Ranking search results
    predefined groups
                                    •   Fraud detection
•   Find similar content based
    on object properties            •   Spam detection

•   Find associations/patterns in   •   Medical diagnostics
    actions/behaviors
                                    •   Translators
•   Identify key topics in large
    collections of text             •   Much more¡
Math

• Stadistics
• Discrete Math
• Linear algebra
• Probability
Starting with ML
•   Get your data
•   Decide on your features per your algorithm
•   Prep the data
    •   Different approaches for different algorithms
•   Run your algorithm(s)
    •   Lather, rinse, repeat
•   Validate your results
    •   Smell test, A/B testing
Apache Mahout

• Machine Learning library. Platform?
• Extensible, we can use our own algorithm.
• Hadoop support
• 2005. Taste Framework
• 2008. Included in Lucene
Scalability
•   Huge amount of data, growing every second¡
•   Be as fast and efficient as possible given the intrinsic design of
    the algorithm
    •   Some algorithms won’t scale to massive machine clusters
    •   Others fit logically on a Map Reduce framework like
        Apache Hadoop
    •   Still others will need alternative distributed programming
        models
    •   Be pragmatic
•   Most Mahout implementations are Map Reduce enabled
Who uses Mahout?
Components

• Recommender Engines (collaborative
  filtering, content-based)
• Clustering
• Classification
When to use?
• Recommendation
 • Rank large datasets
• Clustering
 • Group your data
• Classification
 • Train me to think like you
Recommenders
•   Given a data set. Make a recomendation.
    •   Item recomendation (Book, Movie, etc)
•   Ranking based
•   Recomendations
    •   User based
    •   Item based
•   knowledge of user’s relationships to items (user
    preferences)
Colaborative filtering
• User based
• Item based
• Both techniques require no knowledge of
  the properties of the items themselves.
• Item Type is irrelevant. Apache Mahout is
  happy
17
Content based
• Domain-specific approaches
• Hard to meaningfully codify into a
  framework
• We are responsables of choosing which
  item's attributes to use.
• Apache Mahout can’t handle this out-of-
  the-box, but can built on top.
Making recommendations

 • What we need?
  • Input data
  • Neighborhood
  • Similarity
Input Data
•   In Mahout terms: Preferences
•   A preference contains:
    •   User ID
    •   Item ID
    •   Preference value
    •   Example:
        •   1,101,5.0
        •   USER ID: 1, ITEM ID: 101, PrefValue: 5.0
21
Neighborhood
Nearest N Users    Threshold
Similarity
Clustering

• Surface naturally occurring groups of data
• A notion of similarity (and dissimilarity)
• Algorithms do not require training
• Stopping condition - iterate until close
  enough
Clustering
•   Document level
    •   Group documents based on a notion of similarity
    •   K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift
    •   Distance Measures
    •   Manhattan, Euclidean, other
•   Topic Modeling
    •   Cluster words across documents to identify topics
    •   Latent Dirichlet Allocation
Classification

• Require training (supervised)
• Make a single decision with a very limited
  set of outcomes
• Typical answers naturally fit into categories
Classification samples

• Credit card fraud prediction
• Customer attrition
• Diabetes detector
• Search Engine
Mahout/Hadoop
• For large data sets
• Online
• Offline (Hadoop prefered)
• You can build your solution with Mahout
• Take a look into Weka
 • http://www.cs.waikato.ac.nz/ml/weka/
Resources
Resources
Resources
Join us¡
• GIAMA.
 • Agustin Ramos iniciative

Mais conteúdo relacionado

Mais procurados

2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceData Science Thailand
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learningmahutte
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopSavvycom Savvycom
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learningmahutte
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning pyingkodi maran
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Machine Learning with Earth Observation Imagery
Machine Learning with Earth Observation ImageryMachine Learning with Earth Observation Imagery
Machine Learning with Earth Observation ImageryAmazon Web Services
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentationAASTHA PANDEY
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.ASHOK KUMAR
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 

Mais procurados (20)

18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learning
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Machine Learning with Earth Observation Imagery
Machine Learning with Earth Observation ImageryMachine Learning with Earth Observation Imagery
Machine Learning with Earth Observation Imagery
 
Em Algorithm | Statistics
Em Algorithm | StatisticsEm Algorithm | Statistics
Em Algorithm | Statistics
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentation
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 

Destaque (6)

SGCE 2015 REST APIs
SGCE 2015 REST APIsSGCE 2015 REST APIs
SGCE 2015 REST APIs
 
Serling dev team, development process
Serling dev team, development processSerling dev team, development process
Serling dev team, development process
 
SGCE 2012 Lightning Talk-Single Page Interface
SGCE 2012 Lightning Talk-Single Page InterfaceSGCE 2012 Lightning Talk-Single Page Interface
SGCE 2012 Lightning Talk-Single Page Interface
 
SGNext Elasticsearch
SGNext ElasticsearchSGNext Elasticsearch
SGNext Elasticsearch
 
JVM Reactive Programming
JVM Reactive ProgrammingJVM Reactive Programming
JVM Reactive Programming
 
SGCE 2014 micro services
SGCE 2014 micro servicesSGCE 2014 micro services
SGCE 2014 micro services
 

Semelhante a Machine Learning & Apache Mahout

Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Cloudera, Inc.
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Groupbotsplash.com
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - RecommendationCataldo Musto
 
Download Materials
Download MaterialsDownload Materials
Download Materialsbutest
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Pythonbotsplash.com
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Cataldo Musto
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutKorea Sdec
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu
 
The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...Lucas Jellema
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesHelen Bailey
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 

Semelhante a Machine Learning & Apache Mahout (20)

Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - Recommendation
 
Download Materials
Download MaterialsDownload Materials
Download Materials
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Python
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of Mahout
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
machine learning
machine learningmachine learning
machine learning
 
The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital Humanities
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 

Mais de Domingo Suarez Torres

Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesCloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesDomingo Suarez Torres
 
Java Dev Day 2019 No kuberneteen por convivir
Java Dev Day 2019  No kuberneteen por convivirJava Dev Day 2019  No kuberneteen por convivir
Java Dev Day 2019 No kuberneteen por convivirDomingo Suarez Torres
 
Retos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosRetos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosDomingo Suarez Torres
 
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDomingo Suarez Torres
 
Cloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesCloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesDomingo Suarez Torres
 
Meetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureMeetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureDomingo Suarez Torres
 
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyCloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyDomingo Suarez Torres
 
Cloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityCloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityDomingo Suarez Torres
 
Orquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextOrquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextDomingo Suarez Torres
 
Webinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosWebinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosDomingo Suarez Torres
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Domingo Suarez Torres
 

Mais de Domingo Suarez Torres (20)

Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesCloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
 
Java Dev Day 2019 No kuberneteen por convivir
Java Dev Day 2019  No kuberneteen por convivirJava Dev Day 2019  No kuberneteen por convivir
Java Dev Day 2019 No kuberneteen por convivir
 
Contenedores 101 Digital Ocean CDMX
Contenedores 101 Digital Ocean CDMXContenedores 101 Digital Ocean CDMX
Contenedores 101 Digital Ocean CDMX
 
Retos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosRetos en la arquitectura de Microservicios
Retos en la arquitectura de Microservicios
 
Java Cloud Native Hack Nights GDL
Java Cloud Native Hack Nights GDLJava Cloud Native Hack Nights GDL
Java Cloud Native Hack Nights GDL
 
meetup digital ocean kubernetes
meetup digital ocean kubernetesmeetup digital ocean kubernetes
meetup digital ocean kubernetes
 
Peru JUG Micronaut & GraalVM
Peru JUG Micronaut & GraalVMPeru JUG Micronaut & GraalVM
Peru JUG Micronaut & GraalVM
 
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
 
Cloud Native Development in the JVM
Cloud Native Development in the JVMCloud Native Development in the JVM
Cloud Native Development in the JVM
 
Cloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesCloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a Kubernetes
 
Meetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureMeetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architecture
 
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyCloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
 
Cloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityCloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 Observability
 
Cloud Native Mexico Presentacion
Cloud Native Mexico PresentacionCloud Native Mexico Presentacion
Cloud Native Mexico Presentacion
 
gRPC: Beyond REST
gRPC: Beyond RESTgRPC: Beyond REST
gRPC: Beyond REST
 
Devops Landscape
Devops LandscapeDevops Landscape
Devops Landscape
 
Orquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextOrquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNext
 
Webinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosWebinar Arquitectura de Microservicios
Webinar Arquitectura de Microservicios
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016
 
Ratpack JVM_MX Meetup February 2016
Ratpack JVM_MX Meetup February 2016Ratpack JVM_MX Meetup February 2016
Ratpack JVM_MX Meetup February 2016
 

Último

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Machine Learning & Apache Mahout

  • 1. Machine Learning con Apache Mahout Domingo Suarez Torres
  • 2. Machine Learning (ML) Introduction
  • 3. Definition • Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data (1) 1http://en.wikipedia.org/wiki/Machine_learning
  • 4. • “Machine Learning is programming computers to optimize a performance criterion using example data or past experience” • Intro. To Machine Learning by E. Alpaydin
  • 5. Applications • Recommend friends/dates/ • Detect anomalies in machine products output • Classify content into • Ranking search results predefined groups • Fraud detection • Find similar content based on object properties • Spam detection • Find associations/patterns in • Medical diagnostics actions/behaviors • Translators • Identify key topics in large collections of text • Much more¡
  • 6. Math • Stadistics • Discrete Math • Linear algebra • Probability
  • 7.
  • 8. Starting with ML • Get your data • Decide on your features per your algorithm • Prep the data • Different approaches for different algorithms • Run your algorithm(s) • Lather, rinse, repeat • Validate your results • Smell test, A/B testing
  • 9. Apache Mahout • Machine Learning library. Platform? • Extensible, we can use our own algorithm. • Hadoop support • 2005. Taste Framework • 2008. Included in Lucene
  • 10. Scalability • Huge amount of data, growing every second¡ • Be as fast and efficient as possible given the intrinsic design of the algorithm • Some algorithms won’t scale to massive machine clusters • Others fit logically on a Map Reduce framework like Apache Hadoop • Still others will need alternative distributed programming models • Be pragmatic • Most Mahout implementations are Map Reduce enabled
  • 12. Components • Recommender Engines (collaborative filtering, content-based) • Clustering • Classification
  • 13. When to use? • Recommendation • Rank large datasets • Clustering • Group your data • Classification • Train me to think like you
  • 14. Recommenders • Given a data set. Make a recomendation. • Item recomendation (Book, Movie, etc) • Ranking based • Recomendations • User based • Item based • knowledge of user’s relationships to items (user preferences)
  • 15.
  • 16. Colaborative filtering • User based • Item based • Both techniques require no knowledge of the properties of the items themselves. • Item Type is irrelevant. Apache Mahout is happy
  • 17. 17
  • 18. Content based • Domain-specific approaches • Hard to meaningfully codify into a framework • We are responsables of choosing which item's attributes to use. • Apache Mahout can’t handle this out-of- the-box, but can built on top.
  • 19. Making recommendations • What we need? • Input data • Neighborhood • Similarity
  • 20. Input Data • In Mahout terms: Preferences • A preference contains: • User ID • Item ID • Preference value • Example: • 1,101,5.0 • USER ID: 1, ITEM ID: 101, PrefValue: 5.0
  • 21. 21
  • 22.
  • 25. Clustering • Surface naturally occurring groups of data • A notion of similarity (and dissimilarity) • Algorithms do not require training • Stopping condition - iterate until close enough
  • 26. Clustering • Document level • Group documents based on a notion of similarity • K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift • Distance Measures • Manhattan, Euclidean, other • Topic Modeling • Cluster words across documents to identify topics • Latent Dirichlet Allocation
  • 27. Classification • Require training (supervised) • Make a single decision with a very limited set of outcomes • Typical answers naturally fit into categories
  • 28. Classification samples • Credit card fraud prediction • Customer attrition • Diabetes detector • Search Engine
  • 29. Mahout/Hadoop • For large data sets • Online • Offline (Hadoop prefered) • You can build your solution with Mahout • Take a look into Weka • http://www.cs.waikato.ac.nz/ml/weka/
  • 33.
  • 34. Join us¡ • GIAMA. • Agustin Ramos iniciative

Notas do Editor

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n