Apache Mahout

   The Elephant Driver
          Presenters:
      Antonio Loureiro Severien
     Emmanouil Dimogerontakis
     Muhammad Anis uddin Nasir
What is Apache Mahout?
● Machine learning and data mining framework for
  classification, clustering and recommendation

● The goal of Apache Mahout, a free machine learning library,
  is to build scalable machine learning tools for analysing
  big data in a distributed manner
Machine Learning
"Machine Learning is programming computers to optimize a
performance criterion using example data or past
experience" - Alpaydin, 2004

Machine learning is concerned with the design and
development of algorithms that allow machines to make
decisions, or even evolve behaviors, based on collections of
empirical data.
Data Mining
Data mining, also called knowledge discovery in
databases (KDD), is the process of discovering interesting
and useful patterns and relationships in large volumes of
data.
Combines tools from:
    ● statistics
    ● artificial intelligence (such as neural networks and
       machine learning)
with database management to analyze large data sets.
- Britannica Online Encyclopedia
Why Machine Learning and Data
Mining?

● Data, Data, DATA!!!


● Tasks too Hard to Program


● Customizing software
Available Machine Learning Tools


●   WEKA
●   R
●   KEEL
●   Others...


Not enough?
Apache Mahout vs others?
Many open source Machine Learning
libraries either:
● Lack Community
● Lack Documentation and Examples
● Lack the Apache License
    (business opportunity)
● Are research-oriented
    (not fit for production yet)
● Lack Scalability
Mahout = Elephant Driver?
Why do we need scalability?
● Big Data
Applications
● Recommendation features
● Clustering of information
● Classification

Examples: Movie recommendations, stock
analysis, fraud detection, ad-sense
recommendation, etc...

            How do we do this?
Supported Algorithms
●   Classification
●   Clustering
●   Recommender / Collaborative Filtering
●   Evolutionary Algorithms
●   Pattern Mining
●   Regression
●   Dimension reduction
●   Similarity Vectors
Classification
(learn to assign categories to documents)

Fully functional
 ● Logistic Regression (SGD)
 ● Bayesian

Integrated into Mahout development
 ● Random Forests (integrated)
 ● Online Passive Aggressive (integrated)
 ● Boosting (awaiting patch commit)

Open to be worked on...
 ● Hidden Markov Models (HMM) - training is done in MapReduce
 ● Support Vector Machines (SVM) (open)
 ● Perceptron and Winnow (open)
 ● Neural Network (open)
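
To make "Logistic Regression (SGD)" concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout's implementation; the function names, the toy data, and the default learning rate and epoch count are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(samples, labels, epochs=500, learning_rate=0.1, seed=0):
    """Fit weights (bias stored last) with stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    weights = [0.0] * (len(samples[0]) + 1)
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for i in order:
            x = samples[i] + [1.0]  # append a constant 1.0 for the bias term
            prediction = sigmoid(sum(w * v for w, v in zip(weights, x)))
            error = labels[i] - prediction
            # One SGD step: move each weight along the gradient of this single example.
            for j in range(len(weights)):
                weights[j] += learning_rate * error * x[j]
    return weights


def predict(weights, sample):
    x = sample + [1.0]
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, x))) >= 0.5 else 0


if __name__ == "__main__":
    # Hypothetical toy data: two features, two well-separated classes.
    samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
    labels = [0, 0, 1, 1]
    weights = train_logistic_sgd(samples, labels)
    print([predict(weights, s) for s in samples])
```

Because each update touches only one example, the model can be trained online as the data streams past, rather than holding the full dataset in memory.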
Clustering
(group items that are topically related)

Fully functional
 ● Expectation Maximization (EM)
 ● Hierarchical Clustering

Integrated into Mahout development
 ● Canopy Clustering
 ● K-Means Clustering
 ● Fuzzy K-Means
 ● Mean Shift Clustering
 ● Dirichlet Process Clustering
 ● Latent Dirichlet Allocation
 ● Spectral Clustering
 ● Minhash Clustering
 ● Top Down Clustering
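
As an illustration of what K-Means from the list above does, here is a small in-memory sketch in plain Python. It is not Mahout's distributed implementation; the function names, the fixed iteration count, and the sample points are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, assignments) after a fixed number of Lloyd iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical 2-D points forming two obvious groups.
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    centroids, assignments = k_means(points, k=2)
    print(centroids, assignments)
```

The assignment step is independent per point and the update step is a per-cluster average, which is why this algorithm maps naturally onto a map phase and a reduce phase.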
Recommenders /
Collaborative Filtering
(find items a user might like /
find items that appear together)

Integrated into Mahout development
●   Non-distributed recommenders ("Taste") (integrated)
●   Distributed Item-Based Collaborative Filtering (integrated)
●   Collaborative Filtering using parallel matrix factorization (integrated)
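
The idea behind item-based collaborative filtering ("find items that appear together") can be sketched with simple co-occurrence counts. This plain-Python example is an illustration under assumed names and made-up data, not Mahout's Taste API or its distributed job.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, user_items, counts, top_n=3):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = user_items[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical viewing histories.
    user_items = {
        "alice": {"matrix", "inception"},
        "bob": {"matrix", "inception", "interstellar"},
        "carol": {"matrix", "interstellar"},
        "dave": {"inception", "titanic"},
    }
    counts = cooccurrence_counts(user_items)
    print(recommend("alice", user_items, counts))  # ['interstellar', 'titanic']
```

Counting pairs per user and then summing per item pair is the part that parallelizes well across a cluster, since each user's history can be processed independently.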
Who is using it?
Opportunities
●   Developers
●   Researchers
●   Small Business
●   Large Business
●   Consultancy...
    ○ on Mahout
    ○ on specific data analysis
● Open data
● etc...
Apache Mahout
Business?

Ideas?

Suggestions?

Questions?
Where to start?
● Wikipedia Bayes Example
   ○   https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html


● What does it do?
   ○ Classify articles from a Wikipedia data dump by country.
   ○ Objective: Predict what country an unseen article
     should be categorized into.
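
To give a feel for the classification task before running the full example, here is a toy multinomial naive Bayes text classifier in plain Python. It is a sketch only: the training sentences, the function names, and the whitespace tokenization are assumptions, and it does not reproduce Mahout's Bayes implementation or the Wikipedia pipeline.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (label, text) pairs. Returns the counts needed to classify."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical miniature "articles" labelled by country.
    training = [
        ("France", "paris eiffel tower seine wine"),
        ("France", "paris louvre museum wine cheese"),
        ("Japan", "tokyo sushi mount fuji"),
        ("Japan", "kyoto temple sushi tokyo"),
    ]
    model = train_naive_bayes(training)
    print(classify("wine tasting near the louvre", model))
```

Training only needs per-label word counts, so the counting can be spread over many machines and the totals merged afterwards.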
References
General
http://www.slideshare.net/sdec2011/sdec2011-mahout-the-what-the-how-and-the-why
http://www.slideshare.net/gsingers/intro-to-mahout-dc-hadoop
http://www.slideshare.net/aneeshabakharia/lca2011-mahout
Hands-on
http://www.slideshare.net/OReillyOSCON/hands-on-mahout
Who is using it?
https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
Apache Mahout
http://mahout.apache.org/
Quickstart
https://cwiki.apache.org/MAHOUT/quickstart.html

Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Apache Mahout

  • 1. Apache The Elephant Driver Presenters: Antonio Loureiro Severien Emmanouil Dimogerontakis Muhammad Anis uddin Nasir
  • 2. What is Apache Mahout? ● Machine learning and data mining framework for classification, clustering and recommendation ● The goal of the free Apache Mahout machine learning library is to build scalable machine learning tools for analysing big data in a distributed manner
  • 3. Machine Learning "Machine Learning is programming computers to optimize a performance criterion using example data or past experience" - Alpaydin, 2004 Machine learning is concerned with the design and development of algorithms that allow machines to make decisions or even evolve behaviors based on a collection of empirical data.
  • 4. Data Mining Data mining, also called knowledge discovery in databases (KDD), is the process of discovering interesting and useful patterns and relationships in large volumes of data. Combines tools from: ● statistics ● artificial intelligence (such as neural networks and machine learning) with database management to analyze large data sets. - Britannica Online Encyclopedia
  • 5. Why Machine Learning and Data Mining? ● Data, Data, DATA!!! ● Tasks too Hard to Program ● Customizing software
  • 6. Available Machine Learning Tools ● WEKA ● R ● KEEL ● Others... Not enough?
  • 7. Apache Mahout vs others? Many open source Machine Learning libraries either: ● Lack Community ● Lack Documentation and Examples ● Lack the Apache License (business opportunity) ● Are research-oriented (not fit for production yet) ● Lack Scalability
  • 9. Why do we need scalability? ● Big Data
  • 10. Applications ● Recommendation features ● Clustering of information ● Classification Examples: Movie recommendations, stock analysis, fraud detection, ad-sense recommendation, etc... How do we do this?
  • 11. Supported Algorithms ● Classification ● Clustering ● Recommender / Collaborative Filtering ● Evolutionary Algorithms ● Pattern Mining ● Regression ● Dimension reduction ● Similarity Vectors
  • 12. Classification (learn to assign categories to documents)
    Fully functional
    ● Logistic Regression (SGD)
    ● Bayesian
    Integrated into Mahout development
    ● Random Forests (integrated)
    ● Online Passive Aggressive (integrated)
    ● Boosting (awaiting patch commit)
    Open to be worked on...
    ● Hidden Markov Models (HMM) - Training is done in Map-Reduce
    ● Support Vector Machines (SVM) (open)
    ● Perceptron and Winnow (open)
    ● Neural Network (open)
  • 13. Clustering (group items that are topically related)
    Fully functional
    ● Expectation Maximization (EM)
    ● Hierarchical Clustering
    Integrated into Mahout development
    ● Canopy Clustering
    ● K-Means Clustering
    ● Fuzzy K-Means
    ● Mean Shift Clustering
    ● Dirichlet Process Clustering
    ● Latent Dirichlet Allocation
    ● Spectral Clustering
    ● Minhash Clustering
    ● Top Down Clustering
  • 14. Recommenders / Collaborative Filtering (find items a user might like / find items that appear together)
    Integrated into Mahout development
    ● Non-distributed recommenders ("Taste") (integrated)
    ● Distributed Item-Based Collaborative Filtering (integrated)
    ● Collaborative Filtering using a parallel matrix factorization (integrated)
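
To make "find items that appear together" concrete, here is a minimal single-machine sketch of item co-occurrence recommendation in plain Python. It is not Mahout's API or its distributed implementation; the function names (`cooccurrence_counts`, `recommend`) and the toy viewing history are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, user_items, counts, top_n=3):
    """Score unseen items by their co-occurrence with items the user already has."""
    seen = set(user_items[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy data: which movies each user has watched.
history = {
    "alice": ["Alien", "Blade Runner", "Dune"],
    "bob": ["Alien", "Blade Runner"],
    "carol": ["Blade Runner", "Dune", "Solaris"],
    "dave": ["Alien"],
}

counts = cooccurrence_counts(history)
print(recommend("dave", history, counts))
```

A distributed item-based recommender would typically compute the same kind of pairwise item statistics across many machines; the sketch only shows the idea on data that fits in memory.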
  • 16. Opportunities ● Developers ● Researchers ● Small Business ● Large Business ● Consultancy... ○ on Mahout ○ on specific data analysis ● Open data ● etc...
  • 18. Where to start?
    ● Wikipedia Bayes Example
      ○ https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
    ● What does it do?
      ○ Classify a Wikipedia data dump by country.
      ○ Objective: Predict which country an unseen article should be categorized into.
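
The idea behind the Bayes example, predicting a country label for an unseen article, can be sketched with a small multinomial naive Bayes classifier in plain Python. This is an illustration of the technique only, not the code of the Mahout example; the `train` and `classify` helpers and the four-sentence training set are hypothetical stand-ins for a real labelled Wikipedia dump.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Collect per-label word counts and document counts from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under multinomial naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical miniature training set standing in for labelled Wikipedia articles.
training = [
    ("Sweden", "stockholm is the capital of sweden on the baltic sea"),
    ("Sweden", "swedish kings ruled from stockholm"),
    ("Brazil", "brasilia is the capital of brazil"),
    ("Brazil", "the amazon river flows through brazil"),
]

model = train(training)
print(classify("a river in brazil near the amazon", model))
```

On a real data dump, the same counting step is what makes the approach a natural fit for Map-Reduce: word counts per label can be accumulated independently on each chunk of articles and then summed.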