Apache Mahout
By,
Rahul Reghunath
• A scalable machine learning library built on
Hadoop and written in Java
• Covers collaborative filtering, clustering and
classification. Many of the implementations use
the Apache Hadoop platform.
• Mahout "drives" Hadoop: it gives Hadoop the
ability to analyze data (data mining).
• "Machine learning is programming computers
to optimize a performance criterion using
example data and past experience"
Mahout Points
• Uses the power of Apache Hadoop to solve
complex problems
• Does so by breaking them up into multiple
parallel tasks
• Stable release: 0.9 (1 February 2014)
• Mahout in Action released: 9 October 2011
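The idea of breaking a problem into parallel tasks can be sketched with a toy word count in the MapReduce style. This is only an illustration in plain Python, not Hadoop or Mahout code; the function names `map_task`, `reduce_task` and `word_count` are made up for this example, and a thread pool stands in for a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_task(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(chunks):
    # Each chunk is independent of the others, so the map tasks can run in parallel.
    # The thread pool is an assumption standing in for separate worker machines.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_task, chunks))
    # Gather every emitted pair and combine them in a single reduce step.
    all_pairs = [pair for part in mapped for pair in part]
    return reduce_task(all_pairs)


counts = word_count(["hadoop runs mahout", "mahout drives hadoop"])
print(counts)
```

The point of the split is that no map task needs to see the whole input, which is what lets the work spread across many machines.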
Why Mahout?
• Many Open Source ML libraries either:
– Lack Community
– Lack Documentation and Examples
– Lack Scalability
– Or are research-oriented
Hadoop
• The underlying technology was invented by Google in its
early days, so it could usefully index all the rich textual and
structural information it was collecting, and then
present meaningful and actionable results to users.
• There was nothing on the market that would let Google
do that, so it built its own platform. Google's
innovations were incorporated into Nutch, an open
source project, and Hadoop was later spun off from
that.
• Yahoo has played a key role in developing Hadoop for
enterprise applications.
Hadoop architecture
• Hadoop is designed to run on a large number of machines that
don't share any memory or disks. That means you can buy a whole
bunch of commodity servers, put them in a rack, and run the
Hadoop software on each one.
• When you load your organization's data into Hadoop,
the software breaks that data into pieces and
spreads them across your different servers. There is no one place where
you go to talk to all of your data; Hadoop keeps track of where the
data resides.
• Because Hadoop stores multiple copies of each piece, data held on a server
that goes offline or dies can be automatically re-replicated from a
known good copy.
• Hadoop derives from Google's MapReduce and Google File System
papers.
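The splitting and replication described above can be sketched as a toy model. This is an assumption-laden illustration, not HDFS code: the helper names, the node names and the round-robin placement are invented for the example, and real HDFS placement is more involved (it is rack-aware, for instance). The replication factor of 3 matches the usual HDFS default.

```python
def split_into_blocks(data, block_size):
    """Cut the data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_copies(placement, failed_node):
    """Return, per block, the nodes that still hold a copy after one node fails."""
    return {
        block_id: [node for node in holders if node != failed_node]
        for block_id, holders in placement.items()
    }


data = b"0123456789"
blocks = split_into_blocks(data, block_size=4)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical server names
placement = place_replicas(len(blocks), nodes, replication=3)
after_failure = surviving_copies(placement, failed_node="node-b")
print(after_failure)
```

Because every block lives on several nodes, losing one node still leaves at least one good copy of each block to re-replicate from.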
Current State of Hadoop
• Facebook processes more than 500 TB of
data daily. The site manages millions of
photos and processes billions of likes each
day. That's a whole lot of sharing.
• Facebook uses Hive to query the data it keeps in
Hadoop.
• Yahoo has a comparable tool: Pig
How to solve common business
problems
• Recommendation:
user info + community info = recommendation
• Classification: for example, filtering spam mail
• Clustering: grouping similar data together
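The "user info + community info = recommendation" idea can be sketched as a small user-based collaborative filter. This is a minimal plain-Python illustration, not Mahout's API; the ratings, user names and function names are made up for the example, and cosine similarity is one possible similarity measure chosen here as an assumption.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"film_a": 5, "film_b": 3, "film_c": 4},
    "bob": {"film_a": 5, "film_b": 3, "film_d": 5},
    "carol": {"film_b": 5, "film_c": 1, "film_e": 4},
}


def cosine_similarity(a, b):
    """Similarity of two users, based on the items both of them rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Score items the user has not rated, weighted by how similar each rater is."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


suggestions = recommend("alice", ratings)
print(suggestions)
```

Here the user's own ratings are the "user info" and everyone else's ratings are the "community info"; items liked by the most similar users rank highest.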
Applications
• eBay
• Netflix: movies
• Pandora: radio stations
• eHarmony: matching people
The End
• 218 days are left in this year. Try to make it an
awesome year for the world.
Thanks
