Survey on Machine Learning on MapReduce


  1. 1. Abhijit Kumar Behera, M.Tech (CSE), Roll No. 1350001, School of Computer Engineering. Guided by: Dr. Laxman Sahoo
  2. 2. Contents: Introduction; Apache Hadoop related projects; Application of Mahout; Literature Survey; Plan of Action; Conclusion; References
  3. 3. Introduction • The K-means algorithm is one of the best-known clustering algorithms and has been applied to a wide variety of problems. • MapReduce, a widely used parallel framework for cloud computing, handles massive data effectively, so K-means clustering algorithms based on MapReduce have become a focus of research.
  4. 4. Components of Hadoop: HDFS (Name Node, Data Node, Secondary Name Node); MapReduce (Map(), Combine(), Reduce()); YARN (ResourceManager, NodeManager, which replace the Job Tracker and TaskTracker of Hadoop 1); HBase
  5. 5. MapReduce Word count process
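The figure for this slide did not survive, so here is a minimal sketch of the word count process in plain Python. It simulates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API; the function names and the sample input lines are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    """Run map, shuffle, and reduce over all input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

# Hypothetical input: each string stands for one input split.
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
counts = word_count(lines)
print(counts)
```

In a real Hadoop job the map calls run on different nodes and the framework performs the shuffle; a Combine() step would apply the same summation locally before data crosses the network.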
  6. 6. Apache Hadoop Projects: Hadoop (HDFS and MapReduce), HBase, Mahout, Spark, Hive, ZooKeeper, Sqoop, Pig
  7. 7. Application of Mahout: Collaborative Filtering (matrix factorization based recommenders, a user-based recommender); Clustering (Canopy Clustering, K-Means Clustering, Fuzzy K-Means, Affinity Propagation Clustering); Classification (Naive Bayes, Random forest classifier)
  8. 8. Literature Survey: An Improved Parallel K-means Clustering Algorithm with MapReduce. Authors: Qing Liao, Fan Yang, Jingming Zhao. Journal: Communication Technology (ICCT), IEEE. Year of publication: 2014. Parallel K-means algorithm: 1) Initial 2) Mapper 3) Reducer
  9. 9. Literature Survey...
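The three steps named on slide 8 (Initial, Mapper, Reducer) can be sketched as follows. This is a single-process illustration of plain K-means expressed in map/reduce style, not the improved algorithm of Liao et al.; the function names, the convergence tolerance, and the sample points are assumptions.

```python
import math
from collections import defaultdict

def kmeans_map(point, centroids):
    """Mapper: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: math.dist(point, centroids[i]))
    return nearest, point

def kmeans_reduce(points):
    """Reducer: the new centroid is the mean of the points in one cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans_iteration(points, centroids):
    """One MapReduce round: map every point, group by cluster, reduce."""
    groups = defaultdict(list)
    for point in points:
        cluster, value = kmeans_map(point, centroids)
        groups[cluster].append(value)
    # An empty cluster keeps its previous centroid.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

def parallel_kmeans(points, initial_centroids, max_iter=20, tol=1e-9):
    """Initial step supplies the centroids; rounds repeat until they stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids

# Hypothetical data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
result = parallel_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

On a cluster, each mapper would process one data split with the current centroids broadcast to it, and each reducer would handle one cluster index, so every iteration costs one full MapReduce job.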
  10. 10. Literature Survey: Clouds for Scalable Big Data Analytics. Author: Domenico Talia. Journal: IEEE Computer Society. Year of publication: 2013. In this paper, the author describes how cloud computing enhances the development and functionality of big data analytics applications deployed on it.
      Cloud service model | Features | Users
      Data analytics software as a service | A single and complete data mining application or task (including data sources) offered as a service | End users, analytics managers, data analysts
      Data analytics platform as a service | A data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage | Data mining application developers, data scientists
      Data analytics infrastructure as a service | A set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications | Data mining programmers, data management developers, data mining researchers
  11. 11. Plan of Action. August-October 2014: literature survey (done). November 2014: problem definition (done); problem-solving outline (yet to be done). December 2014-January 2015: find an appropriate solution to the problem (yet to be formulated). February-May 2015: final implementation of the solution, with results (yet to be done).
  12. 12. Conclusion: Large-scale data mining has become a new challenge in recent years. With the MapReduce framework, big data analytics can be accomplished. The K-means algorithm is one of the best-known clustering algorithms, but its processing performance usually hits a bottleneck when it is applied to massive data. A parallel K-means algorithm implemented with MapReduce shows an obvious advantage in handling massive data.
  13. 13. References
      [1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014.
      [2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013.
      [3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advance Cloud and Big Data, 2013.
      [4] "Apache Hadoop", http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F