Several Applications in Machine
Learning and Netflix competition
Advisor: Wei-Chang Yeh
Student: Jing-Feng Deng
Report Date: 12/27/2013
4 Papers in this talk
• From the course material on Coursera (NTU's online
Machine Learning Foundations course)
• Provided by Dr. Hsuan-Tien Lin (林軒田)
Machine Learning in Food
• Sadilek et al. (2013)
• Foodborne illness detection system (nEmesis)
• 4 months of data collection
• 3,800,000 tweets (2012/12/26~2013/4/25)
• NYC
• Twitter (+ Foursquare) geocoding => GPS location
• Tweets => corpus
• Human-guided machine learning
• SVM
• Extensive statistical analysis
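
The pipeline in the bullets above (tweets => corpus features => SVM) can be sketched roughly as follows. This is a minimal illustration and not the nEmesis implementation: the toy tweets and labels, the helper names (`build_vocab`, `featurize`, `train_linear_svm`, `predict`), and the hyperparameters are all assumptions made for the example. It trains a linear SVM by plain sub-gradient descent on the regularized hinge loss over bag-of-words counts.

```python
import random

# Hypothetical toy data: 1 = tweet suggests food-related illness, -1 = it does not.
TWEETS = [
    ("my stomach hurts so bad after that burrito", 1),
    ("food poisoning is the worst feeling ever", 1),
    ("threw up all night feel sick", 1),
    ("great pizza with friends tonight", -1),
    ("love the new coffee place downtown", -1),
    ("watching the game at the bar", -1),
]

def build_vocab(texts):
    """Assign an index to every distinct word (the 'corpus' features)."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words count vector over the fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for word in text.split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200, seed=0):
    """Sub-gradient descent on the L2-regularized hinge loss (assumed settings)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The regularizer shrinks every weight slightly on each step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary away from this example.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(text, vocab, w, b):
    score = sum(wj * xj for wj, xj in zip(w, featurize(text, vocab))) + b
    return 1 if score >= 0 else -1

vocab = build_vocab(text for text, _ in TWEETS)
X = [featurize(text, vocab) for text, _ in TWEETS]
y = [label for _, label in TWEETS]
w, b = train_linear_svm(X, y)

# Words with the largest positive weights, loosely analogous to a list of
# the most significant features for the "sick" class.
top_words = sorted(vocab, key=lambda word: w[vocab[word]], reverse=True)[:5]
print(top_words)
```

With real data the labels would come from human annotators, and the feature set and solver would be far larger; the sketch only shows the shape of the approach.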
Machine Learning in Food: SVM learning
Machine Learning in Food: Top 20 significant Corpus
Machine Learning in Clothes
• Abu-Mostafa (2012)
• Fashion, style, how to dress
• Fashion Recommendation System
piqueoutfit

Source: http://vimeo.com/34789625
Machine Learning in Traffic
• Stallkamp et al. (2012)
• GTSRB (German Traffic Sign Recognition Benchmark)
• 52,000 pictures
• 43 different types of traffic signs
Google self-driving cars

Source: http://goo.gl/4EUDw
Netflix competition
• Bell et al. (2009), Abu-Mostafa (2012)
• Authors are from AT&T and Yahoo!
• Netflix: an online movie rental company
• 2007 => 8.43% improvement in RMSE
• 2008 => 9.63% improvement in RMSE
• 2009 => 10.06% improvement in RMSE

• KDDCup
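
The yearly percentages above are relative improvements in RMSE (root mean squared error) over a baseline recommender. A minimal sketch of both calculations follows; the function names and the toy ratings are assumptions for illustration, not contest data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def improvement_percent(baseline_rmse, model_rmse):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Hypothetical ratings on a 1-5 star scale.
actual = [4, 3, 5, 2]
baseline = [3, 4, 4, 3]        # every prediction is off by one star
model = [4.0, 3.5, 4.5, 2.0]   # closer predictions

base = rmse(baseline, actual)
ours = rmse(model, actual)
print(f"baseline RMSE {base:.4f}, model RMSE {ours:.4f}, "
      f"improvement {improvement_percent(base, ours):.2f}%")
```

A lower RMSE means more accurate predictions, so a 10% improvement means the model's RMSE is 10% smaller than the baseline's.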
The Neighborhood Model
The Latent-Factor Approach
Reference
• Sadilek, A., Brennan, S., Kautz, H., & Silenzio, V. (2013, March). nEmesis: Which
Restaurants Should You Avoid Today? In First AAAI Conference on Human
Computation and Crowdsourcing.
• Y. S. Abu-Mostafa. Machines that think for themselves: New techniques for
teaching computers how to learn are beating the experts. Scientific American,
289(7):78-81, 2012.
• R. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize.
IEEE Spectrum, 46(5):29–33, 2009.
• J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel: Introduction to the Special Issue on
Machine Learning for Traffic Sign Recognition. IEEE Transactions on Intelligent
Transportation Systems 13(4): 1481-1483, 2012.
• GTSRB (German Traffic Sign Recognition Benchmark),
http://benchmark.ini.rub.de
• KDDCup, http://www.kdd.org/kddcup/index.php

Editor's Notes

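  1. Sadilek, A., Brennan, S., Kautz, H., & Silenzio, V. (2013, March). nEmesis: Which Restaurants Should You Avoid Today? In First AAAI Conference on Human Computation and Crowdsourcing. This 2013 AAAI conference paper describes a practical application of ML (machine learning) to food. The authors periodically collected 3.8 million tweets over four months (2012/12/26~2013/4/25), used Twitter geocoding to map tweets to the GPS locations of restaurants, and broke the collected tweets down into a corpus. Human labelers guided the machine learning to decide which tweets relate to illness caused by eating (the paper says this cost nearly US$1,500), so that restaurants with possible food-safety problems can be identified automatically. One of the authors works at Google. The overall concept is interesting.
  2. Y. S. Abu-Mostafa. Machines that think for themselves: New techniques for teaching computers how to learn are beating the experts. Scientific American, 289(7):78-81, 2012. This 2012 Scientific American article briefly mentions an application of ML to clothing, which lets the author recommend outfits to others even though he knows nothing about fashion. It then introduces basic ML concepts (supervised, unsupervised, reinforcement learning, ...) and mentions the Netflix million-dollar movie-recommendation algorithm competition, which ran for three years on 100 million points of real data. The author's team submitted 20 minutes late (first and second place both reached a 10.06% improvement) and therefore finished second. The author is a professor at Caltech. The movie recommendation system used SVD (singular value decomposition) as its ML strategy.
  3. J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel: Introduction to the Special Issue on Machine Learning for Traffic Sign Recognition. IEEE Transactions on Intelligent Transportation Systems 13(4): 1481-1483, 2012. This paper covers ML applied to transportation, specifically practical traffic sign recognition. The task is hard because it is affected by lighting, partial occlusion, rotation, and weather conditions; signs also differ in text, pictograms, color, and shape, and some are very similar to each other (for example, speed limit signs). The Special Issue mainly introduces four papers and also describes the methods used by teams in the final round of the 2011 IJCNN competition (more than 20 teams took part). GTSRB (German Traffic Sign Recognition Benchmark), http://benchmark.ini.rub.de, contains 52,000 images covering 43 different types of traffic signs and can serve as a benchmark dataset. The second-place algorithm in the IJCNN competition was based on a CNN (Convolutional Neural Network); other methods included SVM, linear discriminant analysis, subspace analysis, ensemble classifiers, slow feature analysis, nearest neighbor classifiers, and random forests (the third-place method was K-D tree + random forests). The text also mentions a review and comparative analysis of the 44 most recent traffic sign detection algorithms from the preceding eight years.
  4. R. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize. IEEE Spectrum, 46(5):29–33, 2009. Written by winners of the Netflix competition, this is a very readable popular-science article. The authors come from AT&T Labs, Yahoo!, and other companies, and the article outlines their methods. Netflix is a well-known online movie rental company. It built its own Cinematch system, which recommends movies that match each user's tastes. CEO Reed Hastings, concerned that the Netflix team might not have the manpower to evaluate many algorithms on massive data, launched the Netflix million-dollar programming contest. Netflix provided 100 million ratings from 480,000 anonymous users on 17,000 movies. It held back the 3 million most recent ratings and asked competitors to predict them. Netflix scored each competitor's 3 million predictions against the true ratings, using RMSE (Root Mean Squared Error) as the accuracy metric: the more accurate the predictions, the smaller the RMSE. Scores were posted immediately to an online leaderboard visible to everyone. A dataset this large can generally be processed only once a day (I found a 666 MB download at http://www.lifecrunch.biz/archives/207); Netflix also provided a representative subset for more convenient computation. The competition started on October 2, 2006; unfortunately it has since been discontinued (http://en.wikipedia.org/wiki/Netflix_Prize). The authors found that the nearest-neighbor method performs better when the number of neighbors is below 50. The latent-factor model has a complementary weakness: it struggles to detect strong associations among closely related movies (such as the Lord of the Rings trilogy), so the two methods complement each other. The method submitted in the first year (2007) was a variant combining the two (including parameter tuning) and improved RMSE by 8.43% (the contest's target was 10%). In 2008, working with another team, the authors won the 2008 Progress Prize (9.63%), and in 2009 the prize went to BellKor's Pragmatic Chaos (10.09%, also reported as 10.06%). One novel finding was that which movies a user rated matters more than the scores given; converting the original numeric ratings to binary complemented the other methods.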
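
The latent-factor approach mentioned in the notes can be illustrated with a basic regularized matrix factorization trained by stochastic gradient descent. This is a generic textbook-style sketch, not the prize-winning method: the toy ratings, the function names (`train_latent_factors`, `predict_rating`, `rmse_of`), and every hyperparameter are assumptions chosen for the example.

```python
import math
import random

# Hypothetical toy ratings: (user, movie, stars).
# Users 0-1 and users 2-3 have roughly opposite tastes.
RATINGS = [
    (0, 0, 5), (0, 1, 5), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 2), (3, 2, 4), (3, 3, 5),
]

def train_latent_factors(ratings, n_users, n_items, k=2,
                         lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit rating ~ mu + P[user] . Q[item] by SGD with L2 regularization."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(pf * qf for pf, qf in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus the L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict_rating(model, u, i):
    mu, P, Q = model
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse_of(pairs):
    """RMSE over an iterable of (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

model = train_latent_factors(RATINGS, n_users=4, n_items=4)
mean_rating = model[0]
baseline_rmse = rmse_of((mean_rating, r) for _, _, r in RATINGS)
model_rmse = rmse_of((predict_rating(model, u, i), r) for u, i, r in RATINGS)
print(f"mean-only RMSE {baseline_rmse:.3f}, latent-factor RMSE {model_rmse:.3f}")
```

The sketch measures error on the training ratings only; a real evaluation would score held-out ratings, as the contest did with its withheld set.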