Several Applications in Machine Learning and Netflix Competition

Several Applications in Machine Learning and Netflix Competition
Advisor: Wei-Chang Yeh
Student: Jing-Feng Deng
Report Date: 12/27/2013

Four Papers in This Talk
• Taken from the Coursera course material (NTU's online Machine Learning Foundations course)
• Provided by Dr. Lin (Dr. 林軒田, Hsuan-Tien Lin)

Machine Learning in Food
• Sadilek et al. (2013)
• Foodborne illness detection system (nEmesis)
• 4 months of data collection
• 3,800,000 tweets (2012/12/26 to 2013/4/25)
• New York City
• Twitter (+ Foursquare) geocoding => GPS location
• Tweets => corpus
• Human-guided machine learning
• SVM
• Extensive statistical analysis

Machine Learning in Food: SVM learning

Machine Learning in Food: Top 20 significant corpus terms

Machine Learning in Clothes
• Abu-Mostafa (2012)
• Fashion, style, and how to dress
• Fashion Recommendation System

piqueoutfit

Source: http://vimeo.com/34789625

Machine Learning in Traffic
• Stallkamp et al. (2012)
• GTSRB (German Traffic Sign Recognition Benchmark)
• 52,000 images
• 43 different traffic sign types

Google self-driving cars

Source: http://goo.gl/4EUDw

Netflix competition
• Bell et al. (2009), Abu-Mostafa (2012)
• Authors are from AT&T and Yahoo!
• Netflix: an online movie-rental company
• 2007 => 8.43% improvement
• 2008 => 9.63%
• 2009 => 10.06%
• KDDCup

The Neighborhood Model

The Latent-Factor Approach

References
• Sadilek, A., Brennan, S., Kautz, H., & Silenzio, V. (2013, March). nEmesis: Which restaurants should you avoid today? In First AAAI Conference on Human Computation and Crowdsourcing.
• Abu-Mostafa, Y. S. (2012). Machines that think for themselves: New techniques for teaching computers how to learn are beating the experts. Scientific American, 289(7), 78-81.
• Bell, R., Bennett, J., Koren, Y., & Volinsky, C. (2009). The million dollar programming prize. IEEE Spectrum, 46(5), 29-33.
• Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2012). Introduction to the special issue on machine learning for traffic sign recognition. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1481-1483.
• GTSRB (German Traffic Sign Recognition Benchmark), http://benchmark.ini.rub.de
• KDDCup, http://www.kdd.org/kddcup/index.php

Speaker Notes

Sadilek et al. (2013). This paper from the 2013 AAAI conference describes a practical application of ML (machine learning) to food. Over four months it collected 3.8 million tweets (2012/12/26 to 2013/4/25), used Twitter geocoding to match them to restaurants' GPS locations, and broke the collected tweets down into a corpus. Human annotators guided the machine learning in deciding which tweets were related to getting sick from food (the paper reports spending nearly US$1,500 on this), so that restaurants with possible food-safety problems could be identified automatically. One of the authors works at Google. The overall idea is very interesting.
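The nEmesis pipeline turns tweets into a corpus and lets human labels guide an SVM. As a rough, hypothetical illustration (not the authors' features, labels, or training code), the sketch below trains a linear SVM on bag-of-words counts with the Pegasos stochastic subgradient method. The six toy tweets, the `lam` and `epochs` values, and all function names are invented for the example; for simplicity the bias feature is regularized together with the word weights.

```python
import re
from collections import Counter

def featurize(text):
    """Bag-of-words counts plus a constant bias feature (hypothetical preprocessing)."""
    feats = Counter(re.findall(r"[a-z']+", text.lower()))
    feats["__bias__"] = 1
    return feats

def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_pegasos(examples, lam=0.01, epochs=50):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos subgradient steps.

    Examples are visited in a fixed order so the result is reproducible.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in examples:  # y is +1 (sick) or -1 (not sick)
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin check uses the current weights
            for k in w:  # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w, text):
    return 1 if dot(w, featurize(text)) > 0 else -1

# Hypothetical hand-labeled tweets standing in for the crowd-sourced labels.
labeled = [
    ("got food poisoning after dinner", 1),
    ("vomiting all night after that sushi", 1),
    ("stomach ache and diarrhea from lunch", 1),
    ("great dinner with friends", -1),
    ("loved the sushi tonight", -1),
    ("lunch was delicious", -1),
]
w = train_pegasos([(featurize(t), y) for t, y in labeled])
print(predict(w, "terrible food poisoning and vomiting"))  # 1
print(predict(w, "delicious dinner with the family"))      # -1
```

A linear kernel keeps each learned weight tied to one word, which loosely mirrors the "significant corpus terms" idea; a real system would need far more labeled data and proper tokenization of tweets.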
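Abu-Mostafa (2012). This 2012 Scientific American article touches briefly on an application of ML to clothing, which lets the author recommend outfits to other people even though he does not understand fashion, and then introduces basic ML concepts (supervised, unsupervised, and reinforcement learning, among others). It also mentions the Netflix million-dollar movie-recommendation algorithm competition, which ran for three years on 100 million points of real data. The author's team submitted its solution 20 minutes after the winners (the first- and second-place teams both achieved a 10.06% improvement) and therefore finished second. The author is a professor at the California Institute of Technology (Caltech). The movie-recommendation system used SVD (singular value decomposition) as its ML strategy.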
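In the Netflix setting, "SVD" usually refers to a latent-factor model fitted only to the observed ratings, r̂(u, i) = μ + b_u + b_i + p_u · q_i, rather than a textbook SVD of a dense matrix. The sketch below fits that model with plain stochastic gradient descent. The toy ratings, the factor count `k`, the learning rate, the regularization weight, and the seed are hypothetical choices, not values used by any competition team.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=42):
    """Biased matrix factorization fitted by SGD on observed (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # user biases
    bi = [0.0] * n_items  # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - predict(u, i)  # error on one observed rating
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # update both factors from the old values
                P[u][f] += lr * (e * qi - reg * pu)
                Q[i][f] += lr * (e * pu - reg * qi)
    return mu, predict

# Hypothetical data: users 0-1 prefer movies 0-1, users 2-3 prefer movies 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]
mu, predict = train_mf(ratings, n_users=4, n_items=4)
train_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))
baseline_rmse = math.sqrt(sum((r - mu) ** 2 for _, _, r in ratings) / len(ratings))
print(round(train_rmse, 3), round(baseline_rmse, 3))
```

Only observed triples enter the loss, which is what lets this approach scale to a sparse rating matrix where most user/movie pairs are missing.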
Stallkamp et al. (2012). This paper covers ML applied to transportation: practical traffic-sign recognition. Recognizing traffic signs is difficult because of lighting, partial occlusion, rotation, and weather conditions; signs also differ in text, symbols, color, and shape, and some look very similar to one another (e.g., speed-limit signs). The special issue mainly introduces four papers and also describes the methods used by the teams in the final round of the 2011 IJCNN competition (more than 20 teams took part). GTSRB (German Traffic Sign Recognition Benchmark), http://benchmark.ini.rub.de, contains 52,000 images of 43 different types of traffic signs and can serve as a benchmark dataset. The second-place algorithm in the IJCNN competition was based on a CNN (convolutional neural network); other methods included SVMs, linear discriminant analysis, subspace analysis, ensemble classifiers, slow feature analysis, nearest-neighbor classifiers, and random forests (the third-place method was a K-D tree + random forests). The paper also notes that others have reviewed and compared 44 traffic-sign detection algorithms published within the preceding eight years.
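Bell et al. (2009). Written by winners of the Netflix competition, this is an engaging popular-science article. The authors come from companies such as AT&T Labs and Yahoo!, and the article outlines their approach. Netflix, a well-known online movie-rental company, built its own Cinematch system to recommend movies that match each user's taste. CEO Reed Hastings judged that Netflix's own team might not be able to evaluate the performance of many algorithms on such massive data, so the company launched the Netflix million-dollar programming prize. Netflix released about 100 million ratings from 480,000 anonymous users on 17,000 movies, withheld the most recent 3 million ratings, and asked competitors to predict them. Netflix scored each competitor's 3 million predictions against the true ratings using RMSE (root mean squared error) as the accuracy metric: the more accurate the predictions, the smaller the RMSE. Scores were posted immediately to a public online leaderboard. A dataset this large could generally be scored only once a day (I found a 666 MB download here: http://www.lifecrunch.biz/archives/207), so Netflix also provided a representative dataset for convenient computation. The competition began on October 2, 2006; unfortunately, it has since been discontinued (http://en.wikipedia.org/wiki/Netflix_Prize). The authors found that the nearest-neighbor method performs better with fewer than 50 neighbors, whereas the latent-factor model has a corresponding weakness: it has difficulty detecting strong associations among a few closely related movies (such as The Lord of the Rings trilogy). The two methods are complementary. Their first-year (2007) entry was a variant combining the two methods (including parameter tuning) and achieved an 8.43% improvement (the competition's original target was 10%). In 2008 they joined forces with another team and won the 2008 Progress Prize (9.63%), and in 2009 the prize went to BellKor's Pragmatic Chaos (reported as 10.09%, and elsewhere as 10.06%). One novel finding: the exact score a user gives matters less than which movies the user chose to rate, and converting the original numeric ratings into binary data complemented the other methods.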
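The note describes scoring by RMSE, improvement percentages such as 8.43% against a 10% target (a relative reduction in RMSE versus the Cinematch baseline), and a nearest-neighbor approach. The sketch below computes both metrics and a minimal item-based neighborhood prediction. Adjusted-cosine similarity, keeping only positively similar neighbors, the default `k=50`, clamping to a 1-5 scale, and the toy ratings are all assumptions chosen for illustration, not the BellKor method.

```python
import math

def rmse(predictions, truths):
    """Root mean squared error, the Netflix Prize accuracy metric (lower is better)."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("expected two non-empty lists of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system such as Cinematch."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

def user_mean(ratings, user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def item_similarity(ratings, i, j):
    """Adjusted cosine: cosine of user-mean-centered ratings over users who rated both items."""
    num = den_i = den_j = 0.0
    for user, r in ratings.items():
        if i in r and j in r:
            m = user_mean(ratings, user)
            a, b = r[i] - m, r[j] - m
            num += a * b
            den_i += a * a
            den_j += b * b
    if den_i == 0.0 or den_j == 0.0:
        return 0.0
    return num / math.sqrt(den_i * den_j)

def predict(ratings, user, item, k=50):
    """Item-based kNN: the user's mean plus a similarity-weighted average of their
    centered ratings on the k most similar items they have rated."""
    m = user_mean(ratings, user)
    scored = [(item_similarity(ratings, item, j), j) for j in ratings[user] if j != item]
    neighbors = sorted(((s, j) for s, j in scored if s > 0), reverse=True)[:k]
    if not neighbors:
        return m  # no positively similar items: fall back to the user's mean
    num = sum(s * (ratings[user][j] - m) for s, j in neighbors)
    den = sum(s for s, _ in neighbors)
    return min(5.0, max(1.0, m + num / den))  # clamp to the 1-5 star scale

# Hypothetical ratings: {user: {movie: stars}}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 1},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 4},
    "dan": {"A": 2, "C": 4, "D": 5},
}
print(round(predict(ratings, "ann", "D", k=2), 2))       # 1.0
print(round(rmse([3.5, 4.0], [3.0, 5.0]), 4))           # 0.7906
print(round(relative_improvement(1.0, 0.9), 2))         # 10.0
```

Here movie D behaves like C (liked and disliked by the same users), and ann rated C low, so her predicted rating for D is low. The parameter `k` plays the role of the neighbor count the note discusses.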
