18. Knowledge Discovery Process
Understand and define problem
Extract data
Data engineering (understanding and cleansing)
Exploratory data analysis
Data mining (searching for patterns)
Machine learning (for data products)
Communication with stakeholders
Repeat until certain goals are accomplished
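The iterative character of the steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real pipeline: the function names (`extract`, `cleanse`, `explore`, `mine`, `discover`), the sample values, and the "pattern" definition (values above a threshold) are all assumptions made up for the example.

```python
from statistics import mean


def extract():
    # Hypothetical raw records; None marks a missing measurement.
    return [4.0, None, 5.5, 6.1, None, 5.0]


def cleanse(raw):
    # Data engineering step: drop missing values.
    return [x for x in raw if x is not None]


def explore(data):
    # Exploratory analysis: a few summary statistics.
    return {"n": len(data), "mean": mean(data), "max": max(data)}


def mine(data, threshold):
    # Toy "pattern search": values strictly above the threshold.
    return [x for x in data if x > threshold]


def discover(goal_count, max_rounds=10):
    # Repeat the mining step, relaxing the threshold each round,
    # until at least goal_count patterns are found.
    data = cleanse(extract())
    threshold = explore(data)["max"]
    for round_no in range(1, max_rounds + 1):
        patterns = mine(data, threshold)
        if len(patterns) >= goal_count:
            return round_no, threshold, patterns
        threshold -= 0.5
    return None  # goal not reached within max_rounds


if __name__ == "__main__":
    print(discover(goal_count=2))
```

In practice each round would also involve revisiting the problem definition and talking with stakeholders, which a loop like this cannot capture.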
19. Data Science: Yet Another Viewpoint
Knowledge discovery with more emphasis on
big data (with volume and/or variety)
unknown problems
data collection and unification
data product building
business context (i.e., goal => revenue)
20. Data Science is More Than …
Analysis toolbox (e.g., R, Python)
Infrastructure (e.g., Hadoop, NoSQL)
Big data (small data will do as well)
Data visualization
Statistics / machine learning
21. A Data Scientist?
[1] “A Statistician's View on Big Data and Data Science”, http://www.slideshare.net/kuonen/a-statisticians-view-on-big-data-and-data-science
[2] “Big Data [sorry] & Data Science: What Does a Data Scientist Do?”, http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
[3] Machine Learning and Data Mining, http://web.cecs.pdx.edu/~mperkows/CLASS_479/LECTURES479/PE013..pdf
29. Final Words of Warning
“Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” --Francois Pinard
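37. Cross-Disciplinary Networking
Name badge information & categories
Orange: VIPs & speakers
Red: session/course attendees
Black: hackathon participants
Blue: media
Yellow: staff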
Cross-disciplinary communication is a MUST for data science practitioners. So please just mingle!