4. What Problem is AI Solving Today
Input -> Response
Emails -> Is it spam? (0/1)
Images -> What is it? (1, …, 100)
Audio -> Text
Chinese (你妹) -> English (Your Sister)
Text -> Audio
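The email row above (input: an email, response: 0 or 1) can be illustrated with a toy sketch. This is not a production spam filter: the word-count scoring, the function names `train` and `predict`, and the four example emails are all assumptions made for illustration.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears in spam (1) and non-spam (0) emails."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Return 1 if the email's words were seen more often in spam, else 0."""
    words = text.lower().split()
    spam_score = sum(counts[1][w] for w in words)
    ham_score = sum(counts[0][w] for w in words)
    return 1 if spam_score > ham_score else 0

# Hypothetical labeled data: (input email, response label).
labeled_emails = [
    ("win a free prize now", 1),
    ("free money click now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday with the team", 0),
]
model = train(labeled_emails)
```

Calling `predict(model, "claim your free prize")` scores the new email against the counts learned from the labeled examples; the other rows follow the same input-to-response pattern with different data types.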
5. “The massive economic value of AI today is driven by supervised learning.”
- Andrew Ng
8. AI vs. Machine Learning vs. Deep Learning
Artificial Intelligence - A machine thinks, talks, and behaves like a human.
Machine Learning - A computer makes decisions without being explicitly programmed.
Deep Learning - A network of multi-layer, non-linear processing units capable of adapting itself to new data.
9. “AI problem is a Data Problem. The more data, the merrier.”
- Raymond Fu
10. Machine Learning vs. Statistics
Machine Learning
Goal: “learning” from data of all sorts
No assumptions about data distributions
Generalization is achieved through training, validation, and test datasets
Tolerant of redundant features
Does not promote data reduction prior to learning
Statistics
Goal: analyzing and summarizing data
Tight assumptions about data distributions
Generalization is pursued using statistical tests on the training dataset
Prefers fewer input features
Promotes data reduction as much as possible before modeling
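The training, validation, and test datasets mentioned on the machine learning side can be sketched as a simple shuffle-and-slice. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions for illustration, not values taken from the slides.

```python
import random

def split_dataset(samples, train_pct=60, val_pct=20, seed=0):
    """Shuffle the samples, then slice them into train, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test

# Hypothetical dataset of 100 sample ids.
train_set, val_set, test_set = split_dataset(range(100))
```

The model is fitted on the training part, tuned against the validation part, and judged once on the test part, so generalization is measured on data the model has not seen.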
12. Dataset Labeling
Labeled data is a set of samples that each carry a specific meaning or tag.
● Label an image with objects in it.
● Label an X-ray photo with whether or not the patient has a certain disease.
● Join datasets that may correlate with each other.
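The labeling steps above can be sketched as a join between raw samples and a separately stored set of labels. The record layout, the file names, and the function name `attach_labels` are hypothetical and only serve the illustration.

```python
def attach_labels(samples, labels_by_id):
    """Join samples with labels on the shared id; return (labeled, unlabeled)."""
    labeled, unlabeled = [], []
    for sample in samples:
        if sample["id"] in labels_by_id:
            # Copy the sample and add its label, leaving the input untouched.
            labeled.append({**sample, "label": labels_by_id[sample["id"]]})
        else:
            unlabeled.append(sample)
    return labeled, unlabeled

# Hypothetical X-ray records and a separate table of diagnoses.
xrays = [
    {"id": 1, "file": "xray_001.png"},
    {"id": 2, "file": "xray_002.png"},
    {"id": 3, "file": "xray_003.png"},
]
diagnoses = {1: "disease", 2: "no disease"}

labeled, unlabeled = attach_labels(xrays, diagnoses)
```

Samples without a matching label are returned separately, so they can be sent for labeling instead of being silently dropped.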
13. Big Data Engineering
1. Data Cleansing: Create both better features and better labels.
2. Self-Service Analytics: Give data analysts tools to easily prepare their data.
3. Data Storage: Build a performant and cost-efficient data storage strategy.
4. Streaming: Fast data feed + AI = fast decision making.
Difference in goal: machine learning lets the machine learn, whereas statistics gives a human a fact so that the human can make a decision.
Difference in methodology: statistics reduces the data in two directions, the number of data points (sampling) and the number of features (simplification).
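The two directions of data reduction can be sketched as follows. The feature names, the sample size, and the function name `reduce_dataset` are assumptions for illustration; a real analysis would choose the kept features and the sampling scheme on statistical grounds.

```python
import random

def reduce_dataset(rows, keep_features, sample_size, seed=0):
    """Reduce a dataset in two directions: fewer rows and fewer features."""
    # Direction 1: sampling keeps a random subset of the rows.
    sampled = random.Random(seed).sample(rows, sample_size)
    # Direction 2: simplification keeps only the chosen features.
    return [{name: row[name] for name in keep_features} for row in sampled]

# Hypothetical data: 1,000 rows with four features each.
rows = [
    {"id": i, "age": i % 90, "height": 150 + i % 50, "weight": 50 + i % 60}
    for i in range(1000)
]
reduced = reduce_dataset(rows, keep_features=["age", "weight"], sample_size=100)
```

Here 1,000 rows with four features become 100 rows with two features, which is the kind of reduction the statistics workflow favors before modeling.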