Why do we really need big data? Can't things work with small data too?
1. MAKING BIG DATA COME ALIVE
Why is big data fundamentally a replacement for the lack of a
good algorithm? And why is this a good thing?
Danko Nikolić, Prof. Dr., Senior Data Scientist
17. November 2017
3.
Textbooks make you believe
that a set of tools will cover it
all.
“You just need to select the
right one.”
4.
Rarely will an off-the-shelf model be
outright optimal for a real-life
problem.
5.
Correction: a data scientist creates a model.
Misconception: a data scientist applies a model.
6.
Commonly used specialization tool: data wrangling
+ feature engineering.
Feature engineering extracts from the data what is important (the signal!) and in a
way that is suitable for an off-the-shelf model. Example:
[Figure: data and the equations for data wrangling]
A neural net and specific wrangling steps together form a highly specialized
model.
Here, data wrangling plays a role similar to that of convolution in deep
neural nets.
Less thought may be needed to apply a
neural net, because a neural net
alone provides an eclectic
algorithm/architecture.
+
Extensive thought
given to data
wrangling and
feature engineering.
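The point about wrangling plus an off-the-shelf model can be sketched with a toy problem. The example below is an illustration only and is not taken from the slides: it assumes an ideal pendulum, whose period follows T = 2π·sqrt(L/g), and uses plain least-squares line fitting as the "off-the-shelf" model. All names (`pendulum_period`, `fit_line`, `max_abs_error`) are hypothetical helpers. Fed the raw length, the line fits poorly; fed the engineered feature sqrt(L), the same model becomes essentially exact.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed for this toy example)


def pendulum_period(length_m):
    """Ideal small-angle pendulum period, T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / G)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_abs_error(xs, ys, slope, intercept):
    """Largest absolute residual of the fitted line on the given points."""
    return max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


# Noise-free synthetic data: 20 pendulum lengths from 0.1 m to 2.0 m.
lengths = [0.1 + 0.1 * i for i in range(20)]
periods = [pendulum_period(length) for length in lengths]

# Off-the-shelf model applied to the raw data.
raw_fit = fit_line(lengths, periods)
raw_error = max_abs_error(lengths, periods, *raw_fit)

# Same model after one wrangling step: feed sqrt(length) instead of length.
engineered = [math.sqrt(length) for length in lengths]
eng_fit = fit_line(engineered, periods)
engineered_error = max_abs_error(engineered, periods, *eng_fit)

print(f"raw feature:        max error = {raw_error:.4f} s")
print(f"engineered feature: max error = {engineered_error:.2e} s")
```

The model code is identical in both runs. Only the wrangling step changes, and that step is where the knowledge about the problem enters.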
13.
Relative contributions to a model’s knowledge
[Figure: two bars showing the relative contributions of architecture and data]
Highly specialized architecture/algorithm + “small” data: this is the ratio we prefer.
Eclectic architecture + big data: this tradeoff is often successful.
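The tradeoff between architecture and data can be illustrated with a small experiment. This is a sketch under stated assumptions, not the author's own example: the task is a noise-free ideal pendulum (T = 2π·sqrt(L/g)), the "specialized" model has the square root built in and one free parameter, and the "eclectic" model is a 1-nearest-neighbour lookup that assumes nothing about the shape of the function. All function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed for this toy example)


def pendulum_period(length_m):
    """Ideal small-angle pendulum period, T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / G)


def grid(n, lo=0.1, hi=2.0):
    """n evenly spaced lengths between lo and hi (inclusive)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def specialized_model(train_x, train_y):
    """One free parameter: T = a * sqrt(L). The square root is built in."""
    feats = [math.sqrt(x) for x in train_x]
    a = sum(f * y for f, y in zip(feats, train_y)) / sum(f * f for f in feats)
    return lambda x: a * math.sqrt(x)


def eclectic_model(train_x, train_y):
    """1-nearest-neighbour lookup: no assumption about the shape of T(L)."""
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_abs_error(model, test_x):
    """Average absolute prediction error against the true period."""
    return sum(abs(model(x) - pendulum_period(x)) for x in test_x) / len(test_x)


test_x = grid(200)
results = {}  # training size -> (specialized error, eclectic error)
for n in (5, 50, 500):
    train_x = grid(n)
    train_y = [pendulum_period(x) for x in train_x]
    results[n] = (
        mean_abs_error(specialized_model(train_x, train_y), test_x),
        mean_abs_error(eclectic_model(train_x, train_y), test_x),
    )
    print(f"n={n:3d}  specialized={results[n][0]:.2e}  eclectic={results[n][1]:.2e}")
```

In this setup the specialized model is already accurate with five training points, while the eclectic model improves only as the training set grows. Real data is noisy and a specialized architecture is rarely this exact, so the gap here is far cleaner than it would be in practice.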
14.
[Figure: models placed on a plot of training effort and performance]
Axis labels: specialized model, eclectic model; fast learners, slow learners.
Region labels: “high training effort, lower performance”; “low training effort, often high performance”; “Doing something wrong?”; “the black triangle of fantasy”; “The slope of optimal model application”.
Models shown: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, Naïve Bayes.