Wapid and wobust active online machine leawning with Vowpal Wabbit
1. Wapid and wobust active online machine leawning with Vowpal Wabbit
Pycon Finland 2014, Helsinki
2014-10-27
Antti Haapala
antti@anttipatterns.com
2. Disclaimer
● IANAS – I Am Not A Statistician
● I researched the principles behind how this works for this presentation
3. Why did I start to do ML?
● Task
– Receive social media content from various sources
– Filter out all messages that are not in English, that are press releases, or that are outright spam.
● Easy, when you can hire a team of people for just this task...
● But people are expensive compared to computers...
– And filtering messages is tedious work
● Clearly a little machine learning could help us to
separate the spam from sausage, eggs and ham.
4. Time to code
● Write a binary classifier
● But with what?
– How does one even do it?
5. Libraries: NLTK
● NLTK
– Has some pure Python classifier implementations
– These algorithms require all data in memory
– The speed is an issue here
● Some of them are too slow
● The rest are even slower
6. Libraries: Scikit-Learn
● Scikit-Learn
– Better than NLTK
– Though most algorithms require all data in memory
● And our data still does not fit
– There are some out-of-core algorithms, yes, but they're not clearly documented
– Still slow - we cannot afford to reevaluate our
classifiers for hours...
7. Possible libraries
● How about FANN, Orange, PyMC, PyML,
LIBSVM, PyBrain, ffnet, MDP, Shogun toolbox,
Theano, mlpy, Elefant, Bayes Blocks, Monte
Python, hcluster, Plearn, Pycplex, pymorph....
12. What is Vowpal Wabbit?
● John Langford: I'd like to solve AI.
● Interviewer: How?
● John: I want to use parallel learning algorithms
to create fantastic learning machines!
13. What is Vowpal Wabbit?
“VW is the essence of speed in machine learning,
able to learn from terafeature datasets with ease.”
14. What is Vowpal Wabbit?
“Via parallel learning, it can exceed the
throughput of any single machine network
interface when doing linear learning, a first
amongst learning algorithms.”
15. Built for speed and scalability
● “Plausibly the most scalable public linear
learner, and plausibly the most scalable
anywhere”
● Excels on a cluster, though performance is impressive even on a single node.
16. Vowpal Wabbit compared to scikit-learn
The algorithms where the scikit-learn cheat sheet says “> 100k samples”
17. Scalability
● Find a good linear predictor f_w(x) = Σ_i w_i x_i
– For 2,100,000,000,000 features...
– 17,000,000,000 examples...
– 16,000,000 parameters...
– Using 1,000 nodes...
● Finished in 70 minutes, at 500M features per second
● That was years ago, using the then stock build of
VW.
18. Open Source
● Vowpal Wabbit is open source, under BSD
license
● Exists even in Ubuntu universe repository
● The project was started by Yahoo Research,
currently under Microsoft Research.
– So even Windows will be supported...
19. Sparse Stochastic Gradient Descent
● Maps each input to an n-dimensional space
● Divides that space with one hyperplane, minimizing the loss caused by misclassification
– One class is on one side of the plane
– The other is on the other side of the plane
– The loss is modeled by a loss function
21. Which loss function for a classifier?
● Crash course in statistics:
– “It helps if you understand the data”
– “But if you don't then try logistic regression”
– Thus go for the logistic loss function
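To make the mechanics concrete, here is a minimal sketch of a single sparse SGD step with logistic loss (illustrative only: the dict-of-weights representation, the function name and the fixed learning rate are my own assumptions, not VW's actual implementation):

import math

def sgd_update(weights, features, label, learning_rate=0.5):
    # weights and features are sparse {name: value} dicts; label is +1 or -1
    margin = sum(weights.get(name, 0.0) * value for name, value in features.items())
    # derivative of the logistic loss log(1 + exp(-label * margin)) w.r.t. the margin
    gradient = -label / (1.0 + math.exp(label * margin))
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) - learning_rate * gradient * value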
22. Multiclass classifier
● Vowpal Wabbit supports several reductions for multiclass classification; see the documentation for how to use them (one example is sketched below).
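A hedged sketch of the one-against-all (--oaa) reduction (the data file and the three classes are made up for illustration; labels run from 1 to the number of classes):

% cat multiclass.txt
1 |t nigerian prince offers money
2 |t invite job interview
3 |t press release quarterly results
% vw -d multiclass.txt --oaa 3 -f multiclass_model.vw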
23. Least squares regression
● The gradient descent algorithm can also be used
for regression, for example using the “squared”
loss function for least squares.
● A regression predicts a real-valued number for the input, dependent on the given features
● A classifier gives a class for the input, and possibly the probability of the input belonging to that class
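A hedged sketch of what a regression run might look like (the house-price data and file names are made up; squared loss is the default):

% cat prices.txt
315000 |h rooms:4 sqm:120
189000 |h rooms:2 sqm:55
% vw -d prices.txt --loss_function=squared -f price_model.vw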
24. Classifier output
in logistic regression
● With Vowpal Wabbit the prediction value given
by a classifier with logistic loss is in range [-50,
50]
● You can map this to a binary probability using
the logistic function
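A minimal sketch of that mapping in Python (the function name is my own):

import math

def to_probability(raw_prediction):
    # logistic (sigmoid) function: maps VW's raw score to a value in (0, 1)
    return 1.0 / (1.0 + math.exp(-raw_prediction))

# e.g. to_probability(0.0) == 0.5; large positive scores approach 1.0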
26. Common practices of machine
learning
● Reduce the number of features by hand, guessing which features are relevant
● Use non-linear approaches such as the kernel
trick
● Map your features to integers
● Leave your computer on at night to build the
model from your training data
27. ... become don'ts
● Reduce the number of features by hand, guessing which features are relevant
● Use non-linear approaches such as the kernel
trick
● Map your features to integers
● Leave your workstation on at night to build the
model from your training data
28. Reduce the number of features
● Vowpal Wabbit can efficiently handle sparse feature sets with millions of features
29. Use non-linear approaches
● A sparse dataset with many dimensions yields results comparable to using fewer features with kernel tricks
● One can ask Vowpal Wabbit to generate new
features as the Cartesian product of existing
features, using namespaces:
– That is, given features u^a, u^b, v^c, and v^d, by
using command line parameter -q uv, VW can make
u^a^v^c and so forth.
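A hedged sketch of what that might look like (the data file and feature names are made up):

% cat quad.txt
1 |u a b |v c d
% vw -d quad.txt -q uv

Here -q uv crosses every feature in namespace u with every feature in namespace v, producing crossed features such as u^a^v^c, u^a^v^d, u^b^v^c and u^b^v^d on the fly.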
30. Map your features to integers
● Vowpal Wabbit hashes feature names to
integers internally using Murmur hash v3
● The downside of hashing is possible collisions when there are too many features
– H(“Nigerian prince”) = H(“job interview”)
● Though it also decreases the possibility of
overfitting
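A minimal sketch of the hashing trick (md5 stands in for MurmurHash3 purely for illustration; the function name and bit count are my own assumptions):

import hashlib

def feature_index(name, bits=18):
    # Map a feature name to one of 2**bits slots in the weight vector
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % (1 << bits)

# Two distinct names can land in the same slot, which is the collision case above.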
31. Fit the model at night
● Vowpal Wabbit supports online and active
learning.
● Most learning tasks are IO-, not CPU-bound
● That is, your feature extraction code will be the bottleneck.
32. Supervised Learning
● Training: Input → Feature extractor → Features + Label → Machine learning algorithm → Model
● Prediction: Input → Feature extractor → Features → Model → Label
33. Offline vs Online learning
● In offline learning the model is fed all the input, after
which it is finalized; the finalized model will be used for
predictions
– That is, teach the classifier all kinds of unwanted messages
before actual use, and use the resulting classifier for 10 years.
→ Certainly not going to work.
● In online learning, the model can be used for predictions
right after the first input
– The model will gradually converge towards better classification
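A minimal sketch of that loop (the model object and its predict/learn methods are hypothetical stand-ins, not VW's API):

def online_learning(model, stream):
    # Usable for predictions from the very first example; each labelled
    # example nudges the model towards better classification.
    for features, label in stream:
        yield model.predict(features)
        model.learn(features, label)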
34. Semisupervised learning – active
learning
● Asking a human to label input for the classifier is expensive
– If one asks for a label for every example, it is almost worse than not asking at all
● The solution is active learning
35. Active learning
● Train only if importance >= threshold
● Training: Input → Feature extractor → Features + Label → Machine learning algorithm → Model
● Prediction: Input → Feature extractor → Features → Model → Label + Importance
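A minimal sketch of that loop (the model, oracle and threshold are hypothetical stand-ins, not VW's actual active-learning protocol):

THRESHOLD = 1.0

def active_learning(model, stream, oracle):
    for features in stream:
        label, importance = model.predict_with_importance(features)
        if importance >= THRESHOLD:
            # Only pay for a human label when the model is uncertain enough
            model.learn(features, oracle.ask_label(features))
        yield label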
36. How to use Vowpal Wabbit
● You can use it on the command line. To teach a
model using logistic regression:
% cat train.txt
-1 |t nigerian prince offers money ... |a user@example.com
1 |t invite job interview ... |a boss@dreamco.com
...
% vw -d train.txt --loss_function=logistic -f model.vw
● To test
% vw -i model.vw --loss_function=logistic -p /dev/stdout
|t nigerian prince interview
-0.145824
|t spam ham and eggs |a boss@dreamco.com
0.134225
37. How to use VW in Python
● Multiple libraries exist
– Though none of the APIs are to my liking
– So I wrote my own
from caerbannog import Rabbit