4. AXIOMS
• Measure, Measure, Measure
• Garbage in, Garbage out
• Correlation is not Causation
• More Data Beats Cleverer Algorithms
• Algorithms that do better with more data are more interesting
• Independent sources of data add new signals
• Feature Engineering is the key to being a good data scientist
• How do machines and humans interact in Big Data?
• Learn many models ‐ ensembles
• Outliers are always interesting
13. WORD SENSE DISAMBIGUATION
• Bank
• Sloping land alongside a river or a lake. It typically has thick vegetation growing on it.
• A financial institution that takes deposits from some customers and gives loans to others who require the money.
To disambiguate the word in a typical sentence, look for co-occurrences of the surrounding words with the words in each definition.
This is unsupervised learning: bootstrap a model from the definitions alone.
The pilot landed the plane on the Hudson River amongst several boats and an appreciative audience
cheered from the banks of the river.
He issued a check and took it to the bank so he could transfer money.
We can then look for words that frequently co-occur with each sense (boats and check, respectively) and build
a larger bag of words with which to disambiguate.
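The overlap-and-bootstrap idea above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sense names, the stopword list, and the helper names (`tokens`, `score`, `bootstrap`) are assumptions made for this example, and the glosses are the two definitions of "bank" given earlier. Each sense starts with the bag of words from its definition; whenever one sense clearly wins on a sentence, that sentence's words are added to the winning bag.

```python
import re

# Glosses taken from the two definitions of "bank"; the sense names are made up here.
GLOSSES = {
    "river": "Sloping land alongside a river or a lake. "
             "It typically has thick vegetation growing.",
    "finance": "A financial institution that takes deposits from some customers "
               "and gives loans to others who require the money.",
}

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "it", "he", "so", "on",
             "from", "that", "some", "who", "has", "in", "could"}
TARGET = {"bank", "banks"}  # the ambiguous word itself carries no evidence


def tokens(text):
    """Lowercased content words of a text, as a set."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in TARGET}


def score(sentence, bags):
    """Count the words a sentence shares with each sense's bag of words."""
    context = tokens(sentence)
    return {sense: len(context & bag) for sense, bag in bags.items()}


def bootstrap(sentences, glosses=GLOSSES):
    """Start from the glosses and grow each bag from clearly labelled sentences."""
    bags = {sense: tokens(gloss) for sense, gloss in glosses.items()}
    for sentence in sentences:
        scores = score(sentence, bags)
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Only grow a bag when one sense strictly beats the other.
        if scores[ranked[0]] > scores[ranked[1]]:
            bags[ranked[0]] |= tokens(sentence)
    return bags


SENTENCES = [
    "The pilot landed the plane on the Hudson River amongst several boats and an "
    "appreciative audience cheered from the banks of the river.",
    "He issued a check and took it to the bank so he could transfer money.",
]
bags = bootstrap(SENTENCES)
```

After bootstrapping on the two example sentences, "boats" belongs to the river bag and "check" to the finance bag, so a new sentence such as "Several boats drifted near the bank." can be scored with `score(sentence, bags)` even though it shares no word with either definition. A real system would likely add stemming and a confidence threshold before growing a bag.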
19. ENSEMBLES ‐ OUTLIERS ARE NOT INTERESTING FOR CLASSIFIERS
• Learn many models from random subsets of training data
• Effect of outliers is reduced on a majority of the models
• Random Forests
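A minimal sketch of the first two bullets, using only the Python standard library. It is not a Random Forest: the base model here is a 1-nearest-neighbour classifier, chosen because a single outlier visibly flips its prediction, whereas Random Forests use decision trees and also randomize the features. The toy data, the 30% subset size, the number of models, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter

# Toy 1-D training set (made up): class 0 at 0..9, class 1 at 20..29.
train = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)]
train.append((4.5, 1))  # one mislabelled outlier inside the class-0 region


def nearest_label(points, x):
    """1-nearest-neighbour: label of the training point closest to x."""
    return min(points, key=lambda p: abs(p[0] - x))[1]


def bagged_predict(points, x, n_models=101, fraction=0.3, seed=0):
    """Majority vote over 1-NN models, each fit on a random subset of the data."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * fraction))  # subset size per model
    votes = Counter(
        nearest_label(rng.sample(points, k), x) for _ in range(n_models)
    )
    return votes.most_common(1)[0][0]


single = nearest_label(train, 4.6)   # one model trained on everything
bagged = bagged_predict(train, 4.6)  # vote of many models on random subsets
```

A single model trained on all the data follows the outlier at 4.5 and labels the query 4.6 as class 1. Because each subset holds only about 30% of the points, the outlier is missing from most of the models, so the majority vote is expected to return class 0 for the same query.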