8. Machine Learning Interpretation
Confidential
Global
Goal: Get a general understanding of how the model works
Example: What were the most important features to predict default?
Techniques:
● Variable Importance
● Partial Dependence Plots
● Surrogate simple decision tree
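To make the global view concrete, here is a minimal sketch of permutation-based variable importance, one common model-agnostic way to rank features. It is not necessarily how Driverless AI computes importance. The scoring function `toy_model` and the random data are hypothetical stand-ins for a trained model and a validation set.

```python
import random

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean squared error when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    baseline = mse(rows)
    importances = []
    for j in range(n_features):
        # Shuffle column j only, breaking its link to the target.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(shuffled) - baseline)
    return importances

# Hypothetical model: feature 0 matters a lot, feature 1 a little, feature 2 not at all.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=3)
```

A feature the model never uses gets an importance of zero, because shuffling it cannot change any prediction.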
Local
Goal: Understand why a particular record received that prediction
Example: Why did the model predict such a high score for this customer?
Techniques:
● K-LIME
● Leave One Covariate Out (LOCO)
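The local idea can be sketched with a simplified LOCO-style calculation. Strictly, LOCO refits the model without each covariate. The cheaper approximation below instead replaces one feature at a time with a baseline value and re-scores the same record. The scoring function `toy_score`, the feature names, and the baseline values are all hypothetical.

```python
def loco_contributions(predict, row, baseline_row):
    """Per-feature contribution: full prediction minus the prediction
    with that one feature replaced by its baseline value."""
    full = predict(row)
    contributions = []
    for j in range(len(row)):
        perturbed = list(row)
        perturbed[j] = baseline_row[j]
        contributions.append(full - predict(perturbed))
    return contributions

# Hypothetical default-risk score over (income, debt_ratio, late_payments).
def toy_score(row):
    income, debt_ratio, late_payments = row
    return 0.1 + 0.5 * debt_ratio + 0.1 * late_payments - 0.002 * income

customer = [40.0, 0.8, 3.0]          # the record being explained
population_mean = [60.0, 0.4, 1.0]   # assumed baseline values
contribs = loco_contributions(toy_score, customer, population_mean)
```

Positive values mean the feature pushed this customer's score above what the baseline value would have produced.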
13. BYOR (Bring Your Own Recipe) - Text Preprocessing
https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_preprocessing_transformer.py
● Lower casing
● Removal of punctuation
● Removal of Stopwords
● Removal of Frequent words
● Removal of Rare words
● Stemming
● Lemmatization
● Removal of emojis
● Removal of emoticons
● Conversion of emoticons to words
● Conversion of emojis to words
● Removal of URLs
● Removal of HTML tags
● Chat words conversion
● Spelling correction
https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing
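A few of these steps can be sketched with the Python standard library alone. This is an illustrative pipeline, not the code of the linked recipe. The stopword set is a toy list, and the regular expressions are simple assumptions that will miss edge cases.

```python
import html
import re
import string

STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}  # toy list
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def preprocess(text):
    text = TAG_RE.sub(" ", text)          # removal of HTML tags
    text = html.unescape(text)            # decode entities such as &amp;
    text = URL_RE.sub(" ", text)          # removal of URLs
    text = text.lower()                   # lower casing
    text = text.translate(PUNCT_TABLE)    # removal of punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # removal of stopwords
    return " ".join(tokens)

cleaned = preprocess("<p>The model is GREAT!!! See https://example.com for details.</p>")
```

Order matters here: URLs are removed before punctuation is stripped, otherwise the URL pattern would no longer match.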
14. BYOR - Text Meta Features
https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_meta_transformers.py
● Number of words
● Number of characters
● Number of unique words
● Number of uppercase words
● Number of numerics
● Number of punctuation marks
● Mean word length
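The meta features listed above could be computed roughly as follows. This is a sketch that assumes whitespace tokenization, and the feature names are made up for illustration. The linked recipe may define each count differently.

```python
import string

def text_meta_features(text):
    words = text.split()  # assumption: whitespace tokenization
    return {
        "num_words": len(words),
        "num_chars": len(text),
        "num_unique_words": len({w.lower() for w in words}),
        "num_uppercase_words": sum(1 for w in words if w.isupper()),
        "num_numerics": sum(1 for w in words if w.isdigit()),
        "num_punctuations": sum(1 for c in text if c in string.punctuation),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }

features = text_meta_features("H2O has 3 NLP recipes, and 3 more!")
```

Because tokens are split on whitespace only, attached punctuation counts toward word length (for example, `recipes,` has length 8).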
15. BYOR - Text Readability Features
https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_readability_transformers.py
● Syllable count
● Poly-syllable count
● Average syllables per word
● SMOG index
● Flesch reading ease score
https://github.com/shivam5992/textstat
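For intuition, the readability scores can be approximated without any library. The sketch below uses the standard Flesch reading ease and SMOG formulas, but its syllable counter is a naive vowel-group heuristic (an assumption), so results will differ from those of textstat. SMOG is also designed for samples of about 30 sentences, so it is only indicative on short texts.

```python
import math
import re

def count_syllables(word):
    """Naive heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def readability_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words, n_syll = len(sentences), len(words), sum(syllables)
    polysyllables = sum(1 for s in syllables if s >= 3)
    return {
        "syllable_count": n_syll,
        "polysyllable_count": polysyllables,
        "avg_syllables_per_word": n_syll / n_words,
        "smog_index": 1.043 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
        "flesch_reading_ease": 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words),
    }

scores = readability_features("The cat sat on the mat. It was happy.")
```

Higher Flesch scores indicate easier text, while a higher SMOG index indicates harder text.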
16. BYOR - Text Features
● Text Sentiment
○ Get sentiment scores from pretrained models / packages
● Language Detection
○ Detect the language of the given text
● POS Tagging
○ Count of nouns, verbs, adjectives, etc.
● Topic Modeling
○ Fit a topic model and use the topics as features
● Text Summarization
○ Summarize the text and use the summary as a new feature
17. BYOR - Text Similarity Features
● N-gram similarity
○ Count of common n-grams
○ Jaccard similarity
○ Dice similarity
○ Edit distance similarity
● Fuzzy Similarity
○ Partial ratio
○ Token set ratio
○ Token sort ratio
● Embedding similarity
○ GloVe
○ fastText
○ BERT
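The n-gram and fuzzy measures above can be illustrated with standard-library Python. This is a sketch under assumptions: it uses character bigrams, and `token_sort_ratio` here is an approximation built on `difflib.SequenceMatcher` that returns a value in [0, 1] rather than the 0 to 100 score of fuzzy-matching packages. Embedding similarity is omitted because it needs pretrained vectors.

```python
from difflib import SequenceMatcher

def char_ngrams(text, n=2):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b, n=2):
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / len(x | y) if (x | y) else 1.0

def dice(a, b, n=2):
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 1.0

def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def token_sort_ratio(a, b):
    """Compare the strings after sorting their lowercase tokens."""
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sa, sb).ratio()
```

For example, `token_sort_ratio("new york mets", "mets new york")` ignores word order entirely, which is the point of sorting tokens before comparison.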
19. We are hiring!
Data Scientists / Engineers in the Customer Solutions / Support team
● Experience in the data science / engineering field with strong basics
● Exposure to big data technologies (Spark/Hadoop), cloud, Docker, etc.
● Strong in one or more of Python / R / Java / Scala and, of course, Linux (shell scripting)
● Hands-on, highly motivated, and able to keep up with our startup pace (i.e., a fast learner)
Please mail to:
sairaam.varadarajan@h2o.ai
sudalai.rajkumar@h2o.ai