About Analytics Vidhya
Thanks to Tlabs & Venturesity for being lovely hosts.
First things first:
• Meetup frequency – Once every month
• Next meetup – 14th June 2015
• Aim to provide the best networking and learning platform in Bangalore
• Areas of Interest – Data Science, Big Data, Machine Learning,
Meet Your Volunteers
Kunal
Data Science Evangelist,
(Growth) Hacker, Blogger,
Husband, Father
Tavish
Blogger, Problem solver,
Data scientist
Agenda
• Introduction
• Model building – life cycle
• Data Exploration and Feature Engineering methods
• Talk about modelling Techniques like
• Logistic Regression
• Decision Tree
• Random Forest
• SVM
• Predict Man vs. Machine
Introduction
• Name
• Experience in Data Science
• Current Company
• Are you proficient with (SAS/ R/ Python)?
Team creation
• Look for diversity in experience
• Hopefully common toolset, but complementary can also work
• Competing against each other
Team Formation
A few ground rules for today
• This is not a tutorial – you are expected to solve this problem yourself
• We are here to help you organize your thoughts and to make sure you are going in the right direction
• Good question to ask:
• While trying logistic regression in R, I am facing the following error
• Bad question to ask:
• Help me understand what logistic regression is!
• Register on DataHack.io & discuss.analyticsvidhya.com
• Datahack.io – Website for hackathon memberships
• Discuss.analyticsvidhya.com – Discussions for the day & always
• One login for each participant
• Password will be mailed upon registration
• Registration on Kaggle.com
Model building – life cycle
Problem of the day: FB challenge – Human or Robot
Question in detail
Datasets you will have access to
• Training and Testing – unique at Bidder ID
• Bids – unique at Bid ID
• Submission – unique at Bidder ID
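Because the bids data has one row per bid while the training and testing data have one row per bidder, bid-level information has to be rolled up to bidder level before the two can be joined. A minimal sketch of that roll-up follows, using only the standard library. The column names (`bidder_id`, `auction`, `device`, `outcome`), the feature names and all the rows are made up for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical bid-level rows (one per bid); names and values are illustrative.
bids = [
    {"bid_id": 1, "bidder_id": "a", "auction": "x1", "device": "phone1"},
    {"bid_id": 2, "bidder_id": "a", "auction": "x1", "device": "phone2"},
    {"bid_id": 3, "bidder_id": "a", "auction": "x2", "device": "phone1"},
    {"bid_id": 4, "bidder_id": "b", "auction": "x2", "device": "phone9"},
]

# Hypothetical bidder-level training rows (one per bidder); outcome 1 = robot.
train = [
    {"bidder_id": "a", "outcome": 1},
    {"bidder_id": "b", "outcome": 0},
    {"bidder_id": "c", "outcome": 0},  # a bidder with no bids at all
]


def bidder_features(bid_rows):
    """Roll bid-level rows up to one feature dict per bidder."""
    n_bids = defaultdict(int)
    auctions = defaultdict(set)
    devices = defaultdict(set)
    for row in bid_rows:
        bidder = row["bidder_id"]
        n_bids[bidder] += 1
        auctions[bidder].add(row["auction"])
        devices[bidder].add(row["device"])
    return {
        bidder: {
            "n_bids": n_bids[bidder],
            "n_auctions": len(auctions[bidder]),
            "n_devices": len(devices[bidder]),
        }
        for bidder in n_bids
    }


def join_features(bidder_rows, features):
    """Left-join the features onto bidder rows; bidders without bids get zeros."""
    empty = {"n_bids": 0, "n_auctions": 0, "n_devices": 0}
    return [{**row, **features.get(row["bidder_id"], empty)} for row in bidder_rows]


model_table = join_features(train, bidder_features(bids))
for row in model_table:
    print(row)
```

The result stays unique at Bidder ID, so it lines up with the submission format. In practice a data-frame library would do the same group-and-join more concisely.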
Hypothesis generation
• In your groups, list down all possible variables which might influence the chances of a bidder being a robot
• Download the dataset from Kaggle
• Next, look at the dataset and see which variables are available
Make sure you always do these steps in this order
Data Exploration & Feature Engineering
• Import data set
• Variable identification
• Univariate, Bivariate and Multivariate analysis
• Identify and Treat missing and outlier values
• Create new variables or transform existing variables
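The steps above can be sketched on a tiny table. This is only an illustration under stated assumptions: the records and the column names (`n_bids`, `n_auctions`, `bids_per_auction`) are hypothetical, median imputation and the 1.5 × IQR cap are one common choice among several, and the standard library is used so the sketch runs without extra packages.

```python
import statistics

# Hypothetical bidder-level records; None marks a missing value.
rows = [
    {"bidder_id": "a", "n_bids": 12, "n_auctions": 3},
    {"bidder_id": "b", "n_bids": 7, "n_auctions": 2},
    {"bidder_id": "c", "n_bids": None, "n_auctions": 1},
    {"bidder_id": "d", "n_bids": 9, "n_auctions": 3},
    {"bidder_id": "e", "n_bids": 5000, "n_auctions": 4},
    {"bidder_id": "f", "n_bids": 10, "n_auctions": 2},
    {"bidder_id": "g", "n_bids": 8, "n_auctions": 2},
    {"bidder_id": "h", "n_bids": 11, "n_auctions": 1},
]

# Univariate analysis: summarise the observed (non-missing) values.
observed = [r["n_bids"] for r in rows if r["n_bids"] is not None]
median = statistics.median(observed)
n_missing = sum(r["n_bids"] is None for r in rows)
print("median:", median, "missing:", n_missing)

# Missing values: impute with the median of the observed values.
for r in rows:
    if r["n_bids"] is None:
        r["n_bids"] = median

# Outliers: cap anything above Q3 + 1.5 * IQR.
q1, _, q3 = statistics.quantiles(observed, n=4)
upper = q3 + 1.5 * (q3 - q1)
for r in rows:
    r["n_bids"] = min(r["n_bids"], upper)

# New variable: a ratio built from two existing variables.
for r in rows:
    r["bids_per_auction"] = r["n_bids"] / r["n_auctions"]

for r in rows:
    print(r)
```

Whether to cap, drop or keep an extreme value is a judgment call. In a robot-detection problem an unusually high bid count may be exactly the signal you are looking for.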
Datasets details
Test and Train
If you are a newbie, refer to these guides:
• Import data set (SAS, Python, R)
• Variable identification (Methods, SAS, Python, R)
• Univariate, Bivariate and Multivariate analysis (Methods, SAS, Python, R)
• Identify and Treat missing and outlier values (Missing, Outlier, SAS, Python, R1, R2)
• Create new variables or transform existing variables (Methods, SAS, Python, R1)
Practice
Explore the data set and share your inferences with the group
Break
Modelling Techniques – Logistic Regression
• Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous
• Used when the focus is on whether or not an event occurred, rather than when it occurred
• Here, instead of modelling the outcome Y directly, the method models the log odds of Y as a linear function of the predictors; the logistic function converts the log odds back into a probability
• Analysis of variance (ANOVA) and logistic regression are both special cases of the generalized linear model (GLM)
• The probability of success falls between 0 and 1 for all possible values of X
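A small sketch may help make the last point concrete. It assumes a single predictor and two made-up coefficients, `b0` and `b1`, which are not fitted to any data. It shows that the predicted probability stays between 0 and 1, and that a one-unit increase in the predictor multiplies the odds by exp(b1).

```python
import math


def logistic(z):
    """Map a log-odds value z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function: the log odds of probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 0.08


def predict_proba(x):
    """Probability of the event for predictor value x."""
    return logistic(b0 + b1 * x)


def odds(x):
    """Odds of the event, p / (1 - p), for predictor value x."""
    p = predict_proba(x)
    return p / (1.0 - p)


for x in (20, 50, 80):
    print(x, round(predict_proba(x), 3), round(odds(x), 3))

# A one-unit increase in x multiplies the odds by exp(b1).
print(odds(51) / odds(50), math.exp(b1))
```

This is why coefficients in a logistic regression are usually read as odds ratios rather than as changes in probability.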
Linear & Logistic Regression
[Figure: two panels. Linear regression: Length of Stay (days) against Age (yrs.), with the fitted line Y = aX + b. Logistic regression: CHD Probability (p) against Age.]
Logit Transformation
Logit is Directly related to Odds
• The logistic model can be written as:
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X$$
• This implies that the odds for success can be expressed as:
$$\frac{P}{1-P} = e^{\beta_0 + \beta_1 X}$$
• This relationship is the key to interpreting the coefficients in a logistic regression model
Modelling Techniques – Decision Tree
• Decision tree is a type of supervised learning algorithm
• It works for both categorical and continuous input and output variables
• It is a classification technique that splits the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator among the input variables
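One way to make "most significant splitter" concrete is to score every candidate split by how pure the resulting groups are. The sketch below uses Gini impurity, which is an assumption on our part (the slides do not name a criterion), and a handful of made-up bidder rows with hypothetical columns `n_bids`, `n_countries` and `robot`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, target, feature, threshold):
    """Size-weighted impurity of the two groups produced by one split."""
    left = [r[target] for r in rows if r[feature] <= threshold]
    right = [r[target] for r in rows if r[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)


def best_split(rows, target, features):
    """Return (impurity, feature, threshold) of the purest single split."""
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            score = split_impurity(rows, target, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical bidders; robot = 1 marks a robot.
bidders = [
    {"n_bids": 5, "n_countries": 1, "robot": 0},
    {"n_bids": 12, "n_countries": 3, "robot": 0},
    {"n_bids": 40, "n_countries": 2, "robot": 0},
    {"n_bids": 900, "n_countries": 2, "robot": 1},
    {"n_bids": 1500, "n_countries": 4, "robot": 1},
    {"n_bids": 700, "n_countries": 1, "robot": 1},
]

print(best_split(bidders, "robot", ["n_bids", "n_countries"]))
```

A full tree repeats this search inside each resulting group until a stopping rule is met.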
Decision Tree - Example
Types of Decision Tree
• Binary Variable Decision Tree: a decision tree with a binary target variable. Example: in the student problem above, the target variable was “Student will play cricket or not”, i.e. YES or NO.
• Continuous Variable Decision Tree: a decision tree with a continuous target variable.
Decision Tree - Terminology
Decision Tree – Advantages/ Disadvantages
Advantages:
• Easy to understand
• Useful in data exploration
• Less data cleaning required
• Data type is not a constraint
• Not sensitive to skewed distributions
Disadvantages:
• Prone to overfitting
• Loses information on continuous variables, because they are cut into categories
Modelling Techniques – Random Forest
• Random forest is a computationally intensive ensemble algorithm.
• Random forest works like a bootstrapping algorithm applied to decision tree (CART) models.
• Random forest gives much more accurate predictions than simple CART/CHAID or regression models in many scenarios.
• It captures the variance of several input variables at the same time and enables a high number of observations to participate in the prediction.
• A different subset of the training data and a different subset of variables are selected for each tree
• The remaining (out-of-bag) training data are used to estimate error and variable importance
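The sampling described in the last two bullets can be sketched as follows. This only shows how the rows and variables for each tree might be drawn; it does not build the trees. The feature names are hypothetical, and the variable subset is drawn once per tree as the slide states, whereas common implementations re-draw it at every split.

```python
import random


def draw_tree_sample(n_rows, feature_names, n_features, rng):
    """Pick the rows and variables that one tree would be trained on."""
    # Bootstrap: sample row indices with replacement, same size as the data.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows never drawn are "out of bag" and can be used to estimate error.
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    # Random subset of the variables, drawn without replacement.
    features = rng.sample(feature_names, n_features)
    return in_bag, out_of_bag, features


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
feature_names = ["n_bids", "n_auctions", "n_devices", "n_countries"]

for tree in range(3):
    in_bag, out_of_bag, features = draw_tree_sample(10, feature_names, 2, rng)
    print(tree, sorted(in_bag), out_of_bag, features)
```

Each tree sees a different mix of rows and variables, which is what makes the averaged prediction more stable than that of a single tree.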
Random Forest – Advantages/ Disadvantages
Advantages:
• No need for pruning trees
• Accuracy and variable importance generated automatically
• Less prone to overfitting than a single decision tree
• Not very sensitive to outliers in training data
• Easy to set parameters
Disadvantages:
• It is a black box: the rules behind the model cannot be explained
Modelling Techniques – SVM
• It is a classification technique.
• Support vectors are simply the coordinates of individual observations
• A Support Vector Machine is a frontier which best segregates one class from the other
• Solving SVMs is a quadratic programming problem
• Seen by many as the most successful current text classification method
[Figure: support vectors]
Case Study
We have a population that is 50% male and 50% female. We want to create a set of rules that will tell us the gender class for the rest of the population.
The blue circles in the plot represent females and the green squares represent males.
• Males in our population have a higher average height.
• Females in our population have longer scalp hair.
Case Study – How to find right SVM
Here we have three possible frontiers. Decide which one is best.
Method:
• For each frontier, find its minimum distance from the closest support vector (this can belong to any class).
• Choose the frontier with the maximum distance from its closest support vector. In this case, it is the black frontier, at a distance of 15 units.
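The two-step method above can be sketched directly. The points and the three candidate frontiers below are made up (they are not the ones in the plot), each frontier is a straight line written as a*x + b*y + c = 0, and the sketch assumes every candidate already separates the two classes.

```python
import math


def distance_to_line(point, line):
    """Perpendicular distance from (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / math.hypot(a, b)


def margin(points, line):
    """Step 1: distance from the frontier to its closest observation."""
    return min(distance_to_line(p, line) for p in points)


def best_frontier(points, frontiers):
    """Step 2: name of the frontier whose closest observation is farthest away."""
    return max(frontiers, key=lambda name: margin(points, frontiers[name]))


# Hypothetical observations as (hair length in cm, height in cm).
females = [(30, 150), (35, 155), (40, 160)]
males = [(5, 170), (10, 175), (8, 180)]
points = females + males

# Hypothetical candidate frontiers: vertical lines at hair length 12, 20 and 28.
frontiers = {
    "A": (1, 0, -12),
    "B": (1, 0, -20),
    "C": (1, 0, -28),
}

for name, line in frontiers.items():
    print(name, margin(points, line))
print("best:", best_frontier(points, frontiers))
```

A real SVM solver searches over all possible frontiers rather than a fixed list of three, which is where the quadratic programming problem comes in.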
Predict Robots out of all the bidders
Build a model that predicts which bidders are robots
Python Resources:
• http://www.bigdataexaminer.com/dealing-with-unbalanced-classes-svm-random-forests-and-decision-trees-in-python/
• http://nbviewer.ipython.org/github/justmarkham/gadsdc1/blob/master/logistic_assignment/kevin_logistic_sklearn.ipynb
• http://scikit-learn.org/stable/modules/svm.html
• http://scikit-learn.org/stable/modules/tree.html
• http://blog.yhathq.com/posts/random-forests-in-python.html
R Resources:
• http://www.ats.ucla.edu/stat/r/dae/logit.htm
• http://www.cookbook-r.com/Statistical_analysis/Logistic_regression/
• http://www.rdatamining.com/examples/decision-tree
• http://www.statmethods.net/advstats/cart.html
• http://www.cair.org/conferences/cair2013/pres/58_Headstrom.pdf
• http://blog.yhathq.com/posts/comparing-random-forests-in-python-and-r.html
• http://www.louisaslett.com/Courses/Data_Mining/ST4003-Lab7-Introduction_to_Support_Vector_Machines.pdf
• http://thinktostart.com/build-a-spam-filter-with-r/
• http://cbio.ensmp.fr/~jvert/svn/tutorials/practical/svmbasic/svmbasic_notes.pdf
Thanks

Mini datathon - Bengaluru

  • 3. Thanks Tlabs & Venturesity for being a lovely host…
  • 5. First things first: • Meetup frequency – Once every month • Next meetup – 14th June 2015 • Aim to provide best networking and learning platform in Bangalore • Areas of Interest – Data Science, Big Data, Machine Learning,
  • 6. Meet Your Volunteers Kunal Data Science Evangelist, (Growth) Hacker, Blogger, Husband, Father Tavish Blogger, Problem solver, Data scientist
  • 7. Agenda • Introduction • Model building – life cycle • Data Exploration and Feature Engineering methods • Talk about modelling Techniques like • Logistic Regression • Decision Tree • Random Forest • SVM • Predict Man vs. Machine
  • 8. Introduction • Name • Experience in Data Science • Current Company • Are you proficient with (SAS/ R/ Python)?
  • 9. Team creation • Look for diversity in experience • Hopefully common toolset, but complementary can also work • Competing against each other
  • 11. A few ground rules for today • This is not a tutorial – you are expected to solve this problem yourself • We are here to help you organize your thoughts and to make sure you are going in the right direction. • Good question to ask: • While trying logistic regression in R, I am facing the following error. How do I fix it? • Bad question to ask: • Help me understand what logistic regression is!
  • 13. • Register on DataHack.io & discuss.analyticsvidhya.com • Datahack.io – Website for hackathon memberships • Discuss.analyticsvidhya.com – Discussions for the day & always • One login for each participant • Password would be mailed upon registration • Registration on Kaggle.com
  • 14. Model building – life cycle
  • 15. Problem of the day : FB challenge – Human or Robot
  • 17. Datasets you will have access to Training Testing Bids Submission Unique at Bid ID Unique at Bidder ID Unique at Bidder ID
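Because the bids table is unique at bid ID while the training and testing tables are unique at bidder ID, bid rows have to be rolled up to one row per bidder before they can be joined and modelled. A minimal sketch of that roll-up using only the standard library; the column names (`bidder_id`, `auction`, `ip`) and all rows are invented for illustration and are not taken from the competition files.

```python
from collections import defaultdict

# Hypothetical bid-level rows (one row per bid); all values are invented.
bids = [
    {"bid_id": 1, "bidder_id": "a", "auction": "x1", "ip": "10.0.0.1"},
    {"bid_id": 2, "bidder_id": "a", "auction": "x2", "ip": "10.0.0.2"},
    {"bid_id": 3, "bidder_id": "a", "auction": "x1", "ip": "10.0.0.1"},
    {"bid_id": 4, "bidder_id": "b", "auction": "x1", "ip": "10.0.0.9"},
]

def bidder_features(bid_rows):
    """Roll bid-level rows up to one feature dict per bidder."""
    n_bids = defaultdict(int)
    auctions = defaultdict(set)
    ips = defaultdict(set)
    for row in bid_rows:
        key = row["bidder_id"]
        n_bids[key] += 1
        auctions[key].add(row["auction"])
        ips[key].add(row["ip"])
    return {
        key: {
            "n_bids": n_bids[key],
            "n_auctions": len(auctions[key]),
            "n_ips": len(ips[key]),
        }
        for key in n_bids
    }

features = bidder_features(bids)
```

The resulting dictionary is keyed by bidder, so it can be matched one-to-one against the bidder-level training and testing rows.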
  • 18. Hypothesis generation • In your groups, list down all possible variables which might influence whether a bidder is a human or a robot • Download the dataset from Kaggle • Next, look at the dataset and see which variables are available • Make sure you always do this in this order
  • 19. Data Exploration & Feature Engineering • Import data set • Variable identification • Univariate, Bivariate and Multivariate analysis • Identify and Treat missing and outlier values • Create new variables or transform existing variables
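The "identify and treat missing and outlier values" step can be sketched as follows. This is only one possible treatment (median imputation, then capping at 1.5 times the interquartile range); the slides do not prescribe a method, and the variable `bids_per_bidder` with its values is made up for illustration.

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def cap_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Invented example: one missing value and one extreme value.
bids_per_bidder = [3, 5, None, 4, 6, 5, 400]
filled = impute_median(bids_per_bidder)
capped = cap_outliers(filled)
```

Whether an extreme value should be capped at all is a modelling decision: for this problem a very high bid count may itself be the signal that separates robots from humans.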
  • 21. If you are a newbie, refer to these guides: • Import data set (SAS, Python, R) • Variable identification (Methods, SAS, Python, R) • Univariate, Bivariate and Multivariate analysis (Methods, SAS, Python, R) • Identify and Treat missing and outlier values (Missing, Outlier, SAS, Python, R1, R2) • Create new variables or transform existing variables (Methods, SAS, Python, R1)
  • 22. Practice Explore the data set and share your inferences with the group
  • 23. Break
  • 24. Modelling Techniques – Logistic Regression • Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous • Used when the focus is on whether or not an event occurred, rather than when it occurred • Instead of modelling the outcome Y directly, the method models the log odds of Y, which are linked to the probability through the logistic function • Analysis of variance (ANOVA) and logistic regression are both special cases of the Generalized Linear Model (GLM) • The predicted probability of success falls between 0 and 1 for all possible values of X
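To illustrate why the predicted probability always stays between 0 and 1, here is a small sketch of the logistic function and its inverse for a single predictor. The coefficients `b0` and `b1` are illustrative values chosen by hand, not estimates from any dataset.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b1):
    """P(Y = 1 | x) when the log odds are modelled as b0 + b1 * x."""
    return sigmoid(b0 + b1 * x)

def log_odds(p):
    """Inverse of the logistic function (the logit)."""
    return math.log(p / (1.0 - p))

# Illustrative coefficients only; in practice they are estimated from data.
b0, b1 = -4.0, 0.1
probs = [predict_proba(x, b0, b1) for x in (0, 20, 40, 60, 80)]
```

With a positive `b1` the probabilities rise with x but flatten out near 0 and 1, which a straight line fitted to a 0/1 outcome cannot guarantee.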
  • 25. Linear & Logistic Regression [Figure, two panels. Linear regression: length of stay (days) against age (yrs.), with the fitted line Y = aX + b. Logistic regression: CHD probability (p) against age.]
  • 27. Logit is Directly related to Odds • The logistic model can be written as: $\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X$ • This implies that the odds for success can be expressed as: $\frac{P}{1-P} = e^{\beta_0 + \beta_1 X}$ • This relationship is the key to interpreting the coefficients in a logistic regression model
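One way to read this relationship: if the log odds are a linear function b0 + b1 * x, then raising x by one unit multiplies the odds by e^b1, whatever the starting value of x. A short numerical check, with hand-picked illustrative coefficients rather than fitted ones:

```python
import math

def odds(x, b0, b1):
    """Odds of success P / (1 - P) when log odds = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

def probability(x, b0, b1):
    """Convert the odds back to a probability: P = odds / (1 + odds)."""
    o = odds(x, b0, b1)
    return o / (1.0 + o)

b0, b1 = -4.0, 0.1  # illustrative values, not fitted coefficients

# Odds ratio for a one-unit increase in x, taken at two different starting points.
ratio_at_50 = odds(51, b0, b1) / odds(50, b0, b1)
ratio_at_10 = odds(11, b0, b1) / odds(10, b0, b1)
```

Both ratios equal e^0.1, which is why a logistic regression coefficient is usually reported as an odds ratio.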
  • 28. Modelling Techniques – Decision Tree • Decision tree is a type of supervised learning algorithm • It works for both categorical and continuous input and output variables • It is a classification technique that splits the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator among the input variables
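How a "most significant splitter" might be found can be sketched with Gini impurity, which is one common homogeneity measure; the slides do not say which criterion is intended, so treat that choice as an assumption. The toy variable `bids_per_auction` and its labels are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger for a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Invented toy data: bids per auction and whether the bidder is a robot (1) or not (0).
bids_per_auction = [1, 2, 2, 3, 40, 55, 60]
is_robot = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(bids_per_auction, is_robot)
```

A full tree repeats this search over every input variable and then recurses into each of the two resulting sub-populations.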
  • 29. Decision Tree - Example
  • 30. Types of Decision Tree • Binary Variable Decision Tree: a decision tree with a binary target variable. Example: in the student scenario above, the target variable was “Student will play cricket or not”, i.e. YES or NO. • Continuous Variable Decision Tree: a decision tree with a continuous target variable.
  • 31. Decision Tree - Terminology
  • 32. Decision Tree – Advantages/ Disadvantages Advantages: • Easy to understand • Useful in data exploration • Less data cleaning required • Data type is not a constraint • Not sensitive to skewed distributions Disadvantages: • Prone to overfitting • Less suited to continuous variables, since information is lost when they are split into categories
  • 33. Modelling Techniques – Random Forest • Random Forest is a computationally intensive ensemble algorithm. • It works like bootstrap aggregation (bagging) applied to decision tree (CART) models. • Random forest gives much more accurate predictions than a single CART/CHAID tree or a regression model in many scenarios. • It captures the variance of several input variables at the same time and enables a high number of observations to participate in the prediction. • A different subset of the training data and a different subset of variables are selected for each tree • The remaining (out-of-bag) training data are used to estimate error and variable importance
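The sampling scheme described above can be sketched without fitting any trees: each tree gets a bootstrap sample of rows and a random subset of variables, the rows it never saw are kept aside for error estimation, and the trees' predictions are combined by majority vote. The feature names and the hard-coded votes are hypothetical. Note also that the standard random forest algorithm redraws the variable subset at every split, whereas this sketch follows the slide and draws it once per tree.

```python
import random
from collections import Counter

def bootstrap_indices(n_rows, rng):
    """Sample n_rows row indices with replacement; unsampled rows are out-of-bag."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

def feature_subset(feature_names, k, rng):
    """Pick k distinct features at random for one tree."""
    return rng.sample(feature_names, k)

def majority_vote(predictions):
    """Combine the class predictions of the individual trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
features = ["n_bids", "n_auctions", "n_ips", "n_countries"]  # hypothetical names

plans = []
for _ in range(5):  # one sampling plan per tree
    in_bag, oob = bootstrap_indices(10, rng)
    plans.append({"rows": in_bag, "oob": oob,
                  "features": feature_subset(features, 2, rng)})

# Hard-coded votes standing in for the output of five fitted trees.
forest_prediction = majority_vote(["robot", "human", "robot", "robot", "human"])
```

Each plan's `oob` rows play the role of a built-in validation set for that tree, which is what allows error to be estimated without a separate hold-out sample.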
  • 34. Random Forest – Advantages/ Disadvantages Advantages: • No need for pruning trees • Accuracy and variable importance generated automatically • Overfitting is much less of a problem than with a single tree • Not very sensitive to outliers in training data • Easy to set parameters Disadvantages: • It is a black box; the rules behind model building cannot be explained
  • 35. Modelling Techniques – SVM • It is a classification technique. • Support vectors are simply the co-ordinates of individual observations • A Support Vector Machine is a frontier which best segregates one class from the other • Solving SVMs is a quadratic programming problem • Seen by many as the most successful current text classification method
  • 36. Case Study • We have a population that is 50% male and 50% female. We want to create a set of rules that will tell us the gender class for the rest of the population. The blue circles in the plot represent females and the green squares represent males. • Males in our population have a higher average height. • Females in our population have longer scalp hair.
  • 37. Case Study – How to find right SVM • Here we have three possible frontiers. Decide which one is best. Methods: • For each frontier, find its minimum distance from the closest support vector (this can belong to any class). • Choose the frontier with the maximum distance from the closest support vector. In this case, it is the black frontier with a distance of 15 units.
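The two-step rule above can be sketched for straight-line frontiers in two dimensions. All points and line coefficients below are invented (they do not reproduce the 15-unit example from the slide), and the sketch assumes every candidate frontier already separates the two classes.

```python
import math

def distance_to_line(point, line):
    """Perpendicular distance from (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / math.hypot(a, b)

def margin(line, points):
    """Distance from a frontier to its closest observation."""
    return min(distance_to_line(p, line) for p in points)

def best_frontier(lines, points):
    """Name of the candidate frontier with the largest margin."""
    return max(lines, key=lambda name: margin(lines[name], points))

# Invented observations from both classes, e.g. (height, hair length).
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]

# Three hypothetical candidate frontiers as (a, b, c) coefficients.
lines = {
    "black": (1, 1, -6.5),  # x + y = 6.5, midway between the two groups
    "blue": (1, 1, -4),     # x + y = 4, hugs the first group
    "orange": (1, 1, -9),   # x + y = 9, hugs the second group
}
chosen = best_frontier(lines, points)
```

An actual SVM solver searches over all possible frontiers with quadratic programming instead of comparing a handful of candidates, but the quantity being maximised is the same margin.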
  • 38. Predict Robots out of all the bidders • Run your models to predict which bidders are robots
  • 39. Python Resources: • http://www.bigdataexaminer.com/dealing-with-unbalanced-classes-svm-random-forests-and-decision-trees-in-python/ • http://nbviewer.ipython.org/github/justmarkham/gadsdc1/blob/master/logistic_assignment/kevin_logistic_sklearn.ipynb • http://scikit-learn.org/stable/modules/svm.html • http://scikit-learn.org/stable/modules/tree.html • http://blog.yhathq.com/posts/random-forests-in-python.html
  • 40. R Resources: • http://www.ats.ucla.edu/stat/r/dae/logit.htm • http://www.cookbook-r.com/Statistical_analysis/Logistic_regression/ • http://www.rdatamining.com/examples/decision-tree • http://www.statmethods.net/advstats/cart.html • http://www.cair.org/conferences/cair2013/pres/58_Headstrom.pdf • http://blog.yhathq.com/posts/comparing-random-forests-in-python-and-r.html • http://www.louisaslett.com/Courses/Data_Mining/ST4003-Lab7-Introduction_to_Support_Vector_Machines.pdf • http://thinktostart.com/build-a-spam-filter-with-r/ • http://cbio.ensmp.fr/~jvert/svn/tutorials/practical/svmbasic/svmbasic_notes.pdf