3. Problem: Digit Recognizer
Identify handwritten digits (0–9) based on greyscale images.
Sample images
4. Statement
Each image is 28 pixels in height and 28 pixels in width, for a
total of 784 pixels. Each pixel has a single pixel-value
associated with it, indicating the lightness or darkness
of that pixel, with higher numbers meaning darker. This
pixel-value is an integer between 0 and 255, inclusive.
pixel0 pixel1 pixel2 ... pixel27
pixel28 pixel29 pixel30 ... pixel55
| | | ... |
pixel756 pixel757 pixel758 ... pixel783
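The row-major layout above means pixel k of the flattened vector sits at row k // 28, column k % 28 of the image. A minimal sketch, assuming NumPy:

```python
import numpy as np

# A flattened image is a length-784 vector; reshape recovers the grid.
flat = np.arange(784)          # stand-in values: pixel0..pixel783
image = flat.reshape(28, 28)   # 28x28 image

print(image[0, 0])    # pixel0
print(image[1, 0])    # pixel28, the first pixel of the second row
print(image[27, 27])  # pixel783
```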
5. Statement
The training data set has 785 columns. The first
column, called "label", is the digit that was drawn by the
user. The remaining columns contain the pixel-values of
the associated image.
The test data set is the same as the training set, except
that it does not contain the "label" column.
The goal of the problem is to predict the label of each
image in the test data set.
6. Methods used to solve the problem
Random Forest
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
7. Random Forest
Ensemble of decision trees
Each tree is trained on a bootstrapped sample of the
original data set
Each time a node is split, only a randomly chosen subset
of the dimensions is considered for splitting
Each tree is fully grown and not pruned
When a new input is entered into the system, it is run down
all of the trees. The result may be either an average or
weighted average of all of the terminal nodes that are
reached, or, in the case of categorical variables, a
majority vote
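The procedure above can be sketched with scikit-learn (an assumption; the deck does not name a library, and the built-in 8x8 `load_digits` set stands in for the Kaggle 28x28 data):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# load_digits is a small stand-in for the Kaggle digit data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 500 unpruned trees, each fit on a bootstrap sample; at every split
# only a random subset of the features (max_features) is considered,
# and classification is decided by a majority vote across trees
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```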
9. Support Vector Machine
In an SVM model, the original objects (training data) are
treated as points in a space (the input space)
These are mapped (rearranged) into a new space (the feature
space) using mathematical functions called kernels
After mapping, objects of separate categories are divided
by a clear gap that is as wide as possible
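A minimal SVM sketch, again assuming scikit-learn with `load_digits` standing in for the Kaggle data; the RBF kernel and C=1 match the settings reported later in the deck:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel maps the points into the feature space; C controls the
# trade-off between a wide margin and misclassified training points
clf = SVC(kernel="rbf", C=1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```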
10. K-Nearest Neighbors
Basic idea
If it walks like a duck and quacks like a duck, then it is probably a duck
There are three key elements:
a set of labeled objects (e.g., a set of stored records)
a distance or similarity metric to compute distance between objects,
and
the value of k, the number of nearest neighbors.
To classify an unlabeled object:
the distance of this object to the labeled objects is computed,
its k-nearest neighbors are identified, and
the class labels of these nearest neighbors are then used to
determine the class label of the object.
11. Results
Random Forest with 500 trees gave 97%
accuracy on the test data.
SVM with an RBF kernel and C=1 gave 97.71%
accuracy on the test data.
KNN with k=10 gave 96% accuracy.
13. Problem
The sinking of the RMS Titanic is one of the most
infamous shipwrecks in history.
One of the reasons that the shipwreck led to such loss
of life was that there were not enough lifeboats for the
passengers and crew. Although there was some
element of luck involved in surviving the sinking, some
groups of people were more likely to survive than
others, such as women, children, and the upper-class.
In this project, we analyze what sorts of people
were likely to survive. In particular, the tools of
machine learning are applied to predict which
passengers survived the tragedy.
14. Statement
The historical data has been split into two
groups, a 'training set' and a 'test set'. For the
training set, the outcome of whether or not each
passenger survived the sinking (0 for deceased,
1 for survived) is provided.
The goal of the problem is to predict the
outcome for each passenger in the test set.
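A minimal sketch of the train-then-predict setup, assuming scikit-learn. The feature columns and all values here are hypothetical stand-ins for the real passenger records, chosen to mirror the groups the problem statement mentions (sex, age, class):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training rows: (pclass, is_female, age) -> survived?
X_train = np.array([[1, 1, 29], [3, 0, 22], [2, 1, 8],
                    [3, 0, 40], [1, 0, 50], [3, 1, 19]])
y_train = np.array([1, 0, 1, 0, 0, 1])  # 0 = deceased, 1 = survived

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)

# Predict the outcome for an unseen "test set" passenger
print(int(clf.predict([[3, 0, 25]])[0]))
```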
15. Methods used to solve the problem
• Random Forest
• Support Vector Machine (SVM)
16. Results
Random Forest with 300 trees gave 77.9%
accuracy on the test data.
SVM with an RBF kernel and C=1 gave 77.7%
accuracy on the test data.