1. K-Nearest Neighbours
Classification, Regression
Khan
2. Introduction
K-Nearest Neighbours (KNN) is a lazy, instance-based learning algorithm for classification and regression, widely used for supervised learning tasks.
4. Principle of the KNN Classifier
It is a lazy learner: it does not learn a discriminative function from the training data but instead memorizes the training dataset.
Classification is performed by taking a majority vote among the "k" training points closest to the unlabeled data point.
Given unseen data, it searches the training dataset for the k most similar instances.
Euclidean distance (for numeric data) or Hamming distance (for categorical data) is used as the metric between points.
5. Euclidean distance / Hamming distance
The Euclidean distance between two points in the plane with coordinates (x, y) and (a, b) is given by
dist((x, y), (a, b)) = √((x − a)² + (y − b)²)
The Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ.
oneforone vs. oneandone → 3
11010110110 vs. 11000111110 → 2
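The two metrics above can be sketched in a few lines of Python; the example values are the ones from the slide:

```python
import math

def euclidean(p, q):
    # dist((x, y), (a, b)) = sqrt((x - a)^2 + (y - b)^2),
    # generalized to any number of dimensions
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def hamming(s, t):
    # Number of positions at which the corresponding characters differ
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(a != b for a, b in zip(s, t))

print(euclidean((0, 0), (3, 4)))              # 5.0
print(hamming("oneforone", "oneandone"))      # 3
print(hamming("11010110110", "11000111110"))  # 2
```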
6. K Nearest Neighbor (k = 3)
The green circle is the unlabeled data point; k = 3 in this problem.
● The closest 3 points are taken
● 2 are red, 1 is blue
● Votes: 2 Red > 1 Blue
● The green circle is classified as a red triangle.
7. K Nearest Neighbor (k = 5)
The green circle is the unlabeled data point; k = 5 in this problem.
● The closest 5 points are taken
● 2 are red, 3 are blue
● Votes: 2 Red < 3 Blue
● The green circle is classified as a blue square.
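The majority-vote rule from the two slides above can be sketched in plain Python; the coordinates below are invented purely to reproduce the 2-red/3-blue layout:

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    # Sort training points by Euclidean distance to the query,
    # then take a majority vote among the k closest labels.
    dists = sorted((math.dist(p, query), lbl) for p, lbl in zip(points, labels))
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# Toy layout: 2 red points very close to the query, 3 blue points
# slightly farther away (made-up coordinates, for illustration only).
points = [(1, 1), (1.2, 0.9), (2, 2), (2.1, 1.9), (2, 1.8)]
labels = ["red", "red", "blue", "blue", "blue"]

print(knn_predict(points, labels, (1, 1.1), k=3))  # red wins 2-1
print(knn_predict(points, labels, (1, 1.1), k=5))  # blue wins 3-2
```

Note how the prediction flips between k = 3 and k = 5, exactly as in the slides.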
8. RadiusNeighborsClassifier
This implements learning based on the number of neighbors within a fixed radius of each training point; the floating-point radius is supplied by the user.
RadiusNeighborsClassifier can be a better choice when the data is not uniformly sampled.
For high-dimensional parameter spaces, however, the method becomes less effective due to the "curse of dimensionality".
9. RadiusNeighborsClassifier
● Radius = 2 units
● Within the radius:
○ 9 blue dots
○ 10 purple dots
● The black dot is predicted to be purple, as per the votes.
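A minimal sketch of the radius-based voting idea, as a plain-Python stand-in for what RadiusNeighborsClassifier does; the points below are made up:

```python
import math
from collections import Counter

def radius_predict(points, labels, query, radius):
    # Vote among all training points within `radius` of the query,
    # mirroring how a radius-based classifier counts neighbours.
    in_range = [lbl for p, lbl in zip(points, labels)
                if math.dist(p, query) <= radius]
    if not in_range:
        raise ValueError("no training points within the radius")
    return Counter(in_range).most_common(1)[0][0]

# Made-up points: within radius 2 of the origin there are more
# purple points than blue ones, echoing the slide's example.
points = [(0.5, 0), (0, 0.5), (1, 1), (-1, 0.5), (0.5, -1), (3, 3)]
labels = ["purple", "purple", "purple", "blue", "blue", "blue"]

print(radius_predict(points, labels, (0, 0), radius=2))  # purple, 3-2
```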
10. Influence of K on prediction
K = 1 → the training data is classified perfectly, but the model overfits
K = ∞ → every prediction collapses into a single class
12. Choosing the value of "k"
k should be large enough that the error rate is minimized; too small a k leads to noisy decision boundaries.
k should be small enough that only nearby samples are included; too large a k leads to over-smoothed boundaries.
Setting k to the square root of the number of training samples can lead to better results.
Number of training samples = 20
k = √20 ≈ 4.47 ≈ 4
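The square-root heuristic is one line of code:

```python
import math

def sqrt_heuristic_k(n_samples):
    # k ≈ sqrt(number of training samples), rounded to the
    # nearest integer and kept at least 1
    return max(1, round(math.sqrt(n_samples)))

print(sqrt_heuristic_k(20))  # √20 ≈ 4.47 → 4
```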
14. K values vs. validation error curve
The best value of k is the one at which the validation error is minimum.
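One way to produce such an error-vs-k curve without a plotting library is leave-one-out validation over candidate k values; the two toy clusters below are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    dists = sorted((math.dist(p, query), l) for p, l in zip(points, labels))
    return Counter(l for _, l in dists[:k]).most_common(1)[0][0]

def loo_error(points, labels, k):
    # Leave-one-out: predict each point from all the others and
    # report the fraction of mistakes.
    wrong = 0
    for i, (p, lbl) in enumerate(zip(points, labels)):
        rest_pts = points[:i] + points[i + 1:]
        rest_lbl = labels[:i] + labels[i + 1:]
        if knn_predict(rest_pts, rest_lbl, p, k) != lbl:
            wrong += 1
    return wrong / len(points)

# Two well-separated toy clusters of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (4, 4), (4, 5), (5, 4), (5, 5), (4.5, 4.5)]
labels = ["a"] * 5 + ["b"] * 5

errors = {k: loo_error(points, labels, k) for k in (1, 3, 5, 7, 9)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

On this toy data the error stays at zero for small k and jumps to 1.0 at k = 9, where each left-out point is outvoted by the opposite cluster.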
15. Pros:
● Simple: very easy to understand and implement.
● Useful for non-linear data, as it makes no assumptions about the data.
● Relatively high accuracy, though not competitive with more sophisticated supervised learning algorithms.
● Can be used for both classification and regression.
● Best used where the probability distribution is unknown.
16. Cons:
● Computationally expensive at prediction time.
● Memory-intensive, since all the training points are stored.
● Sensitive to irrelevant features and to the scale of the data.
● The output depends heavily on the k value chosen by the user, and a poor choice can reduce accuracy.
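The scale-sensitivity point can be demonstrated directly: on the made-up data below, the nearest neighbour flips once both features are min-max scaled to [0, 1]:

```python
import math

def nearest(points, query):
    # Index of the training point closest to the query
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

# Feature 0 ranges over [0, 1]; feature 1 over the hundreds.
points = [(0.0, 100.0), (1.0, 110.0)]
query = (0.05, 109.0)

# Raw data: the large-scale feature dominates the distance,
# so point 1 looks nearest.
print(nearest(points, query))  # 1

def minmax(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

# Min-max scale each feature to [0, 1] (scaling over all points,
# query included, purely for illustration).
cols = list(zip(*(points + [query])))
scaled = list(zip(*[minmax(c) for c in cols]))
spoints, squery = scaled[:2], scaled[2]
print(nearest(spoints, squery))  # 0
```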
18. Let's code now
Data used: Iris from sklearn
Plots: Matplotlib
k values taken: 1, 3, 10, 150
File: knnKpara.py
Link to code: Click here for code
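Since the linked knnKpara.py is not reproduced here, the following is only a sketch of what such an experiment might look like, using sklearn's KNeighborsClassifier on Iris with the slide's k values (training accuracy only, no plots):

```python
# Sketch of an experiment like the one the slide describes: fit KNN
# on the full Iris dataset for each k and report training accuracy.
# With k = 1 every point is its own nearest neighbour, while k = 150
# (all samples) collapses every prediction to a single class.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

accs = {}
for k in (1, 3, 10, 150):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    accs[k] = clf.score(X, y)
    print(f"k={k:3d}  training accuracy={accs[k]:.3f}")
```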