Weka Term Paper
Submission
ITB Assignment




 Sathiyaseelan M
 10BM60080
Table of Contents
1.     Classification via Decision Trees
     1.1      Car Evaluation Database
     1.2      J48 pruned classification tree
     1.3      Summary of Results
     1.4      Simplified Decision Tree
     1.5      Test Set
2.     K-Means Clustering
     2.1      Bank Database
     2.2      Summary of Results
     2.3      Cluster Explanation
1. Classification via Decision Trees


The Car Evaluation Database contains data on six attributes: buying price, maintenance price,
number of persons, number of doors, safety, and size of the luggage boot. Certain attributes related to
structural information are removed to simplify the analysis. Because of its known underlying concept
structure, this database may be particularly useful for testing constructive induction and structure
discovery methods.

1.1 Car Evaluation Database
This model evaluates cars according to the following concept structure.

PRICE
           buying                              buying price
           maint                              price of the maintenance
TECHNICAL CHARACTERISTICS
          ……. (Removed for simplification of analysis)
COMFORT
           doors                              number of doors
           persons                            capacity in terms of persons to carry
           lug_boot                            the size of luggage boot
SAFETY
           safety                             estimated safety of the car

Number of Instances: 1728

Attribute Values

buying    v-high, high, med, low
maint     v-high, high, med, low
doors     2, 3, 4, 5-more
persons   2, 4, more
lug_boot  small, med, big
safety    low, med, high
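
The instance count can be sanity-checked against the attribute domains. The sketch below assumes that `doors` takes the four values seen in the J48 tree output (2, 3, 4, 5more) and uses the value spellings from that output; the names `ATTRIBUTES` and `all_combinations` are illustrative only. Under those assumptions, the number of distinct attribute combinations equals the 1728 instances reported for the data set.

```python
from itertools import product

# Attribute domains (spellings follow the J48 tree output; the four
# `doors` values are an assumption based on that output).
ATTRIBUTES = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}


def all_combinations(attributes):
    """Return every possible car description as a dict of attribute values."""
    names = list(attributes)
    return [dict(zip(names, values)) for values in product(*attributes.values())]


cars = all_combinations(ATTRIBUTES)
print(len(cars))  # 4 * 4 * 4 * 3 * 3 * 3
```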

class     N       N[%]       Meaning
---------------------------------------------
unacc     1210    70.023 %   Unacceptable
acc        384    22.222 %   Acceptable
good        69     3.993 %   Good
v-good      65     3.762 %   Very Good
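
As a small worked check, the percentages in the class table follow directly from the counts. The sketch below (names are illustrative) recomputes them and also reports the share of the largest class, which is the accuracy a classifier would reach by always predicting `unacc`; that figure is a useful reference point when reading the J48 results.

```python
# Class counts taken from the class distribution table.
CLASS_COUNTS = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}


def class_shares(counts):
    """Return each class's share of all instances, in percent (3 decimals)."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 3) for label, n in counts.items()}


shares = class_shares(CLASS_COUNTS)
majority_label = max(CLASS_COUNTS, key=CLASS_COUNTS.get)

for label, share in shares.items():
    print(f"{label:6s} {share:7.3f} %")
print("majority-class baseline:", majority_label, shares[majority_label], "%")
```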

J48 (Weka's implementation of the C4.5 algorithm) is used for classification.
Test mode: 10-fold cross-validation, with the minimum number of objects per leaf set to 2.
1.2 J48 pruned classification tree
safety = low: unacc (576.0)
safety = med
| persons = 2: unacc (192.0)
| persons = 4
| | buying = vhigh
| | | maint = vhigh: unacc (12.0)
| | | maint = high: unacc (12.0)
| | | maint = med
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = low
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | buying = high
| | | lug_boot = small: unacc (16.0)
| | | lug_boot = med
| | | | doors = 2: unacc (4.0)
| | | | doors = 3: unacc (4.0)
| | | | doors = 4: acc (4.0/1.0)
| | | | doors = 5more: acc (4.0/1.0)
| | | lug_boot = big
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | buying = med
| | | maint = vhigh
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = high
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = med: acc (12.0)
| | | maint = low
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| | buying = low
| | | maint = vhigh
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = high: acc (12.0)
| | | maint = med
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| | | maint = low
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| persons = more
| | lug_boot = small
| | | buying = vhigh: unacc (16.0)
| | | buying = high: unacc (16.0)
| | | buying = med
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = low
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0/1.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | lug_boot = med
| | | buying = vhigh
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = high
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0/1.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = med: acc (16.0/5.0)
| | | buying = low
| | | | maint = vhigh: acc (4.0/1.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: good (4.0/1.0)
| | | | maint = low: good (4.0/1.0)
| | lug_boot = big
| | | buying = vhigh
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | | buying = high
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | | buying = med
| | | | maint = vhigh: acc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: good (4.0)
| | | buying = low
| | | | maint = vhigh: acc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: good (4.0)
| | | | maint = low: good (4.0)
safety = high
| persons = 2: unacc (192.0)
| persons = 4
| | buying = vhigh
| | | maint = vhigh: unacc (12.0)
| | | maint = high: unacc (12.0)
| | | maint = med: acc (12.0)
| | | maint = low: acc (12.0)
| | buying = high
| | | maint = vhigh: unacc (12.0)
| | | maint = high: acc (12.0)
| | | maint = med: acc (12.0)
| | | maint = low: acc (12.0)
| | buying = med
| | | maint = vhigh: acc (12.0)
| | | maint = high: acc (12.0)
| | | maint = med
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | buying = low
| | | maint = vhigh: acc (12.0)
| | | maint = high
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = med
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| persons = more
| | buying = vhigh
| | | maint = vhigh: unacc (12.0)
| | | maint = high: unacc (12.0)
| | | maint = med: acc (12.0/1.0)
| | | maint = low: acc (12.0/1.0)
| | buying = high
| | | maint = vhigh: unacc (12.0)
| | | maint = high: acc (12.0/1.0)
| | | maint = med: acc (12.0/1.0)
| | | maint = low: acc (12.0/1.0)
| | buying = med
| | | maint = vhigh: acc (12.0/1.0)
| | | maint = high: acc (12.0/1.0)
| | | maint = med
| | | | lug_boot = small: acc (4.0/1.0)
| | | | lug_boot = med: vgood (4.0/1.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0/1.0)
| | | | lug_boot = med: vgood (4.0/1.0)
| | | | lug_boot = big: vgood (4.0)
| | buying = low
| | | maint = vhigh: acc (12.0/1.0)
| | | maint = high
| | | | lug_boot = small: acc (4.0/1.0)
| | | | lug_boot = med: vgood (4.0/1.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = med
| | | | lug_boot = small: good (4.0/1.0)
| | | | lug_boot = med: vgood (4.0/1.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0/1.0)
| | | | lug_boot = med: vgood (4.0/1.0)
| | | | lug_boot = big: vgood (4.0)


Number of Leaves :                    131
Size of the tree :                    182
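
To illustrate how the tree is read, the sketch below encodes only the three large leaves nearest the root: `safety = low` (576 instances), and `persons = 2` under `safety = med` and `safety = high` (192 instances each). It is a partial reading of the tree, not a reimplementation of J48; the function name and the enumeration of all attribute combinations (with four assumed `doors` values) are illustrative assumptions.

```python
from itertools import product

# Attribute domains, spelled as in the tree output (assumed complete).
DOMAINS = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}


def classify_top_level(car):
    """Apply only the three large leaves near the root of the tree.

    Returns "unacc" when one of those leaves applies, otherwise None,
    because the deeper branches are not modelled in this sketch.
    """
    if car["safety"] == "low":
        return "unacc"
    if car["persons"] == "2":  # holds for both safety = med and safety = high
        return "unacc"
    return None


names = list(DOMAINS)
cars = [dict(zip(names, values)) for values in product(*DOMAINS.values())]
covered = sum(1 for car in cars if classify_top_level(car) == "unacc")
print(covered, "of", len(cars), "combinations are settled by these leaves")
```

Under these assumptions the two tests on `safety` and `persons` already settle 576 + 192 + 192 = 960 of the 1728 combinations, which is why the remaining 128 leaves all sit below `persons = 4` or `persons = more`.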


1.3 Summary of Results

Correctly Classified Instances     1596        92.3611 %
Incorrectly Classified Instances    132        7.6389 %
Kappa statistic               0.8343
Mean absolute error               0.0421
Root mean squared error              0.1718
Relative absolute error           18.3833 %
Root relative squared error         50.8176 %
Coverage of cases (0.95 level)       97.2222 %
Mean rel. region size (0.95 level) 29.1088 %
Total Number of Instances          1728
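
The accuracy and kappa statistic above can be cross-checked against the confusion matrix reported in this section. The sketch below assumes the usual Weka layout (rows are actual classes, columns are predicted classes) and computes Cohen's kappa from the observed agreement and the agreement expected by chance; the function name is illustrative.

```python
# Confusion matrix from this section; rows = actual, columns = predicted,
# in the order unacc, acc, good, vgood.
CONFUSION = [
    [1164, 43, 3, 0],
    [33, 333, 11, 7],
    [0, 17, 42, 10],
    [0, 3, 5, 57],
]


def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


accuracy, kappa = accuracy_and_kappa(CONFUSION)
print(f"accuracy = {accuracy:.6f}, kappa = {kappa:.4f}")
```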

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.962    0.064    0.972      0.962   0.967      0.983     unacc
               0.867    0.047    0.841      0.867   0.854      0.962     acc
               0.609    0.011    0.689      0.609   0.646      0.918     good
               0.877    0.01     0.77       0.877   0.82       0.995     vgood
Weighted Avg.  0.924    0.056    0.924      0.924   0.924      0.976
=== Confusion Matrix ===

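Actual \ Predicted   Unacceptable (a)   Acceptable (b)   Good (c)   Very Good (d)
Unacceptable (a)     1164               43               3          0
Acceptable (b)       33                 333              11         7
Good (c)             0                  17               42         10
Very Good (d)        0                  3                5          57

Rows give the actual class and columns the predicted class. The diagonal elements are the correctly
classified instances; all off-diagonal elements are misclassifications.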
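
The per-class figures in the detailed accuracy table can be derived from the confusion matrix. The following sketch assumes rows are actual classes and columns are predicted classes, and recomputes precision, recall (TP rate) and F-measure for each class; names are illustrative.

```python
LABELS = ["unacc", "acc", "good", "vgood"]

# Confusion matrix from this section; rows = actual, columns = predicted.
CONFUSION = [
    [1164, 43, 3, 0],
    [33, 333, 11, 7],
    [0, 17, 42, 10],
    [0, 3, 5, 57],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)}, rounded to 3 decimals."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        recall = true_positives / actual
        precision = true_positives / predicted
        f_measure = 2 * precision * recall / (precision + recall)
        metrics[label] = (
            round(precision, 3),
            round(recall, 3),
            round(f_measure, 3),
        )
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
for label, (precision, recall, f_measure) in metrics.items():
    print(f"{label:6s} precision={precision} recall={recall} F={f_measure}")
```

The weakest class is `good`: only 42 of its 69 instances are recovered, with most of the rest predicted as `acc` or `vgood`.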


1.4 Simplified Decision Tree

When the run was repeated with 10 folds and the minimum number of objects per leaf raised to 25 (to
simplify the classification tree), accuracy fell to 81.3079%. The reduction in accuracy is due to the
larger minimum number of objects per leaf, which forces a smaller, coarser tree. The simplified version
of the classification tree is shown below.

[Figure: simplified J48 classification tree, minimum number of objects = 25]

1.5 Test Set


When applied to the test set, the classifier correctly classified 94.87% of the instances.
2 K-Means Clustering

This example illustrates the use of k-means with Weka.

2.1 Bank Database


The sample data set used for this example comes from a bank that records each customer's age, gender,
region type, income, marital status, number of children, car ownership, and mortgage status. The bank
wants to find the savings pattern of its customers of the age group … The data set has 600 instances
and 8 attributes, with the corresponding values listed below.

       @attribute age numeric
       @attribute sex {FEMALE,MALE}
       @attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
       @attribute income numeric
       @attribute married {NO,YES}
       @attribute children {0,1,2,3}
       @attribute car {NO,YES}
       @attribute mortgage {NO,YES}
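
The attribute declarations above mix numeric and nominal types, and a clustering run has to treat the two differently (for instance when computing centroids). As a minimal sketch, the snippet below parses these declarations with the Python standard library and separates the numeric attributes from the nominal ones; the parser is a simplified illustration that handles only the forms shown here, not the full ARFF format.

```python
ARFF_HEADER = """\
@attribute age numeric
@attribute sex {FEMALE,MALE}
@attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
@attribute income numeric
@attribute married {NO,YES}
@attribute children {0,1,2,3}
@attribute car {NO,YES}
@attribute mortgage {NO,YES}
"""


def parse_attributes(header):
    """Map each attribute name to "numeric" or to its list of nominal values."""
    attributes = {}
    for line in header.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3 or parts[0].lower() != "@attribute":
            continue  # skip anything that is not an attribute declaration
        name, spec = parts[1], parts[2].strip()
        if spec.startswith("{") and spec.endswith("}"):
            attributes[name] = [value.strip() for value in spec[1:-1].split(",")]
        else:
            attributes[name] = spec
    return attributes


attributes = parse_attributes(ARFF_HEADER)
numeric = [name for name, spec in attributes.items() if spec == "numeric"]
nominal = [name for name, spec in attributes.items() if isinstance(spec, list)]
print("numeric:", numeric)
print("nominal:", nominal)
```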

2.2 Summary of Results


Number of iterations: 5
Within cluster sum of squared errors: 1201.3638013812113
Missing values globally replaced with mean/mode
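
For readers unfamiliar with the quantities reported here, the sketch below shows a bare-bones k-means loop (alternating assignment and centroid-update steps) and how the within-cluster sum of squared errors is obtained. It is not Weka's implementation: it runs on a tiny made-up set of (age, income) points with hand-picked starting centroids, handles numeric attributes only, and does no normalisation, so it will not reproduce the figure reported above.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, max_iterations=100):
    """Return (centroids, assignments, iterations, sse) after convergence."""
    centroids = list(centroids)
    assignments = None
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = mean_point(members)
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, iterations, sse


# Hypothetical (age, income in thousands) points, for illustration only.
points = [(25, 20), (27, 22), (30, 21), (50, 60), (52, 58), (55, 62)]
centroids, assignments, iterations, sse = k_means(points, [(25, 20), (55, 62)])
print(assignments, iterations, round(sse, 3))
```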




Time taken to build model (full training data) : 0.08 seconds

Clustered Instances

0  163 ( 27%)
1  100 ( 17%)
2  159 ( 27%)
3  178 ( 30%)
The figure below plots customer age against income for each cluster.

[Figure: age vs. income by cluster]

The plot gives a glimpse of the clusters. It can be observed that age and income are significant
variables in determining the clusters.

2.3 Cluster Explanation


Cluster centroids are the mean vectors for each cluster (so each dimension value in the centroid
represents the mean value for that dimension in the cluster; for nominal attributes such as sex, the
centroid holds the most frequent value). Thus, centroids can be used to characterize the clusters. For
example, Sex = Male in the centroid for cluster 0 implies that this cluster is centered around the male
population; it does not imply that the cluster contains only males.
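
The point about centroids can be made concrete with a small sketch. It assumes the centroid takes the mean for numeric attributes and the most frequent value for nominal ones, and it uses three made-up customers; the function name and data are hypothetical.

```python
from collections import Counter


def centroid(rows, numeric_fields):
    """Mean for numeric fields, most frequent value for all other fields."""
    result = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        if field in numeric_fields:
            result[field] = sum(values) / len(values)
        else:
            result[field] = Counter(values).most_common(1)[0][0]
    return result


# Hypothetical cluster members, for illustration only.
cluster = [
    {"age": 33, "sex": "MALE"},
    {"age": 35, "sex": "MALE"},
    {"age": 37, "sex": "FEMALE"},
]
center = centroid(cluster, numeric_fields={"age"})
print(center)
```

The centroid reports `sex` as `MALE` even though one of the three members is female, which is exactly why a centroid value describes where a cluster is centered and not what every member looks like.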
Cluster 0: Consists predominantly of males aged around 35 who live in the inner city and have neither a
car nor children. This cluster consists of men in the early stages of their careers.

Cluster 1: Consists predominantly of females aged around 53 who live in rural areas and have a car and
children. They also earn more than the other clusters. This cluster consists predominantly of women in
their fifties.

Cluster 2: Consists predominantly of males aged around 43 who live in the inner city and have a car and
children. They also earn more than cluster 0. This cluster consists predominantly of men in their
forties.

Cluster 3: Consists predominantly of females aged around 40 who live in towns and have neither a car nor
children. They also earn less than the women in cluster 1. This cluster consists predominantly of women
in their early forties.

Mais conteúdo relacionado

Último

General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 

Último (20)

General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 

Destaque

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 

Destaque (20)

Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 

10BM60080 - Weka Term Paper

  • 1. Weka Term Paper Submission ITB Assignment Sathiyaseelan M 10BM60080
  • 2. Table of Contents 1. Classification via Decision Trees ....................................................................................................... 3 1.1 Car Evaluation Database........................................................................................................... 3 1.2 J48 pruned classification tree ................................................................................................... 4 1.3 Summary of Results................................................................................................................. 6 1.4 Simplified Decision Tree ........................................................................................................... 7 1.5 Test Set .................................................................................................................................... 7 2 K-Means Clustering .......................................................................................................................... 8 2.1 Bank Database ......................................................................................................................... 8 2.2 Summary of Results.................................................................................................................. 9 2.3 Cluster Explanation ................................................................................................................ 10
  • 3. 1. Classification via Decision Trees The Car Evaluation Database contains data pertaining to six attributes buying price, maintenance price, no of persons, no of doors, safety and size of the luggage boot. Certain attributes related to structural information are removed for simplification of analysis. Because of known underlying concept structure, this database may be particularly useful for testing constructive induction and structure discovery methods. 1.1 Car Evaluation Database This model evaluates cars according to the following concept structure. PRICE  buying buying price  maint price of the maintenance TECHNICAL CHARACTERISTICS ……. (Removed for simplification of analysis) COMFORT  doors number of doors  persons capacity in terms of persons to carry  lug_boot the size of luggage boot SAFETY  safety estimated safety of the car Number of Instances: 1728 Attribute Values buying  v-high, high, med, low maint  v-high, high, med, low 1. doors  2, 4, 5-more persons  2, 4, more lug_boot  small, med, big safety  low, med, high class N N[%] --------------------------------------- unacc 1210 (70.023 %)  Unacceptable acc 384 (22.222 %)  Acceptable good 69 ( 3.993 %)  Good v-good 65 ( 3.762 %)  Very Good J48 (implementation of C4.5 algorithm) is used for classification. Test Mode: 10-fold cross-validation & min no. of objects required is 2.
  • 4. 1.2 J48 pruned classification tree
    safety = low: unacc (576.0)
    safety = med
    | persons = 2: unacc (192.0)
    | persons = 4
    | | buying = vhigh
    | | | maint = vhigh: unacc (12.0)
    | | | maint = high: unacc (12.0)
    | | | maint = med
    | | | | lug_boot = small: unacc (4.0)
    | | | | lug_boot = med: unacc (4.0/2.0)
    | | | | lug_boot = big: acc (4.0)
    | | | maint = low
    | | | | lug_boot = small: unacc (4.0)
    | | | | lug_boot = med: unacc (4.0/2.0)
    | | | | lug_boot = big: acc (4.0)
    | | buying = high
    | | | lug_boot = small: unacc (16.0)
    | | | lug_boot = med
    | | | | doors = 2: unacc (4.0)
    | | | | doors = 3: unacc (4.0)
    | | | | doors = 4: acc (4.0/1.0)
    | | | | doors = 5more: acc (4.0/1.0)
    | | | lug_boot = big
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: acc (4.0)
    | | | | maint = med: acc (4.0)
    | | | | maint = low: acc (4.0)
    | | buying = med
    | | | maint = vhigh
    | | | | lug_boot = small: unacc (4.0)
    | | | | lug_boot = med: unacc (4.0/2.0)
    | | | | lug_boot = big: acc (4.0)
    | | | maint = high
    | | | | lug_boot = small: unacc (4.0)
    | | | | lug_boot = med: unacc (4.0/2.0)
    | | | | lug_boot = big: acc (4.0)
    | | | maint = med: acc (12.0)
    | | | maint = low
    | | | | lug_boot = small: acc (4.0)
    | | | | lug_boot = med: acc (4.0/2.0)
    | | | | lug_boot = big: good (4.0)
    | | buying = low
    | | | maint = vhigh
    | | | | lug_boot = small: unacc (4.0)
    | | | | lug_boot = med: unacc (4.0/2.0)
    | | | | lug_boot = big: acc (4.0)
    | | | maint = high: acc (12.0)
    | | | maint = med
    | | | | lug_boot = small: acc (4.0)
    | | | | lug_boot = med: acc (4.0/2.0)
    | | | | lug_boot = big: good (4.0)
    | | | maint = low
    | | | | lug_boot = small: acc (4.0)
    | | | | lug_boot = med: acc (4.0/2.0)
    | | | | lug_boot = big: good (4.0)
    | persons = more
    | | lug_boot = small
    | | | buying = vhigh: unacc (16.0)
    | | | buying = high: unacc (16.0)
    | | | buying = med
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: unacc (4.0)
    | | | | maint = med: acc (4.0/1.0)
    | | | | maint = low: acc (4.0/1.0)
    | | | buying = low
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: acc (4.0/1.0)
    | | | | maint = med: acc (4.0/1.0)
    | | | | maint = low: acc (4.0/1.0)
    | | lug_boot = med
    | | | buying = vhigh
    | | | | maint = vhigh: unacc (4.0)
  • 5.
    | | | | maint = high: unacc (4.0)
    | | | | maint = med: acc (4.0/1.0)
    | | | | maint = low: acc (4.0/1.0)
    | | | buying = high
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: acc (4.0/1.0)
    | | | | maint = med: acc (4.0/1.0)
    | | | | maint = low: acc (4.0/1.0)
    | | | buying = med: acc (16.0/5.0)
    | | | buying = low
    | | | | maint = vhigh: acc (4.0/1.0)
    | | | | maint = high: acc (4.0)
    | | | | maint = med: good (4.0/1.0)
    | | | | maint = low: good (4.0/1.0)
    | | lug_boot = big
    | | | buying = vhigh
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: unacc (4.0)
    | | | | maint = med: acc (4.0)
    | | | | maint = low: acc (4.0)
    | | | buying = high
    | | | | maint = vhigh: unacc (4.0)
    | | | | maint = high: acc (4.0)
    | | | | maint = med: acc (4.0)
    | | | | maint = low: acc (4.0)
    | | | buying = med
    | | | | maint = vhigh: acc (4.0)
    | | | | maint = high: acc (4.0)
    | | | | maint = med: acc (4.0)
    | | | | maint = low: good (4.0)
    | | | buying = low
    | | | | maint = vhigh: acc (4.0)
    | | | | maint = high: acc (4.0)
    | | | | maint = med: good (4.0)
    | | | | maint = low: good (4.0)
    safety = high
    | persons = 2: unacc (192.0)
    | persons = 4
    | | buying = vhigh
    | | | maint = vhigh: unacc (12.0)
    | | | maint = high: unacc (12.0)
    | | | maint = med: acc (12.0)
    | | | maint = low: acc (12.0)
    | | buying = high
    | | | maint = vhigh: unacc (12.0)
    | | | maint = high: acc (12.0)
    | | | maint = med: acc (12.0)
    | | | maint = low: acc (12.0)
    | | buying = med
    | | | maint = vhigh: acc (12.0)
    | | | maint = high: acc (12.0)
    | | | maint = med
    | | | | lug_boot = small: acc (4.0)
    | | | | lug_boot = med: acc (4.0/2.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = low
    | | | | lug_boot = small: good (4.0)
    | | | | lug_boot = med: good (4.0/2.0)
    | | | | lug_boot = big: vgood (4.0)
    | | buying = low
    | | | maint = vhigh: acc (12.0)
    | | | maint = high
    | | | | lug_boot = small: acc (4.0)
    | | | | lug_boot = med: acc (4.0/2.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = med
    | | | | lug_boot = small: good (4.0)
    | | | | lug_boot = med: good (4.0/2.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = low
    | | | | lug_boot = small: good (4.0)
    | | | | lug_boot = med: good (4.0/2.0)
    | | | | lug_boot = big: vgood (4.0)
    | persons = more
    | | buying = vhigh
  • 6.
    | | | maint = vhigh: unacc (12.0)
    | | | maint = high: unacc (12.0)
    | | | maint = med: acc (12.0/1.0)
    | | | maint = low: acc (12.0/1.0)
    | | buying = high
    | | | maint = vhigh: unacc (12.0)
    | | | maint = high: acc (12.0/1.0)
    | | | maint = med: acc (12.0/1.0)
    | | | maint = low: acc (12.0/1.0)
    | | buying = med
    | | | maint = vhigh: acc (12.0/1.0)
    | | | maint = high: acc (12.0/1.0)
    | | | maint = med
    | | | | lug_boot = small: acc (4.0/1.0)
    | | | | lug_boot = med: vgood (4.0/1.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = low
    | | | | lug_boot = small: good (4.0/1.0)
    | | | | lug_boot = med: vgood (4.0/1.0)
    | | | | lug_boot = big: vgood (4.0)
    | | buying = low
    | | | maint = vhigh: acc (12.0/1.0)
    | | | maint = high
    | | | | lug_boot = small: acc (4.0/1.0)
    | | | | lug_boot = med: vgood (4.0/1.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = med
    | | | | lug_boot = small: good (4.0/1.0)
    | | | | lug_boot = med: vgood (4.0/1.0)
    | | | | lug_boot = big: vgood (4.0)
    | | | maint = low
    | | | | lug_boot = small: good (4.0/1.0)
    | | | | lug_boot = med: vgood (4.0/1.0)
    | | | | lug_boot = big: vgood (4.0)
    Number of Leaves : 131
    Size of the tree : 182
    1.3 Summary of Results
    Correctly Classified Instances        1596      92.3611 %
    Incorrectly Classified Instances      132        7.6389 %
    Kappa statistic                       0.8343
    Mean absolute error                   0.0421
    Root mean squared error               0.1718
    Relative absolute error               18.3833 %
    Root relative squared error           50.8176 %
    Coverage of cases (0.95 level)        97.2222 %
    Mean rel. region size (0.95 level)    29.1088 %
    Total Number of Instances             1728
    === Detailed Accuracy By Class ===
                     TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                     0.962     0.064     0.972       0.962    0.967       0.983      unacc
                     0.867     0.047     0.841       0.867    0.854       0.962      acc
                     0.609     0.011     0.689       0.609    0.646       0.918      good
                     0.877     0.01      0.77        0.877    0.82        0.995      vgood
    Weighted Avg.    0.924     0.056     0.924       0.924    0.924       0.976
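    The headline figures in the summary can be re-derived from the confusion matrix that Weka reports for this run (rows are actual classes, columns are predicted classes). The sketch below is only a worked check, not Weka's own code: it recomputes accuracy, per-class recall (TP rate) and precision, and Cohen's kappa as (observed agreement - chance agreement) / (1 - chance agreement). The variable names are illustrative.

```python
labels = ["unacc", "acc", "good", "vgood"]

# Confusion matrix from the 10-fold cross-validation run:
# matrix[i][j] = instances of actual class i predicted as class j.
matrix = [
    [1164, 43, 3, 0],
    [33, 333, 11, 7],
    [0, 17, 42, 10],
    [0, 3, 5, 57],
]

total = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]        # actual class sizes
col_totals = [sum(col) for col in zip(*matrix)]  # predicted class sizes

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

recall = {labels[i]: matrix[i][i] / row_totals[i] for i in range(len(labels))}
precision = {labels[i]: matrix[i][i] / col_totals[i] for i in range(len(labels))}

# Agreement expected by chance, given the row and column totals.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"correct: {correct} of {total} ({100 * accuracy:.4f} %)")
print(f"kappa:   {kappa:.4f}")
for label in labels:
    print(f"{label:6s} recall={recall[label]:.3f} precision={precision[label]:.3f}")
```

    The weakest class is good: fewer than two thirds of the actual good cars are recovered, mostly because they are confused with acc and vgood, its neighbours on the quality scale.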
  • 7. === Confusion Matrix ===
    Rows are actual classes; columns are predicted classes.
                        Unacceptable (a)   Acceptable (b)   Good (c)   Very Good (d)
    Unacceptable (a)    1164               43               3          0
    Acceptable (b)      33                 333              11         7
    Good (c)            0                  17               42         10
    Very Good (d)       0                  3                5          57
    Diagonal elements are correctly classified instances; all other elements are misclassifications.
    1.4 Simplified Decision Tree
    When the run was repeated with 10 folds and the minimum number of objects raised from 2 to 25 (to simplify the classification tree), accuracy fell to 81.3079%. The reduction arises because the higher minimum prevents the tree from splitting off small groups of instances, so it can no longer capture the finer distinctions. The simplified version of the classification tree is shown below.
    [Figure: simplified classification tree]
    1.5 Test Set
    When applied to the test set, the classifier correctly classified 94.87% of the instances.
  • 8. 2 K-Means Clustering
    This example illustrates the use of k-means with Weka.
    2.1 Bank Database
    The sample data set for this example comes from a bank that records each customer's age, gender, region type, income, marital status, number of children, car ownership, and mortgage status. The bank wants to find the savings pattern of its customers of the age group […]
  • 9. The data set has 600 instances and 8 attributes, with the corresponding values listed below.
      - @attribute age numeric
      - @attribute sex {FEMALE,MALE}
      - @attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
      - @attribute income numeric
      - @attribute married {NO,YES}
      - @attribute children {0,1,2,3}
      - @attribute car {NO,YES}
      - @attribute mortgage {NO,YES}
    2.2 Summary of Results
    Number of iterations: 5
    Within cluster sum of squared errors: 1201.3638013812113
    Missing values globally replaced with mean/mode
    Time taken to build model (full training data): 0.08 seconds
    Clustered Instances
      0    163 ( 27%)
      1    100 ( 17%)
      2    159 ( 27%)
      3    178 ( 30%)
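    To make the "number of iterations" and "within cluster sum of squared errors" lines more concrete, here is a minimal pure-Python sketch of the k-means loop on two numeric attributes only. It is not Weka's implementation: the six (age, income) rows are hypothetical, min-max scaling is an assumption made so that age and income contribute comparably, and the initial centroids are fixed by hand for reproducibility rather than drawn at random from a seed.

```python
def min_max_normalize(rows):
    """Rescale every numeric column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # Assignment step: each point goes to its nearest centroid.
        nearest = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if nearest == assignments:
            break  # no point changed cluster: converged
        assignments = nearest
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Within-cluster sum of squared errors for the final clustering.
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return assignments, centroids, sse, iterations


# Hypothetical (age, income) rows, not taken from the bank data set.
customers = [(25, 20000), (30, 22000), (35, 21000),
             (55, 50000), (60, 52000), (65, 51000)]
points = min_max_normalize(customers)
assignments, centroids, sse, iterations = k_means(points, [points[0], points[-1]])
print(assignments, sse, iterations)
```

    On this toy data the loop separates the younger, lower-income rows from the older, higher-income ones, and the final pass only confirms that no assignment changed.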
  • 10. [Figure: age of customers vs. income, by cluster]
    The figure gives a glimpse of the clusters. Age and income can be seen to be significant variables in determining the clusters.
    2.3 Cluster Explanation
    Cluster centroids are the mean vectors for each cluster: each numeric dimension of a centroid is the mean of that dimension over the cluster, and each nominal dimension is its most frequent value (the mode). Centroids can therefore be used to characterize the clusters. For example, Sex = MALE in the centroid of cluster 0 means that the cluster is centered on the male population; it does not mean that the cluster contains only males.
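    The point about centroids can be illustrated with a small sketch. It assumes, in line with the "mean/mode" wording of the Weka output, that a centroid holds the mean of each numeric attribute and the most frequent value of each nominal attribute. The three rows and the `centroid` helper are hypothetical, chosen only to show that a centroid with sex = MALE can describe a cluster that also contains women.

```python
from collections import Counter
from statistics import mean


def centroid(rows, numeric_columns):
    """Mean for numeric columns, most frequent value (mode) for the others."""
    result = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if column in numeric_columns:
            result[column] = mean(values)
        else:
            result[column] = Counter(values).most_common(1)[0][0]
    return result


# Hypothetical members of one cluster (not taken from the bank data set).
cluster = [
    {"age": 30, "sex": "MALE", "car": "NO"},
    {"age": 40, "sex": "MALE", "car": "NO"},
    {"age": 35, "sex": "FEMALE", "car": "YES"},
]

cluster_centroid = centroid(cluster, numeric_columns={"age"})
female_members = sum(1 for row in cluster if row["sex"] == "FEMALE")
print(cluster_centroid, female_members)
```

    The centroid reports sex = MALE and car = NO because those are the majority values, even though one of the three members is a woman who owns a car.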
  • 11. Cluster 0: Consists predominantly of men aged around 35 who live in the inner city and have no car and no children. This cluster consists of men in the early stages of their career.
    Cluster 1: Consists predominantly of women aged around 53 who live in rural areas and have a car and children. They also earn more than the other clusters. This cluster predominantly consists of women in their fifties.
    Cluster 2: Consists predominantly of men aged around 43 who live in the inner city and have a car and children. They also earn more than cluster 0. This cluster predominantly consists of men in their forties, further along in their career.
    Cluster 3: Consists predominantly of women aged around 40 who live in towns and have no car and no children. They also earn less than the women in cluster 1. This cluster predominantly consists of women in their early forties.