Weka Term Paper
Submission
ITB Assignment




 Sathiyaseelan M
 10BM60080
Table of Contents
1.   Classification via Decision Trees
     1.1   Car Evaluation Database
     1.2   J48 pruned classification tree
     1.3   Summary of Results
     1.4   Simplified Decision Tree
     1.5   Test Set
2.   K-Means Clustering
     2.1   Bank Database
     2.2   Summary of Results
     2.3   Cluster Explanation

1. Classification via Decision Trees


The Car Evaluation Database contains data on six attributes: buying price, maintenance price, number of
persons, number of doors, safety and size of the luggage boot. Certain attributes related to structural
information have been removed to simplify the analysis. Because the underlying concept structure is
known, this database may be particularly useful for testing constructive induction and structure
discovery methods.

1.1 Car Evaluation Database
This model evaluates cars according to the following concept structure.

PRICE
           buying      buying price
           maint       price of the maintenance
TECHNICAL CHARACTERISTICS
           ...         (removed for simplification of analysis)
COMFORT
           doors       number of doors
           persons     capacity in terms of persons to carry
           lug_boot    the size of luggage boot
SAFETY
           safety      estimated safety of the car

Number of Instances: 1728

Attribute Values

buying      vhigh, high, med, low
maint       vhigh, high, med, low
doors       2, 3, 4, 5more
persons     2, 4, more
lug_boot    small, med, big
safety      low, med, high

class       N    N[%]         Meaning
-------------------------------------------
unacc    1210    (70.023 %)   Unacceptable
acc       384    (22.222 %)   Acceptable
good       69    ( 3.993 %)   Good
vgood      65    ( 3.762 %)   Very Good

J48 (Weka's implementation of the C4.5 algorithm) is used for classification.
Test mode: 10-fold cross-validation, with the minimum number of objects per leaf set to 2.
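The sketch below shows how 10-fold cross-validation partitions the data: every instance is held out for testing exactly once, and each time the model is trained on the other nine folds. Weka stratifies the folds when the class is nominal; the round-robin stratification used here is a simplified stand-in, not Weka's exact procedure, and the function names and seed are illustrative assumptions rather than Weka's API.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with similar class proportions.

    Simplified stratification: shuffle each class's indices, then deal them
    round-robin across the folds, carrying the position over between classes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[slot % k].append(index)
            slot += 1
    return folds


def cross_validation_splits(labels, k=10, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Class labels with the same counts as the Car Evaluation data.
labels = ["unacc"] * 1210 + ["acc"] * 384 + ["good"] * 69 + ["vgood"] * 65
for train, test in cross_validation_splits(labels):
    print(len(train), len(test), sum(labels[i] == "good" for i in test))
```

Each test fold ends up with 172 or 173 instances and 6 or 7 of the 69 good cars, so even the rare classes appear in every fold.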
1.2 J48 pruned classification tree
safety = low: unacc (576.0)
safety = med
| persons = 2: unacc (192.0)
| persons = 4
| | buying = vhigh
| | | maint = vhigh: unacc (12.0)
| | | maint = high: unacc (12.0)
| | | maint = med
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = low
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | buying = high
| | | lug_boot = small: unacc (16.0)
| | | lug_boot = med
| | | | doors = 2: unacc (4.0)
| | | | doors = 3: unacc (4.0)
| | | | doors = 4: acc (4.0/1.0)
| | | | doors = 5more: acc (4.0/1.0)
| | | lug_boot = big
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | buying = med
| | | maint = vhigh
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = high
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = med: acc (12.0)
| | | maint = low
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| | buying = low
| | | maint = vhigh
| | | | lug_boot = small: unacc (4.0)
| | | | lug_boot = med: unacc (4.0/2.0)
| | | | lug_boot = big: acc (4.0)
| | | maint = high: acc (12.0)
| | | maint = med
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| | | maint = low
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: good (4.0)
| persons = more
| | lug_boot = small
| | | buying = vhigh: unacc (16.0)
| | | buying = high: unacc (16.0)
| | | buying = med
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = low
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0/1.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | lug_boot = med
| | | buying = vhigh
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = high
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0/1.0)
| | | | maint = med: acc (4.0/1.0)
| | | | maint = low: acc (4.0/1.0)
| | | buying = med: acc (16.0/5.0)
| | | buying = low
| | | | maint = vhigh: acc (4.0/1.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: good (4.0/1.0)
| | | | maint = low: good (4.0/1.0)
| | lug_boot = big
| | | buying = vhigh
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: unacc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | | buying = high
| | | | maint = vhigh: unacc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: acc (4.0)
| | | buying = med
| | | | maint = vhigh: acc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: acc (4.0)
| | | | maint = low: good (4.0)
| | | buying = low
| | | | maint = vhigh: acc (4.0)
| | | | maint = high: acc (4.0)
| | | | maint = med: good (4.0)
| | | | maint = low: good (4.0)
safety = high
| persons = 2: unacc (192.0)
| persons = 4
| | buying = vhigh
| | | maint = vhigh: unacc (12.0)
| | | maint = high: unacc (12.0)
| | | maint = med: acc (12.0)
| | | maint = low: acc (12.0)
| | buying = high
| | | maint = vhigh: unacc (12.0)
| | | maint = high: acc (12.0)
| | | maint = med: acc (12.0)
| | | maint = low: acc (12.0)
| | buying = med
| | | maint = vhigh: acc (12.0)
| | | maint = high: acc (12.0)
| | | maint = med
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | buying = low
| | | maint = vhigh: acc (12.0)
| | | maint = high
| | | | lug_boot = small: acc (4.0)
| | | | lug_boot = med: acc (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = med
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| | | maint = low
| | | | lug_boot = small: good (4.0)
| | | | lug_boot = med: good (4.0/2.0)
| | | | lug_boot = big: vgood (4.0)
| persons = more
| | buying = vhigh
|   |   | maint = vhigh: unacc (12.0)
|   |   | maint = high: unacc (12.0)
|   |   | maint = med: acc (12.0/1.0)
|   |   | maint = low: acc (12.0/1.0)
|   |   buying = high
|   |   | maint = vhigh: unacc (12.0)
|   |   | maint = high: acc (12.0/1.0)
|   |   | maint = med: acc (12.0/1.0)
|   |   | maint = low: acc (12.0/1.0)
|   |   buying = med
|   |   | maint = vhigh: acc (12.0/1.0)
|   |   | maint = high: acc (12.0/1.0)
|   |   | maint = med
|   |   | | lug_boot = small: acc (4.0/1.0)
|   |   | | lug_boot = med: vgood (4.0/1.0)
|   |   | | lug_boot = big: vgood (4.0)
|   |   | maint = low
|   |   | | lug_boot = small: good (4.0/1.0)
|   |   | | lug_boot = med: vgood (4.0/1.0)
|   |   | | lug_boot = big: vgood (4.0)
|   |   buying = low
|   |   | maint = vhigh: acc (12.0/1.0)
|   |   | maint = high
|   |   | | lug_boot = small: acc (4.0/1.0)
|   |   | | lug_boot = med: vgood (4.0/1.0)
|   |   | | lug_boot = big: vgood (4.0)
|   |   | maint = med
|   |   | | lug_boot = small: good (4.0/1.0)
|   |   | | lug_boot = med: vgood (4.0/1.0)
|   |   | | lug_boot = big: vgood (4.0)
|   |   | maint = low
|   |   | | lug_boot = small: good (4.0/1.0)
|   |   | | lug_boot = med: vgood (4.0/1.0)
|   |   | | lug_boot = big: vgood (4.0)


Number of Leaves :                    131
Size of the tree :                    182
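The two counts are related. In J48's text output every printed line is a branch leading to a child node, the root has no line of its own, and a line that ends in a class label (after a colon) is a leaf. Assuming that format, a small sketch can recover both figures from the printed tree: the tree above has 181 branch lines, 131 of which end in a leaf, giving a size of 182 and leaving 51 internal decision nodes. The function name `tree_stats` and the toy tree are hypothetical.

```python
def tree_stats(tree_text):
    """Return (number of leaves, size of tree) for a tree in J48's text format.

    Assumes each non-empty line is one branch to a child node and that a
    branch ending in a leaf contains ':' (e.g. "safety = low: unacc (576.0)").
    The root has no line of its own, so size = 1 + number of branch lines.
    """
    branches = [line for line in tree_text.splitlines() if line.strip()]
    leaves = sum(":" in line for line in branches)
    return leaves, 1 + len(branches)


# Hypothetical toy tree in the same format (not taken from the car data).
EXAMPLE = """\
outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy: yes (5.0/2.0)
"""
print(tree_stats(EXAMPLE))  # (4, 6)
```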


1.3 Summary of Results

Correctly Classified Instances      1596      92.3611 %
Incorrectly Classified Instances    132        7.6389 %
Kappa statistic                     0.8343
Mean absolute error                 0.0421
Root mean squared error             0.1718
Relative absolute error             18.3833 %
Root relative squared error         50.8176 %
Coverage of cases (0.95 level)      97.2222 %
Mean rel. region size (0.95 level)  29.1088 %
Total Number of Instances           1728

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.962    0.064    0.972      0.962   0.967      0.983     unacc
               0.867    0.047    0.841      0.867   0.854      0.962     acc
               0.609    0.011    0.689      0.609   0.646      0.918     good
               0.877    0.01     0.77       0.877   0.82       0.995     vgood
Weighted Avg.  0.924    0.056    0.924      0.924   0.924      0.976

=== Confusion Matrix ===

                  Unacceptable (a)  Acceptable (b)  Good (c)  Very Good (d)
Unacceptable (a)              1164              43         3              0
Acceptable (b)                  33             333        11              7
Good (c)                         0              17        42             10
Very Good (d)                    0               3         5             57

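Rows give the actual class and columns the predicted class. Diagonal elements are correctly classified
instances; all off-diagonal elements are misclassifications.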
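Every summary figure that depends only on the predicted labels can be recomputed from this matrix. The sketch below derives accuracy, Cohen's kappa, and the per-class TP rate (recall), FP rate and precision. The error measures (MAE, RMSE), coverage and ROC areas depend on Weka's predicted class probabilities and cannot be recovered this way. The function name `confusion_metrics` is illustrative.

```python
# Confusion matrix from above: rows = actual class, columns = predicted class.
CLASSES = ["unacc", "acc", "good", "vgood"]
MATRIX = [
    [1164,  43,  3,  0],
    [  33, 333, 11,  7],
    [   0,  17, 42, 10],
    [   0,   3,  5, 57],
]


def confusion_metrics(matrix, names):
    """Accuracy, Cohen's kappa and per-class rates from a square confusion matrix."""
    k = len(matrix)
    n = sum(map(sum, matrix))
    actual = [sum(row) for row in matrix]                                 # row totals
    predicted = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # column totals
    correct = sum(matrix[i][i] for i in range(k))
    accuracy = correct / n
    # Chance agreement expected from the row and column totals alone.
    chance = sum(actual[i] * predicted[i] for i in range(k)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)
    per_class = {}
    for i, name in enumerate(names):
        tp = matrix[i][i]
        per_class[name] = {
            "tp_rate": tp / actual[i],                          # = recall
            "fp_rate": (predicted[i] - tp) / (n - actual[i]),
            "precision": tp / predicted[i],
        }
    return correct, accuracy, kappa, per_class


correct, accuracy, kappa, per_class = confusion_metrics(MATRIX, CLASSES)
print(correct, round(accuracy, 4), round(kappa, 4))  # 1596 0.9236 0.8343
for name, rates in per_class.items():
    print(name, {key: round(value, 3) for key, value in rates.items()})
```

The recomputed values agree with the Weka summary and the per-class table above (for example, precision 0.972 and FP rate 0.064 for unacc).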


1.4 Simplified Decision Tree

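Repeating the experiment with 10-fold cross-validation and the minimum number of objects per leaf raised
to 25 (to simplify the classification tree) produced an accuracy of 81.3079%. The reduction in accuracy
comes from the stricter minimum number of objects: every leaf must now cover at least 25 instances, so the
tree can no longer make the fine-grained splits of the full tree, many of whose leaves hold only 4
instances. Below is the simplified version of the classification tree.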




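The effect of the minimum-objects setting can be illustrated with the split check used by C4.5. The sketch assumes the C4.5 rule that a candidate split is accepted only if at least two of its branches receive at least the minimum number of training instances; the function name `split_allowed` is hypothetical. The branch sizes are leaf counts printed in the full tree of section 1.2, which was built on all 1728 instances, so fold-level counts during cross-validation would be slightly smaller.

```python
def split_allowed(branch_sizes, min_num_obj):
    """Check the (assumed) C4.5 minimum-objects rule for a candidate split.

    The split is accepted only if at least two branches each receive at
    least `min_num_obj` training instances.
    """
    return sum(size >= min_num_obj for size in branch_sizes) >= 2


# lug_boot split under safety = med, persons = 4, buying = vhigh, maint = med
# in the full tree: three leaves of 4 instances each.
print(split_allowed([4, 4, 4], 2))    # True: allowed with the setting of 2
print(split_allowed([4, 4, 4], 25))   # False: blocked with the setting of 25

# The root split on safety (576 instances per value) survives either setting.
print(split_allowed([576, 576, 576], 25))  # True
```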
1.5 Test Set


When the model was applied to the test set, it correctly classified 94.87% of the instances.

2. K-Means Clustering

This example illustrates the use of k-means with Weka.

2.1 Bank Database


The sample data set used in this example comes from a bank that records each customer's age, gender,
region type, income, marital status, number of children, car ownership and mortgage status. The bank
wants to find the savings pattern of its customers of the age group … The data set has 600 instances and
8 attributes, declared as follows.

       @attribute age numeric
       @attribute sex {FEMALE,MALE}
       @attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
       @attribute income numeric
       @attribute married {NO,YES}
       @attribute children {0,1,2,3}
       @attribute car {NO,YES}
       @attribute mortgage {NO,YES}
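Two of these attributes (age and income) are numeric and the remaining six are nominal, a distinction that matters for k-means because the two kinds are summarized differently (mean versus mode). A minimal sketch of reading the declarations follows; it handles only the numeric and brace-enclosed nominal forms used here, not the full ARFF syntax, and the names `parse_attributes` and `HEADER` are hypothetical.

```python
import re

ATTRIBUTE = re.compile(r"@attribute\s+(\S+)\s+(.+)", re.IGNORECASE)


def parse_attributes(header):
    """Parse '@attribute' lines into (name, kind, values) tuples.

    Only the forms used in this data set are handled: 'numeric' and a
    brace-enclosed list of nominal values. Quoted names, string and date
    attributes are out of scope for this sketch.
    """
    attributes = []
    for line in header.splitlines():
        match = ATTRIBUTE.match(line.strip())
        if not match:
            continue
        name, spec = match.group(1), match.group(2).strip()
        if spec.lower() == "numeric":
            attributes.append((name, "numeric", None))
        elif spec.startswith("{") and spec.endswith("}"):
            values = [v.strip() for v in spec[1:-1].split(",")]
            attributes.append((name, "nominal", values))
        else:
            raise ValueError(f"unsupported attribute type: {spec}")
    return attributes


HEADER = """\
@attribute age numeric
@attribute sex {FEMALE,MALE}
@attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
@attribute income numeric
@attribute married {NO,YES}
@attribute children {0,1,2,3}
@attribute car {NO,YES}
@attribute mortgage {NO,YES}
"""
for name, kind, values in parse_attributes(HEADER):
    print(f"{name:9s} {kind:8s} {values or ''}")
```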

2.2 Summary of Results


Number of iterations: 5
Within cluster sum of squared errors: 1201.3638013812113
Missing values globally replaced with mean/mode




Time taken to build model (full training data) : 0.08 seconds

Clustered Instances

0      163 ( 27%)
1      100 ( 17%)
2      159 ( 27%)
3      178 ( 30%)

The figure below plots customer age against income for the different clusters.




The plot above gives a glimpse of the clusters: age and income appear to be the main variables that
separate them.

2.3 Cluster Explanation


Cluster centroids summarize each cluster: for a numeric attribute the centroid value is the mean of that
attribute within the cluster, and for a nominal attribute it is the most frequent value (the mode).
Centroids can therefore be used to characterize the clusters. For example, sex = MALE in the centroid of
cluster 0 means that the cluster is centered on the male population, not that it contains only men.

Cluster 0: predominantly male customers around 35 years old, living in the inner city, typically with
neither a car nor children. This cluster represents men in the early stages of their careers.

Cluster 1: predominantly female customers around 53 years old, living in rural areas, typically with a
car and children. Their average income is higher than that of any other cluster. This cluster mainly
represents women in their fifties.

Cluster 2: predominantly male customers around 43 years old, living in the inner city, typically with a
car and children. Their average income is higher than that of cluster 0. This cluster mainly represents
men in their forties who are further along in their careers.

Cluster 3: predominantly female customers around 40 years old, living in towns, typically with neither a
car nor children. Their average income is lower than that of the women in cluster 1. This cluster mainly
represents women in their early forties.
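The centroid rule described at the start of this section (mean for numeric attributes, mode for nominal ones) can be sketched as follows. The function name `centroid` and the toy cluster are hypothetical, not taken from the bank data set, and tie-breaking in the mode may differ from Weka's.

```python
from collections import Counter
from statistics import mean


def centroid(members, kinds):
    """Summarize a cluster: mean of numeric attributes, mode of nominal ones.

    `members` is a list of dicts (one per customer) and `kinds` maps each
    attribute name to "numeric" or "nominal". Ties in the mode go to the
    value seen first.
    """
    summary = {}
    for attribute, kind in kinds.items():
        values = [member[attribute] for member in members]
        if kind == "numeric":
            summary[attribute] = mean(values)
        else:
            summary[attribute] = Counter(values).most_common(1)[0][0]
    return summary


# Hypothetical toy cluster, not taken from the bank data set.
cluster = [
    {"age": 33, "sex": "MALE", "car": "NO"},
    {"age": 36, "sex": "MALE", "car": "NO"},
    {"age": 39, "sex": "FEMALE", "car": "YES"},
]
kinds = {"age": "numeric", "sex": "nominal", "car": "nominal"}
print(centroid(cluster, kinds))
```

The centroid reads age 36, sex = MALE and car = NO even though one of the three members is a woman with a car, which is the same caveat noted above for cluster 0.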

Mais conteúdo relacionado

Último

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 

Último (20)

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 

Destaque

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 

Destaque (20)

Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 

10BM60080 - Weka Term Paper

  • 1. Weka Term Paper Submission ITB Assignment Sathiyaseelan M 10BM60080
  • 2. Table of Contents 1. Classification via Decision Trees ....................................................................................................... 3 1.1 Car Evaluation Database........................................................................................................... 3 1.2 J48 pruned classification tree ................................................................................................... 4 1.3 Summary of Results................................................................................................................. 6 1.4 Simplified Decision Tree ........................................................................................................... 7 1.5 Test Set .................................................................................................................................... 7 2 K-Means Clustering .......................................................................................................................... 8 2.1 Bank Database ......................................................................................................................... 8 2.2 Summary of Results.................................................................................................................. 9 2.3 Cluster Explanation ................................................................................................................ 10
1. Classification via Decision Trees

The Car Evaluation Database contains data on six attributes: buying price, maintenance price, number of persons, number of doors, safety and size of the luggage boot. The structural information (the intermediate concepts of the underlying hierarchical model) has been removed to simplify the analysis. Because of this known underlying concept structure, the database may be particularly useful for testing constructive induction and structure discovery methods.

1.1 Car Evaluation Database

This model evaluates cars according to the following concept structure:

PRICE
    buying      buying price
    maint       price of the maintenance
TECHNICAL CHARACTERISTICS
    ……. (removed for simplification of analysis)
COMFORT
    doors       number of doors
    persons     capacity in terms of persons to carry
    lug_boot    size of the luggage boot
SAFETY
    safety      estimated safety of the car

Number of Instances: 1728

Attribute values:
    buying      vhigh, high, med, low
    maint       vhigh, high, med, low
    doors       2, 3, 4, 5more
    persons     2, 4, more
    lug_boot    small, med, big
    safety      low, med, high

Class distribution:
    class     N      N[%]
    unacc     1210   (70.023 %)   Unacceptable
    acc        384   (22.222 %)   Acceptable
    good        69   ( 3.993 %)   Good
    vgood       65   ( 3.762 %)   Very Good

J48, Weka's implementation of the C4.5 algorithm, is used for classification.
Test mode: 10-fold cross-validation; minimum number of objects per leaf = 2.
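J48 builds its tree in the C4.5 manner. At each node it chooses the attribute test with the best gain ratio, which is information gain divided by the entropy of the split itself. The sketch below is plain Python, not Weka code, and the helper names entropy and gain_ratio are hypothetical. It computes both quantities for class-count vectors. Applied to the class distribution above, the class entropy comes to about 1.21 bits. Real C4.5 also handles numeric attributes and only considers tests whose gain is at least average. This sketch omits both of those steps.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gain_ratio(parent_counts, child_counts):
    """Simplified C4.5-style gain ratio of one candidate split.

    parent_counts: class counts at the node before splitting.
    child_counts: one class-count vector per branch of the split.
    """
    n = sum(parent_counts)
    sizes = [sum(child) for child in child_counts]
    # Information gain: parent entropy minus the size-weighted entropy of the branches.
    remainder = sum(size / n * entropy(child) for size, child in zip(sizes, child_counts))
    gain = entropy(parent_counts) - remainder
    # Split information: entropy of the branch sizes, penalising many-valued tests.
    split_info = entropy(sizes)
    return gain / split_info if split_info > 0 else 0.0


# Class distribution of the Car Evaluation data: unacc, acc, good, vgood.
car_class_counts = [1210, 384, 69, 65]
print(round(entropy(car_class_counts), 3))  # 1.206
```

A split that separates the classes perfectly has a gain ratio of 1 for two equal branches. A split that leaves every branch with the parent's class mix scores 0.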
1.2 J48 pruned classification tree

safety = low: unacc (576.0)
safety = med
|   persons = 2: unacc (192.0)
|   persons = 4
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   buying = high
|   |   |   lug_boot = small: unacc (16.0)
|   |   |   lug_boot = med
|   |   |   |   doors = 2: unacc (4.0)
|   |   |   |   doors = 3: unacc (4.0)
|   |   |   |   doors = 4: acc (4.0/1.0)
|   |   |   |   doors = 5more: acc (4.0/1.0)
|   |   |   lug_boot = big
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   buying = med
|   |   |   maint = vhigh
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   |   buying = low
|   |   |   maint = vhigh
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   persons = more
|   |   lug_boot = small
|   |   |   buying = vhigh: unacc (16.0)
|   |   |   buying = high: unacc (16.0)
|   |   |   buying = med
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0/1.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   lug_boot = med
|   |   |   buying = vhigh
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = high
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0/1.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = med: acc (16.0/5.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: acc (4.0/1.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: good (4.0/1.0)
|   |   |   |   maint = low: good (4.0/1.0)
|   |   lug_boot = big
|   |   |   buying = vhigh
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   |   buying = high
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   |   buying = med
|   |   |   |   maint = vhigh: acc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: good (4.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: acc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: good (4.0)
|   |   |   |   maint = low: good (4.0)
safety = high
|   persons = 2: unacc (192.0)
|   persons = 4
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low: acc (12.0)
|   |   buying = high
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low: acc (12.0)
|   |   buying = med
|   |   |   maint = vhigh: acc (12.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   buying = low
|   |   |   maint = vhigh: acc (12.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   persons = more
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med: acc (12.0/1.0)
|   |   |   maint = low: acc (12.0/1.0)
|   |   buying = high
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: acc (12.0/1.0)
|   |   |   maint = med: acc (12.0/1.0)
|   |   |   maint = low: acc (12.0/1.0)
|   |   buying = med
|   |   |   maint = vhigh: acc (12.0/1.0)
|   |   |   maint = high: acc (12.0/1.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   buying = low
|   |   |   maint = vhigh: acc (12.0/1.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: acc (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)

Number of Leaves : 131
Size of the tree : 182

1.3 Summary of Results

Correctly Classified Instances          1596      92.3611 %
Incorrectly Classified Instances         132       7.6389 %
Kappa statistic                            0.8343
Mean absolute error                        0.0421
Root mean squared error                    0.1718
Relative absolute error                   18.3833 %
Root relative squared error               50.8176 %
Coverage of cases (0.95 level)            97.2222 %
Mean rel. region size (0.95 level)        29.1088 %
Total Number of Instances               1728

=== Detailed Accuracy By Class ===

                TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                0.962    0.064    0.972      0.962   0.967      0.983     unacc
                0.867    0.047    0.841      0.867   0.854      0.962     acc
                0.609    0.011    0.689      0.609   0.646      0.918     good
                0.877    0.01     0.77       0.877   0.82       0.995     vgood
Weighted Avg.   0.924    0.056    0.924      0.924   0.924      0.976
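As a cross-check, the accuracy, kappa statistic and per-class rates above can be recomputed from the confusion matrix that Weka prints with them. The sketch below is plain Python written for this purpose, not Weka code, and the function name metrics_from_confusion is hypothetical. It assumes the layout of Weka's output, where rows are actual classes and columns are predicted classes. The error measures and ROC areas depend on the predicted class probabilities, so the sketch does not cover them.

```python
def metrics_from_confusion(matrix, labels):
    """Recompute summary figures from a confusion matrix.

    matrix[i][j] = number of instances of actual class i predicted as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    correct = sum(matrix[i][i] for i in range(k))

    accuracy = correct / n
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = col_totals[i] - tp  # predicted as this class, actually another class
        fn = row_totals[i] - tp  # actually this class, predicted as another class
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # identical to the TP rate
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        per_class[label] = {
            "tp_rate": recall,
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return accuracy, kappa, per_class


labels = ["unacc", "acc", "good", "vgood"]
confusion = [
    [1164, 43, 3, 0],
    [33, 333, 11, 7],
    [0, 17, 42, 10],
    [0, 3, 5, 57],
]
accuracy, kappa, per_class = metrics_from_confusion(confusion, labels)
print(f"accuracy={accuracy:.4%} kappa={kappa:.4f}")
for label, m in per_class.items():
    print(label, round(m["tp_rate"], 3), round(m["fp_rate"], 3), round(m["precision"], 3))
```

Running it on the counts shown in the confusion matrix reproduces the 92.3611 % accuracy, the 0.8343 kappa and the per-class rates in the table above, up to rounding.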
=== Confusion Matrix ===

                          Predicted
Actual                (a) unacc  (b) acc  (c) good  (d) vgood
(a) Unacceptable         1164        43         3          0
(b) Acceptable             33       333        11          7
(c) Good                    0        17        42         10
(d) Very Good               0         3         5         57

The diagonal elements are correctly classified instances, and every off-diagonal element is a misclassification. Rows are actual classes and columns are predicted classes, so each row sums to the class count given in Section 1.1.

1.4 Simplified Decision Tree

The run was repeated with 10 folds and a minimum of 25 objects per leaf in order to simplify the classification tree. This produced an accuracy of 81.3079%. The drop in accuracy comes from raising the minimum number of objects per leaf. Larger leaves force a much smaller tree, and that tree cannot capture the finer distinctions between the classes. Below is the simplified version of the classification tree.

1.5 Test Set

When applied to the test set, the model correctly classified 94.87% of the instances.
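The 10-fold cross-validation used in Section 1 splits the 1728 instances into 10 folds. Each fold is tested once by a tree trained on the other nine, and the predictions from all ten test folds are pooled into the reported figures. Weka stratifies the folds when the class is nominal, as it is here, so each fold keeps roughly the overall class mix. The plain-Python sketch below illustrates only that fold assignment. The function names, the seed and the round-robin dealing scheme are assumptions for illustration, not Weka's exact procedure.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign instance indices to k folds so each fold keeps roughly the class mix.

    labels: one class label per instance.
    Returns a list of k lists of instance indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's instances round-robin across the folds; the running
        # position carries over between classes so fold sizes stay balanced.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validation_splits(labels, k=10, seed=1):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        yield train, test


# Class labels rebuilt from the class counts in Section 1.1.
car_labels = ["unacc"] * 1210 + ["acc"] * 384 + ["good"] * 69 + ["vgood"] * 65
print([len(fold) for fold in stratified_folds(car_labels)])
```

With 1728 instances, each fold receives 172 or 173 of them. Each fold also holds 6 or 7 of the 69 "good" cars, so no fold is left without a rare class.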
2 K-Means Clustering

This example illustrates the use of k-means clustering in Weka.

2.1 Bank Database

The sample data set used for this example comes from a bank. It records each customer's age, gender, region type, income, marital status, number of children, car ownership and mortgage status. The bank wants to find the savings patterns of its customers by age group. The data set has 600 instances and 8 attributes, whose values are listed below.

    @attribute age numeric
    @attribute sex {FEMALE,MALE}
    @attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
    @attribute income numeric
    @attribute married {NO,YES}
    @attribute children {0,1,2,3}
    @attribute car {NO,YES}
    @attribute mortgage {NO,YES}

2.2 Summary of Results

Number of iterations: 5
Within cluster sum of squared errors: 1201.3638013812113
Missing values globally replaced with mean/mode
Time taken to build model (full training data): 0.08 seconds

Clustered Instances
    0    163 (27%)
    1    100 (17%)
    2    159 (27%)
    3    178 (30%)
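The iteration count and the within-cluster sum of squared errors above come from Lloyd-style k-means. The algorithm assigns each point to its nearest centroid and moves each centroid to the mean of its members. It repeats these two steps until no assignment changes. The sketch below is plain Python and assumes purely numeric vectors. Weka's SimpleKMeans also handles nominal attributes and normalizes numeric ones by default, and this sketch does neither. The function names and the toy (age, income) values are hypothetical and are not the bank data.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=10, max_iterations=100):
    """Plain Lloyd's k-means on numeric vectors.

    Returns (centroids, assignments, within_cluster_sse).
    """
    rng = random.Random(seed)
    # Initialise with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse


# Hypothetical (age, income) pairs already scaled to [0, 1].
toy = [(0.10, 0.20), (0.12, 0.18), (0.15, 0.22), (0.80, 0.90), (0.82, 0.85), (0.78, 0.88)]
centroids, assignments, sse = k_means(toy, k=2)
print(assignments, round(sse, 5))
```

The within-cluster SSE always depends on the starting centroids, so a different seed can give a different local optimum. This is why Weka's SimpleKMeans exposes a seed.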
The figure below plots customer age against income for the various clusters. It gives a glimpse of the clusters, and it can be observed that age and income are significant variables in determining them.

2.3 Cluster Explanation

Cluster centroids are the mean vectors of each cluster. Each numeric dimension of a centroid is the mean value of that attribute in the cluster, and each nominal dimension is the attribute's most frequent value (the mode). Centroids can therefore be used to characterize the clusters. For example, Sex = MALE in the centroid of cluster 0 implies that the cluster is centered on the male population. It does not imply that the cluster contains only men.
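The example below makes the centroid interpretation concrete. For a nominal attribute the centroid holds the most frequent value, so a cluster whose centroid says MALE can still contain women. This minimal Python sketch uses hypothetical records (not the bank data) and a hypothetical function name, cluster_centroid. It takes the mean of each numeric field and the mode of each nominal field.

```python
from collections import Counter
from statistics import mean


def cluster_centroid(records, numeric_fields, nominal_fields):
    """Summarise one cluster: mean for numeric fields, mode for nominal fields."""
    summary = {}
    for field in numeric_fields:
        summary[field] = mean(r[field] for r in records)
    for field in nominal_fields:
        # most_common(1) returns [(value, count)] for the most frequent value.
        summary[field] = Counter(r[field] for r in records).most_common(1)[0][0]
    return summary


# Hypothetical members of one cluster.
cluster = [
    {"age": 34, "income": 21000.0, "sex": "MALE", "car": "NO"},
    {"age": 37, "income": 24500.0, "sex": "MALE", "car": "NO"},
    {"age": 33, "income": 19800.0, "sex": "FEMALE", "car": "YES"},
]
summary = cluster_centroid(cluster, ["age", "income"], ["sex", "car"])
print(summary)
```

The centroid reports sex = MALE and car = NO, even though one member of the cluster is a woman who owns a car.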
Cluster 0: A predominantly male population aged around 35, living in the inner city, with neither a car nor children. This cluster consists of men in the early stages of their careers.

Cluster 1: A predominantly female population aged around 53, living in rural areas, with a car and children. They also earn more than the other clusters. This cluster predominantly consists of women in their fifties.

Cluster 2: A predominantly male population aged around 43, living in the inner city, with a car and children. They also earn more than cluster 0. This cluster predominantly consists of men in their forties who are further along in their careers.

Cluster 3: A predominantly female population aged around 40, living in a town, with neither a car nor children. They earn less than the women in cluster 1. This cluster predominantly consists of women in their early forties.