Data Mining a
Mushroom Dataset
Raymond Borges
Jarilyn Hernandez
Outline
 Background
 Introduction
 Hypotheses
 Methodology
 Results
 Conclusions
 Future Work
Background
Previous Work
The Mushroom Dataset
 Hypothetical examples of 23 species from
  Agaricus and Lepiota families
 Class attribute: Edibility
Edible (4,208): 51.8%
Poisonous (3,916): 48.2%

Data Set Characteristics:   Multivariate   Number of Instances:   8124   Area:          Life
Attribute Characteristics:  Categorical    Number of Attributes:  22     Date Donated:  1987
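The class percentages follow directly from the instance counts. A quick check, using only the numbers on this slide (variable names are illustrative):

```python
# Class counts as reported on the slide.
edible = 4208
poisonous = 3916
total = edible + poisonous  # should match the listed number of instances

# Share of each class, rounded to one decimal place as on the slide.
edible_pct = round(100 * edible / total, 1)
poisonous_pct = round(100 * poisonous / total, 1)

print(total, edible_pct, poisonous_pct)
```

The two classes are close to balanced, so plain accuracy is a reasonable first measure for the rules that follow.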
Benchmark ruleset
1. Odor = not (almond or anise or none)
(120 poisonous cases missed, 98.52% accuracy)

2. Spore-print-color = green
(48 cases missed, 99.41% accuracy)

3. Odor = none and stalk-surface-below-ring = scaly
 and stalk-color-above-ring = not brown
(8 cases missed, 99.90% accuracy)

4. Habitat = leaves and cap-color = white, or
   Population = clustered and cap-color = white
(100% accuracy)
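The ruleset above can be sketched as a single predicate that flags a mushroom as poisonous when any rule fires. This is a minimal sketch, assuming each record is a dict with spelled-out attribute values (the raw data file may encode them differently); the function names and the sample records are hypothetical.

```python
TOTAL = 8124  # instances in the dataset, from the slide

def is_poisonous(m):
    """Return True if any of the four benchmark rules fires for record m."""
    rule1 = m["odor"] not in {"almond", "anise", "none"}
    rule2 = m["spore-print-color"] == "green"
    rule3 = (m["odor"] == "none"
             and m["stalk-surface-below-ring"] == "scaly"
             and m["stalk-color-above-ring"] != "brown")
    # The slide lists population = clustered as an alternative form of rule 4;
    # only the habitat form is used here.
    rule4 = m["habitat"] == "leaves" and m["cap-color"] == "white"
    return rule1 or rule2 or rule3 or rule4

def accuracy_after_missing(missed, total=TOTAL):
    """Accuracy in percent when `missed` of `total` cases are misclassified."""
    return round(100 * (total - missed) / total, 2)

# Hypothetical records, not taken from the dataset.
foul = {"odor": "foul", "spore-print-color": "black",
        "stalk-surface-below-ring": "smooth", "stalk-color-above-ring": "white",
        "habitat": "woods", "cap-color": "brown"}
mild = dict(foul, odor="none")

print(is_poisonous(foul), is_poisonous(mild))
print([accuracy_after_missing(n) for n in (120, 48, 8)])
```

The second helper reproduces the accuracy figures quoted after each rule from the number of cases still missed.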
The Mushroom Dataset
22 Attributes
18 Visually
on Mushroom

4 Others
1 Habitat
1 Population
1 Bruises
1 Odor
Visual Attribute ruleset
Only 4 attributes (100% accuracy)
1.   Stalk surface above ring = not silky and ring
     number = not one, (79% accuracy JRIP)

2.   Population not clustered(80% accuracy J48)

Once retrieved, test these two rules:
3. Odor = not bad, (98% accuracy J48)
4. Spore print color = not green, (100% J48)
Results
 Odor and spore color may be the best
  attributes statistically but in the field

 Focused on visual-cue attributes, e.g.
 habitat, population, cap and stalk

 Obtained a more practical classification
Introduction
Project III
Introduction
Taking into account human
Based on:
  Lighting conditions
  Mushroom stage in lifecycle
  Humidity
  Seasons
  Human senses?
  other unknown factors…
Introduction
Some attributes are difficult to discern:
textures, shapes, or colors like:
 Brown
 Chocolate
 Buff
 Cinnamon
Hypotheses
1.   Complex attributes = Higher error probability
2.   Human senses + external factors = Big impact

So…
Ruleset will change to approach reality
Some attributes will fare much better than others
Methodology
Collect survey responses:

1.   Evaluate species in different
     conditions

2.   Measure overall accuracy

3.   Weight attributes based on survey
     performance
Methodology part 1
Take 3 mushroom species
 Agaricus abruptibulbus
 Agaricus augustus
 Lepiota rubrotincta


Place under 2 distinct sets of conditions
Methodology part 2
5 questions per species in each condition

Abruptibulbus under conditions X    Abruptibulbus under conditions Y
Augustus under conditions X         Augustus under conditions Y
Rubrotincta under conditions X      Rubrotincta under conditions Y
Methodology part 3
 Design Tutorial (SurveyMonkey.com)
 Design Website (Weebly.com)


Get people to take survey (hardest part)
 Designed Flyers
 Poster boards
 Business cards
Survey at Mountainlair
Methodology part 4
 Calculate survey test scores
 Calculate species’ accuracy variation
 Calculate attributes’ accuracy variation
 Calculate attribute weights
 Use data mining tools to find best ruleset
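The first calculations in this list can be sketched as follows. The slides do not give the formulas, so this is only an illustration under stated assumptions: responses are stored as (attribute, condition, correct) tuples, accuracy is the percentage of correct answers, and "variation" is taken to be the gap between conditions. The sample responses are hypothetical.

```python
from collections import defaultdict

# Hypothetical responses: (attribute, condition, answered_correctly).
responses = [
    ("cap color", "X", True), ("cap color", "X", True),
    ("cap color", "Y", True), ("cap color", "Y", False),
    ("gill size", "X", False), ("gill size", "X", True),
    ("gill size", "Y", False), ("gill size", "Y", False),
]

def accuracy_by_attribute(rows):
    """Return {attribute: {condition: percent of correct answers}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for attribute, condition, correct in rows:
        cell = counts[attribute][condition]
        cell[0] += int(correct)
        cell[1] += 1
    return {a: {c: 100 * ok / n for c, (ok, n) in conds.items()}
            for a, conds in counts.items()}

def variation(per_condition):
    """Gap between the best and worst condition, in percentage points (assumed definition)."""
    values = per_condition.values()
    return max(values) - min(values)

acc = accuracy_by_attribute(responses)
for attribute, per_condition in acc.items():
    print(attribute, per_condition, variation(per_condition))
```

Attribute weights could then be derived from these two quantities; the weighting actually used is not recoverable from the slide text.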
Weighting Methodology

Results
Overall Survey Results
 30 questions per survey
 15 Attributes measured
 37 completed surveys
 1,110 answered questions
Overall Survey Grades
 A    0
 B    1
 C    7
 D    8
 F   14

 Highest score was 24 out of 30 correct answers
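The totals on this slide can be checked with a few lines. The grading scale is not stated in the slides; the sketch below assumes the common 90/80/70/60 percent cutoffs, and the function name is hypothetical.

```python
QUESTIONS_PER_SURVEY = 30
COMPLETED_SURVEYS = 37

def letter_grade(correct, total=QUESTIONS_PER_SURVEY):
    """Map a raw score to a letter grade on an assumed 90/80/70/60 scale."""
    pct = 100 * correct / total
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if pct >= cutoff:
            return grade
    return "F"

answered = QUESTIONS_PER_SURVEY * COMPLETED_SURVEYS  # total answered questions
best = letter_grade(24)                              # the highest reported score

print(answered, best)
```

Under that assumed scale the best score of 24 out of 30 sits exactly on the B cutoff, which agrees with the single B and the absence of any A in the table.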
Results
[Figure: Survey Accuracy per Attribute; vertical axis from 0% to 100%]
Attribute                   Accuracy (%)   Variation (%)
veil color                  37.8           10.8
ring number                 59.5           5.4
stalk shape                 33.75          18.9
cap shape                   32.45          48.7
cap surface                 32.45          5.4
cap color                   81.1           5.4
gill spacing                78.4           16.2
stalk root                  45.95          21.7
stalk color above ring      59.45          64.9
stalk color below ring      67.6           10.8
gill size                   36.45          13.5
stalk surface below ring    78.4           5.4
ring type                   73             5.4
stalk surface above ring    63.55          2.7
gill color                  63.55          13.5
Weighted Attributes
[Figure: bar chart of attribute weights on a 0 to 100 scale. Values shown, in chart order: 76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55, 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7]
J48 Tree: 99.6% classification accuracy
E = Edible, P = Poisonous

Odor: almond E, creosote P, foul P, anise E, musty P, pungent P, spicy P, fishy P, none (splits further)
Spore print color: black E, brown E, buff E, chocolate E, green P, orange E, purple E, yellow E, white (splits further)
Stalk surface: silky E, scaly P, fibrous E, smooth E
J48 Tree: 99.9% classification accuracy
E = Edible, P = Poisonous

Odor: almond E, creosote P, foul P, anise E, musty P, pungent P, spicy P, fishy P, none (splits further)
Spore print color: black E, brown E, buff E, chocolate E, green P, orange E, purple E, yellow E, white (splits further)
Stalk surface: fibrous E, silky E, smooth E, scaly (splits further)
Ring type: evanescent P, flaring P, zone P, sheathing P, none P, large P, cobwebby P, pendant E
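The 99.9% tree can be transcribed as a nested structure and walked one branch at a time. This is a sketch under several assumptions: the attribute names are inferred from the branch values rather than read off the slide, `stalk-surface` is a placeholder because the text does not say whether the above-ring or below-ring attribute is used, and the `none`, `white`, and `scaly` branches are taken to be the ones that split further because no leaf label appears under them.

```python
# Each internal node is (attribute, {value: subtree or leaf}); leaves are "E" or "P".
TREE = ("odor", {
    "almond": "E", "creosote": "P", "foul": "P", "anise": "E",
    "musty": "P", "pungent": "P", "spicy": "P", "fishy": "P",
    "none": ("spore-print-color", {
        "black": "E", "brown": "E", "buff": "E", "chocolate": "E",
        "green": "P", "orange": "E", "purple": "E", "yellow": "E",
        "white": ("stalk-surface", {  # placeholder attribute name
            "fibrous": "E", "silky": "E", "smooth": "E",
            "scaly": ("ring-type", {
                "evanescent": "P", "flaring": "P", "zone": "P",
                "sheathing": "P", "none": "P", "large": "P",
                "cobwebby": "P", "pendant": "E",
            }),
        }),
    }),
})

def classify(node, mushroom):
    """Follow the branch matching each attribute value until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[mushroom[attribute]]
    return node

# Hypothetical records for illustration.
print(classify(TREE, {"odor": "foul"}))
print(classify(TREE, {"odor": "none", "spore-print-color": "white",
                      "stalk-surface": "scaly", "ring-type": "pendant"}))
```

Most mushrooms are decided at the first split on odor; the deeper levels only matter for odorless specimens.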
Attribute Accuracy
[Figure: bubble chart of survey accuracy (vertical axis, 0 to 100) against attribute complexity (horizontal axis, 0 to 14). Bubble labels give attribute and complexity: Cap Color, 10; Stalk Surface Below, 4; Ring Type, 8; Stalk Color Below, 9; Stalk Surface Above, 4; Ring Number, 3; Stalk Color Above, 9; Stalk Root, 7; Veil Color, 4; Stalk Shape, 2; Cap Surface, 4; Cap Shape, 6]
Conclusions
Conclusion
Complex attributes = Higher error probability
Hypothesis 1: False

Survey answers were actually more accurate the more
complex the attribute

Fat spheres = Complex attributes
Height = Survey accuracy
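One way to quantify this relationship is a Pearson correlation between complexity and survey accuracy. The sketch below is an illustration, not the authors' analysis: it pairs the complexity labels from the bubble chart with the per-attribute accuracies reported in the results, for the 12 attributes whose complexity is legible.

```python
from math import sqrt

# attribute: (complexity, survey accuracy in %), as read from the slides.
points = {
    "cap color": (10, 81.1),
    "stalk surface below ring": (4, 78.4),
    "ring type": (8, 73.0),
    "stalk color below ring": (9, 67.6),
    "stalk surface above ring": (4, 63.55),
    "ring number": (3, 59.5),
    "stalk color above ring": (9, 59.45),
    "stalk root": (7, 45.95),
    "veil color": (4, 37.8),
    "stalk shape": (2, 33.75),
    "cap surface": (4, 32.45),
    "cap shape": (6, 32.45),
}

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(list(points.values()))
print(round(r, 2))
```

On these 12 points the coefficient comes out positive, at roughly 0.5, which is consistent with rejecting Hypothesis 1, although the sample is small.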
Conclusion
Human senses + external factors = Big impact
Hypothesis 2: True
 24% change in correctly identifying attributes
  due to ambient environment conditions

 1.2
    questions answered incorrectly out of 5
 due to ambient environments of mushrooms
Future Work
 Evaluate whether mushroom expertise increases
 attribute identification accuracy

 Measure spore print color and odor in
 surveys?
Questions?

Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Project 3 mushrooms

  • 1. Data Mining a Mushroom Dataset. Raymond Borges, Jarilyn Hernandez
  • 2. Outline: Background; Introduction; Hypotheses; Methodology; Results; Conclusions; Future Work
  • 4. The Mushroom Dataset: Hypothetical examples of 23 species from the Agaricus and Lepiota families. Class attribute: Edibility; Edible (4,208) 51.8%, Poisonous (3,916) 48.2%. Data Set Characteristics: Multivariate; Number of Instances: 8,124; Area: Life; Attribute Characteristics: Categorical; Number of Attributes: 22; Date Donated: 1987
  • 5. Benchmark ruleset: 1. Odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy); 2. Spore-print-color = green (48 cases missed, 99.41% accuracy); 3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy); 4. Habitat = leaves and cap-color = white, or population = clustered and cap-color = white (100% accuracy)
  • 6. The Mushroom Dataset: 22 Attributes. 18 visually on mushroom; 4 others: 1 Habitat, 1 Population, 1 Bruises, 1 Odor
  • 8. Visual Attribute ruleset: Only 4 attributes (100% accuracy). 1. Stalk surface above ring = not silky and ring number = not one (79% accuracy, JRIP); 2. Population = not clustered (80% accuracy, J48). Once retrieved, test these two rules: 3. Odor = not bad (98% accuracy, J48); 4. Spore print color = not green (100%, J48)
  • 9. Results: Odor and spore color may be the best attributes statistically but in the field; Focused on visual-cue attributes, e.g. habitat, population, cap and stalk; Obtained a more practical classification
  • 11. Introduction: Taking into account human Based on: Lighting conditions; Mushroom stage in lifecycle; Humidity; Seasons; Human senses?; other unknown factors…
  • 12. Introduction: Some attributes difficult to discern. Textures, Shapes or Colors like: Brown; Chocolate; Buff; Cinnamon
  • 13. Hypotheses: 1. Complex attributes = Higher error probability; 2. Human senses + external factors = Big impact. So… Ruleset will change to approach reality; Some attributes will fare much better than others
  • 15. Methodology: Collect survey responses: 1. Evaluate species in different conditions; 2. Measure overall accuracy; 3. Weight attributes based on survey performance
  • 16. Methodology part 1: Take 3 mushroom species: Agaricus abruptibulbus; Agaricus augustus; Lepiota rubrotincta. Place under 2 distinct sets of conditions
  • 17. Methodology part 2: 5 questions per species in each condition: Augustus, Rubrotincta and Abruptibulbus under conditions X; Augustus, Rubrotincta and Abruptibulbus under conditions Y
  • 18. Methodology part 3: Design Tutorial (SurveyMonkey.com); Design Website (Weebly.com). Get people to take survey (hardest part): Designed Flyers; Poster boards; Business cards
  • 22. Methodology 4: Calculate survey test scores; Calculate species’ accuracy variation; Calculate attributes’ accuracy variation; Calculate attribute weights; Use data mining tools to find best ruleset
  • 25. Overall Survey Results: 30 questions per survey; 15 attributes measured; 37 completed surveys; 1,110 answered questions. Overall Survey Grades: A 0, B 1, C 7, D 8, F 14. Highest was 24 out of 30 correct answers
  • 26. Results: Survey Accuracy per Attribute (bar chart; y-axis 0.00% to 100.00%)
  • 27. Attribute Accuracy and Attribute Variation (bar chart; axis 0 to 100). Values per attribute (accuracy, variation): veil color 37.8, 10.8; ring number 59.5, 5.4; stalk shape 33.75, 18.9; cap shape 32.45, 48.7; cap surface 32.45, 5.4; cap color 81.1, 5.4; gill spacing 78.4, 16.2; stalk root 45.95, 21.7; stalk color above ring 59.45, 64.9; stalk color below ring 67.6, 10.8; gill size 36.45, 13.5; stalk surface below ring 78.4, 5.4; ring type 73, 5.4; stalk surface above ring 63.55, 2.7; gill color 63.55, 13.5
  • 28. Weighted Attributes (bar chart; y-axis: Weight, 0 to 100). Bar values in descending order: 76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55, 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7
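  • 29. J48 Tree, 99.6% classification (E = Edible, P = Poisonous). Tree diagram; branch values and leaf labels per level, in the order shown. Odor (almond, creosote, foul, anise, musty, none, pungent, spicy, fishy): E P P E P P P P. Spore print color (black, brown, buff, chocolate, green, orange, purple, white, yellow): E E E E P E E E. Stalk surface (silky, scaly, fibrous, smooth): E P E E
  • 30. J48 Tree, 99.9% classification (E = Edible, P = Poisonous). Tree diagram; branch values and leaf labels per level, in the order shown. Odor (almond, creosote, foul, anise, musty, none, pungent, spicy, fishy): E P P E P P P P. Spore print color (black, brown, buff, chocolate, green, orange, purple, white, yellow): E E E E P E E E. Stalk surface (scaly, fibrous, silky, smooth): E E E. Ring type (evanescent, flaring, zone, sheathing, none, large, cobwebby, pendant): P P P P P P P E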
  • 31. Attribute Accuracy (bubble chart; x-axis: Complexity, 0 to 14; y-axis: Accuracy, 0 to 100). Data labels (attribute, complexity): Cap Color, 10; Stalk Surface Below, 4; Ring Type, 8; Stalk Color Below, 9; Stalk Surface Above, 4; Ring Number, 3; Stalk Color Above, 9; Stalk Root, 7; Veil Color, 4; Stalk Shape, 2; Cap Surface, 4; Cap Shape, 6
  • 33. Conclusion: Complex attributes = Higher error probability. Hypothesis 1: False. Answers were actually more accurate the more complex the attribute. Fat spheres = Complex attributes; Height = Survey accuracy
  • 34. Conclusion: Human senses + external factors = Big impact. Hypothesis 2: True. 24% change in correctly identifying attributes due to ambient environment conditions; 1.2 questions answered incorrectly out of 5 due to ambient environments of mushrooms
  • 35. Future Work: Evaluate mushroom expertise for increase in mushroom attribute identification accuracy; Measure spore print color and odor in surveys?
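
The benchmark ruleset on slide 5 reads as a chain of checks that flag a mushroom as poisonous. The sketch below is one possible reading of it in Python, not the authors' implementation. It assumes each mushroom is a dict keyed by hyphenated attribute names with full-word values; the function names `is_poisonous_benchmark` and `accuracy_after_misses` and the record layout are hypothetical. The two variants of rule 4 are treated as alternatives, which is also an assumption. The helper recomputes the quoted accuracies from the number of missed cases and the 8,124 instances.

```python
def is_poisonous_benchmark(m):
    """Return True if any of the four benchmark rules flags the record."""
    # Rule 1: any odor other than almond, anise, or none.
    if m["odor"] not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print.
    if m["spore-print-color"] == "green":
        return True
    # Rule 3: no odor, scaly stalk below the ring, stalk above the ring not brown.
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4 (assumed: either variant applies): white cap on leaves or in clusters.
    if m["cap-color"] == "white" and (
            m["habitat"] == "leaves" or m["population"] == "clustered"):
        return True
    return False


def accuracy_after_misses(missed, total=8124):
    """Accuracy in percent when `missed` of `total` cases are misclassified."""
    return 100.0 * (1 - missed / total)


# Hypothetical records for illustration only (not rows from the dataset).
foul = {"odor": "foul", "spore-print-color": "brown",
        "stalk-surface-below-ring": "smooth", "stalk-color-above-ring": "white",
        "cap-color": "brown", "habitat": "woods", "population": "several"}
mild = dict(foul, odor="almond")

print(is_poisonous_benchmark(foul), is_poisonous_benchmark(mild))
print([round(accuracy_after_misses(n), 2) for n in (120, 48, 8)])
```

Ordering the checks from the broadest rule to the narrowest mirrors the slide, where each added rule reduces the number of poisonous cases still missed.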
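
The survey figures on slides 16, 17, 25 and 34 are linked by simple arithmetic. The following sketch only recomputes numbers stated on the slides; the variable names are illustrative, and reading the 24% change as applying to each block of 5 questions is an assumption taken from slide 34.

```python
# Survey design (slides 16 and 17): 3 species, 2 sets of conditions,
# 5 questions per species in each condition.
species = 3
conditions = 2
questions_per_block = 5

questions_per_survey = species * conditions * questions_per_block
completed_surveys = 37
total_answers = completed_surveys * questions_per_survey

# Highest reported score (slide 25), as a percentage.
top_score_pct = 100 * 24 / questions_per_survey

# Slide 34: a 24% change in accuracy, expressed per block of 5 questions.
change_fraction = 0.24
questions_lost_per_block = change_fraction * questions_per_block

print(questions_per_survey, total_answers)
print(top_score_pct, round(questions_lost_per_block, 2))
```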