SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
On the dimensions of data
  complexity through synthetic
       l it th     h    th ti
           data sets
Onzè C
O è Congrés Internacional de l’Associació C
         é                              ó Catalana d’Intel·ligència Artificial
                                                              è         f




               Núria Macià
         Ester Bernadó-Mansilla
         Et B         dó M     ill
            Albert Orriols-Puig
   {nmacia,esterb,aorriols}@salle.url.edu

 Grup de Recerca en Sistemes Intel·ligents
     Enginyeria i Arquitectura La Salle
         Universitat Ramon Llull
Motivation (I)
   Competitive learners that extract accurate models from
         p
   data have been developed
   Which is the best technique?
                             q
   My learner has outperformed another in 1000 data sets.
   Will it outperform it again in the 1001st data set?
                          g

Toward:
   Data characterization to know which and why a method is the
   best for a given problem




                                                           Slide 2
Motivation (II)




                  Slide 3
Motivation (III)
  The purpose of this work is to:
     Analyze the impact of different data sets characteristics
     Consider two frequent measures
        Number of attributes
        Number of instances
     Consider a measure about class geometry
        Length of the class boundary




                                                            Slide 4
Outline
1.   Approach
2.   Complexity dimensions
3.
3    Synthetic data sets
4.   Design of experiments
5.   Results
6.   Conclusions
7.   Further work




                             Slide 5
1. Approach
 Study how different dimensions influence classifier
 behavior
    Number of attributes
    Number of instances
    Length of the class boundary
 Through synthetic data sets
    They
    Th provide a controlled scenario with k
              id    t ll d        i   ith known
    characteristics




                                                  Slide 6
2. Complexity dimensions
 Number of attributes and instances
   No relation is observed among these characteristics and
   c ass e s accuracy
   classifiers’ accu acy




                                                       Slide 7
2. Complexity dimensions

 Length of the class boundary
   Linear correlation with classifiers’ accuracy




                                                   Slide 8
2. Complexity dimensions
 Length of the class boundary
   It estimates the density of points located near the class
   bou da y
   boundary
   To compute the measure:
      Build the minimum spanning tree (MST)
      connecting all the points regardless
      of class
      Count the number of edges joining
      opposite classes and divide this
       pp
      value by the total number of points




                                                          Slide 9
3. Synthetic data sets
 Real world
 Real-world problems
    Not cover all the complexity spectrum
    Are characterized by uncertainty missing values class
                          uncertainty,       values,
    imbalance…

 Synthetic data sets
    Provide a controlled framework
    Permit varying different dimensions independently




                                                        Slide 10
3. Synthetic data sets
 Generation procedure
    Set the number of instances n, the number of attributes m,
    and the length of the class boundary b.
               g                         y
    Generate n points that follow a uniform distribution
    Build the MST
    Label the point to obtain the required boundary



  Attributes: 2
  Instances: 17
  Required boundary: 0 25
                     0.25




                                                           Slide 11
4. Design of experiments
 Three different paradigms of learners:
    Tree induction – C4.5
    Rule learning – PART
    Support vector machine – SMO
 10-fold cross validation
 10 fold cross-validation
 1000 synthetic data sets




                                          Slide 12
5. Results (I)




    n = 200, b = 0.3
    Classifiers behave similarly
    Cl   ifi     bh       i il l
    Accuracy rates range in the interval [0.6-08]: 1 - b
    Variability indicates that other dimensions are required to
    describe the data complexity


                                                           Slide 13
5. Results (II)




    m = 2, b = 0.3
    Classifiers behave similarly
    Cl   ifi    bh      i il l
    Spread of accuracy rates decreases with increasing
    number of instances



                                                         Slide 14
6. Conclusions
 Information provided by the number of attributes
 and the number of instances can be embedded in
 the measure of length of the class boundary
                     g                         y
 Length of the class boundary is a relevant estimate
 of classifier accuracy, but it is not enough to
 describe the whole data set complexity
 Using synthetic data sets along the experiments
 allows us to vary different dimensions
 independently and work under a controlled scenario




                                                Slide 15
7. Further work
 Synthetic data sets do not contain similar
 structures to those of real-world problems
    Generate data sets that follow physical processes
 Study other complexity dimensions




                                                        Slide 16
On the dimensions of data
  complexity through synthetic
       l it th     h    th ti
           data sets
Onzè C
O è Congrés Internacional de l’Assocaició C
         é                              ó Catalana d’Intel·ligència Artificial
                                                              è         f




               Núria Macià
         Ester Bernadó-Mansilla
         Et B         dó M     ill
            Albert Orriols-Puig
   {nmacia,esterb,aorriols}@salle.url.edu

 Grup de Recerca en Sistemes Intel·ligents
     Enginyeria i Arquitectura La Salle
         Universitat Ramon Llull

Mais conteúdo relacionado

Mais procurados

HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IIAlbert Orriols-Puig
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...Anderson Pinho
 
Intelligent Tutoring Systems: The DynaLearn Approach
Intelligent Tutoring Systems: The DynaLearn ApproachIntelligent Tutoring Systems: The DynaLearn Approach
Intelligent Tutoring Systems: The DynaLearn ApproachWouter Beek
 
LE03.doc
LE03.docLE03.doc
LE03.docbutest
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningbutest
 

Mais procurados (20)

Lecture6 - C4.5
Lecture6 - C4.5Lecture6 - C4.5
Lecture6 - C4.5
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 
Lecture3 - Machine Learning
Lecture3 - Machine LearningLecture3 - Machine Learning
Lecture3 - Machine Learning
 
Lecture7 - IBk
Lecture7 - IBkLecture7 - IBk
Lecture7 - IBk
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture18
Lecture18Lecture18
Lecture18
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART II
 
Lecture20
Lecture20Lecture20
Lecture20
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Lecture17
Lecture17Lecture17
Lecture17
 
Lecture19
Lecture19Lecture19
Lecture19
 
Lecture23
Lecture23Lecture23
Lecture23
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
 
Intelligent Tutoring Systems: The DynaLearn Approach
Intelligent Tutoring Systems: The DynaLearn ApproachIntelligent Tutoring Systems: The DynaLearn Approach
Intelligent Tutoring Systems: The DynaLearn Approach
 
LE03.doc
LE03.docLE03.doc
LE03.doc
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Semelhante a CCIA'2008: On the dimensions of data complexity through synthetic data sets

Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNN
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNNRICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNN
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNNIRJET Journal
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
imageclassification-160206090009.pdf
imageclassification-160206090009.pdfimageclassification-160206090009.pdf
imageclassification-160206090009.pdfKammetaJoshna
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELJenny Liu
 
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...cscpconf
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...csandit
 
Intrusion Detection System for Classification of Attacks with Cross Validation
Intrusion Detection System for Classification of Attacks with Cross ValidationIntrusion Detection System for Classification of Attacks with Cross Validation
Intrusion Detection System for Classification of Attacks with Cross Validationinventionjournals
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
X trepan an extended trepan for
X trepan an extended trepan forX trepan an extended trepan for
X trepan an extended trepan forijaia
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
Research scholars evaluation based on guides view
Research scholars evaluation based on guides viewResearch scholars evaluation based on guides view
Research scholars evaluation based on guides vieweSAT Publishing House
 
Fault detection and_diagnosis
Fault detection and_diagnosisFault detection and_diagnosis
Fault detection and_diagnosisM Reza Rahmati
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3eSAT Journals
 

Semelhante a CCIA'2008: On the dimensions of data complexity through synthetic data sets (20)

Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNN
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNNRICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNN
RICE INSECTS CLASSIFICATION USIING TRANSFER LEARNING AND CNN
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
A detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning AlgorithmsA detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning Algorithms
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
imageclassification-160206090009.pdf
imageclassification-160206090009.pdfimageclassification-160206090009.pdf
imageclassification-160206090009.pdf
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
 
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...
X-TREPAN: A MULTI CLASS REGRESSION AND ADAPTED EXTRACTION OF COMPREHENSIBLE D...
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
 
Intrusion Detection System for Classification of Attacks with Cross Validation
Intrusion Detection System for Classification of Attacks with Cross ValidationIntrusion Detection System for Classification of Attacks with Cross Validation
Intrusion Detection System for Classification of Attacks with Cross Validation
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
X trepan an extended trepan for
X trepan an extended trepan forX trepan an extended trepan for
X trepan an extended trepan for
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
Research scholars evaluation based on guides view
Research scholars evaluation based on guides viewResearch scholars evaluation based on guides view
Research scholars evaluation based on guides view
 
Fault detection and_diagnosis
Fault detection and_diagnosisFault detection and_diagnosis
Fault detection and_diagnosis
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3
 

Mais de Albert Orriols-Puig

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceAlbert Orriols-Puig
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsAlbert Orriols-Puig
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIIAlbert Orriols-Puig
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesAlbert Orriols-Puig
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig
 

Mais de Albert Orriols-Puig (9)

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture22
Lecture22Lecture22
Lecture22
 
Lecture21
Lecture21Lecture21
Lecture21
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART III
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 

Último

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Último (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

CCIA'2008: On the dimensions of data complexity through synthetic data sets

  • 1. On the dimensions of data complexity through synthetic l it th h th ti data sets Onzè C O è Congrés Internacional de l’Associació C é ó Catalana d’Intel·ligència Artificial è f Núria Macià Ester Bernadó-Mansilla Et B dó M ill Albert Orriols-Puig {nmacia,esterb,aorriols}@salle.url.edu Grup de Recerca en Sistemes Intel·ligents Enginyeria i Arquitectura La Salle Universitat Ramon Llull
  • 2. Motivation (I) Competitive learners that extract accurate models from p data have been developed Which is the best technique? q My learner has outperformed another in 1000 data sets. Will it outperform it again in the 1001st data set? g Toward: Data characterization to know which and why a method is the best for a given problem Slide 2
  • 3. Motivation (II) Slide 3
  • 4. Motivation (III) The purpose of this work is to: Analyze the impact of different data sets characteristics Consider two frequent measures Number of attributes Number of instances Consider a measure about class geometry Length of the class boundary Slide 4
  • 5. Outline 1. Approach 2. Complexity dimensions 3. 3 Synthetic data sets 4. Design of experiments 5. Results 6. Conclusions 7. Further work Slide 5
  • 6. 1. Approach Study how different dimensions influence classifier behavior Number of attributes Number of instances Length of the class boundary Through synthetic data sets They Th provide a controlled scenario with k id t ll d i ith known characteristics Slide 6
  • 7. 2. Complexity dimensions Number of attributes and instances No relation is observed among these characteristics and c ass e s accuracy classifiers’ accu acy Slide 7
  • 8. 2. Complexity dimensions Length of the class boundary Linear correlation with classifiers’ accuracy Slide 8
  • 9. 2. Complexity dimensions Length of the class boundary It estimates the density of points located near the class bou da y boundary To compute the measure: Build the minimum spanning tree (MST) connecting all the points regardless of class Count the number of edges joining opposite classes and divide this pp value by the total number of points Slide 9
  • 10. 3. Synthetic data sets Real world Real-world problems Not cover all the complexity spectrum Are characterized by uncertainty missing values class uncertainty, values, imbalance… Synthetic data sets Provide a controlled framework Permit varying different dimensions independently Slide 10
  • 11. 3. Synthetic data sets Generation procedure Set the number of instances n, the number of attributes m, and the length of the class boundary b. g y Generate n points that follow a uniform distribution Build the MST Label the point to obtain the required boundary Attributes: 2 Instances: 17 Required boundary: 0 25 0.25 Slide 11
  • 12. 4. Design of experiments Three different paradigms of learners: Tree induction – C4.5 Rule learning – PART Support vector machine – SMO 10-fold cross validation 10 fold cross-validation 1000 synthetic data sets Slide 12
  • 13. 5. Results (I) n = 200, b = 0.3 Classifiers behave similarly Cl ifi bh i il l Accuracy rates range in the interval [0.6-08]: 1 - b Variability indicates that other dimensions are required to describe the data complexity Slide 13
  • 14. 5. Results (II) m = 2, b = 0.3 Classifiers behave similarly Cl ifi bh i il l Spread of accuracy rates decreases with increasing number of instances Slide 14
  • 15. 6. Conclusions Information provided by the number of attributes and the number of instances can be embedded in the measure of length of the class boundary g y Length of the class boundary is a relevant estimate of classifier accuracy, but it is not enough to describe the whole data set complexity Using synthetic data sets along the experiments allows us to vary different dimensions independently and work under a controlled scenario Slide 15
  • 16. 7. Further work Synthetic data sets do not contain similar structures to those of real-world problems Generate data sets that follow physical processes Study other complexity dimensions Slide 16
  • 17. On the dimensions of data complexity through synthetic l it th h th ti data sets Onzè C O è Congrés Internacional de l’Assocaició C é ó Catalana d’Intel·ligència Artificial è f Núria Macià Ester Bernadó-Mansilla Et B dó M ill Albert Orriols-Puig {nmacia,esterb,aorriols}@salle.url.edu Grup de Recerca en Sistemes Intel·ligents Enginyeria i Arquitectura La Salle Universitat Ramon Llull