SlideShare uma empresa Scribd logo
1 de 12
CVC TechParty


A practical Introduction to 
Machine Learning in Python


                         Piero Casale



           
CVC TechParty




www.ailab.si/orange/

                        
CVC TechParty



   Load-In Data and Basic Data Exploration
- Loading Data:
      iris = orange.ExampleTable('iris.tab')


- Exploring Features and Examples
       iris.domain.attributes
       iris.domain.classVar.name


- Basic Dataset Characteristics
       GetDatasetStatistics()


- Dataset Formats in Orange:
       csv, txt, xls


- Dataset as Python Lists:
      indexing, append, extend, native




                                 
CVC TechParty


                   Dataset Visualization
- Multi Dimensional Scaling:
     MultiDimensional Scaling Functions in orngMDS




                              
CVC TechParty



My First Classifier in Orange : Bayes




              
CVC TechParty



        My First Classifier in Orange : Bayes
- Loading Data:
        iris = orange.ExampleTable('iris.tab')


-   Declare the Learning Function:
        bayes = orange.BayesLearner()


- Train the Bayes Classifier on Data:
       BayesClassifier = bayes(iris)


- Classify new data:
       Prediction = bayesClassifier(newExample)


_ Example on Iris Dataset:
       exCodes.showBayes()



                                
CVC TechParty


        My (Second) Classifier in Orange :
                Decision Trees
- As before:
      import orngTree
      treeLearner = orngTree.TreeLearner()
      treeClassifier = treeLearner(iris)
      prediction = treeClassifier(newExample)


_ Measures for splitting : infoGain, gainRatio, gini
       treeLearner = orngTree.TreeLearner(measure='gini')


- Print the Tree:
  - on screen : orngTree.printTree(treeClassifier)
  - save as an image :
      orngTree.printDot(treeClassifier, fileName='tree.dot')
      dot -Tpng tree.dot -otree.png




                                
CVC TechParty



        Testing and Evaluating a Classifier
- Testing Functions in orngTest
      import orngTest
      learners = [bayesLearner, treeLearner]


- Make a 10 folds Cross Validation
      xv = orngTest.crossValidation(learners, data, folds=10)


- Scores Functions in orngStat
      import orngStat
      accuracy = orngStat.CA(xv)
      confusionMatrix = orngStat.cm(xv)


- Example on Iris Dataset using Bayes, DecisionTree and Knn.
      exCodes.crossValidate()




                               
CVC TechParty


                    Ensemble Methods
- Basic Ensemble Methods in orngEnsemble
      Bagging, Boosting and Random Forest
      import orngEnsemble


- Bagging of Decision Trees
      treeLearner = orngTree.TreeLearner()
      baggedTrees = orngEnsemble.BaggedLearner(treeLearner, t=10)


- Boosting of Decision Trees
      treeLearner = orngTree.TreeLearner()
      boostedTrees = orngEnsemble.BoostedLearner(treeLearner, t=10)


- Random Forest
      forest = orngEnsemble.RandomForestLearner(trees = 10)


- Example on Iris Dataset:
      exCodes.crossValidateEnsembles()


                             
CVC TechParty



                      Features Selection
- Functions for Features Selectoin in orngFSS
    import orngFSS
    vehicle = orange.ExampleTable('vehicle.tab')


-   Measuring Import of features with Information Gain
    measures = orngFSS.attMeasure(vehicle)
    TenBests = orngFSS.bestNAtts(measures,n=10)


-   Measuring Import of features with Gain Ratio
    gainRatio = orange.MeasureAttribute_gainRatio()
    measures = orngFSS.attMeasure(vehicle,gainRatio)
    fiveBests = orngFSS.bestNAtts(measures,n=5)


- Example on Vehicle Dataset:
       exCodes.measureAttributes()


                               
CVC TechParty

                             More.....
- Supervised Learning Algorithms:
           orngSVM,orngLR,orngC45
- Unsupervised Learning Algorithm :
           orngClustering
- Reinforcement Learning :
           orngReinforcement
- Outlier Detection :
           orngOutlier
- Discretization Functions :
           orngDisc




                          
CVC TechParty




      Enjoy.....
    More at www.ailab.si/orange




                                  Piero Casale



            

Mais conteúdo relacionado

Semelhante a A practical Introduction to Machine Learning in Python

Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Takashi J OZAKI
 
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...Edureka!
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Computational decision making
Computational decision makingComputational decision making
Computational decision makingBoris Adryan
 
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.Lviv Startup Club
 
Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09stat
 
Scaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale ArchitecturesScaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale Architecturesinside-BigData.com
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on NetezzaAjay Ohri
 
2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 SiglerSonya Sigler
 
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...Dataconomy Media
 
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...Red Hat Developers
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkPatrick Pletscher
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKOlivier Grisel
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Julien SIMON
 
Machine Learning and Go. Go!
Machine Learning and Go. Go!Machine Learning and Go. Go!
Machine Learning and Go. Go!Diana Ortega
 
Apache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise CompanyApache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise CompanyDatabricks
 

Semelhante a A practical Introduction to Machine Learning in Python (20)

Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
 
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
Introduction To TensorFlow | Deep Learning with TensorFlow | TensorFlow For B...
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Computational decision making
Computational decision makingComputational decision making
Computational decision making
 
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.
 
Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09
 
Scaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale ArchitecturesScaling Deep Learning Algorithms on Extreme Scale Architectures
Scaling Deep Learning Algorithms on Extreme Scale Architectures
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on Netezza
 
2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler
 
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
C3 w2
C3 w2C3 w2
C3 w2
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in Spark
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
 
Machine Learning and Go. Go!
Machine Learning and Go. Go!Machine Learning and Go. Go!
Machine Learning and Go. Go!
 
Apache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise CompanyApache Spark for Cyber Security in an Enterprise Company
Apache Spark for Cyber Security in an Enterprise Company
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

A practical Introduction to Machine Learning in Python

  • 3. CVC TechParty Load-In Data and Basic Data Exploration - Loading Data: iris = orange.ExampleTable('iris.tab') - Exploring Features and Examples iris.domain.attributes iris.domain.classVar.name - Basic Dataset Characteristics GetDatasetStatistics() - Dataset Formats in Orange: csv, txt, xls - Dataset as Python Lists: indexing, append, extend, native    
  • 4. CVC TechParty Dataset Visualization - Multi Dimensional Scaling: MultiDimensional Scaling Functions in orngMDS    
  • 5. CVC TechParty My First Classifier in Orange : Bayes    
  • 6. CVC TechParty My First Classifier in Orange : Bayes - Loading Data: iris = orange.ExampleTable('iris.tab') - Declare the Learning Function: bayes = orange.BayesLearner() - Train the Bayes Classifier on Data: BayesClassifier = bayes(iris) - Classify new data: Prediction = bayesClassifier(newExample) _ Example on Iris Dataset: exCodes.showBayes()    
  • 7. CVC TechParty My (Second) Classifier in Orange : Decision Trees - As before: import orngTree treeLearner = orngTree.TreeLearner() treeClassifier = treeLearner(iris) prediction = treeClassifier(newExample) _ Measures for splitting : infoGain, gainRatio, gini treeLearner = orngTree.TreeLearner(measure='gini') - Print the Tree: - on screen : orngTree.printTree(treeClassifier) - save as an image : orngTree.printDot(treeClassifier, fileName='tree.dot') dot -Tpng tree.dot -otree.png    
  • 8. CVC TechParty Testing and Evaluating a Classifier - Testing Functions in orngTest import orngTest learners = [bayesLearner, treeLearner] - Make a 10 folds Cross Validation xv = orngTest.crossValidation(learners, data, folds=10) - Scores Functions in orngStat import orngStat accuracy = orngStat.CA(xv) confusionMatrix = orngStat.cm(xv) - Example on Iris Dataset using Bayes, DecisionTree and Knn. exCodes.crossValidate()    
  • 9. CVC TechParty Ensemble Methods - Basic Ensemble Methods in orngEnsemble Bagging, Boosting and Random Forest import orngEnsemble - Bagging of Decision Trees treeLearner = orngTree.TreeLearner() baggedTrees = orngEnsemble.BaggedLearner(treeLearner, t=10) - Boosting of Decision Trees treeLearner = orngTree.TreeLearner() boostedTrees = orngEnsemble.BoostedLearner(treeLearner, t=10) - Random Forest forest = orngEnsemble.RandomForestLearner(trees = 10) - Example on Iris Dataset: exCodes.crossValidateEnsembles()    
  • 10. CVC TechParty Features Selection - Functions for Features Selectoin in orngFSS import orngFSS vehicle = orange.ExampleTable('vehicle.tab') - Measuring Import of features with Information Gain measures = orngFSS.attMeasure(vehicle) TenBests = orngFSS.bestNAtts(measures,n=10) - Measuring Import of features with Gain Ratio gainRatio = orange.MeasureAttribute_gainRatio() measures = orngFSS.attMeasure(vehicle,gainRatio) fiveBests = orngFSS.bestNAtts(measures,n=5) - Example on Vehicle Dataset: exCodes.measureAttributes()    
  • 11. CVC TechParty More..... - Supervised Learning Algorithms: orngSVM,orngLR,orngC45 - Unsupervised Learning Algorithm : orngClustering - Reinforcement Learning : orngReinforcement - Outlier Detection : orngOutlier - Discretization Functions : orngDisc    
  • 12. CVC TechParty Enjoy..... More at www.ailab.si/orange Piero Casale