In this presentation, we show how to use Python for machine learning with Orange, an open-source data mining tool developed at the University of Ljubljana. Orange is a scriptable environment for fast prototyping of new algorithms and testing schemes. It is a collection of Python-based modules that sit on top of a C++ core library and implement functionality for which execution time is not crucial and which is easier to write in Python than in C++.
6. CVC TechParty
My First Classifier in Orange: Bayes
- Loading Data:
import orange
iris = orange.ExampleTable('iris.tab')
- Declare the Learning Function:
bayes = orange.BayesLearner()
- Train the Bayes Classifier on Data:
bayesClassifier = bayes(iris)
- Classify new data:
prediction = bayesClassifier(newExample)
- Example on Iris Dataset:
exCodes.showBayes()
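The learner/classifier pattern on this slide (a learner called on data returns a classifier, which is then called on a new example) can be sketched without Orange. The following is a minimal pure-Python illustration of naive Bayes with Laplace smoothing on categorical features. It is not Orange's implementation: `NaiveBayesLearner` and the toy weather data are hypothetical, chosen only to mirror the call sequence above.

```python
from collections import Counter, defaultdict


class NaiveBayesLearner:
    """Hypothetical stand-in for a learner: calling it on data returns a classifier."""

    def __call__(self, rows):
        # rows is a list of (features, label) pairs with categorical features.
        total = len(rows)
        class_counts = Counter(label for _, label in rows)
        value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        values_seen = defaultdict(set)       # feature index -> distinct values
        for features, label in rows:
            for i, value in enumerate(features):
                value_counts[(i, label)][value] += 1
                values_seen[i].add(value)

        def classifier(features):
            best_label, best_score = None, -1.0
            for label, n in class_counts.items():
                score = n / total  # class prior
                for i, value in enumerate(features):
                    # Laplace-smoothed estimate of P(value | label)
                    score *= (value_counts[(i, label)][value] + 1) / (n + len(values_seen[i]))
                if score > best_score:
                    best_label, best_score = label, score
            return best_label

        return classifier


# Toy data (assumed for illustration): (outlook, temperature) -> play
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
    (("overcast", "cool"), "yes"),
]

bayes = NaiveBayesLearner()              # declare the learner
bayes_classifier = bayes(data)           # train on data
prediction = bayes_classifier(("sunny", "hot"))  # classify a new example
print(prediction)
```

As in Orange, training and prediction are separate objects, so the same learner can be reused on different data sets.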
7. CVC TechParty
My Second Classifier in Orange: Decision Trees
- As before:
import orngTree
treeLearner = orngTree.TreeLearner()
treeClassifier = treeLearner(iris)
prediction = treeClassifier(newExample)
- Measures for splitting: infoGain, gainRatio, gini
treeLearner = orngTree.TreeLearner(measure='gini')
- Print the Tree:
- on screen: orngTree.printTree(treeClassifier)
- save as an image (the dot command requires Graphviz):
orngTree.printDot(treeClassifier, fileName='tree.dot')
dot -Tpng tree.dot -otree.png
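The three splitting measures named above can be illustrated with their textbook definitions. This is a sketch in plain Python under the assumption of categorical attributes and no missing values; Orange's own computation may differ in details. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of the attribute."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def info_gain(values, labels):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    # Information gain normalised by the entropy of the attribute itself.
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_decrease(values, labels):
    n = len(labels)
    remainder = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - remainder


# Toy data (assumed): "outlook" separates the classes perfectly, "windy" not at all.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rainy", "rainy"]
windy = ["true", "false", "true", "false"]

for name, column in [("outlook", outlook), ("windy", windy)]:
    print(name, info_gain(column, play), gain_ratio(column, play), gini_decrease(column, play))
```

A tree learner picks, at each node, the attribute with the highest score under the chosen measure.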
8. CVC TechParty
Testing and Evaluating a Classifier
- Testing Functions in orngTest
import orngTest
learners = [bayes, treeLearner]
- Run a 10-fold cross-validation
xv = orngTest.crossValidation(learners, iris, folds=10)
- Scoring functions in orngStat
import orngStat
accuracy = orngStat.CA(xv)
confusionMatrices = orngStat.confusionMatrices(xv)
- Example on Iris Dataset using Bayes, Decision Tree and kNN:
exCodes.crossValidate()
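What a k-fold cross-validation over several learners computes can be sketched in plain Python. The version below uses simple interleaved folds without shuffling or stratification, which is an assumption made for brevity and not a description of how orngTest samples its folds. The two learners, the helper names and the toy data are all hypothetical.

```python
from collections import Counter


def majority_learner(rows):
    """Baseline learner: always predicts the most frequent training class."""
    most_common = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: most_common


def nearest_neighbour_learner(rows):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def classifier(features):
        def distance(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], features))
        return min(rows, key=distance)[1]
    return classifier


def cross_validation(learners, rows, folds=10):
    """Return, for each learner, the list of (actual, predicted) pairs."""
    results = [[] for _ in learners]
    for k in range(folds):
        test = rows[k::folds]  # every folds-th row, starting at k
        train = [row for i, row in enumerate(rows) if i % folds != k]
        for i, learner in enumerate(learners):
            classifier = learner(train)
            results[i].extend((label, classifier(x)) for x, label in test)
    return results


def accuracy(pairs):
    return sum(actual == predicted for actual, predicted in pairs) / len(pairs)


def confusion_matrix(pairs):
    # Maps (actual, predicted) to the number of test examples.
    return Counter(pairs)


# Toy data (assumed): two well-separated classes on one informative axis.
rows = [((i * 0.1, 0.0), "a") for i in range(10)]
rows += [((5.0 + i * 0.1, 1.0), "b") for i in range(10)]

learners = [majority_learner, nearest_neighbour_learner]
xv = cross_validation(learners, rows, folds=10)
for learner, pairs in zip(learners, xv):
    print(learner.__name__, accuracy(pairs), dict(confusion_matrix(pairs)))
```

Every example is used for testing exactly once, and each learner is scored on the same folds, which keeps the comparison fair.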
9. CVC TechParty
Ensemble Methods
- Basic ensemble methods in orngEnsemble: bagging, boosting and random forest
import orngEnsemble
- Bagging of Decision Trees
treeLearner = orngTree.TreeLearner()
baggedTrees = orngEnsemble.BaggedLearner(treeLearner, t=10)
- Boosting of Decision Trees
treeLearner = orngTree.TreeLearner()
boostedTrees = orngEnsemble.BoostedLearner(treeLearner, t=10)
- Random Forest
forest = orngEnsemble.RandomForestLearner(trees = 10)
- Example on Iris Dataset:
exCodes.crossValidateEnsembles()
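Of the three ensemble methods, bagging is the simplest to sketch: train t copies of a base learner on bootstrap samples and let them vote. The code below is a plain-Python illustration, not the orngEnsemble implementation. `stump_learner`, `bagged_learner`, the `seed` parameter and the toy data are hypothetical; boosting and random forests are not covered here.

```python
import random
from collections import Counter


def stump_learner(rows):
    """Hypothetical weak learner: a single threshold on the first feature."""
    best = None
    for threshold in sorted({x[0] for x, _ in rows}):
        left = Counter(y for x, y in rows if x[0] <= threshold)
        right = Counter(y for x, y in rows if x[0] > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        # Training errors made when each side predicts its majority class.
        errors = sum(n for y, n in left.items() if y != left_label)
        errors += sum(n for y, n in right.items() if y != right_label)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[0] <= threshold else right_label


def bagged_learner(base_learner, t=10, seed=0):
    """Wrap a base learner so that it trains t models on bootstrap samples."""
    rng = random.Random(seed)

    def learner(rows):
        classifiers = []
        for _ in range(t):
            # Bootstrap sample: draw len(rows) rows with replacement.
            sample = [rng.choice(rows) for _ in rows]
            classifiers.append(base_learner(sample))

        def classifier(x):
            votes = Counter(c(x) for c in classifiers)
            return votes.most_common(1)[0][0]  # majority vote

        return classifier

    return learner


# Toy data (assumed): class "a" for values 0..9, class "b" for values 10..19.
rows = [((float(i),), "a" if i < 10 else "b") for i in range(20)]

bagged_stumps = bagged_learner(stump_learner, t=10)
bagged_classifier = bagged_stumps(rows)
print(bagged_classifier((0.0,)), bagged_classifier((19.0,)))
```

The wrapped learner has the same calling convention as the base learner, so it can be passed to a cross-validation routine unchanged.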
10. CVC TechParty
Feature Selection
- Functions for feature selection in orngFSS
import orngFSS
vehicle = orange.ExampleTable('vehicle.tab')
- Measuring the importance of features with Information Gain
measures = orngFSS.attMeasure(vehicle)
tenBest = orngFSS.bestNAtts(measures, n=10)
- Measuring the importance of features with Gain Ratio
gainRatio = orange.MeasureAttribute_gainRatio()
measures = orngFSS.attMeasure(vehicle, gainRatio)
fiveBest = orngFSS.bestNAtts(measures, n=5)
- Example on Vehicle Dataset:
exCodes.measureAttributes()
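The score-then-select workflow above can be sketched in plain Python: score every attribute with a measure, sort, and keep the n best names. The helpers `att_measure` and `best_n_atts` are hypothetical and only mimic the shape of the orngFSS calls. The sketch assumes categorical attributes, uses the textbook information gain, and runs on invented toy data instead of the vehicle data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Entropy of the class minus its expected entropy after splitting on column."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def att_measure(names, rows, labels, measure=info_gain):
    """Return (attribute name, score) pairs, best attribute first."""
    scores = [(name, measure([row[i] for row in rows], labels))
              for i, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def best_n_atts(measures, n):
    """Return the names of the n highest-scoring attributes."""
    return [name for name, _ in measures[:n]]


# Toy data (assumed): "wheels" determines the class, "colour" is irrelevant.
names = ["wheels", "colour", "seats"]
rows = [
    ("2", "red", "1"),
    ("2", "blue", "2"),
    ("4", "red", "2"),
    ("4", "blue", "2"),
]
labels = ["bike", "bike", "car", "car"]

measures = att_measure(names, rows, labels)
two_best = best_n_atts(measures, n=2)
print(measures)
print(two_best)
```

Passing a different function as `measure` changes the ranking criterion without touching the selection step, which is the same separation the slide shows with `gainRatio`.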