SlideShare a Scribd company logo
1 of 23
Download to read offline
WEKAWEKA
A . Antony Alex MCA
Dr G R D College of Science – CBE
Tamil Nadu - India
Waikato Environment forWaikato Environment for
Knowledge AnalysisKnowledge Analysis
A collection of open source ML algorithms
◦ pre-processing
◦ classifiers
◦ clustering
◦ association rule
It’s a data mining/machine learning tool developed byIt’s a data mining/machine learning tool developed by
Department of Computer Science, University of Waikato,
New Zealand.
Weka is also a bird found only on the islands of New
Zealand.
Java based
Routines are implemented as classes and logically
arranged in packages
Comes with an extensive GUI interface
Download and Install WEKADownload and Install WEKA
Website:
http://www.cs.waikato.ac.nz/~ml/weka/index.
html
Support multiple platforms (written in java):Support multiple platforms (written in java):
◦ Windows, Mac OS X and Linux
39/8/2012
Main FeaturesMain Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search15 attribute/subset evaluators + 10 search
algorithms for feature selection
49/8/2012
• Dataset
• Classifier
• Weka.filters
• Weka.classifiers
java weka.core.converters.CSVLoader data.csv > data.arff
Command line interface
java weka.core.converters.CSVLoader data.csv > data.arff
java weka.core.converters.C45Loader c45_filestem > data.arff
java weka.classifiers.rules.ZeroR -t weather.arff
java weka.classifiers.trees.J48 -t weather.arff
java weka.filters.supervised.attribute.Discretize -i data/iris.arff  -o
iris-nom.arff -c last
java weka.filters.supervised.attribute.Discretize -i data/cpu.arff  -o
cpu-classvendor-nom.arff -c first
Main GUIMain GUI
Three graphical user interfaces
◦ “The Explorer” (exploratory data
analysis)
◦ “The Experimenter” (experimental
environment)
◦ “The KnowledgeFlow” (new process◦ “The KnowledgeFlow” (new process
model inspired interface)
69/8/2012
Explorer: preExplorer: pre--processing the dataprocessing the data
Data can be imported from a file in various
formats:ARFF, CSV, C4.5, binary
Data can also be read from a URL or from
an SQL database (using JDBC)
Pre-processing tools inWEKA are called
9/8/2012 7
Pre-processing tools inWEKA are called
“filters”
WEKA contains filters for:
◦ Discretization, normalization, resampling,
attribute selection, transforming and combining
attributes, …
DatabaseUtils.props.hsql - HSQLDB
DatabaseUtils.props.msaccess - MS Access
jdbcDriver
jdbcURL
ACCESSING DATABASEACCESSING DATABASE
DatabaseUtils.props.mssqlserver - MS SQL Server
DatabaseUtils.props.mysql - MySQL
DatabaseUtils.props.odbc - ODBC access via ODBC/JDBC
bridge,
DatabaseUtils.props.oracle - Oracle 10g
DatabaseUtils.props.postgresql - PostgreSQL 7.4
DatabaseUtils.props.sqlite3 - sqlite 3.x
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
WEKA “flat” filesWEKA “flat” files
9/8/2012 9
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
9/8/2012 University of Waikato 10
9/8/2012 University of Waikato 11
9/8/2012 University of Waikato 12
9/8/2012 University of Waikato 13
WEKA:: Explorer: building “classifiers”WEKA:: Explorer: building “classifiers”
Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include:
◦ Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
◦ Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, …
Explorer: clustering dataExplorer: clustering data
WEKA contains “clusterers” for finding
groups of similar instances in a dataset
Implemented schemes are:
◦ k-Means, EM, Cobweb, X-means, FarthestFirst
9/8/2012 16
◦ k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to
“true” clusters
Explorer: finding associationsExplorer: finding associations
WEKA contains an implementation of the
Apriori algorithm for learning association rules
◦ Works only with discrete data
Can identify statistical dependencies between
9/8/2012 17
Can identify statistical dependencies between
groups of attributes:
◦ milk, butter ⇒ bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
minimum support and exceed a given
confidence
Explorer: attribute selectionExplorer: attribute selection
Panel that can be used to investigate which
(subsets of) attributes are the most predictive
ones
Attribute selection methods contain two parts:
9/8/2012 18
◦ A search method: best-first, forward selection,
random, exhaustive, genetic algorithm, ranking
◦ An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
Very flexible:WEKA allows (almost) arbitrary
combinations of these two
Explorer: data visualizationExplorer: data visualization
Visualization very useful in practice: e.g. helps
to determine difficulty of the learning
problem
WEKA can visualize single attributes (1-d)
and pairs of attributes (2-d)
◦ To do: rotating 3-d visualizations (Xgobi-style)
9/8/2012 19
◦ To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal
attributes (and to detect “hidden” data
points)
“Zoom-in” function
Thank UThank U

More Related Content

What's hot

What's hot (20)

Java swing
Java swingJava swing
Java swing
 
Collections in Java
Collections in JavaCollections in Java
Collections in Java
 
Interface in java
Interface in javaInterface in java
Interface in java
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database Design
 
Arrays in PHP
Arrays in PHPArrays in PHP
Arrays in PHP
 
SQLITE Android
SQLITE AndroidSQLITE Android
SQLITE Android
 
Java Collections | Collections Framework in Java | Java Tutorial For Beginner...
Java Collections | Collections Framework in Java | Java Tutorial For Beginner...Java Collections | Collections Framework in Java | Java Tutorial For Beginner...
Java Collections | Collections Framework in Java | Java Tutorial For Beginner...
 
Advance Java Programming (CM5I) 2.Swing
Advance Java Programming (CM5I) 2.SwingAdvance Java Programming (CM5I) 2.Swing
Advance Java Programming (CM5I) 2.Swing
 
Java Collections
Java CollectionsJava Collections
Java Collections
 
Accessing data with android cursors
Accessing data with android cursorsAccessing data with android cursors
Accessing data with android cursors
 
Trees
TreesTrees
Trees
 
Constructor ppt
Constructor pptConstructor ppt
Constructor ppt
 
Data structures & problem solving unit 1 ppt
Data structures & problem solving unit 1 pptData structures & problem solving unit 1 ppt
Data structures & problem solving unit 1 ppt
 
Jdbc ppt
Jdbc pptJdbc ppt
Jdbc ppt
 
JDBC: java DataBase connectivity
JDBC: java DataBase connectivityJDBC: java DataBase connectivity
JDBC: java DataBase connectivity
 
Sequences and indexes
Sequences and indexesSequences and indexes
Sequences and indexes
 
Network programming in Java
Network programming in JavaNetwork programming in Java
Network programming in Java
 
VB Script
VB ScriptVB Script
VB Script
 
Sql views
Sql viewsSql views
Sql views
 
Java Applet
Java AppletJava Applet
Java Applet
 

Similar to Weka

Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningKeshab Kumar Gaurav
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKAbutest
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
data mining with weka application
data mining with weka applicationdata mining with weka application
data mining with weka applicationRezapourabbas
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Wekaweka Content
 
wekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfwekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfDr. Rajesh P Barnwal
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekaPrashant Menon
 
1.5 weka an intoduction
1.5 weka an intoduction1.5 weka an intoduction
1.5 weka an intoductionKrish_ver2
 
Introduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptIntroduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptradhikadsu
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...cscpconf
 
Data Quality
Data QualityData Quality
Data Qualityjerdeb
 

Similar to Weka (20)

Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data mining
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Wek1
Wek1Wek1
Wek1
 
Weka
Weka Weka
Weka
 
data mining with weka application
data mining with weka applicationdata mining with weka application
data mining with weka application
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
wekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfwekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdf
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Weka
WekaWeka
Weka
 
1.5 weka an intoduction
1.5 weka an intoduction1.5 weka an intoduction
1.5 weka an intoduction
 
Introduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptIntroduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.ppt
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
 
Data Quality
Data QualityData Quality
Data Quality
 
Data Mining using Weka
Data Mining using WekaData Mining using Weka
Data Mining using Weka
 
IT6701-Information management question bank
IT6701-Information management question bankIT6701-Information management question bank
IT6701-Information management question bank
 
Jdbc
JdbcJdbc
Jdbc
 

More from Antony Alex

Transposition cipher
Transposition cipherTransposition cipher
Transposition cipherAntony Alex
 
Textile management system review iii
Textile management system   review iiiTextile management system   review iii
Textile management system review iiiAntony Alex
 
Software project management requirements analysis
Software project management requirements analysisSoftware project management requirements analysis
Software project management requirements analysisAntony Alex
 
Installing windows xp
Installing windows xpInstalling windows xp
Installing windows xpAntony Alex
 
Application express
Application expressApplication express
Application expressAntony Alex
 

More from Antony Alex (10)

Transposition cipher
Transposition cipherTransposition cipher
Transposition cipher
 
Topdown parsing
Topdown parsingTopdown parsing
Topdown parsing
 
Textile management system review iii
Textile management system   review iiiTextile management system   review iii
Textile management system review iii
 
Sound
SoundSound
Sound
 
Software project management requirements analysis
Software project management requirements analysisSoftware project management requirements analysis
Software project management requirements analysis
 
Site map & web
Site map & webSite map & web
Site map & web
 
Review ii
Review iiReview ii
Review ii
 
Installing windows xp
Installing windows xpInstalling windows xp
Installing windows xp
 
Application express
Application expressApplication express
Application express
 
Android
AndroidAndroid
Android
 

Recently uploaded

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Weka

  • 1. WEKAWEKA A . Antony Alex MCA Dr G R D College of Science – CBE Tamil Nadu - India
  • 2. Waikato Environment forWaikato Environment for Knowledge AnalysisKnowledge Analysis A collection of open source ML algorithms ◦ pre-processing ◦ classifiers ◦ clustering ◦ association rule It’s a data mining/machine learning tool developed byIt’s a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand. Java based Routines are implemented as classes and logically arranged in packages Comes with an extensive GUI interface
  • 3. Download and Install WEKADownload and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index. html Support multiple platforms (written in java):Support multiple platforms (written in java): ◦ Windows, Mac OS X and Linux 39/8/2012
  • 4. Main FeaturesMain Features 49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search15 attribute/subset evaluators + 10 search algorithms for feature selection 49/8/2012
  • 5. • Dataset • Classifier • Weka.filters • Weka.classifiers java weka.core.converters.CSVLoader data.csv > data.arff Command line interface java weka.core.converters.CSVLoader data.csv > data.arff java weka.core.converters.C45Loader c45_filestem > data.arff java weka.classifiers.rules.ZeroR -t weather.arff java weka.classifiers.trees.J48 -t weather.arff java weka.filters.supervised.attribute.Discretize -i data/iris.arff -o iris-nom.arff -c last java weka.filters.supervised.attribute.Discretize -i data/cpu.arff -o cpu-classvendor-nom.arff -c first
  • 6. Main GUIMain GUI Three graphical user interfaces ◦ “The Explorer” (exploratory data analysis) ◦ “The Experimenter” (experimental environment) ◦ “The KnowledgeFlow” (new process◦ “The KnowledgeFlow” (new process model inspired interface) 69/8/2012
  • 7. Explorer: preExplorer: pre--processing the dataprocessing the data Data can be imported from a file in various formats:ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools inWEKA are called 9/8/2012 7 Pre-processing tools inWEKA are called “filters” WEKA contains filters for: ◦ Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
  • 8. DatabaseUtils.props.hsql - HSQLDB DatabaseUtils.props.msaccess - MS Access jdbcDriver jdbcURL ACCESSING DATABASEACCESSING DATABASE DatabaseUtils.props.mssqlserver - MS SQL Server DatabaseUtils.props.mysql - MySQL DatabaseUtils.props.odbc - ODBC access via ODBC/JDBC bridge, DatabaseUtils.props.oracle - Oracle 10g DatabaseUtils.props.postgresql - PostgreSQL 7.4 DatabaseUtils.props.sqlite3 - sqlite 3.x
  • 9. @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} WEKA “flat” filesWEKA “flat” files 9/8/2012 9 @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ...
  • 14. WEKA:: Explorer: building “classifiers”WEKA:: Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: ◦ Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: ◦ Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
  • 15.
  • 16. Explorer: clustering dataExplorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: ◦ k-Means, EM, Cobweb, X-means, FarthestFirst 9/8/2012 16 ◦ k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters
  • 17. Explorer: finding associationsExplorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules ◦ Works only with discrete data Can identify statistical dependencies between 9/8/2012 17 Can identify statistical dependencies between groups of attributes: ◦ milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence
  • 18. Explorer: attribute selectionExplorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: 9/8/2012 18 ◦ A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking ◦ An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible:WEKA allows (almost) arbitrary combinations of these two
  • 19. Explorer: data visualizationExplorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) ◦ To do: rotating 3-d visualizations (Xgobi-style) 9/8/2012 19 ◦ To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function
  • 20.
  • 21.
  • 22.