WEKA: A MODERN APPLICATION
OF DATA MINING TECHNIQUES
SEAN, ROB, PRATIK, RHODRI, AL, VASANTI, MINGHAO
What is WEKA?
• Desktop application for machine learning & data mining
• Open-source, Java-based tool
• Offers commonly used algorithms to model data.
• University of Waikato, New Zealand
What is Data Mining & Machine Learning?
• Data Mining:
  • Searching for patterns in data
  • Finding value in data
• Machine Learning:
  • Building models from data that a computer can apply
  • Using those models to predict a likely outcome
Features of WEKA
• Pre-process data
• Classification & Clustering
• Association rules
• 3D visualisation
Choosing the Dataset
• Public datasets:
  • data.gov.uk
  • kaggle.com, e.g. the Titanic dataset
  • UCI Machine Learning Repository
• A dataset that could provide insight into a real-world scenario
• A dataset that would model effectively in WEKA: one with several properties
Capital Bikeshare
Picture: Alejandro Castro, Flickr, Creative Commons
• Bike-share system in Washington DC and surrounding area
• https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
The Objective
• Investigate factors affecting bike-share usage
• Could this data be used to predict how busy or quiet a bike-share
system will be on a given day?
Dataset fields
• Record index
• Time information
  • Date, day of the week, whether the day is a holiday, whether the day is a working day,
    month, year, season (1-4: spring/summer/autumn/winter)
• Weather information
  • Weather description (four categories, ordered roughly from good to bad)
  • Normalized values for temperature, ‘feels like’ temperature, humidity and
    wind speed
• Totals
  • Counts of bikes rented by registered and by casual users
  • Total count of bikes rented that day
Pre-processing
• Remove fields that don’t help prediction
  • Indexes, sub-totals, etc.
• Filters
  • Discretize - converts numeric attributes into discrete categories
  • ClassBalancer - re-weights instances so that each class has the same total weight
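As a rough illustration of what discretizing a numeric attribute means, the sketch below bins values into equal-width intervals. Equal-width binning is an assumption made here for simplicity, the function name `discretize` is hypothetical, and this is not WEKA's own code.

```python
def discretize(values, bins=3):
    """Map each numeric value to the index of an equal-width bin (0 .. bins-1)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for v in values:
        # If all values are identical the width is 0, so everything goes in bin 0.
        index = int((v - low) / width) if width else 0
        # The maximum value would land in bin `bins`; clamp it into the last bin.
        labels.append(min(index, bins - 1))
    return labels


# Hypothetical normalized temperatures split into "low" (0) and "high" (1).
temperature_bins = discretize([0.0, 0.2, 0.5, 0.9, 1.0], bins=2)
```

Bin indices could then be renamed to labels such as "cold" and "warm" before classification.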
Data Visualisation
Basic terminology for understanding the evaluation of
classifiers
• True positive (tp): an instance is correctly predicted to belong to the
given class
• True negative (tn): an instance is correctly predicted not to belong
to the given class
• False positive (fp): an instance is incorrectly predicted to belong to
the given class
• False negative (fn): an instance is incorrectly predicted not to belong
to the given class
Explanation of Statistics
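• Precision: tp / (tp + fp) - of the instances predicted to be in the class, the proportion that really are
• Recall: tp / (tp + fn) - of the instances really in the class, the proportion that were predicted to be
• F-measure: 2 × precision × recall / (precision + recall) - the harmonic mean of precision and recall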
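The three statistics can be computed directly from the tp, fp and fn counts defined earlier. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not results from the bike-share data.

```python
def precision(tp, fp):
    """Share of instances predicted to be in the class that really are in it."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of instances really in the class that were predicted to be in it."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(8, 2)     # 8 / 10
r = recall(8, 8)        # 8 / 16
f = f_measure(8, 2, 8)  # 2 * 0.8 * 0.5 / 1.3
```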
Algorithms explored
Tree based:
• J48 - WEKA's implementation of the C4.5 decision tree; it classifies an instance by following a tree of attribute tests.
  • Performs very well on our dataset
Algorithms explored
Rule based:
• ZeroR - the simplest classification method: it always predicts the majority class of the target
and ignores all predictors.
  • Performs poorly on our dataset (useful mainly as a baseline)
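A minimal sketch of the ZeroR idea for a nominal target: learn the most frequent class label and predict it for every instance, whatever its attributes. The class name, labels and instances below are hypothetical illustrations, not WEKA's implementation.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier that ignores every predictor."""

    def fit(self, labels):
        # Remember only the most frequent class label in the training data.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # Every instance gets the same prediction, whatever its attributes are.
        return [self.majority_ for _ in instances]


training_labels = ["busy", "quiet", "busy", "busy"]
model = ZeroR().fit(training_labels)
predictions = model.predict([{"temp": 0.3}, {"temp": 0.9}])
```

Any useful classifier should beat the accuracy of this baseline, which is simply the share of the majority class in the data.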
Algorithms explored
Naïve Bayes
• A probabilistic classifier based on Bayes' theorem that models the
relationship between features and class labels, treating the features as independent given the class.
• It can handle missing values by ignoring them when
calculating the conditional probabilities.
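To make the handling of missing values concrete, here is a small sketch of a Naïve Bayes classifier for categorical attributes. Missing values are represented by `None` and are skipped both when counting and when scoring. The Laplace smoothing, the function names and the toy weather rows are assumptions for illustration, not WEKA's implementation.

```python
from collections import Counter, defaultdict
import math


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            if value is None:            # missing value: leave it out of the counts
                continue
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log-probability score for one row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Skip missing values and attributes never observed in training.
            if value is None or not values_seen[i]:
                continue
            counts = value_counts[(i, label)]
            # Laplace smoothing (an assumption) avoids zero probabilities.
            numerator = counts[value] + 1
            denominator = sum(counts.values()) + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rows: (weather, temperature band); None marks a missing value.
rows = [("good", "warm"), ("good", "warm"), ("bad", "cold"), ("bad", None), ("good", "cold")]
labels = ["busy", "busy", "quiet", "quiet", "busy"]
model = train(rows, labels)
```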
Test set division
Training and testing sets:
- Training data is used to build an ML model
- Testing data is used to measure the performance of an ML model
[Figures: supplying a test set in WEKA; separate training and testing sets]
Test set division
Cross-validation:
- Helps detect overfitting by always testing on data the model was not trained on
- Gives a more reliable estimate of how well predictions generalise
• Steps:
- Split the original dataset into k equal parts (folds)
- Set one fold aside, train on the remaining k-1
folds and measure performance on the held-out fold
- Repeat the process k times, holding out a different fold each time
• 10-fold cross-validation: k = 10
Test set division
Percentage split
- Randomly splits your dataset into training and testing partitions
each time you evaluate a model.
[Figure: dividing the original dataset into testing and training sets]
For example:
If we have 100 instances and use a
66% percentage split, 66 instances
are used for training and the
remaining 34 form the
test set.
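The 66% / 34% example can be reproduced with a short sketch: shuffle the instances, then cut the list at the requested percentage. The function name, the default seed and the use of integers as stand-in instances are assumptions for illustration.

```python
import random


def percentage_split(instances, train_percent=66, seed=1):
    """Shuffle the instances and split them into (training, testing) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in instances: 66 go to training, 34 to testing.
training, testing = percentage_split(range(100), train_percent=66)
```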
What is Clustering?
• Finding the class labels and the number of classes directly from the
data (in contrast to classification).
• It is unsupervised learning:
we want to explore the data to find structure in it.
What is clustering for?
● Grouping items with similar properties together into clusters.
● For example, making data-driven decisions
such as grouping T-shirts into “small”, “medium” and
“large” sizes.
Clustering types:
Some popular Clustering Algorithms
• K-means clustering (disjoint sets)
• EM clustering (probabilistic)
• Cobweb clustering (hierarchical)
K-means: iterative distance-based clustering
(disjoint sets)
1. Specify k, the desired number of clusters
2. Choose k points at random as cluster centers
3. Assign all instances to their closest cluster center
4. Calculate the centroid (i.e., mean) of instances in each cluster
5. These centroids are the new cluster centers
6. Repeat steps 3-5 until the cluster centers don’t change
Each iteration reduces the total squared distance from instances to their cluster
centers, so the result is a local (not necessarily global) minimum.
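The numbered steps above can be sketched as follows. This is a minimal pure-Python illustration rather than WEKA's implementation: the function names, the seed parameter, the iteration cap and the toy 2-D points are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iterations=100):
    """Return (centers, clusters) after iterating until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                    # step 2: random initial centers
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                               # step 3: assign to closest center
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [                                # steps 4-5: centroids become centers
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]                 # keep the old center if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:                     # step 6: stop when nothing changes
            break
        centers = new_centers
    return centers, clusters


# Two well-separated hypothetical groups of 2-D points, clustered with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, k=2)
```

Because the starting centers are random, different seeds can lead to different local minima on less clearly separated data.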
K-means in Weka
• Note parameters:
  • numClusters
  • distanceFunction
How can we tell the right number of clusters?
In general, this is an unsolved problem.
Clustering is subjective
• Use the AddCluster unsupervised attribute filter
• Clustering is hard to evaluate
Trying to cluster into seasons
Using K-means clustering with k=4, we wanted to see whether the data falls
into clusters that correspond to the seasons
Observations
• We found that winter and summer months have separated into two
distinct clusters.
• The autumn and spring months have not separated so well.
• From the visualisation we also see the overall trend of more users in
the summer months compared to winter ones.
• This is not surprising, since the summer months are warmer and people are
more likely to rent bikes.
Possible Improvements
• Data accuracy
  • Uncontrollable outside factors, e.g. road closures, new cycle paths, tube strikes, etc.
  • Growing popularity of the scheme over time may affect results.
• Data precision
  • Poor measurements and subjective judgements (e.g. weather categories) are generalised; exact measurements are needed.
  • Variable factors, e.g. temperature or weather, differ depending on exact location.
• The data itself is always changing: it is only an indicator of some relationships.
  • Different people (e.g. tourists) may have different attitudes.
  • Different locations yield different results: weather varies across continents.
Evaluation of best approach
• J48 - easy to visualise
• ZeroR performs poorly on our dataset
Overall: the best approach is to try several different WEKA modules
and compare the results, to focus efforts and find the best solution.
• Graphs of properties: can indicate the most important factors for classification
• Classification algorithms: to build a model
• Testing the model is also crucial.
Conclusions based on data
• Dataset suitability - probably more suited to classification than
clustering
• Some prediction was possible
• External factors - other changes in the transport network, cycling
for health, city events
• Other possible analyses: usage by hour, casual users
• Applications: Smart cities & planning - effective bikeshare provision

Weka bike rental

  • 1. WEKA: A MODERN APPLICATION OF DATA MINING TECHNIQUES SEAN, ROB, PRATIK, RHODRI, AL, VASANTI, MINGHAO
  • 2. What is WEKA? • Desktop application for machine learning & data mining • Open-source, Java-based tool • Offers commonly used algorithms to model data • Developed at the University of Waikato, New Zealand
  • 3. What is Data Mining & Machine Learning? • Data Mining : • Searching for patterns in data • Finding value in data • Machine Learning: • Developing models which computational resources can use • Using computational resources to model data to predict a likely outcome.
  • 4. Features of WEKA • Pre-process data • Classification & Clustering • Association rules • 3D visualisation
  • 5. Choosing the Dataset • Public datasets: • data.gov.uk • kaggle.com (e.g. the Titanic dataset) • UCI Machine Learning Repository • A dataset that could provide insight into a real-world scenario • A dataset that would model effectively in WEKA: one with several properties
  • 6. Capital Bikeshare Picture: Alejandro Castro, flickr, creative commons • Bike-share system in Washington DC and surrounding area • https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
  • 7. The Objective • Investigate factors affecting bike-share usage • Could this data be used to predict how busy or quiet a bike share system may be on a given day?
  • 8. Dataset fields • Record index • Time information • Date, day of the week, whether the day is a holiday, whether the day is a working day, month, year, season (1-4: spring/summer/autumn/winter) • Weather information • Weather description (four categories ordered roughly from good to bad) • Normalized values for temperature, ‘feels like’ temperature, humidity and windspeed • Totals • Counts of bikes rented by registered and by casual users • Total count of bikes rented that day
  • 9. Pre-processing • Remove fields which don’t help prediction • Indexes, sub-totals etc. • Filters • Discretize - categorises numeric attributes into discrete values • ClassBalancer - re-weights instances so that each class has the same total weight
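The two filters on the pre-processing slide can be illustrated outside WEKA. The following is a minimal sketch, assuming equal-width binning as the discretisation strategy and a simple "equal total weight per class" rule for balancing. The function names `equal_width_bins` and `balancing_weights` and the sample values are hypothetical; this is not WEKA's own code.

```python
from collections import Counter


def equal_width_bins(values, num_bins):
    """Map each numeric value to a bin index in 0..num_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def balancing_weights(labels):
    """Give each instance a weight so that every class has the same total weight."""
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[label] for label in labels]


# Made-up normalised temperatures and class labels, for illustration only.
temps = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
bins = equal_width_bins(temps, 4)

labels = ["busy", "quiet", "quiet", "quiet"]
weights = balancing_weights(labels)
```

In this sketch the single "busy" instance receives the same total weight as the three "quiet" instances together, which is the idea behind re-weighting rather than resampling.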
  • 11. Basic terminology to understand evaluation of classifiers • True positive (tp): An instance is correctly predicted to belong to the given class • True negative (tn): An instance is correctly predicted not to belong to the given class • False positive (fp): An instance is incorrectly predicted to belong to the given class • False negative (fn): An instance is incorrectly predicted not to belong to the given class
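  • 12. Explanation of Statistics • Precision: tp / (tp + fp), the fraction of instances predicted to belong to the class that really do • Recall: tp / (tp + fn), the fraction of instances that belong to the class and are predicted as such • F-measure: 2 × Precision × Recall / (Precision + Recall), the harmonic mean of precision and recall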
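The three statistics follow directly from the tp, fp and fn counts defined on the previous slide. Below is a minimal sketch using the standard definitions; the function name `precision_recall_f1` and the "busy"/"quiet" labels are made up for illustration.

```python
def precision_recall_f1(actual, predicted, positive):
    """Compute precision, recall and F-measure for one class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    # Guard against division by zero when the class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical labels: two true positives, one false positive, one false negative.
actual = ["busy", "busy", "busy", "quiet", "quiet", "quiet"]
predicted = ["busy", "busy", "quiet", "busy", "quiet", "quiet"]
precision, recall, f_measure = precision_recall_f1(actual, predicted, "busy")
```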
  • 13. Algorithms explored • Graph based: • J48 - This classifier uses a decision-tree structure to make decisions. • Performs very well on our dataset
  • 14. Algorithms explored • Rule based: • ZeroR - The simplest classification method: it always predicts the most frequent class of the target and ignores all predictors. • Performs poorly on our dataset
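ZeroR is mainly useful as a baseline for judging other classifiers. A minimal sketch of the idea, with a hypothetical class name `ZeroRSketch` and made-up labels, could look like this:

```python
from collections import Counter


class ZeroRSketch:
    """Baseline classifier: always predicts the most frequent training label."""

    def fit(self, labels):
        # Only the target values are inspected; predictors are never looked at.
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.majority for _ in rows]


# Made-up labels for five days of bike-share usage.
train_labels = ["quiet", "busy", "quiet", "quiet", "busy"]
model = ZeroRSketch().fit(train_labels)
predictions = model.predict([None] * len(train_labels))
accuracy = sum(p == a for p, a in zip(predictions, train_labels)) / len(train_labels)
```

Any classifier worth using should beat the accuracy of this baseline, which equals the share of the majority class.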
  • 15. Algorithms explored • Naïve Bayes • This is a probabilistic classifier based on Bayes' theorem which analyses the relationship between features and class labels. • This classifier can handle missing values by ignoring them during calculation of the conditional probabilities.
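The way missing values are ignored can be seen in a small categorical Naïve Bayes. This is a minimal sketch under stated assumptions: missing values are represented as `None`, Laplace (add-one) smoothing is used to avoid zero probabilities, and the function names and data are hypothetical. It is not WEKA's implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    values_seen = defaultdict(set)        # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            if value is None:             # missing value: skip it when counting
                continue
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, row):
    """Return the label with the highest (log) posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)   # log prior
        for i, value in enumerate(row):
            if value is None or not values_seen[i]:
                continue                  # missing value: contributes nothing
            counts = value_counts[(i, label)]
            # Laplace smoothing (an assumption of this sketch).
            numerator = counts[value] + 1
            denominator = sum(counts.values()) + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up rows: (weather, day type); one day type is missing.
rows = [("good", "workday"), ("good", "holiday"), ("bad", "workday"),
        ("bad", None), ("good", "workday")]
labels = ["busy", "busy", "quiet", "quiet", "busy"]
model = train_naive_bayes(rows, labels)
```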
  • 16. Test Set Division • Training and testing set: - Training data is used for building an ML model - Testing data is used for measuring the performance of an ML model • Supplying a testing set in Weka: separate training and testing sets
  • 17. Test Set Division • Cross-validation: - Reduces the risk of an over-optimistic performance estimate caused by overfitting - Gives an estimate of how well the predictions generalise • Includes: - Splitting the original dataset into k equal parts (folds) - Setting one fold aside, training on the remaining k-1 folds and measuring performance on the held-out fold - Repeating the process k times, holding out a different fold each time • 10-fold cross-validation: k = 10
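The fold procedure on this slide can be sketched in a few lines. The generator below is an illustrative assumption (the name `k_fold_indices` and the fixed seed are hypothetical) and it does not stratify by class, so it only shows the basic splitting scheme.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10-fold cross-validation over 20 instances: each fold holds 2 test instances.
splits = list(k_fold_indices(20, 10))
```

Every instance is used for testing exactly once and for training k-1 times; the k performance figures are then typically averaged.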
  • 18. Test Set Division • Percentage split: - Randomly splits the original dataset into a training partition and a testing partition each time you evaluate a model. • For example: with 100 instances and a 66% percentage split, 66 instances are used for training and the remaining 34 for testing.
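The 66%/34% example can be reproduced with a short sketch. The function name `percentage_split` and the seed are hypothetical, and rounding the cut point to the nearest instance is an assumption of this sketch.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances and split them into training and testing partitions."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 100 instances split 66% training / 34% testing.
train, test = percentage_split(range(100), 66)
```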
  • 19. What is Clustering? • Finding the class labels and the number of classes directly from the data (in contrast to classification). • It is unsupervised learning: we want to explore the data to find some structure in it. • What is clustering for? • Grouping items with similar properties together into clusters. • For example, grouping T-shirts into “small”, “medium” and “large” so that decisions can be based on the data.
  • 22. Some popular Clustering Algorithms • K-means clustering (disjoint sets) • EM clustering (probabilistic) • Cobweb clustering (hierarchical)
  • 23. K-means: Iterative distance-based clustering (disjoint sets) 1. Specify k, the desired number of clusters 2. Choose k points at random as cluster centers 3. Assign all instances to their closest cluster center 4. Calculate the centroid (i.e., mean) of instances in each cluster 5. These centroids are the new cluster centers 6. Continue until the cluster centers don’t change • Minimizes the total squared distance from instances to their cluster centers.
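The six steps above can be sketched as follows. This is a minimal illustration, not WEKA's SimpleKMeans: it assumes squared Euclidean distance, points given as tuples, and an iteration cap; the names `k_means` and `squared_distance` and the sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points (tuples of numbers) into k disjoint groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                 # step 2: random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                            # step 3: assign to closest center
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = []
        for c, members in enumerate(clusters):      # steps 4-5: centroids become centers
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])      # keep the center of an empty cluster
        if new_centers == centers:                  # step 6: stop when nothing changes
            break
        centers = new_centers
    return centers, clusters


# Two well-separated groups of made-up 2D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, k=2)
```

Because the initial centers are random, poorly separated data can give different clusterings for different seeds, which is one reason clustering results are hard to evaluate.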
  • 24. K-means in Weka • Note parameters: • numClusters • distanceFunction • How can we tell the right number of clusters? In general, this is an unsolved problem. • Clustering is subjective.
  • 25. • Use the AddCluster unsupervised attribute filter • Hard to evaluate clustering
  • 26. Trying to cluster into seasons • Using K-means clustering with k = 4, we want to see whether the data falls into clusters that correspond to the four seasons.
  • 27. Observations • We found that winter and summer months have separated into two distinct clusters. • The autumn and spring months have not separated so well. • From the visualisation we also see the overall trend of more users in the summer months compared to winter ones. • This is not surprising since these months are hotter and people are more likely to choose to rent bikes.
  • 28. Possible Improvements • Data accuracy • Uncontrollable outside factors, e.g. road closures, new cycle paths, tube strikes etc. • Growing popularity of the scheme may affect results. • Data precision • Bad measurements and subjective assessments (e.g. of the weather) are generalised; exact measurements are needed. • Variable factors, e.g. temperature or weather, differ depending on the exact location. • The data itself is always changing: it is only an indicator of some relationships. • Different people, e.g. tourists, may have different attitudes. • Different locations yield different results: weather is variable across continents.
  • 29. Evaluation of best approach • J48 - easy to visualise • ZeroR is a bad idea for our dataset • Overall: the best approach is to analyse several different WEKA modules and compare results to focus efforts and find the best solution. • Graphs of properties: can indicate the most important factors to be classified • Classification algorithms: to build a model • Testing the model is also crucial.
  • 30. Conclusions based on data • Dataset suitability - probably more suited to classification than clustering • Some prediction was possible • External factors - other changes in the transport network, cycling for health, city events • Other possible analysis: usage by hour, casual users • Applications: Smart cities & planning - effective bikeshare provision