2012


   Regression Analysis
   and Cluster Analysis
   Using WEKA




            Kanishka Chakraborty (10BM60036)
                         VGSoM, IIT Kharagpur
                                    2010-2012
Table of Contents

Introduction

Scope of this term paper

     Data Used
     Analysis Done

Analysis

     Regression Analysis
     Cluster Analysis

References

Introduction

The amount of data generated is huge and is growing at an exponential rate. But data is not of
much use in itself. It must be converted into information that can be interpreted and used. There
are multiple methods to convert data into information. Data mining is one such method; it helps
in deducing meaningful patterns and facts from the data. It has applications in every walk of
life. Organizations rely on data mining to get the insights on which their decisions are based.
Many data mining tools are available in the market. WEKA (Waikato Environment for Knowledge
Analysis) is one such data mining tool, and it is among the most widely used toolkits of its
kind.

WEKA is a free, Java-based tool available under the GNU General Public License. Its many
features have made it quite a popular data mining tool. It provides visualization tools,
algorithms, and preprocessing and modeling techniques for data mining. It offers the user both
a GUI (Graphical User Interface) and a CLI (Command Line Interface).




The applications available:

      Explorer: An environment for exploring and analyzing data in WEKA
      Experimenter: An environment for running experiments and conducting statistical tests
      KnowledgeFlow: Much the same functionality as Explorer, with a drag-and-drop interface
      Simple CLI: Provides a command line interface for WEKA



The tool expects the data to be in .arff format. ARFF stands for Attribute-Relation File Format.
It is an ASCII text file that names the relation, declares the attributes and their types, and
lists the values of each instance. It consists of three parts: Relation, Attribute and Data.



Scope of this term paper


This paper analyzes telecom customers in terms of their Value Added Services (VAS) usage
pattern and experience. The analysis is carried out in order to identify customers who are
likely to adopt a service like 3G. The paper also tries to identify which factors are important
in assessing which customers will adopt 3G. This information plays a major role in creating
the marketing strategy for 3G.

DATA USED
The data used for this paper was collected through a survey conducted in Guwahati, Assam. The
survey was done in order to identify the important factors that differentiate potential 3G
customers from customers who are not potential 3G users. The sample size used for this
analysis is 206 and consists of the following demographic segments:

      Students
      Young Professionals (<35 years of age)
      Working Professionals (>35 years of age)
      Housewives
      Defense personnel
      Low Income Group (Rickshaw drivers, Auto rickshaw drivers, Shopkeepers etc.)

Variable                    Description                                           Categories
Monthly expenditure on VAS  How much the customer spends on VAS in a month        <100, 100-300, 300-500, >500
Mobile Internet             Whether the customer uses internet on their mobile    Yes, No
Internet speed experience   What has been the mobile internet usage               Satisfied, Neither Satisfied nor Dissatisfied,
                            satisfaction level of the customers                   Dissatisfied, Not used
3G Awareness                How aware is the customer regarding the 3G services   Using 3G, Fully Aware, Partially Aware, Not Aware
Handset Price               What is the price of the handset the customer is      <3000, 3000-5000, 5000-7000, 7000-10000,
                            using                                                 10000-15000, 15000-20000, 20000-30000, >30000
3G usage plan               Whether the customer is planning to use 3G in the     Yes, No
                            near future
Demography                  The age-occupation combination of the customer        Low income group, Housewives, Defense,
                                                                                  Young Professionals, Working Professionals, Students


To be usable in WEKA, the data was first converted to .arff format. This is done by defining the
following:

       Relation: The dataset is given a name under the relation header.
       Attribute: Each variable is defined as an attribute. The data type (numeric, string etc.) is
        also defined for each attribute.
       Data: The instances are entered under the data header, with the value of each attribute
        for every instance.
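
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the file layout, not the procedure used for this paper: the relation name, the attribute names, the numeric price code and the two sample rows are all hypothetical, and the helper assumes names and values that contain no spaces or special characters (which would otherwise need quoting).

```python
def to_arff(relation, attributes, rows):
    """Return ARFF text for the given relation.

    attributes: list of (name, kind); kind is "numeric", "string",
                or a list of labels for a nominal attribute.
    rows:       list of tuples, one value per attribute.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed labels in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per instance.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and instances, for illustration only.
attributes = [
    ("mobile_internet", ["Yes", "No"]),
    ("handset_price_code", "numeric"),
    ("plan_3g", ["Yes", "No"]),
]
rows = [("Yes", 4, "Yes"), ("No", 2, "No")]

arff_text = to_arff("telecom_vas_survey", attributes, rows)
print(arff_text)
```

Writing `arff_text` to a file with the .arff extension gives a file with the Relation, Attribute and Data parts in the order WEKA expects.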




ANALYSIS DONE
The following analyses will be conducted using the tool:

       Regression
       Clustering

Regression is carried out to understand the relationship between the variables in the data, so
as to predict how one variable varies with respect to one or more other variables. Clustering
is a technique that forms groups and assigns each instance to one of them. Each group consists
of instances which are similar to each other. It is widely used for segmenting customers
according to their characteristics and preferences.
Analysis


Regression Analysis
Regression analysis is used to understand the relationship that a particular variable (the
dependent variable) shares with others (the independent variables). For this paper, the factors
studied are as follows:

       Dependent Variable: Plan to use 3G
       Independent Variables:
           o Mobile internet user
           o 3G awareness
           o Price of the handset used

STEPS TO FOLLOW
   I.   Select Classify tab
  II.   Click on the Choose button
 III.   Go to functions
 IV.    Select LinearRegression from the list
  V.    Under Test options, select Percentage split and enter the % of the data to be used for
        training (the rest will be used for testing)
 VI.    Click on Start to perform the analysis
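
To make the percentage split concrete, here is a minimal Python sketch of dividing a set of instances into a training part and a held-out part. It is not WEKA's own code: the function name, the shuffling seed and the value of 66% are assumptions chosen for illustration, and the instances are simply the numbers 0 to 205 standing in for the 206 survey responses.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances and split them into (train, test) parts."""
    shuffled = list(instances)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 206 placeholder instances, split 66% / 34% (illustrative value).
train, test = percentage_split(range(206), 66)
print(len(train), len(test))
```

The model is fitted on the first part only, and the second part is used to check how well the fitted equation predicts instances it has not seen.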




OUTPUT
The regression analysis conducted on the data gives us the following equation:



3G Planner = 0.4599 * (Internet Mobile User) + 0.0891 * (3G awareness) - 0.1325 * (Handset price) + 0.9421
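
As a worked example of how this equation is applied, the sketch below plugs numeric codes into it. The coefficients are the ones reported above; the input codes (1, 2 and 4) are hypothetical, since the paper does not state how each category was coded as a number.

```python
def planner_score(internet_user, awareness, handset_price):
    """Evaluate the fitted regression equation for one respondent."""
    return (0.4599 * internet_user
            + 0.0891 * awareness
            - 0.1325 * handset_price
            + 0.9421)


# Hypothetical codes: internet user = 1, awareness = 2, handset price = 4.
score = planner_score(internet_user=1, awareness=2, handset_price=4)
print(round(score, 4))  # 0.4599 + 0.1782 - 0.5300 + 0.9421 = 1.0502
```

How the resulting number maps back to a Yes or No answer depends on the coding used for the 3G usage plan variable.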



ANALYSIS OF THE OUTPUT
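The output leads to the following interpretations:

      Whether a person is planning to use 3G depends to a great extent on whether that
       person is using internet on their mobile. A person who is using internet on their
       mobile is more likely to try 3G.
      The 3G trial plan also depends on the price of the handset the respondent is
       currently using. The higher the price, the higher the likelihood that the person will
       try 3G.
      The plan for 3G usage also depends on the 3G awareness level, but the dependence is
       weak. According to the output, the higher the awareness about 3G, the more likely it is
       that the person will try 3G.

The signs of the coefficients have to be read against the numeric coding of the categories. The
cluster centroids reported later in this paper suggest that lower codes stand for Yes on the
3G usage plan and mobile internet variables and for higher 3G awareness, which is why the
negative coefficient on handset price corresponds to a higher likelihood of planning to use 3G.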




Cluster Analysis
Before creating a marketing strategy for any product, it is very important to identify the
segments present in the market. These segments can then be studied in order to select the one
which is best suited for targeting. Clustering can be used to identify the segments present in
the market. For this paper, K-Means clustering has been used.

STEPS TO FOLLOW
   I.   Select Cluster tab
  II.   Click on the Choose button
 III.   Select SimpleKMeans from the list
 IV.    Click on the text box beside the Choose button. Enter the number of clusters you want
        in numClusters
 V.     Click on Start to perform the analysis
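
The idea behind K-Means can be sketched as follows. This is a plain textbook version written for illustration, not WEKA's SimpleKMeans implementation: the function names and the sample points are hypothetical, and the starting centroids are passed in explicitly so that the result is reproducible, whereas a tool normally picks its own starting points.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, labels) after iterating until assignments settle."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical two-dimensional points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(labels)
```

The number of starting centroids plays the role of the number of clusters entered in step IV.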




OUTPUT
The outputs obtained are as follows:

Cluster centroids

The centroids obtained by clustering help in understanding the characteristics of each
segment. They provide information about each cluster in terms of the various variables.

Attribute                                     Cluster 0   Cluster 1   Cluster 2   Cluster 3
Membership                                    (65)        (61)        (41)        (39)
Monthly Expense on VAS                        1.1692      1           1.1951      1.3333
Mobile Internet user                          0.6923      2           1.7317      0.9487
Satisfaction level of mobile internet usage   1.9385      0           0           1.2462
3G Awareness                                  2.4769      2.6885      2.6098      2.1538
Demography                                    4.18154     2.3607      4.6829      5.2821
Price of Handset used                         2.3385      2.1311      3.3415      3.8769
3G usage plan                                 2           2           1.9268      0.8974


Clustered Instances

The clustered instances give the number of instances that belong to each cluster. This helps
in estimating what percentage of the total population is likely to belong to each cluster.

      Cluster 0: 65 (32%)
      Cluster 1: 61 (30%)
      Cluster 2: 41 (20%)
      Cluster 3: 39 (19%)
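
The percentages above follow directly from the cluster sizes and the sample size of 206. A short Python check, using only the counts reported above (the variable names are illustrative):

```python
# Cluster sizes as reported in the output above.
sizes = {0: 65, 1: 61, 2: 41, 3: 39}

total = sum(sizes.values())
# Share of each cluster, rounded to the nearest whole percent.
shares = {cluster: round(100 * n / total) for cluster, n in sizes.items()}

print(total)
print(shares)
```

Because each share is rounded separately, the four rounded percentages add up to 101 rather than 100.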




ANALYSIS OF THE OUTPUT
In K-Means clustering, the number of clusters to be formed is specified by the user. Here the
number of clusters has been set to 4. WEKA describes each cluster in terms of its centroid,
that is, the central value of each variable within the cluster. The cluster descriptions are
as follows:

Attribute                                     Cluster 0               Cluster 1               Cluster 2               Cluster 3
Membership                                    (65)                    (61)                    (41)                    (39)
Monthly Expense on VAS                        <100                    <100                    <100                    0-300
Mobile Internet user                          Yes                     No                      No                      Yes
Satisfaction level of mobile internet usage   Not Satisfied           Haven’t used            Haven’t used            Satisfied
3G Awareness                                  Low awareness           Low awareness           Low awareness           Fully Aware
Demography                                    Working Professionals   Housewives              Working Professionals   Young Professionals & Students
Price of Handset used                         3000-5000               3000-5000               5000-7000               7000-10000
3G usage plan                                 No                      No                      No                      Yes


Thus the segment to be targeted initially is cluster 3. It consists of young professionals
(<35 years of age) and students. This segment is the most likely to go for 3G services. The
awareness level of this segment is fairly high. The handsets used by the members of this
segment are in the price band of 7000-10000. The members of this segment are satisfied with
the speed of internet they receive on their handsets. The cluster membership of this segment
is 19%. Thus, according to the analysis, around 19% of the total population can be expected
to consist of customers who are likely to go for a service like 3G.

                                        References

      http://www.ibm.com/developerworks/opensource/library/os-weka1/index.html
      http://en.wikipedia.org/wiki/Weka_%28machine_learning%29
      http://sourceforge.net/projects/weka/files/documentation/3.6.x/WekaManual-3-6-2.pdf/download


 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Term Paper on WEKA

  • 1. 2012 Regression Analysis and Cluster Analysis Using WEKA Kanishka Chakraborty (10BM60036) VGSoM, IIT Kharagpur 2010-2012
  • 2. Table of Contents
       Introduction: 3
       Scope of this term paper: 4
           Data Used: 4
           Analysis Done: 5
       Analysis: 6
           Regression Analysis: 6
           Cluster Analysis: 8
       References: 10
  • 3. Introduction
       The amount of data being generated is huge and growing at an exponential rate. Data is of little use in itself, however: it must be converted into information that can be interpreted and used. There are multiple methods of converting data into information. Data mining is one of them; it helps in deducing meaningful patterns and facts from the data, and it has applications in every walk of life. Any organization must rely on data mining to obtain the insights on which its decisions will be based.
       Many data mining tools are available in the market. WEKA (Waikato Environment for Knowledge Analysis) is one such tool, and one of the most widely used. It is a free, Java-based tool available under the GNU General Public License. Its popularity comes from its range of features: visualization tools, algorithms, and preprocessing and modeling techniques for data mining. It provides the user with both a GUI (Graphical User Interface) and a CLI (Command Line Interface). The applications available are:
         - Explorer: an environment for analyzing data in WEKA
         - Experimenter: an environment for conducting statistical tests
         - KnowledgeFlow: the same functionality as the Explorer, with a drag-and-drop interface
         - Simple CLI: a command line interface for WEKA
       The tool works natively with data in .arff format. ARFF stands for Attribute-Relation File Format. It is an ASCII file listing all the attributes, their relation and the values for each instance. It consists of three parts: Relation, Attribute and Data.
  • 4. Scope of this term paper
       This paper analyzes telecom customers' usage patterns and experience of Value Added Services (VAS). The analysis is carried out in order to identify customers who are likely to go for a service like 3G. The paper also tries to identify which factors are important in assessing which customers will adopt 3G. This information plays a major role in creating the marketing strategy for 3G.
       DATA USED
       The data used for this paper was collected through a survey conducted in Guwahati, Assam. The survey was done in order to identify the important factors that differentiate potential 3G customers from non-potential 3G customers. The sample size used for this analysis is 206 and consists of the following demographic segments:
         - Students
         - Young Professionals (<35 years of age)
         - Working Professionals (>35 years of age)
         - Housewives
         - Defense personnel
         - Low Income Group (rickshaw drivers, auto rickshaw drivers, shopkeepers etc.)
       Variable | Description | Categories
       ------------------------------------------------------------
       Monthly expenditure on VAS | How much the customer spends on VAS in a month | <100, 100-300, 300-500, >500
       Mobile Internet | Whether the customer uses internet on their mobile | Yes, No
       Internet speed experience | What has been the mobile internet usage satisfaction level of the customers | Satisfied, Neither Satisfied nor Dissatisfied, Dissatisfied, Not used
       3G Awareness | How aware the customer is regarding the 3G services | Using 3G, Fully Aware, Partially Aware, Not Aware
       Handset Price | What is the price of the handset the customer is using | <3000, 3000-5000, 5000-7000, 7000-10000, 10000-15000, 15000-20000, 20000-30000, >30000
       3G usage plan | Whether the customer is planning to use 3G in the near future | Yes, No
       Demography | The age-occupation combination of the customer | Low income group, Housewives, Defense, Young Professionals, Working Professionals, Students
  • 5. To be usable in WEKA, the data was first converted into .arff format. This is done by defining the following:
         - Attribute: each variable is defined as an attribute. The data type (numeric, string etc.) is also defined for each attribute.
         - Data: the instances are entered under the data header, which holds the value of each attribute for every instance.
       ANALYSIS DONE
       The following analyses are conducted using the tool:
         - Regression
         - Clustering
       Regression is carried out to understand the relations between the variables in the data, so as to predict how one variable varies with respect to one or more others.
       Clustering is a technique that forms groups and assigns each instance to one of them. Each group consists of instances that are similar to each other. It is widely used for segmenting customers according to their characteristics and preferences.
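The conversion to .arff described above can be sketched in a few lines of Python. This is only an illustration: the relation name, the attribute names and the two sample respondents are hypothetical, since the paper does not show its actual .arff header, and the attributes are declared as nominal here although variables coded as numbers would be declared `numeric` instead.

```python
# Hypothetical attribute names and labels, loosely based on the variable
# table in the paper; the author's real .arff header is not shown.
ATTRIBUTES = [
    ("mobile_internet_user", ["Yes", "No"]),
    ("awareness_3g", ["Using 3G", "Fully Aware", "Partially Aware", "Not Aware"]),
    ("usage_plan_3g", ["Yes", "No"]),
]


def quote(value):
    """Quote a value that contains spaces, as ARFF requires."""
    return f"'{value}'" if " " in value else value


def to_arff(relation, attributes, rows):
    """Return ARFF text with its three parts: relation, attributes, data."""
    lines = [f"@relation {relation}", ""]
    for name, categories in attributes:
        labels = ",".join(quote(c) for c in categories)
        lines.append(f"@attribute {name} {{{labels}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Two made-up respondents, for illustration only.
rows = [
    ["Yes", "Fully Aware", "Yes"],
    ["No", "Not Aware", "No"],
]
arff_text = to_arff("vas_survey", ATTRIBUTES, rows)
print(arff_text)
```

Writing `arff_text` to a file with the .arff extension gives a file that follows the Relation, Attribute and Data layout described in the introduction.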
  • 6. Analysis
       Regression Analysis
       Regression analysis is used to understand the relation that a particular variable (the dependent variable) shares with others (the independent variables). For this paper the factors studied are as follows:
         - Dependent variable: Plan to use 3G
         - Independent variables:
             o Internet mobile user
             o 3G awareness
             o Price of the handset used
       STEPS TO FOLLOW
         I. Select the Classify tab
         II. Click on the Choose button
         III. Go to functions
         IV. Select LinearRegression from the list
         V. Under Test options, choose Percentage split and enter the percentage of the data to be used for training (the rest is held out for testing)
         VI. Click on Start to perform the analysis
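The percentage split chosen in step V can be pictured with the following sketch. It assumes that the percentage entered is the share used for training and that the rest is held out for testing; the value 66, the seed and the helper name `percentage_split` are illustrative, and WEKA's own shuffling and rounding are not reproduced.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances and split them into training and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


# 206 placeholder ids stand in for the 206 survey responses.
train, test = percentage_split(range(206), train_percent=66)
print(len(train), len(test))
```

The model is fitted on the training part only; its error is then measured on the held-out part, which it has not seen.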
  • 7. OUTPUT
       The regression analysis conducted on the data gives the following equation:
         3G Planner = 0.4599 * (Internet Mobile User) + 0.0891 * (3G awareness) - 0.1325 * (Handset price) + 0.9421
       ANALYSIS OF THE OUTPUT
       The signs of the coefficients have to be read against the numeric coding of the responses, which the paper does not state explicitly. The cluster centroids reported in the Cluster Analysis section suggest that "Yes" is coded with a lower value than "No", so a lower value of 3G Planner appears to indicate a stronger intention to use 3G. On that reading, the output leads to the following interpretations:
         - Whether a person plans to use 3G depends to a great extent on whether that person uses internet on their mobile. A person who uses internet on their mobile is more likely to try 3G.
         - The plan to try 3G is also related to the price of the handset the respondent is currently using. The higher the price, the higher the likelihood that the person will try 3G.
         - The plan for 3G usage also depends on the 3G awareness level, but the dependence is weak. According to the output, the higher the awareness about 3G, the more likely it is that the person will try 3G.
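To see how the fitted equation behaves, it can be evaluated for individual respondents. The sketch below uses the coefficients reported above; the coded input values are hypothetical, because the paper does not state how the answers were converted to numbers.

```python
def three_g_planner(internet_user, awareness, handset_price):
    """Evaluate the regression equation reported in the paper."""
    return (0.4599 * internet_user
            + 0.0891 * awareness
            - 0.1325 * handset_price
            + 0.9421)


# Hypothetical coded respondents (the coding scheme is an assumption):
# A uses mobile internet (1), is well aware of 3G (2), pricier handset (4).
# B does not use mobile internet (2), is less aware (3), cheaper handset (2).
score_a = three_g_planner(internet_user=1, awareness=2, handset_price=4)
score_b = three_g_planner(internet_user=2, awareness=3, handset_price=2)
print(f"A: {score_a:.4f}  B: {score_b:.4f}")
```

Under the assumed coding, respondent A receives the lower score, which matches the interpretation that mobile internet users with costlier handsets are the likelier 3G adopters.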
  • 8. Cluster Analysis
       Before creating a marketing strategy for any product, it is important to identify the segments present in the market. These segments can then be studied in order to select the one best suited for targeting. Clustering can be used to identify the segments present in the market. For this paper, K Means Clustering has been used.
       STEPS TO FOLLOW
         I. Select the Cluster tab
         II. Click on the Choose button
         III. Select SimpleKMeans from the list
         IV. Click on the text box beside the Choose button and enter the number of clusters wanted in numClusters
         V. Click on Start to perform the analysis
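What the K Means algorithm does when Start is clicked can be sketched as follows. This is a bare-bones version of the general algorithm on made-up two-dimensional points, not WEKA's SimpleKMeans implementation, which also takes care of details such as nominal attributes and missing values; the function names, the seed and the data are illustrative.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=10):
    """Alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


# Toy data: two visibly separate groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

The number of clusters `k` plays the role of the numClusters setting: the algorithm does not discover it and has to be told how many groups to form.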
  • 9. OUTPUT
       The outputs obtained are as follows:
       Cluster centroids
       The centroids obtained by clustering help in understanding the characteristics of each segment. They describe each cluster in terms of the various variables.
       Attribute                                    | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3
       ---------------------------------------------------------------------------------------------
       Membership                                   | (65)      | (61)      | (41)      | (39)
       Monthly Expense on VAS                       | 1.1692    | 1         | 1.1951    | 1.3333
       Mobile Internet user                         | .6923     | 2         | 1.7317    | .9487
       Satisfaction level of mobile internet usage  | 1.9385    | 0         | 0         | 1.2462
       3G Awareness                                 | 2.4769    | 2.6885    | 2.6098    | 2.1538
       Demography                                   | 4.18154   | 2.3607    | 4.6829    | 5.2821
       Price of Handset used                        | 2.3385    | 2.1311    | 3.3415    | 3.8769
       3G usage plan                                | 2         | 2         | 1.9268    | .8974
       Clustered Instances
       The clustered instances give the number of instances that belong to each cluster. This helps in estimating what percentage of the total population is likely to belong to each cluster:
         - Cluster 0: 65 (32%)
         - Cluster 1: 61 (30%)
         - Cluster 2: 41 (20%)
         - Cluster 3: 39 (19%)
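The percentages listed for the clustered instances follow directly from the membership counts. A short check, using only the counts reported above (the variable names are illustrative):

```python
# Membership counts per cluster, as reported in the output above.
membership = {0: 65, 1: 61, 2: 41, 3: 39}

total = sum(membership.values())
shares = {cluster: round(100 * count / total)
          for cluster, count in membership.items()}

print(total)
for cluster, share in shares.items():
    print(f"Cluster {cluster}: {membership[cluster]} ({share}%)")
```

The counts add up to the sample size of 206. Because each share is rounded to a whole number, the four percentages add up to 101 rather than 100.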
  • 10. ANALYSIS OF THE OUTPUT
       In K Means Clustering the number of clusters to be formed is entered by the user. Here the number of clusters has been set to 4. WEKA describes each cluster in terms of the centroid of each variable for that cluster. The cluster descriptions are as follows:
       Attribute                                    | Cluster 0             | Cluster 1     | Cluster 2            | Cluster 3
       ----------------------------------------------------------------------------------------------------------------------------------------
       Membership                                   | (65)                  | (61)          | (41)                 | (39)
       Monthly Expense on VAS                       | <100                  | <100          | <100                 | 0-300
       Mobile Internet user                         | Yes                   | No            | No                   | Yes
       Satisfaction level of mobile internet usage  | Not Satisfied         | Haven't used  | Haven't used         | Satisfied
       3G Awareness                                 | Low awareness         | Low awareness | Low awareness        | Fully Aware
       Demography                                   | Working Professionals | Housewives    | Working Professional | Young Professionals & Students
       Price of Handset used                        | 3000-5000             | 3000-5000     | 5000-7000            | 7000-10000
       3G usage plan                                | No                    | No            | No                   | Yes
       Thus the segment to be targeted initially is cluster 3. It consists of young working professionals (<35 years of age) and students. This segment is the most likely to go for 3G services. The awareness level of this segment is fairly high. The handsets used by its members are in the 7000-10000 price band. The members of this segment are satisfied with the speed of internet they receive on their handsets. The cluster membership of this segment is 19%. Thus it can be deduced from the analysis that around 19% of the total population consists of customers who are likely to go for a service like 3G.
       References
         - http://www.ibm.com/developerworks/opensource/library/os-weka1/index.html
         - http://en.wikipedia.org/wiki/Weka_%28machine_learning%29
         - http://sourceforge.net/projects/weka/files/documentation/3.6.x/WekaManual-3-6-2.pdf/download