600+ analysts
Our dreams
Inspiration
Online & offline

       Marketing research domains

Friend organisations

Neighbour countries
website!
Join the conversation!
Marketing researchers TODAY?!
Well known
          Popular
 Unique
An explosion of data!
Fatigue among respondents!
Me, MySpace & I
Visual (n)ethnography among 13- to 17-year-olds

Joeri Van den Bergh (InSites Consulting)
Veerle Colin (MTV Networks)
The Research Briefing

WHY TELL ME WHY
Getting intimate with our target groups


• ID construction of adolescents in a digitalised world
• Social groups as an extension of psychographic segments
• Role of brands in ID construction within social groups
The Research Approach

I SEE YOU, BABY
From traditional to visual ethnography
Traditional ethnography: the researcher (R) observes the participant in context
• Hawthorne effect
• Researcher gaze
• Time & cost intensive
360° visual ethnography: the participant observes their own context
• Contact with researcher via 2.0 tools
• "Informer" gaze: what is important?
• Follow multiple participants from anywhere
360° Ethnography

1. User-generated MM ethnography
 = Observation takes place via photos/video taken by the study participants
 = Participants observe their own environment & report back to the researcher
Ethnographic blog: identity-related tasks
• Non-me. Pictures of... clothes you no longer want to wear; youngsters you would not want to be friends with
• Personal me. Pictures of... a place where you can really be yourself; clothes you wear at home; objects that are typical for me
• Social me. Pictures of... important persons in your life; clothes you wear to go out; groups of youngsters you like
• Aspirational me. Pictures of... favourite clothes; youngsters you would like to be friends with
• Other groups. Pictures of other groups that are different but okay; pictures of youth of today; pictures of normal persons
2. Nethnography: teenagers' social life is increasingly MOVING ONLINE!
 = Observation of the online behaviour & content of a target group or within a certain web space
Nethnography: we are your friends
Social Network Sites: nethnography


Social identity and personal identity, as expressed through:
• Nickname
• Clan membership
• Profile text
• Profile picture
• Photo collection
• Conversation monitoring on guestbook
And for the data-mining freaks among you, Annelies ripped the internet:

• 300 active Netlog users, randomly selected, with an equal spread across age × gender
• Web crawlers to "scan" Netlog pages and extract content
• Text mining: profile pages – photo tags – clan membership – guestbook conversations
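The sampling step above (300 active users, equal spread across age × gender) amounts to a stratified random sample. Below is a minimal Python sketch, assuming the strata are single years of age from 13 to 17 crossed with gender (10 cells of 30 users each). The helper name `stratified_sample` and the toy population are hypothetical illustrations, not the crawler or dataset actually used in the study.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(users, total, key, seed=None):
    """Draw the same number of users at random from every stratum.

    Hypothetical helper: key(user) returns the stratum (here age x gender).
    random.sample raises ValueError if a stratum has too few users.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user in users:
        strata[key(user)].append(user)

    per_stratum, remainder = divmod(total, len(strata))
    if remainder:
        raise ValueError("total must split evenly across the strata")

    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical population: 100 active users per age (13-17) x gender cell.
population = [
    {"id": f"user-{age}-{gender}-{i}", "age": age, "gender": gender}
    for age in range(13, 18)
    for gender in ("f", "m")
    for i in range(100)
]

sample = stratified_sample(
    population, 300, key=lambda u: (u["age"], u["gender"]), seed=42
)
counts = Counter((u["age"], u["gender"]) for u in sample)
print(len(sample), sorted(set(counts.values())))  # → 300 [30]
```

Stratifying first and sampling within each cell guarantees the equal spread; a plain random draw of 300 from the whole population would only approximate it.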
Other online behaviour: tracking tool
That's why we call it (visual) (n)ethnography: shoot me!

100% UGC OBSERVED BY US
The Research Analysis

THE HARD PART
Analysis




INTIMACY
ME-SEUM
Social networks are not as social as you think

Only 32% of the conversations on the guestbook are "interactive"! The rest are all single statements.

Guestbook conversation topics: food, shoes, little kids, mobile phones, cars, quarrel, miss you, making fun, confirm friendship, travel, music, welcome by strangers, transport, it was great, feedback on picture, I am bored, youth movement, alcohol, sleeping, how are you?, practical appointment, school, feedback on profiles, MSN, festivals, congratulations, family, gaming, age, party, sport, online movies, express love, study, Tokio Hotel
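The 32% figure is a share of guestbook conversations classified as interactive versus single statements. The deck does not say how a conversation was classified, so the Python sketch below assumes a hypothetical rule: a conversation counts as interactive when at least two different authors post in it. The `Message` type, the helper names and the toy data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    text: str


def is_interactive(conversation):
    """Hypothetical rule: a guestbook conversation counts as 'interactive'
    when at least two different authors post in it (i.e. someone replies)."""
    return len({message.author for message in conversation}) >= 2


def interactive_share(conversations):
    """Fraction of conversations that are interactive (0.0 for no data)."""
    if not conversations:
        return 0.0
    return sum(is_interactive(c) for c in conversations) / len(conversations)


# Toy guestbook data, invented for illustration only.
conversations = [
    [Message("anna", "Miss you!")],
    [Message("ben", "Welcome!"), Message("owner", "Thanks, nice to meet you")],
    [Message("cleo", "Great party yesterday")],
    [Message("dirk", "How are you?")],
]

print(f"{interactive_share(conversations):.0%} interactive")  # → 25% interactive
```

A stricter rule (for example, requiring a reply from the profile owner) would be a one-line change to `is_interactive`; the share calculation stays the same.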
One of the 3 key research questions:

Social groups among today's youngsters
To which youngster group do I belong?
Methodology



ME, positioned against four reference groups: social group, aspirational group, non-group, and groups that are different but OK
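[Figure: map of youth groups. Horizontal axis: WE ↔ ME (I'm better); vertical axis: CHANGE ↔ CONSERVATISM; zone labels: REAL WORLD, THINK, DRINK. Groups plotted: Skater, Alternative, Fashion girl, Fashion boy, Tektonic, Rapper, Rockers, Hippies, Mainstream, Breezer sluts, Punk, Jumpers, emo, Nerds, Gothic, Geek girls]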
spartacus121 best PK (RuneScape)
30/04/2008 19:07:55 by laurent
"I hit something around 27 and that's already crazy high, his hits are insane lol. His sword is 140M (I have 10M)"
[Figure: the same youth-group map, overlaid with SKILLS labels]
[Figure: the same youth-group map, overlaid with LOOKS labels]
[Figure: the same youth-group map, overlaid with both SKILLS and LOOKS labels]
How this research changed our lives

REMEMBER ME
6 changes that rocked our socks off

• The tools for ID construction have changed

• New online qualitative methods proved to be efficient

• Our target group = the new and better qualitative researchers

• New reflection kit for entire MTV Networks staff

• Closer connection with MTV & new clients

• Redefine content strategy of TMF on screen & online
4C Consulting

Introduction to our services
Our Mission | Boosting your customer value
4C Consulting helps companies win, keep and grow customer value
Our Solutions | Call us for…
• Customer Value Strategy
• Customer Insight
• Process Excellence
• Business requirements definition
• Package selection & implementation
• Post-launch care
Our Focus | Boosting your customer value

Increase Revenue
1. Acquire new customers
2. Sell more to existing customers
    • More of the same (increase turnover)
    • Expand value proposition portfolio (cross-sell & product development *)
    • Upgrade value proposition (Upsell)
3. Prevent existing customers from leaving

Reduce Costs
1. Efficient Delivery (Process Excellence)
2. Align value propositions
3. Advanced pricing
Business Intelligence Practice
• SCV
• ROMSCCI
• Competing on Analytics
• Infrastructure
• Data Quality
• Audit, Operations, Coaching
Why 4C Consulting | 7 compelling reasons



                   1. 100% focus on customer value
                      management

                   2. Result-oriented project approach

                   3. Connecting marketing, sales &
                      customer care with senior management
                      and IT

                   4. Independent consultant for 10 years

                   5. Experienced crew, passionate about
                      marketing, sales & customer care

                   6. Value based pricing model

                   7. Satisfied & loyal customers: 90
                      customers, more than 380 projects
Optimize your business with Business & Decision

Michel Meulders
- Domain Manager -
Business & Decision Benelux
Founded in 2002
Merger of several companies specialised in BI
Consulting & System Integrator:
- More than 300 consultants
- About EUR 18 million turnover in 2007
- 58% organic growth compared to 2006
- Last acquisition: BnV Group (BE+NL)

Business & Decision Benelux is:
• a multi-specialist in specific technology fields:
  • Business Intelligence
  • Customer Relationship Management
  • Life sciences
  • Risk & Compliance
• with foreign offices in Brussels, Amsterdam, Luxembourg
• Top accounts in finance, pharma, telco, distribution, industry (Fortis, ING, ABN Amro, Dexia, GSK, UCB, Proximus, Belgacom, Carrefour, Honda,…)

[Chart: turnover evolution (consolidated), in thousands, 2004 to 2007 and Obj 2008, split by Belgium, Luxembourg, Netherlands]
For more info see
http://www.businessdecision.com
Belgian Federation of
Market Research Institutes

www.febelmar.be
Febelmar mission
• Development and promotion of market research and opinion polls in Belgium
• Protecting the sector interests
• Watching over the correct use of the ethical rules of market research in all phases of the market research process
• Stimulating continuous improvement of quality of service in market research
• Being a platform for communication, exchange of expertise and networking
Members

27 agencies.

Together they represent about 75% of
the total market research expenditures in
Belgium.
What is I4BI?


i4bi specializes in implementing Business Intelligence solutions in your company.

Our team of BI experts has deep functional and technical experience with application development, business model definitions and data warehousing. Our consultants have many years of experience in every project phase: functional/technical design, development, application roll-out, training, etc.

i4bi consultants have a deep knowledge of the Oracle Business Intelligence products and solutions.

i4bi sponsors the development of an independent analytical branch, which is expected to launch in 2009.
What do we provide?

We provide business experience, technical abilities and analytical expertise to support strategic decision making.
Expertise


• Analytical Expertise
   •   Data Mining
   •   Statistical modelling
   •   Predictive analysis
   •   Basel II compliant modelling
   •   Forecasting

• Business Analysis Expertise
   • Reporting
   • Delivering Business Insight to decision makers
   • Marketing Analysis
   • Data Quality
• Use of technical tools such as SAS – SPSS – Statistica to
  support & extend business knowledge
Contact


• For more general information:

  www.I4BI.be

• For more analytical information:

  Filip.deroover@I4BI.be
InSites Consulting

6 beliefs in 60 seconds
We believe ...
         in the empowered consumer

Human-to-human interactions are more powerful than
      ever and can make or break your brand



                   “Consumers are beginning in a very
                   real sense to own our brands and
                   participate. We need to begin to
                   learn how to let go”

                   A.G. Lafley, CEO & Chairman of P&G
We believe ...
           in giving back

Rewarding experiences for participants
Active involvement of panel members
         Charity contributions
We believe ...
                                     in connecting

Everything we do is aimed at strengthening connections
           between you, your market and us

“Connected Research” brings you closer to your market
      and taps into the wisdom of the crowds

     Some of our connected research methods
                  Research communities
                      Bulletin boards
                       Blog research
                 Online discussion groups

  More information: http://connectedresearch.insites.eu/
We believe ...
in the power of new research methods
 for better marketing decision making


                 Informational
      Providing more depth to research insights


               Transformational
    Doing things that were previously not possible

                 Automational
        Conducting research more efficiently
We believe ...
                     in 1 + 1 = 3

 Old and new methods need to be optimally “fused” in
order to fully grasp the new customer / consumer reality
We believe ...
        in the power of our team

         People make the difference
Open, forward thinking, dedicated, passionate
         Specific knowledge centers
performance management • consulting • technology




                           Welcome to Keyrus
                            BAQMaR, 17 December 2008


©Keyrus – all rights reserved
About Keyrus (Belgium)


• founded in 1996 as SOLIDPartners
• focus on performance management, business
  intelligence & data warehousing
• strong and balanced client base spread over different
  industries
• +100 consultants specialised in both technical and
  business domains
• part of Keyrus group (France)
Keyrus' global footprint
• head office in Paris
• present in 9 countries
• +1300 employees
• listed on Paris stock exchange Euronext
Vision & mission

Keyrus will be one of the few leading service providers in the area of performance management.

We help our clients to effectively design, build and operate the appropriate performance management organization and solutions in an integrated end-to-end fashion.
Portfolio of solutions & services
• Information Management (IM)
• Business Intelligence Platforms (BIP)
• Analytic Applications (AA)
• People and Processes (P&P)
• Corporate Performance Management ((C)PM)

[Figure: solution architecture with source systems & applications, management information functions, data warehouse & data marts, data delivery & outflow functions, BI layer (reports, OLAP, dashboards, alerts), CPM data delivery, exchange & synchronization, CPM applications (e.g. planning, ABM, PA) and analytic applications (e.g. data mining)]
Contact us


                                  Keyrus nv
                                  info@keyrus.be

                                  Nijverheidslaan 3/2
                                  B-1853 Strombeek-Bever
                                  t +32 2 706 03 00
                                  f +32 2 706 03 09
www.keyrus.be


Introduction of Profacts
Who is PROFACTS ?




We are "the new kids on the block" in (online) market research ...
1 REVEALING FACTORS FOR SUCCESS
               strategy
                      200%                286              people are nowyrs
                              growth rate                 working mean age
                                                                  @ Profacts


2 people have founded Profacts

(Growth timeline chart, Q4 2006 to Q4 2008)
REVEALING FACTORS FOR SUCCESS


Profacts is active in more than 10 sectors ...

• AUTOMOTIVE
• FMCG
• RECRUITMENT
• GPS
• BANKING
• INSURANCE
• ICT
• PHARMACEUTICAL
• TELECOM
• ENERGY
Python Predictions

PREDICT

GROWTH

www.pythonpredictions.com
Rogil Research

A research agency with a view
MARKETING & SENSORY RESEARCH
OUR PASSION

• FTF (face-to-face) research (Mobile Unit)
• Telephone research
• Online research
• Panel services
• Fieldwork in Europe

Sensory research:
• Eye-tracking / Eye|watch
• Tachistoscope
• Trained panel
• Consumer panel
• Taste lab

Sensory Safari

Note down in your agenda:
SENSORY SAFARI
• March 26th 2009
• 18:00
• At Rogil in Leuven

5 SENSES · MARKETING
We hope to welcome you in March.

Thanks for your attention!
SAS Analytics For Challenging Times

Start Focused, Think Wide
Campaign Management Requires Optimization
CRM is becoming Risk Management
SAS Breadth of Analytic Offering…

• Statistical Analysis
• Survey Design/Analysis
• Data Mining
• Text Mining
• Time Series Mining
• Forecasting
• Quality Improvement
• Operations Research

SAS Innovations in Marketing Solutions…
http://www.sas.com/feature/analytics/index.html

Copyright © 2006, SAS Institute Inc. All rights reserved.
The Mission




Drive the widespread use of data in decision making
The Focus

Attract · Retain · Grow · Fraud · Risk
Driving and Maximizing Profit
The Vision

Operational Processes
Interaction data · Attitudinal data · Descriptive data · Behavioral data
Enterprise Data Sources
The Acceptance

• The rise of the agnostics
   Science vs. Chance
   In numbers we trust!
The myth of the "best" algorithm: lessons learned from innovations in data sampling and data pre-processing for marketing analytics

Dr. Sven F. Crone
Deputy Director, Ass. Prof.
Directors
• Prof. Robert Fildes
• Prof. Peter Young
• Dr. Sven F. Crone

Researchers
• Dr. Steve Finlay
• Dr. Alastair Robertson
• Dr. Didier Soopramanien
• Dr. Kostas Nikolopoulos

• Prof. Stephen Taylor
• Dr. Wlodek Tych
• Prof. David Peel
• Prof. Peter Pope

Associated Experts
• Prof. Paul Goodwin
• Dr. Andrew Eaves

Research & PhD students
• Heiko Kausch, RA
• Stavros Asimakopoulos
• Xi Chen
• Bruce Havel
• Suzi Ismail
• Nikolaos Kourentzes
• Ioannis Stamatopoulos
• Andrey Davidenko
• Charlotte Brown
• Hong Juan Liu
• T Hu
• John Prest
• Huang Tao

Visiting Researchers
• Prof. Geoff Allen
• Dr. Yukun Bao
• Young-Sang Cho
“Take away this pudding, it has no theme.” (Sir Winston Churchill, 1915)
Agenda
• Sampling issues in Data Mining
• Case study 1: Direct Marketing
  • Cross-selling of Magazine subscriptions
  • Effect of data preprocessing: Sampling
  • Interaction of Sampling with Scaling & Coding
• Case study 2: Credit & Behavioral Scoring
  • Predicting consumer credit default
  • Effects of sample size
  • Effects of sample distribution
• Case study 3: Online Shopping Behaviour
  • Predicting consumer shopping channel choice
  • Sample distribution & multiple classes
• Conclusion & Take-aways
Why (Under/Over) Sampling?
   • Knowledge Discovery in Databases (KDD) = non-trivial process of identifying
     valid, novel, useful patterns in large data sets
            • Data Mining = only one step in the KDD process
            • The data sample determines the whole process! (→ GIGO: garbage in, garbage out)
            • "Research seems preoccupied with algorithms" [Hand 2000]

(Figures: SAS SEMMA DM process and CRISP-DM process, with a "Monitoring" stage label)
Sampling in Direct Marketing Literature?
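Columns: parameter tuning; data reduction** (feature selection, re-sampling); data projection (continuous attributes: standardisation, discretisation; categories: coding)

| Ref  | Input type* | Methods*** | Parameter tuning | Feature selection | Re-sampling | Standardisation | Discretisation | Coding |
| [2]  | 2 | BMLP, LR, LDA, QDA | X |   |   | X |   |   |
| [42] | 1 | MLP, LR, CHAID | X |   |   | X |   |   |
| [43] | 2 | MLP, RBF, LR, GP, CHAID | X |   | X |   |   |   |
| [44] | 3 | MLP, LR, LDA | X | X |   |   |   |   |
| [4]  | 2 | CHAID, CART |   |   | X |   |   |   |
| [6]  | 2 | MLP, LR | X | X | X | X | X |   |
| [9]  | 2 | LVQ, RBF, 22 DT, 9 SC | X |   |   |   |   | X |
| [45] | 2 | LDA, LR, KNN, KDE, CART, MLP, RBF, MOE, FAR, LVQ | X |   |   |   | X |   |
| [3]  | 1 | MLP |   | X |   | X |   |   |
| [7]  | 2 | LSSVM | X | X |   | X |   |   |
| [11] | 2 | LR, LS-SVM, KNN, NB, DT | X |   |   | X | X |   |
| [10] | 1 | LDA, QDA, LR, BMLP, DT, SVM, LSSVM, TAN, LP, KNN | X |   |   |   | X |   |
| [46] | 2 | LR, MLP, BMLP | X | X |   |   |   |   |
| [47] | 2 | LSSVM, SVM, DT, RL, LDA, QDA, LR, NB, IBL | X |   |   | X |   |   |
| [48] | 1 | DT, MLP, LR, FC | X |   |   |   |   |   |
| [49] | 1 | FC | X |   |   | X |   |   |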




Majority of direct marketing papers focus on algorithm tuning
  Only 3 papers consider Resampling / Instance Selection
      No analysis of the interaction with Sampling & Projection & …
Classification
(Figure: customers plotted by days since last purchase (few … many) against number of subscriptions (1 … many); legend for the last campaign: no response / subscribed to magazine)

Database of customers (instances)
Known attributes for all customers (age, gender, existing subscriptions, …)
Known response (class membership) of buyers & non-buyers from past mailings
Build a model to separate the classes → decision boundary of different complexity
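A minimal sketch of the "build a model to separate classes" step, assuming a linear decision boundary learned by logistic regression with plain batch gradient descent. The two features (days since last purchase and number of subscriptions, both pre-scaled to [0, 1]), the six toy customers, the learning rate and the epoch count are hypothetical; the slides do not specify the model used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Learn w, b of the linear decision boundary w·x + b = 0 by batch
    gradient descent on the average logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)  # P(subscribed)
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Class 1 (subscribed) on the positive side of the boundary
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical customers: (days since last purchase, number of subscriptions), scaled
X = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.7), (0.8, 0.2), (0.9, 0.1), (0.7, 0.3)]
y = [1, 1, 1, 0, 0, 0]  # 1 = subscribed to magazine, 0 = no response
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

A more flexible model (e.g. a neural network) would bend this boundary; the linear case is the simplest instance of "decision boundaries of different complexity".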
Classification
(Figure: the same customer plot by days since last purchase (few … many) and number of subscriptions (1 … many); legend: no response / subscribed to magazine / class unknown)

Use the decision boundary to classify unseen instances
  → Calculate on which side of the hyperplane the instances lie (or their distance to it)
      → Assign a class to the unseen instances
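Classifying an unseen customer then only needs the side of the hyperplane it falls on; the signed distance doubles as a ranking score. A small sketch with hypothetical boundary weights (w = (-1, 1), b = 0 on pre-scaled features), not weights from the case study:

```python
import math

def signed_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w·x + b = 0."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm

def assign_class(w, b, x):
    # Positive side -> class 1 ("subscribed"), otherwise class 0 ("no response")
    return 1 if signed_distance(w, b, x) >= 0 else 0

w, b = [-1.0, 1.0], 0.0  # hypothetical boundary on scaled features
for x in [(0.2, 0.6), (0.7, 0.4)]:  # unseen customers (class unknown)
    print(x, assign_class(w, b, x), round(signed_distance(w, b, x), 3))
# (0.2, 0.6) 1 0.283
# (0.7, 0.4) 0 -0.212
```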
Reality Check: Imbalanced classes
(Figure: customer plot by days since last purchase (few … many) and number of subscriptions (1 … many); legend: no response / subscribed to magazine)

Problem
• Classifiers are biased towards the majority class
• Shifts the decision boundary
• Error / accuracy based learning creates naïve classifiers
• Invalid separation of classes

Balanced dataset = class distributions are equal, P(y=A) = P(y=B)
   → proportional sampling or stratified sampling feasible
Imbalanced dataset = class distributions are unequal, P(y=A) >> P(y=B)
   → the class of interest is often the minority (in most business applications)
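Why accuracy-based learning produces naïve classifiers can be shown with simple arithmetic: at a 1.3% response rate (the rate reported for the magazine case study), a model that never predicts a response is 98.7% accurate yet finds no subscribers. A minimal sketch; the 1,000-customer sample is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Overall accuracy and recall (hit rate) on the positive class."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(hits) / len(hits)
    return accuracy, recall

# 1,000 hypothetical customers with a 1.3% response rate
y_true = [1] * 13 + [0] * 987
y_naive = [0] * len(y_true)  # naive classifier: always predicts "no response"
print(accuracy_and_recall(y_true, y_naive))  # (0.987, 0.0)
```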
Imbalanced Data Sampling
(Figure: customer plot by days since last purchase (few … many) and number of subscriptions (1 … many); legend: no response / subscribed to magazine)

Stratified Random Sampling
   divide the DB into mutually exclusive strata (subpopulations) & draw random samples from each
Proportional
   ensure the proportions in the samples equal those in the population
Disproportional
   weighted over- & undersampling of important classes

Size of the sample?
   Distribution / location of the sample?
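A minimal sketch of proportional versus disproportional stratified sampling using only the standard library. The 970/30 database is hypothetical, and the disproportional draw shown is the undersampling variant only (an equal number of records per class).

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, key, sizes, seed=0):
    """Simple random sample without replacement inside each stratum;
    sizes maps stratum -> number of records to draw."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for stratum, n in sizes.items():
        sample.extend(rng.sample(strata[stratum], n))
    return sample

def proportional_sizes(records, key, fraction):
    # Proportional: every stratum keeps its population share
    counts = Counter(key(r) for r in records)
    return {s: round(c * fraction) for s, c in counts.items()}

def label(record):
    return record[0]

# Hypothetical database: 970 non-responders, 30 subscribers
db = [("no response", i) for i in range(970)] + [("subscribed", i) for i in range(30)]

proportional = stratified_sample(db, label, proportional_sizes(db, label, 0.1))
disproportional = stratified_sample(db, label, {"no response": 30, "subscribed": 30})
print(Counter(map(label, proportional)))     # 97 no response, 3 subscribed
print(Counter(map(label, disproportional)))  # 30 of each class
```

The sample size and the class mix of the sample are exactly the two open questions on the slide; here both are set by hand.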
Random Undersampling
(Figure: customer plot by days since last purchase (few … many) and number of subscriptions (1 … many); legend: no response / subscribed to magazine)

Benefits
• Helps detect rare target levels

Risks
• Biases predictions (correctable)
• Loses information contained in instances of the majority class
• Creates different boundaries
• Increases prediction variability
• …

Exclude random instances of the majority class
  Retain all instances of the minority class
      Establish a balanced class distribution
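The three undersampling steps above translate directly into code. A minimal sketch for a binary target on hypothetical data; `random_undersample` is an illustrative helper, not a library function.

```python
import random

def random_undersample(X, y, majority=0, seed=0):
    """Keep every minority-class instance and a random subset of the
    majority class of the same size (binary target assumed)."""
    rng = random.Random(seed)
    majority_idx = [i for i, t in enumerate(y) if t == majority]
    minority_idx = [i for i, t in enumerate(y) if t != majority]
    kept = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [X[i] for i in kept], [y[i] for i in kept]

# Hypothetical data: 95 non-responders (0) and 5 subscribers (1)
X = list(range(100))
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 5 5
```

Ninety of the 95 majority instances are discarded, which is the information loss listed under Risks.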
Random Oversampling
(Figure: customer plot by days since last purchase (few … many) and number of subscriptions (1 … many); legend: no response / subscribed to magazine)

Benefits
• Helps detect rare target levels
• No loss of information

Risks
• Biases predictions (correctable)
• Increases prediction variability
• Increases processing time

Retain all instances of the majority class in the sample
  Duplicate identical instances of the minority class
      Establish a balanced class distribution
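Random oversampling mirrors the steps above: all original instances stay, and minority instances are drawn with replacement and duplicated until the classes balance. A minimal sketch on hypothetical data (the helper name and seed are illustrative):

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Keep every instance and add random duplicates of minority-class
    instances until both classes are the same size (binary target assumed)."""
    rng = random.Random(seed)
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    n_majority = len(y) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

# Hypothetical data: 95 non-responders (0) and 5 subscribers (1)
X = list(range(100))
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_oversample(X, y)
print(len(y_bal), y_bal.count(0), y_bal.count(1))  # 190 95 95
```

The balanced set is almost twice the original size, hence the longer processing time noted under Risks.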
Ready for more theory…? x

… rather some case studies!
Business Case:
         Direct Marketing/Response Optimization

• Sell a magazine subscription to existing customers
• Whom to send mail to? (Which customers are most likely to respond?)
• How many customers to contact? (What is the optimal mailing size?)
Corporate project with a leading German publishing house
   Provided data set of past mailing campaigns
   Benchmark novel methods against in-house SPSS Clementine
   Explore Neural Networks (NN) and Support Vector Machines (SVM)
Benefits of Direct Marketing


                          Simple                   With data mining
Addressees                100,000                  Top 40% = 40,000
Cost                      €2/mail = €200,000       €2.50/mail = €100,000
Response rate             0.5% = 500               1.0% = 400
Sales value per order     €300                     €300
Sales volume              €150,000                 €120,000
Revenue (sales - cost)    -€50,000                 €20,000

Smaller mailing (number of letters sent) → lower costs (€1 per letter)
Higher response rate → higher revenue
More specific mailing → lower cost
More relevant information → higher customer satisfaction
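The table's arithmetic can be reproduced in a few lines. This sketch assumes sales volume = responses × value per order and the bottom row = sales volume minus mailing cost, which matches every figure in the table:

```python
def campaign_result(addressees, cost_per_mail, response_rate, value_per_order):
    """Responses, mailing cost, sales volume and net result of one mailing."""
    responses = round(addressees * response_rate)
    cost = addressees * cost_per_mail
    sales = responses * value_per_order
    return responses, cost, sales, sales - cost

print(campaign_result(100_000, 2.0, 0.005, 300))  # (500, 200000.0, 150000, -50000.0)
print(campaign_result(40_000, 2.5, 0.010, 300))   # (400, 100000.0, 120000, 20000.0)
```

Doubling the response rate on 40% of the file turns a loss into a profit even at a higher cost per letter.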
NN get worse with learning …

• Wish to implement Neural Networks for next campaign
   • In-house team (with no NN knowledge) outperformed us EVERY TIME!
   • Analyzed software, training parameters, etc. → internal competition
   • Observed expert in building models … !

  %        Pred. C0   Pred. C1     Sum
  C0          61.86      38.14     100
  C1          55.09      44.81     100
             116.95      82.95   54.26

  %        Pred. C0   Pred. C1     Sum
  C0          72.96      27.04     100
  C1          62.02      37.98     100
             134.98      65.02   55.47

  %        Pred. C0   Pred. C1     Sum
  C0          52.87      43.37     100
  C1          47.13      56.63     100
                100        100   54.75
Experimental Design: Different data pre-processing

• Handle categorical features → different encoding (n, n-1, thermo, ordinal)
• Scale numerical features → different scaling (standardise, discretise)
• Adjust imbalanced class distributions → different sampling (over- & undersampling)
• Decide on sample size and method
• Handle outliers
• Select useful features

Evaluate across 3 algorithms:
Neural Networks (MLPs), Support Vector Machines & Decision Trees
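The four categorical encodings named in the design (n, n-1, thermo, ordinal) differ only in how a level becomes numbers. A minimal sketch for a hypothetical three-level ordered attribute; the thermometer variant shown (n-1 cumulative indicators) and the choice of the last level as the n-1 reference category are assumptions, since other conventions exist.

```python
def encode(value, levels, scheme):
    """Encode one categorical value; levels is the ordered list of all levels."""
    k, n = levels.index(value), len(levels)
    if scheme == "n":        # one binary indicator per level
        return [1 if i == k else 0 for i in range(n)]
    if scheme == "n-1":      # drop the last level as reference category
        return [1 if i == k else 0 for i in range(n - 1)]
    if scheme == "thermo":   # thermometer: first k of n-1 indicators set to 1
        return [1 if i < k else 0 for i in range(n - 1)]
    if scheme == "ordinal":  # single integer code 1..n
        return [k + 1]
    raise ValueError(f"unknown scheme: {scheme}")

levels = ["low", "mid", "high"]  # hypothetical ordered attribute
for scheme in ["n", "n-1", "thermo", "ordinal"]:
    print(scheme, [encode(v, levels, scheme) for v in levels])
# n [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# n-1 [[1, 0], [0, 1], [0, 0]]
# thermo [[0, 0], [1, 0], [1, 1]]
# ordinal [[1], [2], [3]]
```

Ordinal coding imposes equal spacing between levels, which is one plausible reason it behaves differently from the binary schemes in the results.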
Dataset Structure

    Data set size
    • 300,000 customer records
    • 4,019 subscriptions sold
    • Response rate of 1.3%

    Data set structure
    • 18 categorical features
    • 35 numerical features
    • Binary target variable

    Evaluated the impact of data preprocessing
    • Data sampling (oversampling vs. undersampling)
    • Categorical attribute encoding (N, N-1, thermo, ordinal)
    • Continuous attribute projection (binning vs. normalisation)
    • Continuous attribute scaling ([0,+1] vs. [-1,+1] range)
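The continuous-attribute options (rescaling to [0,+1] or [-1,+1], standardisation, binning) can be sketched as follows. The equal-width binning rule and the sample values are assumptions for illustration; the slides do not say how bins were formed.

```python
import statistics

def rescale(values, lo=0.0, hi=1.0):
    """Linear (min-max) rescaling to [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("a constant attribute cannot be rescaled")
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def standardise(values):
    """z-scores: subtract the mean, divide by the population standard deviation."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def equal_width_bins(values, n_bins):
    """Discretise into n_bins equal-width intervals coded 0 .. n_bins-1."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("a constant attribute cannot be binned")
    width = (vmax - vmin) / n_bins
    return [min(int((v - vmin) / width), n_bins - 1) for v in values]

days = [0, 30, 60, 90, 120]  # hypothetical "days since last purchase"
print(rescale(days))              # [0.0, 0.25, 0.5, 0.75, 1.0]
print(rescale(days, -1.0, 1.0))   # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(equal_width_bins(days, 4))  # [0, 1, 2, 3, 3]
print([round(z, 3) for z in standardise(days)])
```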

Multifactorial design to evaluate impact across multiple methods
  Neural Networks (NN)
  Support Vector Machines (SVM)
  Decision Trees (CART)
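The multifactorial design crosses every preprocessing factor with every method. Assuming a full factorial over the levels listed above (the slides do not state that every combination was run), the candidate configurations can be enumerated with `itertools.product`:

```python
from itertools import product

factors = {
    "sampling": ["oversampling", "undersampling"],
    "encoding": ["N", "N-1", "thermo", "ordinal"],
    "projection": ["binning", "normalisation"],
    "scaling": ["[0,+1]", "[-1,+1]"],
    "method": ["NN", "SVM", "CART"],
}
# One dict per combination of factor levels (full factorial)
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(design))  # 96
print(design[0])
```

Each configuration would then be trained and scored on the same hold-out set, so differences can be attributed to the factors.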
Sampling


Created 2 dataset sampling candidates (data partition, number of records):

                            Oversampling        Undersampling
Data subset              Class 1   Class -1     Class 1   Class -1
Training set              20,000     20,000       2,072      2,072
Validation set            10,000     10,000       1,035      1,035
SUM                       30,000     30,000       3,107      3,107
Test (hold-out) set          912     64,088         912     64,088

Different balancing in the training data
  Original distribution in the test data (65,000 instances)
Results


(Figure: three result charts, each annotated "Increase")

Oversampling outperforms undersampling consistently!
  Gain in Lift depends on method (different sensitivity)
      Oversampling has higher impact than data coding & scaling
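The slides report lift as a single number (around 0.55 to 0.65) without defining it. One measure that behaves this way is a decile-weighted lift index: rank customers by model score, weight the responders in each decile by 1.0, 0.9, …, 0.1, and divide by the total number of responders, so a random ranking scores about 0.55 and a perfect one 1.0. Treat this as an assumption about the metric, not the definition used in the study.

```python
def lift_index(scores, labels, n_bins=10):
    """Decile-weighted lift index: rank by descending score, split into n_bins
    equal groups and weight responders in group i (0-based) by 1 - i/n_bins."""
    if len(scores) % n_bins:
        raise ValueError("number of customers must be a multiple of n_bins")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = len(order) // n_bins
    weighted = 0.0
    for b in range(n_bins):
        responders = sum(labels[i] for i in order[b * size:(b + 1) * size])
        weighted += (1.0 - b / n_bins) * responders
    return weighted / sum(labels)

# 100 hypothetical customers, 10 responders, scores 100 (best) down to 1
scores = list(range(100, 0, -1))
perfect = [1] * 10 + [0] * 90                           # all responders ranked first
spread = [1 if i % 10 == 0 else 0 for i in range(100)]  # one responder per decile
print(lift_index(scores, perfect))           # 1.0
print(round(lift_index(scores, spread), 2))  # 0.55
```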
Recommendations from Case Study


  • Sampling
          • Oversampling outperforms undersampling for all methods
          • Undersampling: better in-sample results & worse out-of-sample results
  • Choice of method
          • NN & SVM better than CART
  • Encoding & Projection
          • SVM: avoid ordinal coding (e.g. 1, 2, 3); all other encodings perform similarly (incl. N!)
          • NN: avoid standardization & ordinal encoding
          • DT / CART: use temperature (thermometer) encoding; all others perform similarly (incl. ordinal)

Binning & scaling of continuous attributes irrelevant for all methods!
      → Use Undersampling & N-1 encoding with SVM & NN
          → Best preprocessed SVM → lift of 0.645 on test set … BUT …
Results across Pre-processing
Preprocessing: higher impact than method selection
   → Lift variation per method from sampling/scaling/coding
     > difference of lift between competing methods!
   → DPP (data pre-processing) causes 50%-70% of the differences between models

(Charts by method (NN, SVM, DT): lift performance on the test data subset ("Lift test", axis 0.60 to 0.65); arithmetic mean performance on the test data subset ("AM test", axis 0.53 to 0.58); geometric mean performance on the test data subset ("GM test", axis 0.50 to 0.58))

Results are consistent across error measures
  → Experiments allow identification of "best practices" for each modelling method
      → Best-practice preprocessing varies between methods
Business Case: Predicting
             Customer Online Shopping Adoption

• Traditional buying process is offline & simultaneous → “bricks” store
• Introduction of the Internet changes consumer behaviour
   • Seek information online & offline
   • Purchasing online & offline
   → Changing purchasing behaviour through internet adoption
   → Changing purchasing behaviour through technology acceptance
• Development of heterogeneous purchasing behaviour
   • Example: purchasing electronic durable consumer goods
   • Search for product info (e.g. video cameras) online
        → test product in-store
        → search for best deal on internet & purchase

(Diagram: search for information online / offline × purchase online / offline, defining three groups: Online Shoppers, Browsers, Non-Internet Shoppers)
Stages of Internet Adoption

1. OFFLINE BUYERS: information gathering & purchasing in stores
2. BROWSERS: information gathering online & purchasing in stores
3. ONLINE BUYERS: information gathering & purchasing online
Motivation

DIDIER: Marketing Modelling
• Econometric / marketing domain
• Seeks to explain how customers behave in online shopping
• Use of "black-box" logistic regression models
→ Models class membership to identify causal variables that explain choices
→ Descriptive & normative modelling
Best practices
• Balance datasets for a distribution representative of the population
• Use ordinal & nominal variables without recoding
• Do not normalise / scale data

SVEN: Data Mining Perspective
• IS/OR/MS domain → data mining
• Seeks to predict accurately, regardless of an explanation of why customers buy
• Use of "black-box" methods from computational intelligence
→ Models class membership to accurately classify unseen instances
→ Predictive modelling
Best practices
• Rebalance datasets for an equal distribution of target variables
• Recode ordinal → binary scale
• Rescale & normalise data to facilitate learning speed etc.

Same dataset & same objectives & similar methods
→ Conflicting "best practice" approaches to modelling
→ Outside of most software simulators!!! Implicit knowledge?
→ … WHO IS "CORRECT"? WHAT IS THE IMPACT?
Dataset

    • Survey on Internet Shopping Behaviour
         • 5,500 UK households → 685 respondents
         • Adjusted for age, income etc. of customers (older customers are less likely to buy)
         • Adjusted for product specific risk of online shopping for branded
           durable consumer goods (inspection required to some extent)
         • 73 questions on factors related to internet shopping, products etc.
Online Shopping Factors:

Input variables (mixed scale of nominal, ordinal, interval)
• Demographic factors: age, gender, income
• Internet utility factors: score from 6 correlated variables
• Internet-specific factors
• Online-shopping-specific factors
• Example statements: "Going to the shops is as convenient as Internet shopping"; "I would buy online if products are branded" etc. [1 = strongly agree; …]

Models
• Logistic Regression
• Neural Networks

Output variables
• Class 1: Browse Online & Buy Online
• Class 2: Browse Online & Buy Offline
• Class 3: Browse & Buy Offline
Imbalanced Classification problem
• Split of dataset for training, validation and test {50%; 25%; 25%}
• Distribution of target classes is skewed
  {65% online buyers; 22.5% browsers; 12.5% offline shoppers}
• Rebalancing of data sets through over- & undersampling

(Bar charts: counts (0 to 400) of online shoppers, browsers and offline shoppers per data subset (training, validation, test) for the imbalanced, oversampled and undersampled datasets)
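Random oversampling extends to a three-class target by topping up every smaller class to the size of the largest. A sketch assuming 200 hypothetical training cases split 65% / 22.5% / 12.5% like the survey data; the slides do not say which class sizes were targeted.

```python
import random
from collections import Counter

def oversample_to_largest(X, y, seed=0):
    """Random oversampling for any number of classes: duplicate random
    members of each smaller class until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    idx = list(range(len(y)))
    for cls, count in counts.items():
        members = [i for i, t in enumerate(y) if t == cls]
        idx += [rng.choice(members) for _ in range(target - count)]
    return [X[i] for i in idx], [y[i] for i in idx]

# 200 hypothetical training cases with the survey's 65% / 22.5% / 12.5% split
y = ["online"] * 130 + ["browser"] * 45 + ["offline"] * 25
X = list(range(len(y)))
X_bal, y_bal = oversample_to_largest(X, y)
print(Counter(y_bal))  # 130 cases per class
```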
Results without Discretisation
Logist.Reg.  True            Training Data                  Test Data
  Dataset   Value      Online Browse Offline       Online    Browse     Offline
  Original  Online     93.36      5.17     1.48    88.89       7.78      3.33     MCRtrain=54.3%
Imbalanced Browser     62.77     23.40    13.83    49.39      22.58     29.03     MCRtest =48.9%
            Offline    36.54     17.31    46.15    35.29      29.41     35.29



                                 
  Under-    Online     57.69     30.77    11.54    64.44      23.33     12.22
 Sampling Browser      26.92     48.08    25.00    32.26      25.81     41.94     MCRtrain=55.8%
            Offline    17.31     21.15    61.54    29.41      35.29     35.29     MCRtest =41.8%



                                 
   Over-    Online     68.27     24.35     7.38    74.44      16.67      8.89
 Sampling Browser      30.63     43.91    25.46    35.48      29.03     35.48     MCRtrain=58.4%
            Offline    16.97     19.93    63.10    29.41      29.41     41.18     MCRtest =48.2%

  Neural Net      True     Training Data (%)         Test Data (%)
  Dataset         Value      Online  Browse Offline    Online  Browse Offline
  Original        Online      86.19   12.71    1.10     86.67    8.89    4.44   MCRtrain = 54.4%
  (imbalanced)    Browser     53.13   31.25   15.63     41.94   35.48   22.58   MCRtest  = 52.5%
                  Offline     25.17   28.57   45.71     29.41   35.29   35.29

  Under-sampling  Online      44.86   40.00   17.14     27.78   58.89   13.33   MCRtrain = 54.9%
                  Browser     14.29   48.57   37.14     16.13   32.26   51.61   MCRtest  = 35.7%
                  Offline      8.57   20.00   71.43     11.76   41.18   47.06

  Over-sampling   Online      81.22   18.23    0.55     61.11   22.22   16.67   MCRtrain = 88.0%
                  Browser     14.92   83.43    1.66     19.35   77.42    3.23   MCRtest  = 75.6%
                  Offline     15.52    0.55   99.45      0.00   11.76   88.24
  MCR = Mean Classification Rate (%)
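The reported MCR values agree with the unweighted mean of the diagonal entries, i.e. the average of the per-class hit rates; for example (81.22 + 83.43 + 99.45) / 3 ≈ 88.0 for the over-sampled neural net on the training data. A minimal sketch of that calculation, using numbers copied from the table above (the function name is hypothetical). Because every class counts equally, MCR is not inflated by simply predicting the large "Online" class, unlike overall accuracy; scikit-learn's `balanced_accuracy_score` computes the same quantity from raw labels.

```python
def mean_classification_rate(matrix):
    """Unweighted mean of the diagonal of a row-normalised (%) confusion matrix.

    Each diagonal entry is the hit rate for one true class, so every class
    counts equally regardless of how many cases it has.
    """
    n = len(matrix)
    return sum(matrix[i][i] for i in range(n)) / n


# Neural net, over-sampling; rows and columns ordered Online, Browse(r), Offline.
NN_OVER_TRAIN = [
    [81.22, 18.23, 0.55],
    [14.92, 83.43, 1.66],
    [15.52, 0.55, 99.45],
]
NN_OVER_TEST = [
    [61.11, 22.22, 16.67],
    [19.35, 77.42, 3.23],
    [0.00, 11.76, 88.24],
]

print(round(mean_classification_rate(NN_OVER_TRAIN), 1))  # 88.0, as reported
print(round(mean_classification_rate(NN_OVER_TEST), 1))   # 75.6, as reported
```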
Results with Discretisation of Ordinal
  Logist. Reg.    True     Training Data (%)         Test Data (%)
  Dataset         Value      Online  Browse Offline    Online  Browse Offline
  Original        Online      91.51    6.64    1.85     85.56    7.78    6.67   MCRtrain = 61.15%
  (imbalanced)    Browser     54.26   36.17    9.57     48.39   32.26   19.35   MCRtest  = 45.1%
                  Offline     26.92   17.31   55.77     58.82   47.62   17.65

  Under-sampling  Online      71.15   21.15    7.69     55.56   24.44   20.00   MCRtrain = 69.9%
                  Browser     17.31   65.38   17.31     67.74    6.45   25.81   MCRtest  = 34.4%
                  Offline     15.38   11.54   73.08     58.82    0.00   41.18

  Over-sampling   Online      68.63   22.88    8.49     70.00   21.11    8.89   MCRtrain = 66.0%
                  Browser     17.34   56.83   25.83     12.90   58.06   29.03   MCRtest  = 62.3%
                  Offline     13.28   14.02   72.69     17.65   23.53   58.82
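Each row of these tables gives, for one true class, the percentage of cases assigned to each predicted class, so a row describes where that class's cases end up. A minimal sketch of how such a row-normalised confusion matrix could be computed from raw labels; the helper name and the toy labels are hypothetical, not data from the study. scikit-learn's `confusion_matrix(..., normalize="true")` returns the same proportions as fractions.

```python
from collections import Counter

# Class labels as in the column headers above (the row labels say "Browser").
CLASSES = ("Online", "Browse", "Offline")


def row_percent_confusion(y_true, y_pred, classes=CLASSES):
    """Return {true_class: {predicted_class: percent}}; each row sums to 100."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    table = {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        table[t] = {
            p: 100.0 * counts[(t, p)] / row_total if row_total else 0.0
            for p in classes
        }
    return table


# Hypothetical toy labels.
y_true = ["Online"] * 4 + ["Browse"] * 2 + ["Offline"] * 2
y_pred = ["Online", "Online", "Online", "Browse",
          "Browse", "Offline",
          "Offline", "Online"]
for true_class, row in row_percent_confusion(y_true, y_pred).items():
    print(true_class, row)
```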

  Neural Net      True     Training Data (%)         Test Data (%)
  Dataset         Value      Online  Browse Offline    Online  Browse Offline
  Original        Online      96.13    3.87    0.00     84.44   11.11    4.44   MCRtrain = 56.5%
  (imbalanced)    Browser     68.75   28.13    3.13     64.52   22.58   12.90   MCRtest  = 45.5%
                  Offline     40.00   14.29   45.17     58.82   11.76   29.41

  Under-sampling  Online      57.14   40.00    2.86     25.56   72.22    2.22   MCRtrain = 55.2%
                  Browser     34.29   54.29   11.43     67.74   29.03    3.23   MCRtest  = 28.0%
                  Offline     14.29   31.43   54.29     52.94   17.65   29.41

  Over-sampling   Online      98.34    1.10    0.55     58.89   24.44   16.67   MCRtrain = 99.5%
                  Browser      0.00  100.00    0.00      3.23   83.87   12.90   MCRtest  = 79.0%
                  Offline      0.00    0.00  100.00      0.00    5.88   94.12
  MCR = Mean Classification Rate (%)
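The slide title for this second set of results is cut off ("Results with Discretisation of Ordinal"), so the exact coding is not stated. A minimal sketch under the assumption that discretisation means replacing an ordinal input by one binary dummy column per level, compared with keeping it as a single rescaled numeric column. The level names and helper names are hypothetical.

```python
# Hypothetical ordinal scale; the study's actual variables are not listed here.
LEVELS = ("never", "sometimes", "often")


def scale_ordinal(values, levels=LEVELS):
    """Single numeric column: rank of the level rescaled to [0, 1]."""
    rank = {level: i for i, level in enumerate(levels)}
    top = len(levels) - 1
    return [rank[v] / top for v in values]


def discretise_ordinal(values, levels=LEVELS):
    """One 0/1 dummy column per level (the ordering information is dropped)."""
    rank = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[rank[v]] = 1
        rows.append(row)
    return rows


answers = ["often", "never", "sometimes"]
print(scale_ordinal(answers))       # [1.0, 0.0, 0.5]
print(discretise_ordinal(answers))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

The dummy coding gives a model more parameters per variable and lets it fit non-monotone effects, which may interact with how many (re)sampled cases are available per class.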
Summary

Oversampling outperforms the other sampling strategies
- Across different datasets
- Across different data preprocessing variants

Methods differ in their sensitivity to sampling
- More variation from sampling, coding & scaling than between methods
- Trying several preprocessing variants is important in modeling

Various sophisticated extensions exist
- SMOTE (Synthetic Minority Oversampling Technique)
- K-nearest Neighbor sampling (removal / creation)
- One-class learning, etc.

Extend your bag of tricks …
- … and experiment with imbalanced sampling!
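SMOTE, listed above as one of the more sophisticated extensions, creates new minority-class cases instead of duplicating existing ones. The sketch below is a simplified version of that idea (each synthetic point lies on the segment between a minority case and one of its k nearest minority neighbours), not a reference implementation; the function name and the toy data are hypothetical. For real use, the imbalanced-learn package provides a tested `SMOTE` class in `imblearn.over_sampling`.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling for one minority class.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # k nearest neighbours of every minority sample (itself excluded).
    neighbours = []
    for i, a in enumerate(minority):
        ranked = sorted((sq_dist(a, b), j) for j, b in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        a, b = minority[i], minority[j]
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


# Hypothetical 2-D minority class (e.g. a small "Offline" group).
offline = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(offline, n_new=5, k=2)
print(len(new_points))  # 5
```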
Questions?




Sven F. Crone
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX
email s.crone@lancaster.ac.uk


$SY_t = \alpha Y_{t-1} + (1-\alpha)\,SY_{t-1}$
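This recursion, $SY_t = \alpha Y_{t-1} + (1-\alpha)\,SY_{t-1}$, appears to be simple exponential smoothing. The slide does not define its symbols, so the sketch below assumes $Y$ is the observed series, $SY_t$ the smoothed value (forecast) for period $t$, $\alpha \in [0, 1]$ the smoothing constant, and that a starting value $SY_0$ is supplied. The function name is hypothetical.

```python
def exponential_smoothing(y, alpha, s0):
    """Return [SY_0, SY_1, ..., SY_n] with SY_t = alpha*Y_{t-1} + (1-alpha)*SY_{t-1}.

    y holds Y_0 ... Y_{n-1}; s0 is the assumed starting value SY_0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = [s0]
    for value in y:
        # New smoothed value: weighted mix of last observation and last smoothed value.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 12, 11], alpha=0.5, s0=10))  # [10, 10.0, 11.0, 11.0]
```

A larger $\alpha$ reacts faster to recent observations; a smaller one smooths more heavily.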
Exploring Innovation
“Online panel” vs. “Online streaming/convenience” sampling
Unfortunately, the presenters of iVOX & Corelio cannot share their presentation with the BAQMaR community for reasons of confidentiality.
BAQMaR 2008

BAQMaR 2008

  • 1.
  • 2.
  • 3.
  • 7. Online & offline Marketing research domains Friend organisations Neighbour countries
  • 11. C L Well known Popular Unique
  • 12.
  • 13.
  • 14. An explosion of data !
  • 15.
  • 17.
  • 18. Me, MySpace & I Visual (n) ethnography among 13-17 year olds Joeri Van den Bergh (InSites Consulting) Veerle Colin (MTV Networks)
  • 19.
  • 21. Getting intimate with our target groups ID construction of adolescents in a digitalised world Social groups as an extension of psychographic segments Role of brands in ID construction within social groups
  • 22. The Research Approach I SEE YOU, BABY
  • 23. Traditional ethnography context Participant R Hawthorne effect Researcher gaze Time & Cost intensive
  • 24. From traditional to visual ethnography Traditional 360° visual ethnography ethnography context context Participant Participant R R Contact with researcher Hawthorne effect via 2.0 tools Researcher “Informer” gaze: what is gaze important? Time & Cost Follow multiple participants from anywhere intensive
  • 25. 360° Ethnography 1.User generated MM ethnography = Observation takes place via photos/video taken by the participants to the study = Participants observe their own environment & report back to the researcher
  • 26.
  • 27. Ethnographic blog: identity related tasks Non me Aspirational Pictures of... me Clothes you no Pictures of... longer want to Other groups wear Favorite clothes Youngsters where youngsters where you so not want to you would like to Pictures of other groups that be friend with Personal me be friend with are different but okay Pictures of... Place where you can Pictures of youth of today really be yourself Clothes you wear at home Pictures of normal persons Objects that are typical for me Social me Pictures of... Important persons in your life Clothes you wear to go out Groups of youngsters you like
  • 28. 360° Ethnography 1.User generated ethnography = Observation takes place via photos/video taken by the participants to the study = Participants observer their own environment & report back to the researcher 2.Nethnography: social life of teenagers is very much MOVING ONLINE !!! = observation of the online behaviour & content of a target group or within a certain webspace
  • 29. Nethnography: we are your friends
  • 30. Social Network Sites: nethnography Social Personal identity identity Nicknam e Conversation Clan member Profile Monitoring on ship text guestbook Profile picture Photo collection
  • 31. And for the datamining freaks among you, Annelies ripped the internet • 300 active participants of netlog randomly selected. Equal spread age x gender • Webcrawlers to „scan‟ pages of netlog and substract content. • Textmining: profile pages – photo tags – clan membership – conversations on the guestbook
  • 32. Other online behaviour: tracking tool
  • 33. That‟s why we do call it (visual) (n)ethnograpy: shoot me! 100% UGC OBSERVED BY US
  • 36.
  • 38. Social Networks are not so social as you think they are Only 32% of the conversations on the guestbook are „interactive‟! Food The rest are all single statements. kids Shoes Litte Mobile phones Cars Quarrel Miss you Making fun Confirm friendship Travel Music Welcome by strangers Transport It was great Feedback on picture I’ am bored Youth movement Alcohol Sleeping How are you? Practical appointment School Feedback on profiles MSN Festivals Congratulations Family Gaming Age Food Party Sport Online movies Express love Study Tokio hotel
  • 39. one of the 3 key research questions Social groups among today‟s youngsters
  • 40. To what youngster group do I belong?
  • 41. Methodology M E Non group Aspira Social tional Group Group Differe nt but OK
  • 42. REAL WORLD WE ME (I’m better) CHANGE Skater Alternative Fashion girl Fashion boy Tektonic Rapper THINK Rockers DRINK Hippies MAINSTREA M Breezer sluts Punk Jumpers emo Nerds Gothic Geek girls CONSERVATISM
  • 43. spartacus121 best pk (runescape) 30/04/2008 19:07:55 door laurent ik hit iets in de 27 en dat is al zot hoog zij hits zijn gestoord lol. zen zwaard is 140M (ik heb 10 M)
  • 44. REAL WORLD WE ME (I’m better) CHANGE SKILLS SKILLS SKILLS SKILLS SKILLS SKILLS SKILLS Skater Alternative SKILLS SKILLS Fashion girl SKILLS Fashion boy SKILLS SKILLS SKILLS SKILLS SKILLS SKILLS Tektonic SKILLS Rapper THINK Rockers SKILLS DRINK SKILLS Hippies Breezer sluts SKILLS SKILLS SKILLS SKILLS Punk SKILLS SKILLS SKILLS SKILLS Jumpers SKILLS emo SKILLS SKILLS SKILLS SKILLS SKILLS SKILLS Gothic SKILLS Nerds SKILLS SKILLS SKILLS Geek girls CONSERVATISM
  • 45. REAL WORLD WE ME (I’m better) CHANGE LOOKS LOOKS LOOKS LOOKS LOOKS LOOKS Skater LOOKS Alternative LOOKS LOOKS Fashion girl LOOKS Fashion boy LOOKS LOOKS LOOKS LOOKS LOOKS Tektonic LOOKS LOOKSRapper THINK Rockers LOOKS DRINK LOOKS Hippies LOOKS Breezer sluts LOOKS LOOKS LOOKS Punk LOOKS LOOKS LOOKS LOOKS LOOKS Jumpers LOOKS emo LOOKS LOOKS LOOKS LOOKS LOOKS LOOKS Nerds LOOKSGothic LOOKS LOOKS Geek girls CONSERVATISM
  • 46. REAL WORLD WE ME (I’m better) CHANGE SKILLS LOOKS LOOKS LOOKS LOOKS SKILLS SKILLS SKILLS SKILLS SKILLS LOOKS LOOKS SKILLS Skater LOOKS Alternative SKILLS LOOKS SKILLS LOOKS Fashion girl SKILLS LOOKS Fashion boy SKILLS LOOKS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS Tektonic LOOKS SKILLS LOOKSRapper THINK Rockers SKILLS LOOKS DRINK SKILLS LOOKS Hippies LOOKS Breezer sluts LOOKS LOOKS LOOKS SKILLS SKILLS SKILLS SKILLS Punk LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS Jumpers SKILLS LOOKS SKILLS emo LOOKS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS SKILLS Nerds LOOKSGothic LOOKS SKILLS LOOKS SKILLS SKILLS Geek girls CONSERVATISM
  • 47.
  • 48. REAL WORLD WE ME (I’m better) CHANGE SKILLS LOOKS LOOKS LOOKS LOOKS SKILLS SKILLS SKILLS SKILLS SKILLS LOOKS LOOKS SKILLS Skater LOOKS Alternative SKILLS LOOKS SKILLS LOOKS Fashion girl SKILLS LOOKS Fashion boy SKILLS LOOKS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS Tektonic LOOKS SKILLS LOOKSRapper THINK Rockers SKILLS LOOKS DRINK SKILLS LOOKS Hippies LOOKS Breezer sluts LOOKS LOOKS LOOKS SKILLS SKILLS SKILLS SKILLS Punk LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS Jumpers SKILLS LOOKS SKILLS emo LOOKS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS LOOKS SKILLS SKILLS LOOKS SKILLS Nerds LOOKSGothic LOOKS SKILLS LOOKS SKILLS SKILLS Geek girls CONSERVATISM
  • 49. How this research changed our lives REMEMBER ME
  • 50. 6 changes that rocked our socks off • The tools for ID construction have changed • New online quali methods proved to be efficient • Our target group = the new and better quali researchers • New reflection kit for entire MTV Networks staff • Closer connection with MTV & new clients • Redefine content strategy of TMF on screen & on line
  • 51.
  • 52.
  • 53.
  • 55. Our Mission | Boosting your customer value 4C Consulting helps companies win, keep and grow customer value
  • 56. Our Solutions | Call us for… Business Customer Value requirements Strategy definition Package selection Customer & implementation Insight Process Post-launch Excellence care
  • 57. Our Focus | Boosting your customer value Increase Revenue Reduce Costs 1. Acquire new customers 1. Efficient Delivery (Process Excellence) 2. Sell more to existing customers • More of the same (increase turnover) 2. Align value propositions • Expand value proposition portfolio (cross-sell & product development *) 3. Advanced pricing • Upgrade value proposition (Upsell) 3. Prevent existing customers from leaving
  • 58. Business Intelligence Practice SCV ROMSCCI Competing on Analytics Infrastruc- tuur Audit Exploitatie Coaching Data Quality
  • 59. Why 4C Consulting | 7 compelling reasons 1. 100% focus on customer value management 2. Result-oriented project approach 3. Connecting marketing, sales & customer care with senior management and IT 4. Independent consultant for 10 years 5. Experienced crew, passionate about marketing, sales & customer care 6. Value based pricing model 7. Satisfied & loyal customers: 90 customers, more than 380 projects
  • 60. 60
  • 61. Optimize your business with Business &Decision Michel Meulders - Domain Manager -
  • 62. Business & Decision Benelux Founded in 2002 Merger of several companies specialised in BI Business & Decision Benelux is : Consulting & System Integrator : • a multi-specialist, in specific - More than 300 consultants technology fields : - About 18 mio Euro turnover in 2007 • Business Intelligence - 58% organic growth comparing to 2006 • Customer Relationship Management • Life sciences - Last acquisition : BnV Group (BE+NL) • Risk & Compliance Turnover evolution (consolidated) 25000 Belgium Luxembourg Netherlands • with foreign offices in Brussels, Thousands Amsterdam, Luxembourg 20000 15000 • Top accounts in finance, pharma, 10000 telco, distribution, industry (Fortis, ING, ABN Amro, Dexia, GSK, UCB, Proximus, 5000 Belgacom, Carrefour, Honda,…) 0 2004 2005 2006 2007 Obj 2008
  • 63. For more info see http://www.businessdecision.com
  • 64. Belgian Federation of Market Research Institutes www.febelmar.be
  • 65.
  • 66.
  • 67. Febelmar mission Development and promotion of market research and opinion polls in Belgium Protecting the sector interests Watching over correct use of deontological rules of market research in all phases of the market research process Stimulating continuous improvement of quality of service in market research Being a platform for communication, exchange of expertise and networking
  • 68. Members 27 agencies. Together they represent about 75% of the total market research expenditures in Belgium.
  • 69.
  • 70.
  • 71. What is I4BI? i4bi is specialized in implementing Business Intelligence Solutions in your company. Our team of BI experts has deep functional and technical experience with Application development, Business Model definitions and Data Warehousing. Functional/technical designs, development, application role out, training etc… are phases in a project where our consultants have many years of experience. i4bi consultants have a deep knowledge of the Oracle Business Intelligence products and solutions. i4bi sponsors the development of an independent analytical branch, which will probably see the light in 2009
  • 72. What do we provide? Strategic Decision Making To Support Business Technical Analytical Experience Abilities Expertise We Provide
  • 73. Expertise • Analytical Expertise • Data Mining • Statistical modelling • Predictive analysis • Basel II compliant modelling • Forecasting • Business Analysis Expertise • Reporting • Delivering Business Insight to decision makers • Marketing Analysis • Data Quality • Use of technical tools such as SAS – SPSS – Statistica to support & extend business knowledge
  • 74. Contact • For more general information: www.I4BI.be • For more analytical information: Filip.deroover@I4BI.be
  • 76. We believe ... in the empowered consumer Human-to-human interactions are more powerful than ever and can make or break your brand “Consumers are beginning in a very real sense to own our brands and participate. We need to begin to learn how to let go” A.G. Lafley, CEO & Chairman of P&G
  • 77. We believe ... in giving back Rewarding experiences for participants Active involvement of panel members Charity contributions
  • 78. We believe ... in connecting Everything we do is aimed at strengthening connections between you, your market and us “Connected Research” brings you closer to your market and taps into the wisdom of the crowds Some of our connected research methods Research communities Bulletin boards Blog research Online discussion groups More information: http://connectedresearch.insites.eu/
  • 79. We believe ... in the power of new research methods for better marketing decision making Informational Providing more depth to research insights Transformational Doing things that were previously not possible Automational Conducting research more efficiently
  • 80. We believe ... in 1 + 1 = 3 Old and new methods need to be optimally “fused” in order to fully grasp the new customer / consumer reality
  • 81. We believe ... in the power of our team People make the difference Open, forward thinking, dedicated, passionate Specific knowledge centers
  • 82. performance management  consulting  technology Welcome to Keyrus BAQMaR, 17 December 2008 ©Keyrus – all rights reserved
  • 83. About Keyrus (Belgium) • founded in 1996 as SOLIDPartners • focus on performance management, business intelligence & data warehousing • strong and balanced client base spread over different industries • +100 consultants specialised in both technical and business domains • part of Keyrus group (France)
  • 84. Keyrus‟ global footprint  head office in Paris  present in 9 countries  +1300 employees  listed on Paris stock exchange Euronext
  • 85. Vision & mission Keyrus will be one of the few leading service providers in the area of performance management. We help our clients to effectively design, build and operate the adequate performance management organization and solutions in an integrated end-to-end fashion.
  • 86. Portfolio of solutions & services Information Business Analytic People and Corporate Management Intelligence Applications Processes Performance Platforms Management IM BIP AA P&P (C)PM BI Layer (reports, OLAP, dashboards, alerts) outflow functions source systems data warehouse & applications data delivery & management information & data marts functions CPM data delivery, exchange & synchronization CPM Applications (e.g. Analytic Applications planning, ABM, PA) (e.g. data mining)
  • 87. Contact us Keyrus nv info@keyrus.be Nijverheidslaan 3/2 B-1853 Strombeek-Bever t +32 2 706 03 00 f +32 2 706 03 09 www.keyrus.be performance management  consulting  technology 17-Dec-2008
  • 89. Who is PROFACTS ? We are „the new kids on the block‟ in (online) market research ...
  • 90. 1 REVEALING FACTORS FOR SUCCESS strategy 200% 286 people are nowyrs growth rate working mean age @ Profacts 2 people have founded Profacts Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 2006 2007 2007 2007 2007 2008 2008 2008 2008
  • 91. REVEALING FACTORS FOR SUCCESS Profacts is active in more then 10 sectors ... AUTOMOTIVE FMCG RECRUITMENT GPS BANKING INSURANCES ICT PHARMACEUTICAL TELECOM ENERGY
  • 97. Rogil Research A research agency with a view
  • 98. MARKETING & SENSORY RESEARCH OUR PASSION Sensory research FTF research (Mobile Unit) Eye-tracking / Eye|watch Telephone research Tachistoscope Online research Trained panel Panel services Consumer panel Fieldwork in Europe Taste lab
  • 99.
  • 100. Sensory Safari Note down in your agenda SENSORY SAFARI • March 26th 2009 • 18u • At Rogil in Leuven
  • 102. We hope to welcome you in March. Thanks for your attention !
  • 103. SAS Analytics for Challenging Times: Start Focused, Think Wide
  • 105. CRM is becoming Risk Management
  • 106. SAS Breadth of Analytic Offering… • Statistical Analysis • Survey Design/Analysis • Data Mining • Text Mining • Time Series Mining • Forecasting • Quality Improvement • Operations Research
  • 107. SAS Innovations in Marketing Solutions….
  • 108. http://www.sas.com/feature/analytics/index.html Copyright © 2006, SAS Institute Inc. All rights reserved.
  • 109. The Mission Drive the widespread use of data in decision making
  • 110. The Focus: Attract; Retain; Grow; Fraud; Risk. Driving and maximizing profit.
  • 111. The Vision: operational processes drawing on interaction data, attitudinal data, descriptive data and behavioral data from enterprise data sources.
  • 112. The Acceptance: the rise of the agnostics. Science vs. chance. In numbers we trust!
  • 116. The myth of the 'best' algorithm: lessons learned from innovations in data sampling and data pre-processing for marketing analytics. Dr. Sven F. Crone, Deputy Director, Ass. Prof.
  • 117. Associated Experts Prof. Paul Goodwin Directors Dr. Andrew Eaves Prof. Robert Fildes Prof. Peter Young Research & PhD students Dr. Sven F. Crone Heiko Kausch, RA Stavros Asimakopoulos Researchers Xi Chen Dr. Steve Finlay Bruce Havel Dr. Alastair Robertson Suzi Ismail Dr. Didier Soopramanien Nikolaos Kourentzes Dr. Kostas Nikolopoulos Ioannis Stamatopoulos Andrey Davidenko Prof. Stephen Taylor Charlotte Brown Dr. Wlodek Tych Hong Juan Liu Prof. David Peel T Hu Prof. Peter Pope John Prest Huang Tao Visiting Researchers Prof. Geoff Allen Dr. Yukun Bao Young-Sang Cho
  • 118. “Take away this pudding, it has no theme.” Sir Winston Churchill (1915)
  • 119. Agenda • Sampling issues in Data Mining • Case study 1: Direct Marketing • Cross-selling of Magazine subscriptions • Effect of data preprocessing: Sampling • Interaction of Sampling with Scaling & Coding • Case study 2: Credit & Behavioral Scoring • Predicting consumer credit default • Effects of sample size • Effects of sample distribution • Case study 3: Online Shopping Behaviour • Predicting consumer shopping channel choice • Sample distribution & multiple classes • Conclusion & Take-aways
  • 120. Why (Under/Over) Sampling? • Knowledge Discovery in Databases (KDD) = the non-trivial process of identifying valid, novel, useful patterns in large data sets • Data Mining = only one step in the KDD process • The data sample determines the whole process! (GIGO: garbage in, garbage out) • "Research seems preoccupied with algorithms" [Hand 2000] [Figures: SAS SEMMA data mining process; CRISP-DM process]
  • 121. Sampling in Direct Marketing Literature? Studies listed by reference, input type*, methods*** and how many of the 6 preprocessing options each one considers (parameter tuning; data reduction**: feature selection, re-sampling; data projection: standardisation and discretisation of continuous attributes, coding of categories):
    [2] type 2: BMLP, LR, LDA, QDA (2 of 6)
    [42] type 1: MLP, LR, CHAID (2 of 6)
    [43] type 2: MLP, RBF, LR, GP, CHAID (2 of 6)
    [44] type 3: MLP, LR, LDA (2 of 6)
    [4] type 2: CHAID, CART (1 of 6)
    [6] type 2: MLP, LR (5 of 6)
    [9] type 2: LVQ, RBF, 22 DT, 9 SC (2 of 6)
    [45] type 2: LDA, LR, KNN, KDE, CART, MLP, RBF, MOE, FAR, LVQ (2 of 6)
    [3] type 1: MLP (2 of 6)
    [7] type 2: LSSVM (3 of 6)
    [11] type 2: LR, LS-SVM, KNN, NB, DT (3 of 6)
    [10] type 1: LDA, QDA, LR, BMLP, DT, SVM, LSSVM, TAN, LP, KNN (2 of 6)
    [46] type 2: LR, MLP, BMLP (2 of 6)
    [47] type 2: LSSVM, SVM, DT, RL, LDA, QDA, LR, NB, IBL (2 of 6)
    [48] type 1: DT, MLP, LR, FC (1 of 6)
    [49] type 1: FC (2 of 6)
    The majority of direct marketing papers focus on algorithm tuning. Only 3 papers consider re-sampling / instance selection. No analysis of the interaction with sampling & projection & …
  • 122. Classification [Figure: customers plotted by days since last purchase (few … many) against number of subscriptions (1 … many); last campaign: no response vs. subscribed to magazine] Database of customers (instances). Known attributes for all customers (age, gender, existing subscriptions, …). Known response (class membership) of buyers & non-buyers from past mailings. Build a model to separate the classes → decision boundary of different complexity.
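To make "build a model to separate the classes" concrete, here is a minimal sketch that learns a linear decision boundary with the classic perceptron rule on two features. The toy customers, the feature scaling to [0, 1] and the helper `train_perceptron` are hypothetical illustrations; the case studies in this deck used neural networks, SVMs and decision trees, not a perceptron.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def train_perceptron(X: List[Point], y: List[int], epochs: int = 100, lr: float = 1.0):
    """Learn weights w and bias b of a separating line w.x + b = 0.

    Labels: +1 = subscribed to magazine, -1 = no response.
    Converges only if the two classes are linearly separable.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), label in zip(X, y):
            # Update only when the instance lies on the wrong side of (or on) the line.
            if label * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += lr * label * x1
                w[1] += lr * label * x2
                b += lr * label
                errors += 1
        if errors == 0:  # every training instance is correctly separated
            break
    return w, b

# Hypothetical customers: (days since last purchase, number of subscriptions), scaled to [0, 1].
X = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.7), (0.8, 0.1), (0.9, 0.2), (0.7, 0.3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
```

A more complex boundary (the "different complexity" on the slide) would come from non-linear models such as MLPs or kernel SVMs rather than a single line.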
  • 123. Classification [Figure: customer plot with no-response, subscribed and class-unknown instances; days since last purchase vs. number of subscriptions] Use the decision boundary to classify unseen instances: calculate on which side of the hyperplane the instances lie (or their distance to it) and assign a class to the unseen instances.
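For a linear boundary, "on which side of the hyperplane" is the sign of w·x + b, and the signed distance is (w·x + b)/‖w‖. A small sketch with a hypothetical boundary and two hypothetical unseen customers; `classify` is an illustrative helper, not part of any tool named in the slides.

```python
import math
from typing import Sequence, Tuple

def classify(w: Sequence[float], b: float, x: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted class, signed distance of x to the hyperplane w.x + b = 0)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    distance = score / math.sqrt(sum(wi * wi for wi in w))
    # Positive side = "subscribed" (+1); negative side or on the line = "no response" (-1).
    label = 1 if score > 0 else -1
    return label, distance

# Hypothetical boundary over (days since last purchase, number of subscriptions), scaled to [0, 1].
w, b = (-0.6, 0.8), 0.0
for customer in [(0.2, 0.9), (0.9, 0.1)]:
    label, distance = classify(w, b, customer)
    print(customer, label, round(distance, 2))
```

The distance doubles as a score for ranking customers, which is what a mailing-size decision needs.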
  • 124. Reality Check: Imbalanced classes [Figure: customer plot, days since last purchase vs. number of subscriptions; no response vs. subscribed to magazine] Problem: • Classifiers are biased towards the majority class • This shifts the decision boundary • Error / accuracy-based learning creates naïve classifiers • Invalid separation of classes. Balanced dataset = class distributions are equal, P(y=A) = P(y=B) → proportional or stratified sampling feasible. Imbalanced dataset = class distributions are unequal, P(y=A) >> P(y=B). The class of interest is often the minority (in most business applications).
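A worked example of why accuracy-based learning yields naïve classifiers: a model that always predicts "no response" scores very high accuracy but never finds a responder. The 1% response rate is a hypothetical figure; averaging accuracy over classes exposes the problem.

```python
def majority_class_metrics(n_positive: int, n_total: int):
    """Metrics of a classifier that always predicts the majority (negative) class."""
    n_negative = n_total - n_positive
    accuracy = n_negative / n_total   # every negative counts as correct
    recall_negative = 1.0             # all non-responders are "found"
    recall_positive = 0.0             # no responder is ever predicted
    mean_per_class = (recall_negative + recall_positive) / 2
    return accuracy, recall_positive, mean_per_class

# Hypothetical mailing: 100,000 customers, 1,000 responders (1% response).
accuracy, recall_positive, mean_per_class = majority_class_metrics(1_000, 100_000)
print(f"accuracy={accuracy:.1%}, responder recall={recall_positive:.0%}, "
      f"mean per-class accuracy={mean_per_class:.0%}")
# accuracy=99.0%, responder recall=0%, mean per-class accuracy=50%
```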
  • 125. Imbalanced Data Sampling [Figure: customer plot, days since last purchase vs. number of subscriptions] Stratified random sampling: divide the DB into mutually exclusive strata (subpopulations) & draw random samples from each. Proportional: ensure that the proportions in the samples equal those in the population. Disproportional: weighted over- & undersampling of important classes. Open questions: size of the sample? Distribution / location of the sample?
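Proportional stratified random sampling can be sketched in a few lines: group records into mutually exclusive strata, then draw the same fraction at random from each. The customer table, the 2% response share and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Proportional stratified random sampling.

    Splits `records` into mutually exclusive strata using `stratum_of` and draws
    the same `fraction` at random from each stratum, so the sample keeps the
    population proportions (up to rounding).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical customer table: (customer_id, responded), 20 responders out of 1,000.
customers = [(i, i < 20) for i in range(1_000)]
sample = stratified_sample(customers, stratum_of=lambda c: c[1], fraction=0.1)
```

The disproportional variant on the slide would use a different fraction per stratum, e.g. a higher one for responders.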
  • 126. Random Undersampling [Figure: customer plot, days since last purchase vs. number of subscriptions] Exclude random instances of the majority class; retain all instances of the minority class; establish a balanced class distribution. Benefits: • Helps detect rare target levels. Risks: • Biases predictions (correctable) • Loses information contained in instances of the majority class • Creates different boundaries • Increases prediction variability • …
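A minimal sketch of random undersampling for a binary target, plus one common way to correct the prediction bias: the standard Bayes prior-shift adjustment that maps a probability from a model trained at a balanced class ratio back to the population ratio. The slides do not say which correction was used, so `correct_prior` is an assumption, and both helper names are hypothetical.

```python
import random

def random_undersample(X, y, minority=1, seed=0):
    """Keep every minority instance and an equally sized random subset of the
    majority class (binary target assumed; the majority must be the larger class)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]

def correct_prior(p, sample_rate, population_rate):
    """Map a positive-class probability p from a model trained with `sample_rate`
    positives back to the population positive rate (Bayes prior-shift correction)."""
    pos = p * population_rate / sample_rate
    neg = (1 - p) * (1 - population_rate) / (1 - sample_rate)
    return pos / (pos + neg)

# Hypothetical data: 1,000 customers, 13 responders (1.3%).
X = list(range(1_000))
y = [1 if i < 13 else 0 for i in X]
X_bal, y_bal = random_undersample(X, y)
```

Here 974 of the 987 non-responders are discarded, which is exactly the information loss listed under the risks.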
  • 127. Random Oversampling [Figure: customer plot, days since last purchase vs. number of subscriptions] Retain all instances of the majority class in the sample; duplicate instances of the minority class (identical copies); establish a balanced class distribution. Benefits: • Helps detect rare target levels • No loss of information. Risks: • Biases predictions (correctable) • Increases prediction variability • Increases processing time.
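Random oversampling keeps every original instance and adds randomly chosen identical copies of minority instances until the classes are balanced. A sketch for a binary target; `random_oversample` and the data are hypothetical.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Keep all instances and add random duplicates of minority instances until
    both classes are equally frequent (binary target assumed)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_extra = len(majority_idx) - len(minority_idx)
    duplicates = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = majority_idx + minority_idx + duplicates
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

# Hypothetical data: 1,000 customers, 13 responders (1.3%).
X = list(range(1_000))
y = [1 if i < 13 else 0 for i in X]
X_bal, y_bal = random_oversample(X, y)
```

No instance is lost, but the training set grows from 1,000 to 1,974 rows, which is the extra processing time the slide warns about.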
  • 128. Ready for more theory…? No, rather some case studies…!
  • 130. Business Case: Direct Marketing / Response Optimization • Sell a magazine subscription to existing customers • Whom to send mail to? (Which customers are most likely to respond?) • How many customers to contact? (What is the optimal mailing size?) Corporate project with a leading German publishing house: provided a data set of past mailing campaigns; benchmark novel methods against in-house SPSS Clementine; explore Neural Networks (NN) and Support Vector Machines (SVM).
  • 131. Benefits of Direct Marketing
    Simple mailing: 100,000 addressees; cost 2 €/mail = 200,000 €; response rate 0.5% = 500 responses; sales volume 300 € per response = 150,000 €; revenue (sales minus mailing cost) -50,000 €.
    With data mining: top 40% = 40,000 addressees; cost 2.5 €/mail = 100,000 €; response rate 1.0% = 400 responses; sales volume 300 € per response = 120,000 €; revenue (sales minus mailing cost) 20,000 €.
    Smaller mailing (number of letters sent) → lower costs (€1 per letter). Higher response rate → higher revenue. More specific mailing → lower cost. More relevant information → higher customer satisfaction.
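The slide's arithmetic can be reproduced directly: responses = addressees × response rate, cost = addressees × cost per mail, and the "revenue" line is sales minus mailing cost. `campaign_result` is a hypothetical helper; the inputs are the slide's own figures.

```python
def campaign_result(addressees, cost_per_mail, response_rate, sales_per_response):
    """Return (responses, mailing cost, sales volume, net result) in euros."""
    responses = addressees * response_rate
    cost = addressees * cost_per_mail
    sales = responses * sales_per_response
    return responses, cost, sales, sales - cost

simple = campaign_result(100_000, 2.0, 0.005, 300)
targeted = campaign_result(40_000, 2.5, 0.010, 300)
for name, (responses, cost, sales, net) in [("simple", simple), ("data mining", targeted)]:
    print(f"{name}: {responses:.0f} responses, cost {cost:,.0f} €, "
          f"sales {sales:,.0f} €, net {net:,.0f} €")
```

Even with a higher cost per mail and fewer responses in absolute terms, mailing only the top 40% turns a loss into a profit.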
  • 132. NN get worse with learning… • Wish to implement Neural Networks for the next campaign • In-house team (with no NN knowledge) outperformed us EVERY TIME! • Analyzed software, training parameters, etc. → internal competition • Observed the expert building models…!
    Three confusion matrices (rows: true class C0 / C1; columns: predicted C0 / predicted C1 / sum, in %):
    1. C0: 61.86 / 38.14 / 100; C1: 55.09 / 44.81 / 100; totals: 116.95 / 82.95 / 54.26
    2. C0: 72.96 / 27.04 / 100; C1: 62.02 / 37.98 / 100; totals: 134.98 / 65.02 / 55.47
    3. C0: 52.87 / 43.37 / 100; C1: 47.13 / 56.63 / 100; totals: 100 / 100 / 54.75
  • 133. Experimental Design: different data pre-processing • Handle categorical features → different encoding (n, n-1, thermo, ordinal) • Scale numerical features → different scaling (standardise, discretise) • Adjust imbalanced class distributions → different sampling (over- & undersampling); decide on sample size and method • Handle outliers • Select useful features. Evaluate across 3 algorithms: Neural Networks (MLPs), Support Vector Machines & Decision Trees.
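The numerical preprocessing variants can be sketched as plain functions: range scaling to [0, 1] or [-1, 1], standardisation to z-scores, and discretisation. The slides do not say which binning rule was used, so equal-width binning is an assumption here; all function names are hypothetical, and the functions assume a non-constant column.

```python
import statistics

def scale_range(values, lo=0.0, hi=1.0):
    """Linearly rescale values to [lo, hi] (min-max scaling)."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def standardise(values):
    """Z-scores: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def equal_width_bins(values, n_bins):
    """Discretise into n_bins intervals of equal width (bin index 0 .. n_bins-1)."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n_bins
    # The maximum value would fall into bin n_bins, so clip it into the last bin.
    return [min(int((v - vmin) / width), n_bins - 1) for v in values]

days_since_purchase = [0, 5, 10]  # hypothetical feature values
print(scale_range(days_since_purchase), scale_range(days_since_purchase, -1.0, 1.0))
```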
  • 134. Dataset Structure • Data set size: 300,000 customer records; 4,019 subscriptions sold; response rate of 1.3% • Data set structure: 18 categorical features; 35 numerical features; binary target variable • Evaluated the impact of data preprocessing: data sampling (oversampling vs. undersampling); categorical attribute encoding (N, N-1, thermo, ordinal); continuous attribute projection (binning vs. normalisation); continuous attribute scaling ([0, +1] vs. [-1, +1] range) • Multifactorial design to evaluate the impact across multiple methods: Neural Networks (NN), Support Vector Machines (SVM), Decision Trees (CART)
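The four categorical encodings named above can be illustrated on a three-level attribute. The slides give only the names, so these are common textbook definitions and are assumptions: N = one binary column per category; N-1 = dummy coding with the last category as the all-zeros reference; thermometer = k-1 cumulative indicators; ordinal = integer rank. `encode` is a hypothetical helper.

```python
from typing import List, Sequence

def encode(value: str, categories: Sequence[str], scheme: str) -> List[int]:
    """Encode one categorical value under the N, N-1, thermometer or ordinal scheme."""
    i = list(categories).index(value)
    k = len(categories)
    if scheme == "ordinal":  # single column holding the rank 1..k
        return [i + 1]
    if scheme == "n":        # one-hot: one indicator per category
        return [1 if j == i else 0 for j in range(k)]
    if scheme == "n-1":      # dummy coding: last category is the all-zeros reference
        return [1 if j == i else 0 for j in range(k - 1)]
    if scheme == "thermo":   # thermometer: first i of k-1 indicators switched on
        return [1 if j < i else 0 for j in range(k - 1)]
    raise ValueError(f"unknown scheme: {scheme}")

levels = ["low", "medium", "high"]  # hypothetical attribute levels
for scheme in ("n", "n-1", "thermo", "ordinal"):
    print(scheme, [encode(v, levels, scheme) for v in levels])
```

Ordinal coding imposes equal spacing between levels, which is one plausible reason SVMs and NNs react badly to it in the results that follow.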
  • 135. Sampling → created 2 dataset sampling candidates. Data partition (number of records, class 1 / class -1):
    Training set: oversampling 20,000 / 20,000; undersampling 2,072 / 2,072
    Validation set: oversampling 10,000 / 10,000; undersampling 1,035 / 1,035
    SUM: oversampling 30,000 / 30,000; undersampling 3,107 / 3,107
    Test (hold-out) set: oversampling 912 / 64,088; undersampling 912 / 64,088
    Different balancing in the training data; original distribution in the test data (65,000 instances).
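The partition sizes are consistent with a simple rule: of the 4,019 subscribers, 912 go to the hold-out set, and the remaining 3,107 are split roughly two-thirds / one-third into training and validation. That rule (and rounding the training share up) is inferred from the numbers, not stated on the slide; `undersampled_partition` is hypothetical.

```python
import math

def undersampled_partition(n_positive_total, n_positive_test, train_share=2 / 3):
    """Split the minority records left after the hold-out into training and
    validation sets, rounding the training share up (assumed rule)."""
    available = n_positive_total - n_positive_test
    n_train = math.ceil(available * train_share)
    return available, n_train, available - n_train

available, n_train, n_val = undersampled_partition(4_019, 912)
# Average number of copies per distinct minority record in the 30,000-record oversampled set.
duplication_factor = 30_000 / available
print(available, n_train, n_val, round(duplication_factor, 2))
# 3107 2072 1035 9.66
```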
  • 136. Results [Figure: increase in lift per method] Oversampling outperforms undersampling consistently! The gain in lift depends on the method (different sensitivity). Oversampling has a higher impact than data coding & scaling.
  • 137. Recommendations from Case Study • Sampling • Oversampling outperforms undersampling for all methods • Undersampling: better in-sample results & worse out-of-sample results • Choice of method • NN & SVM better than CART • Encoding & projection • SVM: avoid ordinal coding (e.g. 1, 2, 3); all others similar (incl. N!) • NN: avoid standardisation & ordinal encoding • DT / CART: use temperature (thermometer) coding; all others similar (incl. ordinal). Binning & scaling of continuous attributes are irrelevant for all methods! Use undersampling & N-1 encoding with SVM & NN. Best preprocessed SVM → lift of 0.645 on the test set … BUT …
  • 138. Results across Pre-processing → Preprocessing has a higher impact than method selection → Lift variation per method from sampling/scaling/coding > difference of lift between competing methods! [Figure: three panels by method (NN, SVM, DT): lift performance, arithmetic mean (AM) performance and geometric mean (GM) performance on the test data subset] DPP (data pre-processing) causes 50%-70% of the differences between models. Results are consistent across error measures. Experiments allow identification of 'best practices' for modelling methods. Best-practice preprocessing varies between methods.
  • 140. Business Case: Predicting Customer Online Shopping Adoption • Traditional buying process is offline & simultaneous → "bricks" store • Introduction of the Internet changes consumer behaviour • Seek information online & offline • Purchasing online & offline → changing purchasing behaviour through Internet adoption → changing purchasing behaviour through technology acceptance • Development of heterogeneous purchasing behaviour • Example: purchasing electronic durable consumer goods • Search for product info (e.g. video cameras) online → test product in-store → search for best deal on the Internet & purchase [Figure: segments by search and purchase channel: search online & purchase online = online shoppers; search online & purchase offline = browsers; search offline & purchase offline = non-Internet shoppers]
  • 141. Stages of Internet Adoption: 1. OFFLINE BUYERS: information gathering & purchasing in stores. 2. BROWSERS: information gathering online & purchasing in stores. 3. ONLINE BUYERS: information gathering & purchasing online.
  • 142. Motivation: same dataset & same objectives & similar methods, two perspectives
    DIDIER: Marketing Modelling
      • Econometric / marketing domain
      • Seeks to explain how customers behave in online shopping
      • Use of "black-box" logistic regression models
      • Models class membership to identify causal variables that explain choices
      • Descriptive & normative modelling
      • Best practices: balance datasets for a distribution representative of the population; use ordinal & nominal variables without recoding; do not normalise / scale data
    SVEN: Data Mining Perspective
      • IS/OR/MS domain → data mining
      • Seeks to accurately predict, regardless of any explanation of why customers buy
      • Use of "black-box" methods from computational intelligence
      • Models class membership to accurately classify unseen instances
      • Predictive modelling
      • Best practices: rebalance datasets for an equal distribution of target variables; recode ordinal → binary scale; rescale & normalise data to facilitate learning speed etc.
    Conflicting "best practice" approaches to modelling. Outside of most software simulators!!! Implicit knowledge? … WHO IS "CORRECT"? WHAT IS THE IMPACT?
  • 143. Dataset • Survey on Internet shopping behaviour • 5,500 UK households → 685 respondents • Adjusted for age, income etc. of customers (older customers are less likely to buy) • Adjusted for the product-specific risk of online shopping for branded durable consumer goods (inspection required to some extent) • 73 questions on factors related to Internet shopping, products etc. [Figure: model overview. Input variables: demographic factors (age, gender, income); Internet-specific factors; Internet utility factors (score from 6 correlated variables); online shopping factors, e.g. "Going to the shops is as convenient as Internet shopping", "I would buy online if products are branded" etc. [1 = strongly agree; …]. Models: logistic regression, neural networks. Output variables: class 1: browse online & buy online; class 2: browse online & buy offline; class 3: browse & buy offline.] → Mixed scale of nominal, ordinal and interval variables.
  • 144. Imbalanced Classification problem • Split of the dataset for training, validation and test {50%; 25%; 25%} • Distribution of target classes is skewed {65% online buyers; 22.5% browsers; 12.5% offline shoppers} • Rebalancing of data sets through over- & undersampling [Figure: class counts (online shoppers, browsers, offline shoppers) per data subset (training, validation, test) for the imbalanced, oversampled and undersampled datasets]
  • 145. Results without Discretisation. Rows = true class; values = % predicted online / browse / offline, training data | test data; MCR = mean classification rate (%).
    Logistic regression
      Original (imbalanced): Online 93.36 / 5.17 / 1.48 | 88.89 / 7.78 / 3.33; Browser 62.77 / 23.40 / 13.83 | 49.39 / 22.58 / 29.03; Offline 36.54 / 17.31 / 46.15 | 35.29 / 29.41 / 35.29; MCR train 54.3%, MCR test 48.9%
      Undersampling: Online 57.69 / 30.77 / 11.54 | 64.44 / 23.33 / 12.22; Browser 26.92 / 48.08 / 25.00 | 32.26 / 25.81 / 41.94; Offline 17.31 / 21.15 / 61.54 | 29.41 / 35.29 / 35.29; MCR train 55.8%, MCR test 41.8%
      Oversampling: Online 68.27 / 24.35 / 7.38 | 74.44 / 16.67 / 8.89; Browser 30.63 / 43.91 / 25.46 | 35.48 / 29.03 / 35.48; Offline 16.97 / 19.93 / 63.10 | 29.41 / 29.41 / 41.18; MCR train 58.4%, MCR test 48.2%
    Neural network
      Original (imbalanced): Online 86.19 / 12.71 / 1.10 | 86.67 / 8.89 / 4.44; Browser 53.13 / 31.25 / 15.63 | 41.94 / 35.48 / 22.58; Offline 25.17 / 28.57 / 45.71 | 29.41 / 35.29 / 35.29; MCR train 54.4%, MCR test 52.5%
      Undersampling: Online 44.86 / 40.00 / 17.14 | 27.78 / 58.89 / 13.33; Browser 14.29 / 48.57 / 37.14 | 16.13 / 32.26 / 51.61; Offline 8.57 / 20.00 / 71.43 | 11.76 / 41.18 / 47.06; MCR train 54.9%, MCR test 35.7%
      Oversampling: Online 81.22 / 18.23 / 0.55 | 61.11 / 22.22 / 16.67; Browser 14.92 / 83.43 / 1.66 | 19.35 / 77.42 / 3.23; Offline 15.52 / 0.55 / 99.45 | 0.00 / 11.76 / 88.24; MCR train 88.0%, MCR test 75.6%
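The mean classification rate (MCR) reported in these tables is the average of the diagonal of a confusion matrix expressed in row percentages, i.e. the mean per-class hit rate. A sketch that reproduces two of the reported test values; `mean_classification_rate` is a hypothetical helper.

```python
from typing import Sequence

def mean_classification_rate(confusion_pct: Sequence[Sequence[float]]) -> float:
    """Average per-class accuracy from a confusion matrix in row percentages
    (rows = true class, columns = predicted class, same class order)."""
    n = len(confusion_pct)
    return sum(confusion_pct[i][i] for i in range(n)) / n

# Test-data matrices from the table (order: online, browser, offline).
lr_original_test = [[88.89, 7.78, 3.33], [49.39, 22.58, 29.03], [35.29, 29.41, 35.29]]
nn_oversampled_test = [[61.11, 22.22, 16.67], [19.35, 77.42, 3.23], [0.00, 11.76, 88.24]]
print(round(mean_classification_rate(lr_original_test), 2),
      round(mean_classification_rate(nn_oversampled_test), 2))
```

Unlike plain accuracy, MCR weights each class equally, so a model that ignores browsers and offline shoppers cannot score well.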
  • 146. Results with Discretisation of Ordinal Variables. Rows = true class; values = % predicted online / browse / offline, training data | test data; MCR = mean classification rate (%).
    Logistic regression
      Original (imbalanced): Online 91.51 / 6.64 / 1.85 | 85.56 / 7.78 / 6.67; Browser 54.26 / 36.17 / 9.57 | 48.39 / 32.26 / 19.35; Offline 26.92 / 17.31 / 55.77 | 58.82 / 47.62 / 17.65; MCR train 61.15%, MCR test 45.1%
      Undersampling: Online 71.15 / 21.15 / 7.69 | 55.56 / 24.44 / 20.00; Browser 17.31 / 65.38 / 17.31 | 67.74 / 6.45 / 25.81; Offline 15.38 / 11.54 / 73.08 | 58.82 / 0.00 / 41.18; MCR train 69.9%, MCR test 34.4%
      Oversampling: Online 68.63 / 22.88 / 8.49 | 70.00 / 21.11 / 8.89; Browser 17.34 / 56.83 / 25.83 | 12.90 / 58.06 / 29.03; Offline 13.28 / 14.02 / 72.69 | 17.65 / 23.53 / 58.82; MCR train 66.0%, MCR test 62.3%
    Neural network
      Original (imbalanced): Online 96.13 / 3.87 / 0.00 | 84.44 / 11.11 / 4.44; Browser 68.75 / 28.13 / 3.13 | 64.52 / 22.58 / 12.90; Offline 40.00 / 14.29 / 45.17 | 58.82 / 11.76 / 29.41; MCR train 56.5%, MCR test 45.5%
      Undersampling: Online 57.14 / 40.00 / 2.86 | 25.56 / 72.22 / 2.22; Browser 34.29 / 54.29 / 11.43 | 67.74 / 29.03 / 3.23; Offline 14.29 / 31.43 / 54.29 | 52.94 / 17.65 / 29.41; MCR train 55.2%, MCR test 28.0%
      Oversampling: Online 98.34 / 1.10 / 0.55 | 58.89 / 24.44 / 16.67; Browser 0.00 / 100.0 / 0.00 | 3.23 / 83.87 / 12.90; Offline 0.00 / 0.00 / 100.0 | 0.00 / 5.88 / 94.12; MCR train 99.5%, MCR test 79.0%
  • 147. Summary: Oversampling outperforms other sampling methods - across different datasets - across various data preprocessing variants. Methods show different sensitivity to sampling - more variation from sampling, coding & scaling than between methods - using different preprocessing variants is important in modelling. Various sophisticated extensions exist - SMOTE (Synthetic Minority Oversampling Technique) - k-nearest neighbour sampling (removal / creation) - one-class learning etc. … Extend your bag of tricks… - … and experiment with imbalanced sampling!
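Instead of duplicating minority instances, SMOTE creates synthetic ones by interpolating between a minority instance and one of its k nearest minority neighbours. The sketch below is a simplified SMOTE-style variant under stated assumptions: it picks base instances at random (the original algorithm iterates over every minority instance), uses Euclidean distance, and needs at least two minority points. `smote` is a hypothetical helper, not a library API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def smote(minority: Sequence[Point], n_synthetic: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Generate n_synthetic points on segments between minority instances
    and their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority class with two scaled features.
responders = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(responders, n_synthetic=10)
```

Because each new point lies between two real minority instances, the minority region is filled in rather than merely reweighted, which reduces the overfitting risk of exact duplicates.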
  • 148. Questions? Sven F. Crone, Lancaster University Management School, Centre for Forecasting, Lancaster, LA1 4YX, email s.crone@lancaster.ac.uk. $SY_t = \alpha Y_{t-1} + (1-\alpha)\,SY_{t-1}$
  • 149. Exploring innovation: "online panel" vs. "online streaming/convenience" sampling
  • 150. Unfortunately, the presenters of iVOX & Corelio cannot share their presentation with the BAQMaR community for reasons of confidentiality!