New Data Sources for
Statistics: Experiences at
Statistics Netherlands
Social media: Twitter



Piet Daas, Marko Roos, Mark van de Ven and Joyce Neroni
Statistics Netherlands

                                               AAPOR 2012
Why are we interested in data sources,
such as Twitter?
• All National Statistical Institutes use:
     • Survey data
     • Sometimes also Administrative data

• But there are other sources of information out there
  (in increasing numbers: BIG Data)
     • Can they be used for statistics?
          • Burden and cost reduction
     • Try it!
          • Innovative research is greatly stimulated


AAPOR 2012: Twitter as a potential data source for statistics
Why study Twitter?




                                                          Maps by Eric Fischer (via Fast Company)


About Twitter

•      Twitter is used intensively in the Netherlands
           • Relatively easily accessible (text) data
•      Potential source of personal information,
       opinions, and sentiments
•      But what kind of information is actually
       discussed?
           1) Identify the topics discussed in the Netherlands
              • In public tweets only
           2) Is this information useful?

Start with collecting data

• How?
  • Tried several ways
     • Best option was to:
          1) Collect usernames
          2) Identify ‘Dutch’ users
          3) Collect tweets from Dutch users
          4) Identify topics in those tweets


1) Collect usernames

• Breadth first algorithm / snowball sampling
   • Started with a user with many followers
         • A famous Dutch politician with 79,798 followers
   • Collect the followers of her followers etc.
         • By Twitter REST API, 12 user accounts and PHP-scripts
   • After 4 weeks we obtained
         • 4,413,391 unique users (IDs)
         • Collected user id, username, location and profile information
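
The breadth-first (snowball) collection described above can be sketched roughly as follows. This is a minimal illustration, not the PHP scripts used in the study: `get_followers` is a hypothetical callable standing in for a Twitter REST API call, and the toy follower graph is invented for the example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def snowball_user_ids(
    seed_id: int,
    get_followers: Callable[[int], Iterable[int]],
    max_users: int,
) -> List[int]:
    """Breadth-first crawl: seed, then followers, then followers of followers."""
    seen = {seed_id}          # every user id discovered so far
    order = [seed_id]         # ids in order of discovery
    queue = deque([seed_id])  # users whose followers still need fetching
    while queue and len(seen) < max_users:
        user = queue.popleft()
        for follower in get_followers(user):
            if follower in seen:
                continue
            seen.add(follower)
            order.append(follower)
            queue.append(follower)
            if len(seen) >= max_users:
                break
    return order


# Toy follower graph (hypothetical data): user id -> ids of that user's followers.
TOY_FOLLOWERS: Dict[int, List[int]] = {1: [2, 3], 2: [3, 4], 3: [5], 4: [], 5: [1]}


def toy_get_followers(user_id: int) -> List[int]:
    return TOY_FOLLOWERS.get(user_id, [])


if __name__ == "__main__":
    print(snowball_user_ids(1, toy_get_followers, max_users=10))
```

In a real crawl, `get_followers` would also have to handle paging and API rate limits, which is presumably why several user accounts and four weeks were needed.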


2) Identify ‘Dutch’ users

• By using location information provided
     • A considerable number of users do this
          • Checked the location names provided
              • Inclusion and exclusion list
          • A total of 380,415 (~9%) users were identified as
            located in the Netherlands
          • 38% of the users, 1,661,467, provided no location info
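
A location check with an inclusion and an exclusion list might look like the sketch below. The list entries and the function name are illustrative assumptions; the actual lists used in the study are not given in the slides.

```python
import re
from typing import Optional

# Hypothetical example entries; the real lists are not part of the slides.
INCLUDE_TERMS = ["nederland", "netherlands", "holland", "amsterdam", "rotterdam", "utrecht"]
EXCLUDE_TERMS = ["holland, mi", "holland, michigan", "new amsterdam"]


def _contains_term(text: str, term: str) -> bool:
    """True if `term` occurs in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def is_dutch_location(location: Optional[str]) -> Optional[bool]:
    """Return None when no location is given, otherwise True or False."""
    if location is None:
        return None
    text = " ".join(location.lower().split())  # lower-case, collapse whitespace
    if not text:
        return None
    # The exclusion list takes precedence over the inclusion list.
    if any(_contains_term(text, term) for term in EXCLUDE_TERMS):
        return False
    return any(_contains_term(text, term) for term in INCLUDE_TERMS)


if __name__ == "__main__":
    for loc in ["Amsterdam, NL", "Holland, MI", "", "Paris"]:
        print(repr(loc), is_dutch_location(loc))
```

Returning `None` for empty profiles keeps the "no location info" group separate from users who are located elsewhere.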




3) Collect tweets

• For the 380,415 users the 200 most
  recent tweets were collected
     • A total of 12,093,065 messages was obtained
     • 39% of the users had no ‘tweets’
     • Some characteristics




4) Identify topics

•     Used 2 approaches
     1) Hashtags (1,750,074 with at least 1 hash, 14.5%)
          •     Hashsign (#) identifies ‘keyword’
                • E.g. #ned, #fail, #wk2010
          •     Manual and text-mining approach
     2) Non-hashtags (10,330,613 in total, 85.4%)
          •     Manual (sample)
          •     Text-mining approach failed here
                •    Result of the large ‘Other’ group
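
Splitting tweets into a hashtag and a non-hashtag group can be sketched as below. The regular expression and function names are assumptions for illustration, not the study's own text-mining code; the final lines only re-derive the two percentages from the counts on the slides.

```python
import re
from typing import Iterable, List, Tuple

# A '#' followed by one or more word characters, e.g. #ned, #fail, #wk2010.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> List[str]:
    """Return the lower-cased hashtags (without '#') found in a tweet."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def split_by_hashtag(tweets: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split tweets into (with at least one hashtag, without any hashtag)."""
    with_tags: List[str] = []
    without_tags: List[str] = []
    for tweet in tweets:
        (with_tags if extract_hashtags(tweet) else without_tags).append(tweet)
    return with_tags, without_tags


def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    sample = ["Oranje wint! #NED #wk2010", "Lekker weer vandaag"]  # made-up tweets
    print(split_by_hashtag(sample))
    # Counts taken from the slides.
    print(share(1_750_074, 12_093_065), share(10_330_613, 12_093_065))
```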



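Topic identification: Hashtags

[Bar chart: contribution (%) per theme, x-axis from 0 to 50; legend: Hashtags, Non-hashtags, Total]
Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
Labelled values: 20% (label placed between Politics and Relations), Spare time 9%, Sports 13%, Other 18%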

Topic identification: Non-hashtags*

[Bar chart: contribution (%) per theme, x-axis from 0 to 50; legend: Hashtags, Non-hashtags, Total]
Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
Labelled values: Spare time 10%, Sports 6%, Other 51%
* A random sample

Topic identification: Combined

[Bar chart: contribution (%) per theme, x-axis from 0 to 50; legend: Hashtags, Non-hashtags, Total]
Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
Labelled values: Events 1%, Media 7%, Politics 3%, Spare time 10%, Sports 7%, Work 5%, Other 46%
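
The slides do not say how the combined ("Total") bars were computed. The labelled values are, however, consistent with a weighted average of the hashtag and non-hashtag shares, weighted by each group's share of all tweets (14.5% and 85.4%). The sketch below checks this under that assumption, reading the 13% label in the hashtag chart as Sports.

```python
# Share of all collected tweets in each group (from the "Identify topics" slide).
HASHTAG_WEIGHT = 0.145
NON_HASHTAG_WEIGHT = 0.854


def combined_share(hashtag_pct: float, non_hashtag_pct: float) -> float:
    """Weighted average of a theme's share in the two tweet groups.

    Assumption: the combined chart weights each group by its share of
    all tweets; the slides do not state this explicitly.
    """
    return HASHTAG_WEIGHT * hashtag_pct + NON_HASHTAG_WEIGHT * non_hashtag_pct


# (hashtag %, non-hashtag %, combined % as labelled in the charts)
LABELLED = {
    "Spare time": (9, 10, 10),
    "Sports": (13, 6, 7),
    "Other": (18, 51, 46),
}

if __name__ == "__main__":
    for theme, (h, n, total) in LABELLED.items():
        print(theme, round(combined_share(h, n), 1), "labelled:", total)
```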

Conclusions

• Is Twitter of potential interest for statistics?
     • Yes
• What are the interesting topics for us?
     • Work (5%), politics (3%), spare time (10%)
       and events (1%)
• Can the data be used ‘as is’?
     • No - ‘Low information content’
          - Representativity of users


Conclusions (2)
• Representativity of the data is a serious issue
  • Clear that only a subset of the (Dutch) population
    is observed
         • Not everybody in the Netherlands is active on Twitter
   • Hardly any background information available
         • Although some users provide very interesting details in
           their user profile

• Work around?
  • (Only) use Twitter to get quick info (a trend) on a
    specific topic

Future work

• Continue to study Social media!
• But:
     1) No longer collect data ourselves (                      )
     2) In future studies focus on:
          • Mine sentiment towards specific topics
            • E.g. Economy, Consumer sentiment, but also
               statistics and Statistics Netherlands surveys
          • Background info of users


Thank you for your attention!

• #Questions?




    Contact or follow me at: @pietdaas




Software Engineering Methodologies (overview)
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

New Data Sources for Statistics, Social media: Twitter.

  • 1. New Data Sources for Statistics: Experiences at Statistics Netherlands
      Social media: Twitter
      Piet Daas, Marko Roos, Mark van de Ven and Joyce Neroni
      Statistics Netherlands
      AAPOR 2012
  • 2. Why are we interested in data sources, such as Twitter?
      • All National Statistical Institutes use:
          • Survey data
          • Sometimes also Administrative data
      • But there are other sources of information out there (in increasing numbers: BIG Data)
          • Can they be used for statistics?
              • Burden and cost reduction
          • Try it!
              • Innovative research is greatly stimulated
      AAPOR 2012: Twitter as a potential data source for statistics
  • 3. Why study Twitter?
      [Figure: maps by Eric Fischer (via Fast Company)]
  • 4. About Twitter
      • Twitter is used intensively in the Netherlands
          • Relatively easily accessible (text) data
      • Potential source of personal information, opinions, and sentiments
      • But what kind of information is actually discussed?
          1) Identify the topics discussed in the Netherlands
              • In public tweets only
          2) Is this information useful?
  • 5. Start with collecting data
      • How?
          • Tried several ways
          • Best option was to:
              1) Collect usernames
              2) Identify ‘Dutch’ users
              3) Collect tweets from Dutch users
              4) Identify topics in those tweets
  • 6. 1) Collect usernames
      • Breadth-first algorithm / snowball sampling
      • Started with a user with many followers
          • A famous Dutch politician with 79,798 followers
          • Collected the followers of her followers, and so on
      • Used the Twitter REST API, 12 user accounts and PHP scripts
      • After 4 weeks we obtained
          • 4,413,391 unique users (IDs)
      • Collected user ID, username, location and profile information
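The breadth-first (snowball) collection described on this slide can be sketched as follows. This is a minimal illustration in Python, not the authors' PHP scripts: `snowball_users`, `toy_get_followers` and the toy follower graph are hypothetical names and data, and a real crawl would replace the in-memory lookup with rate-limited API calls.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def snowball_users(
    seed: str,
    get_followers: Callable[[str], Iterable[str]],
    max_users: Optional[int] = None,
) -> List[str]:
    """Breadth-first crawl of a follower graph, starting from `seed`.

    `get_followers` is a hypothetical callback that returns the followers
    of one user; it stands in for whatever API call is actually used.
    """
    seen = {seed}          # users already discovered (avoids duplicates)
    order = [seed]         # users in order of discovery
    queue = deque([seed])  # users whose followers still need to be fetched

    while queue:
        user = queue.popleft()
        for follower in get_followers(user):
            if max_users is not None and len(order) >= max_users:
                return order
            if follower in seen:
                continue
            seen.add(follower)
            order.append(follower)
            queue.append(follower)
    return order


# Toy follower graph (made-up data, for illustration only).
TOY_FOLLOWERS = {
    "politician": ["anna", "bram"],
    "anna": ["bram", "carla"],
    "bram": ["politician", "dirk"],
    "carla": [],
    "dirk": ["anna"],
}


def toy_get_followers(user: str) -> List[str]:
    return TOY_FOLLOWERS.get(user, [])


print(snowball_users("politician", toy_get_followers))
```

The `seen` set is what keeps the result to unique users even though the same account is reached through many followers.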
  • 7. 2) Identify ‘Dutch’ users
      • By using location information provided
          • A considerable number of users do this
      • Checked the location names provided
          • Inclusion and exclusion list
      • A total of 380,415 (~9%) users were identified as located in the Netherlands
      • 38% of the users, 1,661,467, provided no location info
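A check against an inclusion and an exclusion list could look roughly like the sketch below. The slides do not give the actual lists or matching rules, so the terms in `INCLUDE_TERMS` and `EXCLUDE_TERMS`, the substring matching, and the function name `classify_location` are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical example lists; the real lists are not given in the slides.
INCLUDE_TERMS = {"nederland", "netherlands", "holland", "amsterdam", "rotterdam", "utrecht"}
EXCLUDE_TERMS = {"holland, michigan", "new amsterdam", "amsterdam, ny"}


def classify_location(location: Optional[str]) -> str:
    """Return 'dutch', 'other' or 'no location' for a free-text profile location."""
    # Normalise case and whitespace so that "  Utrecht " matches "utrecht".
    text = " ".join(location.lower().split()) if location else ""
    if not text:
        return "no location"
    # The exclusion list is checked first, so it overrides any inclusion match.
    if any(term in text for term in EXCLUDE_TERMS):
        return "other"
    if any(term in text for term in INCLUDE_TERMS):
        return "dutch"
    return "other"


for example in ["Amsterdam, NL", "Holland, Michigan", "", "London"]:
    print(repr(example), "->", classify_location(example))
```

Checking the exclusion list first is a design choice: it lets a short inclusion term such as "holland" stay on the list without pulling in known false positives.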
  • 8. 3) Collect tweets
      • For each of the 380,415 users, up to 200 of the most recent tweets were collected
          • A total of 12,093,065 messages was obtained
          • 39% of the users had no ‘tweets’
      • Some characteristics
  • 9. 4) Identify topics
      • Used 2 approaches
          1) Hashtags (1,750,074 with 1 hash, 14.5%)
              • Hash sign (#) identifies ‘keyword’
              • E.g. #ned, #fail, #wk2010
              • Manual and text-mining approach
          2) Non-hashtags (10,330,613 in total, 85.4%)
              • Manual (sample)
              • Text-mining approach failed here
              • Result of the large ‘Other’ group
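Splitting messages into a hashtag and a non-hashtag group, and counting the keywords, can be sketched as below. The regular expression, the function names and the sample tweets are assumptions for illustration; the slides do not describe how hashtags were actually parsed.

```python
import re
from collections import Counter
from typing import List, Tuple

# A '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(tweet: str) -> List[str]:
    """Return the lower-cased hashtags in a tweet, without the '#' sign."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(tweet)]


def split_by_hashtag(tweets: List[str]) -> Tuple[List[str], List[str]]:
    """Split tweets into (with at least one hashtag, without any hashtag)."""
    with_tags, without_tags = [], []
    for tweet in tweets:
        (with_tags if extract_hashtags(tweet) else without_tags).append(tweet)
    return with_tags, without_tags


def count_hashtags(tweets: List[str]) -> Counter:
    """Count how often each hashtag occurs over all tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


# Made-up sample tweets, for illustration only.
SAMPLE_TWEETS = [
    "Oranje wint! #NED #wk2010",
    "Trein weer te laat #fail",
    "Lekker weer vandaag",
    "Wat een wedstrijd #ned",
]

tagged, untagged = split_by_hashtag(SAMPLE_TWEETS)
print(len(tagged), "with hashtags,", len(untagged), "without")
print(count_hashtags(SAMPLE_TWEETS).most_common(3))
```

The resulting keyword counts are only a starting point: mapping keywords such as #wk2010 to a theme like Sports still needs the manual or text-mining step the slide mentions.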
  • 10. Topic identification: Hashtags
      [Bar chart: contribution (%) per theme, axis 0 to 50; legend: Hashtags, Non-hashtags, Total]
      • Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
      • Labelled bars: Politics (20%), Spare time (9%), Sports (13%), Other (18%)
  • 11. Topic identification: Non-hashtags*
      [Bar chart: contribution (%) per theme, axis 0 to 50; legend: Hashtags, Non-hashtags, Total]
      • Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
      • Labelled bars: Spare time (10%), Sports (6%), Other (51%)
      * A random sample
  • 12. Topic identification: Combined
      [Bar chart: contribution (%) per theme, axis 0 to 50; legend: Hashtags, Non-hashtags, Total]
      • Themes: Economy, Education, Environment, Events, Health, Holiday, ICT, Living, Media, Politics, Relations, Security, Spare time, Sports, Transport, Weather, Work, Other
      • Labelled bars: Events (1%), Media (7%), Politics (3%), Spare time (10%), Sports (7%), Work (5%), Other (46%)
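The slides do not state how the combined percentages were derived. One plausible reading, offered here as an assumption, is a weighted average of the hashtag and non-hashtag shares, with the weights taken from the message counts on slide 9 (14.5% and 85.4%). The sketch below checks that reading against the three themes that are labelled in all three charts; `combined_share` is a hypothetical helper name.

```python
# Share of all collected messages in each group (slide 9).
HASHTAG_WEIGHT = 14.5 / 100
NON_HASHTAG_WEIGHT = 85.4 / 100


def combined_share(hashtag_pct: float, non_hashtag_pct: float) -> float:
    """Weighted average of the two group percentages (assumed method)."""
    return HASHTAG_WEIGHT * hashtag_pct + NON_HASHTAG_WEIGHT * non_hashtag_pct


# (hashtag %, non-hashtag %, combined % as labelled on the slides)
LABELLED = {
    "Spare time": (9, 10, 10),
    "Sports": (13, 6, 7),
    "Other": (18, 51, 46),
}

for theme, (hashtag_pct, non_hashtag_pct, reported) in LABELLED.items():
    estimate = combined_share(hashtag_pct, non_hashtag_pct)
    print(f"{theme}: estimated {estimate:.1f}%, reported {reported}%")
```

For these three themes the rounded estimates agree with the labelled combined values, which is consistent with the weighted-average reading but does not prove it.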
  • 13. Conclusions
      • Is Twitter of potential interest for statistics?
          • Yes
      • What are the interesting topics for us?
          • Work (5%), politics (3%), spare time (10%) and events (1%)
      • Can the data be used ‘as is’?
          • No
              - ‘Low information content’
              - Representativity of users
  • 14. Conclusions (2)
      • Representativity of the data is a serious issue
          • Clear that only a subset of the (Dutch) population is observed
              • Not everybody in the Netherlands is active on Twitter
          • Hardly any background information available
              • Although some users provide very interesting details in their user profile
      • Workaround?
          • (Only) use Twitter to get quick info (a trend) on a specific topic
  • 15. Future work
      • Continue to study Social media!
      • But:
          1) No longer collect data ourselves ( )
          2) In future studies focus on:
              • Mining sentiment towards specific topics
                  • E.g. Economy, Consumer sentiment, but also statistics and Statistics Netherlands surveys
              • Background info of users
  • 16. Thank you for your attention!
      • #Questions? Contact or follow me at: @pietdaas