Big Data as a
data source for
official statistics

Piet Daas, Marco Puts, Bart Buelens and Paul van den Hurk
Statistics Netherlands


                            Big Data Target Conference, April 4, Groningen
Overview

• Data sources and statistics
     • More & more data becomes available
     • Effect on statistics production
• How we study Big Data: 2 examples
     • Traffic loop detection data
     • Social media messages




Introduction




  “Statistics Netherlands has produced
  about 5000 official publications and
  tables in 2012”
            For this we need DATA




Data sources for official statistics




  Primary data
     • Our own surveys

  Secondary data
     • Data from ‘others’
          - Administrative sources
          - ‘New’ data sources

Statistics Netherlands law

• “Statistics Netherlands aims to reduce the
  administrative burden for companies and the
  public as much as possible”
  • By (re-)using existing administrative registrations of both
    government and government-funded organizations.
  • And by studying potential new sources of information




• Data, data everywhere!





Statistics Netherlands and Data
•    Data is generated in increasing amounts and at increasing frequencies:
    •      From ‘Data scarcity’ (sample surveys) to ‘Data abundance’ (administrative
           sources & Big Data)
           •   Ever increasing amounts of data need to be checked, processed and
               analyzed
           •   More sources of information become available
           •   Opportunities to produce statistics faster (‘real-time statistics’)
    •      Need for new methods and tools
           1. Methods to quickly uncover information from massive amounts of data,
              such as visualisation methods, data, text and stream mining techniques
              (‘making Big Data small’) and high-performance computing
           2. Methods capable of integrating the information into the statistical process,
              e.g. linking at massive scale, macro/meso-integration, estimation methods
              suited for large datasets


2 Big Data case studies

Research findings on the study of Big Data sources
from a statistics point of view

     1. Traffic loop detection data
               80 million records/day, studied 90 days so far,
               number of vehicles detected each minute

     2. Dutch social media messages
               1–2 million public messages/day, studied up to 2 billion
               records, content and sentiment


1. Traffic loop detection data

• Traffic ‘loops’
   • Every minute (24/7) the number of passing
     vehicles is counted by >10,000 road sensors
     & cameras in the Netherlands
      • Total vehicles and in different length classes

   • Interesting source to produce traffic and
     transport statistics (and more)
       • Huge amounts of data, about 100 million
         records a day
   [Map: locations of the road sensors]


Number of detected vehicles on a single day




By all loops                                     Total = ~ 295 million

Traffic loop detection activity (only first 10 min.)




Correct for missing data
 • ‘Corrected’ data (for blocks of 5 min)

            Before:  Total = ~295 million
            After:   Total = ~330 million (+12%)
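
The slides do not say how the correction for missing data works. As a minimal sketch only, assume a missing minute count is replaced by the mean of the observed minutes in the same 5-minute block. The function name `fill_missing_in_blocks`, the fallback of 0.0 for an empty block and the sample counts are hypothetical and not taken from the presentation.

```python
from statistics import mean
from typing import Optional


def fill_missing_in_blocks(counts: list[Optional[int]], block: int = 5) -> list[float]:
    """Replace missing minute counts (None) by the mean of the observed
    counts in the same block of `block` minutes.

    Assumption: an illustrative imputation rule, not necessarily the method
    behind the slide. A block without any observation is filled with 0.0,
    because nothing can be inferred from it in this simple sketch.
    """
    filled: list[float] = []
    for start in range(0, len(counts), block):
        chunk = counts[start:start + block]
        observed = [c for c in chunk if c is not None]
        fallback = float(mean(observed)) if observed else 0.0
        filled.extend(float(c) if c is not None else fallback for c in chunk)
    return filled


# Hypothetical counts for one loop over 10 minutes; None marks a missing minute.
minute_counts = [12, None, 14, 13, None, 20, 22, None, None, 18]
corrected = fill_missing_in_blocks(minute_counts)

observed_total = sum(c for c in minute_counts if c is not None)
print("before:", observed_total, "after:", sum(corrected))
```

As on the slide, the corrected total is higher than the raw total, because minutes without a report no longer count as zero vehicles.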

Total vehicles during the day (snapshots)




For different vehicle lengths
      1 category      3 categories          5 categories

      Total           Total                 Total
                      <= 5.6 m              > 1.85 & <= 2.4 m
                      > 5.6 & <= 12.2 m     > 2.4 & <= 5.6 m
                      > 12.2 m              > 5.6 & <= 11.5 m
                                            > 11.5 & <= 12.2 m
                                            > 12.2 m


         Small vehicles <= 5.6 m
         Medium sized vehicles > 5.6 m & <= 12.2 m
         Large vehicles > 12.2 m
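
The three length classes listed above can be written as a small lookup. This is a sketch: the thresholds come from the slide, while the function name `length_class` and the sample lengths are made up for illustration.

```python
from collections import Counter


def length_class(length_m: float) -> str:
    """Map a vehicle length in metres to the three classes on the slide:
    small (<= 5.6 m), medium (> 5.6 m and <= 12.2 m), large (> 12.2 m)."""
    if length_m <= 5.6:
        return "small"
    if length_m <= 12.2:
        return "medium"
    return "large"


# Hypothetical measured lengths in metres, for illustration only.
lengths = [4.2, 4.5, 5.6, 7.9, 12.2, 16.5, 3.8, 4.9]
per_class = Counter(length_class(x) for x in lengths)
print(per_class)
```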



Small vehicles




                                                 ~75% of total

Small & medium vehicles




Small, medium & large vehicles




Volatile behaviour at the micro-level




2. Social media messages

• The Dutch are very active on social media platforms
     • Almost always at hand and almost always switched on
          • More and more people have a smartphone!

• Possible source of information for:
     • Which topics are current:
          • Number of messages and the sentiment about them


     • Can be used as a measuring instrument for:
          • .
                                                     Map by Eric Fischer (via Fast Company)



2. Social media messages
  • The Dutch are very active on social media platforms
    • Potential information source for:
            • Topics discussed and the sentiment about these topics (quickly
              available!), and probably more
            • Needs investigation to establish its potential use


2a. Content:
    - Collected Dutch Twitter messages for study: ‘selection’ of 12 million

2b. Sentiment
    - Sentiment in Dutch social media messages: ‘all’ ~2 billion



Social media: Dutch Twitter topics

[Pie chart: topic shares in the 12 million messages; the topic labels were lost
in extraction. Shares shown: 46%, 10%, 7%, 7%, 5%, 3%, 3%, 3%]

Sentiment in Social media
• Access to Coosto database
  • > 2 billion publicly available messages
          • Twitter, Facebook, Hyves, web forums, blogs etc.
     • Sentiment of each message
          • Positive, negative or neutral
     • Interesting finding
          • The so-called ‘Mood of the nation’ can be determined and compared
            to the Consumer confidence figure of Statistics Netherlands



Consumer confidence, survey data

[Figure: sentiment towards the economic climate, plotted as
(pos – neg) as % of total; ~1000 respondents/month]
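
The measure ‘(pos – neg) as % of total’ is a simple balance. The sketch below assumes that ‘total’ also includes neutral answers or messages, which the slide does not state explicitly. The function name `net_sentiment` and the counts are hypothetical.

```python
def net_sentiment(pos: int, neg: int, neutral: int = 0) -> float:
    """Return (pos - neg) as a percentage of all items.

    Assumption: the total includes neutral items. The result lies
    between -100 (all negative) and +100 (all positive).
    """
    total = pos + neg + neutral
    if total == 0:
        raise ValueError("at least one answer or message is required")
    return 100.0 * (pos - neg) / total


# Hypothetical month with 1000 respondents: 300 positive, 450 negative, 250 neutral.
balance = net_sentiment(pos=300, neg=450, neutral=250)
print(balance)
```

The same function can be applied to the positive, negative and neutral message counts of a social media source, which makes the two series directly comparable.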

Final remarks: Big Data and statistics
 •   Preparing Big data for statistics is time consuming
      • Exploration phase takes a lot of time
      • Try to reduce amount of data without losing information (‘making big data
        small’, noise reduction)
      • Risk: ‘garbage in’ → ‘garbage statistics out’
 •   Traditional approach does not suffice
      • Big data sources are definitely not ‘large’ sample surveys or admin data
      • Often a large but selective part of the ‘population’ is included
      • Events are registered, not units!
      • Be careful when using ‘traditional’ statistical analysis (everything is significant!)
 •   More need for:
      • Visualisation methods (to rapidly gain insight)
      • Methods & models specific for large datasets (fast and ‘robust’)
      • Learn from ‘computational statistics’ & (try to) use dedicated hardware
      • Beware of privacy issues!
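
The remark that ‘everything is significant’ can be illustrated with a standard two-proportion z-test: a negligible difference that is far from significant in a sample of 1,000 per group becomes highly significant at Big Data scale. This is a sketch with made-up proportions and group sizes; the function name `two_proportion_p_value` is hypothetical.

```python
import math


def two_proportion_p_value(p1: float, p2: float, n1: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test
    (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / standard_error
    # For a standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same tiny difference (50.0% versus 50.1%) at two very different scales.
p_small = two_proportion_p_value(0.500, 0.501, 1_000, 1_000)
p_big = two_proportion_p_value(0.500, 0.501, 100_000_000, 100_000_000)
print(f"n = 1,000 per group:       p = {p_small:.3f}")
print(f"n = 100,000,000 per group: p = {p_big:.3g}")
```

The difference of 0.1 percentage point is the same in both cases; only the number of records changes. With very large sources, the size of an effect and the selectivity of the source matter more than the p-value.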



The future of Statistics Netherlands?

Mais conteúdo relacionado

Mais procurados

2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...e-ROSA
 
#opendata Back to the future
#opendata Back to the future#opendata Back to the future
#opendata Back to the futureSlim Turki, Dr.
 
Open Data Engagement - Using Open Data w3c Workshop
Open Data Engagement - Using Open Data w3c Workshop Open Data Engagement - Using Open Data w3c Workshop
Open Data Engagement - Using Open Data w3c Workshop Tim Davies
 
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013AmbasciatadelCanada
 
networks inparliament-ccct
 networks inparliament-ccct networks inparliament-ccct
networks inparliament-ccctmaartenmarx
 
Data sharing for development: a case of Infrastructural development in Uganda...
Data sharing for development: a case of Infrastructural development in Uganda...Data sharing for development: a case of Infrastructural development in Uganda...
Data sharing for development: a case of Infrastructural development in Uganda...African Open Science Platform
 
Digital preservation through Digital Sustainability
Digital preservation through Digital SustainabilityDigital preservation through Digital Sustainability
Digital preservation through Digital SustainabilityMatthias Stürmer
 
SK INSPIRE Data sharing
SK INSPIRE Data sharingSK INSPIRE Data sharing
SK INSPIRE Data sharingMartin Tuchyna
 

Mais procurados (10)

2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
 
#opendata Back to the future
#opendata Back to the future#opendata Back to the future
#opendata Back to the future
 
Open Data Engagement - Using Open Data w3c Workshop
Open Data Engagement - Using Open Data w3c Workshop Open Data Engagement - Using Open Data w3c Workshop
Open Data Engagement - Using Open Data w3c Workshop
 
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013
Domenico Donvito - Istat - Open Data in Official Statistics - 10 July 2013
 
networks inparliament-ccct
 networks inparliament-ccct networks inparliament-ccct
networks inparliament-ccct
 
Data sharing for development: a case of Infrastructural development in Uganda...
Data sharing for development: a case of Infrastructural development in Uganda...Data sharing for development: a case of Infrastructural development in Uganda...
Data sharing for development: a case of Infrastructural development in Uganda...
 
Digital preservation through Digital Sustainability
Digital preservation through Digital SustainabilityDigital preservation through Digital Sustainability
Digital preservation through Digital Sustainability
 
Open Data in a Day - Introduction to Open Data
Open Data in a Day - Introduction to Open DataOpen Data in a Day - Introduction to Open Data
Open Data in a Day - Introduction to Open Data
 
Case Studies: Burkina Open Data Initiative/Malick Tapsoba
Case Studies: Burkina Open Data Initiative/Malick TapsobaCase Studies: Burkina Open Data Initiative/Malick Tapsoba
Case Studies: Burkina Open Data Initiative/Malick Tapsoba
 
SK INSPIRE Data sharing
SK INSPIRE Data sharingSK INSPIRE Data sharing
SK INSPIRE Data sharing
 

Destaque

Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdata
Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdataLex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdata
Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdataAlmereDataCapital
 
Необычные СПА процедуры мира
Необычные СПА процедуры мираНеобычные СПА процедуры мира
Необычные СПА процедуры мираАйназ Волкова
 
Delitos Contra la Administración pública
Delitos Contra la Administración públicaDelitos Contra la Administración pública
Delitos Contra la Administración públicaJhon Abad Robles
 
Qué y a dónde más parte 1 de 3
Qué y a dónde más parte 1 de 3Qué y a dónde más parte 1 de 3
Qué y a dónde más parte 1 de 3gotsis
 
Revolucioindustrial
RevolucioindustrialRevolucioindustrial
Revolucioindustrialfinamorenoo
 
Chapter7 International Finance Management
Chapter7 International Finance ManagementChapter7 International Finance Management
Chapter7 International Finance ManagementPiyush Gaur
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Belmiro Moreira
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataMoshe Kaplan
 
Using spider for sharding in production
Using spider for sharding in productionUsing spider for sharding in production
Using spider for sharding in productionKentoku
 
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫Insight Technology, Inc.
 

Destaque (13)

Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdata
Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdataLex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdata
Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdata
 
Необычные СПА процедуры мира
Необычные СПА процедуры мираНеобычные СПА процедуры мира
Необычные СПА процедуры мира
 
Relaciones laborales en Salud Publica
Relaciones laborales en Salud Publica Relaciones laborales en Salud Publica
Relaciones laborales en Salud Publica
 
Delitos Contra la Administración pública
Delitos Contra la Administración públicaDelitos Contra la Administración pública
Delitos Contra la Administración pública
 
October 2016 classes
October 2016 classesOctober 2016 classes
October 2016 classes
 
Qué y a dónde más parte 1 de 3
Qué y a dónde más parte 1 de 3Qué y a dónde más parte 1 de 3
Qué y a dónde más parte 1 de 3
 
Revolucioindustrial
RevolucioindustrialRevolucioindustrial
Revolucioindustrial
 
Chapter7 International Finance Management
Chapter7 International Finance ManagementChapter7 International Finance Management
Chapter7 International Finance Management
 
Yellow Fever: Risk Mapping
Yellow Fever: Risk MappingYellow Fever: Risk Mapping
Yellow Fever: Risk Mapping
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Using spider for sharding in production
Using spider for sharding in productionUsing spider for sharding in production
Using spider for sharding in production
 
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫
20160929_InnoDBの全文検索を使ってみた by 株式会社インサイトテクノロジー 中村範夫
 

Semelhante a Piet daas big_data_official_statistics_target_groningen

Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentationPiet J.H. Daas
 
Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statisticsEdwin de Jonge
 
OSFair2017 Workshop | OpenDataMonitor
OSFair2017 Workshop | OpenDataMonitorOSFair2017 Workshop | OpenDataMonitor
OSFair2017 Workshop | OpenDataMonitorOpen Science Fair
 
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...CambridgeshireInsight
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaPiet J.H. Daas
 
Developing a Data Management Plan
Developing a Data Management PlanDeveloping a Data Management Plan
Developing a Data Management PlanMartin Donnelly
 
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014GSDI Association
 
Aligning stakeholders' perspectives in Open Government Data Community
Aligning stakeholders' perspectives in Open Government Data CommunityAligning stakeholders' perspectives in Open Government Data Community
Aligning stakeholders' perspectives in Open Government Data CommunityAdegboyega Ojo
 
Research Data Alliance Member Statistics June 2015
Research Data Alliance Member Statistics June 2015Research Data Alliance Member Statistics June 2015
Research Data Alliance Member Statistics June 2015Research Data Alliance
 
Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015Research Data Alliance
 
Research Data Alliance Member Statistics September 2015
Research Data Alliance Member Statistics September 2015Research Data Alliance Member Statistics September 2015
Research Data Alliance Member Statistics September 2015Research Data Alliance
 
Research Data Alliance Member Statistics July 2015
Research Data Alliance Member Statistics July 2015Research Data Alliance Member Statistics July 2015
Research Data Alliance Member Statistics July 2015Research Data Alliance
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 
Research Data Alliance Member Statistics October 2015
Research Data Alliance Member Statistics October 2015Research Data Alliance Member Statistics October 2015
Research Data Alliance Member Statistics October 2015Research Data Alliance
 
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 Workshop
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 WorkshopSC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 Workshop
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 WorkshopBigData_Europe
 
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...BOBCATSSS 2017
 
Open Government Data for Transparency & Innovation
Open Government Data for Transparency & InnovationOpen Government Data for Transparency & Innovation
Open Government Data for Transparency & InnovationData Portal India
 

Semelhante a Piet daas big_data_official_statistics_target_groningen (20)

Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentation
 
Big data as a source for official statistics
Big data as a source for official statisticsBig data as a source for official statistics
Big data as a source for official statistics
 
OSFair2017 Workshop | OpenDataMonitor
OSFair2017 Workshop | OpenDataMonitorOSFair2017 Workshop | OpenDataMonitor
OSFair2017 Workshop | OpenDataMonitor
 
Open data: Where do we go from here
Open data: Where do we go from hereOpen data: Where do we go from here
Open data: Where do we go from here
 
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...
Cambridgeshire Insight Open Data: What we’ve learnt from the unexpected - He...
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics Canada
 
Developing a Data Management Plan
Developing a Data Management PlanDeveloping a Data Management Plan
Developing a Data Management Plan
 
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014
R A Longhorn Presentation at Taiwan Open Data Forum, Taipei, 9 July 2014
 
Aligning stakeholders' perspectives in Open Government Data Community
Aligning stakeholders' perspectives in Open Government Data CommunityAligning stakeholders' perspectives in Open Government Data Community
Aligning stakeholders' perspectives in Open Government Data Community
 
Research Data Alliance Member Statistics June 2015
Research Data Alliance Member Statistics June 2015Research Data Alliance Member Statistics June 2015
Research Data Alliance Member Statistics June 2015
 
Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
 
Research Data Alliance Member Statistics September 2015
Research Data Alliance Member Statistics September 2015Research Data Alliance Member Statistics September 2015
Research Data Alliance Member Statistics September 2015
 
Research Data Alliance Member Statistics July 2015
Research Data Alliance Member Statistics July 2015Research Data Alliance Member Statistics July 2015
Research Data Alliance Member Statistics July 2015
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
Research Data Alliance Member Statistics October 2015
Research Data Alliance Member Statistics October 2015Research Data Alliance Member Statistics October 2015
Research Data Alliance Member Statistics October 2015
 
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 Workshop
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 WorkshopSC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 Workshop
SC6 Workshop 1: From your data to data stories - BigDataEurope, SC6 Workshop
 
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...
Jovana Pistek and Christian van der Kooi - Open government data workshop - BO...
 
Open Government Data for Transparency & Innovation
Open Government Data for Transparency & InnovationOpen Government Data for Transparency & Innovation
Open Government Data for Transparency & Innovation
 
#FIWAREPamplona Aporta IODC16 Open Data
#FIWAREPamplona Aporta IODC16 Open Data#FIWAREPamplona Aporta IODC16 Open Data
#FIWAREPamplona Aporta IODC16 Open Data
 

Mais de Piet J.H. Daas

Big Data and official statistics with examples of their use
Big Data and official statistics with examples of their useBig Data and official statistics with examples of their use
Big Data and official statistics with examples of their usePiet J.H. Daas
 
IT infrastructure for Big Data and Data Science at Statistics Netherlands
IT infrastructure for Big Data and Data Science at Statistics NetherlandsIT infrastructure for Big Data and Data Science at Statistics Netherlands
IT infrastructure for Big Data and Data Science at Statistics NetherlandsPiet J.H. Daas
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
EMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniquesEMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniquesPiet J.H. Daas
 
Use of social media for official statistics
Use of social media for official statisticsUse of social media for official statistics
Use of social media for official statisticsPiet J.H. Daas
 
Isi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and biasIsi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and biasPiet J.H. Daas
 
Responsible Data Science at Statistics Netherlands
Responsible Data Science at Statistics NetherlandsResponsible Data Science at Statistics Netherlands
Responsible Data Science at Statistics NetherlandsPiet J.H. Daas
 
CBS lecture at the opening of Data Science Campus of ONS
CBS lecture at the opening of Data Science Campus of ONSCBS lecture at the opening of Data Science Campus of ONS
CBS lecture at the opening of Data Science Campus of ONSPiet J.H. Daas
 
Ntts2017 presentation 45
Ntts2017 presentation 45Ntts2017 presentation 45
Ntts2017 presentation 45Piet J.H. Daas
 
Big Data presentation Mannheim
Big Data presentation MannheimBig Data presentation Mannheim
Big Data presentation MannheimPiet J.H. Daas
 
Extracting information from ' messy' social media data
Extracting information from ' messy' social media dataExtracting information from ' messy' social media data
Extracting information from ' messy' social media dataPiet J.H. Daas
 
Big data cbs_piet_daas
Big data cbs_piet_daasBig data cbs_piet_daas
Big data cbs_piet_daasPiet J.H. Daas
 
Gebruik van sociale media voor de officiële statistiek
Gebruik van sociale media voor de officiële statistiekGebruik van sociale media voor de officiële statistiek
Gebruik van sociale media voor de officiële statistiekPiet J.H. Daas
 
Profiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivityProfiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivityPiet J.H. Daas
 
Using Road Sensor Data for Official Statistics: towards a Big Data Methodology
Using Road Sensor Data for Official Statistics: towards a Big Data MethodologyUsing Road Sensor Data for Official Statistics: towards a Big Data Methodology
Using Road Sensor Data for Official Statistics: towards a Big Data MethodologyPiet J.H. Daas
 
Big Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenBig Data @ CBS for Fontys students in Eindhoven
Big Data @ CBS for Fontys students in EindhovenPiet J.H. Daas
 
Quality challenges in modernising business statistics
Quality challenges in modernising business statisticsQuality challenges in modernising business statistics
Quality challenges in modernising business statisticsPiet J.H. Daas
 
Quality Approaches to Big Data
Quality Approaches to Big DataQuality Approaches to Big Data
Quality Approaches to Big DataPiet J.H. Daas
 
Social media sentiment and consumer confidence
Social media sentiment and consumer confidenceSocial media sentiment and consumer confidence
Social media sentiment and consumer confidencePiet J.H. Daas
 

Mais de Piet J.H. Daas (20)

Big Data and official statistics with examples of their use
Big Data and official statistics with examples of their useBig Data and official statistics with examples of their use
Big Data and official statistics with examples of their use
 
IT infrastructure for Big Data and Data Science at Statistics Netherlands
IT infrastructure for Big Data and Data Science at Statistics NetherlandsIT infrastructure for Big Data and Data Science at Statistics Netherlands
IT infrastructure for Big Data and Data Science at Statistics Netherlands
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Piet daas big_data_official_statistics_target_groningen

  • 1. Big Data as a data source for official statistics Piet Daas, Marco Puts, Bart Buelens and Paul van den Hurk Statistics Netherlands Big Data Target Conference, April 4, Groningen
  • 2. Overview • Data sources and statistics • More & more data becomes available • Effect on statistics production • How we study Big Data: 2 examples • Traffic loop detection data • Social media messages
  • 3. Introduction “Statistics Netherlands has produced about 5000 official publications and tables in 2012” For this we need DATA
  • 4. Data sources for official statistics • Primary data: our own surveys • Secondary data: data from ‘others’ (administrative sources, ‘new’ data sources)
  • 5. Statistics Netherlands law • “Statistics Netherlands aims to reduce the administrative burden for companies and the public as much as possible” • By (re-)using existing administrative registrations of both government and government-funded organizations • And study potential new sources of information
  • 6. Data, data everywhere!
  • 7. Statistics Netherlands and Data • Data is generated in increasing amounts and at increasing frequencies: • From ‘data scarcity’ (sample surveys) to ‘data abundance’ (administrative & Big Data) • Ever-increasing amounts of data need to be checked, processed and analyzed • More sources of information become available • Opportunities to produce statistics faster (‘real-time statistics’) • Need for new methods and tools: 1. Methods to quickly uncover information from the massive amounts of data available, such as visualisation methods, data-, text- and stream-mining techniques (‘making Big Data small’) and high-performance computing 2. Methods capable of integrating the information in the statistical process, e.g. linking at massive scale, macro/meso-integration and estimation methods suited for large datasets
  • 8. 2 Big Data case studies: research findings on the study of Big Data sources from a statistics point of view 1. Traffic loop detection data: 80 million records/day, 90 days studied so far, number of vehicles detected each minute 2. Dutch social media messages: 1–2 million public messages/day, up to 2 billion records studied, content and sentiment
  • 9. 1. Traffic loop detection data • Traffic ‘loops’ • Every minute (24/7) the number of passing vehicles is counted by >10,000 road sensors & cameras in the Netherlands • Total vehicles and in different length classes • Interesting source to produce traffic and transport statistics (and more) • Huge amounts of data, about 100 million records a day • [Figure: locations]
  • 10. Number of detected vehicles on a single day, by all loops: total ≈ 295 million
  • 11. Traffic loop detection activity (only first 10 min.)
  • 12. Correct for missing data • ‘Corrected’ data (for blocks of 5 min) • Before: total ≈ 295 million • After: total ≈ 330 million (+12%)
  • 13. Total vehicles during the day (snapshots)
  • 14. For different vehicle lengths • 1 category: total • 3 categories: total; <= 5.6 m; > 5.6 & <= 12.2 m; > 12.2 m • 5 categories: total; > 1.85 & <= 2.4 m; > 2.4 & <= 5.6 m; > 5.6 & <= 11.5 m; > 11.5 & <= 12.2 m; > 12.2 m • Small vehicles: <= 5.6 m • Medium-sized vehicles: > 5.6 m & <= 12.2 m • Large vehicles: > 12.2 m
  • 15. Small vehicles: ~75% of total
  • 16. Small & medium vehicles
  • 17. Small, medium & large vehicles
  • 18. Volatile behaviour at the micro-level
  • 19. 2. Social media messages • Dutch are very active on social media platforms • Almost always carried along and almost always switched on • More and more people have a smartphone! • Possible information source for: • Which topics are current: • Number of messages and the sentiment about them • Can be used as a measuring instrument for: • Map by Eric Fischer (via Fast Company)
  • 20. 2. Social media messages • Dutch are very active on social media platforms • Potential information source for: • Topics discussed and sentiment over these topics (quickly available!) and probably more? • Investigate it to obtain an answer on potential use • 2a. Content: collected Dutch Twitter messages for study, a ‘selection’ of 12 million • 2b. Sentiment: sentiment in Dutch social media messages, ‘all’ ~2 billion
  • 21. Social media: Dutch Twitter topics, 12 million messages [Figure: shares per topic of 3%, 7%, 3%, 10%, 7%, 3%, 5% and 46%]
  • 22. Sentiment in Social media • Access to Coosto database • > 2 billion publicly available messages • Twitter, Facebook, Hyves, Webfora, Blogs etc. • Sentiment of each message • Positive, negative or neutral • Interesting finding • Determine so-called ‘Mood of the nation’ compared to Consumer confidence of Statistics Netherlands
  • 23. Consumer confidence, survey data • Sentiment towards the economic climate: (pos – neg) as % of total • ~1000 respondents/month
  • 24. Final remarks: Big Data and statistics • Preparing Big Data for statistics is time-consuming • Exploration phase takes a lot of time • Try to reduce the amount of data without losing information (‘making big data small’, noise reduction) • Risk: ‘garbage in’, ‘garbage statistics out’ • Traditional approach does not suffice • Big Data sources are definitely not ‘large’ sample surveys or admin data • Often a selective but large part of the ‘population’ is included • Events are registered, not units! • Be careful with using ‘traditional’ statistical analysis (everything is significant!) • More need for: • Visualisation methods (to rapidly gain insight) • Methods & models specific for large datasets (fast and ‘robust’) • Learn from ‘computational statistics’ & (try to) use dedicated hardware • Beware of privacy issues!
  • 25. The future of Stat Neth?
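
Two of the definitions in the slides are concrete enough to sketch in code: the three vehicle length classes from slide 14 and the "(pos – neg) as % of total" balance from slide 23. The Python below is only an illustrative sketch, not the authors' implementation. The function names `length_class` and `sentiment_balance` and the label strings `"positive"`, `"negative"` and `"neutral"` are assumptions made for this example, and applying the slide 23 balance to message-level labels (slide 22) is likewise an assumption.

```python
from collections import Counter
from typing import Iterable


def length_class(length_m: float) -> str:
    """Map a vehicle length in metres to the three classes of slide 14.

    The thresholds (5.6 m and 12.2 m) come from the slide; the class
    names returned here are hypothetical labels for this sketch.
    """
    if length_m <= 5.6:
        return "small"
    if length_m <= 12.2:
        return "medium"
    return "large"


def sentiment_balance(labels: Iterable[str]) -> float:
    """Return (positive - negative) as a percentage of all labels.

    Neutral labels count towards the total but not towards the
    numerator, following the "(pos - neg) as % of total" wording.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    return 100.0 * (counts["positive"] - counts["negative"]) / total


if __name__ == "__main__":
    # Hypothetical example values, not data from the presentation.
    print([length_class(x) for x in (4.2, 5.6, 9.0, 16.5)])
    print(sentiment_balance(["positive"] * 3 + ["negative"] + ["neutral"] * 4))
```

Counting neutral messages in the denominator keeps the balance comparable across periods with different shares of neutral content; whether the original analysis did this is not stated in the slides.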