Incidental data
for serious social research
Daniel Oberski
Utrecht Applied Data Science
Dept Methodology & Statistics
http://daob.nl
https://uu.nl/ads
• Incidental data are used throughout business and government
• What about social science?
1. Done - 2. To do - 3. Conclusion
1. Some of the applied work done so far
Incomplete timeline of key applied papers
Some names: Pentland, Lazer, Ginsberg, Kosinski, Nguyen,
Daas, O’Connor, Tumasjan, Preoţiuc-Pietro, Mellon, …
Done (individual-level!):
Facebook, Twitter:
• Political orientation, Personality, Age, Sex, Education, Job title,
Income, Well-being, Depression, Multilingualism, Dialect,
Sexual orientation, Ethnicity, Weak network ties…
Phone sensors
• GPS: Movement type, Activity, Depression, Health,
Employment
• Bluetooth + cell tower: Friendship networks
• Accelerometer + Microphone: Activity
• …
Pirates win German elections!
… at least, on Twitter
Jungherr et al. (2012). Why the Pirate Party won the German Election of 2009. Soc Sci Comp Rev.
Gayo-Avello (2012). I tried to predict elections from Twitter and all I got was this lousy paper.
What kind of things are people doing right now?
Blandfort et al. (23 Jul 2018). Multimodal Social Media Analysis for Gang Violence Prevention. arXiv:1807.08465v1.
“High af”
“Shyt Dnt always happen how u plan it”
“Goodmorning cold ass world”
“Rip lil B”
Image+Text -> Aggression/Loss/Substance use/Other
2. What still needs to be done?
“The (implicit) hope is that analyses of
social media content might be substituted for costly
and burdensome survey responses.
Current evidence suggests we are far from that…”
Conrad (2015)
Problems with incidental data: methodological
• Selectivity
• Reliability
Source: Mellon & Prosser (2017)
• Comparability
Problems with incidental data: ID-specific
• API changes
• Reproducibility
Daniel’s delightful data science dictionary
A special service for savvy social scientists

Data science term | Social science term
Learning | Estimating a model
Supervised learning | Predicting stuff
Unsupervised learning | Latent variable modeling
Example / instance | Case
Feature | (Independent) variable
Target | Dependent variable
Loss | * log-likelihood
Gaussian Bayesian network | Structural equation model
Classifier | Model for categorical DV
Regression | Model for continuous DV
Softmax | Multinomial regression
Error | Prediction error
Variance | * Prediction sampling error
Bias | * “Average prediction error”

Social science term | Data science term
Criterion variable | ~ Ground truth
Capitalization on chance, p-hacking, HARKing, etc. | Overfitting
Reliability | ?
Internal validity | ?
External validity | ? (-> generalization error)
Measurement invariance | ~ Concept drift (-> transfer learning;)
Measurement error | Noise
Measurement error model (correction) | Noise-aware machine learning
Measurement error model (estimation) | Inverse model
~Deviance; Chi-square (exponential of) | Perplexity
? | Grand challenge
Legend: *: Usually. ~: Not really the same, but close enough. ->: Relates to. ?: Work to do!
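The "Loss" and "Perplexity" rows can be checked with a little arithmetic. A minimal sketch, assuming a classifier that outputs a probability for each case's observed category; the probabilities and function names below are made up for illustration. Here the loss is the average negative log-likelihood, the deviance is minus twice the total log-likelihood, and perplexity is the exponential of the loss.

```python
import math

def log_loss(probs_of_observed):
    """Average negative log-likelihood of the observed categories."""
    n = len(probs_of_observed)
    return -sum(math.log(p) for p in probs_of_observed) / n

def deviance(probs_of_observed):
    """Minus twice the total log-likelihood, i.e. 2 * N * log loss."""
    return 2 * len(probs_of_observed) * log_loss(probs_of_observed)

def perplexity(probs_of_observed):
    """Exponential of the average negative log-likelihood."""
    return math.exp(log_loss(probs_of_observed))

# Hypothetical predicted probabilities for the category each case actually fell in.
probs = [0.9, 0.6, 0.75, 0.5]

print("log loss:  ", log_loss(probs))
print("deviance:  ", deviance(probs))
print("perplexity:", perplexity(probs))
# Perplexity is a rescaled exponential of the deviance:
print("exp(deviance / 2N):", math.exp(deviance(probs) / (2 * len(probs))))
```

A model that always assigns probability 0.5 to the observed category has perplexity 2: it is as uncertain as a fair coin flip.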
Essential tools for methodologists
• Cross-validation and its relationship to generalizability
Train/validation/test paradigm
“Overfitting” theory
• Penalized estimation
L1 LASSO; L2 ridge; horseshoe; …
• Standard data science prediction workflow
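The tools listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated, the model has no intercept, only the L2 (ridge) penalty is shown, and all function names are hypothetical. K-fold cross-validation chooses the penalty on the training data, and a separate test set estimates the generalization error.

```python
import random

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X + lam * I | X'y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(b, x):
    return sum(bj * xj for bj, xj in zip(b, x))

def mse(b, X, y):
    return sum((yi - predict(b, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)

def cv_mse(X, y, lam, k=5):
    """Mean squared prediction error over k held-out folds."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        b = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        errors += [(y[i] - predict(b, X[i])) ** 2 for i in held_out]
    return sum(errors) / n

def simulate(n, p=10):
    """Simulated data: only the first two of p features matter."""
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2 * x[0] - x[1] + random.gauss(0, 1) for x in X]
    return X, y

random.seed(2018)
X_train, y_train = simulate(60)    # used for fitting and cross-validation
X_test, y_test = simulate(200)     # touched only once, at the very end

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_mse(X_train, y_train, lam))
b = ridge_fit(X_train, y_train, best_lam)

print("chosen penalty:", best_lam)
print("training MSE:  ", mse(b, X_train, y_train))
print("test MSE:      ", mse(b, X_test, y_test))
```

The gap between training and test error is the "overfitting" that the penalty is meant to control. An L1 (LASSO) penalty has no closed form and would need an iterative solver instead of `ridge_fit`.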
Solving key social science challenges?
Grand challenge approach (thanks to Adrienne Mendrik, NL eScience Center)
Multimodal learning (“data fusion”; see the work of Katrijn van Deun, Tilburg University)
Privacy-aware ML (differential privacy, federated learning; see Cynthia Dwork, Microsoft)
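To give a flavor of differential privacy, here is a minimal sketch of the Laplace mechanism for a counting query. The ages, function names, and epsilon value are assumptions for illustration only. A count changes by at most 1 when one person is added or removed, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy for that single query.

```python
import random

def laplace_noise(scale, rng=random):
    """Laplace(0, scale) draw: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng=random):
    """Noisy count of the values that satisfy predicate."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical respondent ages.
ages = [23, 35, 41, 29, 67, 52, 38, 19, 44, 71]

rng = random.Random(42)
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print("true count: ", sum(1 for a in ages if a >= 40))
print("noisy count:", noisy)
```

Smaller values of epsilon mean more noise and stronger privacy. Each additional query on the same data spends more of the privacy budget.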
Resources > Books > Beginners
Resources > Books > Advanced
Summary
• Incidental data haven’t revolutionized our field yet;
• Probably because we need to work out the methodology first;
• Although scores of authors have come to the same conclusion,
most of the work remains to be done;
You are the ideal person to do this work.
Thank you for your attention!
E: d.l.oberski@uu.nl
T: @DanielOberski
W: http://daob.nl
W: https://uu.nl/ads

Oberski EAM 2018 - Incidental data for serious social research

  • 1. Incidental data for serious social research Daniel Oberski Utrecht Applied Data Science Dept Methodology & Statistics http://daob.nl https://uu.nl/ads
  • 2. • Incidental data are used throughout business and government • What about social science? 1. Done - 2. To do - 3. Conclusion
  • 3. 1. Some of the applied work done so far
  • 4. Incomplete timeline key applied papers Some names: Pentland, Lazer, Ginsberg, Kosinski, Nguyen, Daas, O’Connor, Tumasjan, Preoţiuc-Pietro, Mellon, …
  • 5. Done (individual-level!): Facebook, Twitter: • Political orientation, Personality, Age, Sex, Education, Job title, Income, Well-being, Depression, Multilingualism, Dialect, Sexual orientation, Ethnicity, Weak network ties… Phone sensors • GPS: Movement type, Activity, Depression, Health, Employment • Bluetooth + cell tower: Friendship networks • Accelerometer + Microphone: Activity • …
  • 6.
  • 7. Pirates win German elections!
  • 8. … at least, on Twitter Jungherr et al. (2012). Why the Pirate Party won the German Election of 2009. Soc Sci Comp Rev. Gayo-Avello (2012). I tried to predict elections from Twitter and all I got was this lousy paper.
  • 9. What kind of things are people doing right now?
  • 10. Blandfort et al. (23 Jul 2018). Multimodal Social Media Analysis for Gang Violence Prevention. arXiv:1807.08465v1. “High af” “Shyt Dnt always happen how u plan it” “Goodmorning cold ass world” “Rip lil B” Image+Text -> Aggression/Loss/Substance use/Other
  • 11. 2. What still needs to be done?
  • 12. “The (implicit) hope is that analyses of social media content might be substituted for costly and burdensome survey responses. Current evidence suggests we are far from that…” Conrad (2015)
  • 13. Problems with incidental data: methodological Selectivity Reliability Source:Mellon&Prosser(2017) Comparability:
  • 14. Problems with incidental data: ID-specific API changes Reproducibility
  • 15. Daniel’s delightful data science dictionary A special service for savvy social scientists
  • 16. Data science term | Social science term
        Learning | Estimating a model
        Supervised learning | Predicting stuff
        Unsupervised learning | Latent variable modeling
        Example / instance | Case
        Feature | (Independent) variable
        Target | Dependent variable
        Loss | * log-likelihood
        Gaussian Bayesian network | Structural equation model
        Classifier | Model for categorical DV
        Regression | Model for continuous DV
        Softmax | Multinomial regression
        Error | Prediction error
        Variance | * Prediction sampling error
        Bias | * “Average prediction error”

        Social science term | Data science term
        Criterion variable | ~ Ground truth
        Capitalization on chance, p-hacking, HARKing, etc. | Overfitting
        Reliability | ?
        Internal validity | ?
        External validity | ? (-> generalization error)
        Measurement invariance | ~ Concept drift (-> transfer learning;)
        Measurement error | Noise
        Measurement error model (correction) | Noise-aware machine learning
        Measurement error model (estimation) | Inverse model
        ~Deviance; Chi-square | (exponential of) Perplexity
        ? | Grand challenge

        Legend: *: Usually. ~: Not really the same, but close enough. ->: Relates to. ?: Work to do!
  • 17. Essential tools for methodologists • Cross-validation and its relationship to generalizability Train/validation/test paradigm “Overfitting” theory • Penalized estimation L1 LASSO; L2 ridge; horseshoe; … • Standard data science prediction workflow
  • 18. Solving key social science challenges? Grand challenge approach (thanks to Adrienne Mendrik, NL eScience center) Multimodal learning (“data fusion”; see work by Katrijn van Deun, Tilburg University) Privacy-aware ML (differential privacy, federated learning; see Cynthia Dwork, Microsoft)
  • 19. Resources > Books > Beginners
  • 20. Resources > Books > Advanced
  • 21. Summary • Incidental data haven’t revolutionized our field yet; • Probably because we need to work out the methodology first; • Although scores of authors have come to the same conclusion, most of the work remains to be done. You are the ideal person to do this work.
  • 22. Thank you for your attention! E: d.l.oberski@uu.nl T: @DanielOberski W: http://daob.nl W: https://uu.nl/ads
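
Slide 17 lists cross-validation, the train/validation/test paradigm, and penalized estimation as essential tools. The following is a minimal sketch of how these fit together, not code from the talk. Everything in it is an assumption made for illustration: the simulated data, the penalty grid, the helper names (`ridge_fit`, `mse`, `cv_error`), and the choice of an L2 (ridge) penalty without an intercept. It uses only the Python standard library.

```python
import random

def ridge_fit(X, y, lam):
    """Coefficients minimizing sum((y - Xb)^2) + lam * sum(b^2) (no intercept)."""
    p = len(X[0])
    # Augmented normal equations [X'X + lam*I | X'y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def cv_error(X, y, lam, k=5):
    """Average held-out MSE over k folds for a given penalty lam."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        total += mse([X[i] for i in held_out], [y[i] for i in held_out], beta)
    return total / k

# Simulated data (an assumption): 10 features, only the first two matter.
random.seed(1)
p, n = 10, 100
true_beta = [2.0, -1.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 1)
     for row in X]

# Hold out a test set that is never used for choosing the penalty.
X_dev, y_dev = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

# Choose the penalty by cross-validation on the development data only.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: cv_error(X_dev, y_dev, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)

# Refit on all development data, then estimate generalization error once.
beta_hat = ridge_fit(X_dev, y_dev, best_lam)
test_mse = mse(X_test, y_test, beta_hat)
print("chosen penalty:", best_lam, "test MSE:", round(test_mse, 3))
```

In the dictionary's terms, the held-out test error is an estimate of prediction error on new cases, and choosing the penalty on the same cases used to report that error would be a form of capitalization on chance.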