Oberski EAM 2018 - Incidental data for serious social research


Incidental data from social media has been used in some individual-level social science research to predict attributes like political views, personality, and health indicators. However, the author notes issues with selectivity, reliability, and comparability that limit its use in serious social research. While some methodological work has been done applying machine learning techniques, significant challenges remain around generalizability, privacy, reproducibility, and integrating different data sources and modalities. The author argues more work is needed to solve key social science challenges through a "grand challenge" approach and techniques like cross-validation, penalized models, and multimodal learning.

Incidental data
for serious social research
Daniel Oberski
Utrecht Applied Data Science
Dept Methodology & Statistics
http://daob.nl
https://uu.nl/ads

• Incidental data are used throughout business and government
• What about social science?
1. Done - 2. To do - 3. Conclusion

1. Some of the applied work done so far

Incomplete timeline of key applied papers
Some names: Pentland, Lazer, Ginsberg, Kosinski, Nguyen,
Daas, O’Connor, Tumasjan, Preoţiuc-Pietro, Mellon, …

Done (individual-level!):
Facebook, Twitter:
• Political orientation, Personality, Age, Sex, Education, Job title,
Income, Well-being, Depression, Multilingualism, Dialect,
Sexual orientation, Ethnicity, Weak network ties…
Phone sensors
• GPS: Movement type, Activity, Depression, Health,
Employment
• Bluetooth + cell tower: Friendship networks
• Accelerometer + Microphone: Activity
• …

Pirates win German elections!
… at least, on Twitter
Jungherr et al. (2012). Why the Pirate Party won the German Election of 2009. Soc Sci Comp Rev.
Gayo-Avello (2012). I wanted to predict elections with Twitter and all I got was this lousy paper.

What kind of things are people doing right now?

Blandfort et al. (23 Jul 2018). Multimodal Social Media Analysis for Gang Violence Prevention. arXiv:1807.08465v1.
“High af”
“Shyt Dnt always happen how u plan it”
“Goodmorning cold ass world”
“Rip lil B”
Image+Text -> Aggression/Loss/Substance use/Other

2. What still needs to be done?

“The (implicit) hope is that analyses of
social media content might be substituted for costly
and burdensome survey responses.
Current evidence suggests we are far from that…”
Conrad (2015)

Problems with incidental data: methodological
• Selectivity
• Reliability
Source: Mellon & Prosser (2017)
• Comparability

Problems with incidental data: specific to incidental data (ID)
• API changes
• Reproducibility

Daniel’s delightful
data science dictionary
A special service for savvy social scientists

Data science term            | Social science term
Learning                     | Estimating a model
Supervised learning          | Predicting stuff
Unsupervised learning        | Latent variable modeling
Example / instance           | Case
Feature                      | (Independent) variable
Target                       | Dependent variable
Loss                         | * Log-likelihood
Gaussian Bayesian network    | Structural equation model
Classifier                   | Model for categorical DV
Regression                   | Model for continuous DV
Softmax                      | Multinomial regression
Error                        | Prediction error
Variance                     | * Prediction sampling error
Bias                         | * “Average prediction error”

Social science term                                 | Data science term
Criterion variable                                  | ~ Ground truth
Capitalization on chance, p-hacking, HARKing, etc.  | Overfitting
Reliability                                         | ?
Internal validity                                   | ?
External validity                                   | ? (-> generalization error)
Measurement invariance                              | ~ Concept drift (-> transfer learning)
Measurement error                                   | Noise
Measurement error model (correction)                | Noise-aware machine learning
Measurement error model (estimation)                | Inverse model
~ Deviance; Chi-square (exponential of)             | Perplexity
?                                                   | Grand challenge

Legend: *: Usually. ~: Not really the same, but close enough. ->: Relates to. ?: Work to do!
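
A few of the dictionary entries can be made concrete with a small numeric sketch. The example below is an illustration, not part of the original slides: the scores and labels are made-up numbers, and the function names are hypothetical. It assumes the loss is the negative log-likelihood (the "usually" case in the legend) and that the deviance is taken relative to a saturated model with log-likelihood 0, which holds for individual-level categorical outcomes.

```python
import math

def softmax(scores):
    """Turn linear predictors into class probabilities (multinomial regression link)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(prob_rows, labels):
    """Loss in data science terms; minus the log-likelihood in social science terms."""
    return -sum(math.log(p[y]) for p, y in zip(prob_rows, labels))

# Made-up linear predictors for 3 cases and 3 categories, with observed categories.
scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [-1.0, 3.0, 0.0]]
labels = [0, 2, 1]

probs = [softmax(row) for row in scores]
nll = negative_log_likelihood(probs, labels)

deviance = 2 * nll                    # -2 * log-likelihood (saturated log-likelihood is 0 here)
cross_entropy = nll / len(labels)     # average loss per case
perplexity = math.exp(cross_entropy)  # exponential of the average loss

print(f"deviance={deviance:.3f}  perplexity={perplexity:.3f}")
```

Under these assumptions, perplexity equals exp(deviance / (2n)), which is one way to read the "(exponential of)" entry in the table.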

Essential tools for methodologists
• Cross-validation and its relationship to generalizability
  • Train/validation/test paradigm
  • “Overfitting” theory
• Penalized estimation
  • L1 (LASSO); L2 (ridge); horseshoe; …
• Standard data science prediction workflow
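
The tools listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's workflow: the data are simulated, there is a single predictor and no intercept (so the penalized estimates have closed forms), and all function names are hypothetical. The penalty is chosen by k-fold cross-validation on the training data, and a separate test set is used once to estimate generalization error.

```python
import random

def fit_ridge(xs, ys, lam):
    """One predictor, no intercept: minimizes sum((y - b*x)**2) + lam * b**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def fit_lasso(xs, ys, lam):
    """One predictor, no intercept: minimizes 0.5*sum((y - b*x)**2) + lam*abs(b)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)  # soft-thresholding sets small effects to exactly 0
    return (shrunk if sxy >= 0 else -shrunk) / sxx

def mse(b, xs, ys):
    """Mean squared prediction error of slope b."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_cv(xs, ys, fit, lam, k=5):
    """Average held-out MSE over k folds for one value of the penalty."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        tr_x = [xs[i] for i in range(n) if i not in held_out]
        tr_y = [ys[i] for i in range(n) if i not in held_out]
        va_x = [xs[i] for i in sorted(held_out)]
        va_y = [ys[i] for i in sorted(held_out)]
        errors.append(mse(fit(tr_x, tr_y, lam), va_x, va_y))
    return sum(errors) / k

# Simulated data: a weak true effect plus noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

# Test data are set aside and never used to choose the penalty.
train_x, train_y = xs[:40], ys[:40]
test_x, test_y = xs[40:], ys[40:]

lams = [0.0, 1.0, 10.0, 100.0]
cv_errors = {lam: k_fold_cv(train_x, train_y, fit_ridge, lam) for lam in lams}
best_lam = min(cv_errors, key=cv_errors.get)

b_hat = fit_ridge(train_x, train_y, best_lam)
test_error = mse(b_hat, test_x, test_y)
print(f"chosen penalty={best_lam}  slope={b_hat:.3f}  test MSE={test_error:.3f}")
```

With many predictors the same logic applies, but the estimates no longer have these one-line closed forms and a fitting routine from a statistics or machine learning library would normally be used.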

Solving key social science challenges?
• Grand challenge approach (thanks to Adrienne Mendrik, Netherlands eScience Center)
• Multimodal learning (“data fusion”; see the work of Katrijn van Deun, Tilburg University)
• Privacy-aware ML (differential privacy, federated learning; see Cynthia Dwork, Microsoft)
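
As one illustration of the privacy-aware ML item, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a simple count. It is a minimal example under stated assumptions: the ages are made-up data, `private_count` is a hypothetical helper name, and the query is a count, so adding or removing one person changes the answer by at most 1.

```python
import random

def private_count(values, predicate, epsilon, rng):
    """Return a count with Laplace noise of scale sensitivity/epsilon added."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0            # one person changes a count by at most 1
    scale = sensitivity / epsilon
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

rng = random.Random(42)
ages = [23, 67, 45, 71, 34, 80, 19, 66, 52, 69]  # made-up example data

noisy = private_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng)
print(f"noisy count of respondents aged 65+: {noisy:.2f}")
```

A smaller epsilon gives stronger privacy but noisier answers; the noise averages out to zero over repeated releases, which is also why the total privacy budget across releases has to be tracked.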

Summary
• Incidental data haven’t revolutionized our field yet;
• Probably because we need to work out the methodology first;
• Although scores of authors have come to the same conclusion, most of the work remains to be done;
You are the ideal person to do this work.

Thank you for your attention!
E: d.l.oberski@uu.nl
T: @DanielOberski
W: http://daob.nl
W: https://uu.nl/ads

