Not-So-Obvious Online
Data Sources for
Demographic Research
Ingmar Weber
@ingmarweber
https://sites.google.com/site/smdrworkshop/
Targeted Advertising as a Digital Census
All the Internet giants make money with targeted advertising
It’s in their commercial interest to “understand” their users
Rich data on both demographic and behavioral attributes
Usually not available for outside researchers, but …
Some aggregate “audience estimates” available for advertisers:
How many users/impressions match criteria X?
Supported by (at least) Facebook, Twitter, and Google
Facebook’s Advertising Reach Estimates
https://www.facebook.com/ads/manager/creation/creation/
https://developers.facebook.com/docs/marketing-api/buying-api/targeting/v2.8
Easy-to-Use Python code
https://github.com/maraujo/pySocialWatcher
Created by Matheus Araujo at QCRI
Contact me if you want to (i) know about important
details, and (ii) know what’s in the pipeline.
Sneak Preview: Estimating Stocks of Migrants
Joint work with Emilio Zagheni and Krishna Gummadi. Currently under review.
Twitter’s Advertising Reach Estimates
https://dev.twitter.com/ads/reference/1/get/accounts/%3Aaccount_id/reach_estimate
https://ads.twitter.com/login
Google’s Advertising Reach Estimates
https://support.google.com/adwords/answer/2475441?hl=en
https://developers.google.com/adwords/api/docs/guides/traffic-estimator-service
http://adwords.google.com/
Using Online Ads to Reach Migrants
So far, online ads were described only as a passive data source. But they can also
be used as an active outreach channel. Examples below.
“Migrant Sampling Using Facebook Advertisements: A Case Study of Polish
Migrants in Four European Countries”; S. Pötzschke, M. Braun; 2016
“Using Internet to Recruit Immigrants with Language and Culture Barriers for
Tobacco and Alcohol Use Screening: A Study Among Brazilians”; B. H. Carlini, L.
Safioti, T. C. Rue, L. Miles; 2014
“Reaching and recruiting Turkish migrants for a clinical trial through Facebook: A
process evaluation”; B. Ü. Ince, P. Cuijpers, E. van 't Hof, H. Riper; 2014
Google Trends on Steroids
Google Trends does not provide demographic information
Get DMA-level demographic information (race, income, …)
Join with DMA-level Google Trends information
Can potentially give “average income of a web search query over time”
But often sparsity problems, with data only showing for bigger cities (=> bias)
See “The cost of racial animus on a black candidate: Evidence using Google
search data”, Seth Stephens-Davidowitz; Journal of Public Economics; 2014
Also: “Demographic information flows”, Ingmar Weber, Alejandro Jaimes; CIKM 2010
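The join described on this slide can be sketched as follows. This is a minimal illustration with made-up numbers: the DMA names, income figures, and search-intensity values are hypothetical placeholders, and the intensity-weighted mean is only one possible way to define the “average income of a web search query”. It also assumes that search intensity can be treated as proportional to search volume, which may not hold for normalized Google Trends values.

```python
# Hypothetical DMA-level median incomes (e.g. from census-style data).
dma_income = {"DMA_A": 40000.0, "DMA_B": 60000.0, "DMA_C": 80000.0}

# Hypothetical DMA-level search intensity for one query in one time period.
# DMA_C is missing here to illustrate the sparsity problem: no data, no contribution.
query_intensity = {"DMA_A": 100.0, "DMA_B": 50.0}


def average_income_of_query(intensity, income):
    """Intensity-weighted mean income over the DMAs present in both tables."""
    shared = [dma for dma in intensity if dma in income]
    total = sum(intensity[dma] for dma in shared)
    if total == 0:
        return None  # query too sparse to say anything
    return sum(intensity[dma] * income[dma] for dma in shared) / total


avg = average_income_of_query(query_intensity, dma_income)
```

Repeating the computation for each time period would give the “over time” series. DMAs without search data silently drop out, which is where the bias toward bigger cities enters.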
“Fertility and its Meaning: Evidence from Search Behavior”
Jussi Ojala, Emilio Zagheni, Francesco C. Billari, Ingmar Weber
ICWSM; 2017
https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15579
Example study using Google Correlate
Study Goals
(i) detect evidence for different contexts surrounding different types of fertility;
Teen, low/high income, (un-)married, …
(ii) model regional variation across states for different fertility levels;
What distinguishes Alabama from California from New York?
(iii) track temporal changes in fertility across time.
Train a model across space, predict across time.
Different Contexts of Fertility
Discover search terms correlated with different fertility rates across US states
https://www.google.com/trends/correlate/search?e=id:f7PU4mFDWV-&t=all
Remove terms with no conceivable link to sex, pregnancy or maternity
Predicting Spatial Variability
Performance of the regression models using
leave-one-out cross-validation. SMAPE is in [%], RMSE
values are multiplied by 1,000.
Use the previous terms to build models
predicting state-level fertility rates
All these models make predictions based on
linear combinations of search intensity
Goal: apply these spatial models across time
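The evaluation setup on this slide (leave-one-out cross-validation, SMAPE, RMSE) can be sketched as below. This is a simplified illustration, not the paper's code: it assumes a single search-term predictor instead of a linear combination of several, uses one common SMAPE variant, and runs on made-up data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out_predictions(xs, ys):
    """Predict each state's rate from a model trained on all other states."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (one common variant)."""
    terms = [2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100 * sum(terms) / len(terms)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical data: one search-intensity value and one fertility rate per state.
search_intensity = [1.0, 2.0, 3.0, 4.0, 5.0]
fertility_rate = [0.012, 0.014, 0.016, 0.018, 0.020]

preds = leave_one_out_predictions(search_intensity, fertility_rate)
loo_smape = smape(fertility_rate, preds)
loo_rmse = rmse(fertility_rate, preds)
```

Because the toy data are exactly linear, both errors come out at essentially zero; real state-level data would not.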
Learning Across Space, Predicting Across Time
Temporal trend when applying the “teen” model across
time. Values are rescaled to a maximum of 1.0.
Pearson r correlation across 2010-2015 when
using the spatial model to predict trends across
time.
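The two quantities reported on this slide, a trend rescaled to a maximum of 1.0 and a Pearson r between predicted and observed trends, can be computed as in the sketch below. The yearly values are invented for illustration and do not come from the paper.

```python
import math


def rescale_to_max_one(values):
    """Divide by the largest value so the series peaks at 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical yearly values for 2010-2015: spatial-model output vs. official rate.
predicted_trend = [34.0, 31.0, 29.0, 27.0, 24.0, 22.0]
official_trend = [0.0340, 0.0312, 0.0294, 0.0265, 0.0242, 0.0223]

pred_scaled = rescale_to_max_one(predicted_trend)
official_scaled = rescale_to_max_one(official_trend)
r = pearson_r(pred_scaled, official_scaled)
```

Pearson r is unchanged by rescaling, so the rescaling matters only for plotting the two series on a shared axis.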
“Quantitative analysis of population-scale family trees using
millions of relatives”
Joanna Kaplanis, Assaf Gordon, Mary Wahl, Michael Gershovits, Barak Markus,
Mona Sheikh, Melissa Gymrek, Gaurav Bhatia, Daniel G MacArthur, Alkes Price,
Yaniv Erlich
bioRxiv; 2017
http://biorxiv.org/content/early/2017/02/07/106427
Example study using an online genealogy database
Online Genealogy Data - Again
13 million people, after
cleaning, in a single pedigree
Small sample of mitochondria
and Y-STR haplotypes (not
discussed)
Also location information.
Cleaned, de-identified data
available at:
http://familinx.org/
Geographical Distribution of Data (Place of Birth)
Map panels: pre-1800 and post-1800
Mortality and City Growth
Their model (red) validated against
previous models (Oeppen & Vaupel, black)
Mobility Over Time
And a lot more! Check out the paper.
Median migration distance in North American
born individuals as a function of time.
Red: mother-offspring,
blue: father-offspring,
black: marital radius.
Dots represent the data before smoothing.
“A novel web informatics approach for automated
surveillance of cancer mortality trends”
Georgia Tourassi, Hong-Jun Yoon, Songhua Xu
Journal of Biomedical Informatics; 2016
http://www.sciencedirect.com/science/article/pii/S1532046416300181
Example study using online obituaries
Crawling Cancer-Related Obituaries
Use a web search engine to get seeds
for queries such as “breast cancer
obituary, New York”
Example
Then post-filter
Then lung vs. breast cancer
Then infer age and gender
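The first steps of this pipeline can be sketched as below. This is an illustration only and not the authors' implementation: the location list is a hypothetical placeholder, and the keyword rule for separating lung from breast cancer is a deliberately crude stand-in for whatever classifier the paper uses.

```python
from itertools import product

cancer_types = ["breast cancer", "lung cancer"]
locations = ["New York", "Texas"]  # hypothetical location list

# Seed queries in the form quoted on the slide.
seed_queries = [
    f"{cancer} obituary, {place}"
    for cancer, place in product(cancer_types, locations)
]


def classify_cancer_type(text):
    """Rough keyword rule: returns 'lung', 'breast', or None if unclear."""
    lowered = text.lower()
    has_lung = "lung cancer" in lowered
    has_breast = "breast cancer" in lowered
    if has_lung == has_breast:
        return None  # neither or both mentioned: leave for manual review
    return "lung" if has_lung else "breast"
```

For example, `classify_cancer_type("She passed away after a long battle with breast cancer.")` returns `"breast"`, while a page mentioning neither term is dropped by the post-filter.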
Cancer Mortality Rates from Online Obituaries
Percent of lung cancer deaths per age
group based on SEER data and
obituaries for both genders.
Annual female breast cancer death rates based on
obituaries and on National Vital Statistics Report
(NVSR) for 2008–2012.
“From Migration Corridors to Clusters: The Value of Google+
Data for Migration Studies”
Johnnatan Messias, Fabricio Benevenuto, Ingmar Weber, Emilio Zagheni
ASONAM; 2016
http://ieeexplore.ieee.org/document/7752269/
Example study using public Google Plus profiles
Beyond Origin-Destination Migration Analysis
I’m a German citizen living in Qatar. So did I migrate from Germany to Qatar?
Yes, according to Qatari border control.
But: Germany (78->99), United Kingdom (99->03),
Germany (03->07), Switzerland (07->09),
Spain (09->12), Qatar (12->now)
Use the “places lived” on Google+
In 2012 there was no “currently” field, just a set of places
Get tuples of co-lived countries
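Getting tuples of co-lived countries from unordered “places lived” sets can be sketched as follows. The profiles are hypothetical; the first one reuses the country list from this slide. Sorting each set before taking combinations is an implementation choice so that every tuple has one canonical form.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "places lived" country sets, one per profile (no order, no dates).
profiles = [
    {"Germany", "United Kingdom", "Switzerland", "Spain", "Qatar"},
    {"Germany", "Spain"},
    {"Brazil", "USA"},
    {"Germany"},  # a single country contributes no tuple
]


def count_tuples(profiles, size):
    """Count how many profiles list each unordered country tuple of a given size."""
    counts = Counter()
    for countries in profiles:
        for combo in combinations(sorted(countries), size):
            counts[combo] += 1
    return counts


pair_counts = count_tuples(profiles, 2)
triple_counts = count_tuples(profiles, 3)
```

Here `pair_counts[("Germany", "Spain")]` is 2, and the five-country profile alone yields ten distinct triples. Unlike border-control flows, these tuples carry no direction.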
Flows/Corridors vs. Tuples/Clusters
This is what border
control can obtain
(with directionality)
This is what the Google+ “places lived” provides
Expected Cluster Frequencies
Lots of migrant flows on (A,B), (A,C) and (B,C) => expect lots on (A,B,C)
“Expect” = rank clusters according to:
min(freqAB; freqAC; freqBC) * mean(freqAB; freqAC; freqBC)
Best performing ranking approximation (Kendall .565, Spearman .754)
Look at outliers and try to explain those
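The ranking score above translates directly into code. In this sketch the triples and their pairwise frequencies are invented; only the min-times-mean formula comes from the slide.

```python
def expected_score(freq_ab, freq_ac, freq_bc):
    """min(...) * mean(...) over the three pairwise frequencies of a triple."""
    freqs = (freq_ab, freq_ac, freq_bc)
    return min(freqs) * (sum(freqs) / 3)


# Hypothetical pairwise frequencies (freqAB, freqAC, freqBC) for three triples.
pairwise = {
    ("A", "B", "C"): (100, 80, 60),
    ("A", "B", "D"): (100, 10, 5),
    ("B", "C", "D"): (60, 50, 40),
}

scores = {triple: expected_score(*freqs) for triple, freqs in pairwise.items()}

# Rank 1 = triple expected to be most frequent.
expected_order = sorted(scores, key=scores.get, reverse=True)
expected_rank = {triple: i + 1 for i, triple in enumerate(expected_order)}
```

The `min` factor penalizes triples with one weak corridor: (A, B, D) has a strong (A, B) pair but ends up ranked last because its other two pairs are rare.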
Outlier Frequencies
Look at “expected rank – actual rank”
Middle 20%: “close to expected”
Top 20%: “higher than expected”
Low 20%: “lower than expected”
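The three outlier classes can be derived from the rank difference as in the sketch below. The ranks are made up, and the way the middle 20% is centered is an assumption, since the slide does not spell out the exact cut-offs.

```python
def label_outliers(expected_rank, actual_rank, share=0.2):
    """Label clusters by where 'expected rank - actual rank' falls."""
    diff = {c: expected_rank[c] - actual_rank[c] for c in expected_rank}
    ordered = sorted(diff, key=diff.get, reverse=True)  # largest difference first
    n = len(ordered)
    k = max(1, int(n * share))
    mid_start = (n - k) // 2  # assumed centering of the middle slice
    labels = {}
    for c in ordered[:k]:
        labels[c] = "higher than expected"
    for c in ordered[mid_start:mid_start + k]:
        labels[c] = "close to expected"
    for c in ordered[n - k:]:
        labels[c] = "lower than expected"
    return labels


# Hypothetical ranks for ten clusters (rank 1 = most frequent).
expected = {f"c{i}": i for i in range(1, 11)}
actual = {"c1": 8, "c2": 7, "c3": 3, "c4": 1, "c5": 5,
          "c6": 2, "c7": 6, "c8": 9, "c9": 10, "c10": 4}

labels = label_outliers(expected, actual)
```

A positive difference means the cluster actually ranks better (is more frequent) than the pairwise flows suggest, so `c10` (expected rank 10, actual rank 4) is labeled “higher than expected”. Clusters outside the three slices stay unlabeled.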
Feature Analysis
More than expected:
(Spain, France, Italy)
(UAE, India, Singapore)
Less than expected:
(Brazil, Mexico, USA)
(Canada, China, UK)
Most discriminative features for 3-class distinction
Enriching Your Data
Demographic Inference 101
Demographic Inference – Name Dictionaries
First name gender dictionaries:
https://ideas.repec.org/c/wip/eccode/10.html
http://gender.io/
Contact me for the dictionary used in “International Gender Differences and Gaps in
Online Social Networks”
Ethnicity Dictionary:
https://www.census.gov/topics/population/genealogy/data/2010_surnames.html
Also see “Inferring Nationalities of Twitter Users and Studying Inter-National Linking”
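A dictionary-based lookup of this kind can be sketched in a few lines. The three-entry dictionary is a hypothetical toy; in practice it would be loaded from one of the resources listed above, whose file formats are not assumed here.

```python
# Hypothetical toy dictionary; a real one would be loaded from a file.
first_name_gender = {"maria": "female", "john": "male", "anna": "female"}


def infer_gender(display_name, dictionary):
    """Look up the first token of a display name; None when it is not listed."""
    tokens = display_name.strip().lower().split()
    if not tokens:
        return None
    return dictionary.get(tokens[0])
```

`infer_gender("Maria Silva", first_name_gender)` returns `"female"`, and unknown or empty names return `None` rather than a guess. Names whose typical gender differs by country would need a country-specific dictionary.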
Demographic Inference – Image-Based Inference
Face++ Cognitive Services
https://www.faceplusplus.com/face-detection/
Microsoft Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/computer-vision-api
Demographic Inference – Build Your Training Data
FollowerWonk by Moz
https://moz.com/followerwonk/bio
https://moz.com/followerwonk/bio/?q=(38-yr%7C38-yrs%7C38%20years)%20old%0A%0A
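The query in the URL above decodes to `(38-yr|38-yrs|38 years) old`, i.e. it finds profiles whose bio states an age. The same idea can be applied offline to bios that have already been collected, as in this sketch. The regular expression is an assumption modeled on that query, not something taken from FollowerWonk.

```python
import re

# Matches e.g. "38 years old", "38-yr old", "38-yrs old" (pattern is an assumption).
AGE_PATTERN = re.compile(r"\b(\d{1,2})(?:-yrs?| years?) old\b", re.IGNORECASE)


def self_stated_age(bio):
    """Return the first self-stated age found in a bio, or None."""
    match = AGE_PATTERN.search(bio)
    return int(match.group(1)) if match else None
```

`self_stated_age("38 years old, dad, runner")` returns `38`. Bios such as “my son is 5 years old” would be mislabeled, so training data built this way needs a manual spot check.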
Questions, Comments, Thoughts?
https://sites.google.com/site/digitaldemography/
