Social media sites like Twitter provide readily accessible sources of large-volume, high-velocity data streams, now referred to as "Big Data."
While private companies have already made great strides in leveraging these social media sources, many public organizations and government agencies could reap significant benefits from these resources.
Care must be exercised in this integration, however, as huge data sets come with their own intrinsic issues.
This presentation explores these advantages and hazards with several experiments that demonstrate social media data's ability to support government organizations and supplement existing programs.
Powers and Problems of Integrating Social Media Data with Public Health and Safety
Bloomberg D4GX
28 September 2015
Bloomberg HQ
Cody Buntain and Jennifer Golbeck
{cbuntain,golbeck}@cs.umd.edu
Human-Computer Interaction Lab
University of Maryland
Gary LaFree
garylafree@gmail.com
START Center
University of Maryland
11. The Problem
• Surveys are expensive
[Figure: Census cost and total population by census year; cost axis from $0 to $14,000,000,000]
12. A Solution
• Augment surveys with accessible and cheap* social media data
• Social media data can supplement when surveys are too costly or slow
[Figure: tweet count by day of collection; count axis from 3,000,000 to 6,000,000]
13. A Solution
• Augment surveys with easily accessible social media data
• Social media data could substitute when surveys are too costly or slow
[Figure: tweet count by day of collection; count axis from 3,000,000 to 6,000,000]
15. Demonstrations
• Geocoded tweets and the US Census
• Drug references on social media
• Sentiment and law enforcement
Focus on Twitter because tweets are easy to acquire via the public sample stream
16. Twitter and the US Census
• 1-3% of tweets have geolocation information
  • GPS coordinates
  • Bounding boxes
  • Places
• Compare Twitter distributions per state with 2013 US Census data
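The per-state comparison above can be sketched as follows. This is a minimal illustration, not the presenters' method: the source does not say which statistic was used, so Pearson correlation of per-state shares is an assumption, and the state labels and population numbers are made-up placeholders rather than 2013 Census figures.

```python
from collections import Counter
from math import sqrt


def state_shares(counts):
    """Convert raw per-state counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical state codes already resolved from geocoded tweets.
tweet_states = ["CA", "CA", "TX", "NY", "CA", "TX", "MD"]

# Placeholder populations for illustration only (NOT real Census data).
census_population = {"CA": 380, "TX": 260, "NY": 200, "MD": 60}

tweet_share = state_shares(Counter(tweet_states))
census_share = state_shares(census_population)

# Align both distributions on the same state order before comparing.
states = sorted(census_share)
r = pearson(
    [tweet_share.get(s, 0.0) for s in states],
    [census_share[s] for s in states],
)
print(f"Correlation between tweet share and population share: {r:.3f}")
```

Comparing shares rather than raw counts keeps the result independent of how many tweets were collected.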
20. Twitter and the NSDUH
• Can Twitter track geographic trends in drug use?
• National Survey on Drug Use and Health (NSDUH) has ground truth for:
  • Marijuana
  • Cocaine
• Compare the rank ordering of states for each drug
Results: high rank correlation for marijuana; low correlation for cocaine
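Comparing how two sources order the states is commonly done with a rank correlation. The sketch below computes Spearman's rho in plain Python. It is an assumption that this statistic matches what the slides call "ranked correlation", and the per-state values are invented for illustration (not NSDUH estimates or real tweet counts).

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-state values: drug-related tweets per 10,000 geocoded
# tweets, and survey-reported use rates (%). Both are made up.
tweet_rate = {"A": 12.0, "B": 30.5, "C": 18.2, "D": 25.1, "E": 9.4}
survey_rate = {"A": 6.1, "B": 11.8, "C": 9.0, "D": 8.7, "E": 5.2}

states = sorted(tweet_rate)
rho = spearman([tweet_rate[s] for s in states], [survey_rate[s] for s in states])
print(f"Spearman rank correlation: {rho:.2f}")
```

A rank-based measure fits here because the two sources use different units; only the ordering of states needs to agree.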
21. Twitter Sentiment and Police
• Can we measure public response to law enforcement?
• Outreach from law enforcement to community is important
23. Twitter Sentiment and Police
• Compare sentiment towards police around two events:
  • 2013 Boston Marathon Bombing
  • 2014 Ferguson, MO Protests
[Figure: sentiment, 2013 Boston Marathon Bombing; annotation: significant drop]
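One way to check whether sentiment shifted around an event is to compare mean scores before and after it. The slides do not state how sentiment was scored or how significance was assessed, so the sketch below is only an assumption: it takes daily mean sentiment scores as given (the numbers are invented) and computes Welch's t statistic for the difference in means.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic for the change in mean (after minus before)."""
    standard_error = sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(after) - mean(before)) / standard_error


# Hypothetical daily mean sentiment towards police, on an arbitrary
# scale where higher is more positive. Values are illustrative only.
before_event = [0.12, 0.10, 0.15, 0.11, 0.13]
after_event = [0.02, -0.01, 0.03, 0.00, 0.01]

t = welch_t(before_event, after_event)
print(f"Mean before: {mean(before_event):.3f}")
print(f"Mean after:  {mean(after_event):.3f}")
print(f"Welch t statistic: {t:.2f}")
```

A strongly negative statistic indicates lower sentiment after the event; turning it into a p-value would additionally require the Welch-Satterthwaite degrees of freedom.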
24. Twitter Sentiment and Police
• Compare sentiment towards police around two events:
  • 2013 Boston Marathon Bombing
  • 2014 Ferguson, MO Protests
[Figure: sentiment, 2014 Ferguson, MO Protests]
25. The Limitations
• Location data in the public stream
  • 1-3% of the 1% public sample might work for coarse insights
  • Difficult to get community-level granularity
  • Can sacrifice volume for specificity with GPS bounding boxes
[Figure: 1% sample stream]
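The trade-off above can be made concrete with a small sketch. It estimates how many geocoded tweets survive the two sampling steps and filters points with a bounding box. The `BoundingBox` type, the helper name, and the coordinates are all hypothetical; the box is a rough illustrative rectangle, not an official boundary.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def expected_geocoded(total_tweets: float, sample_rate: float, geo_rate: float) -> float:
    """Expected geocoded tweets after sampling, e.g. 1% sample, 1-3% geocoded."""
    return total_tweets * sample_rate * geo_rate


# Rough, illustrative box around a metropolitan area.
box = BoundingBox(west=-77.2, south=38.8, east=-76.9, north=39.0)

# Hypothetical (longitude, latitude) points from geocoded tweets.
points = [(-77.0, 38.9), (-118.2, 34.0), (-77.1, 38.95)]
inside = [p for p in points if box.contains(*p)]
print(f"{len(inside)} of {len(points)} points fall inside the box")

# Out of every 1,000,000 tweets posted, a 1% sample with 1-3% geocoded
# leaves roughly this many located tweets:
low = expected_geocoded(1_000_000, 0.01, 0.01)
high = expected_geocoded(1_000_000, 0.01, 0.03)
print(f"Expected geocoded tweets per million: {low:.0f} to {high:.0f}")
```

The arithmetic shows why community-level granularity is hard: only about 0.01% to 0.03% of all tweets are both sampled and located.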
26. The Limitations
• Limited availability of data from other sources
  • Facebook
• Full access to data sources can be expensive
  • Addressable by companies?
27. The Limitations
• Twitter has known biases [1]
  • Mostly male (though decreasing)
  • Oversamples Caucasians
Excerpt from [1] (text cut off at the right edge of the slide): "... and the list previously described. We do so by compar[...] the first word of their self-reported name to the gender [...] We observe that there exists a match for 64.2% of the us[...] Moreover, we find a strong bias towards male users: [...] 71.8% of the users who we find a name match for ha[...] male name."
[Figure 3 from [1]: Gender of joining users over time, binned [...] groups of 10,000 joining users (note that the join rate [...]; x-axis: Date, 2007-01 to 2009-07; y-axis: Fraction of Joining Users who are Male, 0 to 1]
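The excerpt describes matching the first word of a self-reported name against a gendered name list. A minimal sketch of that idea follows. The name sets and function names are hypothetical stand-ins; the cited work's actual list and matching rules are not given in the slides.

```python
# Tiny hypothetical name lists, for illustration only.
MALE_NAMES = {"james", "john", "robert"}
FEMALE_NAMES = {"mary", "patricia", "linda"}


def infer_gender(self_reported_name):
    """Return 'male', 'female', or None when the first word has no match."""
    parts = self_reported_name.strip().split()
    if not parts:
        return None
    first_word = parts[0].lower()
    if first_word in MALE_NAMES:
        return "male"
    if first_word in FEMALE_NAMES:
        return "female"
    return None


def male_fraction(names):
    """Fraction of matched users with a male name (None if nothing matched)."""
    matched = [g for g in map(infer_gender, names) if g is not None]
    if not matched:
        return None
    return matched.count("male") / len(matched)


sample_names = ["John Smith", "mary j", "xX_gamer_Xx", "Robert", "  "]
print(male_fraction(sample_names))
```

Unmatched names are excluded from the denominator, mirroring how the excerpt reports the male share only among users with a name match.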
28. The Limitations
• Relevancy and credibility
  • Huge amounts of data are nice, but we want the right data
  • No guarantee that things on social media are true
31. The Conclusion
The Point
Social media data can augment public survey-based research efforts
The Limits
• User location information
• Limited data from other sources
• Sampling bias