This document discusses challenges and opportunities for researching social media data at scale, using Twitter as a case study. It outlines several related projects analysing Twitter data to understand public communication during events. Key challenges include developing multidisciplinary teams, training students in new skills, building infrastructure to capture and analyse large datasets, rapidly disseminating findings, and facilitating national and international collaboration to establish shared approaches and standards.
Media Ecologies and Methodological Innovation: Understanding Public Communication through Social Media
1. Media Ecologies and Methodological Innovation: The Case of Twitter
Assoc. Prof. Axel Bruns
Queensland University of Technology
a.bruns@qut.edu.au – http://snurb.info/ – @snurb_dot_info
http://mappingonlinepublics.net/
2. Background: Related Projects
• CCI Project:
– Media Ecologies and Methodological Innovation
(Axel Bruns, Jean Burgess, Kate Crawford, Gerard Goggin, Terry Flew, John Hartley + RA Frances Shaw)
• ARC Discovery Project (2010-13):
– New Media and Public Communication
(AB & JB + Sociomantic Labs, RAs: Caro Jende, Tim Highfield, Jen Lofgren, Mirka Streckhardt)
• ATN-DAAD Projects (2011-12):
– Social Media Monitoring: Analysis of Social Networks for Enterprises’ Issue Management (AsNIM)
(AB & JB + Tanya Nitins, University of Münster: Stefan Stieglitz, Nina Krüger, Tobias Brockmann et al.)
– Extending Computer-Aided Methods for the Analysis of Blogging and Microblogging Discourses and Publics
(AB & JB + Stephen Harrington, University of Düsseldorf: Katrin Weller, Cornelius Puschmann et al.)
• ASSA-ISL Project (2010-11):
– Flood and Fire: Understanding the Structure and Process of Public Communication during Times of Crisis
(AB & JB + National Chengchi University, Taipei: Pai-Lin Chen, Tsai-Yen Li, Yu-Chung Cheng et al.)
• ARC Linkage Project (2012-14):
– Social Media in Times of Crisis: Learning from Recent Natural Disasters to Improve Future Strategies
(AB, JB, KC, TF + Queensland Department of Community Safety, Eidos Institute, Sociomantic Labs)
• Website: http://mappingonlinepublics.net/
3. Focus on Twitter
• Real-time public communication:
– Social media coverage as a first draft of the present
– Especially Twitter: flat, open, self-organising network
– First-hand, unfiltered, direct insights into Australians’ views
– Rich data on specific events and on long-term trends
• Readily available data:
– Access to rich data (and metadata) through standard APIs
– Especially on Twitter, limited immediate ethical concerns
– Ephemeral content which is lost to posterity unless archived
– ‘Big data’, but far from unmanageable
5. Australia on Twitter
[Figure: follower/followee network of the 140,000 most connected Australian users (of 550,000 processed so far). Labelled clusters: Filipinos; Marketing / PR; Adelaide; Perth / PR; Wine; News / Business; Food; Journalism / Politics / News; Fashion / Style / Parenting; Fashion / Magazines; Celebrities / Media; Arts; Music / Triple J; Teens / TV Hits; TV; Football (Soccer); AFL; Radio; NRL; Cricket; Teens; Sports. Prominent accounts include Latika Bourke, Annabel Crabb, Mumbrella, Leigh Sales, Malcolm Turnbull, Marie Claire, ABC News, Crikey, Joe Hockey, Mia Freedman, Laurie Oakes, Sunrise on 7, Tony Abbott, Julia Gillard, Matt Preston, Kevin Rudd, Triple J, Wil Anderson, 7pm Project, Hamish and Andy.]
7. ‘Big Data’ Challenges: Teamwork
• Team-based research approaches:
– Interdisciplinary: media, communication and cultural studies; social science;
informatics; mathematics; statistics; journalism; crisis communication;
communication design; computer science; data visualisation; …
– Exploratory: rapid prototyping of research methods and tools; use of
emerging technology at the bleeding edge; following the data without a clear
and specific research goal in mind (beyond ‘mapping online publics’ in
general)
– Flexible: dealing with real-time data may mean ‘ambulance chasing’ (e.g.
#eqnz, #tsunami, #qantas); rapid data analysis and online publication well
ahead of journal publication turnarounds; application across wide range of
thematic and research domains
– Collaborative: towards natural sciences-style lab-based research models;
team research and multi-authored publications; individual sub-projects
developed and driven by specific team members
8. ‘Big Data’ Challenges: Graduate Training
• New postgraduate and postdoctoral skillsets:
– Postgraduate and postdoctoral recruitment: need for high-level
undergraduate/honours project units to enthuse and encourage promising
students; recruiting well beyond the standard media and communication
fields means we need to be visible in those fields (why would a computer
scientist or statistician want to work with us?)
– Postgraduate training: supervising highly multidisciplinary research
projects requires multidisciplinary supervision teams; lab-style
collaborative research projects call for exploring collaborative
postgraduate research and cohort supervision
– Risky research: changeable technological frameworks and reliance on third
parties mean whole PhD projects may be wiped out by a single Twitter API
change; a lack of university ethics guidelines means researchers need to develop
their own research ethics and/or follow external standards (e.g. AoIR Ethics Guide)
9. ‘Big Data’ Challenges: Infrastructure
• Tools and support for ‘big data’ research:
– Data capture: some (open source) tools are available, but need customisation
and further development; need for reliable, always-on capture infrastructure
(and IT support); API changes are likely to break existing frameworks; truly big,
long-term data access can be costly (may need industry partnerships?)
– Data processing: need for significant computing power to process and
visualise large data corpora; need for computer scientists to help develop
customised processing tools addressing specific research questions
– Data storage: even Twitter datasets can get very big; no standard solutions
for short- and medium-term storage; what about long-term archiving of
significant records of public communication (National Library)?
– Research dissemination: publications on real-time events need to be faster
than standard journal cycles; need to embrace rapid publication of results
and analysis online; also need to share tools (e.g. as open source); but
what about sharing datasets to enable independent verification of results?
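As a rough illustration of the data processing point above, the following is a minimal sketch, not the project's own tooling. It assumes a captured hashtag archive (e.g. #eqnz) exported as CSV with hypothetical columns `from_user` (account name) and `time` (Unix epoch seconds); real column names depend on the capture tool used. It computes two basic descriptive metrics that customised processing tools might build on: tweet volume per hour and the most active accounts.

```python
import csv
from collections import Counter
from datetime import datetime, timezone


def load_archive(path):
    """Read a CSV archive; assumes hypothetical 'from_user' and 'time' columns."""
    with open(path, newline="", encoding="utf-8") as f:
        # Convert the epoch timestamp column from string to int for each row.
        return [dict(row, time=int(row["time"])) for row in csv.DictReader(f)]


def tweets_per_hour(tweets):
    """Count tweets in each hourly bin (UTC), returned in chronological order."""
    counts = Counter()
    for t in tweets:
        ts = datetime.fromtimestamp(t["time"], tz=timezone.utc)
        # Truncate the timestamp to the start of its hour to form the bin key.
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


def most_active_users(tweets, n=10):
    """Return the n accounts with the most tweets as (user, count) pairs."""
    return Counter(t["from_user"] for t in tweets).most_common(n)


if __name__ == "__main__":
    # Illustrative toy data only, not drawn from any real dataset.
    sample = [
        {"from_user": "user_a", "time": 0},
        {"from_user": "user_b", "time": 1800},
        {"from_user": "user_a", "time": 3600},
    ]
    for hour, count in tweets_per_hour(sample).items():
        print(hour.isoformat(), count)
    print(most_active_users(sample, n=2))
```

Binning by hour suits events that unfold over days, such as #eqnz; for shorter events, minute-level bins may be more informative, and the same `Counter`-based approach would apply.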
10. ‘Big Data’ Challenges: Collaborations
• Emerging field of research needs shared approaches:
– International comparisons: parallel research projects to explore national
differences and overlaps – e.g. Twitter and elections; Twitter and crisis
communication; …
– National consortia: shared infrastructure and datasets for particularly large-scale
projects – e.g. comprehensive tracking and analysis of public
communication by Australians on Twitter
– General sharing of methods and tools: natural sciences-style frameworks for
sharing tools and methods (and datasets?) to enable independent
verification of research results; development of shared standards for data
formats; researcher exchanges and internships
– Industry collaborations: e.g. application partnerships with domain partners
(media organisations, government departments, etc.); data capture,
processing, and storage partnerships with major technology partners
(Google, Microsoft, …); perhaps even partnerships with Twitter itself (?); but
also need to consider research ethics implications of such partnerships