Slides by TEAM BIRCH from the SICSA Big Data InfoVis Summer School 2013
Members:
Ruth Agbakoba
Anil Bandhakavi
Aminu Muhammad
Chris Hillman
Nut Limsopathan
2. Overview
1. Introduction
2. Background to Twitter and Boston Bombings
3. Big and Dirty Data Issues
4. Process: Capturing the integrated learning process
5. 5 W’s of Twitter Analytics
6. DEMO ‘Visualisation’
7. Future Work
8. Learning Outcomes
4. Our Data
• Twitter data from 16:00 to 19:00 relating to the Boston Marathon bombing
• Approx. 550,000 tweets covering the 3-hour period
• Challenges
– Data format
– Lack of information
– UserIDs vs. UserNames
5. Big and Dirty Data Issues
1. Each tweet should be a record of its own (one tweet per line)
2. Formatting Issues
3. No standardisation (only ~10% of tweets geo-located)
4. Only 5 fields, so we had to create three more
5. Different languages
6. Information overload: many different patterns identified, making it difficult to focus on a particular visualisation
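A minimal Python sketch of the clean-up implied by points 1 and 4, assuming each raw tweet arrives as a dict with five hypothetical keys (id, created_at, user, text, geo). The slides name neither the real fields nor the three that were added, so is_retweet, hashtags and to_user are illustrative guesses based on the retweet/hashtag/touser information listed under MapReduce.

```python
import re

# Hypothetical names for the five raw fields; the slides do not list them.
RAW_FIELDS = ("id", "created_at", "user", "text", "geo")

HASHTAG_RE = re.compile(r"#(\w+)")
REPLY_RE = re.compile(r"^@(\w+)")      # tweet addressed to a user
RETWEET_RE = re.compile(r"^RT @(\w+)")  # classic manual retweet prefix


def clean(value) -> str:
    """Collapse newlines, tabs and repeated spaces so the value fits on one line."""
    return " ".join(str("" if value is None else value).split())


def to_record(tweet: dict) -> str:
    """Flatten one raw tweet into a single tab-separated line plus three derived fields."""
    raw = [clean(tweet.get(name)) for name in RAW_FIELDS]
    text = raw[RAW_FIELDS.index("text")]
    is_retweet = "1" if RETWEET_RE.match(text) else "0"
    hashtags = ",".join(tag.lower() for tag in HASHTAG_RE.findall(text))
    reply = REPLY_RE.match(text)
    to_user = reply.group(1) if reply else ""
    return "\t".join(raw + [is_retweet, hashtags, to_user])
```

Writing one record per line this way lets line-oriented tools (including MapReduce input splitting) treat each tweet independently.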
6. Overview of Process
Acquire → Parse/Filter/Mine → Represent → Interact
• Acquire: Python script harvests tweets using the Twitter API
• Parse/Filter/Mine: MapReduce code processes tweets and writes out text files relevant to the analytics
• Represent: Create visualisations in Tableau Public and Google Fusion
• Interact: Display in a web portal on the user’s screen
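The "write out text files" step could look like the minimal sketch below, which writes aggregated counts as CSV, a plain-text format that both Tableau Public and Google Fusion Tables could import. The function name, column headers and file layout are hypothetical, not the team's actual output.

```python
import csv
from collections import Counter
from pathlib import Path


def write_counts_csv(counts: Counter, path: Path, key_name: str = "hashtag") -> None:
    """Write (key, count) pairs, most frequent first, as a CSV file with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([key_name, "count"])
        for key, count in counts.most_common():
            writer.writerow([key, count])
```

For example, `write_counts_csv(Counter({"prayforboston": 5, "boston": 3}), Path("hashtag_counts.csv"))` produces a two-column file ready to load as a data source.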
7. MapReduce
MapReduce code processes tweets
• Parse
– Added information where possible: retweet, hashtag, to-user
• Filter
– Remove records with invalid fields
– Split into geocoded and non-geocoded
• Mine
– Word counts
– Hashtag counts: all, and split by location / original vs. retweet
– Sentiment extraction
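The slides do not say which MapReduce framework was used, so the sketch below simulates the map, shuffle and reduce phases in plain Python for one of the listed jobs: hashtag counts split into original vs. retweet. Treating text that starts with "RT @" as a retweet and skipping empty lines as "invalid" are simplifying assumptions; on Hadoop the shuffle (grouping by key) would be done by the framework rather than by a function like this.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator

HASHTAG_RE = re.compile(r"#(\w+)")

Key = tuple[str, str]  # (hashtag, "original" | "retweet")


def map_tweet(line: str) -> Iterator[tuple[Key, int]]:
    """Map: emit ((hashtag, kind), 1) for every hashtag in one tweet."""
    text = line.strip()
    if not text:
        return  # filter: drop empty (invalid) records
    kind = "retweet" if text.startswith("RT @") else "original"
    for tag in HASHTAG_RE.findall(text):
        yield (tag.lower(), kind), 1


def shuffle(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: dict[Key, list[int]]) -> dict[Key, int]:
    """Reduce: sum the 1s emitted for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines: Iterable[str]) -> dict[Key, int]:
    pairs = (pair for line in lines for pair in map_tweet(line))
    return reduce_counts(shuffle(pairs))
```

Word counts follow the same shape with words instead of hashtags as keys; adding a geocoded flag to the key would give the split by location.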
8. Visualisation Tools Used
Created a Real-time Twitter Analytics Portal with:
• Tableau Public
• Google Fusion
• Wix Web Portal
• Purpose:
– Insight
– Exploratory
– Confirmation
11. Future Work
• Gain a holistic view of the story over time
– Bombing – 15th April
– Shooting – 18th April
– Firefight and manhunt – 19th April
• Reflect the story as it evolved
– Clustering
– NLP (to move from basic to advanced analytics)
– Explore more visualisation types
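One possible starting point for reflecting the story as it evolved is to count tweets in fixed time windows across the 15th to 19th April period. This is a hypothetical sketch, not something the team built; the function name and the default one-hour window are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def bucket_counts(timestamps: Iterable[datetime], minutes: int = 60) -> dict[datetime, int]:
    """Count tweets per fixed-width time window, keyed by the window's start time.

    Assumes `minutes` divides 60 evenly (e.g. 10, 15, 30, 60).
    """
    if minutes <= 0 or 60 % minutes:
        raise ValueError("minutes must divide 60 evenly")
    counts: Counter = Counter()
    for ts in timestamps:
        # Round down to the start of the window the tweet falls in.
        start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        counts[start] += 1
    return dict(sorted(counts.items()))
```

Plotting these counts per window (or per hashtag per window) would show spikes around each event in the timeline.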
12. Thank you for Listening!
TEAM BIRCH: Chris, Ruth, Nut, Aminu and Anil