Twitter floods when it rains:
A case study of the UK floods in
early 2014
Antonia Saravanou
University of Athens
Dimitrios Gunopulos
University of Athens
George Valkanas
Stevens Institute of Technology
Gennady Andrienko
Fraunhofer Institute IAIS, DE
Social Web for Disaster Management (WWW workshop 2015)
Florence, Italy
National and Kapodistrian
University of Athens
Outline
● Motivation
● Research Questions
● Methodology
○ Data Collection
○ Filtering Step: Flood-Related Lexicon
○ Clustering Step
○ Second Level Clustering
● Results
Twitter floods when it rains: A case study of the UK floods in early 2014 18 May 2015
Motivation
● Identify the event and the affected area early
● Monitor the evolution of the event
● Inform users about emergencies
● Resource allocation
● Immediate notification of special incident management units
Research questions
RQ1: How can we identify the areas that have been
hit the most by an event?
- where to dispatch emergency response units
RQ2: How effective can we be in identifying these
areas?
- robust and effective techniques on which to base decisions
RQ3: Can we identify areas that have been stricken
by the event in a similar manner?
- transfer the same techniques to similarly affected areas
Data Collection
● Twitter - custom crawler
○ Streaming API
● Collection of public tweets
○ Bounding box that covers UK
○ Extract only tweets with GPS
● 13-17 January 2014
● > 2.3 million geotagged tweets
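A minimal sketch of the geographic filter described above. It assumes each tweet has already been parsed into a dict with a hypothetical `coordinates` field holding `(longitude, latitude)` or `None`. The bounding-box values are rough illustrative numbers, not the ones used by the authors' crawler.

```python
# Rough UK bounding box (west, south, east, north) in degrees.
# Illustrative values only; the slides do not give the actual box.
UK_BBOX = (-11.0, 49.5, 2.5, 61.0)

def in_bbox(lon, lat, bbox=UK_BBOX):
    """Return True when the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

def geotagged_in_uk(tweets):
    """Keep only tweets that carry GPS coordinates inside the bounding box."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")  # hypothetical field: (lon, lat) or None
        if coords is None:
            continue  # no GPS information, skip
        lon, lat = coords
        if in_bbox(lon, lat):
            kept.append(tweet)
    return kept

sample = [
    {"text": "raining in London", "coordinates": (-0.12, 51.5)},
    {"text": "no GPS here", "coordinates": None},
    {"text": "sunny in Athens", "coordinates": (23.7, 37.98)},
]
print([t["text"] for t in geotagged_in_uk(sample)])
```

Only the first sample tweet survives: the second has no GPS and the third lies outside the box.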
Flood Related Tweets
[Diagram: Entire Dataset (GPS location within UK bounding box) → ? → Flood-Related Tweets]
Filtering Step: Custom Flood-Related Lexicon
1. Initial seed set (13 tokens): rain, flood, weather, storm, showers, ...
2. Scan the entire dataset for tokens that contain at least one word of the initial seed set as a substring (1546 tokens)
3. Manually review each keyword and discard unrelated false positives, e.g. brain, train, etc.
4. Keep only the tokens related to the event (456 tokens), e.g. raining, floods, #ukweather, etc.
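The lexicon construction above can be sketched as follows. This is an illustration under assumptions: whitespace tokenization, a seed list containing only the five seeds visible on the slide, and a hypothetical `review` helper standing in for the manual inspection step.

```python
# Five of the 13 seed words; the rest are not listed on the slide.
SEEDS = ["rain", "flood", "weather", "storm", "showers"]

def candidate_tokens(texts, seeds=SEEDS):
    """Collect every distinct token that contains a seed word as a substring."""
    candidates = set()
    for text in texts:
        for token in text.lower().split():  # assumption: whitespace tokenization
            if any(seed in token for seed in seeds):
                candidates.add(token)
    return candidates

def review(candidates, rejected):
    """Stand-in for the manual review: drop tokens a human marked as unrelated."""
    return candidates - set(rejected)

texts = [
    "It is raining again",
    "my brain hurts",
    "#ukweather floods everywhere",
    "train delayed",
]
candidates = candidate_tokens(texts)
lexicon = review(candidates, rejected={"brain", "train"})
print(sorted(lexicon))
```

Substring matching pulls in "brain" and "train" along with the useful tokens, which is why the manual review step is needed.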
Original vs. Flood Related Lexicon
● Manual cleaning process is necessary
● Only 4 keywords are flood-related in the original lexicon
● Flood Lexicon is ⅓ of the Original
- Slow process
+ One time at the beginning
[Figure: Top-10 most frequent keywords]
Flood Related Tweets
[Diagram: Entire Dataset + Flood Related Lexicon → exact match with at least one keyword from our flood-related lexicon → Flood-Related Tweets]
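A minimal sketch of the exact-match filter. The tokenizer (lowercasing, word characters, optional leading `#`) is an assumption, since the slides do not specify how tweets were tokenized, and the three-word lexicon is a toy stand-in for the 456-token lexicon.

```python
import re

def tokenize(text):
    """Lowercase and split into word-like tokens, keeping a leading '#' on hashtags."""
    return re.findall(r"#?\w+", text.lower())

def is_flood_related(text, lexicon):
    """True when at least one token matches a lexicon keyword exactly."""
    return any(token in lexicon for token in tokenize(text))

lexicon = {"raining", "floods", "#ukweather"}  # toy lexicon for illustration
tweets = ["Floods near the river!", "Training day", "Lovely #ukweather"]
related = [t for t in tweets if is_flood_related(t, lexicon)]
print(related)
```

Because matching is exact at this stage, "Training day" is not selected even though "training" contains "raining" as a substring.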
RQ1: Identifying flood-affected areas
● Why we care
○ where to dispatch emergency response units
○ notify citizens about areas with problems caused by floods
● From GPS to areas
○ Perform spatial clustering using the GPS coordinates
■ Convert GPS coordinates to Cartesian ones
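The slides do not say which conversion was used. One common option, assumed here for illustration, treats the Earth as a sphere and maps latitude and longitude to 3D Cartesian coordinates, so that Euclidean distance between nearby points approximates ground distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def gps_to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Map (latitude, longitude) in degrees to (x, y, z) in kilometres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z

london = gps_to_cartesian(51.5074, -0.1278)
print(london)
```

Every converted point lies on a sphere of the chosen radius; a planar map projection would be an equally reasonable choice for an area the size of the UK.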
Clustering Step: K-Means
K = 10, 100, 500, 1000
Generated clusters as Voronoi polygons
➔ more splits in the densely populated areas
[Figure: K-Means clusters drawn as Voronoi polygons, four panels: 10, 100, 500 and 1000 clusters]
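For reference, a plain-Python sketch of Lloyd's algorithm, the standard K-Means procedure. It is not the authors' implementation: the random initialization, iteration cap and seed are assumptions, and a library implementation would normally be used at the scale of millions of tweets.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) after Lloyd's iterations on a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each point belongs to the Voronoi cell of its nearest centroid, which is why the generated areas can be drawn as Voronoi polygons.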
Which areas are the most affected?
● Prioritize generated areas by their potential of
being affected
Prioritization schemes by area a:
1. By total #tweets: baseline
2. By flood-related #tweets
3. By Signal-to-Noise Ratio: score(a) = (#flood-related tweets in a) / (#tweets in a)
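The three prioritization schemes can be sketched as below. The per-area counts are made-up toy numbers and the function name `rank_areas` is hypothetical.

```python
def rank_areas(total, flood, scheme):
    """Rank area ids from most to least affected under the given scheme.

    total, flood: dicts mapping area id -> number of tweets.
    scheme: "total" (baseline), "flood", or "snr".
    """
    if scheme == "total":
        score = lambda a: total[a]
    elif scheme == "flood":
        score = lambda a: flood.get(a, 0)
    elif scheme == "snr":
        # flood-related tweets divided by all tweets; empty areas score 0
        score = lambda a: flood.get(a, 0) / total[a] if total[a] else 0.0
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sorted(total, key=score, reverse=True)

total = {"A": 1000, "B": 200, "C": 50}   # toy counts, not real data
flood = {"A": 20, "B": 30, "C": 10}

for scheme in ("total", "flood", "snr"):
    print(scheme, rank_areas(total, flood, scheme))
```

In this toy example the busy area A leads the baseline ranking but drops to last under SNR, because only 2% of its tweets are flood-related.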
Visualization of top-100 most affected areas
[Figure: Top 100 areas for K-Means (K=500), three panels: 1. total #tweets, 2. flood-related #tweets, 3. SNR]
RQ2: Identification Effectiveness
1. Likert Scale [1-5]: to specify the degree to which an area has been affected
a. 1 = “normal levels of rainfall”
b. 5 = “completely flooded”
2. Running Average Likert: [formula shown as an image on the slide]
Ground Truth
- MetOffice
Results
Results (k = 100)
● Baseline < Flood, SNR
● Flood ~ SNR
Results (k = 500)
● Baseline << Flood < SNR
● #tweets is not a good proxy
● #flood-related tweets is a better one
Results (k = 1000)
● SNR is the best metric (especially for the top 20)
● how many users talk about the specific event
RQ3: Similarly affected areas
Identify areas that behave similarly over time in the way
the flooding event was perceived by Twitter users
Underlying connection:
● population level, e.g., similar posting patterns
● other variable, e.g., a nearby river
Second Level Clustering: Attributes
Features that show the temporal evolution of the
event in an area
1. Number of tweets in day d, count(d)
2. Ratio of day d from area a, ratio(d) = count(d) / Σ count(d’), summed over all days d’
3. Speed of day d, speed(d) = ratio(d) - ratio(d-1)
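The three features can be computed per area as in the sketch below. The daily counts are toy values, and setting the speed of the first day to 0.0 is an assumption, since ratio(d-1) is undefined there.

```python
def temporal_features(counts):
    """Compute ratio(d) and speed(d) from one area's daily tweet counts.

    counts: list of count(d) values ordered by day.
    """
    total = sum(counts)
    # ratio(d) = count(d) / sum of counts over all days
    ratios = [c / total if total else 0.0 for c in counts]
    # speed(d) = ratio(d) - ratio(d-1); first day set to 0.0 by assumption
    speeds = [0.0] + [ratios[i] - ratios[i - 1] for i in range(1, len(ratios))]
    return ratios, speeds

# Toy counts for the five collection days (13-17 January 2014)
ratios, speeds = temporal_features([10, 20, 40, 20, 10])
print(ratios)
print(speeds)
```

Here the speed is positive while the area's share of tweets grows and negative once it falls. Normalizing by the area's own total makes busy and quiet areas comparable.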
Second Level Clustering: Areas from 2 clusters
[Figure: map of areas with legend: cluster 1, cluster 2]
● Speed feature
● Red cluster: Scotland, Liverpool and Ireland, mostly unaffected
● Purple cluster: Midlands, affected
● Red speed decreases
● Purple speed increases
● Verification with historical data
The INSIGHT project
Intelligent Synthesis and Real-time Response
using Massive Streaming of Heterogeneous Data
Detecting Events:
- sensors on road network
- sensors on buses
- Twitter data
http://www.insight-ict.eu/
Conclusions
● Analysis of Twitter data
○ emergencies, disaster management & relief
● Experimental analysis of floods
○ establishment of a “flood-related lexicon”
○ division of the entire UK into affected areas
○ identification of flood-stricken areas with high accuracy
● Comparison with ground truth data
○ quality evaluation
Future Work
● Collect more data on similar flooding events and
test our approach on larger datasets
○ generalize to other areas
○ test with a longer timespan
● Develop online clustering approaches (1ier)
● Incorporate into the INSIGHT tool
Thank you!
Acknowledgements:
MMD - Mining Mobility Data
INSIGHT - Intelligent Synthesis and Real-time Response using Massive Streaming of Heterogeneous Data
