SlideShare uma empresa Scribd logo
1 de 22
Situational Awareness from
Social Media
V I S T O L O G Y, I N C
BRIAN ULICNY
JAKUB MOSKAL
M I E C Z Y S L AW M . ( M I T C H ) K O K A R ( N O R T H E A S T E R N )
SEMANTIC TECHNOLOGIES FOR INTELLIGENCE DEFENSE
AND SECURITY (STIDS 2013)
GEORGE MASON UNIVERSITY

NOVEMBER 13, 2013
About VIStology, Inc.
 Past and present R&D contracts
 ONR, Army RDECOM , AFOSR, AFRL, DARPA, MDA
 Professional Collaborations
 Northeastern University, W3C, Lockheed
Martin, OMG, Referentia
Systems, BBN, Raytheon, Vulcan, Inc.
 Products & Services
 BaseVISor: highly efficient inference engine
 ConsVISor: consistency checker of ontologies
 PolVIsor: policy-compliant information exchange
 HADRian: next-generation smart COP for HA/DR ops
 Ontology Engineering and Systems Design
 Areas of Expertise
 Level 2+ Information Fusion, Situation Awareness,
Formal Reasoning Systems, Artificial Intelligence,
Ontology Engineering, Object-Oriented Design

2
VIStology HADRian
 HADRian is a next-generation COP for HA/DR ops
 HADRian enables an Incident Commander to
 Find
 Filter
 Geocode

and

 Display

the information that he or she needs to make the best
decisions about the situation on the basis of semantic
annotation of information repositories.
 Repositories may contain text
reports, photos, videos, tracks (KML), chemical
plumes, 3D building models (SketchUp), etc.

4
HADrian: Concept of Operations:
Find, Filter, Map and Display Info Re:Disasters
 COP Operator annotates repositories, using ontology
 Info from new repositories can be potentially integrated by annotating
metadata
 COP operator formulates High Level Query to describes info

needs for current operation
 System infers repositories that may contain relevant info by
reasoning over metadata that the repository has been
annotated with.



Information remains in place until need (not ETL);
Users upload data wherever they want; ingested as needed

 System issues appropriate low level query to repositories


Repositories contain disparate data in disparate formats

 System filters out irrelevant data
 System aggregates and displays data in Google Earth COP
 Users interact with data in COP
 COP Operator can send information from COP to responder phone
5
Dataflow for HADRian Situation Awareness
from Social Media

Twitter
Status Updates

Civilians/Media/
Others

Ontologybased
annotation
of metadata

Identifies relevant
repositories based on
metadata and high
level queries.
Here, only one
repository is relevant

Processes content of relevant
tweets

HADRian
Formulates high
level queries
based on
Commander
needs:
Here, “Where are
unexploded or
additional bombs
reported after
Boston
Marathon?”

Uses COP to
make decisions.

COP Operator
Identifies information needs.
Conveys to COP Operator

Field Operators

1. Identifies tweets that report
an additional or unexploded
bomb.
2. Identifies where the bomb is
reported
3. Geolocates the reported
location
4. Anonymizes the reporter‟s
identity
5. Presents the information in
HADRian COP

Commander
6
Current State of the Art: Fixed Feeds, On/Off

VIStology, Inc
High Level Query

“Show me Locations mentioned in Tweets
about Boston Marathon and Unexploded
Bombs”
Repository Annotation
Filter Processing Chain

The bulk of 0.5 million tweets were collected on April 15, 2013 in 3 hours after
the Boston Marathon bombing and stored in a giant CSV file. After this file is
determined to be relevant:
1) Using metadata description of schema and rules, the 0.5m tweets were filtered to only those
that contained mentions of unexploded or undetonated bombs, and converted into an OWL
representation. This file is identified as being relevant to the HLQ “find locations mentioned
in tweets about Boston Marathon and unexploded bombs at Boston Marathon on April
15, 2013”
2) The OWL-formatted tweets were loaded into BaseVISor along with rules. These rules
extract information about locations mentioned in the tweets - we call them "location
phrases", for instance: "jfk", "jfk library", "kennedy library", "#boston", etc.
Location Phrases

Locations referenced in tweet; NOT location of user, although that can help disambiguate
INFO Initializing BaseVISor..
INFO There are 26679 asserted facts in the knowledge
base.
INFO Initialization complete. Running inference...
INFO There is a total of 44414 facts in the knowledgebase
after running the inference.
INFO Done.
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO

'mandarin hotel' -> 3
'bostonmarathon' -> 798
'bpd commissioner ed davis' -> 1
'back bay' -> 1 …
'jfk library' -> 311
'jfk' -> 73
'jfklibrary' -> 2
'jkf library' -> 1 …
'copley place' -> 8 …
„st. ignatius catholic church‟ -> 47
Mapping Location Phrases to Placemarks
For each extracted location phrase a rule with a special procedural
attachment is fired. This attachment, takes the location phrase and
tries to locate the place on the map using the following algorithm:
 1. (Places API, Exact) Lookup Google Maps Places API
(https://developers.google.com/places) and see if it returns exact
match for the phrase. If it does, use its Lat/Lon to assign the tweet
to that location, otherwise, move to next step.
 2. (Geocoding API) Lookup Google Geocoding API
(https://developers.google.com/maps/documentation/geocoding)
and see if it returns exact match for the phrase, if it does, use its
Lat/Lon to assign the tweet to that location, otherwise move to the
next step.
 3. (Places API, First on the list) Pick the first result in the Google
Maps Places API result (from 3a), if there were no results, ignore
the location.
3 tweets mention Mandarin Hotel, 2 Copley
Place, 1 Back Bay Station,…
Information in Placemark
 We indicate the source of









the location inside the
placemark. (Here, Places
API first result)
Corresponding location
phrases
Photo
Number of tweets (1158)
Place type
(here, “library, museum,
establishment”)
Sample Tweets (not
shown)
Placemark Size and Color
The number next to the
placemark's name indicates
the number of tweets that
used on of the location
phrases mapping to this
location. The higher the
number, the more tweets
were talking about the same
spot.
We emphasize this fact by
rendering polygons
underneath the placemarks the higher and darker the
color, the more frequently
mentioned was the location.
47 Tweets mention St Ignatius Church,
at Boston College
Relevant Tweet Retrieval (Finding) Evaluation
 Corpus: ~500K tweets
 Identified 7,748 tweets that were about additional or

unexploded bombs with a precision of 94.5%, based on
a random sample of 200 tweets identified as such.


That is, only 1.5% of the original corpus was identified as referring to
additional bombs, using our pattern matching.

 Based on a random sample of 236 tweets from the

original corpus, our recall (identification of tweets that
discussed additional bombs) was determined to be 50%.




That is, there were many more ways to refer to additional bombs
than our rules considered.
Thus, our F1 measure for accurately identifying tweets about
additional bombs was 65%.
Nevertheless, because of the volume of tweets, this did not affect the
results appreciably.
Location Phrase (Filtering) Evaluation
 Location phrases were identified purely by means of generic pattern







matching.
We did not use any list of known places. Nor did we include any
scenario-specific patterns.
The precision with which we identified location phrases was 95%.
That is, in 95% of the cases, when we identified a phrase as a
location phrase, it actually did refer to a location in that context.
Mistakes included temporal references and references to online
sites.
Our recall was only 51.3% if we counted uses of #BostonMarathon
that were locative. (We mishandled hashtags with camel case.)
If we ignore this hashtag, then our recall was 79.2%.


That is, of all the locations mentioned in tweets about additional bombs at the
Boston Marathon, we identified 79.2% percent of the locations that were
mentioned.

 Using the more lenient standard, our F1 measure for identifying

location phrases in the text was 86.3%.
18
GeoCoding Evaluation
 Our precision in associating tweets with known places via the

Google APIs was 97.2%.
 Our precision in assigning unique location phrases to
known places via Google APIs was 50%.
 That is, there were many location phrases that were repeated
several times that we assigned correctly to a known place, but
half of the unique phrase names that we extracted were not
assigned correctly.
 Ten location phrases that were extracted corresponded to no
known locations identified via the Google APIs.


These included location phrases such as “#jfklibrary” and “BPD
Commissioner Ed Davis”. The former is a phrase we would like to
geolocate, but lowercase hashtags which concatenate several words
are challenging. The latter is the sort of phrase that we expect would
be rejected as non-geolocatable.

19
Top 20 Locations by Frequency
Known Place
JFK Library

1158

Boston

629

Boston Marathon

325

St Ignatius Catholic Church

47

PD

29

Boylston

8

CNN

5

Copley Sq

4

Huntington Ave

4

Iraq

3

Mandarin Hotel

3

Dorchester

3

Marathon

3

US Intelligence

3

Copley Place

2

Boston PD

2

BBC

2

Cambridge

2

John

2

St James Street #Boston

20

#Tweets

2
Comparing Locations Mentioned in Media Blogs
 For three of these sites – Mass.
Location [Source]: (# of Tweets Identified with That Location

Boylston Street [Globe, CNN]: 8
Commonwealth Ave near Centre Street, Newton [Globe]: 0
Commonwealth Ave (Boston) [Globe]: 0
Copley Square [NYT]: 4
Harvard MBTA station [Globe]: 0
JFK Library [CNN, Globe, NYT]: 1158
Mass. General Hospital [Globe, NYT]: 0
(glass footbridge over) Huntington Ave near Copley place
[Globe]: 4
Tufts New England Medical Center [NYT]: 0
Washington Square, Brookline [NYT]: 0

21

General Hospital, Tufts Medical
Center and Washington Square,
Brookline -- reports of
unexploded bombs or suspicious
packages occurred after the end
of the tweet collection period, at
7:06 PM.
 Otherwise, the recall of our
system was good, missing only
the report of unexploded bombs
at the Harvard MBTA station.
 Media failed to report other
locations prominent to us (e.g. St
Ignatius Catholic Church)
 In our corpus, but missed, due to
capitalization.
Qualitative Evaluation
 On average, tweets reflecting same locations as

media blogs were produced 11 minutes prior to their
being reported on the sites mentioned.
 Thus, the tweet processing was more timely and
more comprehensive (included more locations) than
simply relying on a handful of news sites alone for
situational awareness

22
Conclusion
 We described a system for integrating disparate information sources into a











COP for Humanitarian Assistance/Disaster Relief operations by means of
semantic annotations and queries, using a common ontology.
We applied our technology to a repository of tweets collected in the
immediate aftermath of the Boston Marathon bombings in April, 2013, and
demonstrated that a ranked set of places could be incorporated into the
COP, showing the prominence of each site by tweet volume that was reported as
being the site of an additional unexploded bomb or bombs.
We evaluated the results formally and compared the results with the
situational awareness that could be gleaned only from mainstream media blogs
being updated at the same time.
On average, the automatic processing would have had access to locations from
tweets eleven minutes before these sites were mentioned on the mainstream
media blogs.
Additionally, sites that were prominent on Twitter (e.g. St Ignatius Church at
Boston College or the Mandarin Oriental Hotel in Boston) were not mentioned
on the news blog sites at all.
We believe that these results show that this approach is a promising one for
deriving situational awareness from social media going forward.

23

Mais conteúdo relacionado

Semelhante a Vistology STIDS 2013 Situation Awareness from Social Media

2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus typejins0618
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataLeon Derczynski
 
Recognition Markets and Visual Privacy
Recognition Markets and Visual PrivacyRecognition Markets and Visual Privacy
Recognition Markets and Visual PrivacyRyan Shaw
 
Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessBen Busby
 
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Amit Sheth
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSaeedeh Shekarpour
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinalDeborah McGuinness
 
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...Jim McCusker
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web PagesMichael Nelson
 
2016 03 25_group_meeting MyVariant.info
2016 03 25_group_meeting MyVariant.info2016 03 25_group_meeting MyVariant.info
2016 03 25_group_meeting MyVariant.infoJiwen Xin
 
An Ontology-based Technique for Online Profile Resolution
An Ontology-based Technique for Online Profile ResolutionAn Ontology-based Technique for Online Profile Resolution
An Ontology-based Technique for Online Profile Resolutionkcortis
 
The Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalThe Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalMark Wilkinson
 
Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석datasciencekorea
 
BioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyBioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyChunlei Wu
 
Repository for data crawled from multiple social networks
Repository for data crawled from multiple social networksRepository for data crawled from multiple social networks
Repository for data crawled from multiple social networksConstantinos Christofilos
 

Semelhante a Vistology STIDS 2013 Situation Awareness from Social Media (20)

2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus type
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data Interaction
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Location prediction
Location predictionLocation prediction
Location prediction
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media Data
 
Recognition Markets and Visual Privacy
Recognition Markets and Visual PrivacyRecognition Markets and Visual Privacy
Recognition Markets and Visual Privacy
 
Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_access
 
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal
 
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...
Next Generation Cancer Data Discovery, Access, and Integration Using Prizms a...
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
 
2016 03 25_group_meeting MyVariant.info
2016 03 25_group_meeting MyVariant.info2016 03 25_group_meeting MyVariant.info
2016 03 25_group_meeting MyVariant.info
 
An Ontology-based Technique for Online Profile Resolution
An Ontology-based Technique for Online Profile ResolutionAn Ontology-based Technique for Online Profile Resolution
An Ontology-based Technique for Online Profile Resolution
 
The Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalThe Semantic Web - This time... its Personal
The Semantic Web - This time... its Personal
 
Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석
 
BioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyBioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biology
 
Repository for data crawled from multiple social networks
Repository for data crawled from multiple social networksRepository for data crawled from multiple social networks
Repository for data crawled from multiple social networks
 

Último

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Vistology STIDS 2013 Situation Awareness from Social Media

  • 1. Situational Awareness from Social Media V I S T O L O G Y, I N C BRIAN ULICNY JAKUB MOSKAL M I E C Z Y S L AW M . ( M I T C H ) K O K A R ( N O R T H E A S T E R N ) SEMANTIC TECHNOLOGIES FOR INTELLIGENCE DEFENSE AND SECURITY (STIDS 2013) GEORGE MASON UNIVERSITY NOVEMBER 13, 2013
  • 2. About VIStology, Inc.  Past and present R&D contracts  ONR, Army RDECOM , AFOSR, AFRL, DARPA, MDA  Professional Collaborations  Northeastern University, W3C, Lockheed Martin, OMG, Referentia Systems, BBN, Raytheon, Vulcan, Inc.  Products & Services  BaseVISor: highly efficient inference engine  ConsVISor: consistency checker of ontologies  PolVIsor: policy-compliant information exchange  HADRian: next-generation smart COP for HA/DR ops  Ontology Engineering and Systems Design  Areas of Expertise  Level 2+ Information Fusion, Situation Awareness, Formal Reasoning Systems, Artificial Intelligence, Ontology Engineering, Object-Oriented Design 2
  • 3. VIStology HADRian  HADRian is a next-generation COP for HA/DR ops  HADRian enables an Incident Commander to  Find  Filter  Geocode and  Display the information that he or she needs to make the best decisions about the situation on the basis of semantic annotation of information repositories.  Repositories may contain text reports, photos, videos, tracks (KML), chemical plumes, 3D building models (SketchUp), etc. 4
  • 4. HADrian: Concept of Operations: Find, Filter, Map and Display Info Re:Disasters  COP Operator annotates repositories, using ontology  Info from new repositories can be potentially integrated by annotating metadata  COP operator formulates High Level Query to describes info needs for current operation  System infers repositories that may contain relevant info by reasoning over metadata that the repository has been annotated with.   Information remains in place until need (not ETL); Users upload data wherever they want; ingested as needed  System issues appropriate low level query to repositories  Repositories contain disparate data in disparate formats  System filters out irrelevant data  System aggregates and displays data in Google Earth COP  Users interact with data in COP  COP Operator can send information from COP to responder phone 5
  • 5. Dataflow for HADRian Situation Awareness from Social Media Twitter Status Updates Civilians/Media/ Others Ontologybased annotation of metadata Identifies relevant repositories based on metadata and high level queries. Here, only one repository is relevant Processes content of relevant tweets HADRian Formulates high level queries based on Commander needs: Here, “Where are unexploded or additional bombs reported after Boston Marathon?” Uses COP to make decisions. COP Operator Identifies information needs. Conveys to COP Operator Field Operators 1. Identifies tweets that report an additional or unexploded bomb. 2. Identifies where the bomb is reported 3. Geolocates the reported location 4. Anonymizes the reporter‟s identity 5. Presents the information in HADRian COP Commander 6
  • 6. Current State of the Art: Fixed Feeds, On/Off VIStology, Inc
  • 7. High Level Query “Show me Locations mentioned in Tweets about Boston Marathon and Unexploded Bombs”
  • 9. Filter Processing Chain The bulk of 0.5 million tweets were collected on April 15, 2013 in 3 hours after the Boston Marathon bombing and stored in a giant CSV file. After this file is determined to be relevant: 1) Using metadata description of schema and rules, the 0.5m tweets were filtered to only those that contained mentions of unexploded or undetonated bombs, and converted into an OWL representation. This file is identified as being relevant to the HLQ “find locations mentioned in tweets about Boston Marathon and unexploded bombs at Boston Marathon on April 15, 2013” 2) The OWL-formatted tweets were loaded into BaseVISor along with rules. These rules extract information about locations mentioned in the tweets - we call them "location phrases", for instance: "jfk", "jfk library", "kennedy library", "#boston", etc.
  • 10. Location Phrases Locations referenced in tweet; NOT location of user, although that can help disambiguate INFO Initializing BaseVISor.. INFO There are 26679 asserted facts in the knowledge base. INFO Initialization complete. Running inference... INFO There is a total of 44414 facts in the knowledgebase after running the inference. INFO Done. INFO INFO INFO INFO INFO INFO INFO INFO INFO INFO 'mandarin hotel' -> 3 'bostonmarathon' -> 798 'bpd commissioner ed davis' -> 1 'back bay' -> 1 … 'jfk library' -> 311 'jfk' -> 73 'jfklibrary' -> 2 'jkf library' -> 1 … 'copley place' -> 8 … „st. ignatius catholic church‟ -> 47
  • 11. Mapping Location Phrases to Placemarks For each extracted location phrase a rule with a special procedural attachment is fired. This attachment, takes the location phrase and tries to locate the place on the map using the following algorithm:  1. (Places API, Exact) Lookup Google Maps Places API (https://developers.google.com/places) and see if it returns exact match for the phrase. If it does, use its Lat/Lon to assign the tweet to that location, otherwise, move to next step.  2. (Geocoding API) Lookup Google Geocoding API (https://developers.google.com/maps/documentation/geocoding) and see if it returns exact match for the phrase, if it does, use its Lat/Lon to assign the tweet to that location, otherwise move to the next step.  3. (Places API, First on the list) Pick the first result in the Google Maps Places API result (from 3a), if there were no results, ignore the location.
  • 12. 3 tweets mention Mandarin Hotel, 2 Copley Place, 1 Back Bay Station,…
  • 13. Information in Placemark  We indicate the source of      the location inside the placemark. (Here, Places API first result) Corresponding location phrases Photo Number of tweets (1158) Place type (here, “library, museum, establishment”) Sample Tweets (not shown)
  • 14. Placemark Size and Color The number next to the placemark's name indicates the number of tweets that used on of the location phrases mapping to this location. The higher the number, the more tweets were talking about the same spot. We emphasize this fact by rendering polygons underneath the placemarks the higher and darker the color, the more frequently mentioned was the location.
  • 15. 47 Tweets mention St Ignatius Church, at Boston College
  • 16. Relevant Tweet Retrieval (Finding) Evaluation  Corpus: ~500K tweets  Identified 7,748 tweets that were about additional or unexploded bombs with a precision of 94.5%, based on a random sample of 200 tweets identified as such.  That is, only 1.5% of the original corpus was identified as referring to additional bombs, using our pattern matching.  Based on a random sample of 236 tweets from the original corpus, our recall (identification of tweets that discussed additional bombs) was determined to be 50%.    That is, there were many more ways to refer to additional bombs than our rules considered. Thus, our F1 measure for accurately identifying tweets about additional bombs was 65%. Nevertheless, because of the volume of tweets, this did not affect the results appreciably.
  • 17. Location Phrase (Filtering) Evaluation  Location phrases were identified purely by means of generic pattern       matching. We did not use any list of known places. Nor did we include any scenario-specific patterns. The precision with which we identified location phrases was 95%. That is, in 95% of the cases, when we identified a phrase as a location phrase, it actually did refer to a location in that context. Mistakes included temporal references and references to online sites. Our recall was only 51.3% if we counted uses of #BostonMarathon that were locative. (We mishandled hashtags with camel case.) If we ignore this hashtag, then our recall was 79.2%.  That is, of all the locations mentioned in tweets about additional bombs at the Boston Marathon, we identified 79.2% percent of the locations that were mentioned.  Using the more lenient standard, our F1 measure for identifying location phrases in the text was 86.3%. 18
  • 18. GeoCoding Evaluation  Our precision in associating tweets with known places via the Google APIs was 97.2%.  Our precision in assigning unique location phrases to known places via Google APIs was 50%.  That is, there were many location phrases that were repeated several times that we assigned correctly to a known place, but half of the unique phrase names that we extracted were not assigned correctly.  Ten location phrases that were extracted corresponded to no known locations identified via the Google APIs.  These included location phrases such as “#jfklibrary” and “BPD Commissioner Ed Davis”. The former is a phrase we would like to geolocate, but lowercase hashtags which concatenate several words are challenging. The latter is the sort of phrase that we expect would be rejected as non-geolocatable. 19
  • 19. Top 20 Locations by Frequency Known Place JFK Library 1158 Boston 629 Boston Marathon 325 St Ignatius Catholic Church 47 PD 29 Boylston 8 CNN 5 Copley Sq 4 Huntington Ave 4 Iraq 3 Mandarin Hotel 3 Dorchester 3 Marathon 3 US Intelligence 3 Copley Place 2 Boston PD 2 BBC 2 Cambridge 2 John 2 St James Street #Boston 20 #Tweets 2
  • 20. Comparing Locations Mentioned in Media Blogs  For three of these sites – Mass. Location [Source]: (# of Tweets Identified with That Location Boylston Street [Globe, CNN]: 8 Commonwealth Ave near Centre Street, Newton [Globe]: 0 Commonwealth Ave (Boston) [Globe]: 0 Copley Square [NYT]: 4 Harvard MBTA station [Globe]: 0 JFK Library [CNN, Globe, NYT]: 1158 Mass. General Hospital [Globe, NYT]: 0 (glass footbridge over) Huntington Ave near Copley place [Globe]: 4 Tufts New England Medical Center [NYT]: 0 Washington Square, Brookline [NYT]: 0 21 General Hospital, Tufts Medical Center and Washington Square, Brookline -- reports of unexploded bombs or suspicious packages occurred after the end of the tweet collection period, at 7:06 PM.  Otherwise, the recall of our system was good, missing only the report of unexploded bombs at the Harvard MBTA station.  Media failed to report other locations prominent to us (e.g. St Ignatius Catholic Church)  In our corpus, but missed, due to capitalization.
  • 21. Qualitative Evaluation  On average, tweets reflecting same locations as media blogs were produced 11 minutes prior to their being reported on the sites mentioned.  Thus, the tweet processing was more timely and more comprehensive (included more locations) than simply relying on a handful of news sites alone for situational awareness 22
  • 22. Conclusion  We described a system for integrating disparate information sources into a      COP for Humanitarian Assistance/Disaster Relief operations by means of semantic annotations and queries, using a common ontology. We applied our technology to a repository of tweets collected in the immediate aftermath of the Boston Marathon bombings in April, 2013, and demonstrated that a ranked set of places could be incorporated into the COP, showing the prominence of each site by tweet volume that was reported as being the site of an additional unexploded bomb or bombs. We evaluated the results formally and compared the results with the situational awareness that could be gleaned only from mainstream media blogs being updated at the same time. On average, the automatic processing would have had access to locations from tweets eleven minutes before these sites were mentioned on the mainstream media blogs. Additionally, sites that were prominent on Twitter (e.g. St Ignatius Church at Boston College or the Mandarin Oriental Hotel in Boston) were not mentioned on the news blog sites at all. We believe that these results show that this approach is a promising one for deriving situational awareness from social media going forward. 23

Notas do Editor

  1. Distinguish from DISIS