This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring and for market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that supports a variety of analytical queries, such as tracing suspicious patterns of company control or monitoring media coverage of people, the companies they own, and those companies' subsidiaries.
4. Link data! Reveal more!
Diagram labels: Commercial Company Database (e.g. D&B), Social Media, News, Wikipedia, Private
• Link diverse data in a Knowledge Graph
• Analyze News and Social Content
• Extract facts and link content to data
• Interpret data in context of big linked data
6. Relation Discovery Case
• Find suspicious relationships like:
− Company in USA
− Controls another company in USA
− Through a company in an off-shore zone
• Show news relevant to these companies
7. Linking News to Big Knowledge Graphs
• The DSP platform links text to knowledge graphs
• One can navigate from news to concepts, entities and topics, and from there to other news
Try it at http://now.ontotext.com
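The navigation described on this slide (news to entities, and from entities to other news) can be pictured as an inverted index over tags. The following is a minimal sketch on toy data, not the DSP platform's implementation; the article ids, the `ex:` entity identifiers, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: article id -> set of entity URIs tagged in it.
ARTICLE_TAGS = {
    "news/1": {"ex:Acme", "ex:London"},
    "news/2": {"ex:Acme", "ex:Jane_Doe"},
    "news/3": {"ex:London"},
}


def build_entity_index(article_tags):
    """Invert article -> entities into entity -> articles."""
    index = defaultdict(set)
    for article, entities in article_tags.items():
        for entity in entities:
            index[entity].add(article)
    return index


def related_articles(article, article_tags, index):
    """Map every other article to the entities it shares with `article`."""
    related = defaultdict(set)
    for entity in article_tags[article]:
        for other in index[entity]:
            if other != article:
                related[other].add(entity)
    return dict(related)


index = build_entity_index(ARTICLE_TAGS)
print(related_articles("news/1", ARTICLE_TAGS, index))
```

Starting from `news/1`, the sketch reaches `news/2` through the shared entity `ex:Acme` and `news/3` through `ex:London`.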
8. Semantic Media Monitoring
For each entity:
• popularity trends
• relevant news
• related entities
• knowledge graph information
Try it at http://now.ontotext.com
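A popularity trend for an entity can be approximated by counting its mentions per time bucket. The sketch below assumes monthly buckets and a flat list of (date, entity) mentions; the data, the `ex:` identifiers, and the function name are hypothetical, and the actual NOW showcase may compute trends differently.

```python
from collections import Counter
from datetime import date

# Hypothetical mentions: (publication date, entity URI).
MENTIONS = [
    (date(2017, 1, 5), "ex:Acme"),
    (date(2017, 1, 20), "ex:Acme"),
    (date(2017, 2, 3), "ex:Acme"),
    (date(2017, 2, 9), "ex:Globex"),
]


def monthly_trend(mentions, entity):
    """Return [((year, month), mention_count), ...] for one entity, in date order."""
    counts = Counter(
        (published.year, published.month)
        for published, mentioned in mentions
        if mentioned == entity
    )
    return sorted(counts.items())


print(monthly_trend(MENTIONS, "ex:Acme"))
```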
10. OntoRefine: Data Transformation to RDF
• Based on OpenRefine and integrated in the GraphDB Workbench
• Allows converting tabular data into RDF
− Supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets
− Easily filter your data and fix its inconsistencies
− View the cleaned data as RDF
• Exposes a GraphDB SPARQL endpoint
− Transform your data using SPIN functions
− Import your data straight into a GraphDB repository
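To illustrate the general idea of turning tabular data into RDF, here is a minimal sketch that maps each CSV row to a subject and each non-empty cell to a triple in N-Triples syntax. It is not how OntoRefine works internally; the `http://example.org/` namespace, the column-to-predicate mapping, and the function names are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a FactForge one


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text, id_column):
    """Emit one triple per non-empty cell; the id column names the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[id_column].strip(), safe='')}>"
        for column, value in row.items():
            # Skip overflow cells, the id column itself, and empty values.
            if column is None or column == id_column or not value or not value.strip():
                continue
            predicate = f"<{BASE}property/{quote(column.strip(), safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value.strip())}" .')
    return triples


SAMPLE = "id,name,country\nc1,Acme Ltd,US\nc2,Globex,\n"
for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

Every value is emitted as a plain string literal here; a real mapping would typically also produce typed literals and links to existing entities.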
The Power of Semantic Technologies to Explore Linked Open Data #10
11. FactForge: Open data and news about people and organizations
http://factforge.net
12. FactForge: Data Integration
DBpedia (the English version): 496M
Geonames (all geographic features on Earth): 150M
owl:sameAs links between DBpedia and Geonames: 471K
Company registry data (GLEI): 3M
Panama Papers DB (#LinkedLeaks): 20M
Other datasets and ontologies: WordNet, WorldFacts, FIBO
News metadata (2000 articles/day enriched by NOW): > 600M
Total size (1,313M explicit + 327M inferred statements): 1,640M
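An owl:sameAs link states that two URIs denote the same thing, so a chain of such links groups URIs from different datasets into one equivalence class. The sketch below shows that grouping with a small union-find structure; the identifiers are made up for illustration, and this is not a description of how GraphDB handles owl:sameAs internally.

```python
def same_as_clusters(links):
    """Group URIs connected by sameAs links into equivalence classes."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical sameAs links between a DBpedia-style and a Geonames-style URI.
LINKS = [
    ("dbr:Sofia", "geo:1"),
    ("geo:1", "ex:Sofia"),
    ("dbr:Paris", "geo:2"),
]
for cluster in same_as_clusters(LINKS):
    print(sorted(cluster))
```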
13. News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− News stream from Google
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 50k news articles/month
− 700 000 news articles
− ~70 tags (annotations) per news article, 43M tags all together
− 400 000 unique entities mentioned
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
15. Offshore control example
• Query: Find companies that control other companies in the same country through a company in an off-shore zone
• How it works:
− Establish a control relationship
− Establish a company-country mapping
− Establish an “off-shore criterion”
− SPARQL it
16. Off-shore company control example
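# The prefixes onto:, fibo-fnd-rel-rel: and ff-map: are assumed to be predefined in the repository.
SELECT *
FROM onto:disable-sameAs    # evaluate without owl:sameAs expansion of the results
WHERE {
    # Control chain: c1 controls c2, which controls c3
    ?c1 fibo-fnd-rel-rel:controls ?c2 .
    ?c2 fibo-fnd-rel-rel:controls ?c3 .
    # c1 and c3 share a country; c2 is registered somewhere else
    ?c1 ff-map:orgCountry ?c1_country .
    ?c2 ff-map:orgCountry ?c2_country .
    ?c3 ff-map:orgCountry ?c1_country .
    FILTER (?c1_country != ?c2_country)
    # Off-shore criterion: the intermediary's country has off-shore provisions
    ?c2_country ff-map:hasOffshoreProvisions true .
}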
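The same pattern can be checked on a handful of in-memory facts, which helps when reasoning about what the query should and should not return. This is a minimal sketch with hypothetical company names, country codes, and off-shore flags; it assumes one country per company, which the knowledge graph does not necessarily guarantee.

```python
# Hypothetical toy facts mirroring the three steps: control, country, off-shore flag.
CONTROLS = {("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")}
ORG_COUNTRY = {"A": "US", "B": "PA", "C": "US", "D": "US", "E": "DE", "F": "US"}
OFFSHORE = {"PA"}  # countries flagged as having off-shore provisions (assumed)


def offshore_control_chains(controls, org_country, offshore):
    """Return (c1, c2, c3) where c1 and c3 share a country and c2 sits off-shore elsewhere."""
    controlled_by = {}
    for parent, child in controls:
        controlled_by.setdefault(parent, set()).add(child)

    chains = []
    for c1, c2 in controls:
        for c3 in controlled_by.get(c2, ()):
            k1, k2, k3 = (org_country.get(c) for c in (c1, c2, c3))
            if None in (k1, k2, k3):
                continue  # no country known, so the pattern cannot match
            if k1 == k3 and k1 != k2 and k2 in offshore:
                chains.append((c1, c2, c3))
    return sorted(chains)


print(offshore_control_chains(CONTROLS, ORG_COUNTRY, OFFSHORE))
```

Only the chain A, B, C matches: D, E, F also passes through a foreign intermediary, but that country is not flagged as off-shore.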
18. News popularity ranking of companies
• Rankings can be customized by specifying a geographic region, news category (e.g., business, sport, lifestyle) and time period.
• Unique features:
− It is based on live streaming news
− It also tracks mentions of subsidiaries
• Rank uses the industry sectors of DBpedia with several refinements
− About 40 top-level industry sectors
− Sectors are linked in a hierarchical taxonomy (251 sectors altogether)
− Industry sectors are de-duplicated (Wikipedia uses about 9,000 distinct designators)
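One way to make a ranking track mentions of subsidiaries is to credit each mention to the top-level parent before counting. The sketch below does exactly that; the ownership map, the mention list, and the scoring rule (one point per mention, rolled up to the ultimate parent) are assumptions for illustration, since the slides do not describe the service's actual scoring.

```python
from collections import Counter

# Hypothetical ownership: subsidiary -> direct parent.
PARENT_OF = {"ex:AcmeCloud": "ex:Acme", "ex:AcmeLabs": "ex:AcmeCloud"}
# Hypothetical stream of company mentions extracted from news.
MENTIONS = ["ex:Acme", "ex:AcmeCloud", "ex:AcmeLabs", "ex:Globex", "ex:Globex"]


def ultimate_parent(company, parent_of):
    """Follow parent links to the top; the `seen` set guards against cycles."""
    seen = set()
    while company in parent_of and company not in seen:
        seen.add(company)
        company = parent_of[company]
    return company


def rank_by_mentions(mentions, parent_of):
    """Rank top-level companies by mentions of themselves and their subsidiaries."""
    counts = Counter(ultimate_parent(m, parent_of) for m in mentions)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(rank_by_mentions(MENTIONS, PARENT_OF))
```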
19. Rank uses NOW, FactForge and GraphDB
• This ranking service is entirely based on FactForge
− FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts, which is loaded in GraphDB
− GraphDB is Ontotext's semantic graph database engine
− Unlike FactForge, this service is aimed at non-technical users, as it does not require any knowledge of SPARQL or other technologies
− However, it allows users to see the SPARQL query behind each ranking and to customize it