This slide deck was prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop, delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila, aimed to provide a general understanding of Semantic Web technologies and how they can be used in real-world use cases such as Salzburgerland Tourismus.
It also includes a brief introduction to MICO (Media in Context), a research project part-funded by the European Union that aims to provide cross-media analysis solutions for online multimedia producers.
1. A framework for knowledge extraction, linked data and semantic search.
What do we want computers to do for us?
2. We have data.
• From 2005 to 2020, the digital universe will grow in
size by a factor of 300, from 130 exabytes to 40 trillion
gigabytes (40 ZB).
• From now until 2020, the digital universe will about
double every two years.
• Volumes of data are projected to reach 5,247 GB per
person, with emerging economies playing an
increasingly important role (producing two thirds of
the world's data by the end of this decade).
• Only 0.5% of this data is used today for analysis.
• The amount of information individuals create
themselves - writing documents, taking pictures,
recording audio - is far less than the information
being created about them in the digital universe.
[IDC iView, 2012]
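As a rough consistency check on the figures above, the implied doubling time can be worked out from the growth factor alone. This is a minimal sketch: the function name is illustrative, and the calculation assumes steady exponential growth, which the source does not state.

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years per doubling, assuming constant exponential growth."""
    doublings = math.log2(growth_factor)
    return period_years / doublings


# Factor of 300 between 2005 and 2020, as quoted on the slide.
years = doubling_time_years(300, 2020 - 2005)
print(f"Implied doubling time: {years:.2f} years")
```

Under that assumption the result comes out a little under two years, which is in line with the "about double every two years" bullet.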
3. What do we want computers to do for us?
Text
Images/Video
Audio
"language": "de"
Categorisation,
Summarisation,
Search,
Question/Answer,
…
"label": "outdoor"
Suggest tags,
Image search,
…
Automatic Speech
Recognition,
Speaker identification,
Music classification,
…
[Andrew Ng, 2011]
We want computers to process data.
4. Natural Language
Processing
We use it every day.
[Jurafsky & Martin, 2008]
a theoretically motivated range of
computational techniques for
analysing naturally occurring
text/speech for the purpose of
achieving human-like language
processing.
5. Feature extraction in text/speech.
Levels of knowledge encoding in language data.
INPUT
Morphologic
Syntactic
Semantic
FEATURES
NLP
Parser
Lexical DB
Stemming
Anaphora
POS Tagging
NER
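To make the morphological level a little more concrete, here is a toy sketch of tokenisation, suffix-stripping "stemming" and a crude stand-in for named entity detection. Everything in it (the suffix list, the function names, the capitalisation heuristic) is an assumption made for illustration; real toolkits use proper stemmers, lexical databases and trained NER models.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # toy list, not a real stemming algorithm


def tokenize(text: str) -> list:
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def capitalised_candidates(tokens: list) -> list:
    """Crude NER stand-in: capitalised tokens that are not sentence-initial."""
    return [t for i, t in enumerate(tokens) if i > 0 and t[0].isupper()]


tokens = tokenize("Travellers visited churches in Salzburg")
stems = [naive_stem(t) for t in tokens]
entities = capitalised_candidates(tokens)
print(stems)
print(entities)
```

The point is only that each step turns raw text into features that later stages can use; none of these heuristics would hold up on real text.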
6. TEXT
NLP
FEATURES
WISDOM
What do we want computers to do with a text?
STRUCTURED
DATA
CONTEXT
We want computers to make sense of unstructured data.
KNOWLEDGE
Semantic Lifting
7. TEXT WISDOM
A practical example.
CONTEXT
Combining Semantic Web technologies with NLP technologies.
KNOWLEDGE
Lucoli
"label": "Lucoli"
"values": ["13.338889"],
"predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["42.29194444444445"],
"predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"values": [ (five images) ],
"predicate": "http://xmlns.com/foaf/0.1/depiction"
About a 20-minute drive from L’Aquila.
…
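The predicate/values pairs in this example can be read programmatically. The sketch below assumes the fragments sit in a list under a "properties" key next to the label; that enclosing layout and the helper names are assumptions, since the slide only shows the individual pairs.

```python
from typing import Optional, Tuple

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Entity description shaped like the fragments on the slide; the enclosing
# "label"/"properties" layout is an assumption for this sketch.
entity = {
    "label": "Lucoli",
    "properties": [
        {"predicate": GEO + "long", "values": ["13.338889"]},
        {"predicate": GEO + "lat", "values": ["42.29194444444445"]},
    ],
}


def first_value(description: dict, predicate: str) -> Optional[str]:
    """Return the first value recorded for a predicate, or None if absent."""
    for prop in description["properties"]:
        if prop["predicate"] == predicate and prop["values"]:
            return prop["values"][0]
    return None


def coordinates(description: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as floats when both are present."""
    lat = first_value(description, GEO + "lat")
    lon = first_value(description, GEO + "long")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


print(entity["label"], coordinates(entity))
```

Keying on the full predicate URI rather than a short name is what lets data from different sources be combined without clashes.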
8. How we started.
Building an open platform for
knowledge extraction, linked data
and semantic search.
Delivering the world’s most
advanced open source
content analysis and making
linked data publishing and
information discovery accessible
to anyone.
9. • Incorporating requirements from industry partners:
• CMS companies
• System integrators
• Tool providers
• Inheriting 6 years of IP with R&D on:
• Semantic Information Management and
Publishing (RDF and Semantic Web Technology)
• Semantic Processing
• Conceptual Search
10. CONTENT ANALYSIS
LINKED DATA PUBLISHING
1
3
Linked Data Cloud
Technology Stack
Text
Legacy Data
Audio/Images
(under development)
CONTENT DISCOVERY
2
• Enterprise
Linked Data
• Content
Enhancement
• Semantic Search
11. • Semantic enhancement process chaining
• Multiple NLP feature extraction facilities
• Multiple language support
• Content classification and sentiment analysis
• Graduated as a Top-Level Project of the Apache
Software Foundation in September 2012
STANBOL.APACHE.ORG
A Toolbox for Semantic Processing.
12. SOLR.APACHE.ORG
The Highly Scalable Search Server.
• Based on Apache Lucene
• Various language specific processing procedures
• Highly scalable (Solr cloud) and highly configurable
• Ultra-fast indexing/searching; indexes can be merged/optimised
• Semantic Search available with an easy-to-install
Redlink Plugin
13. DEV.REDLINK.IO/PLUGINS/SOLR
Adding Semantic Search to Apache Solr.
• Boost your existing Apache Solr installation with
semantic enhancements via Redlink Content Analysis
• Watch the screencast
• Learn more
• Customising the semantic enhancements
with user-created vocabularies and Redlink NLP extraction
facilities
14. Managing vocabularies.
Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data
• Build your first app
• Learn more
• Redlink allows users to create their own Linked Data server for
managing vocabularies or publishing datasets for Linked (Open)
Data projects
• Datasets managed with Redlink can
be made available for content
analysis and linking
• Datasets can be either private (Linked
Enterprise Data) or public (Linked
Open Data)
• Public Datasets such as DBpedia, Freebase and
GeoNames are available for de-referencing and interlinking
15. • Read-Write Linked Data
• Triple store with transactions, versioning
and rule-based reasoning
• SPARQL and LDPath query languages
• Transparent Linked Data Caching
• Graduated as a Top-Level Project of the Apache
Software Foundation in November 2013
MARMOTTA.APACHE.ORG
The Open Platform for Linked Data.
16. An Open Linked Data Project
for Tourism in Salzburg
• Cross-platform publishing, as more and more travellers
use mobile devices
• Multiple Web CMSs (both proprietary and open source) to be
managed simultaneously
• Costly manual curation and interlinking
• Increasing demand for content syndication (from big players like
foursquare as well as from local application developers)
• Need for better SEO especially for events and sites (too regional to
be understood by commercial search engines)
17. Remixing existing content and creating new value.
A magazine
running on WordPress
An online
booking system
freshly updated content
on locations and events
a database containing:
events, facilities, accommodations, …
Everything we know already
from Wikipedia
the World’s largest
encyclopedia
Using Linked Data to make sense of the information
18. Linked Data Publishing
• Data from the online booking system (Feratel) is enriched and transformed
into triples using identified vocabularies and ontologies
• Triples are stored in the Redlink triple store in a dedicated context
• RDF data and SPARQL endpoints are published as Linked Open Data on the
data website (data.salzburgerland.com), which runs CKAN
• CKAN makes the data accessible to third parties in various formats by
querying Redlink
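As an illustration of how a third party might query published data, the sketch below builds a SPARQL 1.1 Protocol "query via GET" request with the Python standard library. The endpoint URL is hypothetical (the slides name the data website but not the path of its SPARQL endpoint), and the query is a generic placeholder rather than one tied to the actual dataset.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real SPARQL path is not given in the slides.
ENDPOINT = "http://data.example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# To run it for real, pass `request` to urllib.request.urlopen()
# against an endpoint that actually exists.
```

Keeping request construction separate from the network call makes the sketch easy to inspect without contacting any server.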
21. Using LODE: An ontology for
Linking Open Descriptions of
Events
Adding the relationships
between things
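Relationships between things are expressed as subject, predicate, object statements. The sketch below models such statements in Python and serialises them as N-Triples. The event URI under example.org is hypothetical, and linking the event to its place with the LODE atPlace property is an assumption about how the vocabulary would be applied here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str      # URI of the thing being described
    predicate: str    # URI of the relationship
    obj: str          # URI or plain literal
    obj_is_uri: bool  # True when obj is a URI, False for a literal


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


EVENT = "http://example.org/event/florianifeier"  # hypothetical URI

triples = [
    Triple(EVENT, "http://www.w3.org/2000/01/rdf-schema#label",
           "Florianifeier", False),
    # Assumed use of the LODE vocabulary to link the event to its place.
    Triple(EVENT, "http://linkedevents.org/ontology/atPlace",
           "http://dbpedia.org/resource/Unternberg", True),
]

for triple in triples:
    print(to_ntriples(triple))
```

Because the object of the second triple is itself a URI, a consumer can follow it to find more data about the place.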
22. Florianifeier
With RDF, different data sources are integrated to provide
robot-friendly information that describes real-world things
<subject> <predicate> <object>
23. Semantic Lifting and
Linked Data Principles
• A “word” or “phrase” becomes an
identifier used to denote
“things” (named entities) existing in
the real world
1. Real-world things are
unambiguously represented with
web addresses (URIs)
2. Accessing these web addresses
(HTTP URIs) returns usable data
in standard formats (RDF,
SPARQL)
3. This data includes links to other
data so that people can discover
more things
"label": "May",
"reference": "http://dbpedia.org/resource/May"
Type: Thing
"values": ["13.7446"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["47.10222"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"reference": "http://dbpedia.org/page/Unternberg"
Type: Place
"label": "Florianifeier",
"reference": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html"
Type: Event
Tim Berners-Lee.
LANGUAGE   EVENT           THING   LOCATION
ENGLISH    FLORIANIFEIER   MAY     UNTERNBERG
[Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofer]
“This May don't miss the
Florianifeier, we'll have fun
as usual in Unternberg”
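Semantic lifting of this sentence can be approximated with a simple gazetteer lookup. The sketch below maps the three surface forms to the types and references shown on the slide; the function name and the annotation layout are assumptions, and a real service would use NLP models and disambiguation rather than exact string matching.

```python
import re

# Surface form -> (type, reference), taken from the annotations on the slide.
GAZETTEER = {
    "May": ("Thing", "http://dbpedia.org/resource/May"),
    "Unternberg": ("Place", "http://dbpedia.org/page/Unternberg"),
    "Florianifeier": (
        "Event",
        "http://rdf.salzburgerland.com/events/event/"
        "dea7fde1-5583-4002-97eb-0074a182fa9c.html",
    ),
}


def lift(text: str) -> list:
    """Return entity annotations found in text, ordered by position."""
    annotations = []
    for label, (entity_type, reference) in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            annotations.append({
                "label": label,
                "type": entity_type,
                "reference": reference,
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(annotations, key=lambda a: a["start"])


text = ("This May don't miss the Florianifeier, "
        "we'll have fun as usual in Unternberg")
annotations = lift(text)
for a in annotations:
    print(a["start"], a["label"], a["type"], a["reference"])
```

Exact matching cannot tell the month "May" from the modal verb "may" at the start of a sentence, which is one reason real systems combine lookup with linguistic context.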
24. Dynamic Semantic Publishing with WordLift
• Data from the Redlink triple store is made available for content enrichment
and can be edited using WordLift, a semantic plugin for WordPress.
25. Data Curation
• Using Linked Data, the Web
becomes my new CMS
• Information is automatically
imported into WordPress
• Posts are connected with
entities
• Properties for each entity can
be edited using WordPress
• Any change is automatically
reflected in the triple store and
re-published as Open Data
Using Linked Data and WordLift the Web becomes your new CMS.
editing a blog post
editing an entity
26. Web Search on google.at
19,900 results
no answer
Tourism applications attempting to discover events in Salzburgerland.
“Which events occur in May in Lungau?”
Linked Open Data Query
5 results
5 answers
Unternberg is a village in the Lungau area.
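The reason a Linked Data query can answer this question is that the event, its place and the region are connected by explicit statements. The sketch below shows the idea over a handful of in-memory triples. The predicate names, the month assigned to each event and the second event are all hypothetical, chosen only to make the filtering visible.

```python
# (subject, predicate, object) statements; short names stand in for URIs.
TRIPLES = {
    ("Florianifeier", "type", "Event"),
    ("Florianifeier", "inMonth", "May"),
    ("Florianifeier", "atPlace", "Unternberg"),
    ("Unternberg", "locatedIn", "Lungau"),
    # Hypothetical extra event, added so the filter has something to exclude.
    ("Example Concert", "type", "Event"),
    ("Example Concert", "inMonth", "May"),
    ("Example Concert", "atPlace", "Salzburg"),
}


def subjects(predicate: str, obj: str) -> set:
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject: str, predicate: str) -> set:
    """All objects stated for the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def events_in(month: str, region: str) -> list:
    """Events in the given month whose place lies in the given region."""
    places = subjects("locatedIn", region)
    events = subjects("type", "Event") & subjects("inMonth", month)
    return sorted(e for e in events if objects(e, "atPlace") & places)


print(events_in("May", "Lungau"))
```

A keyword search has no access to the statement that links Unternberg to Lungau, whereas the structured query follows it directly.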
27. Better SEO using
Semantic Markup
Florianifeier
Unternberg
• Using schema.org the data
from the triple-store is added
to the pages as semantic
markup
• Search engines can finally
“recognise” entities that were
previously unknown (e.g.
Florianifeier)
WordLift
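One way to picture the semantic markup step is to generate a schema.org description of the event from data held in the triple store. The sketch below emits JSON-LD, which is only one of the syntaxes schema.org supports; the slides do not say which syntax was used, and the function name is illustrative. The coordinates are the ones shown earlier for Unternberg.

```python
import json


def event_jsonld(name: str, place_name: str,
                 latitude: float, longitude: float) -> str:
    """Serialise a schema.org Event with its location as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "location": {
            "@type": "Place",
            "name": place_name,
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": latitude,
                "longitude": longitude,
            },
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


markup = event_jsonld("Florianifeier", "Unternberg", 47.10222, 13.7446)
print('<script type="application/ld+json">')
print(markup)
print("</script>")
```

A real page would normally also carry a start date for the event; it is left out here because the slides do not give one.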
28. • Analyse media in a cross-media context,
covering media resources as well as
connected content, including video, images,
audio, text, link structure and metadata;
• Investigate cross-media analysis along the
complete, distributed analysis chain, namely
extraction, metadata publishing, querying
and recommendations;
• Contribute the project's main software development
results as open source components to two
established Apache projects, Apache
Marmotta and Apache Stanbol, simplifying
the use of the technology in industrial
products.
What do we want computers to do with Media?
MICO-PROJECT.EU
29. “Show me the tempo-regional fragments where
Lewis Jones is right beside Connor Macfarlane?”
MICO-PROJECT.EU
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT (mm:boundingBox(?l1,?l2) AS ?left_right)
WHERE {
  ?f1 ma:locator ?l1; dct:subject ?p1.
  ?p1 foaf:name "Lewis Jones".
  ?f2 ma:locator ?l2; dct:subject ?p2.
  ?p2 foaf:name "Connor Macfarlane".

  FILTER mm:rightBeside(?l1,?l2)
  FILTER mm:temporalOverlaps(?l1,?l2)
}
We want computers to process media.
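To give a feel for what the spatial and temporal functions in the query decide, here is a sketch of comparable checks on simple media fragments. It is not the SPARQL-MM implementation: the Fragment class, the reading of "right beside" as "entirely to the right of", and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    x: float      # left edge of the region, in pixels
    y: float      # top edge of the region, in pixels
    w: float      # region width
    h: float      # region height
    start: float  # start time, in seconds
    end: float    # end time, in seconds


def right_beside(a: Fragment, b: Fragment) -> bool:
    """Assumed reading: a starts at or beyond the right edge of b."""
    return a.x >= b.x + b.w


def temporal_overlaps(a: Fragment, b: Fragment) -> bool:
    """True when the two time intervals share some duration."""
    return a.start < b.end and b.start < a.end


def bounding_box(a: Fragment, b: Fragment) -> tuple:
    """Smallest (x, y, w, h) box that covers both regions."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return (x0, y0, x1 - x0, y1 - y0)


# Made-up fragments for the two people named in the query.
lewis = Fragment(x=300, y=50, w=100, h=200, start=10.0, end=20.0)
connor = Fragment(x=100, y=60, w=120, h=200, start=15.0, end=25.0)

if right_beside(lewis, connor) and temporal_overlaps(lewis, connor):
    print(bounding_box(lewis, connor))
```

In the query, the two filters play the role of the `if` condition and the selected expression plays the role of `bounding_box`.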
31. CREDITS
This presentation is the result of many inspiring ideas and amazing work from
other people, and here is the list:
Andrew Ng, 2011
Jurafsky & Martin, 2008
Webscale IA using Linked Open Data, on SlideShare by reduxd
LODE: Linking Open Descriptions of Events (ASWC 2009), on
SlideShare by Raphael Troncy
Semantic SEO in the post-Hummingbird era, on SlideShare by Kim
Renberg and Andrea Volpini
Querying of metadata, media content and context in MICO, a
demo by Thomas Kurz
Any idea, graphic or meme belonging to us is available
for sharing, copying and re-mixing under a
Creative Commons 3.0 license.