A framework for knowledge extraction, linked data and semantic search.
What do we want computers to do for us?
We have data.
• From 2005 to 2020, the digital universe will grow in
size by a factor of 300, from 130 exabytes to 40 trillion
gigabytes (40 ZB).

• From now until 2020, the digital universe will roughly
double every two years.

• Volumes of data are projected to reach 5,247 GB per
person, with emerging economies playing an
increasingly important role (producing two thirds of
the world's data by the end of this decade).

• Only 0.5% of this data is used today for analysis. 

• The amount of information individuals create
themselves - writing documents, taking pictures,
recording audio - is far less than the information
being created about them in the digital universe.
[IDC iView, 2012]
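The growth figures above can be cross-checked with a little arithmetic. The sketch below assumes constant exponential growth over the whole period (a simplification that is not stated on the slide) and derives the doubling time implied by a 300-fold increase between 2005 and 2020.

```python
import math

growth_factor = 300   # total growth quoted for 2005-2020
years = 2020 - 2005   # 15 years

# Under constant exponential growth: growth_factor = 2 ** (years / doubling_time)
doubling_time = years * math.log(2) / math.log(growth_factor)

# Equivalent compound annual growth rate
annual_rate = growth_factor ** (1 / years) - 1

print(f"implied doubling time: {doubling_time:.2f} years")
print(f"implied annual growth: {annual_rate:.0%}")
```

The implied doubling time comes out a little under two years, which is consistent with the "double every two years" statement.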
What do we want computers to do for us?
Text
Images/Video
Audio
"language": "de"
Categorisation,
Summarisation,
Search,
Question/Answer,
…
"label": "outdoor"
Suggest tags,
Image search,
…
Automatic Speech
Recognition,
Speaker identification,
Music classification,
…
[Andrew Ng, 2011]
We want computers to process data.
Natural Language
Processing
We use it every day.
[Jurafsky & Martin, 2008]
a theoretically motivated range of
computational techniques for
analysing naturally occurring
text/speech for the purpose of
achieving human-like language
processing.
Feature extraction in text/speech.
Levels of knowledge encoding in language data.
INPUT
Morphological
Syntactic
Semantic
FEATURES
NLP
{
Parser
Lexical DB
Stemming
Anaphora
POS Tagging
NER
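To make the idea of feature extraction concrete, here is a deliberately tiny sketch of two of the steps named above, tokenisation and stemming. The suffix list and both function names are illustrative assumptions. A real pipeline would use a proper stemmer, a POS tagger and an NER component rather than this toy.

```python
import re

# Toy suffix list for illustration only; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("Computers processing linked data")
stems = [stem(t) for t in tokens]
print(list(zip(tokens, stems)))
```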
TEXT
NLP
FEATURES
WISDOM
What do we want computers to do with a text?
STRUCTURED
DATA
CONTEXT
We want computers to make sense of unstructured data.
KNOWLEDGE
{
Semantic Lifting
TEXT WISDOM
A practical example.
CONTEXT
Combining Semantic Web technologies with NLP technologies.
KNOWLEDGE
Lucoli
"label": "Lucoli"
"values": ["13.338889"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["42.29194444444445"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"values": [ (images) ], "predicate": "http://xmlns.com/foaf/0.1/depiction"
About a 20-minute
drive from L’Aquila.
…
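The annotation fragments above are enough to place the entity on a map. The following sketch assumes the fragments are gathered into one dictionary with a `properties` list. That layout and the `first_value` helper are assumptions made for illustration, not the documented response format of any service.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Entity annotation shaped like the fragments on the slide.
# The surrounding structure ("properties") is an assumption.
entity = {
    "label": "Lucoli",
    "properties": [
        {"predicate": GEO + "long", "values": ["13.338889"]},
        {"predicate": GEO + "lat", "values": ["42.29194444444445"]},
    ],
}

def first_value(entity, predicate):
    """Return the first value recorded for a predicate, or None."""
    for prop in entity["properties"]:
        if prop["predicate"] == predicate and prop["values"]:
            return prop["values"][0]
    return None

lat = float(first_value(entity, GEO + "lat"))
lon = float(first_value(entity, GEO + "long"))
print(entity["label"], lat, lon)
```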
How we started.
Building an open platform for
knowledge extraction, linked data
and semantic search.

Delivering the world’s most
advanced open source
content analysis and making
linked data publishing and
information discovery accessible
to anyone.
• Incorporating requirements from industry partners:
• CMS companies
• System integrators
• Tool providers
• Inheriting 6 years of IP with R&D on:
• Semantic Information Management and
Publishing (RDF and Semantic Web Technology)
• Semantic Processing
• Conceptual Search
Technology Stack
1. CONTENT ANALYSIS
2. CONTENT DISCOVERY
3. LINKED DATA PUBLISHING
Linked Data Cloud
Text
Legacy Data
Audio/Images (under development)
• Enterprise
Linked Data
• Content
Enhancement
• Semantic Search
• Semantic enhancement process chaining
• Multiple NLP feature extraction facilities
• Multiple language support
• Content classification and sentiment analysis
• Graduated as a Top-Level Project of the Apache
Software Foundation in September 2012
STANBOL.APACHE.ORG
A Toolbox for Semantic Processing.
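As a rough illustration of how content is sent for semantic enhancement, the sketch below builds an HTTP POST for a Stanbol enhancer endpoint using only the Python standard library. The URL `http://localhost:8080/enhancer` assumes a locally running default launcher, and `build_enhancer_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import urllib.request

# Assumed address of a locally running Stanbol launcher.
STANBOL_ENHANCER = "http://localhost:8080/enhancer"

def build_enhancer_request(text, endpoint=STANBOL_ENHANCER):
    """Build (but do not send) a POST carrying plain text to be enhanced."""
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",  # ask for the enhancements as RDF (Turtle)
        },
        method="POST",
    )

request = build_enhancer_request("Lucoli is a village near L'Aquila.")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to a running server.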
SOLR.APACHE.ORG
The Highly Scalable Search Server.
• Based on Apache Lucene
• Various language specific processing procedures
• Highly scalable (Solr cloud) and highly configurable
• Ultra fast indexing/searching, indexes can be merged/
optimised
• Semantic Search available with an easy-to-install
Redlink Plugin
DEV.REDLINK.IO/PLUGINS/SOLR
Adding Semantic Search to Apache Solr.
• Boost your existing Apache Solr installation with
semantic enhancements via Redlink Content Analysis
• Watch the screencast
• Learn more
• Customising the semantic enhancements
with user-created vocabularies and Redlink NLP extraction
facilities
Managing vocabularies.
Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data
• Build your first app
• Learn more
• Redlink allows users to create their own Linked Data server for
managing vocabularies or publishing datasets for Linked (Open)
Data projects
• Datasets managed with Redlink can
be made available for content
analysis and linking
• Datasets can be either private (Linked
Enterprise Data) or public (Linked
Open Data)
• Public Datasets such as DBpedia, Freebase and
GeoNames are available for de-referencing and interlinking
• Read-Write Linked Data
• Triple store with transactions, versioning
and rule-based reasoning
• SPARQL and LDPath query languages
• Transparent Linked Data Caching
• Graduated as a Top-Level Project of the Apache
Software Foundation in November 2013
MARMOTTA.APACHE.ORG
The Open Platform for Linked Data.
An Open Linked Data Project
for Tourism in Salzburg
• Cross-platform publishing, as more and more travellers
use mobile devices
• Multiple Web CMSs (both proprietary and open source) to be
managed simultaneously
• Costly manual curation and interlinking
• Increasing demand for content syndication (from big players like
foursquare as well as from local application developers)
• Need for better SEO especially for events and sites (too regional to
be understood by commercial search engines)
Remixing existing content and creating new value.
A magazine
running on WordPress
An online
booking system
freshly updated content
on locations and events
a database containing:
events, facilities, accommodations, …
Everything we know already
from Wikipedia
the World’s largest
encyclopedia
Using Linked Data to make sense of the information
Linked Data Publishing
• Data from the online booking system (Feratel) is enriched and transformed
into triples using identified vocabularies and ontologies
• Triples are stored in the Redlink triple store in a dedicated context
• RDF data and SPARQL endpoints are published to the data website
(data.salzburgerland.com) running CKAN as Linked Open Data
• CKAN makes the data accessible to third parties in various formats by
querying Redlink
Transforming Feratel Data
into Semantic Knowledge
from SOAP to Linked Data
Ontologies provide a means
to hold everything together
Data Modelling with LODE
Using LODE: An ontology for
Linking Open Descriptions of
Events
Adding the relationships
between things
Florianifeier
With RDF, different data sources are integrated to provide
robot-friendly information that describes real-world things
<subject><predicate><object>
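The subject/predicate/object pattern can be sketched with plain tuples. In the example below the event identifier `urn:example:florianifeier` is hypothetical, and the `atPlace` property URI is given as an assumption about the LODE namespace. The coordinates and the place reference are the ones shown in this deck.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
AT_PLACE = "http://linkedevents.org/ontology/atPlace"  # assumed LODE property URI
EVENT = "urn:example:florianifeier"                    # hypothetical identifier
PLACE = "http://dbpedia.org/page/Unternberg"

graph = [
    Triple(EVENT, RDFS_LABEL, "Florianifeier"),
    Triple(EVENT, AT_PLACE, PLACE),
    Triple(PLACE, GEO + "lat", "47.10222"),
    Triple(PLACE, GEO + "long", "13.7446"),
]

def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# Where does the event take place, and what do we know about that place?
place = match(graph, subject=EVENT, predicate=AT_PLACE)[0].obj
print(match(graph, subject=place))
```

Because the event and the place share the same identifier for Unternberg, facts from two different sources join up without any custom mapping code.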
Semantic Lifting and
Linked Data Principles
• A “word” or “phrase” becomes an
identifier used to denote
“things” (named entities) existing in
the real world

1. Real-world things are
unambiguously represented with
web addresses (URIs)
2. Accessing these web addresses
(HTTP URIs) returns usable data
in standard formats (RDF,
SPARQL)
3. This data includes links to other
data so that people can discover
more things
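The second principle is usually realised through HTTP content negotiation: the client asks for an RDF serialisation instead of an HTML page. The sketch below only builds such a request with the standard library. The helper name `build_rdf_request` is hypothetical, and whether a given server honours the `Accept` header is an assumption to verify against the actual service.

```python
import urllib.request

def build_rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET that asks for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")

request = build_rdf_request("http://dbpedia.org/resource/May")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Sending the request with `urllib.request.urlopen` and following the links found in the returned data is what the third principle relies on.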
"label": "May",
"reference": "http://dbpedia.org/resource/May"
Type: Thing
"values": ["13.7446"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["47.10222"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"reference": "http://dbpedia.org/page/Unternberg"
Type: Place
"label": "Florianifeier",
"reference": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html"
Type: Event
Tim Berners-Lee.
LANGUAGE EVENT THING LOCATION
ENGLISH FLORIANIFEIER MAY UNTERNBERG
[Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofer]
“This May don't miss the
Florianifeier, we'll have fun
as usual in Unternberg”
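Semantic lifting of this sentence can be approximated with a lookup table that maps surface forms to entity references. The sketch below is a minimal gazetteer matcher, not the approach used by any particular product. The `GAZETTEER` table and the `annotate` function are illustrative names, while the types and references are the ones listed in this deck.

```python
import re

# Surface form -> (type, reference), using the references shown in the deck.
GAZETTEER = {
    "May": ("Thing", "http://dbpedia.org/resource/May"),
    "Florianifeier": (
        "Event",
        "http://rdf.salzburgerland.com/events/event/"
        "dea7fde1-5583-4002-97eb-0074a182fa9c.html",
    ),
    "Unternberg": ("Place", "http://dbpedia.org/page/Unternberg"),
}

def annotate(text):
    """Return (offset, label, type, reference) for every gazetteer hit."""
    hits = []
    for label, (entity_type, reference) in GAZETTEER.items():
        # Case-sensitive whole-word match, so the verb "may" is not linked.
        for m in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            hits.append((m.start(), label, entity_type, reference))
    return sorted(hits)

sentence = ("This May don't miss the Florianifeier, "
            "we'll have fun as usual in Unternberg")
for offset, label, entity_type, reference in annotate(sentence):
    print(offset, label, entity_type, reference)
```

A lookup like this cannot disambiguate on its own (a person named May would be linked to the month), which is where the NLP features and the linked data context come in.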
Dynamic Semantic Publishing with WordLift
• Data from the Redlink triple store is made available for content enrichment
and can be edited using WordLift, a semantic plugin for WordPress.
Data Curation
• Using Linked Data the Web
becomes my new CMS

• information is automatically
imported in WordPress

• posts are connected with
entities

• properties for each entity can
be edited using WordPress

• any change is automatically
reflected in the triple-store and
re-published as Open Data
Using Linked Data and WordLift the Web becomes your new CMS.
editing a blog post
editing an entity
Touristic applications attempting to discover events in Salzburgerland.
“Which events occur in May in Lungau?”
Web Search on google.at: 19,900 results, no answer
Linked Open Data Query: 5 results, 5 answers
Unternberg is a village in the area of Lungau.
Better SEO using
Semantic Markup
Florianifeier
Unternberg
• Using schema.org the data
from the triple-store is added
to the pages as semantic
markup
• Search engines can finally
“recognise” entities that were
previously unknown (i.e.
Florianifeier)
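One way to express such markup is schema.org JSON-LD embedded in the page. The deck does not say which syntax was used, so the choice of JSON-LD here is an assumption, and `event_jsonld` is a hypothetical helper. The name, place and coordinates are the ones shown earlier in the deck, and no date is included because none is given.

```python
import json

def event_jsonld(name, place_name, latitude, longitude):
    """Build a schema.org Event description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "location": {
            "@type": "Place",
            "name": place_name,
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": latitude,
                "longitude": longitude,
            },
        },
    }

markup = event_jsonld("Florianifeier", "Unternberg", 47.10222, 13.7446)
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(markup, ensure_ascii=False)
    + "</script>"
)
print(script_tag)
```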
WordLift
• Study media in a cross-media context, analysing
media resources together with their
connected content, including video, images,
audio, text, link structure and metadata;
• Investigate cross-media analysis along the
complete, distributed analysis chain, namely
extraction, metadata publishing, querying
and recommendations;
• Contribute its main software development
results as open source components to two
established Apache projects, Apache
Marmotta and Apache Stanbol, simplifying
the use of the technology in industrial
products.
What do we want computers to do with Media?
MICO-PROJECT.EU
“Show me the tempo-regional fragments where
Lewis Jones is right beside Connor Macfarlane?”
MICO-PROJECT.EU
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT (mm:boundingBox(?l1,?l2) AS ?left_right)
WHERE {
  ?f1 ma:locator ?l1; dct:subject ?p1.
  ?p1 foaf:name "Lewis Jones".
  ?f2 ma:locator ?l2; dct:subject ?p2.
  ?p2 foaf:name "Connor Macfarlane".

  FILTER mm:rightBeside(?l1,?l2)
  FILTER mm:temporalOverlaps(?l1,?l2)
}
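To give an intuition for what the two filters and the bounding-box function in the query might compute, here is a toy interpretation over rectangular, time-bounded fragments. The `Fragment` class, the three functions and all coordinates are hypothetical and chosen for illustration only. They are not the SPARQL-MM definitions, whose exact semantics should be taken from that specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A rectangular region of a video, visible between start and end (seconds)."""
    x: float
    y: float
    w: float
    h: float
    start: float
    end: float

def right_beside(a, b):
    """Toy reading: a starts at or past b's right edge and they share some height."""
    return a.x >= b.x + b.w and a.y < b.y + b.h and b.y < a.y + a.h

def temporal_overlaps(a, b):
    """True if the two time intervals intersect."""
    return a.start < b.end and b.start < a.end

def bounding_box(a, b):
    """Smallest (x, y, w, h) rectangle covering both regions."""
    left, top = min(a.x, b.x), min(a.y, b.y)
    right = max(a.x + a.w, b.x + b.w)
    bottom = max(a.y + a.h, b.y + b.h)
    return (left, top, right - left, bottom - top)

# Hypothetical fragments for the two people named in the query.
lewis = Fragment(x=300, y=50, w=100, h=200, start=10, end=20)
connor = Fragment(x=100, y=60, w=120, h=190, start=15, end=30)

if right_beside(lewis, connor) and temporal_overlaps(lewis, connor):
    print(bounding_box(lewis, connor))
```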
We want computers to process media.
GRAZIE!
foaf:name
“Andrea Volpini"
Hopefully
soon in the
Linked
Data
Cloud!
CREDITS
This presentation is the result of many inspiring ideas and amazing work from
other people. Here is the list:

Andrew Ng, 2011

Jurafsky & Martin, 2008

Webscale IA using Linked Open Data, on SlideShare, by reduxd

LODE: Linking Open Descriptions of Events (ASWC 2009), on
SlideShare, by Raphael Troncy

Semantic SEO in the post-Hummingbird era, on SlideShare, by Kim
Renberg and Andrea Volpini

Querying of metadata, media content and context in MICO, a
demo by Thomas Kurz

Any idea, graphics or meme belonging to us is available
for sharing, copying and re-mixing under a
Creative Commons 3.0 license.

What do we want computers to do for us?

  • 1. A framework for knowledge extraction, linked data and semantic search. What do we want computers to do for us?
  • 2. We have data. • From 2005 to 2020, the digital universe will grow in size by a factor of 300, from 30 exabytes to 40 trillion gigabyte (40 ZB). • From now until 2020, the digital universe will about double every two years. • Volumes of data are projected to reach 5.247 GB per person with emerging economies playing an increasingly important role (producing two thirds of the world data by the end of this decade). • Only 0.5% of this data is used today for analysis. • The amount of information individuals create themselves - writing documents, taking pictures, recording audio - is far less than the information being created about them in the digital universe. [IDC I V I E W, 2012]
  • 3. What do we want computers to do for us? Text Images/Video Audio "language": "de" Categorisation, Summarisation, Search, Question/Answer, … "label": "outdoor" Suggest tags, Image search, … Automatic Speech Recognition, Speaker identification, Music classification, … [Andrew NG, 2011]We want computers to process data.
  • 4. Natural Language Processing We use it everyday. [J U RAFSKY & MARTIN, 2008] a theoretically motivated range of computational techniques for analysing naturally occurring text/speech for the purpose of achieving human-like language processing.
  • 5. Features extraction in text/speech. Levels of knowledge encoding in language data. INPUT Morphologic Syntactic Semantic FEATURES NLP { Parser Lexical DB Stemming AnaphoraPos Tagging NER
  • 6. TEXT NLP FEATURES WISDOM What do we want computers to do with a text? STRUCTURED DATA CONTEXT We want computers to make sense of unstructured data. KNOWLEDGE { Semantic Lifting
  • 7. TEXT WISDOM A practical example. CONTEXT Combining Semantic Web technologies with NLP technologies. KNOWLEDGE Lucoli "label": "Lucoli" "values": ["13.338889"], "predicate": "http:// www.w3.org/2003/01/ geo/wgs84_pos#long" "values": ["42.29194444444445"], "predicate": "http://www.w3.org/2003/01/geo/ wgs84_pos#lat" "values": [ ! ! ! ! ! ], "predicate": "http:// xmlns.com/foaf/0.1/ depiction" About 20 minutes car drive from L’Aquila. …
  • 8. How we started. Building an open platform for knowledge extraction, linked data and semantic search. ! Delivering the world’s most advanced open source content analysis and making linked data publishing and information discovery accessible to anyone.
  • 9. • Incorporating requirements from industry partners: • CMS companies • System integrators • Tool providers • Inheriting 6 years of IP with R&D on: • Semantic Information Management and Publishing (RDF and Semantic Web Technology) • Semantic Processing • Conceptual Search
  • 10. CONTENT ANALYSIS LINKED DATA PUBLISHING 1 3 Linked Data Cloud Technology Stack Text Legacy Data Audio/Images (under development) CONTENT DISCOVERY2 • Enterprise Linked Data • Content Enhancement • Semantic Search
  • 11. • Semantic enhancement process chaining • Multiple NLP features extraction facilities • Multiple language support • Content classification and sentiment analysis • Graduated as Top Level Project of the Apache Foundation in September 2012 STANBOL.APACHE.ORG A Toolbox for Semantic Processing.
  • 12. SOLR.APACHE.ORG The Highly Scalable Search Server. • Based on Apache Lucene • Various language specific processing procedures • Highly scalable (Solr cloud) and highly configurable • Ultra fast indexing/searching, indexes can be merged/ optimised • Semantic Search available with an easy-to-install Redlink Plugin
  • 13. DEV.REDLINK.IO/PLUGINS/SOLR Adding Semantic Search to Apache Solr. • Boost your existing Apache Solr installation with semantic enhancements via Redlink Content Analysis • Watch the screencast • Learn more• Customising the semantic enhancements with user-created vocabularies and Redlink NLP extraction facilities
  • 14. Managing vocabularies. Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data • Build your first app • Learn more • Redlink allows users to create their own Linked Data server for managing vocabularies or publishing datasets for Linked (Open) Data projects • Datasets managed with Redlink can be made available for content analysis and linking • Datasets can be either private (Linked Enterprise Data) or public (Linked Open Data) ! • Public Datasets such as DBpedia, Freebase and GeoNames are available for de-referencing and interlinking
  • 15. • Read-Write Linked Data • Triple store with transactions, versioning and rule-based reasoning • SPARQL and LDPath query languages • Transparent Linked Data Caching • Graduated as a Top-Level Project of the Apache Software Foundation in November 2013 MARMOTTA.APACHE.ORG The Open Platform for Linked Data.
  • 16. An Open Linked Data Project for Tourism in Salzburg • Cross-platform publishing, as travellers increasingly use mobile devices • Multiple Web CMSs (both proprietary and open source) to be managed simultaneously • Costly manual curation and interlinking • Increasing demand for content syndication (from big players like foursquare as well as from local application developers) • Need for better SEO, especially for events and sites (too regional to be understood by commercial search engines)
  • 17. Remixing existing content and creating new value. • A magazine running on WordPress: freshly updated content on locations and events • An online booking system: a database containing events, facilities, accommodations, … • Wikipedia, the world’s largest encyclopedia: everything we know already • Using Linked Data to make sense of the information
  • 18. Linked Data Publishing • Data from the online booking system (Feratel) is enriched and transformed into triples using identified vocabularies and ontologies • Triples are stored in the Redlink triple store in a dedicated context • RDF data and SPARQL endpoints are published as Linked Open Data on the data website (data.salzburgerland.com), which runs CKAN • CKAN makes the data accessible to third parties in various formats by querying Redlink
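The transformation step on this slide (a booking-system record becoming triples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Redlink or Feratel pipeline: the record layout, the `to_ntriples` helper and the `example.org` base URI are hypothetical, while the LODE, Dublin Core Terms and RDF terms are existing vocabulary terms.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LODE = "http://linkedevents.org/ontology/"
DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/events/"  # hypothetical base URI, not the real dataset


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(record):
    """Turn one (hypothetical) booking-system record into N-Triples lines."""
    subject = f"<{BASE}{record['id']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{LODE}Event> .",
        f'{subject} <{DCT}title> "{escape_literal(record["title"])}" .',
        f"{subject} <{LODE}atPlace> <{record['place_uri']}> .",
    ]


# Hypothetical record; the field names are assumptions for illustration.
record = {
    "id": "florianifeier",
    "title": "Florianifeier",
    "place_uri": "http://dbpedia.org/resource/Unternberg",
}
triples = to_ntriples(record)
print("\n".join(triples))
```

Each output line is one subject, predicate, object statement, which is the shape a triple store ingests.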
  • 19. Transforming Feratel Data into Semantic Knowledge: from SOAP to Linked Data
  • 20. Ontologies provide a means to hold everything together. Data Modelling with LODE
  • 21. Using LODE: An ontology for Linking Open Descriptions of Events Adding the relationships between things
  • 22. Florianifeier with RDF: different data sources are integrated to provide robot-friendly information that describes real-world things. <subject> <predicate> <object>
  • 23. Semantic Lifting and Linked Data Principles • A “word” or “phrase” becomes an identifier used to denote “things” (named entities) existing in the real world 1. Real-world things are unambiguously represented with web addresses (URIs) 2. By accessing these web addresses (HTTP URIs), usable data is sent in return using standard formats (RDF, SPARQL) 3. This data includes links to other data so that people can discover more things (Tim Berners-Lee) • Example text: “This May don't miss the Florianifeier, we'll have fun as usual in Unternberg” [image: Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofer] • LANGUAGE: ENGLISH • THING: MAY, "label": "May", "reference": "http://dbpedia.org/resource/May", Type: Thing • EVENT: FLORIANIFEIER, "label": "Florianifeier", "reference": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html", Type: Event • LOCATION: UNTERNBERG, "reference": "http://dbpedia.org/page/Unternberg", Type: Place, "values": ["13.7446"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long", "values": ["47.10222"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
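The lifting step on this slide, where a word becomes an identifier, can be approximated with a dictionary lookup. The Python sketch below is a deliberately naive illustration, not the Stanbol or Redlink enhancement chain: the `GAZETTEER` table and the `lift` helper are hypothetical, and a real system would disambiguate using context rather than matching exact tokens. The labels, references and types are the ones shown on the slide.

```python
import re

# Hypothetical mini-gazetteer built from the three entities on the slide.
GAZETTEER = {
    "May": ("http://dbpedia.org/resource/May", "Thing"),
    "Florianifeier": (
        "http://rdf.salzburgerland.com/events/event/"
        "dea7fde1-5583-4002-97eb-0074a182fa9c.html",
        "Event",
    ),
    "Unternberg": ("http://dbpedia.org/page/Unternberg", "Place"),
}


def lift(text):
    """Return one annotation per token that matches a known entity label."""
    annotations = []
    for token in re.findall(r"\w+", text):
        if token in GAZETTEER:
            reference, entity_type = GAZETTEER[token]
            annotations.append(
                {"label": token, "reference": reference, "type": entity_type}
            )
    return annotations


sentence = (
    "This May don't miss the Florianifeier, "
    "we'll have fun as usual in Unternberg"
)
for annotation in lift(sentence):
    print(annotation["label"], "->", annotation["reference"])
```

Exact matching would also tag the modal verb "may" if it were capitalised at the start of a sentence, which is why production pipelines add part-of-speech tagging and disambiguation before linking.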
  • 24. Dynamic Semantic Publishing with WordLift • Data from the Redlink triple store is made available for content enrichment and can be edited using WordLift, a semantic plugin for WordPress.
  • 25. Data Curation • Using Linked Data, the Web becomes my new CMS • information is automatically imported into WordPress • posts are connected with entities • properties for each entity can be edited using WordPress • any change is automatically reflected in the triple store and re-published as Open Data. Using Linked Data and WordLift, the Web becomes your new CMS. (Screenshots: editing a blog post, editing an entity.)
  • 26. Touristic applications attempting to discover events in Salzburgerland. “Which events occur in May in Lungau?” • Web Search on google.at: 19,900 results, no answer • Linked Open Data Query: 5 results, 5 answers • Unternberg is a village in the area of Lungau
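The reason the Linked Open Data query can answer the question is that it joins facts from different sources: the event is in Unternberg, and Unternberg is in Lungau. The Python sketch below mimics that join over a tiny in-memory triple list. All data here is invented for illustration: the `ex:` predicates, the second event and the `events_in` helper are hypothetical, and a real deployment would send a SPARQL query to the published endpoint instead.

```python
# Hypothetical triples; prefixes are written as plain strings for brevity.
TRIPLES = [
    ("ex:florianifeier", "rdf:type", "lode:Event"),
    ("ex:florianifeier", "ex:month", "May"),
    ("ex:florianifeier", "lode:atPlace", "dbr:Unternberg"),
    ("dbr:Unternberg", "ex:locatedIn", "ex:Lungau"),
    ("ex:sample_market", "rdf:type", "lode:Event"),
    ("ex:sample_market", "ex:month", "October"),
    ("ex:sample_market", "lode:atPlace", "dbr:Salzburg"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def events_in(region, month):
    """Events held in `month` at a place that is located in `region`."""
    matches = []
    for subject, predicate, obj in TRIPLES:
        if predicate != "rdf:type" or obj != "lode:Event":
            continue
        in_month = month in objects(subject, "ex:month")
        in_region = any(
            region in objects(place, "ex:locatedIn")
            for place in objects(subject, "lode:atPlace")
        )
        if in_month and in_region:
            matches.append(subject)
    return matches


print(events_in("ex:Lungau", "May"))
```

A keyword search has no access to the "Unternberg is in Lungau" fact, which is the gap the slide illustrates.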
  • 27. Better SEO using Semantic Markup. Florianifeier, Unternberg • Using schema.org, the data from the triple store is added to the pages as semantic markup • Search engines can finally “recognise” entities that were previously unknown (e.g. Florianifeier) WordLift
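As an illustration of schema.org markup for this event, the Python sketch below builds a JSON-LD description and wraps it in a script tag. The slide does not say which syntax the project used (schema.org can also be expressed as Microdata or RDFa), so JSON-LD is an assumption, as is the `event_jsonld` helper. No `startDate` is included because the slide does not give the event date.

```python
import json


def event_jsonld(name, place_name, place_uri):
    """Describe an event with schema.org terms as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "location": {
            "@type": "Place",
            "name": place_name,
            "sameAs": place_uri,  # link to the Linked Data entity
        },
    }


markup = event_jsonld(
    "Florianifeier",
    "Unternberg",
    "http://dbpedia.org/resource/Unternberg",
)
OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"
script_tag = OPEN_TAG + json.dumps(markup, indent=2) + CLOSE_TAG
print(script_tag)
```

The `sameAs` link is what lets a search engine connect a locally known place to an entity it already understands.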
  • 28. What do we want computers to do with Media? MICO-PROJECT.EU • Media in cross-media context: analysing media resources together with their connected content, including video, images, audio, text, link structure and metadata • Investigate cross-media analysis along the complete, distributed analysis chain, namely extraction, metadata publishing, querying and recommendations • Contribute its main software development results as open source components to two established Apache projects, Apache Marmotta and Apache Stanbol, simplifying the use of the technology in industrial products
  • 29. We want computers to process media. MICO-PROJECT.EU “Show me the tempo-regional fragments where Lewis Jones is right beside Connor Macfarlane?” PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#> PREFIX ma: <http://www.w3.org/ns/ma-ont#> PREFIX dct: <http://purl.org/dc/terms/> SELECT (mm:boundingBox(?l1,?l2) AS ?left_right) WHERE { ?f1 ma:locator ?l1; dct:subject ?p1. ?p1 foaf:name "Lewis Jones". ?f2 ma:locator ?l2; dct:subject ?p2. ?p2 foaf:name "Connor Macfarlane". FILTER mm:rightBeside(?l1,?l2) FILTER mm:temporalOverlaps(?l1,?l2) }
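The two filters in this query can be read as geometric tests on media fragments. The Python sketch below is a rough approximation for intuition only: the function names mirror the query, but the slide does not define their exact semantics, so the definitions used here (time intervals that intersect, and one box starting at or beyond the right edge of the other) are assumptions. The `example.org` URIs are hypothetical; the `t=` and `xywh=` fragment syntax follows the W3C Media Fragments URI format.

```python
from urllib.parse import urlparse, parse_qs


def parse_fragment(uri):
    """Read the temporal (t) and spatial (xywh) parts of a media fragment URI."""
    params = parse_qs(urlparse(uri).fragment)
    start, end = (float(v) for v in params["t"][0].split(","))
    x, y, w, h = (float(v) for v in params["xywh"][0].split(","))
    return {"start": start, "end": end, "x": x, "y": y, "w": w, "h": h}


def temporal_overlaps(a, b):
    """Assumed meaning: the two time intervals share at least one instant."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def right_beside(a, b):
    """Assumed meaning: box a starts at or beyond the right edge of box b."""
    return a["x"] >= b["x"] + b["w"]


# Hypothetical fragments of the same video.
l1 = parse_fragment("http://example.org/video.mp4#t=10,20&xywh=200,0,100,100")
l2 = parse_fragment("http://example.org/video.mp4#t=15,25&xywh=0,0,100,100")
print(right_beside(l1, l2) and temporal_overlaps(l1, l2))
```

Only fragment pairs that pass both tests would be returned, which corresponds to applying the two FILTER clauses together.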
  • 31. CREDITS. This presentation is the result of many inspiring ideas and amazing work from other people; here is the list: • Andrew Ng, 2011 • Jurafsky & Martin, 2008 • “Webscale IA using Linked Open Data”, on SlideShare, by reduxd • “LODE: Linking Open Descriptions of Events” (ASWC 2009), on SlideShare, by Raphael Troncy • “Semantic SEO in the post-Hummingbird era”, on SlideShare, by Kim Renberg and Andrea Volpini • “Querying of metadata, media content and context in MICO”, a demo by Thomas Kurz. Any idea, graphic or meme belonging to us is available for sharing, copying and re-mixing under a Creative Commons 3.0 license.