From the Semantic Web to the Web of Data: ten years of linking up
1. from the Semantic Web to the Web of Data
ten years of linking up
Lugano 30-03-2010 Davide Palmisano - Fondazione Bruno Kessler
2. a short ToC
story of a buzzword
concepts and ideas behind it
Linked Data: four rules, billions of opportunities
the server side of the triple: Java and the Semantic Web
successes, failures and hopes
3. story of a buzzword
“To a computer, the Web is a
flat, boring world devoid
of meaning.”
“A new form of Web content that
is meaningful to computers will
unleash a revolution of new
possibilities”
“The Semantic Web is not a separate Web but an
extension of the current one, in which
information is given well-defined meaning, better enabling computers and people to work in cooperation.”
7. story of a buzzword
“Adding semantics to the web involves two things:
allowing documents which have information in
machine-readable forms, and allowing links to
be created with relationship values.”
8. story of a buzzword
typed objects and relationships
machine-readable content metadata
with shared semantics
The Web as a giant, global, decentralized database
11. concepts and ideas behind it
How do we represent knowledge?
the world’s academic communities have dealt
with knowledge representation for years
artificial intelligence, natural language
processing, model management and many
other research fields contributed heavily
some ancestors paved the way
12. concepts and ideas behind it
SHOE[1]
“SHOE is an extension to HTML which
allows authors to annotate their web pages
with machine-readable knowledge”
<USE-ONTOLOGY ID="cs-dept-ontology" VERSION="1.0" PREFIX="cs" URL="http://www.cs.umd.edu/projects/plus/SHOE/cs.html">
<CATEGORY NAME="cs.Professor" FOR="http://www.cs.umd.edu/users/hendler/">
<RELATION NAME="cs.member">
<ARG POS=1 VALUE="http://www.cs.umd.edu/projects/plus/">
<ARG POS=2 VALUE="http://www.cs.umd.edu/users/hendler/">
</RELATION>
<RELATION NAME="cs.name">
<ARG POS=2 VALUE="Dr. James Hendler">
</RELATION>
13. concepts and ideas behind it
John Sowa’s
Conceptual Graphs [2]
(...) they express meaning in a form that is logically
precise, humanly readable, and computationally
tractable (...)
[BOY] <- (AGNT) <- [WALK]
“boy walking”
14. concepts and ideas behind it
adapting such approaches to an
unpredictable,
decentralized,
potentially incoherent
environment such as the Web
has been the goal of a standardization effort
led mainly by the W3C
15. concepts and ideas behind it
Resource Description Framework (RDF)
cornerstone of the Semantic Web
technology stack
first published in 1999
directed, labeled
graphs as its data model
16. concepts and ideas behind it
everything is uniquely identifiable by
a Uniform Resource Identifier (URI)
a web page, a person, a book, an intangible thing
http://dpalmisano.myopenid.com
http://dbpedia.org/resource/Lugano
http://dbtune.org/myspace/coldplay
17. concepts and ideas behind it
relationships between things can be expressed
as a directed, labeled graph
where
nodes can be resources or XML Schema-typed values (literals)
and relationships are themselves identified by URIs
18. concepts and ideas behind it
http://dpalmisano.myopenid.com
http://sws.geonames.org/3165243/
19. concepts and ideas behind it
http://dpalmisano.myopenid.com --[http://xmlns.com/foaf/0.1/based_near]--> http://sws.geonames.org/3165243/
it’s an RDF triple
20. concepts and ideas behind it
http://dpalmisano.myopenid.com --[http://xmlns.com/foaf/0.1/based_near]--> http://sws.geonames.org/3165243/
http://sws.geonames.org/3165243/ --[http://www.geonames.org/ontology#name]--> "Trento"
21. concepts and ideas behind it
http://dpalmisano.myopenid.com --[http://xmlns.com/foaf/0.1/based_near]--> http://sws.geonames.org/3165243/
http://sws.geonames.org/3165243/ --[http://www.geonames.org/ontology#name]--> "Trento"
http://sws.geonames.org/3165243/ --[http://www.geonames.org/ontology#population]--> 104946
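The graph above can be written down one triple per line. The sketch below is a minimal, library-free Java model of RDF terms that prints the three triples in N-Triples syntax; the `Term`, `Iri`, `Literal` and `Triple` types are our own illustration, not the API of Jena or any other RDF library, and typing the population as `xsd:integer` is an assumption (the slide only shows the value).

```java
import java.util.List;

// Illustrative RDF term model (not the API of any RDF library).
interface Term {
    String toNTriples();
}

// An IRI node, written in N-Triples as <...>
record Iri(String value) implements Term {
    public String toNTriples() {
        return "<" + value + ">";
    }
}

// A literal node; datatype is null for a plain (untyped) string
record Literal(String lexical, String datatype) implements Term {
    public String toNTriples() {
        // Escape backslashes and double quotes, as N-Triples requires
        String escaped = lexical.replace("\\", "\\\\").replace("\"", "\\\"");
        String quoted = "\"" + escaped + "\"";
        return datatype == null ? quoted : quoted + "^^<" + datatype + ">";
    }
}

// In this sketch subjects and predicates are IRIs; objects may be IRIs or literals
record Triple(Iri subject, Iri predicate, Term object) {
    String toNTriples() {
        return subject.toNTriples() + " " + predicate.toNTriples() + " " + object.toNTriples() + " .";
    }
}

class SlideGraph {
    // ASSUMPTION: the population is typed as xsd:integer
    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    static List<Triple> triples() {
        Iri me = new Iri("http://dpalmisano.myopenid.com");
        Iri trento = new Iri("http://sws.geonames.org/3165243/");
        return List.of(
            new Triple(me, new Iri("http://xmlns.com/foaf/0.1/based_near"), trento),
            new Triple(trento, new Iri("http://www.geonames.org/ontology#name"),
                new Literal("Trento", null)),
            new Triple(trento, new Iri("http://www.geonames.org/ontology#population"),
                new Literal("104946", XSD_INTEGER)));
    }

    public static void main(String[] args) {
        triples().forEach(t -> System.out.println(t.toNTriples()));
    }
}
```

N-Triples is the simplest serialization: no prefixes, no nesting, one complete statement per line, which is why it is handy for streaming and for diffing RDF dumps.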
22. concepts and ideas behind it
http://dpalmisano.myopenid.com --[http://xmlns.com/foaf/0.1/based_near]--> http://sws.geonames.org/3165243/
XML serialization
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:about="http://dpalmisano.myopenid.com/">
<foaf:based_near rdf:resource="http://sws.geonames.org/3165243/"/>
</rdf:Description>
</rdf:RDF>
26. concepts and ideas behind it
http://dpalmisano.myopenid.com --[http://xmlns.com/foaf/0.1/based_near]--> http://sws.geonames.org/3165243/
this triple represents a relationship
between two resources
but how can we represent the meaning of
that relationship?
by defining vocabularies and ontologies:
RDFSchema and OWL
27. concepts and ideas behind it
a “Hello World” RDFSchema vocabulary
http://helloworld.com/ontology/Person rdf:type rdfs:Class
http://helloworld.com/ontology/father rdf:type rdf:Property
28. concepts and ideas behind it
RDFSchema entailment: inferring new statements
http://helloworld.com/ontology/Person
http://helloworld.com/resource/Michele
rdf:type
http://helloworld.com/ontology/father
http://helloworld.com/resource/Davide
29. concepts and ideas behind it
RDFSchema entailment: inferring new statements
http://helloworld.com/ontology/Person rdf:type
http://helloworld.com/resource/Michele
rdf:type
http://helloworld.com/ontology/father
http://helloworld.com/resource/Davide
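RDFS entailment can be sketched with two of the standard rules: rdfs2 (a property's rdfs:domain types its subjects) and rdfs3 (its rdfs:range types its objects). The slides do not show the schema triples, so this Java sketch assumes, hypothetically, that father has both rdfs:domain and rdfs:range Person and that the asserted statement is Michele father Davide; with domain and range both set to Person, both resources end up typed as Person whichever direction the link actually goes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of RDFS entailment rules rdfs2 (domain) and rdfs3 (range).
// HYPOTHETICAL data: the slides do not show the schema triples.
class RdfsEntailment {
    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String DOMAIN = "rdfs:domain";
    static final String RANGE = "rdfs:range";

    // Applies rdfs2 and rdfs3 repeatedly until no new triple appears (a fixpoint)
    static Set<Triple> entail(List<Triple> asserted) {
        Set<Triple> graph = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            Set<Triple> inferred = new LinkedHashSet<>();
            for (Triple schema : graph) {
                boolean isDomain = schema.p().equals(DOMAIN);
                boolean isRange = schema.p().equals(RANGE);
                if (!isDomain && !isRange) continue;
                for (Triple t : graph) {
                    if (!t.p().equals(schema.s())) continue;
                    // rdfs2: (P domain C), (x P y) => (x type C)
                    // rdfs3: (P range C),  (x P y) => (y type C)
                    String node = isDomain ? t.s() : t.o();
                    inferred.add(new Triple(node, TYPE, schema.o()));
                }
            }
            changed = graph.addAll(inferred);
        }
        return graph;
    }

    public static void main(String[] args) {
        String person = "http://helloworld.com/ontology/Person";
        String father = "http://helloworld.com/ontology/father";
        String michele = "http://helloworld.com/resource/Michele";
        String davide = "http://helloworld.com/resource/Davide";
        List<Triple> asserted = List.of(
            new Triple(father, DOMAIN, person),    // hypothetical schema triple
            new Triple(father, RANGE, person),     // hypothetical schema triple
            new Triple(michele, TYPE, person),
            new Triple(michele, father, davide));  // hypothetical direction of the link
        for (Triple t : entail(asserted)) {
            if (!asserted.contains(t)) System.out.println("inferred: " + t);
        }
    }
}
```

Real reasoners implement the full rule set (subclass and subproperty chains included) and usually avoid this naive all-pairs loop, but the principle is the same: new statements follow mechanically from the vocabulary.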
30. concepts and ideas behind it
OWL lets you specify further axioms, such as
property cardinality restrictions
class disjointness
property transitivity
but beware: more expressivity means more
reasoning complexity
interested in these topics? give [3] a try
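Property transitivity is a good example of how expressivity turns into work: for a property declared owl:TransitiveProperty, a reasoner has to make every implied link explicit (or find it at query time). The Java sketch below computes that closure with a depth-first search from each resource; the ex:partOf property and the place names are hypothetical example data.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch: the links a reasoner must make explicit for a property declared
// owl:TransitiveProperty. Property and resources are hypothetical.
class TransitiveClosure {
    // For every node, collects all nodes reachable through the property
    static Map<String, Set<String>> closure(Map<String, Set<String>> edges) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String start : edges.keySet()) {
            Set<String> reachable = new LinkedHashSet<>();
            Deque<String> stack = new ArrayDeque<>(edges.get(start));
            // Depth-first search; the set check stops cycles from looping forever
            while (!stack.isEmpty()) {
                String next = stack.pop();
                if (reachable.add(next)) {
                    stack.addAll(edges.getOrDefault(next, Set.of()));
                }
            }
            result.put(start, reachable);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: ex:partOf declared as owl:TransitiveProperty
        Map<String, Set<String>> partOf = Map.of(
            "ex:Povo", Set.of("ex:Trento"),
            "ex:Trento", Set.of("ex:Trentino"),
            "ex:Trentino", Set.of("ex:Italy"));
        closure(partOf).forEach((s, objects) ->
            objects.forEach(o -> System.out.println(s + " ex:partOf " + o)));
    }
}
```

A simple chain of n resources already yields n(n-1)/2 pairs, and combining transitivity with other OWL constructs makes reasoning considerably more expensive.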
32. concepts and ideas behind it
RDFa: Bridging the traditional
Web with the Semantic Web
<div rel="dc:creator">
<span typeof="foaf:Person" about="http://foafbuilder.qdos.com/people/dpalmisano.myopenid.com/foaf.rdf#me">
<a property="foaf:name" rel="foaf:homepage" href="http://dpalmisano.myopenid.com/">Davide Palmisano</a>
<a rel="foaf:workplaceHomepage"
href="http://www.fbk.eu">Fondazione Bruno Kessler</a>
</span>
</div>
33. concepts and ideas behind it
SPARQL: querying the Semantic Web
based on graph pattern matching
SPARQL Protocol and RDF Query
Language
four query forms: SELECT, DESCRIBE,
ASK and CONSTRUCT
34. concepts and ideas behind it
SPARQL: querying the Semantic Web
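PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# ex: is a placeholder namespace for this example
PREFIX ex: <http://example.org/>
SELECT ?person
WHERE {
?person a foaf:Person.
?person ex:age ?age.
FILTER(?age > 18)
}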
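Graph pattern matching can be sketched in plain Java: each triple pattern is matched against the data, solutions are joined on shared variables (?person here), and the FILTER discards the rest. This toy evaluator hard-codes the query above; the data, the ex: namespace (http://example.org/) and storing ages as integer strings are assumptions. Following SPARQL semantics, a FILTER that cannot be evaluated (a non-numeric age) simply eliminates the solution.

```java
import java.util.ArrayList;
import java.util.List;

// Toy evaluation of the query above: match two triple patterns, join them on
// ?person, then apply FILTER(?age > 18). Data and the ex: namespace are hypothetical.
class PatternMatching {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person";
    static final String EX_AGE = "http://example.org/age"; // placeholder property

    static List<String> adults(List<Triple> graph) {
        List<String> solutions = new ArrayList<>();
        // Pattern 1: ?person a foaf:Person   ("a" abbreviates rdf:type)
        for (Triple t1 : graph) {
            if (!t1.p().equals(RDF_TYPE) || !t1.o().equals(FOAF_PERSON)) continue;
            String person = t1.s();
            // Pattern 2: ?person ex:age ?age, reusing the ?person binding (a join)
            for (Triple t2 : graph) {
                if (!t2.s().equals(person) || !t2.p().equals(EX_AGE)) continue;
                try {
                    // FILTER(?age > 18)
                    if (Integer.parseInt(t2.o()) > 18) solutions.add(person);
                } catch (NumberFormatException e) {
                    // A FILTER that raises an error removes the solution, as in SPARQL
                }
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
            new Triple("http://example.org/alice", RDF_TYPE, FOAF_PERSON),
            new Triple("http://example.org/alice", EX_AGE, "30"),
            new Triple("http://example.org/bob", RDF_TYPE, FOAF_PERSON),
            new Triple("http://example.org/bob", EX_AGE, "12"));
        System.out.println(adults(graph));
    }
}
```

Real engines such as Jena's ARQ compile the patterns into an algebra, use indexes and pick join orders; the nested loops here only show what "matching a graph pattern" means.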
35. concepts and ideas behind it
SPARQL: querying the Semantic Web
“Which universities did
the founders of
successful IT companies attend?”
and order them by
frequency...
36. concepts and ideas behind it
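PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
# count each key person once, even when their company matches several UNION branches
SELECT ?almaMater (COUNT(DISTINCT ?keyPerson) AS ?frequency)
WHERE {
{ ?company a dbpedia-owl:Company } UNION { ?company a
yago:InternetCompaniesOfTheUnitedStates } UNION { ?company a
yago:CompaniesBasedInSiliconValley } UNION { ?company a
yago:CompaniesListedOnNASDAQ }
?company dbpedia-owl:numberOfEmployees ?numberOfEmpl .
FILTER (?numberOfEmpl > 0)
?company dbpedia-owl:keyPerson ?keyPerson .
?keyPerson dbpprop:almaMater ?almaMater .
}
GROUP BY ?almaMater
ORDER BY DESC(?frequency)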
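The "P" in SPARQL stands for a protocol: a query can be sent over HTTP GET by URL-encoding it into the `query` parameter of an endpoint. The Java sketch below (java.net.http, Java 11+) builds such a request and asks for results in the SPARQL JSON format; using DBpedia's public endpoint at https://dbpedia.org/sparql is an assumption, its availability and answers are not guaranteed, and the query in `main` is deliberately smaller than the one above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch of SPARQL Protocol "query via GET" (Java 11+, java.net.http).
class SparqlOverHttp {
    static HttpRequest buildRequest(String endpoint, String query) {
        // URLEncoder produces form encoding ("+" for spaces); switch to %20 so the
        // query string is plain percent-encoding. A literal "+" is already %2B.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            // Ask for the W3C SPARQL Query Results JSON format
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTION: DBpedia's public endpoint; availability is not guaranteed
        String endpoint = "https://dbpedia.org/sparql";
        String query = "SELECT ?company WHERE { ?company a <http://dbpedia.org/ontology/Company> } LIMIT 5";
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(buildRequest(endpoint, query), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Very long queries may exceed URL length limits; the protocol also allows sending the query in a POST body for that case.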
37. Linked Data: four rules, billions of opportunities
1. Use URIs to identify things.
2. Use HTTP URIs so that these things can be
referred to and looked up ("dereferenced") by
people and user agents.
3. Provide useful information (i.e., a structured
description - metadata) about the thing when
its URI is dereferenced.
4. Include links to other, related URIs in
the exposed data to improve discovery of other
related information on the Web.
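Rules 2 and 3 in practice: dereference an HTTP URI and use content negotiation to ask for RDF rather than HTML. This is a minimal Java sketch (java.net.http, Java 11+); the Accept header values are one reasonable choice, not a requirement, and the 303 See Other redirect that Linked Data servers commonly use to point from a thing to a document describing it is followed by the NORMAL redirect policy. What DBpedia actually returns for the Lugano URI may change over time.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch of Linked Data rules 2 and 3: look up an HTTP URI, asking for RDF.
class Dereference {
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
            // Prefer RDF/XML, accept Turtle as a fallback
            .header("Accept", "application/rdf+xml, text/turtle;q=0.9")
            .timeout(Duration.ofSeconds(10))
            .GET()
            .build();
    }

    // NORMAL follows redirects (including 303 See Other) except HTTPS -> HTTP
    static final HttpClient CLIENT = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = CLIENT.send(
            rdfRequest("http://dbpedia.org/resource/Lugano"),
            HttpResponse.BodyHandlers.ofString());
        // The URI of the document actually served, after any redirects
        System.out.println(response.uri());
        System.out.println(response.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Keeping the "thing" URI and the "document" URI distinct is what lets the same identifier serve HTML to browsers and RDF to agents.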
38. Linked Data: four rules, billions of opportunities
DBpedia: Wikipedia as a database
extract the structured information found in Wikipedia (e.g. infoboxes) and represent it in RDF
39. Linked Data: four rules, billions of opportunities
let’s do the same for
Internet Movie Database
BBC /programmes
CiteSeer
GeoNames
MusicBrainz
CIA World Factbook
and for every imaginable data-intensive
traditional Web site...
42. the server side of the triple: Java and the Semantic Web
RDF is the model
SPARQL is the query language
RDFa is our Trojan horse
Linked Data is the paradigm
how does it fit with Java?
43. the server side of the triple: Java and the Semantic Web
general-purpose open source Semantic Web libraries
Jena[3] - The Semantic Web Java framework
- an RDF API
- parsing and writing RDF in RDF/XML, N3 and N-Triples
- an OWL API
- in-memory storage and a persistence layer
- SPARQL query engine
- Schemagen: generates Java classes from an RDFSchema vocabulary
44. the server side of the triple: Java and the Semantic Web
Jena: creating a model
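// Apache Jena imports (Jena 2.x used the com.hp.hpl.jena packages instead)
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// URI declarations
String familyUri = "http://family/";
String relationshipUri = "http://purl.org/vocab/relationship/";
// Create an empty Model
Model model = ModelFactory.createDefaultModel();
// Create a Resource for each family member, identified by their URI
Resource adam = model.createResource(familyUri+"adam");
Resource beth = model.createResource(familyUri+"beth");
// Create properties for the different types of relationship to represent
Property siblingOf = model.createProperty(relationshipUri,"siblingOf");
// Add properties to adam describing relationships to other family members
adam.addProperty(siblingOf,beth);
// Print the resulting triple in N-Triples syntax
model.write(System.out, "N-TRIPLES");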
45. the server side of the triple: Java and the Semantic Web
Jena: querying the model
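// imports: org.apache.jena.query.* and org.apache.jena.rdf.model.*
// queryString: a SPARQL SELECT binding ?definition and ?concept (defined elsewhere)
// wordForm: a Property of the vocabulary being queried (defined elsewhere)
// Parse the query (Jena's old RDQL API has been replaced by ARQ and SPARQL)
Query query = QueryFactory.create(queryString);
// Bind the query to the model to run it against; try-with-resources closes it
try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
    // Execute the SELECT and iterate over the solutions
    ResultSet results = qe.execSelect();
    while (results.hasNext()) {
        QuerySolution solution = results.nextSolution();
        RDFNode definition = solution.get("definition");
        System.out.println(definition.toString());
        Resource concept = solution.getResource("concept");
        // listObjectsOfProperty returns a NodeIterator, not a List
        NodeIterator wordForms = concept.listObjectsOfProperty(wordForm);
        while (wordForms.hasNext()) {
            System.out.println(wordForms.nextNode());
        }
    }
}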
46. the server side of the triple: Java and the Semantic Web
other valuable alternatives
Sesame[4] - a generic open source Java framework for
storage and querying of RDF data
- easy, elegant and well documented
jRDF[5] - an RDF library for Java
- notable for IoC support (Spring 2)
47. the server side of the triple: Java and the Semantic Web
getting RDF data
Any23[6] - Anything to Triples
- a library
- a Web service
- a CLI
- extracts RDF from various sources:
- Microformats: Adr, Geo, hCalendar, hCard, hListing,
hResume, hReview, License and XFN
- RDF/XML, Turtle and Notation3
- output serializations: RDF/XML, N3 and Turtle,
with content negotiation
48. the server side of the triple: Java and the Semantic Web
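Any23: RDF extraction
import java.io.ByteArrayOutputStream;
import org.apache.any23.Any23;
import org.apache.any23.http.HTTPClient;
import org.apache.any23.source.*;
import org.apache.any23.writer.*;

Any23 runner = new Any23();
runner.setHTTPUserAgent("test-user-agent");
HTTPClient httpClient = runner.getHTTPClient();
DocumentSource source = new HTTPDocumentSource(
httpClient,
"http://www.rentalinrome.com/semanticloft/semanticloft.htm"
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
TripleHandler handler = new NTriplesWriter(out);
try {
    runner.extract(source, handler);
} finally {
    handler.close(); // flushes the buffered triples into out
}
String nTriples = out.toString("UTF-8"); // NTriplesWriter emits N-Triples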
49. the server side of the triple: Java and the Semantic Web
Any23 deals with documents that already
contain some RDF metadata
extracting the semantics from free text and
disambiguating terms with links to the Linked Data
cloud is another story
a plethora of different services
- AlchemyAPI[7]
- OpenCalais[8]
50. the server side of the triple: Java and the Semantic Web
The world's largest maker of solar inverters announced Monday that
it will locate its first North American manufacturing plant in Denver.
"We see a huge market coming in the U.S.," said Pierre-Pascal
Urbon, the company's chief financial officer.
The company, based in Kassel, north of Frankfurt, Germany, boasts
growing sales of about $1.2 billion a year.
51. the server side of the triple: Java and the Semantic Web
the same text, with the places it mentions linked to DBpedia resources:
Frankfurt: http://dbpedia.org/resource/Frankfurt
Denver: http://dbpedia.org/resource/Denver
Kassel: http://dbpedia.org/resource/Kassel
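To make the input and output concrete without calling any service, here is a toy dictionary-based linker in Java: it looks up known place names in the text and returns the matching DBpedia URIs. The three-entry gazetteer is hand-made for this example; services such as OpenCalais and AlchemyAPI rely on NLP and disambiguation models rather than a fixed table, so treat this only as an illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy dictionary-based entity linker; the gazetteer is hand-made for this example.
class GazetteerLinker {
    static final Map<String, String> GAZETTEER = Map.of(
        "Denver", "http://dbpedia.org/resource/Denver",
        "Kassel", "http://dbpedia.org/resource/Kassel",
        "Frankfurt", "http://dbpedia.org/resource/Frankfurt");

    // Returns surface form -> DBpedia URI for every gazetteer entry that occurs
    // as a whole word, ordered by first occurrence in the text
    static Map<String, String> link(String text) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (String name : GAZETTEER.keySet()) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "\\b").matcher(text);
            if (m.find()) byPosition.put(m.start(), name);
        }
        Map<String, String> links = new LinkedHashMap<>();
        for (String name : byPosition.values()) links.put(name, GAZETTEER.get(name));
        return links;
    }

    public static void main(String[] args) {
        String text = "The world's largest maker of solar inverters announced Monday that "
            + "it will locate its first North American manufacturing plant in Denver. "
            + "The company, based in Kassel, north of Frankfurt, Germany, boasts "
            + "growing sales of about $1.2 billion a year.";
        link(text).forEach((name, uri) -> System.out.println(name + " -> " + uri));
    }
}
```

The hard part a real service solves is exactly what this sketch ignores: deciding which "Frankfurt" (am Main or an der Oder) a mention refers to, and spotting entities that are not in any list.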
52. the server side of the triple: Java and the Semantic Web
exposed as HTTP Web services they
provide responses in XML, RDF/XML, RDFa
or JSON
Apache UIMA comes with two annotators
for AlchemyAPI and OpenCalais[9]
53. the server side of the triple: Java and the Semantic Web
indexing RDF data
SIREn[10]: Efficient semi-structured Information
Retrieval for Lucene
- a plugin for Lucene
- extends the Lucene query model
- semi-structured search
- structure-aware full-text search
- ranked semi-structured search: most relevant results
returned first
- sub-linear average response time
- flexible semi-structured indexing
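What "structure-aware" search means can be illustrated with a toy fielded inverted index in Java: each token is indexed together with the predicate whose value contains it, so a query can ask for a term under a given predicate as well as anywhere. This is not SIREn's implementation (which extends Lucene's own index structures); the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy fielded inverted index over (subject, predicate, literal) triples.
// Hypothetical names; not SIREn's actual implementation.
class FieldedTripleIndex {
    // Key "predicate \u0000 token" -> subjects whose value for that predicate has the token
    private final Map<String, Set<String>> postings = new HashMap<>();

    private static String key(String predicate, String token) {
        return predicate + '\u0000' + token.toLowerCase(Locale.ROOT);
    }

    void add(String subject, String predicate, String literal) {
        // Very simple tokenizer: lowercase, split on non-word characters
        for (String token : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(key(predicate, token), k -> new TreeSet<>()).add(subject);
            }
        }
    }

    // Structure-aware query: the token must appear under the given predicate
    Set<String> search(String predicate, String token) {
        return postings.getOrDefault(key(predicate, token), Set.of());
    }

    // Plain keyword query: the token may appear under any predicate
    Set<String> searchAnywhere(String token) {
        String suffix = '\u0000' + token.toLowerCase(Locale.ROOT);
        Set<String> result = new TreeSet<>();
        postings.forEach((k, subjects) -> {
            if (k.endsWith(suffix)) result.addAll(subjects);
        });
        return result;
    }

    public static void main(String[] args) {
        FieldedTripleIndex index = new FieldedTripleIndex();
        index.add("http://dpalmisano.myopenid.com", "foaf:name", "Davide Palmisano");
        index.add("http://www.fbk.eu", "foaf:name", "Fondazione Bruno Kessler");
        index.add("http://example.org/talk", "dc:title", "Davide on the Web of Data");
        System.out.println(index.search("foaf:name", "davide"));   // only the person
        System.out.println(index.searchAnywhere("davide"));        // the person and the talk
    }
}
```

Ranking, phrase queries and compact posting lists are what a real engine adds on top; the point here is only that the predicate survives into the index.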
54. the server side of the triple: Java and the Semantic Web
storing RDF data
commonly known as “triple stores”[11]
“let me insert triples and run
SPARQL queries over them”
- OpenLink Virtuoso
- 4Store
- Redland
- Jena or Sesame on top of an RDBMS
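The core contract quoted above, "insert triples, then match patterns over them", can be sketched as an in-memory Java class that keeps one index per triple position, so a pattern with any bound term starts from a lookup rather than a full scan. Production stores such as Virtuoso or 4Store use far more elaborate, persistent indexing; the class and its methods are purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory triple store with one index per position (illustrative only).
class TinyTripleStore {
    record Triple(String s, String p, String o) {}

    private final Map<String, Set<Triple>> bySubject = new HashMap<>();
    private final Map<String, Set<Triple>> byPredicate = new HashMap<>();
    private final Map<String, Set<Triple>> byObject = new HashMap<>();

    void insert(String s, String p, String o) {
        Triple t = new Triple(s, p, o);
        bySubject.computeIfAbsent(s, k -> new LinkedHashSet<>()).add(t);
        byPredicate.computeIfAbsent(p, k -> new LinkedHashSet<>()).add(t);
        byObject.computeIfAbsent(o, k -> new LinkedHashSet<>()).add(t);
    }

    // Matches one triple pattern; null stands for a variable (unbound position)
    Set<Triple> match(String s, String p, String o) {
        List<Set<Triple>> candidates = new ArrayList<>();
        if (s != null) candidates.add(bySubject.getOrDefault(s, Set.of()));
        if (p != null) candidates.add(byPredicate.getOrDefault(p, Set.of()));
        if (o != null) candidates.add(byObject.getOrDefault(o, Set.of()));
        Set<Triple> start = null;
        if (candidates.isEmpty()) {
            // Fully unbound pattern: every stored triple is a candidate
            start = new LinkedHashSet<>();
            for (Set<Triple> bucket : bySubject.values()) start.addAll(bucket);
        } else {
            // Scan only the smallest index bucket among the bound positions
            for (Set<Triple> c : candidates) {
                if (start == null || c.size() < start.size()) start = c;
            }
        }
        Set<Triple> result = new LinkedHashSet<>();
        for (Triple t : start) {
            // Check the remaining bound positions
            if ((s == null || t.s().equals(s)) && (p == null || t.p().equals(p))
                    && (o == null || t.o().equals(o))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.insert("http://dpalmisano.myopenid.com", "foaf:based_near", "http://sws.geonames.org/3165243/");
        store.insert("http://sws.geonames.org/3165243/", "gn:name", "Trento");
        // Who is based near Trento?  (?who foaf:based_near <http://sws.geonames.org/3165243/>)
        System.out.println(store.match(null, "foaf:based_near", "http://sws.geonames.org/3165243/"));
    }
}
```

A SPARQL engine sitting on such a store answers each triple pattern with `match` and joins the results, much like the pattern-matching sketch shown earlier for the SELECT example.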
55. the server side of the triple: Java and the Semantic Web
JDBC and Virtuoso
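// imports: java.sql.*
// conn: a java.sql.Connection to Virtuoso, e.g. obtained with
// DriverManager.getConnection("jdbc:virtuoso://localhost:1111", user, password)
try (Statement stmt = conn.createStatement()) {
    // the "sparql" prefix tells Virtuoso to evaluate the rest as a SPARQL query
    boolean more = stmt.execute("sparql select * from <gr> where { ?x ?y ?z }");
    while (more) {
        try (ResultSet rs = stmt.getResultSet()) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                // ... (e.g. read columns 1..meta.getColumnCount() with rs.getString(i))
            }
        }
        more = stmt.getMoreResults();
    }
}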
56. the server side of the triple: Java and the Semantic Web
Empire[12]: JPA for RDF
- Object Triples Mapper
- 4Store, Sesame and Jena support
- small annotation framework for tying Java beans
to RDF
- automatically generates Java interfaces for classes
described in an OWL ontology, based on domain and range
constraints and cardinality restrictions
- runtime implementation generation
- IoC with Google Guice
57. the server side of the triple: Java and the Semantic Web
crawl the Web
extract RDF from RDFa and
Microformats with Any23
index the data with SIREn
store the data on HBase
in a word: Sindice.com
58. successes, failures and hopes
Linked Data and RDFa seem to be the right
way to trigger a “network effect” in
the adoption of Semantic Web technologies
data.gov.uk
59. successes, failures and hopes
Twine.com
it was the first mainstream consumer
application of the Semantic Web
raised nearly $24M of venture capital over 2 rounds
gained users rapidly - faster than Twitter did in its early
years
Twine.com is going to be acquired by Evri.com
60. successes, failures and hopes
Twine.com
“I can truly say they present significant challenges
both to developers and to end-users. These
challenges all stem from one underlying problem:
Data storage.” - Nova Spivack CEO
61. successes, failures and hopes
GoodRelations: e-commerce on the Web of Data
huge impact on traditional search engine rankings
enabling cross-site product and offerings retrieval
Google rich snippets
62. successes, failures and hopes
GoodRelations: e-commerce on the Web of Data
GoodRelations and RDFa could heavily impact
traditional SEO techniques
it may provide real traction for widespread
use of RDFa and semi-structured data on the Web
63. /me
Technologist @ Fondazione Bruno Kessler
Web of Data research Unit
twitter.com/dpalmisano
davidepalmisano.wordpress.com
wed.fbk.eu