.NET Framework RDF APIs
Lucian Nistor, Denis Recean
Universitatea “Alexandru Ioan Cuza”, Iasi
1 Introduction
In this paper we present a comparative study of how RDF, one of the
most important building blocks of the Semantic Web, is processed in .NET.
The utility of such a study is clear: .NET is one of the most widely used
frameworks in software development (desktop or web based), and the
Semantic Web, with RDF at its foundation, represents the next step in the
evolution of the web, so the two have to interact with each other.
Before we start comparing the tools, we give a short presentation of
the main technologies.
Semantic Web
The Web has begun to “understand” the meaning of the information
it is composed of, and this marks a new phase of the Web: the Semantic Web.
This “understanding” of the data is achieved through various formalisms,
such as RDF (Resource Description Framework), RDFS (RDF Schema), data
interchange formats (like N3 or Turtle) and OWL (Web Ontology Language).
The Semantic Web is like a living organism that is growing and evolving
right in front of our eyes.
RDF
The Resource Description Framework (RDF) is a standard for
representing data on the Semantic Web. Semantic Web compliant applications
use structured information that is transmitted in a decentralized and
distributed way. RDF is the abstract model that was created to store this
information in small, discrete pieces: triples made of a subject, a
predicate and an object. The model can be serialized in a multitude of
formats, the most popular being RDF/XML.
Ontology
Even though there is no single definition of Semantic Web ontologies,
they are very important for the Semantic Web. In the philosophical sense,
ontology is “the study of entities and their relations” (Clay Shirky).
Extrapolating that definition to computer science, we can say that an
ontology is a formal representation of a set of entities from a certain
domain and of the relations between those entities.
SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is a query
language for RDF. A SPARQL query consists of required and optional graph
patterns (information stored as RDF forms a graph). These patterns can be
combined by conjunction or disjunction. SPARQL can be used to query any
data source that is stored in RDF format or can be transformed into RDF.
The result of a query can take the form of a result set or of an RDF graph.
SPARQL query example (from http://en.wikipedia.org/wiki/SPARQL):
PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
    ?x abc:cityname ?capital ;
       abc:isCapitalOf ?y .
    ?y abc:countryname ?country ;
       abc:isInContinent abc:Africa .
}
.NET Framework and VS
The .NET Framework is a software framework developed by
Microsoft and used by many recent applications that run on Windows. The
framework includes a large class library with solutions to common
programming problems and a virtual machine.
Developers who write applications in .NET can choose among several
programming languages (C#, Visual Basic, C++, ...), have access to the Base
Class Library, share a common development environment for desktop and web
applications, and have access to extensive documentation.
The Base Class Library is a component of the framework that
provides features like database connectivity, cryptography, web application
development and so on.
Visual Studio (VS) is the Microsoft IDE for software development. It
includes the .NET Framework, and Microsoft encourages the use of the
framework in software development. VS provides advanced features and RIA
development support.
Besides Microsoft's implementation there is a .NET implementation for
Linux, called Mono, but it supports only .NET 1.0 and .NET 2.0, whereas on
Windows the framework is at version 3.5 and version 4.0 is in preparation.
2 API comparison
Even though the Semantic Web has evolved considerably in recent years
and RDF has become a common data storage standard, Microsoft has not
included native support for RDF processing in .NET. Recognizing the growing
importance of RDF in web software development, some independent developers
have implemented libraries that offer support for RDF processing. We will
discuss three such APIs: SemWeb, a library that provides low-level RDF
interaction, and LinqToRdf and Rowlex, two APIs that use SemWeb internally
and provide a flexible and easy to use interface.
SemWeb
SemWeb was developed by Joshua Tauberer and, according to the author,
it can be used to read and write RDF files in XML and N3 formats, to keep
RDF data in memory or in persistent storage in SQL databases, and to query
persistent storage or remote endpoints using SPARQL. It also provides
limited RDFS support.
LinqToRdf
Developed by Andrew Matthews, this tool's main aim is to allow .NET
programmers to use LINQ query technology to query an RDF graph with the
help of classes that have been defined using RDFS or OWL. The tool includes
extensions for Visual Studio that allow the user to model ontologies using
the VS.NET class designer. Its main features are converting LINQ queries to
SPARQL and generating .NET classes that map ontologies.
Rowlex
Rowlex is a toolkit used for creating and browsing RDF documents. It
uses an ontology to model classes and properties and then models RDF
triples as instances of those classes. ROWLEX is the acronym for Relaxed
OWL Experience. In other words, Rowlex brings the advantages of object
oriented programming to RDF processing by using OWL (Web Ontology
Language). It offers the ability to generate .NET classes from ontologies
and ontologies from .NET classes. This API was developed by the NC3A
Semantic Interoperability team.
3 RDF data storage
The way the RDF information or the RDF itself is stored is very
important. It influences performance and the interoperability with other
platforms and applications.
SemWeb can work with RDF in XML and N3 formats. The SemWeb library
abstracts an RDF triple through three classes: the Entity class, which
represents a resource identified by a URI, the Literal class, which
represents a literal value, and the Statement class, which combines a
subject entity, a predicate entity and an object (an entity or a literal)
into an RDF triple.
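The three-class triple model described above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the classes UriNode, LiteralNode and Triple are hypothetical stand-ins written for this example, not SemWeb's own Entity, Literal and Statement classes, and the sample data (Cairo, Egypt) is made up. The property names are borrowed from the SPARQL example in the introduction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT SemWeb's classes.
abstract class Node { }

// A resource identified by a URI (plays the role of an entity).
sealed class UriNode : Node
{
    public string Uri { get; }
    public UriNode(string uri) { Uri = uri; }
    public override string ToString() => "<" + Uri + ">";
}

// A plain literal value (no datatype, language tag or escaping in this sketch).
sealed class LiteralNode : Node
{
    public string Value { get; }
    public LiteralNode(string value) { Value = value; }
    public override string ToString() => "\"" + Value + "\"";
}

// A triple: subject and predicate are URI nodes, the object is any node.
sealed class Triple
{
    public UriNode Subject { get; }
    public UriNode Predicate { get; }
    public Node Object { get; }

    public Triple(UriNode subject, UriNode predicate, Node obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }

    // Renders the triple in an N-Triples-like form.
    public override string ToString() =>
        Subject + " " + Predicate + " " + Object + " .";
}

static class TripleDemo
{
    // Builds two sample triples; the data is invented for the example.
    public static List<Triple> Build()
    {
        var ns = "http://example.com/exampleOntology#";
        var cairo = new UriNode(ns + "Cairo");
        var egypt = new UriNode(ns + "Egypt");
        return new List<Triple>
        {
            new Triple(cairo, new UriNode(ns + "cityname"), new LiteralNode("Cairo")),
            new Triple(cairo, new UriNode(ns + "isCapitalOf"), egypt)
        };
    }

    static void Main()
    {
        foreach (var triple in Build())
            Console.WriteLine(triple);
    }
}
```

Keeping the object position typed as the common base class is what lets one triple point to a literal and another to a resource, which is why so few classes are enough for the whole model.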
The LinqToRdf API uses the N3 format to store RDF files. In .NET,
LinqToRdf creates classes that map the ontology describing the RDF data
and then uses the LINQ mechanisms to query, delete or add information in a
given RDF file. The classes are annotated with attributes that map ontology
features. A triple is stored as an instance of a class, and the relations
between classes are modeled with OOP means. For instance, an ontology class
hierarchy is modeled with class derivation and a one-to-many relation is
modeled with a list of objects.
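The mapping style described above (attributes for ontology terms, derivation for the class hierarchy, a list for a one-to-many relation) might look roughly like the following sketch. The OntologyTermAttribute class, the Place, Country and City classes and the hasCity property are all hypothetical and were invented for this example; LinqToRdf's real attribute names and base classes may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical attribute for illustration; not LinqToRdf's actual attribute.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Property)]
sealed class OntologyTermAttribute : Attribute
{
    public string Uri { get; }
    public OntologyTermAttribute(string uri) { Uri = uri; }
}

[OntologyTerm("http://example.com/exampleOntology#Place")]
class Place
{
    [OntologyTerm("http://example.com/exampleOntology#name")]
    public string Name { get; set; }
}

[OntologyTerm("http://example.com/exampleOntology#City")]
class City : Place { }

// The ontology class hierarchy is modeled with class derivation.
[OntologyTerm("http://example.com/exampleOntology#Country")]
class Country : Place
{
    // A one-to-many relation is modeled with a list of objects.
    [OntologyTerm("http://example.com/exampleOntology#hasCity")]
    public List<City> Cities { get; } = new List<City>();
}

static class MappingDemo
{
    // Reads, via reflection, the ontology URI a class is mapped to.
    public static string TermOf(Type type)
    {
        var attribute = (OntologyTermAttribute)Attribute.GetCustomAttribute(
            type, typeof(OntologyTermAttribute), false);
        return attribute == null ? null : attribute.Uri;
    }

    static void Main()
    {
        var egypt = new Country { Name = "Egypt" };
        egypt.Cities.Add(new City { Name = "Cairo" });
        Console.WriteLine(TermOf(typeof(Country)));
        Console.WriteLine(egypt.Name + " has " + egypt.Cities.Count + " mapped city");
    }
}
```

A library working this way can discover the ontology term for any mapped class at run time through reflection, as TermOf does here, instead of requiring a separate mapping file.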
ROWLEX uses XML and N3 formats to store RDF files. When it processes
documents, the library stores RDF triples as instances of classes that map
the ontology, in a similar way to LinqToRdf.
Because they are built on the .NET Framework, all these APIs can also
serialize the data in the usual .NET ways. For instance, the RDF data can
be stored in a SQL database or in binary format, like any .NET object.
4 SPARQL support
Only two of the three APIs, SemWeb and LinqToRdf, support SPARQL
queries, both on local RDF files and on remote SPARQL endpoints.
In SemWeb the queries are held in special objects: Query class
objects for local queries and SparqlHttpSource objects for remote ones.
Example of a SPARQL query written using the SemWeb API:
SparqlHttpSource source =
    new SparqlHttpSource("http://DBpedia.org/sparql");
// Inner quotes are escaped so the query stays one valid C# string literal.
source.RunSparqlQuery(
    "SELECT * WHERE { ?a ?b \"Michael Jackson\" . }", Console.Out);
LinqToRdf uses the LINQ mechanism to create queries. An object of the
RDF class serves as the data context for a query. The constructed LINQ
query is then translated into SPARQL. In order to query remote data that is
not in RDF format, special tools that transform it into RDF need to be
used. For instance, in order to query OpenLink data the Virtuoso platform
can be used.
Example of a LINQ query over RDF data, using LinqToRdf:
TripleStore ts = new TripleStore();
ts.EndpointUri = @"http://DBpedia.org/sparql";
ts.QueryType = QueryType.RemoteSparqlStore;
var q = from p in new RDF(ts).ForType<Person>()
        where p.Name == "Michael Jackson"
        select p;
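To give an idea of what translating a LINQ query into SPARQL involves, here is a minimal sketch that inspects a C# expression tree and emits a SPARQL string. It is not LinqToRdf's translator: the SparqlSketch class, the Person class and the convention of reusing the .NET class and property names as ontology terms are assumptions made for this example, and only a single equality test against a string constant is handled.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical mapped class, used only by this sketch.
class Person
{
    public string Name { get; set; }
}

static class SparqlSketch
{
    // Translates a predicate of the form  x => x.Property == "constant"
    // into a SPARQL SELECT query. Class and property names are reused as
    // ontology terms, which is an assumption of this sketch.
    public static string Translate<T>(Expression<Func<T, bool>> predicate, string ns)
    {
        var equality = predicate.Body as BinaryExpression;
        if (equality == null || equality.NodeType != ExpressionType.Equal)
            throw new NotSupportedException("Only '==' is handled in this sketch.");

        var member = equality.Left as MemberExpression;
        var constant = equality.Right as ConstantExpression;
        if (member == null || constant == null)
            throw new NotSupportedException("Expected: property == constant.");

        return "PREFIX ex: <" + ns + ">\n" +
               "SELECT ?p WHERE { ?p a ex:" + typeof(T).Name +
               " ; ex:" + member.Member.Name +
               " \"" + constant.Value + "\" . }";
    }

    static void Main()
    {
        string query = Translate<Person>(
            p => p.Name == "Michael Jackson",
            "http://example.com/exampleOntology#");
        Console.WriteLine(query);
    }
}
```

Because the query arrives as an expression tree rather than as compiled code, the translator can walk it and rebuild the same condition as a graph pattern. This extra step happens on every query.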
5 Support for developers
Two of the projects are one-man projects and the third is developed
by a company whose main interests lie in other fields of computer science,
like information security. As a result, little information is available
and the support is insufficient.
5.1 Documentation
All three APIs have documentation that shows their main features
through examples, but all of them lack serious, detailed information.
SemWeb is an older project, so its documentation is a bit more structured.
Forum activity concerning the three APIs is low because people who work
with RDF and want a specialized API for it usually use other development
platforms, like Java or C++.
5.2 Integration with VS
SemWeb is essentially a DLL that is referenced in a VS project and
used like any other assembly.
LinqToRdf, besides the DLLs, provides extensions to create .NET
classes that map ontologies and to create your own ontologies. It is the
only tool that has an installer.
Rowlex is integrated using the same DLL method, but it also provides
two .exe files that can be used to generate an ontology from .NET classes
and to generate .NET classes that map an existing ontology.
There is a problem that needs to be mentioned here. All three APIs
were developed with .NET 2.0 and with Visual Studio 2008 without SP1, and
they have problems when used with higher versions of .NET or VS. LinqToRdf
is impossible to install on VS 2008 with SP1 or later because of its tool
extensions.
5.3 Learning curve
The learning curve is almost the same for each of the three tools.
Performing simple tasks with any of them is relatively quick to learn, but
serious, complicated tasks that require a good understanding of the API are
a big problem because of the lack of documentation and the poor support.
Rowlex is slightly easier to learn because it lacks SPARQL capability,
and LinqToRdf is a bit easier than SemWeb if you know LINQ; otherwise it
can be harder, as you have to learn LINQ as well. For a .NET programmer,
however, it is easier to learn LINQ than SPARQL. Taking these
considerations into account, Rowlex is the easiest to learn, SemWeb is the
hardest and LinqToRdf is in between.
6 Performance
SemWeb has the best performance of the three because it stores the
information in a lightweight manner (with three classes: Entity, Literal
and Statement) and because SPARQL queries need no transformation, as they
are passed to the Query object as a string. Another reason why SemWeb is
faster is that the other two APIs use it for their low-level interaction
with the RDF data, so they add their own overhead on top of it. Rowlex
performs worse than SemWeb because it uses more classes to store the
triples during processing, and LinqToRdf performs worst of all because the
classes it uses to map the ontology are LINQ compatible and because the
LINQ queries have to be transformed into SPARQL queries before they are
run.
7 Interoperability
In terms of interoperability all the APIs benefit from two sides. One
is the RDF format, which is specifically designed to be shared across many
web development frameworks. The other is .NET, which allows
interoperability with SQL databases or with other platforms via web
services or other network communication protocols.
8 Project development and licensing
All the projects leave the impression that they are still in beta
phase. All have installation or integration problems, and the work on them
ceased in 2008 or 2009.
SemWeb is open source, LinqToRdf is under the New BSD License and
Rowlex is under the GNU Lesser General Public License.
9 Conclusions
Before drawing any conclusions we would like to quote Joshua
Tauberer, the author of the most stable API of the three, SemWeb, on the
fate of his project:
“May 19, 2009. I'm taking an indefinite hiatus from this project. That means
that while I'll try to apply any patches to fix existing bugs, I won't be actively
developing the library further, and I won't be answering questions for help
on the mail list. Over the last four years it's been fun to work on it, but I don't
think there has been enough uptake of the Semantic Web in the .NET world
(or otherwise) for me to justify spending more time on this when I have
other things in life I'd rather be working on.”
Now to the conclusions. The three APIs were selected because they
were the “loudest” on the internet, so we considered them to be the best
candidates for our comparative study. The ideas are good and the work is
outstanding, but all of them need better documentation, better support,
bug fixing and further development in order to become usable and reliable
tools for big, serious applications.
10 References
http://www.w3.org/TR/rdf-sparql-query/
http://en.wikipedia.org/wiki/SPARQL
http://rowlex.nc3a.nato.int/default.aspx
http://www.hookedonlinq.com/linqtordf.ashx
http://aabs.wordpress.com/LINQ/
http://razor.occams.info/code/semweb/
http://en.wikipedia.org/wiki/.NET_Framework
http://www.microsoft.com/NET/
http://en.wikipedia.org/wiki/Web_Ontology_Language
http://semanticweb.org/wiki/Ontology
http://www.w3.org/RDF/
This article was processed using Microsoft Word with Springer LNCS style
and it is released under the Creative Commons Attribution-Share Alike 3.0
license http://creativecommons.org/licenses/by-sa/3.0/