Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Elena-Oana Tăbăranu and Anna-Maria Metzak
Faculty of Computer Science
“Alexandru I. Cuza” University of Iași
{elena.tabaranu,anna.metzak}@info.uaic.ro
Abstract. Tags are a very efficient method of describing information
with metadata. Adding semantic information to the keywords allows
computers to comprehend what the pages are saying and use that knowledge
to offer better service to humans when interacting with them. The
tagging extension for the XWiki Platform links the user-defined keywords
with semantic information from the DBpedia knowledge base.
Key words: XWiki, Zemanta, DBpedia, knowledge base, Semantic Web,
tagging, Common Tag
1 Introduction
A tag is a relevant keyword or term associated with specific content. Labeling
by keywords has long been used in scientific publications. Tags made a comeback
recently, when web users and developers realized that they are a very efficient
method of describing information with metadata.
The goal of this project is to extend a conventional open-source Web application
with semantic information. The Semantic Tagging XWiki component enriches the
tagging mechanism of the XWiki Platform using the content recommendation tool
Zemanta¹ and the knowledge base DBpedia². The XWiki semantic tagging mechanism
gives the user suggestions when adding new tags and links each new tag to a
concept extracted from the world’s biggest knowledge base, Wikipedia.
¹ Zemanta is a tool that brings relevant content from around the web as the
user is typing. Its API lets applications retrieve these related images,
articles, hyperlinks and tags.
² DBpedia is a community effort to extract structured information from Wikipedia
and to make this information available on the Web.
2 The XWiki Platform
XWiki is an open-source platform for developing collaborative web applications
using the wiki paradigm. XWiki products are based on the XWiki Platform, which
provides common services and UI to them. XWiki is a second-generation wiki: it
provides all the basic content management and administration features of common
wikis, plus enhanced features and capabilities. With XWiki, you can build simple
applications, extend the platform with custom plugins/components, or even build
complex Web applications.
Some of the features offered by the XWiki Platform are:
– Edit pages by using wiki syntax to format text, create tables, create links,
display images, etc. Alternatively, use a powerful WYSIWYG editor to edit
the content of documents.
– Create, Edit, Show, Print, Delete, Copy, Move and Rename documents.
– Export wiki pages to PDF, RTF, XML or HTML.
– Attach as many files as you want to any page. These files can then be
referenced and used in page contents.
– Control who can view, edit or delete documents in a flexible manner. Apply
rights to a document, a space or an entire wiki.
– Use XWiki’s programming API directly in your pages (Velocity or Groovy)
to perform advanced formatting, layout or other custom processing.
– Create applications by grouping several pages together. Import and export
applications to/from your wiki.
Examples of applications that non-developers can create quickly and in an
organic manner using XWiki:
– A blogging application.
– An RSS feed aggregator.
– Mashups, for example combining Google Maps, Delicious, Flickr, Google Base
and Google Calendar.
– Collaborative authoring of documents in real time.
– Form-based applications to enter collections of items.
– A poll/survey application.
2.1 The XWiki Platform Core
XWiki Core is a single historic JAR that is being split into several distinct
modules and that currently implements the following features:
– Model: All the classes representing the wiki model, i.e. the following notions:
Document, Space, Wiki, Classes/Objects, Attachments and more.
– XWiki Syntax 1.0 Rendering: This is the old service for rendering XWiki
Syntax 1.0 which we keep for backward compatibility so that existing users
can keep using the XWiki Syntax 1.0. For all other syntaxes there’s now a
new Rendering Module.
– Localization: Handles translations in various languages. A new Localization
module is under development that will replace this old module.
Fig. 1. The XWiki Platform Architecture.
– Notification: Handles event registration and distribution. For example,
code can subscribe to receive an event when a new document is created.
– Exports (PDF, RTF, XAR). In the future this will be done by implementing
specific Renderers in the new Rendering Module.
– Security: Authentication and Authorization handling.
– User Management
2.2 The XWiki Platform Plugins
The plugins created and maintained by the XWiki development team are either
packaged in their own JAR or still located in the XWiki Core JAR. Besides
these, other plugins contributed by the community can be installed. The full
list of available plugins is on the Code Zone³.
2.3 The XWiki Platform Modules
A module offers services in a given domain. Modules are the equivalent of Plugins
but using the new XWiki component-based architecture.
XWiki’s architecture is based on component-oriented development. Rather than
depending on any existing component manager, XWiki defines some simple
component interfaces that can then be bound to any existing component manager.
XWiki currently implements its own lightweight component manager.
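The idea of binding implementations to simple component interfaces can be illustrated with a deliberately tiny registry. This is a hypothetical sketch, not XWiki's component manager API: `TinyComponentManager`, `register`, `lookup` and the `Greeter` role are invented names. It only shows the core mechanism of looking up an implementation by its role interface plus a string hint, which is roughly how an interface marked `@ComponentRole` and an implementation annotated `@Component("hint")` (used in Section 3.6) are matched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical minimal registry keyed by (role interface, hint). Not XWiki's API. */
final class TinyComponentManager {
    private final Map<String, Supplier<?>> factories = new HashMap<>();

    private static String key(Class<?> role, String hint) {
        return role.getName() + "/" + hint;
    }

    /** Bind an implementation factory to a role interface under a hint. */
    <T> void register(Class<T> role, String hint, Supplier<? extends T> factory) {
        factories.put(key(role, hint), factory);
    }

    /** Create an instance bound to the role and hint, or fail if nothing is bound. */
    <T> T lookup(Class<T> role, String hint) {
        Supplier<?> factory = factories.get(key(role, hint));
        if (factory == null) {
            throw new IllegalStateException("No component for " + key(role, hint));
        }
        return role.cast(factory.get());
    }
}

/** Example role: client code depends only on this interface. */
interface Greeter {
    String greet(String name);
}
```

Client code asks for `Greeter` with a hint and never names the implementing class, which is what lets the same interfaces be bound to a different component manager.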
³ Contributions from the XWiki community can be accessed at:
http://code.xwiki.org/xwiki/bin/view/Main/.
2.4 The XWiki Platform Applications
The applications created and maintained by the XWiki development team are:
Panels, Administration, Blog, Application Manager, Wiki Manager, Scheduler,
Statistics, Watch List, Office Importer, WebDAV, Tags and Search. In addition
to these, other applications contributed by the community can be installed.
The full list of available applications is on the Code Zone.
2.5 Extending The XWiki Platform
The XWiki Platform can be extended by:
– Writing scripts in wiki pages
– Writing Applications (set of wiki pages)
– Writing Plugins in Java
– Writing Modules (a set of components) in Java
– Writing new Skins or extending existing ones
– Extending existing Service APIs when they provide extension points.
Fig. 2. Extending the XWiki Platform.
3 Bringing Semantic Tagging to the XWiki Platform with Zemanta and DBpedia
Semantic Tagging is a proposal to extend XWiki’s default tagging mechanism
using the Zemanta content recommendation tool and the DBpedia knowledge
base:
– tag documents with user-defined tags (the default tagging behavior in XWiki);
– use Zemanta to recommend tags for the wiki page content;
– add concept information for each tag using DBpedia.
The mockups below were produced with Balsamiq Mockups and show the user
interface changes to the XWiki Platform when adding and displaying a semantic
tag.
3.1 Add a semantic tag
When adding a tag for the content of a wiki page, the user has two options from
the “Add Tag” form: the “Suggested tags” tab or the “Wiki Tags” tab.
When the user hovers over a suggested tag, a popup with semantic details is
displayed: the tag description and the URI of the DBpedia resource page.
Besides the “Suggested tags”, the user can use the “Wiki tags” tab to display
the tag cloud of the entire wiki. In addition, the default autocomplete feature
helps the user find tags already used in the wiki instance.
Once a tag has been added to the Tags section of a wiki page, it is deactivated
in the suggested list. Deactivated tags are shown in grey.
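The form behavior described above (suggestions that become inactive once added, plus prefix autocomplete over tags already used in the wiki) can be modeled in a few lines. This is a hypothetical sketch of the form's state only; `TagFormModel` and its methods are illustrative names, not classes from the component, and the real feature is implemented in JavaScript and Velocity. Suggestion matching is assumed to be exact (case-sensitive), while autocomplete ignores case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical model of the "Add Tag" form state. */
final class TagFormModel {
    /** Suggested tag name -> still active (not yet added to the page). */
    private final Map<String, Boolean> suggestions = new LinkedHashMap<>();
    /** Tags already used anywhere in the wiki, sorted case-insensitively for prefix search. */
    private final TreeSet<String> wikiTags = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    private final List<String> pageTags = new ArrayList<>();

    TagFormModel(List<String> suggested, List<String> existingWikiTags) {
        for (String s : suggested) suggestions.put(s, true);
        wikiTags.addAll(existingWikiTags);
    }

    /** Add a tag to the page; a matching suggestion is deactivated (greyed out). */
    void addTag(String tag) {
        if (!pageTags.contains(tag)) pageTags.add(tag);
        wikiTags.add(tag);
        suggestions.replace(tag, false); // only changes entries that exist
    }

    boolean isActive(String suggestion) {
        return suggestions.getOrDefault(suggestion, false);
    }

    /** Wiki tags starting with the typed prefix, ignoring case, in sorted order. */
    List<String> autocomplete(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        // Matches form a contiguous run starting at the first element >= prefix.
        for (String t : wikiTags.tailSet(prefix, true)) {
            if (!t.toLowerCase(Locale.ROOT).startsWith(p)) break;
            result.add(t);
        }
        return result;
    }
}
```

Keeping the wiki tags in a sorted set means autocomplete stops at the first non-matching tag instead of scanning the whole tag cloud.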
Fig. 3. Mockup for tagging a wiki page in XWiki.
Fig. 4. Tagging a wiki page in XWiki.
Fig. 5. Autocomplete feature for tagging a wiki page in XWiki.
3.2 Display semantic information for a tag
A semantic tag preserves the default XWiki behavior in view mode (add icon,
remove icon and a link to the list of documents tagged with it) but also has
semantic information attached.
Fig. 6. Mockup for displaying a wiki page in XWiki.
Fig. 7. Semantic information for a wiki tag.
3.3 Instruments used for suggestions
Digitalization of content started by putting the written word into ASCII form.
HTML and the web eventually enabled linking and interleaving with other types
of media such as images, sound and video. Flash and JavaScript further enabled
interactive widgets such as map views. Lately, content on the web has been
moving toward explicitly exposing the relations between pieces of data. The
general intention is to allow computers to comprehend what pages are saying
and use that knowledge to offer better service to humans when interacting
with them.
While authoring text comes naturally to educated human beings, creating fully
featured web content is still a cumbersome experience. The reasons fall into
two main categories. One issue is efficiently finding the right content that
should be included or linked to, which usually takes a lot of time. The other
issue is efficiently telling the computer the relationships between our content
and external content and data, which usually requires skills and in-depth
knowledge of specifications and standards.
Zemanta is a service that tries to solve both issues through a semi-automatic
content enrichment process that makes content more appealing to humans and, at
the same time, places it in correct relations to other content in a way
computers can understand.
Fig. 8. Authoring process with Zemanta.
The Zemanta API allows application developers to automatically query the
Zemanta engine for contextual information about the text that the user enters.
Technically, the API accepts (any) text through a POST request and, after
analyzing that text, returns suggestions.
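A suggestion request of this kind is an ordinary form-encoded POST. The sketch below only builds the request with the JDK's `java.net.http` classes and does not send it. The endpoint `http://api.zemanta.com/services/rest/0.0/` and the parameter names `method=zemanta.suggest`, `api_key`, `text` and `format` are assumptions based on Zemanta's former public documentation (the service has since been discontinued), and `ZemantaRequests` is a hypothetical helper; the component described in this paper used a Zemanta Java library instead.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper; endpoint and parameter names are assumptions. */
final class ZemantaRequests {
    static final URI ENDPOINT = URI.create("http://api.zemanta.com/services/rest/0.0/");

    /** Build an application/x-www-form-urlencoded body for a suggest call. */
    static String formBody(String apiKey, String text) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("method", "zemanta.suggest");
        params.put("api_key", apiKey);
        params.put("text", text);
        params.put("format", "xml");
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Build (but do not send) the POST request carrying the user's text. */
    static HttpRequest suggestRequest(String apiKey, String text) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(apiKey, text)))
                .build();
    }
}
```

Sending it would be a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, after which the returned suggestions would be parsed from the response body.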
While some other services only try to find the most overrepresented rare
words or proper names in the text, Zemanta goes deeper when processing content.
Zemanta offers both tags based on words and phrases found inside the author’s
text and tags for topics that represent the content as a whole but are not
explicitly mentioned. It goes even further and tries to find concrete items
and concepts that are related to what is being said but are connected only
through a third piece of information. Therefore, the author can expect topics,
names and concepts as tags.
The structure of Zemanta’s RDF/XML response was inspired by the Linking Open
Data initiative, by other APIs offering semantic responses and, most
importantly, by ideas championed by the W3C.
The XWiki Semantic Tagging component uses the Zemanta API to suggest possible
keywords for a specific text. The component identifies itself with an API key,
a string that uniquely identifies a specific instance of an application using
the Zemanta web service. There are also limits on the number of requests per
day and per second: default developer accounts allow 1000 posts per day and
1 post per second.
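To stay within limits like these, a client can check its own quota before each call. The following is a minimal sketch, not part of the described component: `ZemantaQuota` is a hypothetical class, the clock is injected so the logic can be tested, and a fixed UTC-day window is assumed, which may not match how Zemanta actually counted requests.

```java
import java.util.function.LongSupplier;

/** Hypothetical client-side guard: at most 1 request per second and maxPerDay per UTC day. */
final class ZemantaQuota {
    private static final long SECOND_MS = 1_000L;
    private static final long DAY_MS = 86_400_000L;

    private final int maxPerDay;
    private final LongSupplier clockMs;
    private long lastRequestMs = Long.MIN_VALUE; // MIN_VALUE means "no request yet"
    private long currentDay = Long.MIN_VALUE;
    private int usedToday;

    ZemantaQuota(int maxPerDay, LongSupplier clockMs) {
        this.maxPerDay = maxPerDay;
        this.clockMs = clockMs;
    }

    /** Returns true and records the request if both limits allow it right now. */
    synchronized boolean tryAcquire() {
        long now = clockMs.getAsLong();
        long day = Math.floorDiv(now, DAY_MS);
        if (day != currentDay) { // a new day starts: reset the daily counter
            currentDay = day;
            usedToday = 0;
        }
        if (usedToday >= maxPerDay) return false;
        if (lastRequestMs != Long.MIN_VALUE && now - lastRequestMs < SECOND_MS) return false;
        lastRequestMs = now;
        usedToday++;
        return true;
    }
}
```

With the developer-account limits, the guard would be created as `new ZemantaQuota(1000, System::currentTimeMillis)`, and a `false` result means the suggestion request should be delayed or skipped.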
3.4 Instruments used for semantic information
DBpedia extracts factual information from Wikipedia pages, allowing users to
find answers to questions where the information is spread across many different
Wikipedia articles. DBpedia is served on the Web under the terms of the GNU
Free Documentation License. To fulfill the requirements of different client
applications, the DBpedia knowledge base can be accessed through four
mechanisms: Linked Data, a SPARQL endpoint, RDF dumps and index lookup.
Linked Data is a method of publishing RDF data on the Web that relies on HTTP
URIs as resource identifiers and on the HTTP protocol to retrieve resource
descriptions. DBpedia resource identifiers (such as
http://dbpedia.org/resource/Andy_Warhol) are set up to return RDF descriptions
when accessed by Semantic Web agents and a simple HTML view of the same
information to traditional Web browsers. HTTP content negotiation is used to
deliver the appropriate format.
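Content negotiation means the client states the format it wants in the `Accept` header while using the same resource URI. The sketch below builds the two kinds of request with the JDK's `java.net.http` classes; `DbpediaLinkedData` is a hypothetical helper name. DBpedia typically answers such requests with a redirect to a format-specific document, so a real client should enable redirect following.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: one resource URI, two representations chosen by the Accept header. */
final class DbpediaLinkedData {
    static HttpRequest describe(String resourceUri, boolean asRdf) {
        // Semantic Web agents ask for RDF; traditional browsers ask for HTML.
        String accept = asRdf ? "application/rdf+xml" : "text/html";
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", accept)
                .GET()
                .build();
    }
}
```

The request would be sent with a client built as `HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()`.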
A SPARQL endpoint is available for querying the DBpedia knowledge base. Client
applications can send queries over the SPARQL protocol to the endpoint at
http://dbpedia.org/sparql. In addition to standard SPARQL, the endpoint
supports several extensions of the query language that have proved useful for
developing client applications, such as full-text search over selected RDF
predicates and aggregate functions, notably COUNT(). To protect the service
from overload, limits on query complexity and result size are in place.
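Over the SPARQL protocol, a query can be sent as the URL-encoded `query` parameter of a GET request to the endpoint. This sketch builds such a URL for a counting query; `DbpediaSparql` and its methods are hypothetical names, and the class URI in the usage note is only an example. Note that aggregates such as `COUNT()` were an endpoint extension when DBpedia first offered them but are part of standard SPARQL 1.1.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for GET queries against the public DBpedia SPARQL endpoint. */
final class DbpediaSparql {
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    /** A query counting all resources typed with the given class URI. */
    static String countQuery(String classUri) {
        return "SELECT (COUNT(?s) AS ?n) WHERE { ?s a <" + classUri + "> }";
    }

    /** Encode the query into the endpoint URL as the "query" parameter. */
    static URI queryUri(String sparql) {
        return URI.create(ENDPOINT + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8));
    }
}
```

For example, `DbpediaSparql.queryUri(DbpediaSparql.countQuery("http://dbpedia.org/ontology/Person"))` yields a URL that can be fetched with any HTTP client; very broad queries may hit the endpoint's complexity and result-size limits.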
The DBpedia knowledge base is sliced by triple predicate into several parts,
and N-Triples serializations of these parts are available for download on the
DBpedia website. In addition to the knowledge base that is served as Linked
Data and via the SPARQL endpoint, the download page also offers infobox
datasets that have been extracted from Wikipedia editions in 29 languages
other than English.
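Each line of an N-Triples dump holds one triple terminated by a dot, which is why the dumps can be sliced and filtered line by line. The sketch below is a deliberately minimal, hypothetical splitter (`NTriples`, `Triple`): it only handles the common shape `<subject> <predicate> object .` with IRI subjects and skips comments; blank nodes and escapes inside IRIs are not handled, so a real RDF parser is the safer choice for production.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed triple; the object is kept verbatim (IRI in angle brackets, or a literal). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical, minimal N-Triples line splitter (not a full parser). */
final class NTriples {
    private static final Pattern LINE =
            Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+(.+?)\\s*\\.\\s*$");

    static Optional<Triple> parse(String line) {
        if (line.startsWith("#")) return Optional.empty(); // comment line
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return Optional.empty();
        return Optional.of(new Triple(m.group(1), m.group(2), m.group(3)));
    }
}
```

Filtering the parsed triples by `predicate` (for instance keeping only `http://www.w3.org/2000/01/rdf-schema#label`) reproduces, on a small scale, the per-predicate slicing of the published dumps.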
To make it easy for Linked Data publishers to find DBpedia resource URIs to
link to, a lookup service proposes DBpedia URIs for a given label. The Web
service is based on a Lucene index providing a weighted label lookup, which
combines string similarity with a relevance ranking in order to find the most
likely matches for a given term. DBpedia Lookup is available as a Web service
at http://lookup.dbpedia.org/api/search.asmx.
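A lookup is a plain GET request carrying the label to search for. In the sketch below, the `KeywordSearch` operation and the `QueryClass`, `QueryString` and `MaxHits` parameter names are assumptions based on the historical ASMX interface of the service (the current DBpedia Lookup uses a different URL scheme), and `DbpediaLookup` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the operation and parameter names are assumptions. */
final class DbpediaLookup {
    static final String BASE = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch";

    /** Keyword search for a tag label, returning at most maxHits candidate URIs. */
    static URI keywordSearch(String label, int maxHits) {
        return URI.create(BASE
                + "?QueryClass="   // left empty: do not restrict results to an ontology class
                + "&QueryString=" + URLEncoder.encode(label, StandardCharsets.UTF_8)
                + "&MaxHits=" + maxHits);
    }
}
```

A tagging component would call this with the tag name typed or accepted by the user and keep the best-ranked result.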
The XWiki Semantic Tagging component links information from the DBpedia index
(a short description of the tag, the URI of the resource page and the label) to
the user-defined tags in the wiki. This extends the default tagging mechanism
of the XWiki platform, which does not link user-defined tags to a concept.
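Extracting the label, URI and description from a lookup response can be done with the JDK's DOM parser. This sketch assumes a response shaped like `<ArrayOfResult><Result><Label/><URI/><Description/>…</Result></ArrayOfResult>`, which is how we recall the historical service's XML; `LookupHit` and `LookupParser` are hypothetical names. It reads only direct children of each `Result`, because nested elements (such as class or category entries) may also contain `Label` and `URI` elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** The three pieces of DBpedia data attached to a tag (hypothetical holder). */
final class LookupHit {
    final String label;
    final String uri;
    final String description;

    LookupHit(String label, String uri, String description) {
        this.label = label;
        this.uri = uri;
        this.description = description;
    }
}

final class LookupParser {
    /** Parse an assumed ArrayOfResult/Result response into hits, in response order. */
    static List<LookupHit> parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs so a hostile response cannot pull in external entities.
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList results = doc.getElementsByTagName("Result");
            List<LookupHit> hits = new ArrayList<>();
            for (int i = 0; i < results.getLength(); i++) {
                Element r = (Element) results.item(i);
                hits.add(new LookupHit(child(r, "Label"), child(r, "URI"), child(r, "Description")));
            }
            return hits;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable lookup response", e);
        }
    }

    /** Text of the first direct child element with this name, or "" if absent. */
    private static String child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

The factory is left non-namespace-aware (the default), so elements in the response's default namespace are matched by their plain tag names.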
3.5 Common Tag
The Semantic Tagging component uses the Common Tag RDFa vocabulary to
bring semantic markup to the default XWiki tagging mechanism.
Fig. 9. Example of semantic markup using RDFa for a wiki tag.
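In Common Tag, a page is linked to a tag with `ctag:tagged`, the tag is typed `ctag:Tag`, its text is given by `ctag:label`, and `ctag:means` points to the concept it denotes (here a DBpedia resource). The sketch below renders that pattern as RDFa for one tag. The vocabulary terms and namespace `http://commontag.org/ns#` come from Common Tag; the `CommonTagMarkup` class and the exact HTML layout are hypothetical and are not the component's template output.

```java
/** Hypothetical renderer of one tag as Common Tag RDFa markup. */
final class CommonTagMarkup {
    /** Minimal escaping for text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** The current page is ctag:tagged with a ctag:Tag whose ctag:means is the DBpedia URI. */
    static String render(String label, String dbpediaUri) {
        return "<div xmlns:ctag=\"http://commontag.org/ns#\" rel=\"ctag:tagged\">"
                + "<a typeof=\"ctag:Tag\" property=\"ctag:label\" rel=\"ctag:means\" href=\""
                + escape(dbpediaUri) + "\">" + escape(label) + "</a></div>";
    }
}
```

Because the `div` has no `about` attribute, an RDFa processor takes the current page as the subject of `ctag:tagged`, which is exactly the statement "this wiki page is tagged with this concept".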
3.6 Implementation details
Extensions for the XWiki Platform that implement the semantic tagging mechanism:
– an XWiki application (SemTags.Tooltip) for the tag tooltip: contains a
JavaScript skin extension and a stylesheet skin extension;
– an XWiki application (SemTags.CreateTagForm) for the new semantic tagging
form: Velocity code to add a tag suggested by Zemanta and linked with
information from DBpedia, or just a tag already used in the wiki;
– an XWiki component for the backend tag mechanism: connects to the Zemanta
API and queries the DBpedia index;
– resource modifications: JavaScript code to support the new tagging
functionality;
– template modifications: updating htmlheader.vm with the DOCTYPE of the
XHTML wiki pages to support the new RDFa vocabulary, and updating
documentTags.vm with the new display for a keyword.
The XWiki code lifecycle is based on Maven, so a Maven archetype was used to
create a simple component module that respects the XWiki architecture and the
specific requirements of components. Since the XWiki platform is written in
Java, a Java library was used to query the Zemanta engine, and this API was
added as a Maven dependency of the XWiki component.
Maven dependency for the Zemanta API.
<dependency>
<groupId>com.zemanta.api</groupId>
<artifactId>zemapi</artifactId>
<version>1.0</version>
</dependency>
The HTTPClient library was used to query the DBpedia lookup web service, and a
dependency was also added to the component pom.xml.
Maven dependency for the HTTPClient library.
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
</dependency>
Content of the component declaration file components.txt.
org.xwiki.semtag.component.internal.DefaultSemanticTagger
org.xwiki.semtag.component.internal.vcinitializer.SemanticTaggerVelocityContextInitializer
The @ComponentRole annotation is used to declare the interface of the component.
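import java.rmi.RemoteException;
import java.util.ArrayList;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xwiki.component.annotation.ComponentRole;

@ComponentRole
public interface SemanticTagger
{
    /** Tag suggestions (from Zemanta) for the given text. */
    ArrayList<SemanticTag> getSuggestions(String text);
    /** Attach the first DBpedia match (description, URI) to the tag. */
    void updateFirstSemanticDetail(SemanticTag tag)
        throws SAXException, ParserConfigurationException, RemoteException;
    /** Build a tag with the DBpedia details found for the given name. */
    SemanticTag updateSemanticDetails(String tagName)
        throws ParserConfigurationException, SAXException;
}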
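The interface refers to a SemanticTag type whose definition is not shown here. The following is a hypothetical sketch of what it could look like; its getters are inferred only from how the Velocity template in this section uses the objects (`$suggestedTag.name`, `getSemanticDetails()`, `getDescription()`, `getUri()`), and `SemanticDetail` and `addSemanticDetail` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** One DBpedia match for a tag (hypothetical shape). */
final class SemanticDetail {
    private final String label;
    private final String uri;
    private final String description;

    SemanticDetail(String label, String uri, String description) {
        this.label = label;
        this.uri = uri;
        this.description = description;
    }

    public String getLabel() { return label; }
    public String getUri() { return uri; }
    public String getDescription() { return description; }
}

/** A tag name plus its DBpedia details, best match first (hypothetical shape). */
final class SemanticTag {
    private final String name;
    private final List<SemanticDetail> semanticDetails = new ArrayList<>();

    SemanticTag(String name) { this.name = name; }

    /** Velocity resolves $suggestedTag.name to this getter. */
    public String getName() { return name; }
    public List<SemanticDetail> getSemanticDetails() { return semanticDetails; }
    public void addSemanticDetail(SemanticDetail detail) { semanticDetails.add(detail); }
}
```

In a real module both classes would be public and live in their own files, since Velocity only calls methods on public classes.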
The @Component annotation declares the VelocityContextInitializer implementation
that exposes the SemanticTagger component to Velocity scripts under the key
"tagger".
import org.apache.velocity.VelocityContext;
import org.xwiki.component.annotation.Component;
import org.xwiki.component.annotation.Requirement;
import org.xwiki.velocity.VelocityContextInitializer;

@Component("tagger")
public class SemanticTaggerVelocityContextInitializer
    implements VelocityContextInitializer
{
    /** The key to add to the velocity context */
    public static final String VELOCITY_CONTEXT_KEY = "tagger";

    /** Injected by the component manager. */
    @Requirement
    private SemanticTagger semanticTagger;

    /**
     * Add the component instance to the velocity context
     * received as parameter.
     */
    public void initialize(VelocityContext context)
    {
        context.put(VELOCITY_CONTEXT_KEY, semanticTagger);
    }
}
Using the component API from Velocity to display the tag name, description and
link to the DBpedia URI.
## Ask the component for Zemanta suggestions for the submitted text
#set($suggestedList = $tagger.getSuggestions("$request.text"))
#foreach($suggestedTag in $suggestedList)
  ## Attach the first DBpedia match, if any, to the suggested tag
  #set($ok = $tagger.updateFirstSemanticDetail($suggestedTag))
  #set($details = $suggestedTag.getSemanticDetails())
  <li>
    <a class="suggested-tag" href="#">$suggestedTag.name</a>
    #if($details && $details.size() > 0)
    <span class="suggested-tag-info"
      style="display: none">$details.get(0).getDescription()
      <br/><a href="$details.get(0).getUri()">Visit</a>
      <div id="more-at">Powered by
        <a href="http://www.dbpedia.org">
        <img src="$dbpediaImg" alt="DBpedia"/></a></div>
    </span>
    #end
  </li>
#end
4 Conclusions
A tag is a relevant keyword or term associated with specific content, and tags
provide a very efficient method of describing information with metadata. The
tagging extension for the XWiki platform provides semantic details extracted
from the world’s biggest knowledge base, improving content understanding for
both users and computers.