Modelling social Web applications via tinydb
Claudiu Mihăilă
Faculty of Computer Science,
"Al.I. Cuza" University of Iași,
16, G-ral Berthelot Street,
700483 Iași, Romania
claudiu.mihaila@info.uaic.ro
Abstract. This paper reports on the possibilities of data modelling for social Web applications
and of user interaction with them by using URL-shortening Web services such as tinydb. Because such
a service transforms a long URL into a very short one and can send structured data along with the
URL, it can radically change the appearance and composition of current messages. However, some
essential problems regarding these services have been identified, which need to be addressed before
short URLs can be used safely.
Key words: tinydb, social Web, semantic Web, Twitter, model, REST, interaction
1 Introduction
Microblogging is a relatively new phenomenon defined as "a form of multimedia blogging that allows users
to send brief text updates or micromedia such as photos or audio clips and publish them, either to be
viewed by anyone or by a restricted group which can be chosen by the user."¹ Microblogging tools provide
a lightweight, easy form of communication that enables users to broadcast and share information about
their activities, opinions and status. However, the existing microblogging services are still centralised and
confined, and efforts are being made to make microblogging an integral part of the Social Semantic
Web [1].
Modelling the necessary data for this type of Web application whilst maintaining a rich user experience
is not an easy task. We describe some steps that have been taken towards integrating microblogging
into the semantic Web and accessing data in a RESTful manner by using URL-shortening Web services.
The report is structured as follows: section two presents the tinydb Web service, whilst the third
section describes a case study of using this Web service for one social Web application. Finally,
the fourth section discusses the issues arising from the use of URL-shortening services.
2 Tinydb
Tinydb² is a Web service that makes it possible to transmit structured data along with a short URL. The
data can be associated with a URL either by GET or by POST HTTP request parameters. For example,
the request in Fig. 1 would return a short URL, http://tinydb.org/11mK, which, when accessed, makes
it possible to retrieve the original data.
http://tinydb.org/_write?_url=http://students.info.uaic.ro/~claudiu.mihaila&name=Claudiu Mihaila&course=WADe
Fig. 1: Writing the data as parameters of an HTTP request
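As a minimal sketch of how such a request could be assembled programmatically, the following Python snippet builds a write URL like the one in Fig. 1 from a set of named fields. The helper name build_write_url is hypothetical, the endpoint is copied from Fig. 1, and nothing is actually sent over the network. The snippet also rejects user-defined parameter names that begin with an underscore, since the service reserves such names for itself.

```python
from urllib.parse import urlencode

# Endpoint as it appears in Fig. 1.
WRITE_ENDPOINT = "http://tinydb.org/_write"


def build_write_url(target_url, **fields):
    """Return a write-request URL (hypothetical helper; sends nothing)."""
    for name in fields:
        # User-defined parameter names must not start with an underscore.
        if name.startswith("_"):
            raise ValueError(f"parameter name {name!r} starts with an underscore")
    params = {"_url": target_url, **fields}
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return WRITE_ENDPOINT + "?" + urlencode(params)


request_url = build_write_url(
    "http://students.info.uaic.ro/~claudiu.mihaila",
    name="Claudiu Mihaila",
    course="WADe",
)
print(request_url)
```

Unlike the literal URL in Fig. 1, the generated URL is percent-encoded, which is what a browser or HTTP client would normally do before sending it.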
There is one restriction regarding the names of the parameters: they are not allowed to start with an
underscore ('_'), because the Web service itself uses a number of parameters whose names start in
this manner. These parameters are _url, _f, _c and _tinydb_id. If present, the _url parameter causes
a redirection to its value instead of displaying the actual data stored in the tinydb URL.
¹ http://en.wikipedia.org/wiki/Microblogging
² http://www.tinydb.org
The data associated with the parameters can be retrieved by accessing the URL obtained in
response to the request. Several representations of the created information resource can be
retrieved in a RESTful manner by the user or by another application, selected through the _f parameter:
– _f=json returns a JSON object containing the data.
– _f=jsonp returns the JSON object wrapped in JavaScript that passes it directly to a callback function,
tinydbCallback, called once the script is loaded.
– _f=js returns the JSON object in JavaScript, with an optional callback function (&_c=callback)
to be called once the script is loaded.
– _f=xml returns the data in XML.
The information retrieved in this way may be used by other applications as it is, or for the creation
of mash-ups.
For example, Fig. 2 shows the data requested via the http://tinydb.org/11mK?_f=xml URL. Had no
explicit format been requested, accessing the tinydb URL would have resulted in a redirection to
the URL specified by the value of the _url parameter.
<xml_data>
<url>http://students.info.uaic.ro/~claudiu.mihaila</url>
<name>Claudiu Mihaila</name>
<course>WADe</course>
<tinydb_id>11mK</tinydb_id>
<created>2009-10-03 15:08:05.225510</created>
</xml_data>
Fig. 2: Retrieving the data as XML
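To illustrate how a client application might consume this representation, the sketch below parses the XML of Fig. 2 with Python's standard library and shows how a format selector could be appended to a short URL. The helper names representation_url and parse_record are hypothetical, and the XML is embedded as a string rather than fetched, so the example does not depend on the service being reachable.

```python
import xml.etree.ElementTree as ET

# XML document exactly as shown in Fig. 2.
FIG2_XML = """<xml_data>
<url>http://students.info.uaic.ro/~claudiu.mihaila</url>
<name>Claudiu Mihaila</name>
<course>WADe</course>
<tinydb_id>11mK</tinydb_id>
<created>2009-10-03 15:08:05.225510</created>
</xml_data>"""


def representation_url(short_url, fmt):
    """Append the _f format selector to a short URL (hypothetical helper)."""
    return f"{short_url}?_f={fmt}"


def parse_record(xml_text):
    """Turn the flat <xml_data> element into a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = parse_record(FIG2_XML)
print(representation_url("http://tinydb.org/11mK", "xml"))
print(record["name"], record["course"], record["tinydb_id"])
```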
The simplicity and ease of use of this Web service in storing and retrieving structured data make it
a viable option for inclusion in large projects, especially those in which data storage capabilities
are an issue.
In the next section, we analyse the modelling of the data and the interaction of the user with
one of today's most intensively used social Web applications, Twitter.
3 Twitter
Twitter³ is a very popular social Web application that allows users to send and read short pieces
of text, known as tweets [2]. The application has grown rapidly since its launch in 2006, with an
estimated 20 million users in 2009 [3].
The very short length of the messages was initially established to make it possible to send and receive
tweets via the Short Message Service (SMS). As a result, the limit of 140 characters has introduced
SMS-specific slang and shorthand notation into the Web. Web services such as tinydb can therefore
largely influence the content of the messages sent: by using such a service, Twitter users can go
beyond the 140-character limit and share practically unlimited amounts of data.
Searches on this system make use of hashtags, which are words or phrases prefixed with a #. A search
for "Web" would find all messages that include #Web. Similarly, the @ sign followed by a username allows
users to address messages directly to each other, although the message is still readable by anyone.
One effective use of the tinydb Web service in the Twitter application is to share long and cumbersome
URLs, which can take more than 140 characters. For example, the Google link in Fig. 3 comprises 214
characters, so the whole link would not fit in a single tweet. A tinydb URL, however, would
consume only 22 characters to obtain the same result, which leaves sufficient space for some
other short message, be it text or another URL. Associating the desired link with one
from tinydb is therefore beneficial to the user, since more than 13% of the existing tweets contain
some URL [4].
https://www.google.com/accounts/ServiceLogin?hl=en_US&continue=h
ttp%3A%2F%2Fpicasaweb.google.com%2Flh%2Fidredir%3Funame%3Dklaudi
umihaila%26target%3Dphoto%26id%3D5386208676112752274&service=lh2
&ltmpl=gp&passive=true
Fig. 3: Example of a 214-character long Google link
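The character counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the long link is transcribed from Fig. 3 (with the last parameter name taken to be ltmpl, which is what makes the total come to 214 characters), the short URL is the example from Section 2, and the helper remaining_chars is a hypothetical name.

```python
TWEET_LIMIT = 140  # character limit discussed in the text

# The long link of Fig. 3, split over several literals for readability.
long_url = (
    "https://www.google.com/accounts/ServiceLogin?hl=en_US&continue=h"
    "ttp%3A%2F%2Fpicasaweb.google.com%2Flh%2Fidredir%3Funame%3Dklaudi"
    "umihaila%26target%3Dphoto%26id%3D5386208676112752274&service=lh2"
    "&ltmpl=gp&passive=true"
)
short_url = "http://tinydb.org/11mK"  # example short URL from Section 2


def remaining_chars(url, limit=TWEET_LIMIT):
    """Characters left in a tweet after the URL (negative if it does not fit)."""
    return limit - len(url)


print(len(long_url), remaining_chars(long_url))    # 214 -74
print(len(short_url), remaining_chars(short_url))  # 22 118
```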
Furthermore, because tinydb is able to store many long texts as parameters, users can publish
extensive messages. Since the HTTP protocol does not place any a priori limit on
the length of a request parameter, users have no imposed length limit other than the maximum admitted
by the server. An advantage of having more space for writing messages is that the concepts used can be
properly tagged. Instead of relying on the plain hashtags Twitter offers, users can write annotations
that a more powerful processor can use to extract tags and resolve them to URIs. For instance, instead
of writing "Climbing the #Statue_of_Liberty in #New_York", someone could microblog "Climbing the
#dbp:Statue_of_Liberty in #geo:New_York". The processor would then be able to extract the hashtags and
send queries to DBpedia⁴ and GeoNames⁵ to retrieve the URIs of the related concepts. Thus, the tweets
would be automatically connected to existing URIs rather than to meaningless text strings [5].
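The extraction step described above could look roughly like the following Python sketch. It assumes that multi-word labels are joined with underscores and that the prefixes dbp and geo identify DBpedia and GeoNames respectively; the prefix table, the regular expression and the function name extract_semantic_tags are all illustrative assumptions, and the actual lookup against either service is deliberately left out.

```python
import re

# Prefix -> data source; the mapping itself is an illustrative assumption.
SOURCES = {"dbp": "DBpedia", "geo": "GeoNames"}

# '#', a lowercase prefix, ':', then a label made of word characters.
TAG_PATTERN = re.compile(r"#([a-z]+):(\w+)")


def extract_semantic_tags(message):
    """Return (source, label) pairs for every prefixed hashtag in the message."""
    tags = []
    for prefix, label in TAG_PATTERN.findall(message):
        if prefix in SOURCES:  # ignore prefixes we do not know how to resolve
            tags.append((SOURCES[prefix], label))
    return tags


message = "Climbing the #dbp:Statue_of_Liberty in #geo:New_York"
print(extract_semantic_tags(message))
# [('DBpedia', 'Statue_of_Liberty'), ('GeoNames', 'New_York')]
```

Each resulting pair tells the processor which data source to query and with which label; plain hashtags without a prefix are simply not matched.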
Moreover, the associated data may be described by using metadata, such as vocabularies from the
social semantic Web, e.g. FOAF (Friend of a Friend) and SIOC (Semantically-Interlinked Online
Communities). The former is used to model the microbloggers and their properties (e.g., name and e-mail)
and to reuse their URIs from existing Web 2.0 services instead of creating new ones every time. The latter
is used to define related user contexts, providing a way to identify a user account on a given
microblogging service. Given the strong connection between FOAF and SIOC, SPARQL queries can give
people access to information that was previously unavailable.
4 Current issues
Even though the tinydb Web service provides short URLs as aliases for longer ones or for large amounts of
text, it should be noted that the more the service is used, the longer the tinydb URLs will get. If n is
the size of the alphabet and k the maximum length of a code, the total number of short pages
with codes of length less than or equal to k is given by equation (1), the sum of a geometric
progression.
\[ \sum_{i=1}^{k} n^{i} = \frac{n(n^{k} - 1)}{n - 1} \tag{1} \]
The currently used alphabet of 10 digits and 52 letters (both upper and lowercase English letters)
creates a base-62 numeral system. With at most four characters per key, the service can encode slightly
more than 15 million URLs. If the key is extended to five characters, the possibilities increase to
almost one billion. Although these are big numbers, they will probably not prove sufficient.
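As a quick check of these figures, the short Python sketch below evaluates equation (1) for n = 62, both as an explicit sum and through the closed form. The function names are illustrative only.

```python
def keyspace(n, k):
    """Number of codes of length 1..k over an alphabet of n symbols (explicit sum)."""
    return sum(n ** i for i in range(1, k + 1))


def keyspace_closed_form(n, k):
    """Same count via the closed form n(n^k - 1)/(n - 1) of equation (1)."""
    return n * (n ** k - 1) // (n - 1)


ALPHABET_SIZE = 10 + 52  # digits plus upper and lowercase English letters

print(keyspace(ALPHABET_SIZE, 4))  # 15018570
print(keyspace(ALPHABET_SIZE, 5))  # 931151402
```

The division in the closed form is exact, because n - 1 always divides n^k - 1.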
One drawback of the tinydb Web service is that it offers no privacy or security features, so
the submitted data is entirely accessible to the general public. However, this issue can be overcome
by encrypting the data before submission and storing only the ciphertext. The ciphertext would
then be available to everyone, but readable only by those who possess the correct decryption key.
Furthermore, a user cannot tell from a short URL where it will redirect. Many phishing,
spamming, shock and affiliate URLs are recorded on URL-shortening sites, which can compromise the users'
security. This type of problematic content can be countered by filtering or by presenting the
destination link instead of redirecting automatically. However, in the case of hacking, when the URL is
changed intentionally by someone else, the service's users can still be harmed and exposed.
³ http://www.twitter.com
⁴ http://dbpedia.org
⁵ http://geonames.org
Moreover, the URLs that this type of service stores are prone to link rot. Gomes and Silva [6]
concluded in one study that the lifetime of contents follows a logarithmic distribution with an estimated
half-life of only two days, which is continuously decreasing.
Another issue is that using such a service adds one level of indirection to each access. Every
access requires at least one additional DNS lookup and one additional HTTP request, which increases the
number of necessary requests and lengthens the waiting time.
5 Conclusions
In this report we have presented the possibility of modelling the data in social Web applications via
URL-shortening services and how this affects the user's interaction with those applications.
A shortened URL with associated structured data brings many advantages when it comes
to the information content that can be sent. On the one hand, users are able to communicate in longer
texts and thus express their emotional state or activities more fully. On the other hand, it
is now possible to include appropriate metadata in the messages. Including widely used ontologies and
databases enables the creation of an enriched semantic Web.
However, the large number of examples of inappropriately used short URLs raises questions
about their utility. The fact that short URLs are not entirely safe at the moment may delay their
adoption on a large scale. More security measures need to be developed before this type of
data modelling can become a successful step towards a social semantic Web.
References
1. Breslin, J.G., Decker, S.: Semantic web 2.0: Creating social semantic information spaces. In: Tutorial in the
15th International World Wide Web Conference (WWW 2006). (May 2006)
2. Pontin, J.: From many tweets, one loud voice on the internet. The New York Times (22 April 2007)
3. Kazeniac, A.: Social networks: Facebook takes over top spot, Twitter climbs.
http://blog.compete.com/2009/02/09/facebook-myspace-twitter-social-network/ (9 February 2009)
4. Java, A., Song, X., Finin, T., Tseng, B.: Why we twitter: understanding microblogging usage and communities.
In: WebKDD/SNA-KDD ’07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web
mining and social network analysis, New York, NY, USA, ACM Press (2007) pp. 56–65
5. Passant, A., Hastrup, T., Bojars, U., Breslin, J.: Microblogging: A semantic web and distributed approach.
In: Proceedings of the 4th Workshop on Scripting for the Semantic Web. (2008)
6. Gomes, D., Silva, M.J.: Modelling information persistence on the web. In: Proceedings of the 6th International
Conference on Web Engineering, New York, NY, USA, ACM Press (11-16 July 2006) pp. 193–200