2. The problem
• in recent years the amount of information available on the Web has increased
not only in size, but also in complexity
• most of the documents accessible through the internet consist of multimedia data,
user generated content, real-time information
• the complexity of this new information structure has thus become a major issue in
the field of user experience and web usability, and there is not yet a standard
framework for presenting this information to the user
3. Thinkbase and Thinkpedia
• interactive visualization tools for exploring the semantic graph of large knowledge
spaces
• these systems are designed only to improve the visual representation of semantic
web resources, such as the open shared knowledge databases Freebase and Wikipedia
4. Whatsonweb+
• systems that use web clustering engines as data sources for the
visualization
• these systems forward the user's queries to classical web search engines, retrieve
the results and organize them in categorized groups called clusters, in order
to provide a semantic representation of the information to the user
5. TouchGraph Google Browser
• a visual search engine that displays the connections between web sites using
Google technology and visualizes the results in an interactive and customizable
map
6. Previous work
• all the previous solutions are built on top of a dedicated resource and cannot be
extended to other information repositories, like:
– web search engines
– multimedia databases (images, video, etc.)
– real-time data
– traditional databases
– simple data sets (csv files, xml, etc.)
• they do not add content extracted from other resources to improve
information quality
• they do not use advanced techniques for visualization of information
7. The visual interactive framework
The framework consists of a web-based application that allows users to
perform a query, extracts and merges results from diverse knowledge
repositories, and lets users explore information by means of an interactive
graph-based user interface
• easily adaptable to all common data repositories
• related content extraction to enrich information quality
• Information Visualization techniques to improve user experience
more efficient information retrieval
8. The architecture
• The user performs a query against the following different resources:
– the main data source: a web search engine
– the related sources: two multimedia databases and a social network
• results are then processed according to a semantic strategy to create
descriptions of the results
• the GUI represents descriptions by specific interactive features and visual
elements designed to allow users to explore the information
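The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Description`, `run_query` and the `fake_*` stub sources are hypothetical and stand in for the real services, and no network access is performed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A source is any callable that maps a query string to a list of result titles.
Source = Callable[[str], List[str]]


@dataclass
class Description:
    """Merged results for one query (hypothetical structure)."""
    query: str
    main_results: List[str]
    related: Dict[str, List[str]] = field(default_factory=dict)


def run_query(query: str, main_source: Source,
              related_sources: Dict[str, Source]) -> Description:
    """Query the main source, then each related source, and merge the results."""
    description = Description(query=query, main_results=main_source(query))
    for name, source in related_sources.items():
        description.related[name] = source(query)
    return description


# Stub sources standing in for a search engine, two multimedia
# databases and a social network (all hypothetical).
def fake_search(q: str) -> List[str]:
    return [f"web page about {q}"]


def fake_images(q: str) -> List[str]:
    return [f"image of {q}"]


def fake_videos(q: str) -> List[str]:
    return [f"video of {q}"]


def fake_social(q: str) -> List[str]:
    return [f"blog post on {q}"]


desc = run_query(
    "ubuntu",
    fake_search,
    {"images": fake_images, "videos": fake_videos, "social": fake_social},
)
```

Keeping every source behind the same callable signature is one way to make the pipeline adaptable to other repositories: adding a source means adding one entry to the dictionary.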
9. Main data source
• the user performs a query against a web search engine which returns clustered
data extracted from the web
• we used the clustering engine Carrot2: search results are organized by topic,
using the Lingo clustering algorithm
10. Related content sources
• information extracted from the main resource is processed and used to query
some of the most common social networks and image and video sharing platforms
• the objective is to enrich the information with related content, in order to
provide the user with a larger number of information sources, improve
information completeness and increase the user's knowledge
• our system uses the publicly available APIs of YouTube as a source of video
content, Flickr for images and pictures, and Technorati for social and real-time
content
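A minimal sketch of this enrichment step, assuming that the related query is built by appending each cluster label to the original query (the slides do not say how the query is derived). The `enrich_clusters` function and the stub fetchers are hypothetical; they do not call the real YouTube, Flickr or Technorati APIs.

```python
from typing import Callable, Dict, List

# A fetcher maps a query string to a list of related items.
Fetcher = Callable[[str], List[str]]


def enrich_clusters(query: str, cluster_labels: List[str],
                    fetchers: Dict[str, Fetcher]) -> Dict[str, Dict[str, List[str]]]:
    """Return, for each cluster label, the related content from every fetcher."""
    enriched: Dict[str, Dict[str, List[str]]] = {}
    for label in cluster_labels:
        # Assumption: related query = original query + cluster label.
        related_query = f"{query} {label}"
        enriched[label] = {
            name: fetch(related_query) for name, fetch in fetchers.items()
        }
    return enriched


# Stub fetchers used only for illustration (no network access).
stub_fetchers: Dict[str, Fetcher] = {
    "video": lambda q: [f"video: {q}"],
    "image": lambda q: [f"image: {q}"],
    "social": lambda q: [f"post: {q}"],
}

enriched = enrich_clusters("italy", ["climate", "travel"], stub_fetchers)
```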
11. Resources description
• all the information extracted from main and related content sources is then
processed to create a semantic structure in the XML format
• data are organized in the following elements:
– entities (called nodes), which represent the descriptors of the documents extracted from
the resources
– relations (edges), which represent the connections among documents
• The structure of data was designed with two goals:
– to be easily extended to all common data repositories or search engines, in order to
implement a standardized representation of single elements, clusters, ranking information
and semantic relations
– to be lightweight, so that it can be easily delivered and processed by the RIA
that runs within a browser plugin.
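As an illustration of a node/edge structure serialized to XML, here is a small sketch. The element and attribute names (`description`, `nodes`, `node`, `edges`, `edge`, `id`, `source`, `target`) are assumptions made for this example; the slides do not give the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_description(nodes: List[Tuple[str, str]],
                      edges: List[Tuple[str, str]]) -> str:
    """Serialize (id, label) nodes and (source, target) edges to an XML string."""
    root = ET.Element("description")

    nodes_el = ET.SubElement(root, "nodes")
    for node_id, label in nodes:
        node_el = ET.SubElement(nodes_el, "node", id=node_id)
        node_el.text = label

    edges_el = ET.SubElement(root, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)

    return ET.tostring(root, encoding="unicode")


xml_text = build_description(
    nodes=[("n1", "Installation"), ("n2", "Download")],
    edges=[("n1", "n2")],
)
```

A flat list of nodes plus a flat list of edges keeps the document small and lets a client parse it in a single pass.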
12. Graphical User Interface
• the user interface uses graph-based representation techniques in order to
maximize data comprehension and the analysis of relations
• visual paradigms are proposed by means of two different metaphors to represent
information
• interactive functions are implemented in order to explore the structure of data
13. Information visualization
• each node of the interface is a graphically enriched element representing the
following group of information:
– a cluster returned by the main data source
– related content from the multimedia repositories
– related content from the social repository
• each node is shaped according to the relations among all the information
of the corresponding group
• a line connecting two different nodes represents the semantic relation
between groups
14. Content access
• Users can inspect all the information sources related to each node of the graphs
that contain the search results, such as web pages, blog entries, images and
videos related to the subject of the query
• Contents can be accessed by clicking on buttons under each node name
15. Geometric paradigm
• results are represented by geometric figures differing in size, colour and shape type:
– size represents the number of results returned by the main data source
– colour represents the number of results returned by the multimedia repositories
– shape type represents the number of results returned by the social repository (starting
from the basic shape of a triangle)
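The mapping from result counts to visual attributes could look like the sketch below. All scales are assumptions chosen for illustration (square-root radius, grey-level colour, one extra polygon side per five social results, capped at eight sides); the slides state only which attribute encodes which count.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    radius: float  # encodes main data source results
    colour: str    # encodes multimedia results, as a hex grey level
    sides: int     # encodes social results, starting from a triangle


def geometric_glyph(main_count: int, multimedia_count: int,
                    social_count: int, max_multimedia: int = 50) -> Glyph:
    """Map three result counts to size, colour and shape type (hypothetical scales)."""
    # Size: grow with the square root so that area, not radius, tracks the count.
    radius = 10.0 + 5.0 * math.sqrt(main_count)

    # Colour: white for no multimedia results, black at max_multimedia or more.
    ratio = min(multimedia_count, max_multimedia) / max_multimedia
    level = round(255 * (1 - ratio))
    colour = f"#{level:02x}{level:02x}{level:02x}"

    # Shape type: triangle as the base shape, one more side per 5 social results.
    sides = min(3 + social_count // 5, 8)

    return Glyph(radius=radius, colour=colour, sides=sides)


empty = geometric_glyph(0, 0, 0)
busy = geometric_glyph(16, 50, 12)
```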
17. Urban paradigm
• results are represented by elements of an urban landscape:
– building type represents the number of results returned by the main data source
– trees represent the number of results returned by the multimedia repositories
– people represent the number of results returned by the social repository
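A comparable sketch for the urban metaphor. The building categories, the thresholds and the "one tree or person per ten results, at most five" rule are all hypothetical values picked for this example.

```python
from typing import Dict, Union


def urban_scene(main_count: int, multimedia_count: int,
                social_count: int) -> Dict[str, Union[str, int]]:
    """Map three result counts to a building type, trees and people."""
    # Building type encodes the main data source results (assumed thresholds).
    if main_count < 10:
        building = "house"
    elif main_count < 50:
        building = "apartment block"
    else:
        building = "skyscraper"

    # One tree per 10 multimedia results and one person per 10 social
    # results, capped at 5 so the scene stays readable (assumption).
    trees = min(multimedia_count // 10, 5)
    people = min(social_count // 10, 5)

    return {"building": building, "trees": trees, "people": people}


small = urban_scene(5, 0, 0)
large = urban_scene(60, 35, 120)
```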
19. Experimental analysis
• testing was focused on evaluating the quality of the two different visual paradigms
and the usability of the system
• 11 users with different cultural backgrounds and expertise were selected:
– 3 students and researchers of the Media Integration and Communication Center
– 3 students of the Master in Multimedia Content Design
– 5 non-technical users
• the test was conducted in different sessions according to the following methods:
– trained testing: participants were first given a brief tutorial (lasting about 10 minutes) about
the functions of the system
– untrained testing: participants were required to complete the test without any previous
knowledge of the system
20. User test methodology
• each participant was asked to find a document, image or video about a topic using
a keyword given by the test supervisor
• the tasks assigned in the experiments were:
– task 1: find an installation guide for the Ubuntu operating system, using the keyword ubuntu
– task 2: find a web page describing the climate conditions that can be expected in Italy,
using the keyword Italy
– task 3: find the name of the founder of the social network Facebook, using the keyword
facebook
– task 4: find an image of one or more players of American Football, using the keyword
football
• tasks were followed by a short interview in which subjects were asked about their
experiences and their understanding of interface, data representations and visual
paradigms
• parameters used to evaluate the system:
– the number of interactions (mouse clicks) used to complete a task
– the time (in seconds) spent completing a task
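The two parameters can be aggregated per interface and task as in the sketch below. The `summarize` function and the record layout are assumptions, and the sample values are made-up placeholders used only to exercise the code; they are not measurements from the experiments.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One observation: (interface, task number, mouse clicks, seconds).
Observation = Tuple[str, int, int, float]


def summarize(observations: Iterable[Observation]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Average clicks and completion time for each (interface, task) pair."""
    grouped: Dict[Tuple[str, int], List[Tuple[int, float]]] = defaultdict(list)
    for interface, task, clicks, seconds in observations:
        grouped[(interface, task)].append((clicks, seconds))

    return {
        key: {
            "mean_clicks": mean(clicks for clicks, _ in values),
            "mean_seconds": mean(seconds for _, seconds in values),
        }
        for key, values in grouped.items()
    }


# Placeholder records (NOT real experimental data).
placeholder = [
    ("geometric", 1, 4, 30.0),
    ("geometric", 1, 6, 50.0),
    ("urban", 1, 3, 20.0),
]
summary = summarize(placeholder)
```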
21. Trained testing results
• 9 users were assigned to this test (3 users for each results presentation paradigm)
• Google was used to compare results with a traditional visualization interface
23. Conclusions and future work
• This paper presented a framework to visualize heterogeneous information
from the World Wide Web
• Given a query string, the proposed system extracts the results from a web
clustering engine and represents them according to a graph-based visualization
technique
• The GUI allows the end-user to explore the information space and visualize
related content extracted from different resources, like multimedia databases and
social networks
• Two different visualization paradigms have been developed and tested in usability
experiments, to evaluate their effectiveness in helping end users better
comprehend the categories and semantic relationships existing between the
search results, thus achieving a more efficient retrieval of web documents
• Future work will address an extended experimental evaluation with different user
interfaces, to overcome the difficulties highlighted in the experiments, as well as
an expansion of the methods used for the extraction and linking of multimedia content
related to the textual searches