CHAPTER 1: INTRODUCTION
1.0: Introduction
Nowadays the Internet has become a very vast platform for storing information. With just a few clicks, we can reach a wealth of information. Information stored in the World Wide Web (WWW), or in short the Web, can be accessed from anywhere at any time. The massive increase of structured data on the Web (the Data Web), and the need for novel methods to exploit these data to their full potential, are the motivation of this thesis. Building on the remarkable success of Web 2.0 mashups, this thesis regards websites as a database, where each web data source is seen as a table, and a mashup is seen as a query over these sources.

Fast development and the growing complexity of websites have made the Web essential to Internet users. Besides providing information, websites have become a platform where users can be provided with services such as online booking systems. This thesis explores the problem of aggregating information about online booking systems from several websites and delivering it through one point of access, or portal. The aggregation tool used in this research is called Kapow Mashup Server.
1.1 Problem Statement
There are already quite a number of websites that support online booking services in Malaysia, such as Airasia (www.airasia.com), Malaysia Airlines System (www.malaysiaairlines.com), Firefly (www.fireflyz.com.my) and Maswing (www.maswing.com.my). However, looking for the right information, such as price rates, booking dates and availability, is time-consuming, since it has to be done through repetitive manual browsing of the relevant websites. An automated system to provide such information is therefore important. Comparison through manual browsing is also not practical: users who want to compare price rates will have trouble doing so by browsing several websites one by one.
Aggregating the data from these several websites requires specific and suitable tools. Different websites have their own architectures, and the data are located in different frames. It is a challenge to extract these data and offer integrated access to them through one portal.
1.2 Motivational Example
Searching the Web, we can find many websites and portals that provide information about online booking. The most important information users want from an online booking service is the time, duration and price, together with a comparison across several booking services.

Let us consider the following scenario. When we search for information about flight schedules, prices, destinations and so on, we have to browse many websites. Would it not be easier to browse a single website from which we can see all the airlines available for a particular destination? Users also wish to compare the booking options they are choosing between, and opening many browser windows to compare them manually is not an effective way to do this.

Another scenario is when users want to book a hotel room online. Today, most hotels have their own website providing online booking services. Users may want to know the hotel's prices and availability, as well as check-in and check-out dates. A single website that controls and aggregates all this information makes it easy for users to compare. With the advent of powerful tools for extracting and integrating data from several websites, realizing such a one-stop portal is becoming much easier.
1.3 Research Question
The Internet gives us many benefits, especially in providing information. With just a click, people can find any information they want. There are millions of websites available on the Internet. People need this information and data to help them make decisions and to compare alternatives. The question is how to manipulate the data.
Internet technology has also developed rapidly towards greater efficiency. People can do almost anything through online transactions. For online travel ticket information in particular, people want to know the departure date, the departure time, and the price of the ticket, and they want to compare all of this information before they make a booking.

Since there are many websites they can use to make a booking, it is tedious to compare information between several websites by opening them one by one. This thesis therefore addresses the question: "How can data be aggregated into a single online ticket website?"
1.4 Aim and Research Objectives
The main aim of this research is to develop a prototype portal as a proof of concept for the problem of aggregating information currently available in Malaysian web-based online booking systems. Towards this end, we have identified the following specific research objectives:
1. To identify tools and agents suitable for web mashup and aggregation.
2. To explore a way to aggregate and mash up information on online booking systems.
3. To combine data from several online booking systems and create a portal where the data can be manipulated.
1.5 Summary of Contributions
There are two main contributions of this research work. First, a prototype has been developed as a proof of concept. The prototype is a portal containing data and information extracted from several online booking systems; the portal displays the information according to the user's needs.

Secondly, guidelines and a manual are provided on how web aggregation and mashup can be done with the selected tools. The guidelines cover what is needed and which techniques are involved in building the prototype.
1.6 Thesis Organization
The rest of this thesis is organized as follows. Chapter 2 presents the literature review: papers and works on web aggregation and mashups. Several works and case studies are discussed, together with the scope they cover in web aggregation and mashup.

Chapter 3 describes the methodology used to achieve the objectives of the thesis. Figure 1.1 gives a brief overview of this methodology: literature review and information finding, selection of websites and suitable tools, prototype development, and publishing of the prototype and guidelines.

Figure 1.1: Thesis organization
Chapter 4 elaborates on the design and implementation of the prototype in detail.

Chapter 5 presents the conclusion and future enhancements of the thesis.
CHAPTER 2: LITERATURE REVIEW
2.1: Introduction
Work on web aggregation and mashups has grown rapidly. In web development, a mashup is a web page or application that uses or combines data or functionality from two or more external sources to create a new service.

The term implies easy, fast integration, frequently using open APIs and data sources, to produce enriched results that were not necessarily the original reason for producing the raw source data. According to Larry Dignan [19], based on a presentation by Gartner analyst David Gootzit, the future of the portal is mashups, SOA, and more aggregation.
2.2: Related works
Momondo.com is a travel search engine that allows the consumer to compare prices on flights, hotels and car rental. The search engine aggregates results from more than 700 travel websites simultaneously to give, within seconds, an overview of the best offers found. Momondo does not sell tickets; instead it shows the consumer where to buy at the best prices and links to the supplier. Momondo is free of charge to use; it receives commissions from sponsored links and advertising. In 2007, NBC Today's Travel recommended that, when it comes to finding the best offers on flights, the consumer should go to sites like Kayak, Mobissimo, SideStep and Momondo instead of buying tickets from third-party sites that actually sell travel and deal directly with the airlines. In addition to the price comparisons, Momondo also offers city guides written by the site's users and by bloggers based in different cities.
Kayak.com is a travel search engine website based in the United States. Founded in 2004, it aggregates information from hundreds of other travel sites and helps users book flights, hotels, cruises, and rental cars. Kayak combines results from online travel agencies, consolidators such as Orbitz, and other sources such as large hotel chains.

Like Momondo.com, Kayak does not sell directly to the consumer; rather, it aggregates results from other sites and then redirects the visitor to one of those sites for the reservation. Kayak.com thus makes money from pay-per-click advertising when the consumer clicks through to one of the compared websites (for example, when the consumer is redirected to the Orbitz website).
2.3: Paper on Web Aggregation and Mashup
In [3], the authors discuss the design and implementation of a prototype web information system that uses web aggregation as its core engine. Annotea is one project related to this field. Annotea is a Semantic Web based project whose inspiration comes from users' collaboration problems on the web; it examined what users did naturally and selected familiar metaphors for supporting better collaboration [4]. In [5], the authors define a semantic web portal as any web portal that is developed based on semantic web technologies. They are in the process of developing such a web portal using available semantic technologies. Only standard technologies promising generic solutions are selected. As a result, they expect to be able to provide basic development guidelines in the form of a portal architecture and design patterns.
In [6], the authors examine the development of web aggregators: entities that collect information from a wide range of sources, with or without prior arrangements, and add value through post-aggregation services. New web-page extraction tools, context-sensitive mediators, and agent technologies have greatly reduced the barriers to constructing aggregators. They predict that aggregators will soon emerge in industries where they were not formerly present.
2.4 Other Works on Web Aggregation and Mashups
2.4.1 Mapping mashups
In this age of information technology, humans are collecting a prodigious amount of data about
things and activities, both of which are wont to be annotated with locations. All of these diverse
data sets that contain location data are just screaming to be presented graphically using maps.
One of the big catalysts for the advent of mashups was Google's introduction of its Google Maps
API. This opened the floodgates, allowing Web developers to mash all sorts of data onto maps.
Not to be left out, APIs from Microsoft (Virtual Earth), Yahoo (Yahoo Maps), and AOL
(MapQuest) shortly followed.
2.4.2 Video and photo mashups
The emergence of photo hosting and social networking sites like Flickr with APIs that expose
photo sharing has led to a variety of interesting mashups. Because these content providers have
metadata associated with the images they host (such as who took the picture, what it is a picture
of, where and when it was taken, and more), mashup designers can mash photos with other
information that can be associated with the metadata. For example, a mashup might analyze song or poetry lyrics and create a mosaic or collage of relevant photos, or display social networking graphs based upon common photo metadata (subject, timestamp, and so on). Yet another example might take as input a Web site (such as a news site like CNN) and render the text in photos by matching tagged photos to words from the news.
2.4.3 Search and Shopping mashups
Search and shopping mashups existed long before the term mashup was coined. Before the
days of Web APIs, comparative shopping tools such as BizRate, PriceGrabber, MySimon, and
Google's Froogle used combinations of business-to-business (b2b) technologies or screen-
scraping to aggregate comparative price data. To facilitate mashups and other interesting Web
applications, consumer marketplaces such as eBay and Amazon have released APIs for
programmatically accessing their content.
2.4.4 News mashups
News sources (such as the New York Times, the BBC, or Reuters) have used syndication
technologies like RSS and Atom since 2002 to disseminate news feeds related to various topics.
Syndication feed mashups can aggregate a user's feeds and present them over the Web, creating a
personalized newspaper that caters to the reader's particular interests. An example is Diggdot.us,
which combines feeds from the techie-oriented news sources Digg.com, Slashdot.org, and
Del.icio.us.
2.5 Related Technologies
A mashup application is architecturally composed of three different participants that are logically and physically disjoint: the API/content providers, the mashup site, and the client's Web browser.

- The API/content providers. These are the providers of the content being mashed. To facilitate data retrieval, providers often expose their content through web protocols such as REST, Web Services, and RSS/Atom. However, many interesting potential data sources do not conveniently expose APIs. Mashups that extract content from sites like Wikipedia, TV Guide, and virtually all government and public domain Web sites do so by a technique known as screen scraping. In this context, screen scraping denotes the process by which a tool attempts to extract information from the content provider by parsing the provider's Web pages, which were originally intended for human consumption.
- The mashup site. This is where the mashup is hosted. Interestingly enough, just because this is where the mashup logic resides, it is not necessarily where it is executed. On one hand, mashups can be implemented similarly to traditional Web applications using server-side dynamic content generation technologies like Java servlets, CGI, PHP or ASP. On the other hand, the mashed content can also be composed directly within the client's browser through client-side scripting or applets.
- The client's Web browser. This is where the application is rendered graphically and where user interaction takes place. As described above, mashups often use client-side logic to assemble and compose the mashed content.
2.5.1 Ajax
There is some dispute over whether the term Ajax is an acronym or not (some would have it represent "Asynchronous JavaScript + XML"). Regardless, Ajax is a Web application model rather than a specific technology. It comprises several technologies focused on the asynchronous loading and presentation of content:

- XHTML and CSS for styling and presentation
- The Document Object Model (DOM) API exposed by the browser for dynamic display and interaction
- Asynchronous data exchange, typically of XML data
- Browser-side scripting, primarily JavaScript
When used together, the goal of these technologies is to create a smooth, cohesive Web experience for the user by exchanging small amounts of data with the content servers rather than reloading and re-rendering the entire page after each user action. You can construct Ajax engines for mashups from various Ajax toolkits and libraries (such as Sajax or Zimbra), usually implemented in JavaScript. The Google Maps API includes a proprietary Ajax engine, and the effect it has on the user experience is powerful: it behaves like a truly local application in that there are no scrollbars to manipulate or translation arrows that force page reloads.
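As a server-side sketch of this model, consider the hypothetical Java servlet below: it returns a small XML fragment that a client-side Ajax engine could fetch asynchronously instead of forcing a full page reload. The servlet name, parameter and payload are illustrative assumptions, not part of any system described in this thesis.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical endpoint polled asynchronously by a client-side Ajax engine.
// It returns a small XML fragment rather than a full page, so the browser
// can update one region of the UI without reloading and re-rendering.
public class FlightPriceServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String flightNo = req.getParameter("flightNo"); // e.g. "AK5146"
        resp.setContentType("text/xml");
        resp.getWriter().write("<fare flight=\"" + flightNo
                + "\"><price currency=\"MYR\">156.00</price></fare>");
    }
}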
2.5.2 Web protocols: SOAP and REST
Both SOAP and REST are platform neutral protocols for communicating with remote services.
As part of the service-oriented architecture paradigm, clients can use SOAP and REST to interact
with remote services without knowledge of their underlying platform implementation: the
functionality of a service is completely conveyed by the description of the messages that it
requests and responds with.
SOAP is a fundamental technology of the Web Services paradigm. Originally an acronym for
Simple Object Access Protocol, SOAP has been re-termed Services-Oriented Access Protocol (or
just SOAP) because its focus has shifted from object-based systems towards the interoperability
of message exchange. There are two key components of the SOAP specification. The first is the
use of an XML message format for platform-agnostic encoding, and the second is the message
structure, which consists of a header and a body. The header is used to exchange contextual
information that is not specific to the application payload (the body), such as authentication
information. The SOAP message body encapsulates the application-specific payload. SOAP
APIs for Web services are described by WSDL documents, which themselves describe what
operations a service exposes, the format for the messages that it accepts (using XML Schema),
and how to address it. SOAP messages are typically conveyed over HTTP transport, although
other transports (such as JMS or e-mail) are equally viable.
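To make the header/body structure concrete, here is a minimal sketch using the standard Java SAAJ API (javax.xml.soap). The namespace URIs and the getFare operation are placeholders invented for the example, not taken from any real service.

import javax.xml.soap.MessageFactory;
import javax.xml.soap.SOAPBody;
import javax.xml.soap.SOAPEnvelope;
import javax.xml.soap.SOAPHeader;
import javax.xml.soap.SOAPMessage;

public class SoapEnvelopeSketch {
    public static void main(String[] args) throws Exception {
        SOAPMessage msg = MessageFactory.newInstance().createMessage();
        SOAPEnvelope env = msg.getSOAPPart().getEnvelope();
        // The header carries contextual information such as authentication...
        SOAPHeader header = msg.getSOAPHeader();
        header.addHeaderElement(env.createName("authToken", "ex", "http://example.org/context"));
        // ...while the body encapsulates the application-specific payload.
        SOAPBody body = msg.getSOAPBody();
        body.addBodyElement(env.createName("getFare", "ex", "http://example.org/service"));
        msg.writeTo(System.out); // prints the serialized XML envelope
    }
}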
REST is an acronym for Representational State Transfer, a technique of Web-based
communication using just HTTP and XML. Its simplicity and lack of rigorous profiles set it apart
from SOAP and lend to its attractiveness. Unlike the typical verb-based interfaces that you find
in modern programming languages (which are composed of diverse methods such as
getEmployee(), addEmployee(), listEmployees(), and more), REST fundamentally supports
only a few operations (that is POST, GET, PUT, DELETE) that are applicable to all pieces of
information. The emphasis in REST is on the pieces of information themselves, called resources.
For example, a resource record for an employee is identified by a URI, retrieved through a GET
operation, updated by a PUT operation, and so on. In this way, REST is similar to the document-
literal style of SOAP services.
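As an illustration of this resource-oriented style, the sketch below retrieves a representation of a single (hypothetical) employee resource with a plain HTTP GET in standard Java; the URI is invented for the example.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGetSketch {
    public static void main(String[] args) throws Exception {
        // In REST, the URI identifies the resource and the HTTP verb
        // (here GET) says what to do with it.
        URL url = new URL("http://example.com/employees/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // the resource's XML representation
            }
        }
    }
}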
2.5.3 Screen scraping
Lack of APIs from content providers often forces mashup developers to resort to screen scraping
in order to retrieve the information they seek to mash.
Scraping is the process of using software tools to parse and analyze content that was originally
written for human consumption in order to extract semantic data structures representative of that
information that can be used and manipulated programmatically.
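For instance, a minimal scraping sketch using the open-source jsoup HTML parser might look like the following; the URL and CSS selectors are hypothetical and stand in for a model of some provider's page layout.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScrapeSketch {
    public static void main(String[] args) throws Exception {
        // Fetch and parse a page that was written for human consumption.
        Document doc = Jsoup.connect("http://example.com/listings").get();
        // The selectors below encode a model of the provider's current
        // layout; if the site changes its presentation, they break.
        for (Element row : doc.select("table.listings tr")) {
            String name  = row.select("td.name").text();
            String price = row.select("td.price").text();
            System.out.println(name + " -> " + price);
        }
    }
}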
A handful of mashups use screen scraping technology for data acquisition, especially when
pulling data from the public sectors. For example, real-estate mapping mashups can mash for-
sale or rental listings with maps from a cartography provider with scraped "comp" data obtained
from the county records office. Another mashup project that scrapes data is XMLTV, a
collection of tools that aggregates TV listings from all over the world.
Screen scraping is often considered an inelegant solution, and for good reasons. It has two
primary inherent drawbacks. The first is that, unlike APIs with interfaces, scraping has no
specific programmatic contract between content-provider and content-consumer. Scrapers must
design their tools around a model of the source content and hope that the provider consistently
adheres to this model of presentation. Web sites have a tendency to overhaul their look-and-feel
periodically to remain fresh and stylish, which imparts severe maintenance headaches on behalf
of the scrapers because their tools are likely to fail.
The second issue is the lack of sophisticated, re-usable screen-scraping toolkit software,
colloquially known as scrAPIs. The dearth of such APIs and toolkits is largely due to the
extremely application-specific needs of each individual scraping tool. This leads to large
development overheads as designers are forced to reverse-engineer content, develop data models,
parse, and aggregate raw data from the provider's site.
2.5.4 Semantic Web and RDF
The inelegant aspects of screen scraping are directly traceable to the fact that content created for
human consumption does not make good content for automated machine consumption. Enter the
Semantic Web, which is the vision that the existing Web can be augmented to supplement the
content designed for humans with equivalent machine-readable information. In the context of the
Semantic Web, the term information is different from data; data becomes information when it
conveys meaning (that is, it is understandable).
The Semantic Web has the goal of creating Web infrastructure that augments data with metadata
to give it meaning, thus making it suitable for automation, integration, reasoning, and re-use.
The W3C family of specifications collectively known as the Resource Description Framework
(RDF) serves this purpose of providing methodologies to establish syntactic structures that
describe data. XML in itself is not sufficient; it is too arbitrary in that you can code it in many
ways to describe the same piece of data. RDF-Schema adds to RDF's ability to encode concepts
in a machine-readable way. Once data objects can be described in a data model, RDF provides
for the construction of relationships between data objects through subject-predicate-object triples
("subject S has relationship R with object O"). The combination of data model and graph of
relationships allows for the creation of ontologies, which are hierarchical structures of
knowledge that can be searched and formally reasoned about. For example, you might define a model in which "carnivore-type" is a subclass of "animal-type", with the constraint that it "eats" other "animal-types", and create two instances of it: one populated with data concerning cheetahs and polar bears and their habitats, another concerning gazelles and penguins and their respective habitats. Inference engines might then "mash" these separate model instances and reason that cheetahs might prey on gazelles but not penguins.
RDF data is quickly finding adoption in a variety of domains, including social networking
applications (such as FOAF -- Friend of a Friend) and syndication (such as RSS, which I
describe next). In addition, RDF software technology and components are beginning to reach a
level of maturity, especially in the areas of RDF query languages (such as RDQL and SPARQL)
and programmatic frameworks and inference engines (such as Jena and Redland).
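As a small illustration, the sketch below builds one subject-predicate-object triple with the Jena framework mentioned above (recent releases use the org.apache.jena packages); the namespace and resource names are invented for the example.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class RdfTripleSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/zoo#"; // hypothetical namespace
        Model model = ModelFactory.createDefaultModel();
        Property eats = model.createProperty(ns, "eats");
        Resource cheetah = model.createResource(ns + "cheetah");
        Resource gazelle = model.createResource(ns + "gazelle");
        // "Subject S has relationship R with object O":
        // cheetah (S) eats (R) gazelle (O).
        cheetah.addProperty(eats, gazelle);
        model.write(System.out, "N-TRIPLES");
    }
}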
2.5.5 RSS and ATOM
RSS is a family of XML-based syndication formats. In this context, syndication implies that a
Web site that wants to distribute content creates an RSS document and registers the document
with an RSS publisher. An RSS-enabled client can then check the publisher's feed for new
content and react to it in an appropriate manner.
RSS has been adopted to syndicate a wide variety of content, ranging from news articles and
headlines, changelogs for CVS checkins or wiki pages, project updates, and even audiovisual
data such as radio programs. Version 1.0 is RDF-based, but the most recent, version 2.0, is not.
Atom is a newer, but similar, syndication protocol. It is a proposed standard at the Internet Engineering Task Force (IETF) and seeks to maintain better metadata than RSS, to provide better and more rigorous documentation, and to incorporate the notion of constructs for common data representation.
These syndication technologies are great for mashups that aggregate event-based or update-driven content, such as news and weblog aggregators.
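The core loop of such an aggregator can be sketched in Java with the open-source ROME library, which parses RSS 1.0/2.0 and Atom feeds into one common object model; the feed URL below is a placeholder.

import java.net.URL;
import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

public class FeedSketch {
    public static void main(String[] args) throws Exception {
        // ROME hides the format differences, so the same code handles
        // RSS and Atom feeds alike.
        SyndFeed feed = new SyndFeedInput().build(
                new XmlReader(new URL("http://example.com/feed.rss")));
        for (SyndEntry entry : feed.getEntries()) {
            System.out.println(entry.getPublishedDate() + "  " + entry.getTitle());
        }
    }
}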
2.6 Aggregation and Mashup Challenges
Mashing up and aggregating the web comes with its own challenges. These can be divided into three categories: technical challenges, component challenges and social challenges.
2.6.1 Technical Challenges
Like any other data integration domain, mashup development is replete with technical challenges that need to be addressed, especially as mashup applications become richer in features and functionality.
For example, translation systems between data models must be designed. When converting data
into common forms, reasonable assumptions often have to be made when the mapping is not a
complete one (for example, one data source might have a model in which an address-type
contains a country-field, whereas another does not). Already challenging, this is exacerbated by
the fact that the mashup developers might not be domain experts on the source data models
because the models are third-party to them, and these reasonable assumptions might not be
intuitive or clear.
In addition to missing data or incomplete mappings, the mashup designer might discover that the
data they wish to integrate is not suitable for machine automation; that it needs cleansing.
For example, law enforcement arrest records might be entered inconsistently, using common abbreviations for names (such as "mkt sqr" in one record and "Market Square" in another), making automated reasoning about equality difficult, even with good heuristics. Semantic modelling technologies, such as RDF, can help ease the problem of automatic reasoning between different data sets, provided that they are built into the data store. Legacy data sources are likely to require much human effort in terms of analysis and data cleansing before they can be availed to semantic modelling technologies.
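A minimal sketch of such a cleansing step in plain Java, assuming a hand-built abbreviation table, might look like this; real cleansing rules would have to be derived from analyzing the legacy data itself.

import java.util.HashMap;
import java.util.Map;

public class AddressCleanser {
    // Hypothetical abbreviation table, for illustration only.
    private static final Map<String, String> ABBREV = new HashMap<>();
    static {
        ABBREV.put("mkt", "market");
        ABBREV.put("sqr", "square");
        ABBREV.put("st", "street");
    }

    // Normalize a free-text location so that "mkt sqr" and
    // "Market Square" compare as equal.
    public static String normalize(String raw) {
        StringBuilder out = new StringBuilder();
        for (String token : raw.toLowerCase().split("\\s+")) {
            out.append(ABBREV.getOrDefault(token, token)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("mkt sqr").equals(normalize("Market Square"))); // true
    }
}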
Another host of integration issues arises when screen scraping techniques must be used for data acquisition. Deriving parsing and acquisition tools and data models requires significant reverse-engineering effort. Even in the best case, where these tools and models can be created, all it takes is a refactoring of how the source site presents its content to break the integration process and cause the mashup application to fail.
2.6.2 Component Challenges
The Ajax model of Web development can provide a much richer and more seamless user
experience than the traditional full-page-refresh, but it poses some difficulties as well. At its
fundamentals, Ajax entails using the browser's client-side scripting capabilities in conjunction
with its DOM to achieve a method of content delivery that was not entirely envisioned by the
browser's designers. However, this subjects Ajax-based applications to the same browser
compatibility issues that have plagued Web designers ever since Microsoft created Internet
Explorer. For example, Ajax engines make use of an XMLHttpRequest object to exchange data asynchronously with remote servers. In Internet Explorer 6, this object is implemented with ActiveX rather than native JavaScript, which requires that ActiveX be enabled. Mozilla-based browsers such as Firefox, in contrast, provide it as a native JavaScript object.
A more fundamental requirement is that Ajax requires that JavaScript be enabled within the
user's browser. This might be a reasonable assumption for the majority of the population, but
there are certainly users who use browsers or automated tools that either do not support
JavaScript or do not have it enabled. One such set of tools are the robots, spiders, and Web
crawlers that aggregate information for Internet and intranet search engines.
Without graceful degradation, Ajax-based mashup applications might find themselves missing
out on both a minority user base as well as search engine visibility.
The use of JavaScript to asynchronously update content within the page can also create user
interface issues. Because content is no longer necessarily linked to the URL in the browser's
address bar, users might not experience the functionality that they normally expect when they
use the browser's BACK button, or the BOOKMARK feature. And, although Ajax can reduce
latency by requesting incremental content updates, poor designs can actually hinder the user
experience, such as when the granularity of update is small enough that the quantity and
overhead of updates saturate the available resources. Also, take care to support the user (for
example, with visual feedback such as progress bars) while the interface loads or content is
updated.
As with any distributed, cross-domain application, mashup developers and content providers
alike will also need to address security concerns. The notion of identity can prove to be a sticky
subject, as the traditional Web is primarily built for anonymous access.
2.6.3 Social Challenges
In addition to the technical challenges, social issues have surfaced (or will surface) as mashups become more popular.
One of the biggest social issues facing mashup developers is the trade-off between the protection
of intellectual property and consumer privacy versus fair-use and the free flow of information.
Unwitting content providers (targets of screen scraping), and even content providers who expose
APIs to facilitate data retrieval might determine that their content is being used in a manner that
they do not approve of.
The mashup Web application genre is still in its infancy, with hobbyist developers who produce
many mashups in their spare time. These developers might not be cognizant of (or concerned
with) issues such as security. Additionally, content providers are only beginning to see the value
in providing APIs for machine-based content access, and many do not consider them a core
business focus.
This combination can yield poor software quality, as priorities such as testing and quality
assurance take the backseat to proof-of-concept and innovation. The community as a whole will
have to work together to assemble open standards and reusable toolkits in order to facilitate
mature software development processes.
Before mashups can make the transition from cool toys to sophisticated applications, much work
will have to go into distilling robust standards, protocols, models, and toolkits. For this to
happen, major software development industry leaders, content providers, and entrepreneurs will
have to find value in mashups, which means viable business models. API providers will need to
determine whether or not to charge for their content, and if so, how (for example, by subscription
or by per-use). Perhaps they will provide varying levels of quality-of-service. Some marketplace
providers, such as eBay or Amazon, might find that the free use of their APIs increases product
movement. Mashup developers might look for an ad-based revenue model, or perhaps build
interesting mashup applications with the goal of being acquired.
CHAPTER 3: METHODOLOGY
3.1: Research Activities
To complete this research, a prototyping approach was used in the system development process. The research activities carried out and the development of the prototype are sufficient to show that the conceptual idea works satisfactorily towards achieving the main idea of web mashup and aggregation.
The proposed research methodology focuses primarily on three main activities. First, emphasis is put on identifying the information model to be used for aggregating web data. This includes the tools, platform and latest technology to be used in the prototype development. It is important that the tools involved are easy to use and can do more than just extract data, for example keeping the data up to date as the client website changes. At this stage, objects and classes are identified to cover the information we need to harvest.

Secondly, after the information model is identified and the tools are confirmed, the focus is put on the integration and collaboration of the websites. A model and web bots (Kapow robots) are developed and deployed to harvest the information that is needed. These bots bring back the data, and these are the data that we will use.

Lastly, an interface is developed to serve as the portal for the information collected by the mashup robots.
3.2: Overview of Development Process
A prototype was developed in this research. Prototyping is the rapid development of a system. In the past, the developed system was normally thought of as inferior in some way to the required system, so further development was required. There are five steps in the prototype development methodology:
1. Gather requirements
2. Build prototype
3. Evaluate prototype
4. If accepted, throw away the prototype and redesign
5. If rejected, re-gather requirements and repeat from step 2
The prototype methodology is illustrated below.
Figure 3.1: Prototype model (outline requirements feed both evolutionary prototyping, leading to a delivered system, and throw-away prototyping, leading to an executable prototype plus system specification)
3.3 Gather requirement
A few methods were used to gather requirements. In this case, we used the Internet, relevant papers and journals, and explored the university library to find information. Basically, information from previous papers, journals and research on the topic is needed. For the prototype specifically, the software, hardware and technologies suitable for developing the prototype must be identified.
A few websites use almost the same concept as the research prototype, for example www.kayak.com. According to Wikipedia, Kayak.com is a travel search engine website based in the United States. Founded in 2004, it aggregates information from hundreds of other travel sites and helps users book flights, hotels, cruises, and rental cars. Kayak combines results from online travel agencies, consolidators such as Orbitz, and other sources such as large hotel chains. Kayak is built on a Java, Apache, Perl and Linux platform, and uses XML-HTTP and JavaScript to create a rich user interface.
For the prototype development, we need suitable tools and software. The tables below list the software and hardware required.
Software requirements:

IDE Platform
- Windows: Windows 2000 SP2 and SP4; Windows Server 2003 Standard Edition SP1 and x64 Standard Edition; Windows Server 2003 R2 Standard Edition for x86 and x64; Windows XP SP2; Windows Vista
- Linux: Red Hat Enterprise Linux 5.0

Server Platform
- Windows: Windows 2000 SP2 and SP4; Windows Server 2003 Standard Edition SP1 and x64 Standard Edition; Windows Server 2003 R2 Standard Edition for x86 and x64; Windows XP SP2
- Linux: Red Hat Enterprise Linux 5.0; Debian 4.0 (on x86 and x64)

Database
- Oracle: Version 8.1.7.0, 9i R2 and 10g R2
- IBM DB2: UDB 7.2 and 8.2
- Microsoft SQL Server: Version 2000 and 2005
- Sybase: Adaptive Server Enterprise 15.0
- PointBase Server: Version 4.4 and 4.5
- MySQL: Version 4.0, 4.1 and 5.0

APIs
- Java: J2SE 1.3 + JAXP, or J2SE 1.4 or later
- .NET: C#, .NET Version 1.0 and 1.1

Clipping Portlets
- BEA WebLogic Portal: Version 8.1 (all service packs)
- IBM WebSphere Portal: Version 5.0 and 5.1
- Standard Java Portlet: JSR-168

Clipping Browsers
- Microsoft Internet Explorer: Version 6.0
- Mozilla Firefox: Version 1.5+ (both Windows and Linux)

Tag Library
- JSP: Version 1.2 and 2.0

Web Services
- BEA WebLogic Workshop: Version 8.1 (all service packs)
- .NET: .NET Version 1.0 and 1.1

Code Generation
- Java: J2SE 1.3 or later
- .NET: C#, .NET Version 1.0 and 1.1

Table 3.1: Software Requirements
Hardware requirements:

The table below specifies system requirements for different platforms. The requirements may depend on the application, so these should be taken only as guidelines and not as absolute numbers. A complex clipping solution might require much more power than a simple collection solution. The recommendations for servers are per server; the number of servers used for a given application (the size of a cluster) is a completely different matter and should be estimated using methods described elsewhere.

IDE
- Windows: minimum Intel Pentium 1 GHz CPU, 512 MB RAM, 200 MB free disk space; recommended Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space
- Linux: minimum Intel Pentium 1 GHz CPU, 512 MB RAM, 200 MB free disk space; recommended Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space

Server
- Windows: minimum Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space; recommended Intel Pentium 2 GHz CPU, 2 GB RAM, 200 MB free disk space
- Linux: minimum Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space; recommended Intel Pentium 2 GHz CPU, 2 GB RAM, 200 MB free disk space

Source: http://kdc.kapowtech.com/documentation_6_4/Technical/TechnicalDataSheet6_4.pdf
Table 3.2: Hardware Requirements
Besides the software and hardware requirements, the websites targeted for harvesting information must also be identified. Websites that offer online booking/ticketing systems will be given priority.
3.4 Build prototype
In building the prototype, every aspect, from installation to reading the manual, must be well prepared. The main tool used to develop the prototype is Kapow Mashup Server; the other tool is IntelliJ IDEA.
3.4.1 Kapow Mashup Server
When we talk about web data access, extraction and harvesting, Kapow Mashup Server is a tool suited to all of those tasks. Kapow is also known as a web integration platform, and the Kapow Mashup Server makes it possible to access data or content from any browsable application or website. Over the past few years, the Kapow Mashup Server has become a new lightweight services and mashup standard among Internet-intensive businesses in the areas of media, financial services, travel, manufacturing and information services firms (background checking, information providers, etc.).

Kapow Mashup Server is a platform for web integration. It helps transform the resources of the web into well-defined nuggets of information and functionality. In effect, Kapow Mashup Server turns a web site into services available to client applications.

According to [19], the Kapow Web Data Server powers solutions in web and business intelligence, portal generation, SOA/WOA enablement, and content migration. Kapow's patented visual programming and integrated development environment (IDE) technology enables business and technical decision-makers to create innovative business applications. With Kapow, new applications can be completed and deployed in a fraction of the time and cost associated with traditional software development methods.

This research uses Kapow as the main application tool because Kapow gave the best results for data aggregation. By creating models and robots, functions such as collecting internal or external web-based data sources, website clipping, and so on can be done easily without any programming.
A few abilities of the Kapow Mashup Server that contribute to the development of the prototype are:
- Web integration
- Code generation
- Data harvesting
The Kapow Mashup Server also provides web-to-web data integration functionality, allowing
data extraction from one website, transforming it into a new format, and pushing it through input
forms into a second website. This process can be a many-to-many process, extracting data from
multiple websites, combining and transforming and pushing them into multiple other websites.
Web based transformation is also supported e.g. using a website for real-time language
translation or HTML to XML conversion.
3.4.2 IntelliJ IDEA
To write web-based programming languages such as Java, HTML and PHP, a platform is needed. IntelliJ IDEA is a code-centric IDE focused on developer productivity. IntelliJ IDEA deeply understands the code and gives a set of powerful tools without imposing any particular workflow or project structure. Imagine that we have a large source code base that we need to browse or modify. For instance, we might want to use a library and find out how it works, or we might need to get acquainted with existing code in order to modify it. Yet another example: a new JDK becomes available and we are keen to see the changes in the standard Java libraries. Conventional find-and-replace tools may not completely address these goals, because with them it is easy to find or replace too much or too little. Of course, if someone already knows the source code well, then the whole-words option and regular expressions may help make find-and-replace queries smarter. This is an advantage, and it is why we use this tool for the development of the prototype.
3.5 Evaluate prototype
To be more specific, the prototyping approach that we use is called Extreme Prototyping. Basically, it breaks down web development into three phases, each one based on the preceding one. The first phase is a static prototype that consists mainly of HTML pages. In the second phase, the screens are programmed and fully functional using a simulated services layer. In the third phase, the services are implemented. The process is called Extreme Prototyping to draw attention to the second phase of the process, where a fully functional UI is developed with very little regard to the services other than their contract.

At this stage, all the robots and models that were created with the Kapow Mashup Server are ready to be deployed. Using Kapow's ability to generate code, the generated code is copied into the programming development tool, IntelliJ IDEA.
CHAPTER 4: PROTOTYPE DESIGN AND IMPLEMENTATION
This chapter explains the actual concept of the research and shows how various measures are taken to prove the concept. It covers the whole implementation process of the aggregation and mashup tool, named "Kapow as Web Aggregation and Mashup for Online Booking System".
4.1 Conceptual Design
4.1.1 Kapow Mashup Server as Tool
The Kapow Mashup Server enables us to collect, connect and mash up everything on corporate intranets as well as the World Wide Web. These abilities made the Kapow Mashup Server the first choice for the development of the prototype.
As described in Chapter 3, Kapow Mashup Server provides web-to-web data integration functionality: data can be extracted from one website, transformed into a new format, and pushed through input forms into one or more other websites, in a potentially many-to-many process. Web-based transformation, such as real-time language translation or HTML-to-XML conversion, is also supported.
Figure 4.1: Different layers involved in Kapow Mashup Server (Kapow website)
Figure 4.1 broadly explains how Kapow works. Three layers are involved: the integrated development environment, web-based management, and the scalable server environment. The following sections explain each of these layers in more detail.

Four important elements are involved in the integrated development environment layer. We can call this layer the primary studio tools of Kapow Mashup Server.
Figure 4.2: Kapow ModelMaker Interface
ModelMaker is the RoboSuite application for writing and maintaining the domain models used in RoboMaker. With ModelMaker, we can easily create new domain models and configure existing ones, as well as add, delete, and configure the objects within a domain model.

RoboMaker, meanwhile, is the RoboSuite application for creating and debugging robots. RoboMaker is an integrated development environment (IDE) for robots: it is all we need for programming robots in an easy-to-understand visual programming language. To support the construction of robots, RoboMaker provides powerful programming features including interactive visual programming, full debugging capabilities, an overview of the program state, and easy access to context-sensitive online help.
Figure 4.3: Kapow Mashup Server RoboMaker Interface.
4.1.2 Architecture of the Prototype
The architecture of the prototype is basically a 3-tier architecture. Existing online booking/ticketing systems on the Internet are used as the data sources. Their data are extracted by the Kapow Mashup Server tools, specifically by a Kapow robot built according to the model. The robot is deployed to harvest the data that we want and bring it back to the portal, which is developed using several web programming languages with Apache as the web server. Figure 4.4 shows the architecture of the prototype.
Figure 4.4: Prototype architecture
4.2 Prototype Implementation Design
4.2.1 Websites
In this thesis, for testing purposes, we only test the aggregator on one website: www.agoda.com. The website provides information about hotels. The information a customer usually needs is the name of the hotel, the rate per night, the location and the dates. We use Kapow as the tool for extracting all of this information.

Each website has its own structure and design, and websites often change their structure, layout and design to suit current needs. Such changes can be a problem for the prototype to adapt to, because the data or information to be harvested might change its location or position within the website. This may cause the robot to bring back the wrong information.
4.2.2 Model
Model Maker is used to create and edit object models. An object model is like a type definition in a programming language: it defines the structure of the objects that form the input and output of a robot. Model Maker is a visual tool for creating the data objects that define the data structures used by robots for information collection, aggregation and integration.

An object model consists of one or more attribute definitions, each of which defines an attribute name, type, and other information. A given robot will return (or store) objects defined by one or more object models. For example, a data collection robot for job postings could return objects defined by the object model Job. Job would contain attributes such as title and source (short text types), date (date type), description (long text) and so on. If the objects are stored in a database at runtime, the database will have a table definition matching the object model. Model Maker can generate the SQL necessary to create the required tables in the database.

First, a model for hotels needs to be created. Since this is an input/output type of query, we need to create two objects in the model: HotelQuery for the input attributes country, city, checkindate and checkoutdate, and HotelResult for extracting the output.
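For illustration only, the structure of these objects corresponds to plain Java value types like the hypothetical sketch below; the actual input/output classes are produced by Kapow's tooling rather than written by hand.

// Hand-written equivalent of the HotelQuery object model, for
// illustration only: four attributes matching Figure 4.5.
public class HotelQuery {
    private String country;              // short text
    private String city;                 // short text
    private java.util.Date checkindate;  // date
    private java.util.Date checkoutdate; // date

    public String getCountry() { return country; }
    public void setCountry(String v) { country = v; }
    public String getCity() { return city; }
    public void setCity(String v) { city = v; }
    public java.util.Date getCheckindate() { return checkindate; }
    public void setCheckindate(java.util.Date v) { checkindate = v; }
    public java.util.Date getCheckoutdate() { return checkoutdate; }
    public void setCheckoutdate(java.util.Date v) { checkoutdate = v; }
}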
The following figures show how the model looks. Figure 4.5 shows the object called HotelQuery with attributes called country, city, checkin, and checkout.
Figure 4.5: HotelQuery attributes
Figure 4.6 shows the output object, HotelResult, with its attributes.
Figure 4.6: HotelResult as output object.
4.4: Creating Robot
4.4.1: Creating Robot for Hotel Website
A robot is created and deployed to harvest all information according to the model. The steps to create the robot are shown below.
Step 1: Choose the Integration robot type in the New Robot Wizard.
Figure 4.7: Create new Integration Robot
Step 2: Enter the URL that the robot should start from: www.agoda.com
Figure 4.8: Enter www.agoda.com
Step 3: Select the objects that the robot should receive as input. Choose HotelQuery, which was created in the model wizard.
Figure 4.9: Choose HotelQuery
Step 4: From the wizard, select the objects that the robot should return as output.
Figure 4.10: Choose HotelResult
Step 5: Select objects that the robot should use for holding temporary data during its execution.
Figure 4.11: ScratchPad holding the temporary data
Figure 4.12: Two output objects HotelResult and ScratchPad
Step 6: Entering the next attribute
Figure 4.13: Loading website into the Kapow interface
Step 7:
Figure 4.14: Select country
Step 8:
Figure 4.15: Select clipping
Step 9:
Figure 4.16: Debugging the robot
Step 10:
Figure 4.17: Information that is collected
4.4.2: Creating Robot for Flight Website
Airasia.com is chosen as the website from which flight information is extracted. First, a model is created as follows.

Create a model for the robot. First create Flight_In.model, which collects all the data about where you want to fly from; this will be your input data. Add the attributes below:
1. Origin [data type: short text]
2. Destination [data type: short text]
3. Dep_date [data type: date]
4. Ret_date [data type: date]
Figure 4.18: How the Flight_In attributes look.
Create Flight_Out.model, which collects all the data about the destination of your flight; this will be your output data. Add the attributes for Flight_Out.model as listed below:
1. Origin [data type: short text]
2. Destination [data type: short text]
3. Dep_date [data type: date]
4. Arr_date [data type: date]
5. Flight_no [data type: short text]
6. Price [data type: number]
7. Currency [data type: short text]
8. Carrier [data type: short text]
Figure 4.19: How the Flight_Out attributes look.
Save the model as flight.model.
Creating airasia.robot

First, open the RoboMaker application.
Figure 4.20: Creating airasia.robot
Choose "Create a new robot…" and click OK.
Figure 4.21: Choose Integration robot, then click NEXT.
Figure 4.22: Enter the URL that the robot should start from: http://www.airasia.com/site/my/en/home.jsp
Figure 4.23: Select the objects to input to the robot: add Flight_In.
Figure 4.24: Select the objects to output from the robot: select Flight_Out as the output object, then click FINISH.
Figure 4.25: This is how the first screen, Load Page, looks.
Move your cursor to the origin field and right-click. As shown in Figure 4.26, click on "Select Option".
Figure 4.26: Select Option for Origin.
A pop-up screen will appear; choose your Origin from the drop-down menu and set it as the value. In this tutorial, choose Kuala Lumpur LCCT.
Figure 4.27: Option to Select. Select Kuala Lumpur LCCT as the Origin and set it as the value.
Do the same for the destination, which in this case is Bintulu.
Figure 4.28: Set the destination. Select Bintulu as the destination.
Next, we need to set the departure date for the flight. The date will be extracted to Flight_In.dep_date. Put your cursor on the Departure Date field, right-click on it, click Select Option, and choose the date of departure. See Figure 4.29 and Figure 4.30.
Figure 4.29: Select Option for the date of departure.
Figure 4.30: Select the day of departure.
The day must be inserted on its own. To do that, we need to convert the full date format and extract only the day from it. See Figure 4.31.
Figure 4.31: Select Converters.
Figure 4.32: Get Attribute. Click Configure to configure the Get Attribute converter, and configure it as shown in Figure 4.33.
Figure 4.33: Set the attribute as Flight_In.dep_date.
However, the value will not appear there until you set the values yourself in all the objects of Flight_In. The values are as follows (see Figure 4.34):
Flight_In.Origin = Kuala Lumpur LCCT
Flight_In.Destination = Bintulu
Flight_In.dep_date = 2009-08-01 00:00:00.0
Flight_In.ret_date = 2009-08-05 00:00:00.0
Figure 4.34: Setting attributes. Fill the attributes with the information shown, then click Apply.
After that, click on "Configure" to set the day for Flight_In.dep_date, as shown in Figure 4.35.
Figure 4.35: Set the attribute as Flight_In.dep_date.
Click to add a converter, select Date Handling, and choose Format Date as shown in Figure 4.36.
Figure 4.36: Formatting the date.
Configure Format Date as shown in Figure 4.37.
Figure 4.37: Format pattern. Change the Format Pattern to "dd".
Do the same steps for the Month and Year of the departure date; however, in this step choose "Aug 2009 ("200908")" as the option to select and set it as the value. See Figure 4.38.
Figure 4.38: Date format
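For reference, the Format Date converter's patterns behave like Java's SimpleDateFormat patterns. Assuming that correspondence, the day and month/year values used in this walkthrough are equivalent to:

import java.text.SimpleDateFormat;
import java.util.Date;

public class FormatDateSketch {
    public static void main(String[] args) throws Exception {
        // The departure date used in this walkthrough.
        Date dep = new SimpleDateFormat("yyyy-MM-dd").parse("2009-08-01");
        // Pattern "dd" keeps only the day, as configured in Figure 4.37.
        System.out.println(new SimpleDateFormat("dd").format(dep));     // 01
        // Pattern "yyyyMM" matches the "Aug 2009 (200908)" option value.
        System.out.println(new SimpleDateFormat("yyyyMM").format(dep)); // 200908
    }
}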
Repeat the steps shown in Figures 4.30 through 4.38 to set the Day, Month and Year for Flight_In.ret_date. In this case, use 05 August 2009 as the return date.
Next, put your cursor on the "Search" button as shown in Figure 4.39. Right-click on it and choose "Click".
Figure 4.39: Choose Click to search for flights.
After clicking the search button, a screen with the search results will appear. Click on the flight information table and expand the selection to create loops.
Figure 4.40: Creating loops. Step 1: click on the table. Step 2: expand the green selection square to cover the rows to loop over. Step 3: right-click inside the green square, choose Loops and select For Each Tag.
Figure 4.41: First tag finder. Click back on the loop step and change the tag finder index from 0 to 1.
The next step is extracting information for the Flight_Out object.
Figure 4.42: Extracting to Flight_Out.origin. Click on "Kuala Lumpur" and expand the selection, right-click on the text, select Extraction, then Extract Text, and choose Flight_Out.origin.
Configure the extraction using Advance Extract, as shown in Figure 4.43.
Figure 4.43: Configure the extraction by using Advance Extract.
Figure 4.44: Pattern and Output Expression. Click Configure to configure the extraction; use the pattern shown in the figure with the output expression $1.
The next step is to extract the departure date. See the steps in Figure 4.45.
Figure 4.45: Steps to extract the date of departure to Flight_Out.dep_date. Step 1: right-click on the hour. Step 2: choose Extraction => Extract Date => Flight_Out.dep_date.
The next step is to configure the date format, using the Format Pattern hhmm. See Figure 4.46.
Figure 4.46: Date Format.
Do the same steps to extract the arrival date and save it to Flight_Out.arr_date. See the steps in Figure 4.47.
Figure 4.47: Extracting the arrival date. Step 1: right-click on the arrival hour. Step 2: choose Extraction => Extract Date => Flight_Out.arr_date.
Figure 4.48: Set the Format Pattern of this date as hhmm as well.
As we can see in the browser, the Departure table contains all of this information (Figure 4.49). We need to extract Depart (0905) to Flight_Out.dep_date and Arrive (1100) to Flight_Out.arr_date, which we have already done in the previous steps. Now we need to extract Flight (AK 5146) to Flight_Out.flight_no, Fare (156.00) to Flight_Out.price and Currency (MYR) to Flight_Out.currency.
Figure 4.49: Depart table.
Extracting the flight number: see Figure 4.50.
Figure 4.50: Extracting the flight number.
Extracting the price: see Figure 4.51.
Figure 4.51: Extracting the price to Flight_Out.price.
Extracting the currency to Flight_Out.currency: see Figure 4.52.
Figure 4.52: Extracting the currency.
For the currency, use an Advance Extract. See Figure 4.53.
Figure 4.53: Format for the Advance Extract. Add an Advance Extract converter, click to configure it, and set the pattern to .* (.*) and the Output Expression to $1.
At the end of the robot, we must return the output object. See Figure 4.54.
Figure 4.54: Returning the object. Choose Flight_Out as the object to return.
For the Return table, we need to extract the same information as we did for Depart. Apply all the steps that were used for the Depart table.
Figure 4.55: Return table.
Create the branch and loops for the Return table. See Figure 4.56 and Figure 4.57.
Figure 4.56: Creating a branch. Click the branch button to create the branch; the new branch will appear.
Figure 4.57: Creating loops for the Return table. Step 1: click any area inside the table. Step 2: expand the green square to cover the table for the loop. Step 3: right-click on the green box area. Step 4: choose For Each Tag loops.
After these steps, follow the same steps that were applied for the Depart table.
Last but not least, the robot must be debugged. Click on the debug icon to debug the robot. See Figure 4.58.
Figure 4.58: Debugging.
The debugging screen will appear after you click the debug icon. See Figure 4.59 for how to run the debugger.
Figure 4.59: Debug screen. Click the run button to start the debugging; the information collected by the robot appears in the results pane.
4.4.3 IntelliJ IDEA Settings
For the prototype, some settings are needed. The general settings are shown in the figures.
Figure 4.60: Path of the project compiler output
As shown, the project compiler output path needs to be set. This path is used to store all project compilation results. A directory corresponding to each module is created under this path; it contains two subdirectories, Production and Test, for production code and test sources respectively. A module-specific compiler output path can be configured for each module as required. In this case, the directory is called "workspace" with subdirectories "hotel" and "out", so the path is C:\workspace\hotel\out.
Figure 4.61: Setting classes
The classes for the project also need to be set. Attach the classes at C:\Program Files\Kapow Mashup Server 6.4\API\robosuite-java-api\lib\robosuite-api.jar. This links IntelliJ IDEA to the Kapow RoboSuite API.
CHAPTER 5: FUTURE ENHANCEMENT AND CONCLUSION
5.1: Future Enhancement
This thesis has introduced a way to collaborate and aggregate information from several online booking websites, such as hotel, airline and ticket booking services. The emphasis is on the technique and approach to aggregation with Kapow Mashup. The prototype developed uses only one website, the hotel website. In the future, more websites could be added. More research could also be done on how to compare and use the data collected by the robot, applying it for data mining purposes.
The area of Web services aggregations is seeing a large amount of activity as aggregation
mechanisms are still evolving. Some are being extended and new ones created to enhance their
capabilities. As multiple proposals emerge for aggregating Web services, it is important to
understand where the mechanisms needed fit in and how they relate to existing approaches.
Ongoing work will reflect the effects of the evolution of core specifications, including WSDL, as
well as the growth and adoption of Web services aggregation techniques. Refining and
expanding the classification will involve both adding new categories and adding dimensions to
existing categories, such as the level and focus of constraints. We are also interested in identifying
primitive aggregation mechanisms, and understanding the conditions under which they may or
may not be combined.
The World Wide Web contains an immense amount of information, and thus it is nowadays often
thought of as a huge database. However, just as for relational databases, a database management
system (DBMS) is needed to combine data from different sources and give the information a
new meaning. In the sections above, API-driven mashup building was introduced as a way of
mixing data from different Web sources, much like combining data from different tables in a
relational database; this provides a way of managing the information stored in the database we
call the World Wide Web. Building mashups using APIs requires strong programming skills,
however, so this approach is of little use to a regular person who wants to mix data sources from
all over the Web. Moreover, most information on the Web is not accessible over an API, so only
a small part of the WWW is remixable. In [16], the vision of making data gathering for mashups
easier in the future is stated.
5.2 Conclusion
Every day, information is added to websites throughout the world, as long as there is access to
the World Wide Web. People can rely on the Internet whenever they need information; with just
one click, they can get the information they want. The massive amount of information and data
on the Internet needs to be exploited and turned into useful information. If we assume that a
website is a database consisting of tables, then using website aggregation tools we can query
data from the website. This paper described how the mashup technique can be used to solve
specific service issues for end users. In relation to this issue, a mashup technique was proposed
using a tool called Kapow Mashup Server. The relevant technologies that can be used for
mashups in different service layers were also described. This type of architecture can leverage
and integrate end-user-relevant information from existing web applications on the Web.
REFERENCES
1. Mustafa Jarrar, Marios D. Dikaiakos: A Data Mashup Language for the Data Web.
2. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data: Principles and State of the Art. WWW
(2008).
3. Ainie Zeinaida, Nor Adnan Yahaya: Design and Implementation of an Aggregation-based
Tourism Web Information System.
4. Marja-Riitta Koivunen: Annotea and Semantic Web Supported Collaboration.
5. Lidia Rovan: Realizing Semantic Web Portal Using Available Semantic Web Technologies
and Tools.
6. Stuart Madnick, Michael Siegel: Seizing the Opportunity: Exploiting Web Aggregation.
7. http://queue.acm.org/detail.cfm?id=1017013
8. http://www.langpop.com/. Retrieved 2009-01-16.
9. http://www.thirdnature.net/about_us.html
10. F. Curbera, M. Duftler, R. Khalaf, N. Mukhi, W. Nagy, and S. Weerawarana: BPWS4J.
Published online by IBM at http://www.alphaworks.ibm.com/tech/bpws4j, August 2002.
11. Francisco Curbera, Matthew Duftler, Rania Khalaf, William Nagy, Nirmal Mukhi, and
Sanjiva Weerawarana: Unraveling the Web services web: An introduction to SOAP, WSDL,
and UDDI. IEEE Internet Computing, 6(2):86–93, 2002.
12. Francisco Curbera, Rania Khalaf, Frank Leymann, and Sanjiva Weerawarana: Exception
handling in the BPEL4WS language. In International Conference on Business Process
Management (BPM 2003), LNCS, Eindhoven, the Netherlands, June 2003. Springer.
13. Francisco Curbera, Rania Khalaf, Nirmal Mukhi, Stefan Tai, and Sanjiva Weerawarana: Web
services, the next step: Robust service composition. Communications of the ACM: Service
Oriented Computing, October 2003.
14. Francisco Curbera, Sanjiva Weerawarana, and Matthew J. Duftler: On component
composition languages. In Proc. International Workshop on Component-Oriented
Programming, May 2000.
15. Eric M. Dashofy, Nenad Medvidovic, and Richard N. Taylor: Using off-the-shelf
middleware to implement connectors in distributed software architectures. In Proc. of the
International Conference on Software Engineering, pages 3–12, Los Angeles, California,
USA, May 1999.
16. Iskold, A.: Yahoo! Pipes and the Web as Database. Available at
http://www.readwriteweb.com/archives/yahoopipesweb-database.php. (Accessed on
01/01/2010.)
17. Shah J. Miah and John Gammack: A Mashup Architecture for Web End-user Application
Designs. Institute for Integrated and Intelligent Systems, Griffith University, Nathan
Campus, QLD 4111, Australia.
18. Christian Bizer, Richard Cyganiak, and Tobias Gauß: The RDF Book Mashup: From Web
APIs to a Web of Data. Freie Universität Berlin.
19. http://kapowtech.com/index.php/about-us/overview
GLOSSARY
World Wide Web
(WWW)
The World Wide Web, abbreviated as WWW and commonly known
as the Web, is a system of interlinked hypertext documents accessed
via the Internet. With a web browser, one can view web pages that
may contain text, images, videos, and other multimedia and navigate
between them by using hyperlinks.
Data Web Data Web refers to the transformation of the Web from a distributed
file system into a distributed database system.
Web 1.0 Web 1.0 (1991-2003) is a retronym that refers to the state of the
World Wide Web, and any website design style used before the
advent of the Web 2.0 phenomenon. Web 1.0 began with the release
of the WWW to the public in 1991, and is the general term that has
been created to describe the Web before the "bursting of the Dot-com
bubble" in 2001.
Since 2004, Web 2.0 has been the term used to describe the current
web design, business models and branding methods of sites on the
World Wide Web.
Web 2.0 The term Web 2.0 is commonly associated with web applications that
facilitate interactive information sharing, interoperability, user-
centered design, and collaboration on the World Wide Web. A Web
2.0 site gives its users the free choice to interact or collaborate with
each other in a social media dialogue as creators (prosumer) of user-
generated content in a virtual community, in contrast to websites
where users (consumer) are limited to the passive viewing of content
that was created for them. Examples of Web 2.0 include social-
networking sites, blogs, wikis, video-sharing sites, hosted services,
web applications, mashups and folksonomies.
APIs An application programming interface (API) is an interface
implemented by a software program that enables it to interact with
other software.
SOA Service-oriented architecture, a design paradigm in which application
functionality is exposed as interoperable services that serve as the main
integration components in an information system.
Annotea In metadata, Annotea is an RDF standard sponsored by the W3C to
enhance document-based collaboration via shared document metadata
based on tags, bookmarks, and other annotations.
Semantic Web Semantic Web is a group of methods and technologies to allow
machines to understand the meaning - or "semantics" - of information
on the World Wide Web.
RSS RSS (most commonly expanded as Really Simple Syndication) is a
family of web feed formats used to publish frequently updated
works—such as blog entries, news headlines, audio, and video—in a
standardized format.
ATOM The name Atom applies to a pair of related standards. The Atom
Syndication Format is an XML language used for web feeds, while
the Atom Publishing Protocol (AtomPub or APP) is a simple HTTP-
based protocol for creating and updating web resources.
REST Representational State Transfer (REST) is a style of software
architecture for distributed hypermedia systems such as the World
Wide Web.
Java servlets A Servlet is a Java class in Java EE that conforms to the Java Servlet
API, a protocol by which a Java class may respond to HTTP requests.
CGI Common Gateway Interface, a protocol for calling external software
via a web server to deliver dynamic content (and .cgi, its associated
file extension)
PHP PHP: Hypertext Preprocessor is a widely used, general-purpose
scripting language that was originally designed for web development
to produce dynamic web pages.
ASP Active Server Pages, a web-scripting interface by Microsoft.
JavaScript JavaScript is an implementation of the ECMAScript language
standard and is typically used to enable programmatic access to
computational objects within a host environment.
XML Extensible Markup Language (XML) is a set of rules for encoding
documents in machine-readable form. It is defined in the XML 1.0
Specification produced by the W3C, and several other related
specifications, all gratis open standards.
XHTML XHTML (eXtensible Hypertext Markup Language) is a family of
XML markup languages that mirror or extend versions of the widely
used Hypertext Markup Language (HTML), the language in which
web pages are written.
CSS Cascading Style Sheets (CSS) is a style sheet language used to
describe the presentation semantics (the look and formatting) of a
document written in a markup language.
DOM Document Object Model, a way to refer to XML or (X)HTML
elements as objects
SOAP SOAP, originally defined as Simple Object Access Protocol, is a
protocol specification for exchanging structured information in the
implementation of Web Services in computer networks.
WSDL The Web Services Description Language (WSDL, pronounced 'wiz-
del') is an XML-based language that provides a model for describing
Web services.
XMLTV XMLTV is an XML based file format for describing TV listings.
IPTV providers use XMLTV as the base reference template in their
systems, and extend it internally according to their business needs.
RDF Resource Description Framework, an official World Wide Web
Consortium (W3C) Semantic Web specification for metadata models
FOAF Friend of a Friend (FOAF) is a machine-readable RDF vocabulary for
describing people, the links between them, and the things they create
and do.
IETF The Internet Engineering Task Force (IETF) develops and promotes
Internet standards, cooperating closely with the W3C and ISO/IEC
standards bodies and dealing in particular with standards of the
TCP/IP and Internet protocol suite.
ActiveX ActiveX is a framework for defining reusable software components in
a programming language independent way. Software applications can
then be composed from one or more of these components in order to
provide their functionality.
WOA Web Oriented Architecture, a computer systems architectural style.
IDE An integrated development environment (IDE), also known as an
integrated design environment or integrated debugging environment,
is a software application that provides comprehensive facilities to
computer programmers for software development.
DBMS A Database Management System (DBMS) is a set of computer
programs that controls the creation, maintenance, and the use of a
database.

Mais conteúdo relacionado

Mais procurados

Mais procurados (9)

International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
C03406021027
C03406021027C03406021027
C03406021027
 
Jargon buster
Jargon busterJargon buster
Jargon buster
 
Web 3.0 and What It Means to Marketing
Web 3.0 and What It Means to MarketingWeb 3.0 and What It Means to Marketing
Web 3.0 and What It Means to Marketing
 
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUESTUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
 
Optimized travel recommendation using location based collaborative filtering
Optimized travel recommendation using location based collaborative filteringOptimized travel recommendation using location based collaborative filtering
Optimized travel recommendation using location based collaborative filtering
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
 
Davai predictive user modeling
Davai predictive user modelingDavai predictive user modeling
Davai predictive user modeling
 

Destaque

Online Airline Ticket reservation System
Online Airline Ticket reservation SystemOnline Airline Ticket reservation System
Online Airline Ticket reservation System
sathyakawthar
 
Airlines Mis
Airlines MisAirlines Mis
Airlines Mis
ankushmit
 
Flight reservation and ticketing system ppt
Flight reservation and ticketing system pptFlight reservation and ticketing system ppt
Flight reservation and ticketing system ppt
marcorelano
 
Introduction to airline reservation systems
Introduction to airline reservation systemsIntroduction to airline reservation systems
Introduction to airline reservation systems
Java and .NET Architect
 
Market segmentation presentation
Market segmentation presentationMarket segmentation presentation
Market segmentation presentation
Amol Salve
 

Destaque (14)

Zara restaurantandlounge
Zara restaurantandloungeZara restaurantandlounge
Zara restaurantandlounge
 
Pair assignment
Pair assignmentPair assignment
Pair assignment
 
Online Airline Ticket reservation System
Online Airline Ticket reservation SystemOnline Airline Ticket reservation System
Online Airline Ticket reservation System
 
Air asia
Air asiaAir asia
Air asia
 
Airlines Mis
Airlines MisAirlines Mis
Airlines Mis
 
Flight reservation and ticketing system ppt
Flight reservation and ticketing system pptFlight reservation and ticketing system ppt
Flight reservation and ticketing system ppt
 
Introduction to airline reservation systems
Introduction to airline reservation systemsIntroduction to airline reservation systems
Introduction to airline reservation systems
 
Business Plan Presentation
Business Plan PresentationBusiness Plan Presentation
Business Plan Presentation
 
Introduction to Airline Information System
Introduction to Airline Information SystemIntroduction to Airline Information System
Introduction to Airline Information System
 
Airline reservation system documentation
Airline reservation system documentationAirline reservation system documentation
Airline reservation system documentation
 
Market segmentation presentation
Market segmentation presentationMarket segmentation presentation
Market segmentation presentation
 
Overview of airline booking process
Overview of airline booking processOverview of airline booking process
Overview of airline booking process
 
Airline Reservation System
Airline Reservation SystemAirline Reservation System
Airline Reservation System
 
Business plan for fast food restaurant
Business plan for fast food restaurantBusiness plan for fast food restaurant
Business plan for fast food restaurant
 

Semelhante a Web aggregation and mashup with kapow mashup server

IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
Zac Darcy
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IJwest
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
dannyijwest
 
2000-08.doc
2000-08.doc2000-08.doc
2000-08.doc
butest
 
2000-08.doc
2000-08.doc2000-08.doc
2000-08.doc
butest
 

Semelhante a Web aggregation and mashup with kapow mashup server (20)

Web and Android Application for Comparison of E-Commerce Products
Web and Android Application for Comparison of E-Commerce ProductsWeb and Android Application for Comparison of E-Commerce Products
Web and Android Application for Comparison of E-Commerce Products
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING H...
 
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM  FOR E-COMMERCE WEBSITES USERS USING ...
DEVELOPING PRODUCTS UPDATE-ALERT SYSTEM FOR E-COMMERCE WEBSITES USERS USING ...
 
E3602042044
E3602042044E3602042044
E3602042044
 
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
 
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
 
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Web mining
Web miningWeb mining
Web mining
 
Pdd crawler a focused web
Pdd crawler  a focused webPdd crawler  a focused web
Pdd crawler a focused web
 
TOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENT
TOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENTTOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENT
TOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENT
 
Product Comparison Website using Web scraping and Machine learning.
Product Comparison Website using Web scraping and Machine learning.Product Comparison Website using Web scraping and Machine learning.
Product Comparison Website using Web scraping and Machine learning.
 
2000-08.doc
2000-08.doc2000-08.doc
2000-08.doc
 
2000-08.doc
2000-08.doc2000-08.doc
2000-08.doc
 
Rutuja SEO.pdf
Rutuja SEO.pdfRutuja SEO.pdf
Rutuja SEO.pdf
 
PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey report
 

Mais de Yudep Apoi (9)

Amalan Pertanian Yang Baik Untuk Penanaman Lada
Amalan Pertanian Yang Baik Untuk Penanaman LadaAmalan Pertanian Yang Baik Untuk Penanaman Lada
Amalan Pertanian Yang Baik Untuk Penanaman Lada
 
RHB SE User Manual Draft
RHB SE User Manual DraftRHB SE User Manual Draft
RHB SE User Manual Draft
 
Steps how to create active x using visual studio 2008
Steps how to create active x using visual studio 2008Steps how to create active x using visual studio 2008
Steps how to create active x using visual studio 2008
 
MIT521 software testing (2012) v2
MIT521   software testing  (2012) v2MIT521   software testing  (2012) v2
MIT521 software testing (2012) v2
 
MIT520 software architecture assignments (2012) - 1
MIT520   software architecture assignments (2012) - 1MIT520   software architecture assignments (2012) - 1
MIT520 software architecture assignments (2012) - 1
 
Intranet (leave management module) admin
Intranet (leave management module) adminIntranet (leave management module) admin
Intranet (leave management module) admin
 
Intranet (hiring module)
Intranet (hiring module)Intranet (hiring module)
Intranet (hiring module)
 
Intranet (callback module)
Intranet (callback module)Intranet (callback module)
Intranet (callback module)
 
Intranet (attendance module)
Intranet (attendance module)Intranet (attendance module)
Intranet (attendance module)
 

Último

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
UK Journal
 

Último (20)

TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 

Web aggregation and mashup with kapow mashup server

  • 1. 1 CHAPTER 1: INTRODUCTION 1.0: Introduction Nowadays the Internet has become a very vast platform for storing information. With just a few clicks, we can browse to a lot of information. Information stored in the World Wide Web (WWW) or in short, the Web can be accessed from anywhere, anytime and anyhow. The massively increasing structured data on the Web (Data Web) and the need for novel methods to exploit these data to their full potential is the motivation of this thesis. Building on the remarkable success of Web 2.0 mashups, this thesis regards the websites as a database, where each web data source is seen as a table, and a mashup is seen as a query over these sources. Fast development with growing complexity of websites has made the Web become essential to the Internet users. Besides providing information, websites become a platform where users can be provided with services such as online booking system. This thesis explores the problem of aggregating information about online booking system from several websites and delivers them through one point of access or portal. The aggregation tool used in this research is called Kapow Mashup Server. 1.1 Problem Statement There are already quite a number of available websites that support online booking services in Malaysia like Airasia (www.airasia.com), Malaysia Airlines System (www.malaysiaairlines.com), Firefly (www.fireflyz.com.my) and Maswing (www.maswing.com.my). However, looking for the right information such as price rate, date booking, availability and so on will be time consuming since this has to be done through repetitive manual browsing of the relevant websites. The need of an automated system to provide such information is important. Comparisons through a manual browsing will also not a competitive way. Users want to make a comparison in terms of price rate, will have troubles to do it by browsing several websites.
  • 2. 2 To aggregate the data from these several websites will need specific and right tools to do it. Different websites have their own architecture and data is located in the different frames. It is a challenge how to extract these data and offer integrated access through it in one portal. 1.2 Motivational Example Searching on the Web, we can find lot of websites or portals that provide information about online booking. We know that the most important information about online booking that users would like to have is time, duration, price and comparison of several booking services. Let us consider the following scenario. When we search for information about a flight schedule, price, and destination and so on, we have to browse into many websites for the information. Isn’t it easy if we just browse into one website where from it we are able to browse all airlines that available for a particular destination? Users also wish to make a comparison of the information regarding online booking that they are about to make. To open many web browsers and make a comparison manually is not really an effective way. Another scenario is, when users want to book for hotel’s room online. As we know today, most of hotels have their own website which provides online information booking services. Users may want to know about the hotel price, availability, as well as check-in and check-out date. With one website that controls and aggregates all these information, it eases users in making a comparison. With the advent of powerful tools for extracting and integrating data from these several of web sites realizing this one stop portal is becoming lot easier. 1.3 Research Question Internet world give us a lot of benefits especially in providing information. Within just a click on the Internet, people can find any information that they want. There are millions of websites that are available in the Internet as we know it. People need these information and data to help them make a decision and to compare information. The question is how to manipulate the data?
  • 3. 3 Internet technology has also developed rapidly towards greater efficiency. People can do anything through online transactions. Especially for online travelling ticket information, people want to know about this information, date departure, time of departure, and price of the ticket. They want to compare all those information before they make a booking. Since there is a lot of website that they can use to make a booking, it is going to be a tedious work to compare information between several website by opening it one by one. Then this thesis comes with the question ―How to aggregate data into a single online ticket website?‖ 1.4 Aim and Research Objectives The main aim of this research is to develop a prototype of a portal as a proof of concept for the problem of aggregating information that are currently available in Malaysian web-based online booking systems. Towards this end, we have identified the following specific research objectives: 1. To identify tools and agents that is suitable for web mashup and aggregation. 2. To explore a way to aggregate and mashup information on online booking system. 3. To collaborate data from several online booking system and create a portal where the data can be manipulate. 1.5 Summary of Contributions There are two main contributions of this research work. First, a prototype has been developed as a proof of concept. The prototype is a portal which contains data or information that will extract from several online booking systems. The portal will display each information according to user’s needs. Secondly, a guidelines and manual will be providing on how a web aggregation and mashup can be done with selection tools. The guideline will include what is need and what are the techniques that involve building the prototype.
  • 4. 4 1.6 Thesis Organization The rest of this thesis is organized as follows. Chapter 2, will describes on literature review, papers and works that have been done on web aggregation and mashup. In this chapter, several works or case studies will be discuss and state what scope that they have covered in the web aggregation and mashup. In Chapter 3, we will describe about methodology that will be used to achieve the objectives of the paper. Figure below show the brief about methodology that will be use: Figure 1.1: Thesis organization In Chapter 4, more explanations of the prototype in term of its design and implementation will be elaborate in details. Chapter 5 will be the conclusion and future enhancement of the thesis. Literature review and finding information Selection of Websites and suitable tools Prototype development Publishing prototype and guidelines
  • 5. 5 CHAPTER 2: LITERATURE REVIEW 2.1: Introduction Works on web aggregation and mashup have grown rapidly. In web development, a mashup is a web page or application that uses or combines data or functionality from two or many more external sources to create a new service. The term implies easy, fast integration, frequently using open APIs and data sources to produce enriching results that were not necessarily the original reason for producing the raw source data. According to Larry Dignan [19] based on the presentation by Gartner analyst David Gootzit the future of portal is mashups, SOA, more aggregation. 2.2: Related works Momondo.com is a travel search engine that allows the consumer to compare prices on flights, hotels and car rental. The search engine aggregates results from more than 700 travel websites simultaneously to within seconds give an overview of the best offers found. Momondo doesn’t sell tickets; instead it shows the consumer where to buy at the best prices and links to the supplier. It is free of charge to use Momondo, which receives commission from sponsored links and advertising. In 2007 NBC Today’s Travel recommended that when it comes to finding the best offers on flights the consumer should go to sites like Kayak, Mobissimo, SideStep and Momondo instead of buying tickets from third-party sites that actually sell travel and are dealing directly with the airlines. In addition to the price comparisons Momondo also offers city guides written by the site's users and by bloggers based in different cities. Kayak.com is a travel search engine website based in the United States. Founded in 2004, it aggregates information from hundreds of other travel sites and helps user’s book flights, hotels, cruises, and rental cars. Kayak combines results from online travel agencies, consolidators such as Orbitz, and other sources such as large hotel chains. Like momondo.com, Kayak doesn't sell directly to the consumer; rather, it aggregates results from other sites then redirects the visitor to one of these sites for reservation. Thus, Kayak.com
  • 6. 6 makes money from pay per click advertising, when the consumer clicks-through to one of the compared websites (for example, when the consumer is redirected to the Orbitz website). 2.3: Paper on Web Aggregation and Mashup In [3], they discuss about the design and implementation of a prototype web information system that users web aggregation as the core engine. Annotea is one of project that related to the field. Annotea is a Semantic Web based project for which the inspiration comes from users’ collaboration problem in the web. It examined what users did naturally and selected familiar metaphors for supporting better collaboration [4]. In [5], they define a semantic web portal as any web portal that is developed based on semantic web technologies. They are in process of developing such web portal using available semantic technologies. Only standard technologies promising generic solution are selected. As result they expect that they will be able to provide basic development guidelines in the form of portal architecture and design patterns. In [6], they examine the development of web aggregators, entities that collect information from a wide range of sources, with or without prior arrangements, and add value through post- aggregation services. New web-page extraction tools, context sensitive mediators, and agent technologies have greatly reduced the barriers to constructing aggregators. They predict that aggregators will soon emerge in industries where they were not formerly present.
  • 7. 7 2.4 Others Works on Web Aggregation and Mashups 2.4.1 Mapping mashups In this age of information technology, humans are collecting a prodigious amount of data about things and activities, both of which are wont to be annotated with locations. All of these diverse data sets that contain location data are just screaming to be presented graphically using maps. One of the big catalysts for the advent of mashups was Google's introduction of its Google Maps API. This opened the floodgates, allowing Web developers to mash all sorts of data onto maps. Not to be left out, APIs from Microsoft (Virtual Earth), Yahoo (Yahoo Maps), and AOL (MapQuest) shortly followed. 2.4.2 Video and photo mashups The emergence of photo hosting and social networking sites like Flickr with APIs that expose photo sharing has led to a variety of interesting mashups. Because these content providers have metadata associated with the images they host (such as who took the picture, what it is a picture of, where and when it was taken, and more), mashup designers can mash photos with other information that can be associated with the metadata. For example, a mashup might analyze song or poetry lyrics and create a mosaic or collage of relevant photos, or display social networking graphs based upon common photo metadata (subject, timestamp, and other metadata.). Yet another example might take as input a Web site (such as a news site like CNN) and render the text in photos by matching tagged photos to words from the news. 2.4.3 Search and Shopping mashups Search and shopping mashups have existed long before the term mashup was coined. Before the days of Web APIs, comparative shopping tools such as BizRate, PriceGrabber, MySimon, and Google's Froogle used combinations of business-to-business (b2b) technologies or screen- scraping to aggregate comparative price data. To facilitate mashups and other interesting Web applications, consumer marketplaces such as eBay and Amazon have released APIs for programmatically accessing their content.
  • 8. 8 2.4.4 News mashups News sources (such as the New York Times, the BBC, or Reuters) have used syndication technologies like RSS and Atom since 2002 to disseminate news feeds related to various topics. Syndication feed mashups can aggregate a user's feeds and present them over the Web, creating a personalized newspaper that caters to the reader's particular interests. An example is Diggdot.us, which combines feeds from the techie-oriented news sources Digg.com, Slashdot.org, and Del.icio.us. 2.5 Related Technologies A mashup application is architecturally comprised of three different participants that are logically and physically disjoint: API/content providers, the mashup site, and the client's Web browser.  The API/content providers. These are the providers of the content being mashed. To facilitate data retrieval, providers often expose their content through Web-protocols such as REST, Web Services, and RSS/Atom. However, many interesting potential data- sources do not conveniently expose APIs. Mashups that extract content from sites like Wikipedia, TV Guide, and virtually all government and public domain Web sites do so by a technique known as screen scraping. In this context, screen scraping denotes the process by which a tool attempts to extract information from the content provider by attempting to parse the provider's Web pages, which were originally intended for human consumption.  The mashup site. This is where the mashup is hosted. Interestingly enough, just because this is where the mashup logic resides, it is not necessarily where it is executed. On one hand, mashups can be implemented similarly to traditional Web applications using server-side dynamic content generation technologies like Java servlets, CGI, PHP or ASP.
  • 9. 9  The client's Web browser. This is where the application is rendered graphically and where user interaction takes place. As described above, mashups often use client-side logic to assemble and compose the mashed content. 2.5.1 Ajax There is some dispute over whether the term Ajax is an acronym or not (some would have it represent "Asynchronous JavaScript + XML"). Regardless, Ajax is a Web application model rather than a specific technology. It comprises several technologies focused around the asynchronous loading and presentation of content:  XHTML and CSS for style presentation  The Document Object Model (DOM) API exposed by the browser for dynamic display and interaction  Asynchronous data exchange, typically of XML data  Browser-side scripting, primarily JavaScript When used together, the goal of these technologies is to create a smooth, cohesive Web experience for the user by exchanging small amounts of data with the content servers rather than reload and re-render the entire page after some user action. You can construct Ajax engines for mashups from various Ajax toolkits and libraries (such as Sajax or Zimbra), usually implemented in JavaScript. The Google Maps API includes a proprietary Ajax engine, and the effect it has on the user experience is powerful: it behaves like a truly local application in that there are no scrollbars to manipulate or translation arrows that force page reloads. 2.5.2 Web protocols: SOAP and REST Both SOAP and REST are platform neutral protocols for communicating with remote services. As part of the service-oriented architecture paradigm, clients can use SOAP and REST to interact with remote services without knowledge of their underlying platform implementation: the functionality of a service is completely conveyed by the description of the messages that it requests and responds with.
  • 10. 10 SOAP is a fundamental technology of the Web Services paradigm. Originally an acronym for Simple Object Access Protocol, SOAP has been re-termed Services-Oriented Access Protocol (or just SOAP) because its focus has shifted from object-based systems towards the interoperability of message exchange. There are two key components of the SOAP specification. The first is the use of an XML message format for platform-agnostic encoding, and the second is the message structure, which consists of a header and a body. The header is used to exchange contextual information that is not specific to the application payload (the body), such as authentication information. The SOAP message body encapsulates the application-specific payload. SOAP APIs for Web services are described by WSDL documents, which themselves describe what operations a service exposes, the format for the messages that it accepts (using XML Schema), and how to address it. SOAP messages are typically conveyed over HTTP transport, although other transports (such as JMS or e-mail) are equally viable. REST is an acronym for Representational State Transfer, a technique of Web-based communication using just HTTP and XML. Its simplicity and lack of rigorous profiles set it apart from SOAP and lend to its attractiveness. Unlike the typical verb-based interfaces that you find in modern programming languages (which are composed of diverse methods such as getEmployee(), addEmployee(), listEmployees(), and more), REST fundamentally supports only a few operations (that is POST, GET, PUT, DELETE) that are applicable to all pieces of information. The emphasis in REST is on the pieces of information themselves, called resources. For example, a resource record for an employee is identified by a URI, retrieved through a GET operation, updated by a PUT operation, and so on. In this way, REST is similar to the document- literal style of SOAP services. 2.5.3 Screen scraping Lack of APIs from content providers often forces mashup developers to resort to screen scraping in order to retrieve the information they seek to mash. Scraping is the process of using software tools to parse and analyze content that was originally written for human consumption in order to extract semantic data structures representative of that information that can be used and manipulated programmatically.
  • 11. 11 A handful of mashups use screen scraping technology for data acquisition, especially when pulling data from the public sectors. For example, real-estate mapping mashups can mash for- sale or rental listings with maps from a cartography provider with scraped "comp" data obtained from the county records office. Another mashup project that scrapes data is XMLTV, a collection of tools that aggregates TV listings from all over the world. Screen scraping is often considered an inelegant solution, and for good reasons. It has two primary inherent drawbacks. The first is that, unlike APIs with interfaces, scraping has no specific programmatic contract between content-provider and content-consumer. Scrapers must design their tools around a model of the source content and hope that the provider consistently adheres to this model of presentation. Web sites have a tendency to overhaul their look-and-feel periodically to remain fresh and stylish, which imparts severe maintenance headaches on behalf of the scrapers because their tools are likely to fail. The second issue is the lack of sophisticated, re-usable screen-scraping toolkit software, colloquially known as scrAPIs. The dearth of such APIs and toolkits is largely due to the extremely application-specific needs of each individual scraping tool. This leads to large development overheads as designers are forced to reverse-engineer content, develop data models, parse, and aggregate raw data from the provider's site. 2.5.4 Semantic Web and RDF The inelegant aspects of screen scraping are directly traceable to the fact that content created for human consumption does not make good content for automated machine consumption. Enter the Semantic Web, which is the vision that the existing Web can be augmented to supplement the content designed for humans with equivalent machine-readable information. In the context of the Semantic Web, the term information is different from data; data becomes information when it conveys meaning (that is, it is understandable). The Semantic Web has the goal of creating Web infrastructure that augments data with metadata to give it meaning, thus making it suitable for automation, integration, reasoning, and re-use.
  • 12. 12 The W3C family of specifications collectively known as the Resource Description Framework (RDF) serves this purpose of providing methodologies to establish syntactic structures that describe data. XML in itself is not sufficient; it is too arbitrary in that you can code it in many ways to describe the same piece of data. RDF-Schema adds to RDF's ability to encode concepts in a machine-readable way. Once data objects can be described in a data model, RDF provides for the construction of relationships between data objects through subject-predicate-object triples ("subject S has relationship R with object O"). The combination of data model and graph of relationships allows for the creation of ontologies, which are hierarchical structures of knowledge that can be searched and formally reasoned about. For example, you might define a model in which a "carnivore-type" as a subclass of "animal-type" with the constraint that it "eats" other "animal-type", and create two instances of it: one populated with data concerning cheetahs and polar bears and their habitats, another concerning gazelles and penguins and their respective habitats. Inference engines might then "mash" these separate model instances and reason that cheetahs might prey on gazelles but not penguins. RDF data is quickly finding adoption in a variety of domains, including social networking applications (such as FOAF -- Friend of a Friend) and syndication (such as RSS, which I describe next). In addition, RDF software technology and components are beginning to reach a level of maturity, especially in the areas of RDF query languages (such as RDQL and SPARQL) and programmatic frameworks and inference engines (such as Jena and Redland). 2.5.5 RSS and ATOM RSS is a family of XML-based syndication formats. In this context, syndication implies that a Web site that wants to distribute content creates an RSS document and registers the document with an RSS publisher. An RSS-enabled client can then check the publisher's feed for new content and react to it in an appropriate manner. RSS has been adopted to syndicate a wide variety of content, ranging from news articles and headlines, changelogs for CVS checkins or wiki pages, project updates, and even audiovisual data such as radio programs. Version 1.0 is RDF-based, but the most recent, version 2.0, is not.
  • 13. 13 Atom is a newer, but similar, syndication protocol. It is a proposed standard at the Internet Engineering Task Force (IETF) and seeks to maintain better metadata than RSS, provide better and more rigorous documentation, and incorporates the notion of constructs for common data representation. These syndication technologies are great for mashups that aggregate event-based or update- driven content, such as news and weblog aggregators. 2.6 Aggregation and Mashup Challenges To mashup and aggregate the web, it has its own challenges. The challenges can be divided into three which is technical challenges, component challenges and social challenges. 2.6.1 Technical Challenges Like any other data integration domain, mashup development is replete with technical challenges that need to be addressed, especially as mashup applications become more features and functionality rich. For example, translation systems between data models must be designed. When converting data into common forms, reasonable assumptions often have to be made when the mapping is not a complete one (for example, one data source might have a model in which an address-type contains a country-field, whereas another does not). Already challenging, this is exacerbated by the fact that the mashup developers might not be domain experts on the source data models because the models are third-party to them, and these reasonable assumptions might not be intuitive or clear. In addition to missing data or incomplete mappings, the mashup designer might discover that the data they wish to integrate is not suitable for machine automation; that it needs cleansing. For example, law enforcement arrest records might be entered inconsistently, using common abbreviations for names (such as "mkt sqr" in one record and "Market Square" in another),
  • 14. 14 making automated reasoning about equality difficult, even with good heuristics. Semantic modelling technologies, such as RDF, can help ease the problem of automatic reasoning between different data sets, provided that it is built-in to the data-store. Legacy data sources are likely to require much human effort in terms of analysis and data cleansing before they can be availed to semantic modelling technologies. Another host of integration issues facing mashup developers arise when screen scraping techniques must be used for data acquisition. Deriving parsing and acquisition tools and data models requires significant reverse-engineering effort. Even in the best case where these tools and models can be created, all it takes is a re-factoring of how the source site presents its content to break the integration process, and cause mashup application failure. 2.6.2 Component Challenges The Ajax model of Web development can provide a much richer and more seamless user experience than the traditional full-page-refresh, but it poses some difficulties as well. At its fundamentals, Ajax entails using the browser's client-side scripting capabilities in conjunction with its DOM to achieve a method of content delivery that was not entirely envisioned by the browser's designers. However, this subjects Ajax-based applications to the same browser compatibility issues that have plagued Web designers ever since Microsoft created Internet Explorer. For example, Ajax engines make use of an XMLHttpRequst object to exchange data asynchronously with remote servers. In Internet Explorer 6, this object is implemented with ActiveX rather than native JavaScript, which requires that ActiveX be enabled. Meanwhile in Mozilla Firefox, this object require an extension or plug-in. A more fundamental requirement is that Ajax requires that JavaScript be enabled within the user's browser. This might be a reasonable assumption for the majority of the population, but there are certainly users who use browsers or automated tools that either do not support JavaScript or do not have it enabled. One such set of tools are the robots, spiders, and Web crawlers that aggregate information for Internet and intranet search engines. Without graceful degradation, Ajax-based mashup applications might find themselves missing out on both a minority user base as well as search engine visibility.
  • 15. 15 The use of JavaScript to asynchronously update content within the page can also create user interface issues. Because content is no longer necessarily linked to the URL in the browser's address bar, users might not experience the functionality that they normally expect when they use the browser's BACK button, or the BOOKMARK feature. And, although Ajax can reduce latency by requesting incremental content updates, poor designs can actually hinder the user experience, such as when the granularity of update is small enough that the quantity and overhead of updates saturate the available resources. Also, take care to support the user (for example, with visual feedback such as progress bars) while the interface loads or content is updated. As with any distributed, cross-domain application, mashup developers and content providers alike will also need to address security concerns. The notion of identity can prove to be a sticky subject, as the traditional Web is primarily built for anonymous access. 2.6.3 Social Challenges In addition to the technical challenges, social issues have (or will) surface as mashups become more popular. One of the biggest social issues facing mashup developers is the trade-off between the protection of intellectual property and consumer privacy versus fair-use and the free flow of information. Unwitting content providers (targets of screen scraping), and even content providers who expose APIs to facilitate data retrieval might determine that their content is being used in a manner that they do not approve of. The mashup Web application genre is still in its infancy, with hobbyist developers who produce many mashups in their spare time. These developers might not be cognizant of (or concerned with) issues such as security. Additionally, content providers are only beginning to see the value in providing APIs for machine-based content access, and many do not consider them a core business focus. This combination can yield poor software quality, as priorities such as testing and quality assurance take the backseat to proof-of-concept and innovation. The community as a whole will
  • 16. 16 have to work together to assemble open standards and reusable toolkits in order to facilitate mature software development processes. Before mashups can make the transition from cool toys to sophisticated applications, much work will have to go into distilling robust standards, protocols, models, and toolkits. For this to happen, major software development industry leaders, content providers, and entrepreneurs will have to find value in mashups, which means viable business models. API providers will need to determine whether or not to charge for their content, and if so, how (for example, by subscription or by per-use). Perhaps they will provide varying levels of quality-of-service. Some marketplace providers, such as eBay or Amazon, might find that the free use of their APIs increases product movement. Mashup developers might look for an ad-based revenue model, or perhaps build interesting mashup applications with the goal of being acquired.
  • 17. 17 CHAPTER 3: METHODOLOGY 3.1: Research Activities To complete this research, a prototyping approach was used in the system development process. The research activities carried out and the prototype developed are sufficient to show that the conceptual idea works satisfactorily towards achieving the main goal of web mashup and aggregation. The research methodology focuses primarily on three main activities. Firstly, emphasis is put on identifying the information model to be used for aggregating web data, including the tools, platform and technologies to be used in the prototype development. It is important that the tools involved are easy to use and can do more than just extract data; for example, they should keep the data updated as the client website changes. In this stage, the objects and classes needed to capture the information we want to harvest are identified. Secondly, after the information model is identified and the tools are confirmed, focus shifts to the integration and collaboration of the websites. A model and a web bot (Kapow Robot) are developed and deployed to harvest the required information; the bots bring back the data that the system will use. Lastly, an interface is developed to act as the portal for the information collected by the mashup robots.
  • 18. 18 3.2: Overview of Development Process A prototype will be developed in this research. Prototyping is the rapid development of a system. In the past, the developed system was normally thought of as inferior in some way to the required system, so further development was required. There are 5 steps in the prototype development methodology: 1. Gather requirements 2. Build prototype 3. Evaluate prototype 4. If accepted, throw away the prototype and redesign 5. If rejected, re-gather requirements and repeat from step 2 Below is an illustration of the prototype methodology: Figure 3.1: Prototype Model (outline requirements lead either through evolutionary prototyping to a delivered system, or through throw-away prototyping to an executable prototype plus system specification) 3.3 Gather requirement A few methods were used to gather requirements. In this case, we used the Internet, relevant papers, journals, and the university library to find information. Basically, information from previous journal papers and research on the topic is needed. For the prototype specifically, the software, hardware and technologies suitable for developing the prototype are needed.
  • 19. 19 There are a few websites that use almost the same concept as the research prototype, for example www.kayak.com. According to Wikipedia, Kayak.com is a travel search engine website based in the United States. Founded in 2004, it aggregates information from hundreds of other travel sites and helps users book flights, hotels, cruises, and rental cars. Kayak combines results from online travel agencies, consolidators such as Orbitz, and other sources such as large hotel chains. Kayak is built on a Java, Apache, Perl and Linux platform. It uses XML-HTTP and JavaScript to create a rich user interface. When it comes to the prototype development, we need the tools and software to develop it. The tables below list the software and hardware required. Software requirements: IDE Platform: Windows: Windows 2000 SP2 and SP4; Windows Server 2003 Standard Edition SP1 and x64 Standard Edition; Windows Server 2003 R2 Standard Edition for x86 and x64; Windows XP SP2; Windows Vista. Linux: Red Hat Enterprise Linux 5.0. Server Platform: Windows: Windows 2000 SP2 and SP4; Windows Server 2003 Standard Edition SP1 and x64 Standard Edition; Windows Server 2003 R2 Standard Edition for x86 and x64; Windows XP SP2. Linux: Red Hat Enterprise Linux 5.0; Debian 4.0 (on x86 and x64). Database: Oracle versions 8.1.7.0, 9i R2 and 10g R2; IBM DB2 UDB 7.2 and 8.2; Microsoft SQL Server versions 2000 and 2005; Sybase Adaptive Server Enterprise 15.0;
  • 20. 20 PointBase Server versions 4.4 and 4.5; MySQL versions 4.0, 4.1 and 5.0. APIs: Java: J2SE 1.3 + JAXP, or J2SE 1.4 or later. .NET: C#, .NET versions 1.0 and 1.1. Clipping Portlets: BEA WebLogic Portal 8.1 (all service packs); IBM WebSphere Portal 5.0 and 5.1; Standard Java Portal JSR-168. Clipping Browsers: Microsoft Internet Explorer 6.0; Mozilla Firefox 1.5+ (both Windows and Linux). Tag library: JSP 1.2 and 2.0. Web Services: BEA WebLogic Workshop 8.1 (all service packs); .NET versions 1.0 and 1.1. Code Generation: Java: J2SE 1.3 or later. .NET: C#, .NET versions 1.0 and 1.1. Table 3.1: Software Requirements
  • 21. 21 Hardware requirements: The table below specifies system requirements for different platforms. The requirements may depend on the application, so they should be taken only as guidelines and not as absolute numbers. A complex clipping solution might require much more power than a simple collection solution. The recommendations for servers are for one server; the number of servers used for a given application (the size of a cluster) is a completely different matter and should be estimated using methods described elsewhere. IDE (Windows and Linux): Minimum: Intel Pentium 1 GHz CPU, 512 MB RAM, 200 MB free disk space. Recommended: Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space. Server (Windows and Linux): Minimum: Intel Pentium 2 GHz CPU, 1 GB RAM, 200 MB free disk space. Recommended: Intel Pentium 2 GHz CPU, 2 GB RAM, 200 MB free disk space. Source: http://kdc.kapowtech.com/documentation_6_4/Technical/TechnicalDataSheet6_4.pdf Table 3.2: Hardware Requirements Besides the software and hardware requirements, the websites targeted for harvesting information will also be identified. Websites that support online booking/ticketing will be given priority.
  • 22. 22 3.4 Build prototype In building the prototype, everything from installation to reading the manuals must be well prepared. The main tool used to develop the prototype is the Kapow Mashup Server; the other tool is IntelliJ IDEA. 3.4.1 Kapow Mashup Server When it comes to web data access, extraction and harvesting, the Kapow Mashup Server is a tool suited to all of those tasks. Kapow is also known as a web integration platform, and the Kapow Mashup Server makes it possible to access data or content from any browsable application or website. Over the past few years, the Kapow Mashup Server has become a lightweight services and mashup standard among Internet-intensive businesses in the areas of media, financial services, travel, manufacturing and information services firms (background checking, information providers, etc.). The Kapow Mashup Server is a platform for web integration. It helps transform the resources of the web into well-defined nuggets of information and functionality; in effect, it turns a website into services available to client applications. According to [19], the Kapow Web Data Server powers solutions in web and business intelligence, portal generation, SOA/WOA enablement, and content migration. Kapow's patented visual programming and integrated development environment (IDE) technology enables business and technical decision-makers to create innovative business applications. With Kapow, new applications can be completed and deployed in a fraction of the time and cost associated with traditional software development methods. This research uses Kapow as the main tool because Kapow gives the best results when it comes to data aggregation. By creating models and robots, functions such as collection of internal or external web-based data sources, website clipping, and so on can be done easily without any programming.
  • 23. 23 A few abilities of the Kapow Mashup Server that contribute to the development of the prototype are: - Web integration - Code generation - Data harvesting The Kapow Mashup Server also provides web-to-web data integration functionality, allowing data extraction from one website, transforming it into a new format, and pushing it through input forms into a second website. This can be a many-to-many process, extracting data from multiple websites, combining and transforming them, and pushing them into multiple other websites. Web-based transformation is also supported, e.g. using a website for real-time language translation or HTML-to-XML conversion. 3.4.2 IntelliJ IDEA To write web programming languages such as Java, HTML and PHP, a development platform is needed. IntelliJ IDEA is a code-centric IDE focused on developer productivity. IntelliJ IDEA deeply understands code and provides a set of powerful tools without imposing any particular workflow or project structure. Imagine that we have a large source code base that we need to browse or modify. For instance, we might want to use a library and find out how it works, or we might need to get acquainted with existing code in order to modify it. Yet another example is when a new JDK becomes available and we are keen to see the changes in the standard Java libraries. Conventional find-and-replace tools may not completely address these goals, because with them it is easy to find or replace too much or too little. Of course, if someone already knows the source code well, then the whole-words option and regular expressions may help make find-and-replace queries smarter. This is one advantage of using this tool for the development of the prototype.
  • 24. 24 3.5 Evaluate prototype To be more specific, the prototyping approach that we use is called Extreme Prototyping. Basically, it breaks web development down into three phases, each one based on the preceding one. The first phase is a static prototype that consists mainly of HTML pages. In the second phase, the screens are programmed and fully functional using a simulated services layer. In the third phase, the services are implemented. The process is called Extreme Prototyping to draw attention to the second phase, where a fully functional UI is developed with very little regard to the services other than their contract. In this stage, all the robots and models created with the Kapow Mashup Server are ready to be deployed. Using Kapow's ability to generate code, the generated code is copied into the programming development tool, IntelliJ IDEA.
  • 25. 25 CHAPTER 4: PROTOTYPE DESIGN AND IMPLEMENTATION This chapter explains the actual concept of the research and how various measures were taken to prove the concept, covering the whole implementation process of the aggregation and mashup tool, named Kapow as Web Aggregation and Mashup for Online Booking System. 4.1 Conceptual Design 4.1.1 Kapow Mashup Server as Tool The Kapow Mashup Server enables you to collect, connect and mash up everything on corporate intranets as well as the World Wide Web. These abilities made it the first choice for the development of the prototype. As described in Section 3.4.1, Kapow provides web-to-web data integration functionality: extracting data from one or more websites, transforming it into a new format, and pushing it through input forms into other websites, with web-based transformations such as real-time language translation or HTML-to-XML conversion also supported. Figure 4.1: Different layers involved in Kapow Mashup Server (Kapow website)
  • 26. 26 Figure 4.1 gives a general picture of how Kapow works. Three layers are involved: the integrated development environment, web-based management, and the scalable server environment. The next sections explain each of these layers in more detail. There are 4 important elements involved in the integrated development environment layer; we can call this layer the primary studio tools of the Kapow Mashup Server. Figure 4.2: Kapow ModelMaker Interface ModelMaker is the RoboSuite application for writing and maintaining the domain models that are used in RoboMaker. With ModelMaker, we can easily create new domain models and configure existing ones, as well as add, delete, and configure the objects within a domain model. Meanwhile, RoboMaker is the RoboSuite application for creating and debugging robots. RoboMaker is an integrated development environment (IDE) for robots. This means that RoboMaker is all we need for programming robots in an easy-to-understand visual programming language. To support the construction of robots, RoboMaker provides powerful programming features including interactive visual programming, full debugging capabilities, an overview of the program state, and easy access to context-sensitive online help.
  • 27. 27 Figure 4.3: Kapow Mashup Server RoboMaker Interface. 4.1.2 Architecture of the Prototype The prototype is basically a 3-tier architecture. Existing online booking/ticketing systems on the Internet are used as the data sources. Their data is extracted by the Kapow Mashup Server tools, specifically a Kapow robot built according to the model. The robot is deployed to harvest the data we want and bring it back to the portal, which is developed using several web programming languages with Apache as the web server. Figure 4.4 shows the architecture of the prototype.
  • 28. 28 Figure 4.4: Prototype architecture
  • 29. 29 4.2 Prototype Implementation Design 4.2.1 Websites In this thesis, for testing purposes, we test the aggregator on one website only: www.Agoda.com. The website provides information about hotels. The information customers usually need is the name of the hotel, the rate per night, the location and the date. We use Kapow as the tool for extracting all of this information. Websites have their own structure and design, and they often change their structure, layout and design to suit current needs. Such changes can be a problem for the prototype to adapt to, because the data or information we want to harvest might change its location or position within the website. This may cause the robot to bring back the wrong information. 4.2.2 Model Model Maker is used to create and edit object models. An object model is like a type definition in a programming language: it defines the structure of the objects that form the input and output of a robot. Model Maker is a visual tool for creating data objects that define the data structures utilized by robots for information collection, aggregation and integration. An object model consists of one or more attribute definitions, each of which defines an attribute name, type, and other information. A given robot will return (or store) objects defined by one or more object models. For example, a data collection robot for job postings could return objects defined by the object model Job. Job would contain attributes such as title and source (short text types), date (date type), description (long text) and so on. If the objects are stored in a database at runtime, the database will have a table definition matching the object model. Model Maker can generate the SQL necessary to create the required tables in the database. Firstly, a model for the hotel needs to be created. Since this is an input-and-output type of query, we need to create two objects in the model: HotelQuery for the input attributes country, city, checkindate and checkoutdate, and HotelResult to hold the extracted output.
  • 30. 30 The figures below show what the model looks like. Figure 4.5 shows the object called HotelQuery with attributes country, city, checkin, and checkout. Figure 4.5: HotelQuery attributes
  • 31. 31 Figure 4.6 shows the output object, HotelResult, with its attributes. Figure 4.6: HotelResult as output object.
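To make the model concrete before building the robot, the two objects can be pictured as plain Java classes. The sketch below is only an illustration of the structure that ModelMaker defines visually; the HotelResult attribute names (hotelName, ratePerNight, location) are assumptions based on the information listed in Section 4.2.1, not names read from the actual model.

    import java.util.Date;

    // Input object: the query the robot receives (attributes from Figure 4.5).
    class HotelQuery {
        String country;
        String city;
        Date checkin;
        Date checkout;
    }

    // Output object: one extracted result. Attribute names are assumed from
    // the information customers need: hotel name, rate per night, location, date.
    class HotelResult {
        String hotelName;
        double ratePerNight;
        String location;
        Date date;
    }

In the running system these objects are not hand-coded: ModelMaker defines them and can generate matching database table definitions when results are stored at runtime.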
  • 32. 32 4.4: Creating Robot 4.4.1: Creating Robot for Hotel Website A robot will be created and deployed to harvest all information according to the model. The steps to create the robot are shown below. Step 1: Choose the Integration robot type in the New Robot Wizard Figure 4.7: Create new Integration Robot
  • 33. 33 Step 2: Enter the URL that the robot should start from: www.agoda.com Figure 4.8: Enter www.agoda.com
  • 34. 34 Step 3: Select the objects that the robot should receive as input. Choose HotelQuery, which was created in the model wizard. Figure 4.9: Choose HotelQuery
  • 35. 35 Step 4: From the wizard, select the objects that the robot should return as output. Figure 4.10: Choose HotelResult
  • 36. 36 Step 5: Select objects that the robot should use for holding temporary data during its execution. Figure 4.11: ScratchPad holding the temporary data Figure 4.12: Two output objects HotelResult and ScratchPad
  • 37. 37 Step 6: Entering the next attribute Figure 4.13: Loading website into the Kapow interface
  • 38. 38 Step 7: Figure 4.14: Select country
  • 39. 39 Step 8: Figure 4.15: Select clipping
  • 40. 40 Step 9: Figure 4.16: Debugging the robot
  • 41. 41 Step 10: Figure 4.17: Information that is collected
  • 42. 42 4.4.2: Creating Robot for Flight Website Airasia.com is chosen as the website from which flight information is extracted. Firstly, a model is created as follows. First create flight_in.model, which will hold all the data about where you want to fly from; this will be the input data. Add the attributes below: 1. Origin [data type: short text] 2. Destination [data type: short text] 3. Dep_date [data type: date] 4. Ret_date [data type: date] Figure 4.18: The Flight_In attributes.
  • 43. 43 Create flight_out.model, which will hold all the data about the destination of your flight; this will be the output data. Add the attributes for flight_out.model as listed below: 1. Origin [data type: short text] 2. Destination [data type: short text] 3. Dep_date [data type: date] 4. Arr_date [data type: date] 5. Flight_no [data type: short text] 6. Price [data type: number] 7. Currency [data type: short text] 8. Carrier [data type: short text] Figure 4.19: The Flight_Out attributes. Save the model as flight.model.
  • 44. 44 Creating airasia.robot First, open the RoboMaker application. Figure 4.20: Creating airasia.robot Choose Create a new robot… and click OK.
  • 45. 45 Figure 4.21: Choose Integration robot, then click NEXT Figure 4.22: Enter the URL that the robot should start from: http://www.airasia.com/site/my/en/home.jsp Integration robot http://www.airasia.com/site/my/en/home.jsp
  • 46. 46 Figure 4.23: Select the objects to input to the robot. Click here to add Flight_In
  • 47. 47 Figure 4.24: Select the objects to output from the robot. Click FINISH Figure 4.25: This is what the first screen, Load Page, looks like. Select Flight_Out for output object
  • 48. 48 Move your cursor to the origin field and right-click. As shown in Figure 4.26, click on "Select Option". Figure 4.26: Select Option for Origin. A pop-up screen will appear; choose your Origin from the drop-down menu and set it as the value. In this tutorial, choose Kuala Lumpur LCCT. Click on Select Option
  • 49. 49 Figure 4.27: Option to Select. Do the same for the destination, which in this case is Bintulu. Figure 4.28: Set the destination. Select Origin: Kuala Lumpur LCCT Set as value Select Bintulu as the destination
  • 50. 50 Next, we need to set the departure date for the flight. The date comes from Flight_In.dep_date. Put your cursor on the Departure Date field and right-click on it. Select Option and choose the date of departure. See Figure 4.29 and Figure 4.30. Figure 4.29: Select Option for date of departure. Figure 4.30: Select day of departure. Right-click on date and select option
  • 51. 51 The day must be inserted on its own. To do that, we need to convert the full date format and extract only the day from it. See Figure 4.31. Figure 4.31: Select Converters Figure 4.32: Get attribute Select converters Click Configure to configure Get Attribute. Configure it as shown in Figure 4.33
  • 52. 52 Figure 4.33: Set attribute as Flight_In.dep_date. However, no value will appear there until you set the values of all the Flight_In objects yourself. The values are as below (see Figure 4.34): Flight_In.Origin = Kuala Lumpur LCCT Flight_In.Destination = Bintulu Flight_In.dep_date = 2009-08-01 00:00:00.0 Flight_In.ret_date = 2009-08-05 00:00:00.0
  • 53. 53 Figure 4.34: Setting attributes After that, click on "Configure" to set the day for Flight_In.dep_date, as shown in Figure 4.35. Figure 4.35: Set attribute as Flight_In.dep_date Fill the attributes with the information shown. After filling all attributes, click Apply Attribute: Flight_In.dep_date
  • 54. 54 Click to add a converter, select Date Handling, and choose Format Date as shown in Figure 4.36. Figure 4.36: Formatting date. Configure Format Date as shown in Figure 4.37. Click on Format Date
  • 55. 55 Figure 4.37: Format pattern. Do the same steps for the month and year of the departure date; however, in this step choose "Aug 2009" ("200908") as the option to select and set it as the value. See Figure 4.38. Figure 4.38: Date format Repeat the steps in Figures 4.30 through 4.38 to set the day, month and year for Flight_In.ret_date. In this case use 05 August 2009 as the return date. Change Format Pattern to "dd"
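The Format Date converter is driven by a format pattern. Assuming Kapow's patterns follow the familiar Java SimpleDateFormat conventions (an assumption suggested by the "dd" pattern and the "200908" month value above), the minimal sketch below shows how such patterns split a full date into the day value and the month option value that the Airasia form expects.

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class DatePatternDemo {
        public static void main(String[] args) throws Exception {
            // The departure date as stored in Flight_In.dep_date.
            Date depDate = new SimpleDateFormat("yyyy-MM-dd").parse("2009-08-01");

            // Pattern "dd" keeps only the day, as configured in Figure 4.35.
            System.out.println(new SimpleDateFormat("dd").format(depDate));     // prints 01

            // Pattern "yyyyMM" yields the month option value, e.g. "200908"
            // for "Aug 2009" as selected in Figure 4.38.
            System.out.println(new SimpleDateFormat("yyyyMM").format(depDate)); // prints 200908
        }
    }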
  • 56. 56 Next, put your cursor on the "Search" button as shown in Figure 4.39. Right-click on it and choose "Click". Figure 4.39: Choose Click to search for flights. After clicking the Search button, a screen with the search results will appear. Click on the table of flight information and expand it to create loops. Choose "Click"
  • 57. 57 Figure 4.40: Creating loops Figure 4.41: First tag finder The next step is extracting information for the Flight_Out object. Step 1: Click on the table Step 2: Expand the green-line square to get loops Step 3: Right-click inside the green square, choose Loops and select For Each Tag Step 1: Click back to loops Step 2: Replace 0 with 1 Step 3: Click here
  • 58. 58 Figure 4.42: Extracting to Flight_Out.origin Configure the extraction using Advanced Extract, as seen in Figure 4.43. Figure 4.43: Configure extraction by using Advanced Extract. Step 1: Click on Kuala Lumpur and expand it. Right-click on the words. Select Extraction, Select Text and choose Flight_Out.Origin Select Advanced Extract
  • 59. 59 Figure 4.44: Pattern, Output Expression The next step is to extract the departure date; see the steps in Figure 4.45. Figure 4.45: Steps to extract the date of departure to Flight_Out.dep_date. Click Configure to configure the extraction Use this pattern: .*to(.*) Output Expression: $1 Step 1: Right-click on the hour Step 2: Choose Extraction => Extract Date => Flight_Out.dep_date
  • 60. 60 The next step is to configure the date format. See Figure 4.46. Figure 4.46: Date Format Do the same steps to extract the arrival date and save it to Flight_Out.arr_date. See the steps in Figure 4.47. Format Pattern: hhmm
  • 61. 61 Figure 4.47: Extracting arrival date Figure 4.48: Set the Format Pattern of the date as hhmm as well. Step 1: Right-click on arrival hour Step 2: Choose Extraction => Extract date => Flight_Out.arr_date.
  • 62. 62 As we can see in the browser, the Departure table contains all of this information (Figure 4.49). We need to extract Depart (0905) to Flight_Out.dep_date and Arrive (1100) to Flight_Out.arr_date, which we have already done in the previous steps. Now we need to extract Flight (AK 5146) to Flight_Out.flight_no, Fare (156.00) to Flight_Out.price and Currency (MYR) to Flight_Out.currency. Figure 4.49: Depart table Extracting the flight number, see Figure 4.50. Figure 4.50: Extracting the flight number.
  • 63. 63 Extracting price, see Figure 4.51. Figure 4.51: Extracting price to Flight_Out.price. Extracting the currency to Flight_Out.currency. See Figure 4.52.
  • 64. 64 Figure 4.52: Extracting currency. For the currency, use Advanced Extract. See Figure 4.53. Figure 4.53: Format for Advanced Extract. Step 1: Click here Add Advanced Extract Step 2: Click here to configure Step 3: Set pattern as .* (.*) Step 4: Set Output Expression as $1
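The pattern and output expression used by the Advanced Extract step behave like ordinary regular-expression group extraction, where $1 refers to the first capture group. A minimal Java equivalent of the currency extraction above, assuming the fare cell reads "156.00 MYR" (a sample input, as the real page text may differ), would be:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AdvancedExtractDemo {
        public static void main(String[] args) {
            // Assumed sample cell text; the real page layout may differ.
            String fareCell = "156.00 MYR";

            // Pattern ".* (.*)" with output expression $1 keeps the last token,
            // here the currency code, mirroring the Advanced Extract step above.
            Matcher m = Pattern.compile(".* (.*)").matcher(fareCell);
            if (m.matches()) {
                System.out.println(m.group(1)); // prints MYR
            }
        }
    }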
  • 65. 65 At the end of the robot, we must add a step to return the extracted object. See Figure 4.54. Figure 4.54: Returning object. For the Return table, we need to extract the same information as we did for Depart; apply all the steps used for the Depart table. Figure 4.55: Return table. Create a branch and loops for the Return table. See Figure 4.56 and Figure 4.57. Choose Return Object for Flight_Out
  • 66. 66 Figure 4.56: Creating branch. Figure 4.57: Creating loops for Return table. Step 1: Click here Step 2: Click here to create a branch A new branch will appear. Step 1: Click any area inside the table Step 2: Expand the green square to cover the table for the loop Step 3: Right-click on the green box area. Step 4: Choose For Each Tag loops.
  • 67. 67 After these steps, follow the steps we applied for the Depart table. Last but not least, the robot must be debugged. Click on the debug icon to debug the robot. See Figure 4.58. Figure 4.58: Debugging The debugging screen will appear after you click the debug icon. See Figure 4.59 for how to run the debugger. Figure 4.59: Debug screen Click here to debug. Click here to run the debugging. Information collected by your robot.
  • 68. 68 4.4.3 IntelliJ Setting For the prototype, some settings are needed. The general settings are as shown in the picture. Figure 4.60: Path of the project compiler From the picture, the path of the project compiler output needs to be set. This path is used to store all project compilation results. A directory corresponding to each module is created under this path. This directory contains two subdirectories, Production and Test, for production code and test sources, respectively. A module-specific compiler output path can be configured for each of the modules as required. In this case, the directory is called "workspace" with subdirectories "hotel" and "out". The path is C:\workspace\hotel\out.
  • 69. 69 Figure 4.61: Setting classes The classes for the project also need to be set. Attach the classes to C:\Program Files\Kapow Mashup Server 6.4\API\robosuite-java-api\lib\robosuite-api.jar. This links IntelliJ IDEA to the Kapow RoboSuite API.
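Once robosuite-api.jar is on the project classpath, the portal code can execute a robot and read back the extracted objects. The sketch below is only a rough illustration of that idea, reusing the HotelQuery/HotelResult sketches from Section 4.2.2; the names RobotRunner, run and "hotel.robot" are hypothetical placeholders, not the real RoboSuite API, which should be taken from the Kapow documentation and from the code that RoboMaker generates.

    // Hypothetical sketch: RobotRunner and "hotel.robot" are placeholders,
    // not actual RoboSuite API names.
    public class HotelSearchExample {
        public static void main(String[] args) {
            HotelQuery query = new HotelQuery(); // input object from the model
            query.country = "Malaysia";
            query.city = "Kuala Lumpur";
            // ... set query.checkin and query.checkout ...

            // Placeholder for the real API call that sends HotelQuery to the
            // robot on the Mashup Server and collects HotelResult objects:
            // for (HotelResult r : RobotRunner.run("hotel.robot", query)) {
            //     System.out.println(r.hotelName + " " + r.ratePerNight);
            // }
        }
    }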
  • 70. 70 CHAPTER 5: FUTURE ENHANCEMENT AND CONCLUSION 5.1: Future Enhancement This thesis has introduced a way to collaborate and aggregate information from several online booking websites, such as those for hotels, airlines and tickets. The emphasis is on the technique for aggregation with Kapow Mashup. The prototype developed uses only one website, the hotel website. In the future, more websites could be added. More research could also be done on how to make comparisons with the data collected by the robot and on applying it for data mining purposes. The area of Web services aggregation is seeing a large amount of activity, as aggregation mechanisms are still evolving: some are being extended and new ones are being created to enhance their capabilities. As multiple proposals emerge for aggregating Web services, it is important to understand where the mechanisms needed fit in and how they relate to existing approaches. Ongoing work will reflect the effects of the evolution of core specifications, including WSDL, as well as the growth and adoption of Web services aggregation techniques. Refining and expanding the classification will consider both adding categories and adding dimensions to existing categories, such as the level and focus of constraints. We are also interested in identifying primitive aggregation mechanisms and understanding the conditions under which they may or may not be combined. The World Wide Web contains an immense amount of information, and is thus nowadays often thought of as a huge database. However, as with relational databases, a database management system (DBMS) is needed to combine data from different sources and give the information a new meaning. In the sections above, API-driven mashup building was introduced as a way of mixing data from different Web sources, just like combining data from different tables in a relational database; this provides a way of managing the information stored in the database we call the World Wide Web. Building mashups using APIs requires strong programming skills, though, so it is of little use to a regular person who wants to mix data sources from all over the web. Another point is that most information on the Web is not accessible over an API, so only a small part of the WWW is remixable. In [16], the vision of making data gathering for mashups easier in the future is stated.
  • 71. 71 5.2 Conclusion Every day, information keeps being added to websites throughout the world, as long as there is access to the World Wide Web. People can rely on the Internet whenever they need information: with just one click, they can have the information they want. The massive amount of information and data on the Internet needs to be exploited and turned into useful information. If we regard a website as a database consisting of tables, then using website aggregation tools we can query data from the website. This thesis described how the mashup technique can be used to solve specific service issues for end users. In relation to this issue, a mashup technique is proposed using a tool called Kapow Mashup Server. The relevant technologies that can be used for mashups in different service layers are also described. This type of architecture can leverage and integrate end-user-relevant information from existing web applications on the Web.
  • 72. 72 REFERENCES 1. Mustafa Jarrar, Marios D. Dikaiakos: A Data Mashup Language for the Data Web 2. Bizer C, Heath T, Berners-Lee T: Linked Data: Principles and State of the Art. WWW (2008) 3. Ainie Zeinaida, Nor Adnan Yahaya: Design and Implementation of an Aggregation-based Tourism Web Information System 4. Marja-Riitta Koivunen: Annotea and Semantic Web Supported Collaboration 5. Lidia Rovan: Realizing Semantic Web Portal Using Available Semantic Web Technologies and Tools 6. Stuart Madnick, Michael Siegel: Seizing the Opportunity: Exploiting Web Aggregation 7. http://queue.acm.org/detail.cfm?id=1017013 8. http://www.langpop.com/. Retrieved 2009-01-16. 9. http://www.thirdnature.net/about_us.html 10. F. Curbera, M. Duftler, R. Khalaf, N. Mukhi, W. Nagy, and S. Weerawarana. BPWS4J. Published online by IBM at http://www.alphaworks.ibm.com/tech/bpws4j, Aug 2002. 11. Francisco Curbera, Matthew Duftler, Rania Khalaf, William Nagy, Nirmal Mukhi, and Sanjiva Weerawarana. Unraveling the web services web: An introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6(2):86–93, January 2002. 12. Francisco Curbera, Rania Khalaf, Frank Leymann, and Sanjiva Weerawarana. Exception handling in the BPEL4WS language. In International Conference on Business Process Management (BPM 2003), LNCS, Eindhoven, the Netherlands, June 2003. Springer. 13. Francisco Curbera, Rania Khalaf, Nirmal Mukhi, Stefan Tai, and S. Weerawarana. Web services, the next step: Robust service composition. Communications of the ACM: Service Oriented Computing, October 2003.
  • 73. 73 14. Francisco Curbera, Sanjiva Weerawarana, and Matthew J. Duftler. On component composition languages. In Proc. International Workshop on Component-Oriented Programming, May 2000. 15. Eric M. Dashofy, Nenad Medvidovic, and Richard N. Taylor. Using off-the-shelf middleware to implement connectors in distributed software architectures. In Proc. of the International Conference on Software Engineering, pages 3–12, Los Angeles, California, USA, May 1999. 16. Iskold, A. Yahoo! Pipes and the Web as Database. Available at http://www.readwriteweb.com/archives/yahoopipesweb-database.php. (Accessed on 01/01/2010) 17. Shah J Miah and John Gammack. A Mashup Architecture for Web End-user Application Designs. Institute for Integrated and Intelligent Systems, Griffith University, Nathan Campus, QLD 4111, Australia 18. Christian Bizer, Richard Cyganiak, and Tobias Gauß. The RDF Book Mashup: From Web APIs to a Web of Data. Freie Universität Berlin 19. http://kapowtech.com/index.php/about-us/overview
  • 74. 74 GLOSSARY World Wide Web (WWW) The World Wide Web, abbreviated as WWW and commonly known as the Web, is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia and navigate between them by using hyperlinks. Data Web Data Web refers to the transformation of the Web from a distributed file system into a distributed database system. Web 1.0 Web 1.0 (1991-2003) is a retronym that refers to the state of the World Wide Web, and any website design style used before the advent of the Web 2.0 phenomenon. Web 1.0 began with the release of the WWW to the public in 1991, and is the general term that has been created to describe the Web before the "bursting of the Dot-com bubble" in 2001. Since 2004, Web 2.0 has been the term used to describe the current web design, business models and branding methods of sites on the World Wide Web. Web 2.0 The term Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web. A Web 2.0 site gives its users the free choice to interact or collaborate with each other in a social media dialogue as creators (prosumers) of user-generated content in a virtual community, in contrast to websites where users (consumers) are limited to the passive viewing of content that was created for them. Examples of Web 2.0 include social-networking sites, blogs, wikis, video-sharing sites, hosted services, web applications, mashups and folksonomies. APIs An application programming interface (API) is an interface
  • 75. 75 implemented by a software program that enables it to interact with other software. SOA Service Oriented Architecture, an architectural style in which application functionality is provided as a set of interoperable services. Annotea In metadata, Annotea is an RDF standard sponsored by the W3C to enhance document-based collaboration via shared document metadata based on tags, bookmarks, and other annotations. Semantic Web Semantic Web is a group of methods and technologies to allow machines to understand the meaning - or "semantics" - of information on the World Wide Web. RSS RSS (most commonly expanded as Really Simple Syndication) is a family of web feed formats used to publish frequently updated works (such as blog entries, news headlines, audio, and video) in a standardized format. ATOM The name Atom applies to a pair of related standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (AtomPub or APP) is a simple HTTP-based protocol for creating and updating web resources. REST Representational State Transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web. Java servlets A servlet is a Java class in Java EE that conforms to the Java Servlet API, a protocol by which a Java class may respond to HTTP requests. CGI Common Gateway Interface, a protocol for calling external software via a web server to deliver dynamic content (and .cgi, its associated file extension). PHP PHP: Hypertext Preprocessor is a widely used, general-purpose scripting language that was originally designed for web development to produce dynamic web pages. ASP Active Server Pages, a web-scripting interface by Microsoft.
  • 76. 76 JavaScript JavaScript is an implementation of the ECMAScript language standard and is typically used to enable programmatic access to computational objects within a host environment. XML Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards. XHTML XHTML (eXtensible Hypertext Markup Language) is a family of XML markup languages that mirror or extend versions of the widely used Hypertext Markup Language (HTML), the language in which web pages are written. CSS Cascading Style Sheets (CSS) is a style sheet language used to describe the presentation semantics (the look and formatting) of a document written in a markup language. DOM Document Object Model, a way to refer to XML or (X)HTML elements as objects SOAP SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. WSDL The Web Services Description Language (WSDL, pronounced 'wiz-del') is an XML-based language that provides a model for describing Web services. XMLTV XMLTV is an XML based file format for describing TV listings. IPTV providers use XMLTV as the base reference template in their systems, and extend it internally according to their business needs. RDF Resource Description Framework, an official World Wide Web Consortium (W3C) Semantic Web specification for metadata models FOAF Friend of a friend (FOAF) is a phrase used to refer to someone that one does not know well, literally, a friend of a friend. IETF The Internet Engineering Task Force (IETF) develops and promotes
  • 77. 77 Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite. ActiveX ActiveX is a framework for defining reusable software components in a programming language independent way. Software applications can then be composed from one or more of these components in order to provide their functionality. WOA Web Oriented Architecture, a computer systems architectural style. IDE An integrated development environment (IDE) also known as integrated design environment or integrated debugging environment is a software application that provides comprehensive facilities to computer programmers for software development. DBMS A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and the use of a database.