Web Architectures - Lecture 02 - Web Information Systems (4011474FNR)
1. 2 December 2005
Web Information Systems
Web Architectures
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. October 3, 2014 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2
Web Information Systems
A web information system uses web technologies
for information and service delivery
Modern web information systems and web architectures
have to
be extensible to cater for emerging technologies and new forms of
interaction (e.g. multimodal interaction)
manage heterogeneous information such as documents, structured
data, multimedia resources, semi-structured information, ...
integrate various sources (e.g. DBs) via multi-tier architectures
offer a notion of state to reflect the current application context
deal with information about users and their environment (context)
...
Basic Client-Server Web Architecture
Effect of typing http://www.vub.ac.be in the browser's address bar
(1) use the Domain Name System (DNS) to get the IP address for
www.vub.ac.be (answer 134.184.129.2)
(2) create a TCP connection to 134.184.129.2
(3) send an HTTP request message over the TCP connection
(4) visualise the received HTTP response message in the browser
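The four steps above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions: the class name BasicHttpClientSketch and its methods are hypothetical, the request asks the server to close the connection so that the response can be read until end of stream, and a real browser does far more (redirects, caching, rendering).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the lecture material.
final class BasicHttpClientSketch {

    // Step (3): build a minimal HTTP/1.1 GET request; every line ends with CRLF
    // and an empty line terminates the header section.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Steps (1) to (4) for a single request; returns the raw response message.
    static String fetch(String host, String path) throws IOException {
        // Step (1): resolve the host name to an IP address via DNS.
        InetAddress address = InetAddress.getByName(host);
        // Step (2): open a TCP connection to the default HTTP port 80.
        try (Socket socket = new Socket(address, 80)) {
            // Step (3): send the HTTP request message over the connection.
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Step (4): read the response until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Without arguments only the request is printed (no network access).
        System.out.print(buildRequest("www.vub.ac.be", "/"));
        if (args.length == 1) {
            System.out.println(fetch(args[0], "/"));
        }
    }
}
```

Passing a host name as the single command line argument performs the actual lookup and request; without arguments the sketch stays offline.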
[Figure: client and server connected via the Internet; the client sends an HTTP request and the server returns an HTTP response]
Web Server
Tasks of a web server
(1) set up the connection
(2) receive and process the HTTP request
(3) fetch the resource
(4) create and send the HTTP response
(5) log the request
The most prominent web servers are the Apache HTTP
Server and Microsoft's Internet Information Services (IIS)
A lot of devices have an embedded web server
printers, WLAN routers, TVs, ...
Worldwide Web Servers, http://news.netcraft.com
Example HTTP Request Message
GET / HTTP/1.1
Host: www.vub.ac.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Example HTTP Response Message
HTTP/1.1 200 OK
Date: Thu, 03 Oct 2013 17:02:19 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.15
Content-Language: nl
Set-Cookie: lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT
Content-Type: text/html; charset=utf-8
Keep-Alive: timeout=15, max=987
Connection: Keep-Alive
Transfer-Encoding: chunked

<!DOCTYPE html>
<html lang="nl" dir="ltr">
<head>
...
<title>Vrije Universiteit Brussel | Redelijk eigenzinnig</title>
<meta name="Description" content="Welkom aan de VUB" />
...
</html>
HTTP Protocol
Request/response communication model
Communication always has to be initiated by the client
Stateless protocol
HTTP can be used on top of various reliable protocols
TCP is by far the most commonly used one
runs on TCP port 80 by default
Latest version: HTTP/1.1
HTTPS scheme used for encrypted connections
Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) uniquely
identifies a resource
There are two types of URIs
Uniform Resource Locator (URL)
- contains information about the exact location of a resource
- consists of a scheme, a host and the path (resource name)
- e.g. http://wise.vub.ac.be/beat-signer/
- problem: the URL changes if resource is moved!
• idea of Persistent Uniform Resource Locators (PURLs) [https://purl.oclc.org]
Uniform Resource Name (URN)
- unique and location independent name for a resource
- consists of a scheme name, a namespace identifier and a namespace-specific
string (separated by colons)
- e.g. urn:ISBN:3837027139
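The components of the two example identifiers above can be inspected with the standard java.net.URI class. The following is a small sketch; the class name UriPartsSketch and the output format are made up for illustration.

```java
import java.net.URI;

// Hypothetical helper class, not part of the lecture material.
final class UriPartsSketch {

    // Lists the components that java.net.URI extracts from a URL or a URN.
    static String describe(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // A URN has no host or path; the part after "urn:" consists of the
            // namespace identifier and the namespace-specific string.
            String rest = uri.getSchemeSpecificPart();
            int colon = rest.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("not a URN: " + text);
            }
            return "scheme=" + uri.getScheme()
                    + " namespace=" + rest.substring(0, colon)
                    + " name=" + rest.substring(colon + 1);
        }
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://wise.vub.ac.be/beat-signer/"));
        System.out.println(describe("urn:ISBN:3837027139"));
    }
}
```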
HTTP Message Format
Request and response messages have the same format
start line
  HTTP/1.1 200 OK
header field(s)
  Date: Thu, 03 Oct 2013 17:02:19 GMT
  Server: Apache/2.2.14 (Ubuntu)
  X-Powered-By: PHP/5.3.2-1ubuntu4.15
  Transfer-Encoding: chunked
  Content-Type: text/html
blank line (CRLF)
message body (optional)
  <html>
  ...
  </html>
HTTP_message = start_line , {header} , "CRLF" , {body};
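As an illustration of this message layout, the following sketch splits a raw message into start line, header fields and body. It is a simplification under stated assumptions: the class name HttpMessageSketch is hypothetical, repeated headers overwrite each other, and body encodings such as chunked transfer are not decoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the lecture material.
final class HttpMessageSketch {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpMessageSketch(String raw) {
        // The first blank line (CRLF CRLF) separates the head from the optional body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        // Every further line of the head is a "name: value" header field.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        HttpMessageSketch message = new HttpMessageSketch(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>");
        System.out.println(message.startLine);
        System.out.println(message.headers);
        System.out.println(message.body);
    }
}
```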
HTTP Request Message
Request-specific start line
Methods
GET : get a resource from the server
HEAD : get the header only (no body)
POST : send data (in the body) to the server
PUT : store request body on server
TRACE : get the "final" request (after it has potentially been modified by proxies)
OPTIONS : get a list of methods supported by the server
DELETE : delete a resource on the server
start_line = method , " " , resource , " " , version;
method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" |
"OPTIONS" | "DELETE";
resource = complete_URL | path;
version = "HTTP/" , major_version, "." , minor_version;
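A request start line can be checked against this grammar with a few string operations. The sketch below is only an approximation: the class name RequestLineSketch is hypothetical, only the seven methods listed above are accepted, and the resource part is not validated any further.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the lecture material.
final class RequestLineSketch {
    static final List<String> METHODS =
            Arrays.asList("GET", "HEAD", "POST", "PUT", "TRACE", "OPTIONS", "DELETE");

    // Splits "method SP resource SP version" and checks method and version.
    static String[] parse(String startLine) {
        String[] parts = startLine.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected method, resource and version");
        }
        if (!METHODS.contains(parts[0])) {
            throw new IllegalArgumentException("unknown method: " + parts[0]);
        }
        if (!parts[2].matches("HTTP/\\d+\\.\\d+")) {
            throw new IllegalArgumentException("malformed version: " + parts[2]);
        }
        return parts;
    }

    // Convenience wrapper that reports validity instead of throwing.
    static boolean isValid(String startLine) {
        try {
            parse(startLine);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("GET /beat-signer HTTP/1.1");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```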
HTTP Response Message
Response-specific start line
Status codes
100-199 : informational
200-299 : success (e.g. 200 for 'OK')
300-399 : redirection
400-499 : client error (e.g. 404 for 'Not Found')
500-599 : server error (e.g. 503 for 'Service Unavailable')
start_line = version , " " , status_code , " " , reason;
version = "HTTP/" , major_version, "." , minor_version;
status_code = digit , digit , digit;
reason = string_phrase;
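The status code ranges listed above map directly to the first digit of the code. A short sketch of this mapping follows, assuming a start line such as "HTTP/1.1 200 OK" in which the parts are separated by single spaces; the class name StatusLineSketch is hypothetical.

```java
// Hypothetical helper class, not part of the lecture material.
final class StatusLineSketch {

    // Extracts the three-digit status code from a response start line.
    static int statusCode(String startLine) {
        // Limit 3 keeps a reason phrase such as "Not Found" in one piece.
        String[] parts = startLine.split(" ", 3);
        if (parts.length < 2 || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("malformed start line: " + startLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Maps a status code to its class based on the first digit.
    static String category(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        int code = statusCode("HTTP/1.1 404 Not Found");
        System.out.println(code + " is a " + category(code));
    }
}
```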
HTTP Header Fields
There exist general headers (for requests and
responses), request headers, response headers, entity
headers and extension headers
Some important headers
Accept
- request header defining the MIME (Multipurpose Internet Mail Extensions)
types that the client will accept
User-Agent
- request header specifying the type of client
Connection: Keep-Alive (HTTP/1.0) and persistent connections (default in HTTP/1.1)
- general header helping to improve the performance since otherwise a new
TCP connection has to be established for every single webpage element
Content-Type
- entity header specifying the body's MIME type
HTTP Header Fields ...
Some important headers ...
If-Modified-Since
- request header that is used in combination with a GET request (conditional
GET); the resource is only returned if it has been modified since the specified
date
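The server-side decision for a conditional GET can be sketched as follows. If the resource has not been modified since the given date, HTTP/1.1 servers answer with 304 (Not Modified) and no body, otherwise with 200 and the resource. The class name ConditionalGetSketch is hypothetical and the sketch assumes that the header value uses the RFC 1123 date format shown in the earlier examples.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, not part of the lecture material.
final class ConditionalGetSketch {

    // Returns the status code a server would send for a (conditional) GET.
    static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return 200; // plain GET without a condition
        }
        ZonedDateTime since =
                ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        // Only a resource changed after the given date is sent again.
        return lastModified.isAfter(since) ? 200 : 304;
    }

    public static void main(String[] args) {
        ZonedDateTime lastModified =
                ZonedDateTime.of(2013, 10, 1, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(statusFor("Thu, 03 Oct 2013 17:02:19 GMT", lastModified));
    }
}
```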
MIME Types
The MIME type defines the request or response body's
content and is used for the appropriate processing
Standard MIME types are registered with the Internet
Assigned Numbers Authority (IANA) [RFC-2045]
mime = toplevel_type , "/" , subtype;
MIME Type     Description
text/plain    Human-readable text without formatting information
text/html     HTML document
image/jpeg    JPEG-encoded image
...           ...
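A header value such as "text/html; charset=utf-8" can be split into top-level type and subtype as in the following sketch. The class name MimeTypeSketch is hypothetical, and optional parameters after the semicolon are simply dropped rather than parsed.

```java
import java.util.Locale;

// Hypothetical helper class, not part of the lecture material.
final class MimeTypeSketch {
    final String topLevelType;
    final String subtype;

    MimeTypeSketch(String value) {
        // Drop optional parameters such as "; charset=utf-8".
        int semicolon = value.indexOf(';');
        String type = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
        int slash = type.indexOf('/');
        if (slash <= 0 || slash == type.length() - 1) {
            throw new IllegalArgumentException("expected toplevel_type/subtype: " + value);
        }
        // Type and subtype names are case-insensitive, so normalise them.
        topLevelType = type.substring(0, slash).toLowerCase(Locale.ROOT);
        subtype = type.substring(slash + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        MimeTypeSketch mime = new MimeTypeSketch("text/html; charset=utf-8");
        System.out.println(mime.topLevelType + " / " + mime.subtype);
    }
}
```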
HTTP Message Information
Various tools for HTTP message logging
e.g. HttpFox add-on for Firefox browser
Simple telnet connection
telnet wise.vub.ac.be 80 (press Enter)
GET /beat-signer HTTP/1.1 (press Enter)
Host: wise.vub.ac.be (press Enter 2 times)
Until 1999 the W3C worked on the HTTP Next
Generation (HTTP-NG) protocol as a replacement for
HTTP/1.1
never introduced
recently some work on HTTP/2.0
Proxies
A web proxy is situated between the client and the server
acts as a server to the client and as a client to the server
can for example be specified in the browser settings; used for
- firewalls and content filters
- transcoding (on the fly transformation of HTTP message body)
- content router (e.g. select optimal server in content distribution networks)
- anonymous browsing, ...
[Figure: proxy situated between the client and the server on the Internet]
Caches
A proxy cache is a special type of proxy server
can reduce server load if multiple clients share the same cache
often multi-level hierarchies of caches (e.g. continent, country
and regional level) with communication between sibling and
parent caches as defined by the Internet Cache Protocol (ICP)
passive or active (prefetching) caches
[Figure: Client 1 and Client 2 accessing the server via a shared proxy cache on the Internet]
Caches ...
Special HTTP cache control header fields
Expires
- expiration date after which the cached resource has to be refetched
Cache-Control: max-age
- maximum age of a document (in seconds) after it has been added to the cache
Cache-Control: no-cache
- response cannot be directly served from the cache (has to be revalidated first)
...
Validators
Last-modified time as validator
- cache with resource that has been last modified at time t uses an
If-Modified-Since t request for updates
Entity tags (ETag)
- changed by the publisher if content has changed; If-None-Match etag request
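The interplay of max-age and entity tags can be sketched as follows: a cache serves an entry while it is fresh and otherwise revalidates it with an If-None-Match request, which the origin server answers with 304 (Not Modified) if the entity tag still matches. The class name CacheValidationSketch and the use of plain seconds for timestamps are assumptions made for this illustration.

```java
// Hypothetical helper class, not part of the lecture material.
final class CacheValidationSketch {
    final String etag;           // validator received with the cached response
    final long storedAtSeconds;  // time at which the response was cached
    final long maxAgeSeconds;    // value of Cache-Control: max-age

    CacheValidationSketch(String etag, long storedAtSeconds, long maxAgeSeconds) {
        this.etag = etag;
        this.storedAtSeconds = storedAtSeconds;
        this.maxAgeSeconds = maxAgeSeconds;
    }

    // A cached entry is fresh while its age is below max-age.
    boolean isFresh(long nowSeconds) {
        return nowSeconds - storedAtSeconds < maxAgeSeconds;
    }

    // Header that the cache adds when it revalidates a stale entry.
    String revalidationHeader() {
        return "If-None-Match: " + etag;
    }

    // Origin server side: 304 if the entity tag still matches, else 200.
    static int originStatus(String ifNoneMatch, String currentEtag) {
        return currentEtag.equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        CacheValidationSketch entry = new CacheValidationSketch("\"v1\"", 1000L, 60L);
        System.out.println(entry.isFresh(1030L));
        System.out.println(entry.revalidationHeader());
    }
}
```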
Caches ...
Advantages
reduces latency and network bandwidth usage
reduces server load (client and reverse proxy caches)
transparent to client and server
Disadvantages
additional resources (hardware) required
might get stale data out of the cache
creates additional network traffic if we use an active caching
approach (prefetching) but achieve a low cache hit rate
server loses control (e.g. access statistics) since not all
requests reach the server any longer
Tunnels
Implement one protocol on top of another protocol
e.g. HTTP as a carrier for SSL connections
Often used to "open" a firewall to protocols that would
otherwise be blocked
e.g. tunneling of SSL connections through an open HTTP port
[Figure: SSL connection between an SSL client and an SSL server that is carried over the Internet inside HTTP (HTTP[SSL])]
Gateways
A gateway can act as a kind of "glue" between
applications (client) and resources (server)
translate between two protocols (e.g. from HTTP to FTP)
security accelerator (e.g. HTTPS/HTTP on the server side)
often the gateway and destination server are combined in a single
application server (HTTP to server application translator)
[Figure: HTTP client talking HTTP to an HTTP/FTP gateway, which talks FTP to an FTP server]
Session Management
HTTP is a stateless protocol
Session (state) tracking solutions
use of IP address
- problem: IP address is often not uniquely assigned to a single user
browser login
- use of special HTTP authenticate headers
- after a login the browser sends the user information in each request
URL rewriting
- add information to the URL in each request
hidden form fields
- similar to URL rewriting but information can also be in body (POST request)
cookies
- the server stores a piece of information on the client which is then sent back to
the server with each request
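As an example of one of these solutions, URL rewriting can be sketched as follows. The query parameter name "sid" and the class name UrlRewritingSketch are hypothetical, and the sketch assumes session identifiers that need no URL encoding and URLs without a fragment part.

```java
// Hypothetical helper class, not part of the lecture material.
final class UrlRewritingSketch {

    // Appends the session identifier as a query parameter to a link.
    static String rewrite(String url, String sessionId) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "sid=" + sessionId;
    }

    // Extracts the session identifier from a requested URL (null if absent).
    static String sessionIdOf(String url) {
        int question = url.indexOf('?');
        if (question < 0) {
            return null;
        }
        for (String pair : url.substring(question + 1).split("&")) {
            if (pair.startsWith("sid=")) {
                return pair.substring("sid=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String link = rewrite("http://wise.vub.ac.be/beat-signer/", "abc123");
        System.out.println(link);
        System.out.println(sessionIdOf(link));
    }
}
```

Every link in a generated page has to be rewritten in this way, which is one reason why cookies are usually preferred.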
Cookies
Introduced by Netscape in June 1994
A cookie is a piece of information that is
assigned to a client on their first visit
list of <key,value> pairs
often just a unique identifier
sent via Set-Cookie or Set-Cookie2 HTTP response headers
Browser stores the information in a "cookie database" and
sends it back every time the same server is accessed
Potential privacy issues
third-party websites might use persistent cookies for user tracking
Cookies can be disabled in the browser settings
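The cookie exchange described above can be simulated without any network code. In the following sketch the server assigns a unique identifier on the first visit and recognises it when the browser sends it back in the Cookie request header. The cookie name "sid", the class name CookieSessionSketch and the visit counter are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper class, not part of the lecture material.
final class CookieSessionSketch {
    // Server-side state per session identifier (here: a visit counter).
    private final Map<String, Integer> visitsBySession = new HashMap<>();

    // Parses a Cookie request header value such as "lang=nl; sid=abc".
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new HashMap<>();
        if (header == null) {
            return cookies;
        }
        for (String pair : header.split(";")) {
            int equals = pair.indexOf('=');
            if (equals > 0) {
                cookies.put(pair.substring(0, equals).trim(),
                        pair.substring(equals + 1).trim());
            }
        }
        return cookies;
    }

    // Returns a Set-Cookie response header on the first visit and null if
    // the client already presented a known session identifier.
    String handle(String cookieHeader) {
        String id = parseCookieHeader(cookieHeader).get("sid");
        if (id != null && visitsBySession.containsKey(id)) {
            visitsBySession.merge(id, 1, Integer::sum);
            return null;
        }
        String fresh = UUID.randomUUID().toString();
        visitsBySession.put(fresh, 1);
        return "Set-Cookie: sid=" + fresh + "; Path=/";
    }

    int visits(String id) {
        return visitsBySession.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        CookieSessionSketch server = new CookieSessionSketch();
        System.out.println(server.handle(null)); // first visit: cookie is assigned
    }
}
```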
Hypertext Markup Language (HTML)
Dominant markup language for webpages
If you have never heard of HTML, have a look at
http://www.w3schools.com/html/
More details in the exercise and in the next lecture
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Beat Signer: Interactive Paper, PaperWorks, Paper++, ...</title>
</head>
<body>
Beat Signer is Associate Professor of Computer Science at the VUB ...
</body>
</html>
Dynamic Web Content
Often it is not enough to serve static web pages but
content should be changed on the client or server side
Server-side processing
Common Gateway Interface (CGI)
Java Servlets
JavaServer Pages (JSP)
PHP: Hypertext Preprocessor (PHP)
...
Client-side processing
JavaScript
Java Applets
Adobe Flash
...
Common Gateway Interface (CGI)
CGI was the first server-side processing solution
transparent to the user
certain requests (e.g. /account.pl) are forwarded via CGI to a
program by creating a new process
program processes the request and creates an answer with
optional HTTP response headers
[Figure: the client sends an HTTP request over the Internet to the server, which either serves HTML pages or forwards the request via CGI to a program written in Perl, Tcl, C, C++, Java, ...]
Common Gateway Interface (CGI) ...
CGI Problems
a new process has to be started for each request
if the CGI program for example acts as a gateway to a database,
a new DB connection has to be established for each request
which results in very poor performance
FastCGI solves some of the problems by introducing
persistent processes and process pools
CGI/FastCGI is increasingly being replaced by other
technologies (e.g. Java Servlets)
Java Servlets
A Java servlet is a Java class that has to extend the
abstract HttpServlet class
The Java servlet class is loaded by a servlet container
and relevant requests (based on a servlet binding) are
forwarded to the servlet instance for further processing
[Figure: the client sends an HTTP request over the Internet to the server, which either serves HTML pages or forwards the request to the servlet container hosting the servlets]
Java Servlets ...
Main HttpServlet methods
doGet(HttpServletRequest req, HttpServletResponse resp)
doPost(HttpServletRequest req, HttpServletResponse resp)
init(ServletConfig config)
destroy()
Servlet life cycle
a servlet is initialised once via the init() method
the doGet(), doPost() methods may be executed multiple
times (by different HTTP requests)
finally the servlet container may unload a servlet (upcall of the
destroy() method before that happens)
Servlet container (e.g. Apache Tomcat) either integrated
with web server or as standalone component
Java Servlet Example
In the exercise you will learn how to process parameters etc.
package org.vub.wise;

import java.io.*;
import java.util.Date;
import javax.servlet.http.*;
import javax.servlet.*;

public class HelloWorldServlet extends HttpServlet {
    // Called by the servlet container for every HTTP GET request
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        // Tell the client the MIME type of the response body
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.println("<html>");
        out.println("<head><title>Hello World</title></head>");
        out.println("<body>The time is " + new Date().toString() + "</body>");
        out.println("</html>");
        out.close();
    }
}
JavaServer Pages (JSP)
A "drawback" of Java servlets is that the whole page
(e.g. HTML) has to be defined within the servlet
not easy to share tasks between web designer and programmer
Add program code through scriptlets and markup to
existing HTML pages
These JSP documents are then translated into Java servlets, either on the
fly when they are first requested (e.g. by Apache Tomcat) or in a separate
precompilation step
The JSP approach is similar to PHP or Active Server
Pages (ASP)
Note that Java servlets are increasingly becoming an
enabling technology (as with JSP)
JavaScript
Interpreted scripting language for client-side processing
JavaScript functionality often embedded in HTML
documents but can also be provided in separate files
JavaScript often used to
validate data (e.g. in a form)
dynamically add content to a webpage
process events (onLoad, onFocus, etc.)
change parts of the original HTML document
create cookies
...
Note: Java and JavaScript are completely different
languages!
JavaScript Example
Please have a look at the following JavaScript tutorial to
learn some of the basic constructs (operators, control
statements, etc.)
http://www.w3schools.com/JS/
In the exercise session you will use JavaScript to
implement a web application
<html>
<body>
<script type="text/javascript">
document.write("<h1>Hello World!</h1>");
</script>
</body>
</html>
Java Applets
A Java applet is a program delivered to the client side in
the form of Java bytecode
executed in the browser using a Java Virtual Machine (JVM)
an applet has to extend the Applet or JApplet class
runs in a sandbox
Advantages
the user always gets the most recent version automatically
high security for untrusted applets
full Java API available
Disadvantages
requires a browser Java plug-in
Java Applets ...
Disadvantages ...
only signed applets can get more advanced functionality
- e.g. network connections to machines other than the source machine
More recently Java Web Start (JavaWS) is replacing
Java Applets
program no longer runs within the browser
- less problematic security restrictions
- fewer browser compatibility issues
Java Chess Applet Example
http://english.op.org/~peter/ChessApp/
Exercise 2
Hands-on experience with the HTTP protocol
References
David Gourley et al., HTTP: The Definitive
Guide, O'Reilly Media, September 2002
R. Fielding et al., RFC2616 - Hypertext Transfer
Protocol - HTTP/1.1
http://www.faqs.org/rfcs/rfc2616.html
N. Freed et al., RFC2045 - Multipurpose Internet Mail
Extensions (MIME)
http://www.faqs.org/rfcs/rfc2045.html
HTML and JavaScript Tutorials
http://www.w3schools.com
References ...
Mick Knutson, HTTP: The Hypertext Transfer
Protocol (refcardz #172)
http://refcardz.dzone.com/refcardz/http-hypertext-transfer-0
W. Jason Gilmore, PHP 5.4 (refcardz #23)
http://refcardz.dzone.com/refcardz/php-54-scalable
Java Servlet Tutorial
http://www.tutorialspoint.com/servlets/
Next Lecture
HTML5 and the Open Web Platform