2. WORLD WIDE WEB
The concept of the Web was first proposed by Tim Berners-Lee and was
commercially started in the early 1990s. At present, the web is a
repository of documents called web pages. Web pages are distributed
all over the world and related pages are linked together. Each web
page is a file with a name and address.
Web pages are linked together using hypertext, which takes the user from
one document to another when a link in the current document is followed.
Hypermedia is a similar concept, but it includes not only text but also
images, audio and video files in web page documents.
WWW is a distributed client-server service. The service is distributed
over multiple locations called sites. Each site can have one or many
3. WEB CONCEPTS
• Web Client: A web client, or browser, interprets and displays a web page. It has
three parts: a controller, client protocols and interpreters. Ex. Google Chrome,
Safari, etc.
• Web Server: Servers store web pages. Each time a request is sent to the server, it
sends back the corresponding web page.
• Web Documents: Web documents are of three types:
  • Static Document: Fixed-content documents that cannot be changed by the user.
  When a request is sent by the browser, only a copy of the document is sent back.
  It is created using markup languages such as HTML, XML, XHTML, etc.
  • Dynamic Document: Created by the server whenever a browser requests the
  document. Each request creates a fresh document using a pre-written script or
  program. These documents are created using server-side scripting technologies
  such as JSP, ASP, ColdFusion, etc.
  • Active Document: Upon request from a browser, a document or script is sent
  back to be run at the client site. Can be created using Java applets.
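The difference between a static and a dynamic document can be sketched in a few lines of Python. This is only an illustration: the names STATIC_PAGE, serve_static and serve_dynamic are hypothetical and are not part of any web server API.

```python
from datetime import datetime

# A static document: fixed content, every request gets an identical copy.
STATIC_PAGE = "<html><body>Welcome</body></html>"


def serve_static():
    """Return a copy of the stored document, unchanged."""
    return STATIC_PAGE


def serve_dynamic(now):
    """Build a fresh document for this request from a pre-written template."""
    return f"<html><body>Server time: {now.isoformat()}</body></html>"


if __name__ == "__main__":
    print(serve_static())
    print(serve_dynamic(datetime.now()))
```

Two calls to serve_static always return the same text, while serve_dynamic returns a document that depends on the moment of the request.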
4. UNIFORM RESOURCE LOCATOR (URL)
• Serves as a unique identifier for a web page to distinguish it from others.
• Four identifiers are required to define a web page, namely Protocol, Host, Port
and Path.
  • Protocol: Client-server program required to access the web page. Protocols
  generally used include HTTP, FTP, etc.
  • Host: IP address or unique name given to the server. The IP address can be in
  dotted-decimal notation and the name can be the unique domain name of the host.
  • Port: Predefined 16-bit integer for the client-server application. For example,
  the port number for HTTP is 80.
  • Path: Identifies the location and name of the file in the underlying operating
  system.
• URL is composed of the above four identifiers in the following format:
protocol://host:port/path (port is optional)
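As a small sketch of the format above, Python's standard urllib.parse module can split a URL into its four identifiers. The helper name split_url, the DEFAULT_PORTS table and the example.com address are assumptions made for illustration only.

```python
from urllib.parse import urlsplit

# Well-known ports used when the URL omits the optional port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_url(url):
    """Return (protocol, host, port, path) for a URL string."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


if __name__ == "__main__":
    print(split_url("http://www.example.com:8080/docs/index.html"))
    print(split_url("http://www.example.com/docs/index.html"))
```

When the port is left out of the URL, the sketch falls back to the well-known port of the protocol, which is 80 for HTTP.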
5. HYPERTEXT TRANSFER PROTOCOL (HTTP)
• HTTP is an application-layer protocol for transmitting hypermedia documents,
such as HTML.
• Under the HTTP protocol, the server uses port number 80 while the client uses a
temporary port number.
• HTTP uses TCP which is connection-oriented and reliable.
• When a client sends a request to the server, a connection must be established first
and then any transaction may take place. After the transaction, the connection is
terminated.
• Only one TCP connection is used to transfer data.
• Commands from the client to the server are embedded in the request message.
• HTTP is a stateless protocol, meaning that the server does not keep any data
(state) between two requests.
• The messages sent by the client, usually a Web browser, are called requests and
the messages sent by the server as an answer are called responses.
• When HTTP is run over SSL (now TLS), it is referred to as HTTPS and provides
security.
6. PERSISTENT VS NON-PERSISTENT CONNECTIONS
• To retrieve documents from a single server, two methods can be used-
non-persistent and persistent connections.
• Persistent connections are the default since version 1.1 of HTTP but
can be changed to non-persistent, which was the default prior to this
version.
• Non-persistent connections: A new TCP connection is created
for each document being retrieved from a server.
• Persistent connections: Only one TCP connection is opened
to a server and reused to retrieve documents. The connection can be
closed upon request from the client or on time-out.
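The two methods can be compared with a small local experiment. The sketch below uses only Python's standard library; the handler class, the helper functions and the paths /a, /b, /c are hypothetical names chosen for the illustration, and the server runs on the loopback address with a port picked by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes the standard-library server keep connections open.
    protocol_version = "HTTP/1.1"

    def setup(self):
        # setup() runs once per accepted TCP connection.
        super().setup()
        self.server.connection_count += 1

    def do_GET(self):
        body = f"document {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_persistent(host, port, paths):
    """Retrieve all documents over one TCP connection."""
    conn = http.client.HTTPConnection(host, port)
    bodies = []
    for path in paths:
        conn.request("GET", path)
        bodies.append(conn.getresponse().read().decode("ascii"))
    conn.close()
    return bodies


def fetch_non_persistent(host, port, paths):
    """Open and close a new TCP connection for every document."""
    bodies = []
    for path in paths:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path, headers={"Connection": "close"})
        bodies.append(conn.getresponse().read().decode("ascii"))
        conn.close()
    return bodies


def count_connections(fetch, paths):
    """Run fetch against a throwaway local server and count TCP connections."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    server.connection_count = 0
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        fetch(host, port, paths)
        return server.connection_count
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    paths = ["/a", "/b", "/c"]
    print("persistent:", count_connections(fetch_persistent, paths))
    print("non-persistent:", count_connections(fetch_non_persistent, paths))
```

For three documents, the persistent client should be counted as a single connection by the server, while the non-persistent client is counted once per document.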
7. MESSAGE FORMATS
• Each message is made of four sections.
• In a request message the first section is the request line and in a
response message it is the status line.
• The other three sections have the same names: Header lines,
Blank line and Entity body.
[Figure: Request message (Source: internet)]
[Figure: Response message (Source: internet)]
8. REQUEST MESSAGE
• Request line consists of three fields- method, URL and version.
• Methods define the request type. Several methods are defined in
HTTP 1.1 as shown below.
• Zero or more request header lines can be used to send additional
information from client to server. Each header contains a header
name, a colon, a space, and a header value.
• Entity body includes comments to be sent or a file to be published on
the website; it may or may not be present.
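The layout of a request message can be sketched by assembling one by hand. The function name build_request and the host www.example.com are assumptions for illustration; real clients normally let a library such as http.client build this text.

```python
def build_request(method, url, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line, header lines, blank line and entity body."""
    request_line = f"{method} {url} {version}\r\n"
    # Each header line is: name, colon, space, value.
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    blank_line = "\r\n"
    return (request_line + header_lines + blank_line).encode("ascii") + body


if __name__ == "__main__":
    message = build_request("GET", "/index.html", {"Host": "www.example.com"})
    print(message.decode("ascii"))
```

A GET request usually has an empty entity body, so the message ends right after the blank line.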
9. RESPONSE MESSAGE
• Status line is composed of three fields: version, status code and
status phrase.
• Status code is a 3-digit number which defines the
status of the request.
• The status phrase explains the status code in text
form.
• Zero or more response header lines can be used to
send additional information from server to client.
Each header contains a header name, a colon, a
space, and a header value.
• The body contains the document to be sent from
the server to the client. It is present unless the
response is an error message.
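The structure of a response message can be sketched by taking one apart. The function name parse_response and the sample message are made up for illustration; the sketch assumes a complete response held in memory and ignores details such as repeated headers.

```python
def parse_response(raw):
    """Split a raw response into status line fields, headers and body."""
    # The blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version, status code, status phrase.
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), phrase, headers, body


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/html\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"hello")
    print(parse_response(raw))
```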
10. COOKIES
• Cookies are pieces of data used to remember information about clients.
• When a server receives a request from a client, the server stores some
information about the client in a file or string. It includes the cookie in the
response sent to the client, where the cookie is stored by the client’s browser
in the cookie directory.
• When a client sends a request to the server, the browser looks up the
cookie directory to check if a cookie from this server is present. If yes,
the browser includes the cookie in the request and the server can then use
it.
• Only the server that creates a cookie can access its content.
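The exchange described above can be sketched with Python's standard http.cookies module. The cookie name session_id, the value abc123, the host name and the dictionary standing in for the browser's cookie directory are all assumptions for illustration.

```python
from http.cookies import SimpleCookie

# Stand-in for the browser's cookie directory: host -> cookie string.
cookie_directory = {}


def make_set_cookie_value(name, value):
    """Server side: build the value sent in a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def browser_store(host, set_cookie_value):
    """Browser side: remember the cookie under the server that sent it."""
    cookie_directory[host] = set_cookie_value


def browser_cookie_for(host):
    """Browser side: look up the cookie to include in the next request."""
    return cookie_directory.get(host)


def read_cookie_header(header_value):
    """Server side: decode the Cookie request header into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


if __name__ == "__main__":
    browser_store("www.example.com", make_set_cookie_value("session_id", "abc123"))
    sent_back = browser_cookie_for("www.example.com")
    print(read_cookie_header(sent_back))
```

Because the directory is keyed by host, a lookup for a different server returns nothing, which mirrors the rule that a cookie is only sent back to the server that created it.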
11. PROXY SERVER
• A proxy server is a computer that keeps copies of responses to recent
requests.
• HTTP supports proxy servers.
• Proxy servers function as both client and server.
• When an HTTP client sends a request to the proxy server, the proxy server checks
its cache. If the response is not stored in the cache, it forwards the request to
the target server.
• Incoming responses are stored by the proxy server for future requests.
• Proxy servers reduce the load on the original server, decrease traffic and
improve latency.
• The client must be configured to access the proxy server instead of the target
server.
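The caching behaviour of a proxy can be sketched as a lookup table placed in front of the target server. The class name CachingProxy and the fake_origin function are hypothetical; a real proxy would also honour expiry times and cache-control headers, which this sketch leaves out.

```python
class CachingProxy:
    """Acts as a server to the client and as a client to the target server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> response
        self.cache = {}

    def handle(self, url):
        # Cache hit: answer directly without contacting the target server.
        if url in self.cache:
            return self.cache[url]
        # Cache miss: forward the request and keep a copy of the response.
        response = self.fetch_from_origin(url)
        self.cache[url] = response
        return response


if __name__ == "__main__":
    origin_requests = []

    def fake_origin(url):
        origin_requests.append(url)
        return f"response for {url}"

    proxy = CachingProxy(fake_origin)
    proxy.handle("/index.html")
    proxy.handle("/index.html")
    print(len(origin_requests), "request(s) reached the target server")
```

The second request for the same URL is answered from the cache, so only one request reaches the target server.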