20120140504021

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 –
6480(Print), ISSN 0976 – 6499(Online) Volume 5, Issue 4, April (2014), pp. 198-204 © IAEME
NETWORK TRAFFIC OPTIMIZATION FOR PERFORMANCE
IMPROVEMENT IN THE WEB SERVICE INFRASTRUCTURES BY
CATEGORIZATION OF THE WEB CONTENTS WITH SIZE REDUCTION
APPROACH
Dr. Suryakant B Patil¹, Ms. Sonal S Deshmukh², Ms. Anuja D Bharate³, Dr. Preeti Patil⁴
¹ Professor, JSPM’s Imperial College of Engineering & Research, Wagholi, Pune
²,³ PG Research Scholar, JSPM’s ICOER, Wagholi, Pune
⁴ Dean (SA), HOD & Professor, KIT’s COE, Kolhapur
ABSTRACT
Network traffic has increased tremendously in recent years. This increase demands additional
bandwidth and lengthens the time needed to fetch data from the server, that is, it increases
response time. In this paper, we propose a mechanism for reducing latency time that uses the MD5
algorithm, with the help of a hash key and a formula, to improve the efficiency and performance of
the system. The main problem in proxy server cache memory is content aliasing, in which the same
content is stored multiple times. With a proxy server, users can fetch data directly from the cache
rather than going to the web server, so the workload of the web server is reduced. The main goals
are to find the response time at different downlink rates, according to the institute schedule that
we surveyed, to minimize bandwidth utilization, and to improve the performance of the proxy server,
which can be measured by the removal of content aliasing.
For this experimentation, we surveyed JSPM’s Wagholi campus, which comprises 5 institutes.
Categories and Subject Descriptors
C.2.3 [Network Operations]: Network Management
C.4 [Performance of Systems]: Design Studies
General Terms: Performance, Reliability, Experimentation, Algorithms.
Keywords: Content aliasing, MD5, Latency Time, Proxy Server.
© IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2014): 7.8273 (Calculated by GISI)
www.jifactor.com
I. INTRODUCTION
In the field of web server management, researchers have focused on aliasing in proxy server
caches for a long time [6]. Web caching consists of storing frequently referred objects on a caching
server instead of the original server, so that web servers can make better use of network bandwidth,
reduce the workload on servers, and improve the response time for users [12, 14]. Aliasing means
giving multiple names to the same thing.
Aliasing in proxy server caches occurs when the same content is stored in cache multiple
times [5]. A proxy server acts as a mediator between the original server and the clients.
On the World Wide Web, aliasing commonly occurs when a client makes two requests, and
both the requests have the same payload. Currently, browsers perform cache lookups using Uniform
Resource Locators (URLs) as identifiers. Aliasing causes repetitive data transfers even when the
current request has already been cached under a different URL [8, 11].
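The aliasing described above can be illustrated with a minimal sketch, assuming that payloads are compared by their MD5 digest as the abstract suggests. The class name `AliasAwareCache`, its methods, and the example URLs are hypothetical and are not taken from the experimental setup.

```python
import hashlib


class AliasAwareCache:
    """Toy proxy cache that stores each distinct payload only once."""

    def __init__(self):
        self.url_to_digest = {}   # URL -> MD5 hex digest of its payload
        self.digest_to_body = {}  # MD5 hex digest -> payload bytes

    def store(self, url, body):
        """Cache body under url; return True if the payload was an alias."""
        digest = hashlib.md5(body).hexdigest()
        is_alias = digest in self.digest_to_body
        if not is_alias:
            self.digest_to_body[digest] = body
        self.url_to_digest[url] = digest
        return is_alias

    def lookup(self, url):
        """Return the cached payload for url, or None on a miss."""
        digest = self.url_to_digest.get(url)
        return None if digest is None else self.digest_to_body[digest]

    def stored_bytes(self):
        """Total payload bytes actually held in the cache."""
        return sum(len(body) for body in self.digest_to_body.values())


# Two hypothetical URLs that return the same payload.
cache = AliasAwareCache()
page = b"<html>same content</html>"
first = cache.store("http://a.example/index.html", page)
second = cache.store("http://mirror.example/index.html", page)
```

Here the second `store` call is recognized as an alias, so the payload occupies cache space only once even though both URLs remain resolvable through `lookup`.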
Websites that contain the same content are called mirrors. Mirrors are redundancy
mechanisms built into the web space to serve web pages faster, but they cost cache space
[9, 13]. As the amount of web traffic increases, the efficient utilization of network bandwidth
becomes increasingly important. A caching technique therefore needs to analyze web traffic and
understand its characteristics in order to optimize the use of network bandwidth, reduce network
latency, and improve response time for users [1].
Usage of the World Wide Web has increased tremendously. In any organization or institute, the
Internet is used by most students, faculty members, and administrative departments for different
purposes [4, 7].
The web pages accessed by all users are stored in what we call “cache proxy memory”, which is
available at the client side.
Several users may request the same URLs, so the same pages get stored multiple times in cache
memory. This duplication consumes more space and leads to poor performance, so evicting duplicated
content is necessary, and we use various methods to do so. Removing content aliasing yields better
performance and uses less space. Even then, a lot of other content may remain in the cache and
occupy space, so that content must be filtered out of the cache using replacement policies such as
FIFO, LRU, and LFU.
FIFO: The First In, First Out policy is managed by a queue. When the cache is full, space must be
freed, so this scheme removes the pages that were inserted first.
LRU: The Least Recently Used policy evicts the web pages that have gone unused for the longest
time, on the assumption that they will not be needed in the near future.
LFU: The Least Frequently Used policy evicts the web pages that are accessed least often.
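The difference between the three policies can be sketched as a small simulation. This is an illustrative model only: the function `simulate`, the page names, and the request trace are assumptions made for the example, and LFU here counts every request seen so far and breaks ties by cache order.

```python
from collections import Counter, OrderedDict


def simulate(policy, capacity, requests):
    """Replay requests against a cache of `capacity` pages; return evicted pages in order."""
    cache = OrderedDict()  # page -> None; FIFO keeps insert order, LRU keeps recency order
    hits = Counter()       # request counts per page, used only by LFU
    evicted = []
    for page in requests:
        hits[page] += 1
        if page in cache:
            if policy == "LRU":
                cache.move_to_end(page)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "LFU":
                victim = min(cache, key=lambda p: hits[p])  # least requested page
            else:
                victim = next(iter(cache))  # FIFO: oldest insert; LRU: least recent use
            del cache[victim]
            evicted.append(victim)
        cache[page] = None
    return evicted


# Hypothetical request trace over four pages with room for three.
trace = ["A", "B", "B", "B", "C", "A", "D"]
results = {policy: simulate(policy, 3, trace) for policy in ("FIFO", "LRU", "LFU")}
```

On this trace the three policies pick different victims when page D arrives: FIFO evicts A (inserted first), LRU evicts B (least recently touched), and LFU evicts C (requested only once).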
Our survey shows that static data is used more than dynamic data. Keeping static content in the
cache is therefore beneficial, whereas dynamic content is updated frequently. For repeated requests
from users, the proxy does not have to access the web server, so the web server’s workload is also
reduced. As a result, the latency time of the cache improves.
II. LITERATURE SURVEY
Shivakumar and Garcia-Molina investigated mirroring in a large crawler data set and reported
that in the WebTV client trace far more aliasing happens than expected. In fact, they reported that
36% of reply bodies are accessible through more than one URL [7]. Similarly, techniques for
identifying mirrors on the Internet have been surveyed [3]. A study of mirroring in a large crawler
data set reported that roughly 10% of popular hosts are mirrored to some extent [3]. Approximate
mirroring, or “syntactic similarity”, has also been considered [3]. Although that work introduces
sophisticated measures of document similarity, it reports that most “clusters” of similar documents
in a large crawler data set contain only identical documents.
Duplication has both positive and negative aspects. On one hand the redundancy makes
retrieval easier: if a search engine has missed one copy, maybe it has the other; or if one page has
become unavailable, maybe a replica can be retrieved. On the other hand, from the point of view of
search engines storing duplicate content is a waste of resources and from the user’s point of view,
getting duplicate answers in response to a query is a nuisance.
The principal reason for duplication on the Web is the systematic replication of content
across distinct hosts, a phenomenon known as “mirroring”. It is estimated that at least 10% of the
hosts on the WWW are mirrored. Each document on the WWW has a unique name called the Uniform
Resource Locator (URL). The URL consists of three disjoint parts, namely the access method, a
hostname, and a path.
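The three-part structure of a URL can be shown with Python's standard `urllib.parse` module. This is a small sketch; the helper `url_parts` and the two example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into (access method, hostname, path)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


# A mirrored document typically keeps its path but lives on another host.
original = url_parts("http://www.example.com/docs/index.html")
mirror = url_parts("http://mirror.example.org/docs/index.html")
same_path_different_host = original[2] == mirror[2] and original[1] != mirror[1]
```

A cache keyed on the full URL treats these two names as unrelated objects, which is why mirrored content can end up stored twice.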
III. EXPERIMENTATION AND RESULTS
Besides the obvious goals of a Web caching system, such a system should have a number of desirable
properties: fast access, robustness, transparency, scalability, efficiency, adaptivity, stability,
load balancing, the ability to deal with heterogeneity, and simplicity. These are discussed below.
• Fast access: From the user’s point of view, access latency is an important measure of the
quality of Web service. A desirable caching system should aim at reducing Web access latency.
In particular, it should provide users a lower latency on average than a system without caching.
• Robustness: From the user’s perspective, robustness means availability, which is another
important measure of the quality of Web service. Users want Web service to be available
whenever they need it. Robustness has three aspects. First, the crash of a few proxies should
not bring the entire system down; the caching system should eliminate single points of failure
as much as possible. Second, the caching system should fall back gracefully in case of
failures. Third, the caching system should be designed so that it is easy to recover from a
failure.
• Transparency: A Web caching system should be transparent to the user; the only effects the
user should notice are faster response and higher availability.
• Scalability: Networks have grown explosively in size and density over the last decades and are
expected to grow even faster in the near future. The key to success in such an environment is
scalability. We would like a caching scheme to scale well with the increasing size and density
of the network. This requires all protocols employed in the caching system to be as lightweight
as possible.
• Efficiency: There are two aspects to efficiency. First, how much overhead does the Web
caching system impose on the network? We would like a caching system to impose a minimal
additional burden on the network. This includes both control packets and extra data packets
incurred by using a caching system. Second, the caching system should not adopt any scheme
that leads to underutilization of critical resources in the network.
• Load balancing: It is desirable that the caching scheme distributes the load evenly through the
entire network. A single proxy or server should not become a bottleneck (or hotspot) and thereby
degrade the performance of a portion of the network or even slow down the entire service
system.
• Ability to deal with heterogeneity: As networks grow in scale and coverage, they span a range
of hardware and software architectures. The Web caching scheme needs to adapt to a range of
network architectures.
• Simplicity: Simplicity is always an asset. Simpler schemes are easier to implement and more
likely to be accepted as international standards. We would like an ideal Web caching mechanism
to be simple to deploy.
• Adaptivity: It is desirable for the caching system to adapt to dynamic changes in user demand
and in the network environment. Adaptivity involves several aspects: cache management, cache
routing, proxy placement, etc. This is essential to achieve optimal performance.
• Stability: The schemes used in a Web caching system should not introduce instabilities into the
network.
The campus comprises a total of 5 institutes, and we observed all of them. Figure 1 shows the
classification of all the static data for the whole campus and directly indicates which categories
users are most interested in. In our survey of the institute campus, most students requested the
same pages, so duplication may occur. The cache then requires more space, which leads to poor
performance. In the figure, the blue (upper) line shows the content with duplication (with CA), and
the red (lower) line shows the content without duplication (without CA).
Fig. 1: Content Classification for different Content categories.
[Fig. 1 plot: “JSPM's Wagholi Campus Classification Content Categories”; x-axis: Content Categories (GIF, PNG, JPG, HTML); y-axis: Size (KB), 0 to 6000; series: With CA, Without CA]
Fig. 2: Total data classification for Campus with and without Content Aliasing
Figure 2 shows the classification of static and dynamic data for the whole campus. The paired
bars show the data with and without duplication. As the figure shows, the data size is reduced
once duplication is removed.
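The reduction can be expressed as a percentage of the original size. The definition below is an assumed one, not a formula stated in the paper, and the function name `size_reduction_percent` is hypothetical.

```python
def size_reduction_percent(with_ca_kb, without_ca_kb):
    """Percentage of cache space saved once aliased copies are removed."""
    if with_ca_kb <= 0:
        raise ValueError("size with content aliasing must be positive")
    return 100.0 * (with_ca_kb - without_ca_kb) / with_ca_kb


# Placeholder sizes in KB, chosen only to illustrate the arithmetic;
# they are not measurements from the campus survey.
example = size_reduction_percent(2000, 1500)
```

With these placeholder sizes, removing 500 KB of duplicated content from a 2000 KB cache corresponds to a 25% reduction.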
Fig. 3: Size Reduction in different institutes at JSPM’s Wagholi Campus
Figure 3 shows the data for the different institutes at JSPM’s Wagholi Campus.
HTTP defines several headers that were specifically designed to support caching. Although the HTTP
specification specifies certain behaviors for web caches, it does not specify how to keep cached
objects up to date. From our survey at JSPM’s Wagholi Campus, we obtained measurements of the
various categories and their sizes.
[Fig. 2 plot: “JSPM's Wagholi Campus Classification Static Dynamic”; x-axis: Category (STATIC, DYNAMIC, TOTAL); y-axis: Size (KB), 0 to 25000; series: With CA, Without CA]
[Fig. 3 plot: “Size Reductions in all Institutes of JSPM's Wagholi”; x-axis: Various Institutes (ICOER, BSIOTR, CHARAK, ENIAC, KAUTILYA, TOTAL); y-axis: Size (KB), 0 to 9000; series: Before CA, After CA]
The HTTP GET message is used to retrieve a web object given its URL. However, GET alone
does not guarantee that it will return a fresh object. HTTP headers that may affect caching can be
classified into two categories. The first category includes headers appended to the request for a
web object in order to control caching. The second category includes headers appended to the
response when a web object is returned.
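The two header categories, and how a cache might use them to decide freshness, can be sketched as follows. The header names are standard HTTP fields, but the helper functions `max_age_seconds` and `is_fresh` and the sample values are hypothetical, and the freshness check is deliberately simplified to the `max-age` directive.

```python
# Headers a client or proxy may append to a request to control caching.
REQUEST_CACHE_HEADERS = {"cache-control", "if-modified-since", "if-none-match", "pragma"}
# Headers a server may append to a response to describe cacheability.
RESPONSE_CACHE_HEADERS = {"cache-control", "etag", "expires", "last-modified", "age"}


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(response_headers, age_seconds):
    """True if a cached response is still within its max-age lifetime."""
    headers = {key.lower(): value for key, value in response_headers.items()}
    cache_control = headers.get("cache-control", "")
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return False  # simplified: treat both as requiring a trip to the server
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit


# Hypothetical cached response and the conditional request built from it.
cached = {"Cache-Control": "public, max-age=3600", "ETag": '"abc123"'}
fresh_now = is_fresh(cached, 120)
fresh_later = is_fresh(cached, 7200)
revalidation_request = {"If-None-Match": cached["ETag"]}
```

When the cached copy is no longer fresh, the proxy can send the conditional request so that the server only returns the full object if it has changed.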
IV. CONCLUSION
In an environment where saving bandwidth on the shared external network is of utmost
importance, the proxy cache should use a replacement policy that achieves high byte hit rates. A
proxy cache could also utilize multiple replacement policies. This work can be further optimized
with a daemon process, designed to run periodically and check the consistency between the cached
data and the data at the web server. The process can be scheduled during slack time, when traffic
is low, so that it does not add any additional toll on the bandwidth. When caching is implemented,
frequently accessed content is stored close to the users, eliminating duplicated effort. A request
from a user’s browser is first sent to the network’s caching server. If the requested content is
found in the web cache and the information is fresh, the content is sent directly back to the
requester, skipping an upstream journey to the target website. Our experiments at JSPM’s Wagholi
Campus demonstrate this, with the end result being a reduction in cached data size and,
consequently, in bandwidth use.
REFERENCES
[1] Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A. Murthy,
“Internet Activity Analysis through Proxy Log”, IEEE, 2010.
[2] Jun Wu; Ravindran, K., "Optimization algorithms for proxy server placement in content
distribution networks," Integrated Network Management-Workshops, 2009.
[3] Ngamsuriyaroj, S. ; Rattidham, P. ; Rassameeroj, I. ; Wongbuchasin, P. ; Aramkul, N. ;
Rungmano, S. “Performance Evaluation of Load Balanced Web Proxies” IEEE, 2011.
[4] Chen, W.; Martin, P.; Hassanein, H.S., "Caching dynamic content on the Web," Canadian
Conference on Electrical and Computer Engineering, 2003, vol.2, no., pp. 947- 950 vol.2, 4-7
May 2003.
[5] Sadhna Ahuja, Tao Wu and Sudhir Dixit “On the Effects of Content Compression on Web
Cache Performance,” Proceedings of the International Conference on Information
Technology: Computers and Communications, 2003.
[6] A. Mahanti, C. Williamson, and D. Eager, “Traffic Analysis of a Web Proxy Caching
Hierarchy,” IEEE Network Magazine, May 2000.
[7] N. Shivakumar and H. Garcia-Molina, “Finding near Replicas of Documents on the Web”
Proc. Workshop on Web Databases, Mar. 1998.
[8] Jeffrey C. Mogul, “A trace-based analysis of duplicate suppression in HTTP,” Compaq
Computer Corporation Western Research Laboratory, Nov. 1999.
[9] S B Patil, Sachin Chavan, Preeti Patil; “High Quality Design and Methodology Aspects to
Enhance Large Scale Web Services”, International Journal of Advances in Engineering &
Technology (IJAET-2012), ISSN: 2231-1963, March 2012, Volume 3, Issue 1, Pages 175-185.
(Journal Impact Factor: 1.96)
[10] Prof. S B Patil, Sachin Chavan, Dr. Preeti Patil and Prof. Sunita R Patil, “High Quality
Design to Enhance and Improve Performance of Large Scale Web Applications”,
International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1,
2012, pp. 198 - 205, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375. (Journal Impact
Factor: 1.0425)
[11] S B Patil, D. B. Kulkarni; “Improving web performance through Hierarchical caching &
content aliasing”, The 7th International Conference on “Information Integration and Web-
based Applications & Services”, 19-21 September 2005, Kuala Lumpur, Malaysia.
[12] Srikantha Rao, Preeti Patil, S B Patil, Sunita Patil; “Customized Approach for Efficient Data
Storing and Retrieving from University Database Using Repetitive Frequency Indexing”,
IEEE International Conference Publications, RAIT 2012, ISM Dhanbad,
Jharkhand, March 15–17, 2012 (Available on IEEE Xplore), Print ISBN: 978-1-4577-0694-3,
Digital Object Identifier: 10.1109/RAIT.2012.6194612, Pages: 511 – 514.
[13] Srikantha Rao, Preeti Patil, S B Patil; “Enhanced Software Development Strategy implying
High Quality Design for Large Scale Database Projects”, International Conference and
Workshop on Emerging Trends in Technology ICWET 2012, ISBN: 978-0-615-58717-2,
TCET Mumbai, February 22–25, 2012, Pages: 508-513.
[14] Srikantha Rao, Preeti Patil, S B Patil; “Object-Oriented Software Engineering Paradigm: A
Seamless Interface in Software Development Life Cycle”, ACM Asia Pacific International
Conference on Advances in Computing (ICAC-2008), Anuradha Engineering College,
Chikhali, Feb 2008.
[15] S.Saira Thabasum, “Need for Design Patterns and Frameworks for Quality Software
Development”, International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 1, 2012, pp. 54 - 58, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[16] Dr.K.Prasadh and R.Senthilkumar, “Nonhomogeneous Network Traffic Control System
Using Queueing Theory”, International Journal of Computer Engineering & Technology
(IJCET), Volume 3, Issue 3, 2012, pp. 394 - 405, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
[17] Sachin Chavan and Nitin Chavan, “Improving Access Latency of Web Browser by using
Content Aliasing in Proxy Cache Server”, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 2, 2013, pp. 356 - 365, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
