Design and Evaluation of a Proxy Cache for Peer-to-Peer Traffic

Mohamed Hefeeda, Senior Member, IEEE, Cheng-Hsin Hsu, Student Member, IEEE, and Kianoosh Mokhtarian, Student Member, IEEE

Abstract—Peer-to-peer (P2P) systems generate a major fraction of the current Internet traffic, and they significantly increase the load on ISP networks and the cost of running and connecting customer networks (e.g., universities and companies) to the Internet. To mitigate these negative impacts, many previous works in the literature have proposed caching of P2P traffic, but very few (if any) have considered designing a caching system to actually do it. This paper demonstrates that caching P2P traffic is more complex than caching other Internet traffic, and it needs several new algorithms and storage systems. Then, the paper presents the design and evaluation of a complete, running proxy cache for P2P traffic, called pCache. pCache transparently intercepts and serves traffic from different P2P systems. A new storage system is proposed and implemented in pCache. This storage system is optimized for storing P2P traffic, and it is shown to outperform other storage systems. In addition, a new algorithm to infer the information required to store and serve P2P traffic by the cache is proposed. Furthermore, extensive experiments to evaluate all aspects of pCache using an actual implementation and real P2P traffic are presented.

Index Terms—Caching, Peer-to-Peer Systems, File Sharing, Storage Systems, Performance Evaluation

• M. Hefeeda is with the School of Computing Science, Simon Fraser University, 250-13450 102nd Ave, Surrey, BC V3T 0A3, Canada. Email: mhefeeda@cs.sfu.ca.
• C. Hsu is with Deutsche Telekom R&D Lab USA, 5050 El Camino Real Suite 221, Los Altos, CA 94022.
• K. Mokhtarian is with Mobidia Inc., 230-10451 Shellbridge Way, Richmond, BC V6X 2W8, Canada.
This work is partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.



1 INTRODUCTION

File-sharing using peer-to-peer (P2P) systems is currently the killer application for the Internet. A huge amount of traffic is generated daily by P2P systems [1]–[3]. This huge amount of traffic costs university campuses thousands of dollars every year. Internet service providers (ISPs) also suffer from P2P traffic [4], because it increases the load on their routers and links. Some ISPs shape or even block P2P traffic to reduce the cost. This may not be possible for some ISPs, because they fear losing customers to their competitors. To mitigate the negative effects of P2P traffic, several approaches have been proposed in the literature, such as designing locality-aware neighbor selection algorithms [5] and caching of P2P traffic [6]–[8]. We believe that caching is a promising approach to mitigate some of the negative consequences of P2P traffic, because objects in P2P systems are mostly immutable [1] and the traffic is highly repetitive [9]. In addition, caching does not require changing P2P protocols and can be deployed transparently from clients. Therefore, ISPs can readily deploy caching systems to reduce their costs. Furthermore, caching can co-exist with other approaches, e.g., enhancing P2P traffic locality, to address the problems created by the enormous volume of P2P traffic.

In this paper, we present the design, implementation, and evaluation of a proxy cache for P2P traffic, which we call pCache. pCache is designed to transparently intercept and serve traffic from different P2P systems, while not affecting traffic from other Internet applications. pCache explicitly considers the characteristics and requirements of P2P traffic. As shown by numerous previous studies, e.g., [1], [2], [8], [10], [11], network traffic generated by P2P applications has different characteristics than traffic generated by other Internet applications. For example, object size, object popularity, and connection pattern in P2P systems are quite different from their counterparts in web systems. Most of these characteristics impact the performance of caching systems, and therefore, they should be considered in their design. In addition, as will be demonstrated in this paper, designing proxy caches for P2P traffic is more complex than designing caches for web or multimedia traffic, because of the multitude and diverse nature of existing P2P systems. Furthermore, P2P protocols are not standardized and their designers did not provision for the potential caching of P2P traffic. Therefore, even deciding whether a connection carries P2P traffic, and if so extracting the relevant information for the cache to function (e.g., the requested byte range), are non-trivial tasks.

The main contributions of this paper are as follows.

• We propose a new storage system optimized for P2P proxy caches. The proposed system efficiently serves requests for object segments of arbitrary lengths, and it has a dynamic segment merging scheme to minimize the number of disk I/O operations. Using traces collected from a popular P2P system, we show that the proposed storage system is much more efficient in handling P2P traffic than storage systems designed for web proxy caches. More specifically, our experimental results indicate that the proposed storage system is about five times more efficient than the storage system used in a famous web proxy implementation.
• We propose a new algorithm to infer the information required to store and serve P2P traffic by the cache. This algorithm is needed because some P2P systems (e.g., BitTorrent) maintain information about the exchanged objects in metafiles that are held by peers, and peers request data relative to information in these metafiles, which is not known to the cache. Our inference algorithm is efficient and provides a quantifiable confidence in the returned results. Our experiments with real BitTorrent clients running behind our pCache confirm our theoretical analysis and show that the proposed algorithm returns correct estimates in more than 99.7% of the cases. Also, the ideas of the inference algorithm are general and could be useful in inferring other characteristics of P2P traffic.

• We demonstrate the importance and difficulty of providing full transparency in P2P proxy caches, and we present our solution for implementing it in the Linux kernel. We also propose efficient splicing of non-P2P connections in order to reduce the processing and memory overhead on the cache.

• We conduct an extensive experimental study to evaluate the proposed pCache system and its individual components using an actual implementation in two real P2P networks, BitTorrent and Gnutella, which are currently among the most popular P2P systems. (A recent white paper indicates that the five most popular P2P protocols are BitTorrent, eDonkey, Gnutella, DirectConnect, and Ares [12].) BitTorrent and Gnutella are chosen because they are different in their design and operation: the former is swarm-based and uses built-in incentive schemes, while the latter uses a two-tier overlay network. Our results show that our pCache is scalable and that it benefits both the clients and the ISP in which it is deployed, without hurting the performance of the P2P networks. Specifically, clients behind the cache achieve much higher download speeds than other clients running in the same conditions without the cache. In addition, a significant portion of the traffic is served from the cache, which reduces the load on the expensive WAN links for the ISP. Our results also show that the cache does not reduce the connectivity of clients behind it, nor does it reduce their upload speeds. This is important for the whole P2P network, because reduced connectivity could lead to decreased availability of peers and the content stored on them, while reduced upload speeds could degrade the P2P network scalability.

In addition, this paper summarizes our experiences and lessons learned during the design and implementation of a running system for caching P2P traffic, which was demonstrated at [13]. These lessons could be of interest to other researchers and to companies developing products for managing P2P traffic. We make the source code of pCache (more than 11,000 lines of C++ code) and our P2P traffic traces available to the research community [14]. Because it is open-source, pCache could stimulate more research on developing methods for effective handling of P2P traffic in order to reduce its negative impacts on ISPs and the Internet.

The rest of this paper is organized as follows. In Sec. 2, we summarize the related work. Sec. 3 presents an overview of the proposed pCache system. This is followed by separate sections describing different components of pCache. Sec. 4 presents the new inference algorithm. Sec. 5 presents the proposed storage management system. Sec. 6 explains why full transparency is required in P2P proxy caches and describes our method to achieve it in pCache. It also explains how non-P2P connections are efficiently tunneled through pCache. We experimentally evaluate the performance of pCache in Sec. 7. Finally, in Sec. 8, we conclude the paper and outline possible extensions for pCache.

2 RELATED WORK

P2P Traffic Caching: Models and Systems. The benefits of caching P2P traffic have been shown in [9] and [4]. Object replacement algorithms especially designed for proxy caches of P2P traffic have also been proposed in [6] and in our previous works [8], [11]. The above works do not present the design and implementation of an actual proxy cache for P2P traffic, nor do they address storage management issues in such caches. The authors of [7] propose using already-deployed web caches to serve P2P traffic. This, however, requires modifying the P2P protocols to wrap their messages in HTTP format and to discover the location of the nearest web cache. Given the distributed and autonomous nature of the communities developing P2P client software, incorporating these modifications into actual clients may not be practical. In addition, several measurement studies have shown that the characteristics of P2P traffic are quite different from those of web traffic [1], [2], [8], [10], [11]. These different characteristics indicate that web proxy caches may yield poor performance if they were to be used for P2P traffic. The experimental results presented in this paper confirm this intuition.

In order to store and serve P2P traffic, proxy caches need to identify connections belonging to P2P systems and extract needed information from them. Several previous works address the problem of identifying P2P traffic, including [15]–[17]. The authors of [15] identify P2P traffic based on application signatures appearing in the TCP stream. The authors of [17] analyze the behavior of different P2P protocols and identify patterns specific to each protocol. The authors of [16] identify P2P traffic using only transport-layer information. A comparison among different methods is conducted in [18]. Unlike our work, all previous identification approaches only detect the presence of P2P traffic: they just decide whether a packet or a stream of packets belongs to one of the known P2P systems. This is useful in applications such as traffic shaping and blocking, capacity planning, and service differentiation. P2P traffic caching, however, needs to go beyond just identifying P2P traffic; it requires not only the exchanged object ID, but also the requested byte range of that object. In some P2P protocols, this information can easily be obtained by reading a few bytes from the payload, while in others it is not straightforward.
Several P2P caching products have been introduced to the market. Oversi's OverCache [19] implements P2P caching, but takes a quite different approach compared to pCache. An OverCache server participates in P2P networks and only serves peers within the ISP. This approach may negatively affect fairness in P2P networks, because peers in ISPs with OverCache deployed would never contribute, as they can always receive data from the cache servers without uploading to others. This in turn degrades the performance of P2P networks. PeerApp's UltraBand [20] supports multiple P2P protocols, such as BitTorrent, Gnutella, and FastTrack. These commercial products demonstrate the importance and timeliness of the problem addressed in this paper. The details of these products, however, are not publicly available, thus the research community cannot use them to evaluate new P2P caching algorithms. In contrast, the pCache system is open-source and could be used to develop algorithms for effective handling of P2P traffic in order to reduce loads on ISPs and the Internet. That is, we develop the pCache software as a proof of concept, rather than yet another commercial P2P cache. Therefore, we do not compare our pCache against these commercial products.

Storage Management for Proxy Caches. Several storage management systems have been proposed in the literature for web proxy caches [21]–[24] and for multimedia proxy caches [25]–[27]. These systems offer services that may not be useful for P2P proxy caches, e.g., minimizing startup delay and clustering of multiple objects coming from the same web server, and they lack other services, e.g., serving partial requests with arbitrary byte ranges, which are essential in P2P proxy caches. We describe some examples to show why such systems are not suitable for P2P caches.

Most storage systems of web proxy caches are designed for small objects. For example, the Damelo system [28] supports objects that have sizes less than a memory page, which is rather small for P2P systems. The UCFS system [23] maintains data structures to combine several tiny web objects together in order to fill a single disk block and to increase disk utilization. This adds overhead and is not needed in P2P systems, because a segment of an object is typically much larger than a disk block. Clustering of web objects downloaded from the same web server is also common in web proxy caches [22], [24]. This clustering exploits the temporal correlation among these objects in order to store them near to each other on the disk. This clustering is not useful in P2P systems, because even a single object is typically downloaded from multiple senders.

Proxy caches for multimedia streaming systems [25]–[27], on the other hand, could store large objects and serve byte ranges. Multimedia proxy caches can be roughly categorized into four classes [25]: sliding-interval, prefix, segment-based, and rate-split caching. Sliding-interval caching employs a sliding window for each cached object to take advantage of the sequential access pattern that is common in multimedia systems. Sliding-interval caching is not applicable to P2P traffic, because P2P applications do not request segments in a sequential manner. Prefix caching stores the initial portion of multimedia objects to minimize client start-up delays. P2P applications seek shorter total download times rather than start-up delays. Segment-based multimedia caching divides a multimedia object into segments using segmentation strategies such as uniform, exponential, and frame-based. P2P proxy caches do not have the freedom to choose a segmentation strategy; it is imposed by P2P software clients. Rate-split caching employs scalable video coding that encodes a video into several substreams, and selectively caches some of these substreams. This requires scalable video coding structures, which renders rate-split caching useless for P2P applications.

Fig. 1. The design of pCache: a Transparent Proxy and a P2P Traffic Identifier at the gateway router, and within pCache a Connection Manager, a P2P Traffic Processor (Parser, Composer, Analyzer), and a Storage System Manager (in-memory structures, block allocation method, replacement policy).

3 OVERVIEW OF PCACHE

The proposed pCache is to be used by autonomous systems (ASes) or ISPs that are interested in reducing the burden of P2P traffic. Caches in different ASes work independently from each other; we do not consider cooperative caching in this paper. pCache would be deployed at or near the gateway router of an AS. The main components of pCache are illustrated in Fig. 1. At a high level, a client participating in a P2P network issues a request to download an object. This request is transparently intercepted by pCache. If the requested object or parts of it are stored in the cache, they are served to the requesting client. This saves bandwidth on the external (expensive) links to the Internet. If no part of the requested object is found in the cache, the request is forwarded to the P2P network. When the response comes back, pCache may store a copy of the object for future requests from other clients in its AS. Clients inside the AS as well as external clients are not aware of pCache, i.e., pCache is fully transparent in both directions.

As shown in Fig. 1, the Transparent Proxy and P2P Traffic Identifier components reside on the gateway router. They transparently inspect traffic going through the router and forward only P2P connections to pCache. Traffic that does not belong to any P2P system is processed by the router in the regular way and is not affected by the presence of pCache. Once a connection is identified as belonging to a P2P system, it is passed to the Connection Manager, which coordinates different components of pCache to store and serve requests from this connection.
pCache has a custom-designed Storage System optimized for P2P traffic. In addition, pCache needs to communicate with peers from different P2P systems. For each supported P2P system, the P2P Traffic Processor provides three modules to enable this communication: Parser, Composer, and Analyzer. The Parser performs functions such as identifying control and payload messages, and extracting messages that could be of interest to the cache, such as object request messages. The Composer constructs properly-formatted messages to be sent to peers. The Analyzer is a placeholder for any auxiliary functions that may need to be performed on P2P traffic from some systems. For example, in BitTorrent the Analyzer infers information (the piece length) needed by pCache that is not included in messages exchanged between peers.
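To make this modular design concrete, the following sketch shows one plausible shape for these per-protocol modules. The interfaces are our illustration, not pCache's actual API; the released source [14] is the authoritative reference.

// Illustrative interfaces for the per-protocol Traffic Processor
// modules. Names and signatures are assumptions for exposition.
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct SegmentRequest {
    std::string object_id;   // e.g., the BitTorrent info-hash
    std::uint64_t start;     // byte range relative to the whole object
    std::uint64_t end;
};

class Parser {   // finds message boundaries, extracts requests/responses
public:
    virtual ~Parser() = default;
    virtual std::optional<SegmentRequest>
    NextMessage(const std::vector<std::uint8_t>& stream) = 0;
};

class Composer { // builds properly-formatted protocol messages
public:
    virtual ~Composer() = default;
    virtual std::vector<std::uint8_t>
    MakeResponse(const SegmentRequest& req,
                 const std::vector<std::uint8_t>& payload) = 0;
};

class Analyzer { // optional: infers data the protocol omits, e.g.,
public:          // the BitTorrent piece length (Sec. 4.2)
    virtual ~Analyzer() = default;
    virtual void Observe(const SegmentRequest& req) = 0;
};

One Parser/Composer/Analyzer triple would be loaded per supported protocol, which is what allows new P2P systems to be added at run time, as described next.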
The design of the proposed P2P proxy cache is the result of several iterations and refinements based on extensive experimentation. Given the diverse and complex nature of P2P systems, proposing a simple, well-structured design that is extensible to support various P2P systems is indeed a nontrivial systems research problem. Our running prototype currently serves BitTorrent and Gnutella traffic at the same time. To support a new P2P system, two things need to be done: (i) installing the corresponding application identifier in the P2P Traffic Identifier, and (ii) loading the appropriate Parser, Composer, and optionally Analyzer modules of the P2P Traffic Processor. Both can be done at run time without recompiling or impacting other parts of the system.

Finally, we should mention that the proxy cache design in Fig. 1 does not require users of P2P systems to perform any special configuration of their client software, nor does it need the developers of P2P systems to cooperate and modify any parts of their protocols: it caches and serves current P2P traffic as is. Our design, however, can further be simplified to support cases in which P2P users and/or developers may cooperate with ISPs for mutual benefits, a trend that has recently seen some interest with projects such as P4P [29], [30]. For example, if users were to configure their P2P clients to use proxy caches in their ISPs listening on a specific port, the P2P Traffic Identifier would be much simpler.

4 P2P TRAFFIC IDENTIFIER AND PROCESSOR

This section describes the P2P Traffic Identifier and Processor components of pCache, and presents a new algorithm to infer information needed by the cache to store and serve P2P traffic.

4.1 Overview

The P2P Traffic Identifier determines whether a connection belongs to any P2P system known to the cache. This is done by comparing a number of bytes from the connection stream against known P2P application signatures. The details are similar to the work in [15], and are therefore omitted. We have implemented identifiers for BitTorrent and Gnutella.

To store and serve P2P traffic, the cache needs to perform several functions beyond identifying the traffic. These functions are provided by the P2P Traffic Processor, which has three components: Parser, Composer, and Analyzer. By inspecting the byte stream of the connection, the Parser determines the boundaries of messages exchanged between peers, and it extracts the request and response messages that are of interest to the cache. The Parser returns the ID of the object being downloaded in the session, as well as the requested byte range (start and end bytes). The byte range is relative to the whole object. The Composer prepares protocol-specific messages, and may combine data stored in the cache with data obtained from the network into one message to be sent to a peer. While the Parser and Composer are mostly implementation details, the Analyzer is not. The Analyzer contains an algorithm to infer information that is required by the cache to store and serve P2P traffic, but is not included in the traffic itself. This is not unusual (as demonstrated below for the widely-deployed BitTorrent), since P2P protocols are not standardized and their designers did not conceive/provision for caching P2P traffic. We propose an algorithm to infer this information with a quantifiable confidence.

4.2 Inference Algorithm for Information Required by the Cache

The P2P proxy cache requires the requested byte range to be specified so that it can correctly store and serve segments. Some P2P protocols, most notably BitTorrent, do not provide this information in the traffic itself. Rather, they indirectly specify the byte range, relative to information held only by end peers and not known to the cache. Thus, the cache needs to employ an algorithm to infer the required information, which we propose in this subsection. We describe our algorithm in the context of the BitTorrent protocol, which is currently the most widely used P2P protocol [12]. Nonetheless, the ideas and analysis of our algorithm are fairly general and could be applied to other P2P protocols, after collecting the appropriate statistics and making minor adjustments.

Objects exchanged in BitTorrent are divided into equal-size pieces. A single piece is downloaded by issuing multiple requests for byte ranges, i.e., segments, within the piece. Thus, a piece is composed of multiple segments. In request messages, the requested byte range is specified by an offset within the piece and the number of bytes in the range. Peers know the piece length of the object being downloaded, because it is included in the metafile (torrent file) held by them. The cache needs the piece length for three reasons. First, the piece length is needed to perform segment merging, which can reduce the overhead on the cache. For example, assume the cache has received byte range [0, 16] KB of piece 1 and range [0, 8] KB of piece 2; without knowing the precise piece length, the cache cannot merge these two byte ranges into one continuous byte range. Second, the piece length is required to support cross-torrent caching of the same content, as different torrent files can have different piece lengths. Third, the piece length is needed for cross-system caching. For example, a file cached in BitTorrent networks may be used to serve Gnutella users requesting the same file, while the Gnutella protocol has no concept of piece length. One way for the cache to obtain this information is to capture metafiles and match them with objects. This, however, is not always possible, because peers frequently receive metafiles by various means/applications, including emails and web downloads.
Fig. 2. Piece length distribution in BitTorrent (fraction of total torrents vs. piece length, from 16 KB to 16 MB).

Our algorithm infers the piece length of an object when the object is first seen by the cache. The algorithm monitors a few segment requests, and then it makes an informed guess about the piece length. To design the piece-length inference algorithm, we collected statistics on the distribution of the piece length in actual BitTorrent objects. We wrote a script to gather and analyze a large number of torrent files. We collected more than 43,000 unique torrent files from popular torrent websites on the Internet. Our analysis revealed that more than 99.8% of the pieces have sizes that are powers of 2. The histogram of the piece length distribution is given in Fig. 2, which we will use in the algorithm. Using these results, we define the set of all possible piece lengths as S = {2^k_min, 2^(k_min+1), ..., 2^k_max}, where 2^k_min and 2^k_max are the minimum and maximum possible piece lengths. We denote by L the actual piece length that we are trying to infer. A request for a segment comes in the form {offset, size} within a piece, which tells us that the piece length is at least as large as offset + size. There are multiple requests within the piece. We denote each request by x_i = offset_i + size_i, where 0 < x_i ≤ L. The inference algorithm observes a set of n samples x_1, x_2, ..., x_n and returns a reliable guess for L.

After observing the n-th sample, the algorithm computes k such that 2^k is the minimum power-of-two integer that is greater than or equal to every x_i (i = 1, 2, ..., n). For example, if n = 5 and the x_i values are 9, 6, 13, 4, 7, then the estimated piece length would be 2^k = 16. Our goal is to determine the minimum number of samples n such that the probability that the estimated piece length is correct, i.e., Pr(2^k = L), exceeds a given threshold, say 99%. We compute the probability that the estimated piece length is correct as follows. We define E as the event that the n samples are in the range [0, 2^k] and at least one of them does not fall in [0, 2^(k-1)], where 2^k is the estimated value returned by the algorithm. Assuming that each sample is equally likely to be anywhere within the range [0, L], we have:

Pr(2^k = L | E) = Pr(E | 2^k = L) Pr(2^k = L) / Pr(E)
                = [1 - Pr(all n samples in [0, 2^(k-1)] | 2^k = L)] Pr(2^k = L) / Pr(E)
                = [1 - (1/2)^n] Pr(2^k = L) / Pr(E).                                  (1)

In the above equation, Pr(2^k = L) can be directly known from the empirically-derived histogram in Fig. 2. Furthermore, Pr(E) is given by:

Pr(E) = Σ_{l ∈ S} Pr(l = L) Pr(E | l = L)
      = Pr(2^k = L)(1 - (1/2)^n) + Pr(2^(k+1) = L)((1/2)^n - (1/4)^n)
        + ... + Pr(2^k_max = L)(((1/2)^(k_max - k))^n - ((1/2)^(k_max - k + 1))^n).   (2)

Our algorithm solves Eqs. (2) and (1) using the histogram in Fig. 2 to find the required number of samples n in order to achieve a given probability of correctness.

The assumption that samples are equally likely to be in [0, L] is realistic, because the cache observes requests from many clients at the same time for an object, and the clients are not synchronized. In a few cases, however, there could be one client, and that client may issue requests sequentially. This depends on the actual implementation of the BitTorrent client software. To address this case, we use the following heuristic. We maintain the maximum value seen in the samples taken so far. If a new sample increases this maximum value, we reset the counters of the observed samples. In Sec. 7.5, we empirically validate the above inference algorithm and show that this simple heuristic improves its accuracy.

Handling Incorrect Inferences. In the evaluation section, we experimentally show that the accuracy of our inference algorithm is about 99.7% when we use six segments in the inference, and it is almost 100% if we use 10 segments. The very rare incorrect inferences are observed and handled by the cache as follows. An incorrect inference is detected by observing a request exceeding the inferred piece boundary, because we use the minimum possible estimate for the piece length. This may happen at the beginning of a download session for an object that was never seen by the cache before. The cache would have likely stored very few segments of this object, and most of them are still in the buffer, not yet flushed to the disk. Thus, all the cache needs to do is to re-adjust a few pointers in memory with the correct piece length. In the worst case, a few disk blocks will also need to be moved to other disk locations. An even simpler solution is possible: discard these few segments from the cache, which has a negligible cost. Finally, we note that the cache starts storing segments after the inference algorithm returns an estimate, but it does not immediately serve these segments to other clients until a sufficient number of segments (e.g., 100) have been downloaded to make the probability of incorrect inference practically 0. Therefore, the consequence of a rare incorrect inference is a slight overhead in the storage system for a temporary period (until the correct piece length is computed), and no wrong data is served to the clients at any time.
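The estimator itself is only a few lines. The following minimal sketch (our illustration, not the pCache source) tracks the running maximum of offset + size, applies the reset heuristic described above, and reports the smallest power of two covering all samples once n samples support it; n would be chosen by solving Eqs. (1) and (2) for the desired confidence (e.g., n = 6 for about 99.7%).

// Piece-length inference sketch. Names and the required-sample
// count are assumptions for exposition, not pCache's actual code.
#include <cstddef>
#include <cstdint>

class PieceLengthEstimator {
public:
    explicit PieceLengthEstimator(std::size_t required_samples)
        : required_(required_samples) {}

    // Feed one {offset, size} request observed for this object.
    // Returns the estimated piece length (a power of two) once
    // enough samples were seen, or 0 while still undecided.
    std::uint64_t OnRequest(std::uint64_t offset, std::uint64_t size) {
        std::uint64_t x = offset + size;   // piece length >= x
        if (x > max_) {
            // Reset heuristic: a growing maximum suggests a single
            // client issuing sequential requests, so restart counting.
            max_ = x;
            count_ = 0;
        }
        ++count_;
        return (count_ >= required_) ? NextPow2(max_) : 0;
    }

private:
    static std::uint64_t NextPow2(std::uint64_t v) {
        std::uint64_t p = 1;
        while (p < v) p <<= 1;   // smallest 2^k with 2^k >= v
        return p;
    }

    std::uint64_t max_ = 0;
    std::size_t count_ = 0;
    std::size_t required_;
};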
5 STORAGE MANAGEMENT

In this section, we elaborate on the need for a new storage system, in addition to what we mentioned in Sec. 2. Then, we present the design of the proposed storage system.

Fig. 3. The proposed storage management system: a segment lookup table (a hash on object ID pointing to disjoint, sorted segment entries with Offset, Len, RefCnt, Buffer, and Block fields), memory page buffers, and disk blocks.

5.1 The Need for a New Storage System

In P2P systems, a receiving peer requests an object from multiple sending peers in the form of segments, where a segment is a range of bytes. Object segmentation is protocol dependent and even implementation dependent. Moreover, segment sizes could vary based on the number and type of senders in the download session, as in the case of Gnutella. Therefore, successive requests of the same object can be composed of segments with different sizes. For example, a request comes to the cache for the byte range [128, 256] KB as one segment, which is then stored locally in the cache for future requests. However, a later request may come for the byte range [0, 512] KB of the same object, again as one segment. The cache should be able to identify and serve the cached portion of the second request. Furthermore, previous work [11] showed that to improve the cache performance, objects should be incrementally admitted in the cache, because objects in P2P systems are fairly large, their popularity follows a flattened-head model [1], [11], and they may not even be downloaded in their entirety since users may abort the download sessions [1]. This means that the cache will usually store random fragments of objects, not complete contiguous objects, and the cache should be able to serve these partial objects.

While web proxy caches share some characteristics with P2P proxy caches, their storage systems can yield poor performance for P2P proxy caches, as we show in Sec. 7.3. The most important reason is that most web proxy caches consider objects with unique IDs, such as URLs, in their entirety. P2P applications almost never request entire objects; instead, segments of objects are exchanged among peers. A simple treatment of reusing web proxy caches for P2P traffic is to define a hash function on both object ID and segment range, and consider segments as independent entities. This simple approach, however, has two major flaws. First, the hash function destroys the correlation among segments belonging to the same objects. Second, this approach cannot support partial hits, because different segment ranges are hashed to different, unrelated, values.

The proposed storage system supports efficient lookups for partial hits. It also improves the scalability of the cache by reducing the number of I/O operations. This is done by preserving the correlation among segments of the same objects, and by dynamically merging segments. To the best of our knowledge, there are no other storage systems proposed in the literature for P2P proxy caches, even though the majority of the Internet traffic comes from P2P systems [1]–[3].

5.2 The Proposed Storage System

A simplified view of the proposed storage system is shown in Fig. 3. We implement the proposed system in the user space, so that it is not tightly coupled with the kernels of operating systems, and can be easily ported to different kernel versions, operating system distributions, and even to various operating systems. As indicated by [24], user-space storage systems yield very close performance to the kernel-space ones. A user-space storage management system can be built on top of a large file created by the file system of the underlying operating system, or it can be built directly on a raw disk partition. We implement and analyze two versions of the proposed storage management system: on top of the ext2 Linux file system (denoted by pCache/Fs) and on a raw disk partition (denoted by pCache/Raw).

As shown in Fig. 3, the proposed storage system maintains two structures in the memory: metadata and page buffers. The metadata is a two-level lookup table designed to enable efficient segment lookups. The first level is a hash table keyed on object IDs; collisions are resolved using common chaining techniques. Every entry points to the second level of the table, which is a set of cached segments belonging to the same object. Every segment entry consists of a few fields: Offset indicates the absolute segment location within the object, Len represents the number of bytes in this segment, RefCnt keeps track of how many connections are currently using this segment, Buffer points to the allocated page buffers, and Block points to the assigned disk blocks. Each disk is divided into fixed-size disk blocks, which are the smallest units of disk operations. Therefore, the block size is a system parameter that may affect caching performance: larger block sizes are more vulnerable to segmentation, while smaller block sizes may lead to high space overhead. RefCnt is used to prevent evicting a buffer page if there are connections currently using it.

Notice that segments do not arrive to the cache sequentially, and not all segments of an object will be stored in the cache. Thus, a naive contiguous allocation of all segments will waste memory, and will not efficiently find partial hits. We implement the set of cached segments as a balanced (red-black) binary tree, which is sorted based on the Offset field. Using this structure, partial hits can be found in at most O(log S) steps, where S is the number of segments in the object. This is done by searching on the Offset field. Segment insertions and deletions are also done in a logarithmic number of steps. Since this structure supports partial hits, the cached data is never obtained from the P2P network again, and only mutually disjoint segments are stored in the cache.
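To make the lookup concrete, the following sketch (types and names are our assumptions; the released pCache source [14] is authoritative) implements the two-level structure with a hash map over object IDs and an offset-ordered std::map, typically a red-black tree, per object; finding the cached segments overlapping a requested range costs O(log S) plus the number of overlapping entries.

// Two-level segment lookup sketch: hash table on object ID, then
// per-object segments keyed and ordered by Offset. Illustrative only.
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct Segment {
    std::uint64_t len;     // number of bytes in this segment
    int ref_cnt;           // connections currently using it
    void* buffer;          // page buffers, if resident in memory
    std::uint64_t block;   // first assigned disk block (illustrative)
};

class SegmentLookupTable {
public:
    using SegmentTree = std::map<std::uint64_t, Segment>;  // key: Offset

    // Return cached segments overlapping [offset, offset+len):
    // the partial hits for a request.
    std::vector<const Segment*> FindOverlap(const std::string& object_id,
                                            std::uint64_t offset,
                                            std::uint64_t len) const {
        std::vector<const Segment*> hits;
        auto obj = objects_.find(object_id);
        if (obj == objects_.end()) return hits;
        const SegmentTree& tree = obj->second;
        // Start from the last segment beginning at or before offset.
        auto it = tree.upper_bound(offset);
        if (it != tree.begin()) --it;
        for (; it != tree.end() && it->first < offset + len; ++it)
            if (it->first + it->second.len > offset)  // overlaps request
                hits.push_back(&it->second);
        return hits;
    }

private:
    std::unordered_map<std::string, SegmentTree> objects_;
};

Because stored segments are mutually disjoint, at most one segment can begin before the requested offset, which is why a single upper_bound() probe suffices to seed the scan.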
The second part of the in-memory structures is the page buffers. Page buffers are used to reduce disk I/O operations as well as to perform segment merging. As shown in Fig. 3, we propose to use multiple sizes of page buffers, because requests come to the cache from different P2P systems in variable sizes. We also pre-allocate these pages in memory for efficiency. Dynamic memory allocation may not be suitable for proxy caches, since it imposes processing overheads. We maintain unoccupied pages of the same size in the same free-page list. If peers request segments that are in the buffers, they are served from memory and no disk I/O operations are issued. If the requested segments are on the disk, they need to be swapped into some free memory buffers. When all free buffers are used up, the least popular data in some of the buffers is swapped out to the disk if it has been modified since it was brought into memory, and is overwritten otherwise.
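A minimal sketch of these per-size free-page lists follows; the size classes, counts, and names are illustrative assumptions, not values taken from pCache, and locking and cleanup are omitted for brevity.

// Pre-allocated page buffers kept in per-size free lists, so the
// data path performs no dynamic allocation. Illustrative only.
#include <cstddef>
#include <vector>

class PagePool {
public:
    PagePool(const std::vector<std::size_t>& sizes,
             std::size_t pages_per_size) {
        for (std::size_t sz : sizes) {
            lists_.push_back({sz, {}});
            for (std::size_t i = 0; i < pages_per_size; ++i)
                lists_.back().pages.push_back(new char[sz]);
        }
    }

    // Smallest size class that fits; nullptr means the class is
    // exhausted and the caller must swap out less popular data.
    char* Get(std::size_t bytes) {
        for (auto& l : lists_)
            if (l.page_size >= bytes && !l.pages.empty()) {
                char* p = l.pages.back();
                l.pages.pop_back();
                return p;
            }
        return nullptr;
    }

    void Put(char* page, std::size_t page_size) {
        for (auto& l : lists_)
            if (l.page_size == page_size) {
                l.pages.push_back(page);
                return;
            }
    }

private:
    struct FreeList { std::size_t page_size; std::vector<char*> pages; };
    std::vector<FreeList> lists_;
};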
6 TRANSPARENT CONNECTION MANAGEMENT

This section starts by clarifying the importance and complexity of providing full transparency in P2P proxy caches. Then, it presents our proposed method for splicing non-P2P connections in order to reduce the processing and memory overhead on the cache. We implement the Connection Manager in Linux 2.6. The details of the implementation of connection redirection and the operation of the Connection Manager [14] are not presented due to space limitations.

6.1 Importance of Full Transparency

Based on our experience of developing a running caching system, we argue that full transparency is important in P2P proxy caches. This is because non-transparent proxies may not take full advantage of the deployed caches, since they require users to manually configure their applications. This is even worse in the P2P traffic caching case due to the existence of multiple P2P systems, where each has many different client implementations. Transparent proxies, on the other hand, actively intercept connections between internal and external hosts, and they do not require error-prone manual configurations of the software clients.

Transparency in many web caches, e.g., Squid [31], is achieved as follows. The cache intercepts the TCP connection between a local client and a remote web server. This interception is done by connection redirection, which forwards connection setup packets traversing through the gateway router to a proxy process. The proxy process accepts this connection and serves requests on it. This may require the proxy to create a connection with the remote server. The web cache uses its own IP address when communicating with the web server. Thus, the server can detect the existence of the cache and needs to communicate with it. We call this type of cache partially transparent, because the remote server is aware of the cache. In contrast, we refer to proxies that do not reveal their existence as fully-transparent proxies. When a fully-transparent proxy communicates with the internal host, it uses the IP address and port number of the external host, and similarly, when it communicates with the external host, it uses the information of the internal host.

While partial transparency is sufficient for most web proxy caches, it is not enough and will not work for P2P proxy caches. This is because external peers may not respond to requests coming from the IP address of the cache, since the cache is not part of the P2P network and it does not participate in the P2P protocol. Implementing many P2P protocols in the cache to make it participate in P2P networks is a tedious task. More importantly, participation in P2P networks imposes significant overhead on the cache itself, because it will have to process protocol messages and potentially serve too many objects to external peers. For example, if the cache were to participate in the tit-for-tat BitTorrent network, it would have to upload data to other external peers proportional to the data downloaded by the cache on behalf of all internal BitTorrent peers. Furthermore, some P2P systems require registration and authentication steps that must be done by users, and the cache cannot do these steps.

Supporting full transparency in P2P proxy caches is not trivial, because it requires the cache to process and modify packets destined to remote peers, i.e., its network stack accepts packets with non-local IP addresses and port numbers.
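The paper omits the redirection and kernel details. As one hedged illustration of the capability full transparency demands of the host stack, the snippet below uses the IP_TRANSPARENT socket option of modern Linux, which allows a privileged process to bind to and terminate connections for non-local addresses; it is shown only to make the requirement concrete and is not necessarily the mechanism implemented in pCache's modified kernel.

// Full transparency requires handling traffic with non-local
// addresses. Linux-specific illustration only.
#include <netinet/in.h>
#include <sys/socket.h>

int MakeTransparentSocket() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    // Requires CAP_NET_ADMIN; lets this socket bind() to an external
    // peer's address, so the proxy can speak to an internal peer as
    // if it were that peer.
    if (setsockopt(fd, SOL_IP, IP_TRANSPARENT, &on, sizeof(on)) < 0)
        return -1;
    return fd;
}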
6.2 Efficient Splicing of non-P2P Connections

Since P2P systems use dynamic ports, the proxy process may initially intercept some connections that do not belong to P2P systems. This can only be discovered after inspecting a few packets using the P2P Traffic Identification module. Each intercepted connection is split into a pair of connections, and all packets have to go through the proxy process. This imposes overhead on the proxy cache and may increase the end-to-end delay of the connections. To reduce this overhead, we propose to splice each pair of non-P2P connections using TCP splicing techniques [32], which have been used in layer-7 switching. For spliced connections, the sockets in the proxy process are closed and packets are relayed in the kernel stack instead of being passed up to the proxy process (in the application layer). Since packets do not traverse the application layer, TCP splicing reduces the overhead of maintaining a large number of open sockets as well as forwarding threads. Several implementation details had to be addressed. For example, since different connections start from different initial sequence numbers, the sequence numbers of packets over the spliced TCP connections need to be properly changed before being forwarded. This is done by keeping track of the sequence number difference between the two spliced TCP connections, and updating sequence numbers before forwarding packets.
7 EXPERIMENTAL EVALUATION OF PCACHE

In this section, we conduct extensive experiments to evaluate all aspects of the proposed pCache system with real P2P traffic. We start our evaluation, in Sec. 7.1, by validating that our implementation of pCache is fully transparent, and that it serves actual non-corrupted data to clients as well as saves bandwidth for ISPs. In Sec. 7.2, we show that pCache improves the performance of P2P clients without reducing the connectivity in P2P networks. In Sec. 7.3, we rigorously analyze the performance of the proposed storage management system and show that it outperforms others.
In Sec. 7.4, we demonstrate that pCache is scalable and could easily serve most customer ASes in the Internet using a commodity PC. In Sec. 7.5, we validate the analysis of the inference algorithm and show that its estimates are correct in more than 99.7% of the cases with low overhead. Finally, in Sec. 7.6, we show that the proposed connection splicing scheme improves the performance of the cache.

The setup of the testbed used in the experiments consists of two separate IP subnets. Traffic from client machines in subnet 1 goes through a Linux router on which we install pCache. Client machines in subnet 2 are directly attached to the campus network. All internal links in the testbed have 100 Mb/s bandwidth. All machines are configured with static IP addresses, and appropriate route entries are added in the campus gateway router in order to forward traffic to subnet 1. Our university normally filters traffic from some P2P applications and shapes traffic from others. Machines in our subnets were allowed to bypass these filters and shapers. pCache runs on a machine with an Intel Core 2 Duo 1.86 GHz processor, 1 GB RAM, and two hard drives: one for the operating system and another for the storage system of the cache. The operating system is Linux 2.6. In our experiments, we concurrently run 10 P2P software clients in each subnet. We could not deploy more clients because of the excessive volume of traffic they generate, which is problematic in a university setting (during our experiments, we were notified by the network administrator that our BitTorrent clients exchanged more than 300 GB in one day!).

7.1 Validation of pCache and ISP Benefits

pCache is a fairly complex software system with many components. The experiments in this section are designed to show that the whole system actually works. This verification shows that: (i) P2P connections are identified and transparently split, (ii) non-P2P connections are tunneled through the cache and are not affected by its existence, (iii) segment lookup, buffering, storing, and merging all work properly and do not corrupt data, (iv) the piece-length inference algorithm yields correct estimates, and (v) data is successfully obtained from external peers, assembled with locally cached data, and then served to internal peers in the appropriate message format. All of these are shown for real BitTorrent traffic with many objects that have different popularities. The experiments also compare the performance of P2P clients with and without the proxy cache.

We modified an open-source BitTorrent client called CTorrent. CTorrent is a light-weight (command-line) C++ program; we compiled it on both Linux and Windows. We did not configure CTorrent to contact or be aware of the cache in any way. We deployed 20 instances of the modified CTorrent client on the four client machines in the testbed. Ten clients (in subnet 1) were behind the proxy cache and ten others were directly connected to the campus network. All clients were controlled by a script to coordinate the download of many objects. The objects and the number of downloads per object were carefully chosen to reflect the actual relative popularity in BitTorrent, as follows. We developed a crawler to contact popular torrent search engines such as TorrentSpy, MiniNova, and IsoHunt. We collected numerous torrent files. Each torrent file contains information about one object; an object can be a single file or multiple files grouped together. We also contacted trackers to collect the total number of downloads that each object received, which indicates the popularity of the object. We randomly took 500 sample objects from all the torrent files collected by our crawler, which were downloaded 111,620 times in total. The size distribution of the chosen 500 objects ranges from several hundred kilobytes to a few hundred megabytes. To conduct experiments within a reasonable amount of time, we scheduled about 2,700 download sessions. The number of download sessions assigned to an object is proportional to its popularity.

Fig. 5. Download sessions per object (number of scheduled downloads vs. object rank; 2,729 downloads in total).

Fig. 5 shows the distribution of the scheduled download sessions per object. This figure indicates that our empirical popularity data follows a Mandelbrot-Zipf distribution, which is a generalized form of Zipf-like distributions with an extra parameter to capture the flattened-head nature of the popularity distribution observed near the most popular objects in our popularity data. The popularity distribution collected by our crawler is similar to those observed in previous measurement studies [1], [8].
and are not affected by its existence, (iii) segment lookup,            After distributing the 2,700 scheduled downloads among the
buffering, storing, and merging all work properly and do not         500 objects, we randomly shuffled them such that downloads
corrupt data, (iv) the piece-length inference algorithm yields       for the same object do not come back to back with each
correct estimates, and (v) data is successfully obtained from        other, but rather they are interleaved with downloads for other
external peers, assembled with locally cached data, and then         objects. We then equally divided the 2,700 downloads into
served to internal peers in the appropriate message format.          ten lists. Each of these ten lists, along with the corresponding
All of these are shown for real BitTorrent traffic with many          torrent files, was given to a pair of CTorrent clients to execute:
objects that have different popularities. The experiments also       one in subnet 1 (behind the cache) and another in subnet 2. To
compare the performance of P2P clients with and without the          conduct fair comparisons between the performance of clients
proxy cache.                                                         with and without the cache, we made each pair of clients start
   We modified an open-source BitTorrent client called CTor-          a scheduled download session at the same time and with the
rent. CTorrent is a light-weight (command-line) C++ program;         same information. Specifically, only one of the two clients
we compiled it on both Linux and Windows. We did not                 contacted the BitTorrent tracker to obtain the addresses of
configure CTorrent to contact or be aware of the cache in             seeders and leechers of the object that need to be downloaded.
anyway. We deployed 20 instances of the modified CTorrent             This information was shared with the other client. Also, a new
client on the four client machines in the testbed. Ten clients (in   download session was not started unless the other client either
subnet 1) were behind the proxy cache and ten others were            finished or aborted its current session. We let the 20 clients
directly connected to the campus network. All clients were           run for two days, collecting detailed statistics from each client
controlled by a script to coordinate the download of many            every 20 seconds.
objects. The objects and the number of downloads per object             Several sanity checks were performed and passed. First,
were carefully chosen to reflect the actual relative popularity       all downloaded objects passed the BitTorrent checksum test.
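To make the workload construction concrete, the sketch below allocates download sessions to objects in proportion to a Mandelbrot-Zipf popularity, p(i) proportional to 1/(i + q)^alpha, where q flattens the head, and interleaves them into ten lists. This is an illustrative reconstruction, not the scripts used in the experiments; the values of alpha and q and the round-robin split are assumptions.

// Sketch: allocate 2,700 download sessions over 500 objects with
// Mandelbrot-Zipf popularity p(i) ~ 1/(i+q)^alpha, then shuffle and
// split into ten interleaved lists (one per client pair).
// alpha and q are illustrative values, not taken from the paper.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const int numObjects = 500, numSessions = 2700, numLists = 10;
    const double alpha = 0.8, q = 20.0;  // assumed MZipf parameters

    // Unnormalized MZipf weights by object rank (1 = most popular).
    std::vector<double> w(numObjects);
    double sum = 0.0;
    for (int i = 0; i < numObjects; ++i)
        sum += (w[i] = 1.0 / std::pow(i + 1 + q, alpha));

    // Assign each object a session count proportional to its weight.
    std::vector<int> sessions;
    for (int i = 0; i < numObjects; ++i) {
        int n = std::max(1, (int)std::round(numSessions * w[i] / sum));
        sessions.insert(sessions.end(), n, i);  // object id, repeated
    }

    // Shuffle so downloads of the same object are interleaved with others.
    std::mt19937 rng(42);
    std::shuffle(sessions.begin(), sessions.end(), rng);

    // Deal the sessions round-robin into ten per-client-pair lists.
    std::vector<std::vector<int>> lists(numLists);
    for (size_t k = 0; k < sessions.size(); ++k)
        lists[k % numLists].push_back(sessions[k]);

    std::cout << "total sessions: " << sessions.size() << "\n";
}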
Fig. 4. The impact of the cache on the performance of clients. (Three CDF plots comparing clients with and without pCache: (a) Download Speed (KB), (b) Upload Speed (KB), and (c) Max Number of Connected Peers.)


Several sanity checks were performed and passed. First, all downloaded objects passed the BitTorrent checksum test. Second, the total number of download sessions that completed was almost the same for the clients with and without the cache. Third, clients behind the cache were regular desktops participating in the campus network, and other non-P2P network applications were running on them. These applications included web browsers, user authentication modules using a centralized database (LDAP), and file system backup applications. All applications worked fine through the cache. Finally, a significant portion of the total traffic was served from the cache; up to 90%. This means that the cache identified, stored, and served P2P traffic. This all happened transparently, without changing the P2P protocol or configuring the client software. Also, since this is BitTorrent traffic, the piece-length inference algorithm behaved as expected, and our cache did not interfere with the incentive scheme of BitTorrent. We note that the unusually high 90% byte hit rate seen in our experiments is due to the limited number of clients that we have; the clients did not generate enough traffic to fill the whole storage capacity of the cache. In a full-scale deployment of the cache, the byte hit rate is expected to be smaller, in the range of 30 to 60% [11], which would still yield significant savings in bandwidth given the huge volume of P2P traffic.

7.2 Performance of P2P Clients

Next, we analyze the performance of the clients and the impact on the P2P network. We are interested in the download speed, upload speed, and number of peers to which each of our clients is connected. These statistics were collected every 20 seconds from all 20 clients. The download (upload) speed was measured as the number of bytes downloaded (uploaded) during the past period divided by the length of that period. Then, the average download (upload) speed for each completed session was computed. For the number of connected peers, we took the maximum number of peers that each of our clients was able to connect to during the sessions. We then used the maximum among all clients (in the same subnet) to compare the connectivity of clients with and without the cache.
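As a concrete reading of this methodology, the following sketch computes per-session average speeds from 20-second samples and tracks the maximum number of connected peers. The Sample structure and its field names are hypothetical, for illustration only.

// Sketch: derive per-session statistics from periodic (20 s) samples.
// The Sample struct and field names are hypothetical.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sample {
    double elapsedSec;      // seconds since the previous sample (~20)
    uint64_t bytesDown;     // bytes downloaded during that period
    uint64_t bytesUp;       // bytes uploaded during that period
    int connectedPeers;     // peers connected at sampling time
};

struct SessionStats {
    double avgDownKBps = 0, avgUpKBps = 0;
    int maxPeers = 0;
};

SessionStats summarize(const std::vector<Sample>& samples) {
    SessionStats s;
    uint64_t down = 0, up = 0;
    double secs = 0;
    for (const Sample& x : samples) {
        down += x.bytesDown;
        up += x.bytesUp;
        secs += x.elapsedSec;
        s.maxPeers = std::max(s.maxPeers, x.connectedPeers);
    }
    if (secs > 0) {
        // Speed = bytes in the period / length of the period, in KB/s.
        s.avgDownKBps = down / secs / 1024.0;
        s.avgUpKBps = up / secs / 1024.0;
    }
    return s;
}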
The results are summarized in Fig. 4. Fig. 4(a) shows that clients behind the cache achieved much higher download speeds than other clients. This is expected, as a portion of the traffic comes from the cache. Thus, the cache benefits the P2P clients, in addition to benefiting the ISP deploying the cache by saving WAN traffic. Furthermore, the increased download speed did not require clients behind the cache to significantly increase their upload speeds, as shown in Fig. 4(b). This is also beneficial for both clients and the ISP deploying the cache, while it does not hurt the global BitTorrent network, as clients behind the cache are still uploading to other external peers. The difference between the upload and download speeds can be attributed mostly to the contributions of the cache. Fig. 4(c) demonstrates that the presence of the cache did not reduce the connectivity of clients behind the cache; they have roughly the same number of connected peers. This is important for the local clients as well as the whole network, because reduced connectivity could lead to decreased availability of peers and the content stored on them. Finally, we notice that the cache may add some latency at the beginning of the download session. This latency was on the order of milliseconds in our experiments, which is negligible in P2P systems, where sessions last for minutes if not hours.

7.3 Performance of the Storage System

Experimental Setup. We compare the performance of the proposed storage management system against the storage system used in the widely-deployed Squid proxy cache [31], after modifying it to serve the partial hits needed for P2P traffic. This is done to answer the question: "What happens if we were to use the already-deployed web caches for P2P traffic?" To the best of our knowledge, there are no other storage systems designed for P2P proxy caches, and as described in Sec. 2, the storage systems proposed in [21]-[24] for web proxy caches and in [25]-[27] for multimedia caches are not readily applicable to P2P proxy caches. Therefore, we could not conduct a meaningful comparison with them.
We implemented the Squid storage system, which organizes files into a two-level directory structure: 16 directories in the first level, and 256 subdirectories in every first-level directory. This is done to reduce the number of files in each directory, which accelerates the lookup process and reduces the number of i-nodes touched during the lookup.
We implemented two versions of our proposed storage system, denoted by pCache/Fs and pCache/Raw in the plots. pCache/Fs is implemented on top of the ext2 Linux file system by opening a large file that is never closed. To write a segment, we move the file pointer to the appropriate offset and then write that segment to the file; the offset is determined by our disk block allocation algorithm. Reading a segment is done in a similar way. The actual writing/reading to the disk is performed by the ext2 file system, which might perform disk buffering and pre-fetching. pCache/Raw is implemented on a raw partition and has complete control of reading/writing blocks from/to the disk; it uses direct disk operations.
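The following sketch illustrates the pCache/Fs design just described: segments are read and written at computed offsets inside one large, never-closed file, using positional I/O on Linux. Error handling is minimal, and the offset argument stands in for the output of the disk block allocation algorithm.

// Sketch: pCache/Fs-style segment I/O inside one large, never-closed
// file, using positional reads/writes (Linux). The offset would come
// from the disk block allocation algorithm; here it is a parameter.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int openStore(const char* path) {
    // One large file opened once and kept open for the cache lifetime.
    return open(path, O_RDWR | O_CREAT, 0600);
}

bool writeSegment(int fd, off_t offset, const char* data, size_t len) {
    return pwrite(fd, data, len, offset) == (ssize_t)len;
}

bool readSegment(int fd, off_t offset, char* out, size_t len) {
    return pread(fd, out, len, offset) == (ssize_t)len;
}

int main() {
    int fd = openStore("store.dat");
    if (fd < 0) { perror("open"); return 1; }
    std::vector<char> seg(512 * 1024, 'x');      // one 0.5 MB segment
    writeSegment(fd, 0, seg.data(), seg.size()); // offset from allocator
    std::vector<char> back(seg.size());
    readSegment(fd, 0, back.data(), back.size());
    close(fd);
}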
In addition to the Squid storage system, we also implemented a storage system that uses a multi-directory structure, where segments of the same object are grouped together under one directory; we denote this storage system by Multi-dir. This is similar to previous proposals on clustering correlated web objects together for better disk performance, such as [21]; we extended the idea to P2P traffic to maintain the correlation among segments of the same object. Finally, we also implemented the Cyclic Object Storage System (COSS), which has recently been proposed to improve the performance of the Squid proxy cache [33, Sec. 8.4]. COSS sequentially stores web objects in a large file and wraps around when it reaches the end of the file, overwriting the oldest objects once the disk space is used up.
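To illustrate the circular behavior of COSS, the sketch below appends segments sequentially into a fixed-size region and wraps to the beginning when the end is reached, overwriting the oldest data. It is a simplification: real COSS also rewrites an object at the current head on every cache hit to mimic LRU, and operates on a disk file rather than a memory buffer.

// Sketch: COSS-style circular store. New segments are appended
// sequentially; when the end of the region is reached, the write
// pointer wraps and the oldest data is overwritten. Simplified: real
// COSS also rewrites an object at the head on every cache hit.
#include <cstring>
#include <vector>

class CircularStore {
public:
    explicit CircularStore(size_t capacity) : buf_(capacity), head_(0) {}

    // Returns the offset at which the segment was stored.
    // Assumes len <= capacity.
    size_t append(const char* data, size_t len) {
        if (head_ + len > buf_.size()) head_ = 0;  // wrap around
        std::memcpy(buf_.data() + head_, data, len);
        size_t at = head_;
        head_ += len;
        return at;  // older segments in this range are now invalid
    }

private:
    std::vector<char> buf_;  // stands in for the large on-disk file
    size_t head_;            // current write pointer
};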
We isolate the storage component from the rest of the pCache system. All other components (including the traffic identification, transparent proxy, and connection splicing modules) are disabled to avoid any interactions with them. We also avoid interactions with the underlying operating system by installing two hard drives: one for the operating system, and the other dedicated to the storage system of pCache. No processes other than pCache have access to the second hard drive. The capacity of the second hard drive is 230 GB, and it is formatted into 4 KB blocks. The replacement policy used in the experiments is segment-based LRU, where the least-recently used segment is evicted from the cache.

We subject a specific storage system (e.g., pCache/Fs) to a long trace of 1.5 million segment requests; the trace collection and processing are described below. During the execution of the trace, we periodically collect statistics on the low-level operations performed on the disk. These statistics include the total number of read operations, write operations, and head movements (seek length in sectors). We also measure the trace completion time, which is the total time it takes the storage system to process all requests in the trace. These low-level disk statistics are collected using the blktrace tool, which is an optional module in Linux kernels. Then, the whole process is repeated for a different storage system, but with the same trace of segment requests. During these experiments, we fix the total size of memory buffers at 0.5 GB. In addition, because they are built on top of the ext2 file system, Squid and pCache/Fs have access to the disk cache internally maintained by the operating system. Due to the intensive I/O nature of these experiments, Linux may substantially increase the size of this disk cache by stealing memory from other parts of the pCache system, which can degrade the performance of the whole system. To solve this problem, we modified the Linux kernel to limit the size of the disk cache to a maximum of 0.5 GB.
P2P Traffic Traces. We needed to stress the storage system with a large number of requests. We also needed the stream of requests to be reproducible, so that comparisons among different storage systems are fair. To satisfy these two requirements, we collected traces from an operational P2P network instead of just generating synthetic traces. Notice that running the cache with real P2P clients (as in the previous section) would not give us enough traffic to stress the storage system, nor would it create identical situations across repeated experiments with different storage systems, because of the high dynamics in P2P systems. To collect this trace, we modified an open-source Gnutella client to run in super-peer mode and to simultaneously connect to up to 500 other super-peers in the network (the default number of connections is up to only 16). Gnutella was chosen because it has a two-tier structure, where ordinary peers connect to super-peers and queries/replies are forwarded among super-peers for several hops. This enabled us to passively monitor the network without injecting traffic. Our monitoring super-peer ran continuously for several months, and because of its high connectivity it was able to record a large portion of the query and reply messages exchanged in the Gnutella network. It observed query/reply messages in thousands of ASes across the globe, accounting for more than 6,000 terabytes of P2P traffic. We processed the collected data to separate object requests coming from individual ASes, using the IP addresses of the receiving peers and an IP-to-AS mapping tool. We chose one large AS (AS 9406) with a significant amount of P2P traffic. The created trace contains: the timestamp of the request, the ID of the requested object, the size of the object, and the IP address of the receiver. Some of these traces were used in our previous work [11], and are available online [14].

The trace collected from Gnutella provides realistic object sizes, relative popularities, and temporal correlation among requests in the same AS. However, it has a limitation: information about how objects are segmented and when exactly each segment is requested is not known. We could not obtain this information because it is held by the communicating peers and transferred directly between them without going through super-peers. To mitigate this limitation, we divided objects into segments with typical sizes, which we can know either from the protocol specifications or from analyzing a small sample of files. In addition, we generated the request times for segments as follows. The timestamp in the trace marks the start of downloading an object. A completion time for this object is randomly generated to represent peers with different network connections and the dynamic conditions of the P2P network; the completion time can range from minutes to hours. Then, the download time of each segment of the object is randomly scheduled between the start and end times of downloading that object. This random scheduling is not unrealistic, because it approximates what actually happens in common P2P systems such as BitTorrent and Gnutella. Notice that with this random scheduling, requests for segments from different objects are interleaved, which is also realistic. We sort all requests for segments based on their scheduled times and take the first 1.6 million requests to evaluate the storage systems. With this number of requests, some of our experiments took two days to finish.
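A sketch of this request-time generation: each object download receives a random completion time, and each segment request is scheduled uniformly at random between the object's start and end times. The completion-time bounds below are assumptions; the text only states minutes to hours.

// Sketch: expand per-object download records into per-segment request
// times, scheduled uniformly at random within [start, completion].
// The completion-time range (minutes to hours) uses assumed bounds.
#include <algorithm>
#include <random>
#include <vector>

struct SegmentRequest { long objectId; int segment; double time; };

std::vector<SegmentRequest> schedule(long objectId, double startTime,
                                     int numSegments, std::mt19937& rng) {
    // Random completion time between 5 minutes and 5 hours (assumed).
    std::uniform_real_distribution<double> dur(300.0, 18000.0);
    double endTime = startTime + dur(rng);

    std::uniform_real_distribution<double> when(startTime, endTime);
    std::vector<SegmentRequest> out;
    for (int s = 0; s < numSegments; ++s)
        out.push_back({objectId, s, when(rng)});
    return out;
}

// Caller merges the requests from all objects and sorts them by time,
// so segments of different objects are interleaved, as in the trace:
// std::sort(all.begin(), all.end(),
//           [](auto& a, auto& b) { return a.time < b.time; });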
Fig. 6. Comparing the performance of the proposed storage management system on top of a raw disk partition (pCache/Raw) and on top of a large file (pCache/Fs) versus the Squid and other storage systems. (Three plots versus the number of requests (x 10^5): (a) Read Operations, (b) Head Movements (seek length in sectors), and (c) Completion Time (hr).)
Main Results. Some of our results are presented in Fig. 6 for a segment size of 0.5 MB. Results for other segment sizes are similar. We notice that the disk I/O operations generated by the COSS storage system are not reported in Figs. 6(a) and 6(b). This is because COSS has a built-in replacement policy that leads to fewer cache hits, and thus fewer read operations and many more write operations, than the other storage systems. Since COSS results in a different disk access pattern than the other storage systems, we cannot conduct a meaningful low-level comparison between them in Figs. 6(a) and 6(b); we discuss the replacement policy of COSS in more detail below. Fig. 6 demonstrates that the proposed storage system is much more efficient in handling P2P traffic than the other storage systems, including Squid, Multi-dir, and COSS. The efficiency is apparent in all aspects of the disk I/O operations: the proposed system issues a smaller number of read and write operations, and it requires a much smaller number of disk head movements. Because of the efficiency in each element, the total time required to complete the whole trace (Fig. 6(c)) under the proposed system is less than 5 hours, while it is 25 hours under the Squid storage system. That is, the average service time per request using pCache/Fs or pCache/Raw is almost one fifth of that using Squid. This experiment also shows that COSS and Multi-dir improve the performance of the Squid system, but both perform worse than our pCache storage system. Notice also that, unlike the case for Squid, Multi-dir, and COSS, the average time per request of our proposed storage system does not rapidly increase with the number of requests. Therefore, the proposed storage system can support more concurrent requests, and it is more scalable than the other storage systems, including the widely-deployed Squid storage system.

Because it is optimized for web traffic, Squid anticipates a large number of small web objects to be stored in the cache, which justifies the creation of many subdirectories to reduce the search time. Objects in P2P systems, on the other hand, have various sizes and can be much larger, so the cache stores a smaller number of P2P objects. This means that maintaining many subdirectories may actually add more overhead to the storage system. In addition, as Fig. 6(b) shows, Squid requires a larger number of head movements compared to our storage system. This is because Squid uses a hash function to determine the subdirectory of each segment, which destroys the locality among segments of the same object and leads to many head jumps between subdirectories to serve segments of the same object. In contrast, our storage system performs segment merging in order to store segments of the same object near each other on the disk, which reduces the number of head movements.

Fig. 6(b) also reveals that Multi-dir reduces the number of head movements compared to the Squid storage system. This is because Multi-dir exploits the correlation among segments of the same P2P object by storing these segments in one directory. This segment placement enables the underlying file system to cluster correlated segments together, as most file systems cluster files in the same subdirectory. Nevertheless, the overhead of creating many files/directories and maintaining their i-nodes is non-trivial. This can be observed in Fig. 6(a), where the average number of read operations of Multi-dir is slightly smaller than that of pCache/Raw while the number of cached objects is small (in the warm-up period), but rapidly increases once more objects are cached. Multi-dir results in fewer read operations than pCache/Raw in the warm-up period because of the Linux disk buffer and prefetch mechanism; this advantage quickly diminishes as the number of i-nodes increases.

We observe that COSS performs better than Squid but worse than Multi-dir in Fig. 6(c). The main cause of its poor performance is that COSS uses the large file to mimic an LRU queue for object replacement, which requires it to write a segment to the disk whenever there is a cache hit. This not only increases the number of write operations but also reduces the effective disk utilization, because a segment may be stored on the disk several times even though only one of these copies (the most recent one) is accessible. We plot the effective disk utilization of all considered storage systems in Fig. 7, where all storage systems except COSS employ segment-based LRU with high/low watermarks at 90% and 80%, respectively. In this figure, we observe that the disk utilization for COSS is less than 50% of that for the other storage systems. Since COSS tightly couples the object replacement policy with the storage system, it is not desirable for P2P proxy caches, because
their performance may benefit from replacement policies that capitalize on the unique characteristics of P2P traffic, such as the ones observed in [6], [8].

Fig. 7. Disk utilization for various storage systems. (Disk Utilization (%) versus the number of requests (x 10^5), for Squid, Multi-dir, pCache/Fs, pCache/Raw, and COSS.)
Additional Results and Comments. We analyzed the performance of pCache/Fs and pCache/Raw, and compared them against each other. As mentioned before, pre-fetching by the Linux file system may negatively impact the performance of pCache/Fs. We verified this by disabling the pre-fetching using the hdparm utility. By comparing pCache/Fs versus pCache/Raw using variable segment sizes, we found that pCache/Fs outperforms pCache/Raw when the segment sizes are small (16 KB or smaller). pCache/Raw, however, is more efficient for larger segment sizes. This is because when segment sizes are small, the buffering performed by the Linux file system helps pCache/Fs by grouping several small read/write operations together; pCache/Raw bypasses this buffering and uses direct disk operations. The benefit of buffering diminishes as the segment size increases. Therefore, we recommend using pCache/Fs if pCache will mostly serve P2P traffic with small segments, as in BitTorrent. If the traffic is dominated by larger segments, as in the case of Gnutella, pCache/Raw is recommended.
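To make "direct disk operations" concrete, a pCache/Raw-style reader can bypass the kernel page cache by opening the partition with O_DIRECT, which requires block-aligned buffers, offsets, and lengths. A minimal Linux sketch follows; the device path and 4 KB alignment are illustrative.

// Sketch: direct (unbuffered) disk I/O in the style of pCache/Raw.
// O_DIRECT bypasses the kernel page cache and requires the buffer,
// offset, and length to be block-aligned (assumed 4 KB here).
// The device path is illustrative; writing to it destroys data.
// Compile as C++ on Linux (g++ defines _GNU_SOURCE, exposing O_DIRECT).
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t kBlock = 4096;
    int fd = open("/dev/sdb1", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void* buf = nullptr;
    if (posix_memalign(&buf, kBlock, kBlock) != 0) return 1;

    // Read one aligned block directly from the disk, no page cache.
    if (pread(fd, buf, kBlock, 0) != (ssize_t)kBlock) perror("pread");

    free(buf);
    close(fd);
}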
7.4 Scalability of pCache

To analyze the scalability of pCache, ideally we would deploy thousands of P2P clients and configure them to incrementally request objects from different P2P networks. While this would test the whole system, it was not possible to conduct in our university setting, because it would have consumed too much bandwidth. Another, less ideal, option is to create many clients and connect them in local P2P networks. However, emulating the highly dynamic behavior of realistic P2P networks is not straightforward, and would make our results questionable no matter what we did. Instead, we focus on the scalability of the slowest part, the bottleneck, of pCache, which is the storage system, according to previous works in the literature such as [34]. All other components of pCache perform simple operations on data stored in main memory, which is orders of magnitude faster than the disk. We acknowledge that this is only a partial scalability test, but we believe it is fairly representative.

We use the large trace of 1.5 million requests used in the previous section. We run the cache for about seven hours, and we stress the storage system by continuously submitting requests. We measure the average throughput (in Mbps) of the storage system every eight minutes, and we plot the average results in Fig. 8. We ignore the first hour (the warm-up period), because the cache was empty and few data swapping operations between memory and disk occur. The figure shows that in the steady state an average throughput of more than 300 Mbps can easily be provided by our cache running on a commodity PC. This kind of throughput is probably more than enough for the majority of customer ASes, such as universities and small ISPs, because the total capacities of their Internet access links are typically smaller than the maximum throughput that can be achieved by pCache. For large ISPs with Gbps links, a high-end server with a high-speed disk array could be used. Notice that our pCache code is not highly optimized for performance. Notice also that the 300 Mbps is the throughput for P2P traffic only, not the whole Internet traffic, and it represents worst-case performance because the disk is continuously stressed.
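The throughput measurement can be expressed as a simple windowed meter: accumulate the bytes served by the storage system and convert them to Mbps at the end of every eight-minute window. A small sketch with hypothetical names:

// Sketch: windowed throughput meter. Bytes served are accumulated and
// converted to Mbps every eight minutes. Names are hypothetical.
#include <cstdint>
#include <vector>

class ThroughputMeter {
public:
    void onBytesServed(uint64_t n) { bytes_ += n; }

    // Called at the end of each 8-minute window; returns the Mbps.
    double closeWindow() {
        const double windowSec = 8 * 60.0;
        double mbps = (bytes_ * 8.0) / 1e6 / windowSec;
        samples_.push_back(mbps);
        bytes_ = 0;
        return mbps;
    }

private:
    uint64_t bytes_ = 0;
    std::vector<double> samples_;  // e.g., averaged after warm-up
};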
7.5 Evaluation of the Inference Algorithm

We empirically evaluate the algorithm proposed in Sec. 4 for inferring the piece length in BitTorrent traffic. We deployed ten CTorrent clients on the machines behind the cache. Each client was given a few thousand torrent files. For each torrent file, the client contacted a BitTorrent tracker to get potential peers, and re-connected to the tracker to obtain more peers if the number of peers dropped below 10. After knowing the peers, the client started issuing requests and receiving traffic. The cache logged all requests during all download sessions. Many of the sessions did not have any traffic exchanged, because there were not enough active peers trading pieces of those objects anymore, which is normal in BitTorrent; these sessions were dropped after a timeout period. We ran the experiment for several days. The cache collected information from more than 2,100 download sessions. We applied the inference algorithm on the logs and compared the estimated piece lengths against the actual ones (known from the torrent files).
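The Sec. 4 algorithm itself is not reproduced here; as a stand-in illustration of how a few sampled requests can reveal the piece length, the sketch below takes the greatest common divisor of observed piece-boundary byte offsets. This GCD heuristic is plainly not the paper's algorithm, only a flavor of the idea.

// Sketch: a GCD-based illustration of piece-length inference. This is
// NOT the paper's Sec. 4 algorithm; it only conveys the idea that a
// few sampled piece-boundary offsets can reveal the piece length.
#include <cstdint>
#include <numeric>
#include <vector>

// Offsets (in bytes) at which sampled requests began new pieces.
uint64_t inferPieceLength(const std::vector<uint64_t>& boundaryOffsets) {
    uint64_t g = 0;
    for (uint64_t off : boundaryOffsets)
        g = std::gcd(g, off);  // gcd(0, x) == x
    return g;  // candidate piece length; 0 if no samples
}

// Example: offsets 1572864, 2621440, and 3670016 share gcd 524288,
// suggesting a 512 KB piece length.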
The results of this experiment are presented in Fig. 9. The x-axis shows the number of samples taken to infer the piece length, and the y-axis shows the corresponding accuracy achieved. The accuracy is computed as the number of correct inferences over the total number of estimations, which is 2,100. We plot the theoretical results computed from Eqs. (1) and (2) and the actual results achieved by our algorithm with and without the improvement heuristic. The figure shows that the performance of our basic algorithm on actual traffic is very close to the theoretical results, which validates our analysis in Sec. 4. In addition, the simple heuristic improved the inference algorithm significantly: the improved algorithm infers the piece length correctly in about 99.7% of the cases, using only 6 samples on average. From further analysis, we found that the samples used by the algorithm correspond to less than 3% of each object on average, which shows how fast the inference is done in terms of the object's size. Keeping this portion small is
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Último (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Design and evaluation of a proxy cache for

impact the performance of caching systems, and therefore, they should be considered in their design. In addition, as will be demonstrated in this paper, designing proxy caches for P2P traffic is more complex than designing caches for web or multimedia traffic, because of the multitude and diverse nature of existing P2P systems. Furthermore, P2P protocols are not standardized, and their designers did not provision for the potential caching of P2P traffic. Therefore, even deciding whether a connection carries P2P traffic and, if so, extracting the relevant information for the cache to function (e.g., the requested byte range) are non-trivial tasks.

Proposed mitigation approaches include designing locality-aware neighbor selection algorithms [5] and caching of P2P traffic [6]–[8]. We believe that caching is a promising approach to mitigate some of the negative consequences of P2P traffic, because objects in P2P systems are mostly immutable [1] and the traffic is highly repetitive [9].
In addition, caching does not require changing P2P protocols and can be deployed transparently from clients. Therefore, ISPs can readily deploy caching systems to reduce their costs. Furthermore, caching can co-exist with other approaches, e.g., enhancing P2P traffic locality, to address the problems created by the enormous volume of P2P traffic.

(Author affiliations: M. Hefeeda is with the School of Computing Science, Simon Fraser University, 250-13450 102nd Ave, Surrey, BC V3T0A3, Canada; email: mhefeeda@cs.sfu.ca. C. Hsu is with Deutsche Telekom R&D Lab USA, 5050 El Camino Real, Suite 221, Los Altos, CA 94022. K. Mokhtarian is with Mobidia Inc., 230-10451 Shellbridge Way, Richmond, BC V6X 2W8, Canada. This work is partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.)

The main contributions of this paper are as follows.

• We propose a new storage system optimized for P2P proxy caches. The proposed system efficiently serves requests for object segments of arbitrary lengths, and it has a dynamic segment merging scheme to minimize the number of disk I/O operations. Using traces collected from a popular P2P system, we show that the proposed storage system is much more efficient in handling P2P traffic than storage systems designed for web proxy caches. More specifically, our experimental results indicate that the proposed storage system is about five times more efficient than the storage system used in a famous web proxy implementation.
• We propose a new algorithm to infer the information required to store and serve P2P traffic by the cache. This algorithm is needed because some P2P systems (e.g., BitTorrent) maintain information about the exchanged objects in metafiles that are held by peers, and peers request data relative to information in these metafiles, which are not known to the cache. Our inference algorithm is efficient and provides a quantifiable confidence on the returned results. Our experiments with real BitTorrent clients running behind our pCache confirm our theoretical analysis and show that the proposed algorithm returns correct estimates in more than 99.7% of the cases. Also, the ideas of the inference algorithm are general and could be useful in inferring other characteristics of P2P traffic.

• We demonstrate the importance and difficulty of providing full transparency in P2P proxy caches, and we present our solution for implementing it in the Linux kernel. We also propose efficient splicing of non-P2P connections in order to reduce the processing and memory overhead on the cache.

• We conduct an extensive experimental study to evaluate the proposed pCache system and its individual components using an actual implementation in two real P2P networks: BitTorrent and Gnutella, which are currently among the most popular P2P systems. (A recent white paper indicates the five most popular P2P protocols are: BitTorrent, eDonkey, Gnutella, DirectConnect, and Ares [12].) BitTorrent and Gnutella are chosen because they are different in their design and operation: the former is swarm-based and uses built-in incentive schemes, while the latter uses a two-tier overlay network. Our results show that our pCache is scalable and that it benefits both the clients and the ISP in which it is deployed, without hurting the performance of the P2P networks. Specifically, clients behind the cache achieve much higher download speeds than other clients running in the same conditions without the cache. In addition, a significant portion of the traffic is served from the cache, which reduces the load on the expensive WAN links for the ISP. Our results also show that the cache does not reduce the connectivity of clients behind it, nor does it reduce their upload speeds. This is important for the whole P2P network, because reduced connectivity could lead to decreased availability of peers and the content stored on them, while reduced upload speeds could degrade the P2P network scalability.

In addition, this paper summarizes our experiences and lessons learned during designing and implementing a running system for caching P2P traffic, which was demonstrated at [13]. These lessons could be of interest to other researchers and to companies developing products for managing P2P traffic. We make the source code of pCache (more than 11,000 lines of C++ code) and our P2P traffic traces available to the research community [14]. Because it is open-source, pCache could stimulate more research on developing methods for effective handling of P2P traffic in order to reduce its negative impacts on ISPs and the Internet.

The rest of this paper is organized as follows. In Sec. 2, we summarize the related work. Sec. 3 presents an overview of the proposed pCache system. This is followed by separate sections describing different components of pCache. Sec. 4 presents the new inference algorithm. Sec. 5 presents the proposed storage management system. Sec. 6 explains why full transparency is required in P2P proxy caches and describes our method to achieve it in pCache. It also explains how non-P2P connections are efficiently tunneled through pCache. We experimentally evaluate the performance of pCache in Sec. 7. Finally, in Sec. 8, we conclude the paper and outline possible extensions for pCache.
2 RELATED WORK

P2P Traffic Caching: Models and Systems. The benefits of caching P2P traffic have been shown in [9] and [4]. Object replacement algorithms especially designed for proxy caches of P2P traffic have also been proposed in [6] and in our previous works [8], [11]. The above works do not present the design and implementation of an actual proxy cache for P2P traffic, nor do they address storage management issues in such caches. The authors of [7] propose using already-deployed web caches to serve P2P traffic. This, however, requires modifying the P2P protocols to wrap their messages in HTTP format and to discover the location of the nearest web cache. Given the distributed and autonomous nature of the communities developing P2P client software, incorporating these modifications into actual clients may not be practical. In addition, several measurement studies have shown that the characteristics of P2P traffic are quite different from those of web traffic [1], [2], [8], [10], [11]. These different characteristics indicate that web proxy caches may yield poor performance if they were to be used for P2P traffic. The experimental results presented in this paper confirm this intuition.

In order to store and serve P2P traffic, proxy caches need to identify connections belonging to P2P systems and extract the needed information from them. Several previous works address the problem of identifying P2P traffic, including [15]–[17]. The authors of [15] identify P2P traffic based on application signatures appearing in the TCP stream. The authors of [17] analyze the behavior of different P2P protocols and identify patterns specific to each protocol. The authors of [16] identify P2P traffic using only transport layer information. A comparison among different methods is conducted in [18]. Unlike our work, all previous identification approaches only detect the presence of P2P traffic: they just decide whether a packet or a stream of packets belongs to one of the known P2P systems. This is useful in applications such as traffic shaping and blocking, capacity planning, and service differentiation. P2P traffic caching, however, needs to go beyond just identifying P2P traffic; it requires not only the exchanged object ID, but also the requested byte range of that object. In some P2P protocols, this information can easily be obtained by reading a few bytes from the payload, while in others it is not straightforward.
Several P2P caching products have been introduced to the market. Oversi's OverCache [19] implements P2P caching, but takes a quite different approach compared to pCache. An OverCache server participates in P2P networks and only serves peers within the ISP. This approach may negatively affect fairness in P2P networks, because peers in ISPs with OverCache deployed would never contribute, as they can always receive data from the cache servers without uploading to others. This in turn degrades the performance of P2P networks. PeerApp's UltraBand [20] supports multiple P2P protocols, such as BitTorrent, Gnutella, and FastTrack. These commercial products demonstrate the importance and timeliness of the problem addressed in this paper. The details of these products, however, are not publicly available, thus the research community cannot use them to evaluate new P2P caching algorithms. In contrast, the pCache system is open-source and could be used to develop algorithms for effective handling of P2P traffic in order to reduce loads on ISPs and the Internet. That is, we develop the pCache software as a proof of concept, rather than as yet another commercial P2P cache. Therefore, we do not compare our pCache against these commercial products.

Storage Management for Proxy Caches. Several storage management systems have been proposed in the literature for web proxy caches [21]–[24] and for multimedia proxy caches [25]–[27]. These systems offer services that may not be useful for P2P proxy caches, e.g., minimizing startup delay and clustering of multiple objects coming from the same web server, and they lack other services, e.g., serving partial requests with arbitrary byte ranges, which are essential in P2P proxy caches. We describe some examples to show why such systems are not suitable for P2P caches.

Most storage systems of web proxy caches are designed for small objects. For example, the Damelo system [28] supports objects that have sizes less than a memory page, which is rather small for P2P systems. The UCFS system [23] maintains data structures to combine several tiny web objects together in order to fill a single disk block and to increase disk utilization. This adds overhead and is not needed in P2P systems, because a segment of an object is typically much larger than a disk block. Clustering of web objects downloaded from the same web server is also common in web proxy caches [22], [24]. This clustering exploits the temporal correlation among these objects in order to store them near to each other on the disk. This clustering is not useful in P2P systems, because even a single object is typically downloaded from multiple senders.

Proxy caches for multimedia streaming systems [25]–[27], on the other hand, could store large objects and serve byte ranges. Multimedia proxy caches can be roughly categorized into four classes [25]: sliding-interval, prefix, segment-based, and rate-split caching. Sliding-interval caching employs a sliding window for each cached object to take advantage of the sequential access pattern that is common in multimedia systems. Sliding-interval caching is not applicable to P2P traffic, because P2P applications do not request segments in a sequential manner. Prefix caching stores the initial portion of multimedia objects to minimize client start-up delays; P2P applications seek shorter total download times rather than start-up delays. Segment-based multimedia caching divides a multimedia object into segments using segmentation strategies such as uniform, exponential, and frame-based. P2P proxy caches do not have the freedom to choose a segmentation strategy; it is imposed by P2P software clients. Rate-split caching employs scalable video coding that encodes a video into several substreams and selectively caches some of these substreams. This requires scalable video coding structures, which renders rate-split caching useless for P2P applications.
3 OVERVIEW OF PCACHE

The proposed pCache is to be used by autonomous systems (ASes) or ISPs that are interested in reducing the burden of P2P traffic. Caches in different ASes work independently from each other; we do not consider cooperative caching in this paper. pCache would be deployed at or near the gateway router of an AS. The main components of pCache are illustrated in Fig. 1.

[Fig. 1. The design of pCache. The Transparent Proxy and P2P Traffic Identifier reside on the gateway router, which separates P2P traffic from other Internet traffic; pCache itself consists of the Connection Manager, the P2P Traffic Processor (Parser, Composer, Analyzer), and the Storage System Manager (block allocation method, in-memory structures, replacement policy).]

At a high level, a client participating in a P2P network issues a request to download an object. This request is transparently intercepted by pCache. If the requested object or parts of it are stored in the cache, they are served to the requesting client. This saves bandwidth on the external (expensive) links to the Internet. If no part of the requested object is found in the cache, the request is forwarded to the P2P network. When the response comes back, pCache may store a copy of the object for future requests from other clients in its AS. Clients inside the AS as well as external clients are not aware of pCache, i.e., pCache is fully transparent in both directions.

As shown in Fig. 1, the Transparent Proxy and P2P Traffic Identifier components reside on the gateway router. They transparently inspect traffic going through the router and forward only P2P connections to pCache. Traffic that does not belong to any P2P system is processed by the router in the regular way and is not affected by the presence of pCache. Once a connection is identified as belonging to a P2P system, it is passed to the Connection Manager, which coordinates the different components of pCache to store and serve requests from this connection.
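To make this flow concrete, the following is a minimal sketch of the dispatch logic just described. All types and helper names here are illustrative stand-ins for exposition, not pCache's actual API; the real storage interface is described in Sec. 5.

// Illustrative sketch (not pCache's actual API): serve cached bytes, fetch
// only the missing ranges from the P2P network, admit new data for future
// requests, and reply to the internal client.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct ByteRange { uint64_t offset; uint64_t len; };
struct Request   { std::string object_id; ByteRange range; };

// Hypothetical storage interface.
struct Storage {
    // Copy cached bytes overlapping req.range into 'out' (relative offsets).
    virtual void read_cached(const Request& req, std::vector<uint8_t>& out) = 0;
    // Sub-ranges of req.range for which no data is cached.
    virtual std::vector<ByteRange> missing_parts(const Request& req) = 0;
    // Admit newly downloaded data for future requests from other clients.
    virtual void admit(const Request& req, const ByteRange& r,
                       const std::vector<uint8_t>& data) = 0;
    virtual ~Storage() = default;
};

// Stubs standing in for the P2P Traffic Processor and Connection Manager.
std::vector<uint8_t> fetch_from_p2p(const Request&, const ByteRange& r) {
    return std::vector<uint8_t>(r.len);  // would download from external peers
}
void send_to_client(const Request&, const std::vector<uint8_t>&) {}

void handle_request(Storage& storage, const Request& req) {
    std::vector<uint8_t> response(req.range.len);
    storage.read_cached(req, response);           // full or partial hit
    for (const ByteRange& gap : storage.missing_parts(req)) {
        auto data = fetch_from_p2p(req, gap);     // miss: go to the P2P network
        storage.admit(req, gap, data);            // saves WAN bandwidth later
        std::copy(data.begin(), data.end(),
                  response.begin() + (gap.offset - req.range.offset));
    }
    send_to_client(req, response);  // the Composer formats the P2P message
}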
pCache has a custom-designed Storage System optimized for P2P traffic. In addition, pCache needs to communicate with peers from different P2P systems. For each supported P2P system, the P2P Traffic Processor provides three modules to enable this communication: Parser, Composer, and Analyzer. The Parser performs functions such as identifying control and payload messages, and extracting messages that could be of interest to the cache, such as object request messages. The Composer constructs properly formatted messages to be sent to peers. The Analyzer is a placeholder for any auxiliary functions that may need to be performed on P2P traffic from some systems. For example, in BitTorrent the Analyzer infers information (piece length) needed by pCache that is not included in messages exchanged between peers.

The design of the proposed P2P proxy cache is the result of several iterations and refinements based on extensive experimentation. Given the diverse and complex nature of P2P systems, proposing a simple, well-structured design that is extensible to support various P2P systems is indeed a nontrivial systems research problem. Our running prototype currently serves BitTorrent and Gnutella traffic at the same time. To support a new P2P system, two things need to be done: (i) installing the corresponding application identifier in the P2P Traffic Identifier, and (ii) loading the appropriate Parser, Composer, and optionally Analyzer modules of the P2P Traffic Processor. Both can be done at run-time without recompiling or impacting other parts of the system (a sketch of such a pluggable interface is given below).

Finally, we should mention that the proxy cache design in Fig. 1 does not require users of P2P systems to perform any special configurations of their client software, nor does it need the developers of P2P systems to cooperate and modify any parts of their protocols: it caches and serves current P2P traffic as is. Our design, however, can further be simplified to support cases in which P2P users and/or developers may cooperate with ISPs for mutual benefits, a trend that has recently seen some interest with projects such as P4P [29], [30]. For example, if users were to configure their P2P clients to use proxy caches in their ISPs listening on a specific port, the P2P Traffic Identifier would be much simpler.
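As an illustration only (pCache's real interfaces are in its source code [14]), such run-time extensibility could take the following shape, with one Parser/Composer/Analyzer triple registered per protocol; all names here are hypothetical.

// Illustrative sketch of per-protocol modules and run-time registration.
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Message { std::string object_id; uint64_t start, end; };

struct Parser {   // extracts requests/responses from the byte stream
    virtual bool parse(const std::vector<uint8_t>& stream, Message& out) = 0;
    virtual ~Parser() = default;
};
struct Composer { // builds protocol-specific reply messages
    virtual std::vector<uint8_t> compose(const Message& m,
                                         const std::vector<uint8_t>& data) = 0;
    virtual ~Composer() = default;
};
struct Analyzer { // optional: infers information absent from the traffic
    virtual void observe(const Message& m) = 0;
    virtual ~Analyzer() = default;
};

struct ProtocolModules {
    std::unique_ptr<Parser> parser;
    std::unique_ptr<Composer> composer;
    std::unique_ptr<Analyzer> analyzer;  // may be null (e.g., for Gnutella)
};

// New protocols can be registered while the cache is running, e.g., from a
// shared object loaded with dlopen(), without recompiling the rest of the
// system or disturbing connections of already-supported protocols.
std::map<std::string, ProtocolModules>& registry() {
    static std::map<std::string, ProtocolModules> r;
    return r;
}
void register_protocol(const std::string& name, ProtocolModules mods) {
    registry()[name] = std::move(mods);
}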
4 P2P TRAFFIC IDENTIFIER AND PROCESSOR

This section describes the P2P Traffic Identifier and Processor components of pCache, and presents a new algorithm to infer information needed by the cache to store and serve P2P traffic.

4.1 Overview

The P2P Traffic Identifier determines whether a connection belongs to any P2P system known to the cache. This is done by comparing a number of bytes from the connection stream against known P2P application signatures. The details are similar to the work in [15], and are therefore omitted. We have implemented identifiers for BitTorrent and Gnutella.

To store and serve P2P traffic, the cache needs to perform several functions beyond identifying the traffic. These functions are provided by the P2P Traffic Processor, which has three components: Parser, Composer, and Analyzer. By inspecting the byte stream of the connection, the Parser determines the boundaries of messages exchanged between peers, and it extracts the request and response messages that are of interest to the cache. The Parser returns the ID of the object being downloaded in the session, as well as the requested byte range (start and end bytes). The byte range is relative to the whole object. The Composer prepares protocol-specific messages, and may combine data stored in the cache with data obtained from the network into one message to be sent to a peer. While the Parser and Composer are mostly implementation details, the Analyzer is not. The Analyzer contains an algorithm to infer information that is required by the cache to store and serve P2P traffic, but is not included in the traffic itself. This is not unusual (as demonstrated below for the widely-deployed BitTorrent), since P2P protocols are not standardized and their designers did not conceive/provision for caching P2P traffic. We propose an algorithm to infer this information with a quantifiable confidence.
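As a concrete illustration of signature-based identification (the actual signatures and matching rules used by pCache follow [15] and may differ), the opening bytes of BitTorrent and Gnutella connections carry well-known fingerprints that can be tested directly:

// Illustrative signature checks on the first bytes of a TCP stream.
#include <cstddef>
#include <cstdint>
#include <cstring>

// A BitTorrent handshake starts with one length byte (19) followed by the
// string "BitTorrent protocol".
bool looks_like_bittorrent(const uint8_t* buf, size_t len) {
    static const char kProto[] = "BitTorrent protocol";
    return len >= 1 + sizeof(kProto) - 1 && buf[0] == 19 &&
           std::memcmp(buf + 1, kProto, sizeof(kProto) - 1) == 0;
}

// Gnutella 0.6 connections open with a plain-text handshake line.
bool looks_like_gnutella(const uint8_t* buf, size_t len) {
    static const char kHello[] = "GNUTELLA CONNECT/";
    return len >= sizeof(kHello) - 1 &&
           std::memcmp(buf, kHello, sizeof(kHello) - 1) == 0;
}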
4.2 Inference Algorithm for Information Required by the Cache

The P2P proxy cache requires the specification of the requested byte range such that it can correctly store and serve segments. Some P2P protocols, most notably BitTorrent, do not provide this information in the traffic itself. Rather, they indirectly specify the byte range, relative to information held only by end peers and not known to the cache. Thus, the cache needs to employ an algorithm to infer the required information, which we propose in this subsection. We describe our algorithm in the context of the BitTorrent protocol, which is currently the most widely used P2P protocol [12]. Nonetheless, the ideas and analysis of our algorithm are fairly general and could be applied to other P2P protocols, after collecting the appropriate statistics and making minor adjustments.

Objects exchanged in BitTorrent are divided into equal-size pieces. A single piece is downloaded by issuing multiple requests for byte ranges, i.e., segments, within the piece. Thus, a piece is composed of multiple segments. In request messages, the requested byte range is specified by an offset within the piece and the number of bytes in the range. Peers know the piece length of the object being downloaded, because it is included in the metafile (torrent file) held by them. The cache needs the piece length for three reasons. First, the piece length is needed to perform segment merging, which can reduce the overhead on the cache. For example, assume the cache has received byte range [0, 16] KB of piece 1 and range [0, 8] KB of piece 2; without knowing the precise piece length, the cache cannot merge these two byte ranges into one continuous byte range. Second, the piece length is required to support cross-torrent caching of the same content, as different torrent files can have different piece lengths. Third, the piece length is needed for cross-system caching. For example, a file cached in BitTorrent networks may be used to serve Gnutella users requesting the same file, while the Gnutella protocol has no concept of piece length. One way for the cache to obtain this information is to capture metafiles and match them with objects. This, however, is not always possible, because peers frequently receive metafiles by various means/applications, including emails and web downloads.

Our algorithm infers the piece length of an object when the object is first seen by the cache. The algorithm monitors a few segment requests, and then it makes an informed guess about the piece length. To design the piece length inference algorithm, we collected statistics on the distribution of the piece length in actual BitTorrent objects. We wrote a script to gather and analyze a large number of torrent files. We collected more than 43,000 unique torrent files from popular torrent websites on the Internet. Our analysis revealed that more than 99.8% of the pieces have sizes that are powers of 2. The histogram of the piece length distribution is given in Fig. 2, which we will use in the algorithm.

[Fig. 2. Piece length distribution in BitTorrent: fraction of total torrents (up to about 40%) versus piece length, for piece lengths between 16 KB and 16 MB.]

Using these results, we define the set of all possible piece lengths as S = {2^kmin, 2^(kmin+1), ..., 2^kmax}, where 2^kmin and 2^kmax are the minimum and maximum possible piece lengths. We denote by L the actual piece length that we are trying to infer. A request for a segment comes in the form {offset, size} within a piece, which tells us that the piece length is at least as large as offset + size. There are multiple requests within the piece. We denote each request by x_i = offset_i + size_i, where 0 < x_i <= L. The inference algorithm observes a set of n samples x_1, x_2, ..., x_n and returns a reliable guess for L.
After observing the n-th sample, the algorithm computes k such that 2^k is the minimum power-of-two integer that is greater than or equal to all x_i (i = 1, 2, ..., n). For example, if n = 5 and the x_i values are 9, 6, 13, 4, 7, then the estimated piece length would be 2^k = 16. Our goal is to determine the minimum number of samples n such that the probability that the estimated piece length is correct, i.e., Pr(2^k = L), exceeds a given threshold, say 99%. We compute the probability that the estimated piece length is correct as follows. We define E as the event that the n samples are in the range [0, 2^k] and at least one of them does not fall in [0, 2^{k-1}], where 2^k is the estimated value returned by the algorithm. Assuming that each sample is equally likely to be anywhere within the range [0, L], we have:

$$\Pr(2^k = L \mid E) = \Pr(E \mid 2^k = L)\,\frac{\Pr(2^k = L)}{\Pr(E)} = \Big[1 - \Pr\big(\text{all } n \text{ samples in } [0, 2^{k-1}] \mid 2^k = L\big)\Big]\frac{\Pr(2^k = L)}{\Pr(E)} = \Big[1 - \big(\tfrac{1}{2}\big)^n\Big]\frac{\Pr(2^k = L)}{\Pr(E)}. \tag{1}$$

In the above equation, Pr(2^k = L) can be directly known from the empirically-derived histogram in Fig. 2. Furthermore, Pr(E) is given by:

$$\Pr(E) = \sum_{l \in S} \Pr(l = L)\,\Pr(E \mid l = L) = \Pr(2^k = L)\Big(1 - \big(\tfrac{1}{2}\big)^n\Big) + \Pr(2^{k+1} = L)\Big(\big(\tfrac{1}{2}\big)^n - \big(\tfrac{1}{4}\big)^n\Big) + \cdots + \Pr(2^{k_{\max}} = L)\Big(\big(\tfrac{1}{2^{\,k_{\max}-k}}\big)^n - \big(\tfrac{1}{2^{\,k_{\max}-k+1}}\big)^n\Big). \tag{2}$$

Our algorithm solves Eqs. (2) and (1) using the histogram in Fig. 2 to find the required number of samples n in order to achieve a given probability of correctness.

The assumption that samples are equally likely to be in [0, L] is realistic, because the cache observes requests from many clients at the same time for an object, and the clients are not synchronized. In a few cases, however, there could be one client, and that client may issue requests sequentially. This depends on the actual implementation of the BitTorrent client software. To address this case, we use the following heuristic. We maintain the maximum value seen in the samples taken so far. If a new sample increases this maximum value, we reset the counter of observed samples. In Sec. 7.5, we empirically validate the above inference algorithm and show that this simple heuristic improves its accuracy.
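The core of the estimator can be sketched as follows. This is illustrative rather than pCache's exact code, and it assumes the required sample count n has been precomputed offline from Eqs. (1) and (2) and the Fig. 2 histogram.

// Illustrative sketch of the piece-length estimator: track the largest
// offset + size seen; after n samples, report the smallest power of two
// covering it. The reset-on-new-maximum heuristic (which delays the
// estimate) guards against a single client issuing sequential requests.
#include <cstdint>

class PieceLengthEstimator {
public:
    explicit PieceLengthEstimator(unsigned n_required)
        : n_required_(n_required) {}

    // Feed one request {offset, size}; returns the estimate 2^k once enough
    // samples have been observed, or 0 if more samples are still needed.
    uint64_t add_sample(uint64_t offset, uint64_t size) {
        uint64_t x = offset + size;
        if (x > max_x_) { max_x_ = x; count_ = 0; }  // heuristic: reset
        if (++count_ < n_required_) return 0;
        uint64_t estimate = 1;
        while (estimate < max_x_) estimate <<= 1;    // min power of two >= max
        return estimate;
    }

private:
    unsigned n_required_;  // chosen so Pr(correct) exceeds the threshold
    unsigned count_ = 0;   // samples observed since the last reset
    uint64_t max_x_ = 0;   // largest offset + size seen so far
};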
Handling Incorrect Inferences. In the evaluation section, we experimentally show that the accuracy of our inference algorithm is about 99.7% when we use six segments in the inference, and it is almost 100% if we use 10 segments. The very rare incorrect inferences are observed and handled by the cache as follows. An incorrect inference is detected by observing a request exceeding the inferred piece boundary, because we use the minimum possible estimate for the piece length. This may happen at the beginning of a download session for an object that was never seen by the cache before. The cache would have likely stored very few segments of this object, and most of them are still in the buffer, not yet flushed to the disk. Thus, all the cache needs to do is re-adjust a few pointers in memory with the correct piece length. In the worst case, a few disk blocks will also need to be moved to other disk locations. An even simpler solution is possible: discard these few segments from the cache, which has a negligible cost. Finally, we note that the cache starts storing segments after the inference algorithm returns an estimate, but it does not immediately serve these segments to other clients until a sufficient number of segments (e.g., 100) have been downloaded to make the probability of incorrect inference practically 0. Therefore, the consequence of a rare incorrect inference is a slight overhead in the storage system for a temporary period (until the correct piece length is computed), and no wrong data is served to the clients at any time.

5 STORAGE MANAGEMENT

In this section, we elaborate on the need for a new storage system, in addition to what we mentioned in Sec. 2. Then, we present the design of the proposed storage system.

5.1 The Need for a New Storage System

In P2P systems, a receiving peer requests an object from multiple sending peers in the form of segments, where a segment is a range of bytes. Object segmentation is protocol dependent and even implementation dependent. Moreover, segment sizes could vary based on the number and type of senders in the download session, as in the case of Gnutella. Therefore, successive requests for the same object can be composed of segments with different sizes. For example, a request comes to the cache for the byte range [128, 256] KB as one segment, which is then stored locally in the cache for future requests. However, a later request may come for the byte range [0, 512] KB of the same object, again as one segment. The cache should be able to identify and serve the cached portion of the second request. Furthermore, previous work [11] showed that to improve the cache performance, objects should be incrementally admitted in the cache, because objects in P2P systems are fairly large, their popularity follows a flattened-head model [1], [11], and they may not even be downloaded in their entirety, since users may abort the download sessions [1]. This means that the cache will usually store random fragments of objects, not complete contiguous objects, and the cache should be able to serve these partial objects.
While web proxy caches share some characteristics with P2P proxy caches, their storage systems can yield poor performance for P2P proxy caches, as we show in Sec. 7.3. The most important reason is that most web proxy caches consider objects with unique IDs, such as URLs, in their entirety. P2P applications almost never request entire objects; instead, segments of objects are exchanged among peers. A simple treatment of reusing web proxy caches for P2P traffic is to define a hash function on both the object ID and the segment range, and to consider segments as independent entities. This simple approach, however, has two major flaws. First, the hash function destroys the correlation among segments belonging to the same objects. Second, this approach cannot support partial hits, because different segment ranges are hashed to different, unrelated values.

The proposed storage system supports efficient lookups for partial hits. It also improves the scalability of the cache by reducing the number of I/O operations. This is done by preserving the correlation among segments of the same objects, and by dynamically merging segments. To the best of our knowledge, there are no other storage systems proposed in the literature for P2P proxy caches, even though the majority of the Internet traffic comes from P2P systems [1]–[3].

5.2 The Proposed Storage System

A simplified view of the proposed storage system is shown in Fig. 3. We implement the proposed system in the user space, so that it is not tightly coupled with the kernels of operating systems, and can be easily ported to different kernel versions, operating system distributions, and even to various operating systems. As indicated by [24], user-space storage systems yield very close performance to kernel-space ones. A user-space storage management system can be built on top of a large file created by the file system of the underlying operating system, or it can be built directly on a raw disk partition. We implement and analyze two versions of the proposed storage management system: on top of the ext2 Linux file system (denoted by pCache/Fs) and on a raw disk partition (denoted by pCache/Raw).

[Fig. 3. The proposed storage management system: a segment lookup table (a hash table on object IDs whose entries point to per-object sets of disjoint, sorted segments with Offset, Len, RefCnt, Buffer, and Block fields), memory page buffers of multiple sizes, and fixed-size disk blocks.]

As shown in Fig. 3, the proposed storage system maintains two structures in memory: metadata and page buffers. The metadata is a two-level lookup table designed to enable efficient segment lookups. The first level is a hash table keyed on object IDs; collisions are resolved using common chaining techniques. Every entry points to the second level of the table, which is a set of cached segments belonging to the same object. Every segment entry consists of a few fields: Offset indicates the absolute segment location within the object, Len represents the number of bytes in this segment, RefCnt keeps track of how many connections are currently using this segment, Buffer points to the allocated page buffers, and Block points to the assigned disk blocks. Each disk is divided into fixed-size disk blocks, which are the smallest units of disk operations. Therefore, the block size is a system parameter that may affect caching performance: larger block sizes are more vulnerable to fragmentation, while smaller block sizes may lead to high space overhead. RefCnt is used to prevent evicting a buffer page if there are connections currently using it.

Notice that segments do not arrive at the cache sequentially, and not all segments of an object will be stored in the cache. Thus, a naive contiguous allocation of all segments will waste memory and will not efficiently find partial hits. We implement the set of cached segments as a balanced (red-black) binary tree, which is sorted on the Offset field. Using this structure, partial hits can be found in at most O(log S) steps, where S is the number of segments in the object. This is done by searching on the Offset field. Segment insertions and deletions are also done in a logarithmic number of steps. Since this structure supports partial hits, cached data is never obtained from the P2P network again, and only mutually disjoint segments are stored in the cache.
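The metadata structure maps naturally onto standard containers. The following is a minimal sketch under that assumption (std::map is typically implemented as a red-black tree, matching the design above; the field layout is simplified and Buffer/Block pointers are omitted):

// Illustrative sketch of the two-level lookup table. First level: hash
// table keyed on object ID. Second level: per-object segments, kept
// disjoint and sorted by offset in a red-black tree.
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct Segment {
    uint64_t len = 0;      // number of bytes in this segment
    int      ref_cnt = 0;  // connections currently using this segment
};

using SegmentTree = std::map<uint64_t, Segment>;           // offset -> segment
std::unordered_map<std::string, SegmentTree> lookup_table; // object ID -> tree

// Return the cached sub-ranges overlapping [offset, offset + len): the
// starting position is found in O(log S) by searching on the Offset field.
std::vector<std::pair<uint64_t, uint64_t>>
partial_hits(const std::string& object_id, uint64_t offset, uint64_t len) {
    std::vector<std::pair<uint64_t, uint64_t>> hits;  // (offset, len) pairs
    auto obj = lookup_table.find(object_id);
    if (obj == lookup_table.end()) return hits;
    const SegmentTree& tree = obj->second;

    // Start from the last segment beginning at or before 'offset'.
    auto it = tree.upper_bound(offset);
    if (it != tree.begin()) --it;
    for (; it != tree.end() && it->first < offset + len; ++it) {
        uint64_t lo = std::max(it->first, offset);
        uint64_t hi = std::min(it->first + it->second.len, offset + len);
        if (lo < hi) hits.emplace_back(lo, hi - lo);  // overlapping portion
    }
    return hits;
}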
The second part of the in-memory structures is the page buffers. Page buffers are used to reduce disk I/O operations as well as to perform segment merging. As shown in Fig. 3, we propose to use multiple sizes of page buffers, because requests come to the cache from different P2P systems in variable sizes. We also pre-allocate these pages in memory for efficiency; dynamic memory allocation may not be suitable for proxy caches, since it imposes processing overheads. We maintain unoccupied pages of the same size in the same free-page list. If peers request segments that are in the buffers, they are served from memory and no disk I/O operations are issued. If the requested segments are on the disk, they need to be swapped into some free memory buffers. When all free buffers are used up, the least popular data in some of the buffers is swapped out to the disk if it has been modified since it was brought into memory, and is overwritten otherwise.
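A minimal sketch of the pre-allocated, per-size free-page lists described above follows. It is illustrative only; the eviction decision and the dirty-page write-back policy, as well as deallocation at shutdown, are omitted.

// Illustrative sketch: pages of each configured size are allocated once at
// startup and recycled through per-size free-page lists, avoiding dynamic
// allocation on the data path.
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

class PagePool {
public:
    // Pre-allocate 'count' pages for each configured page size.
    PagePool(const std::vector<size_t>& page_sizes, size_t count) {
        for (size_t sz : page_sizes)
            for (size_t i = 0; i < count; ++i)
                free_pages_[sz].push_back(new char[sz]);
    }

    // Take a page from the free list of the given size; nullptr means the
    // caller must first evict (swap out the least popular buffered data).
    char* acquire(size_t sz) {
        auto& free_list = free_pages_[sz];
        if (free_list.empty()) return nullptr;
        char* page = free_list.front();
        free_list.pop_front();
        return page;
    }

    // Return a page to its free list once its RefCnt drops to zero.
    void release(size_t sz, char* page) { free_pages_[sz].push_back(page); }

private:
    std::unordered_map<size_t, std::list<char*>> free_pages_;
};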
require users to manually configure their applications. This For spliced connections, the sockets in the proxy process are is even worse in the P2P traffic caching case due to the closed and packets are relayed in the kernel stack instead existence of multiple P2P systems, where each has many of passing them up to the proxy process (in the application different client implementations. Transparent proxies, on the layer). Since packets do not traverse through the application other hand, actively intercept connections between internal and layer, TCP splicing reduces overhead on maintaining a large external hosts, and they do not require error-prone manual number of open sockets as well as forwarding threads. Several configurations of the software clients. implementation details had to be addressed. For example, Transparency in many web caches, e.g., Squid [31], is since different connections start from different initial sequence achieved as follows. The cache intercepts the TCP connec- numbers, the sequence numbers of packets over the spliced tion between a local client and a remote web server. This TCP connections need to be properly changed before being interception is done by connection redirection, which forwards forwarded. This is done by keeping track of the sequence connection setup packets traversing through the gateway router number difference between two spliced TCP connections, and to a proxy process. The proxy process accepts this connection updating sequence numbers before forwarding packets. and serves requests on it. This may require the proxy to create a connection with the remote server. The web cache uses its IP address when communicating with the web server. Thus, 7 E XPERIMENTAL E VALUATION OF P C ACHE the server can detect the existence of the cache and needs In this section, we conduct extensive experiments to evaluate to communicate with it. We call this type of caches partially all aspects of the proposed pCache system with real P2P transparent, because the remote server is aware of the cache. In traffic. We start our evaluation, in Sec. 7.1, by validating contrast, we refer to proxies that do not reveal their existence that our implementation of pCache is fully transparent, and as fully-transparent proxies. When a fully-transparent proxy it serves actual non-corrupted data to clients as well as communicates with the internal host, it uses the IP address saves bandwidth for ISPs. In Sec. 7.2, we show that pCache and port number of the external host, and similarly when it improves the performance of P2P clients without reducing communicates with the external host it uses the information the connectivity in P2P networks. In Sec. 7.3, we rigorously of the internal host. analyze the performance of the proposed storage management
7 EXPERIMENTAL EVALUATION OF PCACHE

In this section, we conduct extensive experiments to evaluate all aspects of the proposed pCache system with real P2P traffic. We start our evaluation, in Sec. 7.1, by validating that our implementation of pCache is fully transparent, and that it serves actual non-corrupted data to clients as well as saves bandwidth for ISPs. In Sec. 7.2, we show that pCache improves the performance of P2P clients without reducing the connectivity in P2P networks. In Sec. 7.3, we rigorously analyze the performance of the proposed storage management system and show that it outperforms others. In Sec. 7.4, we demonstrate that pCache is scalable and could easily serve most customer ASes in the Internet using a commodity PC. In Sec. 7.5, we validate the analysis of the inference algorithm and show that its estimates are correct in more than 99.7% of the cases with low overhead. Finally, in Sec. 7.6, we show that the proposed connection splicing scheme improves the performance of the cache.

The setup of the testbed used in the experiments consists of two separate IP subnets. Traffic from client machines in subnet 1 goes through a Linux router on which we install pCache. Client machines in subnet 2 are directly attached to the campus network. All internal links in the testbed have 100 Mb/s bandwidth. All machines are configured with static IP addresses, and appropriate route entries are added in the campus gateway router in order to forward traffic to subnet 1. Our university normally filters traffic from some P2P applications and shapes traffic from others. Machines in our subnets were allowed to bypass these filters and shapers. pCache is running on a machine with an Intel Core 2 Duo 1.86 GHz processor, 1 GB RAM, and two hard drives: one for the operating system and another for the storage system of the cache. The operating system is Linux 2.6. In our experiments, we concurrently run 10 P2P software clients in each subnet. We could not deploy more clients because of the excessive volume of traffic they generate, which is problematic in a university setting (during our experiments, we were notified by the network administrator that our BitTorrent clients exchanged more than 300 GB in one day!).
7.1 Validation of pCache and ISP Benefits

pCache is a fairly complex software system with many components. The experiments in this section are designed to show that the whole system actually works. This verification shows that: (i) P2P connections are identified and transparently split, (ii) non-P2P connections are tunneled through the cache and are not affected by its existence, (iii) segment lookup, buffering, storing, and merging all work properly and do not corrupt data, (iv) the piece-length inference algorithm yields correct estimates, and (v) data is successfully obtained from external peers, assembled with locally cached data, and then served to internal peers in the appropriate message format. All of these are shown for real BitTorrent traffic with many objects that have different popularities. The experiments also compare the performance of P2P clients with and without the proxy cache.

We modified an open-source BitTorrent client called CTorrent. CTorrent is a light-weight (command-line) C++ program; we compiled it on both Linux and Windows. We did not configure CTorrent to contact or be aware of the cache in any way. We deployed 20 instances of the modified CTorrent client on the four client machines in the testbed. Ten clients (in subnet 1) were behind the proxy cache and ten others were directly connected to the campus network. All clients were controlled by a script to coordinate the download of many objects. The objects and the number of downloads per object were carefully chosen to reflect the actual relative popularity in BitTorrent, as follows. We developed a crawler to contact popular torrent search engines such as TorrentSpy, MiniNova, and IsoHunt. We collected numerous torrent files. Each torrent file contains information about one object; an object can be a single file or multiple files grouped together. We also contacted trackers to collect the total number of downloads that each object received, which indicates the popularity of the object. We randomly took 500 sample objects from all the torrent files collected by our crawler; these objects were downloaded 111,620 times in total. The size distribution of the chosen 500 objects ranges from several hundred kilobytes to a few hundred megabytes. To conduct the experiments within a reasonable amount of time, we scheduled about 2,700 download sessions. The number of download sessions assigned to an object is proportional to its popularity.

[Fig. 5. Download sessions per object: number of scheduled downloads (2,729 in total) versus object rank, on log-log scales.]

Fig. 5 shows the distribution of the scheduled download sessions per object. This figure indicates that our empirical popularity data follows a Mandelbrot-Zipf distribution, which is a generalized form of Zipf-like distributions with an extra parameter to capture the flattened-head nature of the popularity distribution observed near the most popular objects in our popularity data. The popularity distribution collected by our crawler is similar to those observed in previous measurement studies [1], [8].
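For reference, the Mandelbrot-Zipf model referred to here is commonly written in the following form, where q >= 0 is the plateau factor that flattens the head of the distribution (q = 0 recovers a classical Zipf-like distribution); the parameter values fitted to our traces are not restated here.

% Mandelbrot-Zipf: access probability p(i) of the object with rank i among
% N objects, with skewness \alpha and plateau factor q.
p(i) = \frac{K}{(i + q)^{\alpha}}, \qquad
K = \left( \sum_{j=1}^{N} \frac{1}{(j + q)^{\alpha}} \right)^{-1}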
Fig. 4. The impact of the cache on the performance of clients. (CDFs of (a) download speed (KB/s), (b) upload speed (KB/s), and (c) maximum number of connected peers, each with and without pCache.)

Several sanity checks were performed and passed. First, all downloaded objects passed the BitTorrent checksum test. Second, the total number of download sessions that completed was almost the same for the clients with and without the cache. Third, clients behind the cache were regular desktops participating in the campus network, and other non-P2P network applications were running on them. These applications included web browsers, user authentication modules using a centralized database (LDAP), and file system backup applications. All applications worked fine through the cache. Finally, a significant portion of the total traffic was served from the cache: up to 90%. This means that the cache identified, stored, and served P2P traffic. This all happened transparently, without changing the P2P protocol or configuring the client software. Also, since this is BitTorrent traffic, the piece-length inference algorithm behaved as expected, and our cache did not interfere with the incentive scheme of BitTorrent. We note that the unusually high 90% byte hit rate seen in our experiments is due to the limited number of clients that we have; the clients did not generate enough traffic to fill the whole storage capacity of the cache. In a full-scale deployment of the cache, the byte hit rate is expected to be smaller, in the range of 30 to 60% [11], which would still yield significant savings in bandwidth given the huge volume of P2P traffic.

7.2 Performance of P2P Clients

Next, we analyze the performance of the clients and the impact on the P2P network. We are interested in the download speed, the upload speed, and the number of peers to which each of our clients is connected. These statistics were collected every 20 seconds from all 20 clients. The download (upload) speed was measured as the number of bytes downloaded (uploaded) during the past period divided by the length of that period. Then, the average download (upload) speed for each completed session was computed. For the number of connected peers, we took the maximum number of peers that each of our clients was able to connect to during the sessions. We then used the maximum among all clients (in the same subnet) to compare the connectivity of clients with and without the cache.

The results are summarized in Fig. 4. Fig. 4(a) shows that clients behind the cache achieved much higher download speeds than other clients. This is expected, as a portion of the traffic comes from the cache. Thus, the cache would benefit the P2P clients, in addition to benefiting the ISP deploying the cache by saving WAN traffic. Furthermore, the increased download speed did not require clients behind the cache to significantly increase their upload speeds, as shown in Fig. 4(b). This is also beneficial for both clients and the ISP deploying the cache, while it does not hurt the global BitTorrent network, as clients behind the cache still upload to other external peers. The difference between the upload and download speeds can be attributed mostly to the contributions of the cache. Fig. 4(c) demonstrates that the presence of the cache did not reduce the connectivity of clients behind the cache; they have roughly the same number of connected peers. This is important for the local clients as well as the whole network, because reduced connectivity could lead to decreased availability of peers and the content stored on them. Finally, we notice that the cache may add some latency at the beginning of a download session. This latency was on the order of milliseconds in our experiments, which is negligible in P2P systems where sessions last for minutes if not hours.
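The per-session statistics described above follow directly from the 20-second samples. The sketch below shows the computation; the log format (one tuple of byte and peer counters per measurement period) is our assumption, and the names are illustrative.

    def session_stats(samples, interval=20.0):
        # samples: one (bytes_down, bytes_up, connected_peers) tuple per
        # 20-second measurement period of a completed download session.
        duration = len(samples) * interval
        avg_down = sum(s[0] for s in samples) / duration / 1024  # KB/s
        avg_up = sum(s[1] for s in samples) / duration / 1024    # KB/s
        max_peers = max(s[2] for s in samples)
        return avg_down, avg_up, max_peers

With equal-length periods, dividing the total bytes by the session duration is equivalent to averaging the per-period speeds.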
7.3 Performance of the Storage System

Experimental Setup. We compare the performance of the proposed storage management system against the storage system used in the widely-deployed Squid proxy cache [31], after modifying it to serve the partial hits needed for P2P traffic. This is done to answer the question: "What happens if we were to use the already-deployed web caches for P2P traffic?" To the best of our knowledge, no other storage systems have been designed for P2P proxy caches, and as described in Sec. 2, the storage systems proposed in [21]–[24] for web proxy caches and in [25]–[27] for multimedia caches are not readily applicable to P2P proxy caches. Therefore, we could not conduct a meaningful comparison with them.

We implemented the Squid system, which organizes files into a two-level directory structure: 16 directories in the first level and 256 subdirectories in every first-level directory. This is done to reduce the number of files in each directory in order to accelerate the lookup process and to reduce the number of i-nodes touched during the lookup. We implemented two versions of our proposed storage system, which are denoted by pCache/Fs and pCache/Raw in the plots. pCache/Fs is implemented on top of the ext2 Linux file system by opening a large file that is never closed. To write a segment, we move the file pointer to the appropriate offset and then write that segment to the file. The offset is determined by our disk block allocation algorithm. Reading a segment is done in a similar way. The actual writing/reading to the disk is performed by the ext2 file system, which might perform disk buffering and pre-fetching. pCache/Raw is implemented on a raw partition, and it has complete control of reading/writing blocks from/to the disk. pCache/Raw uses direct disk operations.

In addition to the Squid storage system, we also implemented a storage system that uses a multi-directory structure, where segments of the same object are grouped together under one directory. We denote this storage system as Multi-dir. This is similar to previous proposals on clustering correlated web objects together for better disk performance, such as in [21]. We extended Multi-dir for P2P traffic to maintain the correlation among segments of the same object. Finally, we also implemented the Cyclic Object Storage System (COSS), which has recently been proposed to improve the performance of the Squid proxy cache [33, Sec. 8.4]. COSS sequentially stores web objects in a large file, and wraps around when it reaches the end of the file. COSS overwrites the oldest objects once the disk space is used up.

We isolate the storage component from the rest of the pCache system. All other components (including the traffic identification, transparent proxy, and connection splicing modules) are disabled to avoid any interactions with them. We also avoid interactions with the underlying operating system by installing two hard drives: one for the operating system and the other dedicated to the storage system of pCache. No processes other than pCache have access to the second hard drive. The capacity of the second hard drive is 230 GB, and it is formatted into 4 KB blocks. The replacement policy used in the experiments is segment-based LRU, where the least-recently used segment is evicted from the cache.

We subject a specific storage system (e.g., pCache/Fs) to a long trace of 1.5 million segment requests; the trace collection and processing are described below. During the execution of the trace, we periodically collect statistics on the low-level operations performed on the disk. These statistics include the total numbers of read operations, write operations, and head movements (seek length in sectors). We also measure the trace completion time, which is the total time it takes the storage system to process all requests in the trace. These low-level disk statistics are collected using the blktrace tool, which is an optional module in Linux kernels. Then, the whole process is repeated for a different storage system, but with the same trace of segment requests. During these experiments, we fix the total size of memory buffers at 0.5 GB. In addition, because they are built on top of the ext2 file system, Squid and pCache/Fs have access to the disk cache internally maintained by the operating system. Due to the intensive I/O nature of these experiments, Linux may substantially increase the size of the disk cache by stealing memory from other parts of the pCache system, which can degrade the performance of the whole system. To solve this problem, we modified the Linux kernel to limit the size of the disk cache to a maximum of 0.5 GB.

P2P Traffic Traces. We needed to stress the storage system with a large number of requests. We also needed the stream of requests to be reproducible, such that comparisons among different storage systems are fair. To satisfy these two requirements, we collected traces from an operational P2P network instead of just generating synthetic traces. Notice that running the cache with real P2P clients (as in the previous section) would not give us enough traffic to stress the storage system, nor would it create identical situations across repeated experiments with different storage systems, because of the high dynamics in P2P systems. To collect this trace, we modified an open-source Gnutella client to run in super-peer mode and to simultaneously connect to up to 500 other super-peers in the network (the default number of connections is up to only 16). Gnutella was chosen because it has a two-tier structure, where ordinary peers connect to super-peers and queries/replies are forwarded among super-peers for several hops. This enabled us to passively monitor the network without injecting traffic. Our monitoring super-peer ran continuously for several months, and because of its high connectivity it was able to record a large portion of the query and reply messages exchanged in the Gnutella network. It observed query/reply messages in thousands of ASes across the globe, accounting for more than 6,000 terabytes of P2P traffic. We processed the collected data to separate object requests coming from individual ASes, using the IP addresses of the receiving peers and an IP-to-AS mapping tool. We chose one large AS (AS 9406) with a significant amount of P2P traffic. The created trace contains: the timestamp of the request, the ID of the requested object, the size of the object, and the IP address of the receiver. Some of these traces were used in our previous work [11], and are available online [14].

The trace collected from Gnutella provides realistic object sizes, relative popularities, and temporal correlation among requests in the same AS. However, it has a limitation: information about how objects are segmented and when exactly each segment is requested is not known. We could not obtain this information because it is held by the communicating peers and transferred directly between them without going through super-peers. To mitigate this limitation, we divided objects into segments with typical sizes, which we can know either from the protocol specifications or from analyzing a small sample of files. In addition, we generated the request times for segments as follows. The timestamp in the trace marks the start of downloading an object. A completion time for this object is randomly generated to represent peers with different network connections and the dynamic conditions of the P2P network. The completion time can range from minutes to hours. Then, the download time of each segment of the object is randomly scheduled between the start and end times of downloading that object. This random scheduling is not unrealistic, because it mirrors what actually happens in common P2P systems such as BitTorrent and Gnutella. Notice that with this random scheduling, requests for segments from different objects are interleaved, which is also realistic. We sort all requests for segments based on their scheduled times and take the first 1.5 million requests to evaluate the storage systems. With this number of requests, some of our experiments took two days to finish.
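The segment request-time generation just described can be sketched as follows. The completion-time range and the segment size are parameters we chose for illustration; the text specifies only "minutes to hours" and "typical" segment sizes.

    import random

    def schedule_segments(start_time, object_size, seg_size,
                          min_dur=300, max_dur=7200):
        # Draw a completion time between "minutes" (min_dur) and "hours"
        # (max_dur) after the trace timestamp marking the download start.
        end_time = start_time + random.uniform(min_dur, max_dur)
        num_segs = (object_size + seg_size - 1) // seg_size
        # Each segment request is scheduled uniformly at random within the
        # download interval, so requests from different objects interleave.
        return [(seg, random.uniform(start_time, end_time))
                for seg in range(num_segs)]

All per-object request lists are then merged and sorted by scheduled time, and the first 1.5 million requests form the evaluation trace.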
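Similarly, the read/write path of pCache/Fs described at the beginning of this subsection can be pictured in a few lines. This sketch uses positional I/O in place of the move-pointer-then-access sequence and abstracts away the disk block allocation algorithm that supplies the offset; it is illustrative, not the actual implementation.

    import os

    class PCacheFs:
        def __init__(self, path):
            # One large file, opened once and never closed; all segments
            # live inside it at offsets chosen by the allocation algorithm.
            self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

        def write_segment(self, offset, data):
            # The ext2 file system performs the actual disk write,
            # possibly with buffering.
            os.pwrite(self.fd, data, offset)

        def read_segment(self, offset, length):
            # ext2 may also pre-fetch adjacent blocks on reads.
            return os.pread(self.fd, length, offset)

pCache/Raw differs in that it opens a raw partition and issues direct disk operations (e.g., with O_DIRECT on Linux, which bypasses the page cache but requires aligned buffers).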
Fig. 6. Comparing the performance of the proposed storage management system on top of a raw disk partition (pCache/Raw) and on top of a large file (pCache/Fs) versus the Squid and other storage systems. ((a) number of read operations, (b) head movements (seek length in sectors), and (c) completion time in hours, each versus the number of requests.)

Main Results. Some of our results are presented in Fig. 6 for a segment size of 0.5 MB. Results for other segment sizes are similar. We notice that the disk I/O operations produced by the COSS storage system are not reported in Figs. 6(a) and 6(b). This is because COSS has a built-in replacement policy that leads to fewer cache hits, and thus fewer read operations and many more write operations, than the other storage systems. Since COSS results in a different disk access pattern than the other storage systems, we cannot conduct a meaningful low-level comparison between them in Figs. 6(a) and 6(b). We discuss the replacement policy of COSS in more detail below. Fig. 6 demonstrates that the proposed storage system is much more efficient in handling P2P traffic than the other storage systems, including Squid, Multi-dir, and COSS. The efficiency is apparent in all aspects of the disk I/O operations: the proposed system issues smaller numbers of read and write operations, and it requires a much smaller number of disk head movements. Because of the efficiency in each element, the total time required to complete the whole trace (Fig. 6(c)) under the proposed system is less than 5 hours, while it is 25 hours under the Squid storage system. That is, the average service time per request using pCache/Fs or pCache/Raw is almost one fifth of that using Squid. This experiment also shows that COSS and Multi-dir improve the performance of the Squid system, but they are still inferior to our pCache storage system. Notice also that, unlike the case for Squid, Multi-dir, and COSS, the average time per request of our proposed storage system does not rapidly increase with the number of requests. Therefore, the proposed storage system can support more concurrent requests, and it is more scalable than the other storage systems, including the widely-deployed Squid storage system.

Because it is optimized for web traffic, Squid anticipates a large number of small web objects to be stored in the cache, which justifies the creation of many subdirectories to reduce the search time. Objects in P2P systems, on the other hand, have various sizes and can be much larger. Thus, the cache stores a smaller number of P2P objects, which means that maintaining many subdirectories may actually add more overhead to the storage system. In addition, as Fig. 6(b) shows, Squid requires a larger number of head movements compared to our storage system. This is because Squid uses a hash function to determine the subdirectory of each segment, which destroys the locality among segments of the same object. This leads to many head jumps between subdirectories to serve segments of the same object. In contrast, our storage system performs segment merging in order to store segments of the same object near each other on the disk, which reduces the number of head movements.

Fig. 6(b) also reveals that Multi-dir reduces the number of head movements compared to the Squid storage system. This is because the Multi-dir storage system exploits the correlation among segments of the same P2P object by storing these segments in one directory. This segment placement structure enables the underlying file system to cluster correlated segments close together, as most file systems cluster files in the same subdirectory. Nevertheless, the overhead of creating many files/directories and maintaining their i-nodes is non-trivial. This can be observed in Fig. 6(a), where the average number of read operations of Multi-dir is slightly smaller than that of pCache/Raw while the number of cached objects is small (in the warm-up period), but rapidly increases once more objects are cached. Multi-dir issues fewer read operations than pCache/Raw in the warm-up period because of the Linux disk buffer and prefetch mechanism; this advantage quickly diminishes as the number of i-nodes increases.

We observe that COSS performs better than Squid but worse than Multi-dir in Fig. 6(c). The main cause of its poor performance is that COSS uses the large file to mimic an LRU queue for object replacement, which requires it to write a segment to the disk whenever there is a cache hit. This not only increases the number of write operations but also reduces the effective disk utilization, because a segment may be stored on the disk several times even though only one of these copies (the most recent one) is accessible. We plot the effective disk utilization of all considered storage systems in Fig. 7, where all storage systems except COSS employ segment-based LRU with high/low watermarks at 90% and 80%, respectively. In this figure, we observe that the disk utilization for COSS is less than 50% of that of the other storage systems. Since COSS tightly couples the object replacement policy with the storage system, it is not desirable for P2P proxy caches, because their performance may benefit from replacement policies that capitalize on the unique characteristics of P2P traffic, such as the ones observed in [6], [8].
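For concreteness, the segment-based LRU with high/low watermarks used in these experiments (by all systems except COSS) can be sketched as follows. This captures only the policy's logic, under our naming, not the pCache implementation.

    from collections import OrderedDict

    class SegmentLru:
        def __init__(self, capacity, high=0.90, low=0.80):
            self.high, self.low = high * capacity, low * capacity
            self.used = 0
            self.segments = OrderedDict()  # segment id -> size, LRU order

        def access(self, seg_id):
            hit = seg_id in self.segments
            if hit:
                self.segments.move_to_end(seg_id)  # mark as recently used
            return hit

        def insert(self, seg_id, size):
            self.segments[seg_id] = size
            self.used += size
            # Once usage exceeds the high watermark, evict least-recently
            # used segments until usage drops below the low watermark.
            if self.used > self.high:
                while self.used > self.low:
                    _, freed = self.segments.popitem(last=False)
                    self.used -= freed

The watermark gap amortizes eviction cost: eviction runs occasionally in batches rather than on every insertion.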
Fig. 7. Disk utilization for various storage systems. (Disk utilization (%) versus number of requests.)

Additional Results and Comments. We analyzed the performance of pCache/Fs and pCache/Raw, and compared them against each other. As mentioned before, pre-fetching by the Linux file system may negatively impact the performance of pCache/Fs. We verified this by disabling the pre-fetching using the hdparm utility. By comparing pCache/Fs versus pCache/Raw using variable segment sizes, we found that pCache/Fs outperforms pCache/Raw when the segment sizes are small (16 KB or smaller). pCache/Raw, however, is more efficient for larger segment sizes. This is because when segment sizes are small, the buffering performed by the Linux file system helps pCache/Fs by grouping several small read/write operations together. pCache/Raw bypasses this buffering and uses direct disk operations. The benefit of buffering diminishes as the segment size increases. Therefore, we recommend using pCache/Fs if pCache will mostly serve P2P traffic with small segments, as in BitTorrent. If the traffic is dominated by larger segments, as in the case of Gnutella, pCache/Raw is recommended.

7.4 Scalability of pCache

To analyze the scalability of pCache, ideally we would deploy thousands of P2P clients and configure them to incrementally request objects from different P2P networks. While this would test the whole system, it was not possible in our university setting, because it would have consumed too much bandwidth. Another, less ideal, option is to create many clients and connect them in local P2P networks. However, emulating the highly dynamic behavior of realistic P2P networks is not straightforward, and would make our results questionable no matter what we do. Instead, we focus on the scalability of the slowest part, the bottleneck, of pCache, which is the storage system according to previous works in the literature, such as [34]. All other components of pCache perform simple operations on data stored in main memory, which is orders of magnitude faster than the disk. We acknowledge that this is only a partial scalability test, but we believe it is fairly representative.

We use the large trace of 1.5 million requests from the previous section. We run the cache for about seven hours, and we stress the storage system by continuously submitting requests. We measure the average throughput (in Mbps) of the storage system every eight minutes, and we plot the average results in Fig. 8. We ignore the first hour (warm-up period), because the cache was empty and few data swapping operations between memory and disk occurred. The figure shows that in the steady state an average throughput of more than 300 Mbps can easily be provided by our cache running on a commodity PC. This throughput is probably more than enough for the majority of customer ASes, such as universities and small ISPs, because the total capacities of their Internet access links are typically smaller than the maximum throughput that can be achieved by pCache. For large ISPs with Gbps links, a high-end server with a high-speed disk array could be used. Notice that our pCache code is not highly optimized for performance. Notice also that the 300 Mbps is the throughput for P2P traffic only, not the whole Internet traffic, and it represents worst-case performance because the disk is continuously stressed.

7.5 Evaluation of the Inference Algorithm

We empirically evaluate the algorithm proposed in Sec. 4 for inferring the piece length in BitTorrent traffic. We deployed ten CTorrent clients on the machines behind the cache. Each client was given a few thousand torrent files. For each torrent file, the client contacted a BitTorrent tracker to get potential peers; the client re-connected to the tracker to obtain more peers if the number of peers dropped below 10. After learning about the peers, the client started issuing requests and receiving traffic. The cache logged all requests during all download sessions. Many of the sessions did not have any traffic exchanged, because there were no longer enough active peers trading pieces of those objects, which is normal in BitTorrent. These sessions were dropped after a timeout period. We ran the experiment for several days, and the cache collected information from more than 2,100 download sessions. We applied the inference algorithm to the logs, and we compared the estimated piece lengths against the actual ones (known from the torrent files).

The results of this experiment are presented in Fig. 9. The x-axis shows the number of samples taken to infer the piece length, and the y-axis shows the corresponding accuracy achieved. The accuracy is computed as the number of correct inferences over the total number of estimations, which is 2,100. We plot the theoretical results computed from Eqs. (1) and (2) and the actual results achieved by our algorithm, with and without the improvement heuristic. The figure shows that the performance of our basic algorithm on actual traffic is very close to the theoretical results, which validates our analysis in Sec. 4. In addition, the simple heuristic improved the inference algorithm significantly. The improved algorithm infers the piece length correctly in about 99.7% of the cases, using only 6 samples on average. From further analysis, we found that the samples used by the algorithm correspond to less than 3% of each object on average, which shows how quickly the inference is done relative to the object's size. Keeping this portion small is