BITTORRENT
                                     Seminar Report

                   Submitted in partial fulfilment of the requirements
                                   for the award of the degree of
                                     Bachelor of Technology
                                                 in
                              Computer Science Engineering
                                                 of
                    Cochin University Of Science And Technology
                                                by

                                   SHYAM PRAKASH
                                          (12080079)




                        DIVISION OF COMPUTER SCIENCE
                              SCHOOL OF ENGINEERING
        COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
                                         KOCHI-682022
                                       SEPTEMBER 2010



Division of Computer Engineering                                         Page 1
DIVISION OF COMPUTER SCIENCE
                             SCHOOL OF ENGINEERING
     COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
                                     KOCHI-682022




                                    Certificate

          Certified that this is a bonafide record of the seminar entitled
                                    “BITTORRENT”
                            presented by the following student
                                   “SHYAM PRAKASH”


 of the VII semester, Computer Science and Engineering in the year 2010
     in partial fulfillment of the requirements in the award of Degree of
 Bachelor of Technology in Computer Science and Engineering of Cochin
                         University of Science and Technology.


    Ms. SHEKHA S                                         Dr. DAVID PETER S
    SEMINAR GUIDE                                         HEAD OF DIVISION




ACKNOWLEDGEMENT


I thank GOD almighty for guiding me throughout the seminar. I would like to thank all those
who have contributed to the completion of the seminar and helped me with valuable
suggestions for improvement.
I am extremely grateful to Dr. David Peter, Head of Division, Division of Computer
Science, for providing me with the best facilities and atmosphere for creative work, and for
his guidance and encouragement. I would like to thank my coordinator Mr. Sudeep Elayidom and
my seminar guide Ms. Shekha S, Lecturer, Division of Computer Science, for all the help and
support extended to me. I thank all staff members of my college and my friends for extending
their cooperation during my seminar.


Above all I would like to thank my parents without whose blessings, I would not have been
able to accomplish my goal.




                                                                   SHYAM PRAKASH




ABSTRACT


BitTorrent is the name of a peer-to-peer (P2P) file distribution protocol, and is
the name of a free software implementation of that protocol. The protocol was
originally designed and created by programmer Bram Cohen, and is now
maintained by BitTorrent Inc. BitTorrent is designed to distribute large amounts
of data widely without incurring the corresponding consumption in costly server
and bandwidth resources. CacheLogic suggests that BitTorrent traffic accounts
for 55% of all traffic on the Internet, while other sources are skeptical. The
original BitTorrent client was written in Python. Its source code, as of version
has been released under the BitTorrent Open Source License, which is a
modified version of the Jabber Open Source License. There are numerous
compatible clients, written in a variety of programming languages, and running
on a variety of computing platforms.




Table of contents

CHAPTER 1
INTRODUCTION
   1.1 OVERVIEW
   1.2 HISTORY
CHAPTER 2
BITTORRENT AND OTHER APPROACHES
   2.1 OTHER P2P METHODS
   2.2 A TYPICAL HTTP FILE TRANSFER
   2.3 THE DAP METHOD
   2.4 THE BITTORRENT APPROACH
CHAPTER 3
WORKING OF BITTORRENT
CHAPTER 4
TERMINOLOGY
CHAPTER 5
ARCHITECTURE OF BITTORRENT
   5.1 METAINFO FILE
      5.1.1 BENCODING
      5.1.2 METAINFO FILE DISTRIBUTION
   5.2 TRACKER
      5.2.1 SCRAPING
   5.3 PEERS
      5.3.1 PIECE SELECTION
      5.3.2 RANDOM FIRST PIECE
      5.3.3 RAREST FIRST
      5.3.4 ENDGAME MODE
      5.3.5 PEER DISTRIBUTION
      5.3.6 CHOKING
      5.3.7 OPTIMISTIC UNCHOKING
      5.3.8 COMMUNICATION BETWEEN PEERS
      5.3.9 HANDSHAKING
      5.3.10 MESSAGE STREAM
   5.4 DATA
      5.4.1 PIECE SIZE
   5.5 BITTORRENT CLIENTS
   5.6 SUB PROTOCOLS
      5.6.1 THP: TRACKER HTTP PROTOCOL
      5.6.2 PWP: PEER WIRE PROTOCOL
CHAPTER 6
VULNERABILITIES OF BITTORRENT
   6.1 ATTACKS ON BITTORRENT
      6.1.1 POLLUTION ATTACK
      6.1.2 DDOS ATTACK
      6.1.3 BANDWIDTH SHAPING
   6.2 SOLUTIONS
      6.2.1 POLLUTION ATTACK
      6.2.2 DDOS ATTACK
      6.2.3 BANDWIDTH SHAPING
CHAPTER 7
CONCLUSION
CHAPTER 8
REFERENCES




Chapter 1
                                   INTRODUCTION


1.1 Overview

        BitTorrent is a peer-to-peer file sharing protocol and one of the most common
protocols for distributing large files. It differs from conventional file transfer methods in
that a user does not merely obtain a file but also shares it with others. A user can obtain
multiple files simultaneously without any considerable loss of transfer rate. While a user is
obtaining a file, the pieces already received are offered to other users who are trying to
obtain the same file, so every downloader takes part in the distribution. Thus the larger the
number of users who want a file, the more sources there are for it and the more easily it can
be transferred between them.
        The BitTorrent protocol makes it possible to distribute large amounts of data without
the need for a high-capacity server and expensive bandwidth. This is its most striking
feature. The transfer of a file never depends on a single source holding the original copy;
instead the load is distributed across a number of sources. Not only the sources but also
the clients who want to obtain the file are involved in the transfer. This spreads the load
across the users and partially frees the main source, reducing the network traffic imposed on
it. Because of this, BitTorrent has become one of the most popular file transfer mechanisms
in today’s world. Though the mechanism itself is not as simple as an ordinary file transfer
protocol, it has gained its popularity because of the sharing policy that it imposes on its
users. Surveys made by various organizations have reported that 35% of overall internet
traffic is due to BitTorrent, which suggests that a very large volume of files is
transferred and shared through it.



1.2 History
        BitTorrent was created by a programmer named Bram Cohen. After inventing this
new technology he said, "I decided I finally wanted to work on a project that people would
actually use, would actually work and would actually be fun". Before BitTorrent, there
were other techniques for file sharing, but they did not utilize bandwidth effectively, and
bandwidth became a bottleneck. Other peer-to-peer file sharing systems such as Napster and
Kazaa did involve users in the sharing process, but they required only a subset of users to
share files, not all. Most users could simply download without being required to upload,
which put a heavy network load on the original sources and on a small number of users while
the bandwidth of the remaining users went unused. Cohen’s main intention was to make
maximum use of the bandwidth of all the users involved in sharing a file: every person who
wants to download a file also has to contribute to uploading it. This novel concept gave
birth to a new peer-to-peer file sharing protocol called BitTorrent.
        Cohen invented the protocol in April 2001. The first usable version of BitTorrent
appeared in October 2002, but the system needed a lot of fine-tuning. BitTorrent really
started to take off in early 2003, when it was used to distribute a new version of Linux and
fans of Japanese anime started relying on it to share cartoons. The most important property
of the protocol is that it makes it possible for people with limited bandwidth to supply
very popular files. This means that if you are a small software developer you can put up a
package, and if it turns out that millions of people want it, they can get it from each other
in an automated way. Thus bandwidth, which used to be a bottleneck in previous systems, no
longer poses a problem.




Chapter 2
                  BITTORRENT AND OTHER APPROACHES

2.1 Other P2P methods
    The most common method by which files are transferred on the Internet is the
client-server model. A central server sends the entire file to each client that requests it;
this is how both HTTP and FTP work. The clients speak only to the server, and never to each
other. The main advantages of this method are that it is simple to set up and that the files
are usually available, since the servers tend to be dedicated to the task of serving and are
always on and connected to the Internet. However, this model has a significant problem with
files that are large or very popular, or both: it takes a great deal of bandwidth and server
resources to distribute such a file, since the server must transmit the entire file to each
client. Perhaps you have tried to download a demo of a newly released game, or CD images
of a new Linux distribution, and found that all the servers report "too many users," or that
there is a long queue to wait through. The concept of mirrors partially addresses this
shortcoming by distributing the load across multiple servers. But it requires a lot of
coordination and effort to set up an efficient network of mirrors, and it is usually feasible
only for the busiest of sites.
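
        The bandwidth argument above can be made concrete with a little arithmetic. The
sketch below is only an illustration: the function names, the 700 MB file size and the
client and mirror counts are assumptions chosen for the example, and the even split across
mirrors is an idealisation.

```python
def server_upload_mb(file_size_mb: int, num_clients: int) -> int:
    """Total data a lone server must send when every client fetches the whole file."""
    return file_size_mb * num_clients


def per_mirror_upload_mb(file_size_mb: int, num_clients: int, num_mirrors: int) -> float:
    """Load on each mirror if requests were spread perfectly evenly (idealised)."""
    return server_upload_mb(file_size_mb, num_clients) / num_mirrors


if __name__ == "__main__":
    # Hypothetical numbers: a 700 MB CD image requested by 10,000 clients.
    print(server_upload_mb(700, 10_000), "MB from a single server")
    print(per_mirror_upload_mb(700, 10_000, 20), "MB from each of 20 mirrors")
```

The server's load grows linearly with the number of clients, and mirrors only divide that
total; they do not reduce it.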
        Another method of transferring files has become popular recently: the peer-to-peer
network, with systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In most of these
networks, ordinary Internet users trade files by directly connecting one-to-one. The advantage
here is that files can be shared without having access to a proper server, and because of this
there is little accountability for the contents of the files. Hence, these networks tend to be
very popular for illicit files such as music, movies, pirated software, etc. Typically, a
downloader receives a file from a single source; however, the newest versions of some clients
allow downloading a single file from multiple sources for higher speeds. The problem
of popular downloads discussed above is somewhat mitigated, because there is a greater
chance that a popular file will be offered by a number of peers. The breadth of files available
tends to be fairly good, though download speeds for obscure files tend to be low. Another
common problem associated with these systems is the significant protocol
overhead for passing search queries amongst the peers, and the number of peers that one can
reach is often limited as a result. Partially downloaded files are usually not available to other
peers, although some newer clients may offer this functionality. Availability is generally
dependent on the goodwill of the users, to the extent that some of these networks have tried to
enforce rules or restrictions regarding send/receive ratios.
        Use of the Usenet binary newsgroups is yet another method of file distribution, one
that is substantially different from the other methods. Files transferred over Usenet are
often subject to minuscule windows of opportunity. The typical retention time of binary news
servers is often as low as 24 hours, and having a posted file available for a week is
considered a long time. However, the Usenet
model is relatively efficient, in that the messages are passed around a large web of peers from
one news server to another, and finally fanned out to the end user from there. Often the end
user connects to a server provided by his or her ISP, resulting in further bandwidth savings.
Usenet is also one of the more anonymous forms of file sharing, and it too is often used for
illicit files of almost any nature. Due to the nature of NNTP, a file's popularity has little to do
with its availability, and hence downloads from Usenet tend to be quite fast regardless of
content. The downsides of this method are that it involves a set of rules and procedures and
requires a certain amount of effort and understanding from the user. Patience is often
required to get a complete file, because big files are split into a huge number of smaller
posts. Finally, access to Usenet often must be purchased due to the extremely high volume of
messages in the binary groups.
        BitTorrent is closest to Usenet. It is best suited to newer files in which a number of
people are interested. Obscure or older files tend not to be available. Perhaps as the
software matures a more suitable means of keeping torrents seeded will emerge, but currently
the client is quite resource-intensive, making it cumbersome to share a number of files.
BitTorrent also deals well with files that are in high demand, especially compared to the other
methods.



2.2 A Typical HTTP File Transfer
  The most common type of file transfer is through an HTTP server. In this method, an HTTP
server listens for client requests and serves them. The client depends entirely on the
lone server that is providing the file, so the download is bound by the limitations of that
server. This kind of transfer is also subject to a single point of failure: if the server
crashes, the whole download process will cease. A single server can handle many clients and
serve the requested file simultaneously to all of them. The file is served as one single
piece, which means that if the download stops abruptly in the middle, the whole file has to
be downloaded again unless the server and client support resuming the transfer. The
BitTorrent protocol overcomes these shortcomings and is therefore more robust, which is why
many people choose it over this traditional method of file transfer.




        Fig 2.1 : HTTP/FTP File Transfer



2.3 The DAP method

        Download Accelerator Plus (DAP) is a popular download accelerator. DAP's key
features include the ability to accelerate downloading of files over the FTP and HTTP
protocols, to pause and resume downloads, and to recover from dropped internet connections.
On the Internet the same file is often hosted on numerous mirror sites, such as at universities
and on ISP servers. DAP immediately senses when a user begins downloading a file and
identifies available mirror sites that host the requested file. As soon as it is
triggered, DAP's client-side optimization begins to determine, in real time, which mirror
sites offer the fastest response for the specific user's location. The file is downloaded in
several segments simultaneously through multiple connections from the most responsive
server(s) and reassembled at the user's PC. This results in better utilization of the user's
available bandwidth.
        This ensures that each available mirror server is used to serve the users who benefit
most from it. This in turn balances the load among available servers across the entire
World Wide Web and reduces download times for users while allowing them to receive maximum
benefit from their available bandwidth. DAP's Resume functionality and its ability to
continue downloading even when one of the participating connections has dropped also provide
users with a more reliable download experience.
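
        The segmented download described above can be illustrated by how a file might be cut
into byte ranges, one per connection. This is a minimal sketch and not DAP's actual
implementation, which the source does not describe: the function names are hypothetical, and
the only outside fact relied on is the standard HTTP `Range: bytes=start-end` request header.

```python
from typing import Dict, List, Tuple


def split_into_ranges(file_size: int, num_segments: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte ranges that together cover the file."""
    if file_size <= 0 or num_segments <= 0:
        raise ValueError("file_size and num_segments must be positive")
    num_segments = min(num_segments, file_size)  # never create empty segments
    base, extra = divmod(file_size, num_segments)
    ranges = []
    start = 0
    for i in range(num_segments):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first segments
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> Dict[str, str]:
    """Header a client would send to request just one segment."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    for start, end in split_into_ranges(10, 3):
        print(range_header(start, end))
```

Each range could be requested from a different mirror, and a dropped connection costs only
the unfinished part of its own segment.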




2.4 The BitTorrent Approach

        In BitTorrent, the data to be shared is divided into many equal-sized portions called
pieces. Each piece is further sub-divided into equal-sized sub-pieces called blocks. All clients
interested in sharing this data are grouped into a swarm, and each swarm is managed by a
central entity called the tracker. BitTorrent has revolutionized the way files are shared
between people. It does not require a user to download a file completely from a single server.
Instead, a file can be downloaded from many users who are themselves downloading the
same file. A user who has the complete file, called the seed, initiates the distribution by
transferring pieces of the file to other users. Once a user has received some pieces of the
file, he can start sharing them with other users who are yet to receive those pieces. This
means a client does not depend completely on a server, and the overall load on the server
is reduced.
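
        The piece and block layout can be sketched in a few lines. The sizes below (256 KiB
pieces, 16 KiB blocks) are assumptions chosen for illustration, as are the function names;
note that the final piece and block are shorter whenever the data length is not an exact
multiple.

```python
PIECE_SIZE = 256 * 1024  # assumed piece size, for illustration only
BLOCK_SIZE = 16 * 1024   # assumed block size, for illustration only


def split(data: bytes, size: int) -> list:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pieces_and_blocks(data: bytes, piece_size: int = PIECE_SIZE,
                      block_size: int = BLOCK_SIZE) -> list:
    """Return a list of pieces, each piece being a list of its blocks."""
    return [split(piece, block_size) for piece in split(data, piece_size)]


if __name__ == "__main__":
    layout = pieces_and_blocks(bytes(1_000_000))
    print(len(layout), "pieces;", len(layout[0]), "blocks in the first piece")
```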




        Fig 2.2 : BitTorrent File Transfer


        Each client independently obtains a file, called a torrent, that contains the location of
the tracker along with a hash of each piece. Clients keep each other updated on the status of
their download. Clients download blocks from other (randomly chosen) clients who claim
they have the corresponding data. Accordingly, clients also send data that they have
previously downloaded to other clients. Once a client receives all the blocks for a given
piece, he can verify the hash of that piece against the hash provided in the torrent. Thus once
a client has downloaded and verified all pieces, he can be confident that he has the complete
data.
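
        Per-piece verification might look like the following sketch. It assumes SHA-1 as the
hash function, which is what the original BitTorrent protocol uses; the function names and
the tiny piece size in the usage lines are illustrative only.

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int) -> list:
    """Hashes a torrent creator would publish: one SHA-1 digest per piece."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(index: int, piece: bytes, expected_hashes: list) -> bool:
    """True if a downloaded piece matches the published hash for its index."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


if __name__ == "__main__":
    published = piece_hashes(b"abcdefghij", piece_size=4)
    print(verify_piece(0, b"abcd", published))  # intact piece
    print(verify_piece(0, b"abcX", published))  # corrupted piece
```

A piece that fails the check is discarded and requested again, so a peer sending bad data
can waste bandwidth but cannot corrupt the final file.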
        Both BitTorrent and DAP download files from multiple sources, and both divide files
into pieces. But BitTorrent has features that DAP lacks, which have made it the more popular
of the two. In BitTorrent the users participate actively in sharing files alongside the
servers; this is what makes the protocol unique. It also needs a dedicated server, called the
tracker, to handle the peers connected in the network. File transfer in DAP takes place
through the traditional HTTP or FTP protocols, which means that the transfer rate is always
limited by the servers’ bandwidth. If these servers are flooded with requests, they break
down and the transfer terminates. This is not the case in BitTorrent, since the process does
not depend on servers alone: the load is distributed across the network between peers and
servers. This makes BitTorrent far better than competing approaches such as DAP.




Chapter 3
                          WORKING OF BITTORRENT

        As previously explained, BitTorrent’s design makes it extremely efficient at
sharing large data files among interested peers. Under the hood, BitTorrent is a protocol
of some complexity, and modeling is useful for gaining a better understanding of its
performance. BitTorrent scales well and is a superior method for transferring and
disseminating files between interested peers while limiting free riding (peers who download
but do not upload) among those same peers. BitTorrent is based on a “tit for tat”
reciprocity agreement between users that ultimately results in Pareto efficiency. Pareto
efficiency is an important economic concept: an allocation of resources is Pareto efficient
when no peer can be made better off without making another worse off. Pareto efficiency is
the crown jewel of BitTorrent and is the driving force behind the protocol’s popularity and
success. Cohen’s vision of peers simultaneously helping each other by uploading and
downloading has been realized by the BitTorrent system.




        Fig 3.1 : A Typical BitTorrent System




        The protocol shares data through what are known as torrents. For a torrent to be alive
or active it must have several key components. These components include a
tracker server, a .torrent file, a web server where the .torrent file is stored and a complete
copy of the file being exchanged. Each of these components is described in the following
paragraphs.
        The file being exchanged is the essence of the torrent, and a peer holding a complete
copy is referred to as a seed. A seed is a peer in the BitTorrent network willing to share a
file with other peers in the network. Why seed owners choose to share their files is
debatable, as the BitTorrent protocol does not reward seed behavior. In fact, some
researchers believe the protocol lacks any incentive mechanism for encouraging seeds to
remain in torrents. Some argue that the lack of incentive in the protocol is a fundamental
design flaw that leads to the punishment of seeds.
        Peers lacking the file and seeking it from seeds are called leechers. While seeds only
upload to leechers, leechers may both download from seeds and upload to other leechers.
BitTorrent’s protocol is designed so leeching peers seek each other out for data transfer in a
process known as “optimistic unchoking”. Together seeds and leechers engaged in file
transfer are referred to as a swarm. A swarm is coordinated by a tracker server serving the
particular torrent and interested peers find the tracker via metadata known as a .torrent file.
Since BitTorrent has no built in search functionality, .torrent files are usually located via
HTTP through search engines or trackers.
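
        The text names optimistic unchoking without spelling out the mechanics. One common
reading is sketched below: a peer keeps uploading to the few interested peers that have
recently given it the best download rates, and additionally unchokes one other peer at
random to discover better partners. The slot count, the function name and the rate table
are assumptions made for illustration, not values taken from the source.

```python
import random


def choose_unchoked(download_rates, interested, regular_slots=3, rng=random):
    """Pick peers to upload to.

    download_rates: peer -> bytes/s recently received from that peer.
    interested: peers that currently want data from us.
    Returns (regular, optimistic): the reciprocated peers and one random extra.
    """
    by_rate = sorted(interested, key=lambda p: download_rates.get(p, 0), reverse=True)
    regular = by_rate[:regular_slots]   # tit for tat: reward the best uploaders
    rest = by_rate[regular_slots:]      # everyone else stays choked ...
    optimistic = rng.choice(rest) if rest else None  # ... except one random probe
    return regular, optimistic


if __name__ == "__main__":
    rates = {"a": 50, "b": 10, "c": 30, "d": 0, "e": 5}
    print(choose_unchoked(rates, list(rates)))
```

The random slot is what gives a newly arrived peer, which has nothing to offer yet, a chance
to receive its first pieces.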
        The first step in the BitTorrent exchange occurs when a peer downloads a .torrent file
from a server. The role of .torrent files is to provide the metadata that allows the protocol to
function; .torrent files can be viewed as surrogates for the files being shared. A .torrent
file contains the key data the protocol needs, including the file length, the assigned name,
hashing information about the file and the URL of the tracker coordinating the torrent
activity. Torrent files can be created using a program such as MakeTorrent, an open
source tool available under the free software model.
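
        The metadata listed above can be sketched as a small dictionary serialised with
bencoding, the encoding used for .torrent files. This is a simplified single-file sketch,
not a complete torrent creator: the key names (announce, info, name, length, piece length,
pieces) follow the published metainfo format, while the tracker URL, file name and piece
size are made-up example values.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name: str, data: bytes, piece_size: int, tracker_url: str) -> dict:
    """Assemble single-file metadata: tracker URL, name, length and piece hashes."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_size]).digest()
                      for i in range(0, len(data), piece_size))
    return {"announce": tracker_url,
            "info": {"name": name, "length": len(data),
                     "piece length": piece_size, "pieces": pieces}}


if __name__ == "__main__":
    meta = build_metainfo("demo.bin", b"x" * 10, 4, "http://tracker.example/announce")
    print(bencode(meta))
```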
        When a .torrent file is opened by the peer’s client software, the peer connects to
the tracker server responsible for coordinating activity for that specific torrent. The tracker
and client communicate by a protocol layered on top of HTTP. The tracker’s key role is to
coordinate peers seeking the same file; as Cohen envisioned, “The tracker’s responsibilities
are strictly limited to helping peers find each other”. In reality the tracker’s role is a bit more
complex, as many trackers collect data about peers engaged in a swarm. Additionally, some of
the newer tracker software being released has integrated the functions of the tracker and
.torrent server.
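
        Because the tracker protocol is layered on HTTP, a client's request to the tracker is
simply a GET with query parameters. The sketch below builds such a URL. The parameter names
(info_hash, peer_id, port, uploaded, downloaded, left, event) are those of the standard
tracker protocol; the tracker address, peer id and function name are hypothetical.

```python
from urllib.parse import urlencode


def announce_url(tracker_url: str, info_hash: bytes, peer_id: bytes, port: int,
                 uploaded: int, downloaded: int, left: int, event: str = None) -> str:
    """Build the GET URL a client would request to announce itself to the tracker."""
    params = {
        "info_hash": info_hash,    # 20-byte hash identifying the torrent
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # port on which this peer accepts connections
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,              # bytes still needed; 0 means this peer is a seed
    }
    if event:
        params["event"] = event    # e.g. "started", "completed" or "stopped"
    return tracker_url + "?" + urlencode(params)  # raw bytes are percent-encoded


if __name__ == "__main__":
    print(announce_url("http://tracker.example/announce", b"\x12" * 20,
                       b"-XX0001-123456789012", 6881, 0, 0, 1000, "started"))
```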
        Leechers and seeds are coordinated by the tracker server, and the peers periodically
update the tracker on their status, allowing the tracker to have a global view of the system.
The data monitored by the tracker can include peer IP addresses, the amount of data
uploaded and downloaded by specific peers, data transfer rates among peers, the percentage of
the total file downloaded, the length of time connected to the tracker, and the ratio of sharing
among peers. Usually a tracker coordinates multiple torrents, and the most popular trackers
are busy coordinating thousands of swarms simultaneously.
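
        The bookkeeping described above can be pictured as one table of peers per torrent. The
following in-memory sketch is a toy, not real tracker software: the class name, method names
and record fields are invented for illustration, and networking, peer expiry and error
handling are omitted.

```python
import random
import time


class ToyTracker:
    """Keeps one swarm (peer table) per torrent, keyed by the torrent's info hash."""

    def __init__(self):
        self.swarms = {}  # info_hash -> {peer_id: record}

    def announce(self, info_hash, peer_id, ip, port, uploaded, downloaded, left,
                 max_peers=50):
        """Record a peer's status and return contact details of other peers."""
        swarm = self.swarms.setdefault(info_hash, {})
        swarm[peer_id] = {"ip": ip, "port": port, "uploaded": uploaded,
                          "downloaded": downloaded, "left": left,
                          "last_seen": time.time()}
        others = [(r["ip"], r["port"]) for pid, r in swarm.items() if pid != peer_id]
        return random.sample(others, min(max_peers, len(others)))

    def stats(self, info_hash):
        """Count seeds (nothing left to download) and leechers in a swarm."""
        swarm = self.swarms.get(info_hash, {})
        seeds = sum(1 for r in swarm.values() if r["left"] == 0)
        return {"seeds": seeds, "leechers": len(swarm) - seeds}


if __name__ == "__main__":
    tracker = ToyTracker()
    tracker.announce(b"hash", "seed-1", "10.0.0.1", 6881, 0, 0, 0)
    print(tracker.announce(b"hash", "leech-1", "10.0.0.2", 6882, 0, 0, 500))
    print(tracker.stats(b"hash"))
```

The tracker never touches file data; it only answers the question of who else is in the
swarm.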
        It should be noted that .torrent files are not the actual file being shared; rather,
.torrent files are the metadata that allows trackers and peers to coordinate their
activities. As previously mentioned, the complete file is actually stored on peer seed nodes
and not the tracker server. Since .torrent files are small and require little space to store, one
server can easily host thousands of .torrent files without prohibitive server or bandwidth
requirements. Hosting a tracker does consume bandwidth, however,
especially if the tracker becomes popular and begins to see heavy usage. Regardless, the
tracker’s bandwidth requirements are much lower than those of hosting the complete files in a
traditional client-server model such as one would encounter with an FTP site.
        While trackers and .torrent files serve as mechanisms to assist the BitTorrent
protocol, the process of actually transferring data is handled by the peers engaged in the
swarm. Cohen saw “tit for tat” as the sole incentive measure necessary for the protocol’s
success. Peers seek tit for tat behavior from others and discourage free riding by a
“choke/unchoke” policy. This choke policy uses a process known as “optimistic unchoking” to
constantly seek other swarm peers who may have more beneficial connections to offer an
interested peer. There has been
some research on the tit for tat algorithm by modeling rational users whose behavior is then
studied. This work defined rational users as those peer nodes manipulating their client
software beyond default settings. The fact that many newer BitTorrent clients allow for
custom tweaking of specific upload or download speeds indicates that perhaps the original tit
for tat coding was too good, and thus detrimental to other peer node functions such as normal
HTTP traffic. Some BitTorrent FAQs recommend limiting uploads to approximately 80% of
known capacity, and personal tests indicate this strategy does benefit download speeds.
        The final important aspect of the BitTorrent protocol’s architecture is its use of a
“rarest piece first” algorithm when a peer begins a file download. The goal of the rarest
first algorithm is the uniform distribution of data across peers. (It should not be confused
with the “endgame mode”, which is a separate mechanism applied near the end of a download.)
A rarest first policy requires a seed to upload new file chunks (those not yet uploaded to a
swarm) to the newest peer connecting to a torrent. This policy encourages distribution of
the file further across peer nodes. The rarest first algorithm is an interesting aspect of
BitTorrent that, when combined with optimistic unchoking, may explain why the protocol has
achieved such success.


Chapter 4
                                    TERMINOLOGY

        These are the common terms that one would come across while making a typical
BitTorrent file transfer.


                Torrent : this refers to the small metadata file you receive from the web server
                (the one that ends in .torrent.) Metadata here means that the file contains
                information about the data you want to download, not the data itself.
                Peer : A peer is another computer on the internet that you connect to and
                transfer data. Generally a peer does not have the complete file.
                Leeches : They are similar to peers in that they won’t have the complete file.
                But the main difference between the two is that a leech will not upload once
                the file is downloaded.
                Seed : A computer that has a complete copy of a certain torrent. Once a client
                downloads a file completely, he can continue to upload the file, which is called
                seeding. This is a good practice in the BitTorrent world since it allows other
                users to obtain the file easily.
                Reseed : When there are zero seeds for a given torrent, then eventually all the
                peers will get stuck with an incomplete file, since no one in the swarm has the
                missing pieces. When this happens, a seed must connect to the swarm so that
                those missing pieces can be transferred. This is called reseeding.
                Swarm : The group of machines that are collectively connected for a particular
                file.
                Tracker : A server on the Internet that acts to coordinate the action of
                BitTorrent clients. The clients are in constant touch with this server to know
                about the peers in the swarm.
                Share ratio : This is the ratio of the amount of data uploaded to the amount
                downloaded. A ratio of 1 means that one has uploaded the same amount of a file
                that has been downloaded.
                Distributed copies : Sometimes the peers in a swarm will collectively have a
                complete file. Such copies are called distributed copies.



                Choked : It is a state of an uploader where he does not want to send anything
                on his link. In such cases, the connection is said to be choked.
                Interested : This is the state of a downloader which suggests that the other end
                has some pieces that the downloader wants. Then the downloader is said to be
                interested in the other end.
                Snubbed : If the client has not received anything after a certain period, it
                marks a connection as snubbed, in that the peer on the other end has chosen
                not to send in a while.
                Optimistic unchoking : Periodically, the client shakes up the list of uploaders
                and tries sending on different connections that were previously choked, and
                choking the connections it was just using. This is called optimistic unchoking.




Chapter 5
                     ARCHITECTURE OF BITTORRENT


    The BitTorrent protocol can be split into the following five main components:

        Metainfo File - A file which contains all details necessary for the protocol to operate.
        Tracker - A server which helps manage the BitTorrent protocol.
        Peers - Users exchanging data via the BitTorrent protocol.
        Data - The files being transferred across the protocol.
        Client - The program which sits on a peer's computer and implements the protocol.

    Peers use TCP (Transmission Control Protocol) to communicate and send data. This protocol
is preferable over other protocols such as UDP (User Datagram Protocol) because TCP
guarantees reliable and in-order delivery of data from sender to receiver. UDP cannot give
such guarantees, and data can become scrambled, or lost altogether.




                               Fig 5.1 : Architecture of a BitTorrent System

The tracker allows peers to find out which other peers are participating in a torrent, and
allows them to begin communication. Peers communicate with the tracker in plain text via HTTP
(Hypertext Transfer Protocol). Fig 5.1 illustrates how peers interact with each other, and
also communicate with a central tracker.




5.1 Metainfo File

    When someone wants to publish data using the BitTorrent protocol, they must create a
metainfo file. This file is specific to the data they are publishing, and contains all the
information about a torrent, such as the data to be included, and the URL of the tracker to
connect to. A tracker is a server which 'manages' a torrent, and is discussed in Section 5.2.
The file is given a '.torrent' extension, and the data is extracted from the file by a
BitTorrent client. This is a program which runs on the user's computer, and implements the
BitTorrent protocol. Every metainfo file must contain the following information (or 'keys'):

    •   info: A dictionary which describes the file(s) of the torrent: either a single file,
        or the directory structure for multiple files. The SHA-1 hashes of every data piece
        are stored here.
    •   announce: The announce URL of the tracker as a string

The following are optional keys which can also be used:

    •   announce-list: Used to list backup trackers
    •   creation date: The creation time of the torrent by way of UNIX time stamp (integer
        seconds since 1-Jan-1970 00:00:00 UTC)
    •   comment: Any comments by the author
    •   created by: Name and Version of programme used to create the metainfo file

These keys are structured in the metainfo file as follows:



{'info': {'piece length': 131072, 'length': 38190848L, 'name':
'Cory_Doctorow_Microsoft_Research_DRM_talk.mp3', 'pieces':
'\xcb\xfaz\r\x9b\xe1\x9a\xe1\x83\x91~\xed@.....'}, 'announce':
'http://tracker.var.cc:6969/announce', 'creation date': 1089749086L}




Instead of transmitting the keys in plain text format, the keys contained in the metainfo
file are encoded before they are sent. Encoding is done using a BitTorrent-specific method
known as 'bencoding'.




5.1.1 Bencoding :

        Bencoding is used by BitTorrent to send loosely structured data between the BitTorrent
client and a tracker. Bencoding supports byte strings, integers, lists and dictionaries.
Bencoding uses the beginning delimiters 'i' / 'l' / 'd' for integers, lists and dictionaries
respectively. Ending delimiters are always 'e'. Delimiters are not used for byte strings,
which are prefixed with their length instead.

Bencoding Structure:

    •   Byte Strings : <string length in base ten ASCII> : <string data>
    •   Integers: i<base ten ASCII>e
    •   Lists: l<bencoded values>e
    •   Dictionaries: d<bencoded string><bencoded element>e

        Negative integers are allowed, but prefixing the number with a zero is not permitted
(for example, i03e is invalid). However, the integer zero itself, i0e, is allowed.

Examples of bencoding:



4:spam // represents the string "spam"
i3e // represents the integer "3"
l4:spam4:eggse // represents the list of two strings: ["spam","eggs"]
d4:spaml1:a1:bee // represents the dictionary {"spam" => ["a" , "b"] }
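
The bencoding rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names bencode and bdecode are chosen here for the example, text strings are assumed to be UTF-8, and dictionary keys are written in sorted order, which is an assumption drawn from the usual protocol description rather than from the text above.

```python
def bencode(value):
    """Encode an int, bytes/str, list or dict as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings; they are emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key] = val
        return result, pos + 1
    # Anything else is a byte string: <length>:<data>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length
```

Calling bencode on the four values from the examples above reproduces the same encodings, and bdecode returns byte strings rather than text, since bencoded strings carry no character set.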

5.1.2 Metainfo File Distribution :

Because all information which is needed for the torrent is included in a single file, this file
can easily be distributed via other protocols, and as the file is replicated, the number of peers
can increase very quickly. The most popular method of distribution is using a public indexing
site which hosts the metainfo files. A seed will upload the file, and then others can download
a copy of the file over the HTTP protocol and participate in the torrent.

5.2 Tracker

         A tracker is used to manage users participating in a torrent (known as peers). It stores
statistics about the torrent, but its main role is to allow peers to 'find each other' and start




                                             Fig 5.2 : Tracker




communication, i.e. to find other peers in the same torrent. Peers know nothing of each other
until a response is received from the tracker. Whenever a peer contacts the tracker, it reports
how much data it has uploaded and downloaded and how much it still has left to fetch. When
another peer queries the tracker, the tracker can then provide a random list of peers who are
participating in the torrent. Which pieces each peer holds is exchanged later, directly
between the peers.

    A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of the
tracker managing a torrent is specified in the metainfo file. A single tracker can manage
multiple torrents. Multiple trackers can also be specified as backups, which are handled by
the BitTorrent client running on the user's computer. BitTorrent clients communicate with the
tracker using HTTP GET requests, passing parameters in the standard CGI manner. This consists
of appending a "?" to the URL, and separating parameters with a "&". The parameters accepted
by the tracker are:

    •   info_hash: 20-byte SHA1 hash of the info key from the metainfo file.
    •   peer_id: 20-byte string used as a unique ID for the client.
    •   port: The port number the client is listening on.
    •   uploaded: The total amount uploaded since the client sent the 'started' event to the
        tracker, in base ten ASCII.
    •   downloaded: The total amount downloaded since the client sent the 'started' event to
        the tracker, in base ten ASCII.
    •   left: The number of bytes the client still has to download, in base ten ASCII.
    •   compact: Indicates that the client accepts compact responses. The peer list can then
        be replaced by a string of 6 bytes per peer. The first 4 bytes are the host, and the
        last 2 bytes are the port.
    •   event: If specified, must be one of the following: started, stopped, completed.
    •   ip: (optional) The IP address of the client machine, in dotted format.
    •   numwant: (optional) The number of peers the client wishes to receive from the
        tracker.
    •   key: (optional) Allows a client to identify itself if their IP address changes.
    •   trackerid: (optional) If previous announce contained a tracker id, it should be set
        here.

The tracker then responds with a "text/plain" document with the following keys:

    •   failure message: If present, then no other keys are included. The value is a human
        readable error message as to why the request failed.
    •   warning message: Similar to failure message, but response still gets processed.
    •   interval: The number of seconds a client should wait between sending regular
        requests to the tracker.
    •   min interval: Minimum announce interval.
    •   tracker id: A string that the client should send back with its next announce.
    •   complete: Number of peers with the complete file.

    •   incomplete: Number of non-seeding peers (leechers).
    •   peers: A list of dictionaries including: peer id, IP and ports of all the peers.
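
The announce request and the compact peer list can be sketched as follows. This is an illustration only: the helper names build_announce_url and parse_compact_peers are invented for this example, and the code assumes that the host and port in a compact response are in network (big-endian) byte order, which the list above does not state.

```python
import struct
from urllib.parse import urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0, event=None):
    """Append the tracker parameters to the announce URL as a query string."""
    params = {
        "info_hash": info_hash,  # raw 20 bytes; urlencode percent-escapes them
        "peer_id": peer_id,      # raw 20 bytes
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "compact": 1,
    }
    if event is not None:        # started, stopped or completed
        params["event"] = event
    return announce + "?" + urlencode(params)


def parse_compact_peers(blob):
    """Split a compact peer string into (ip, port) tuples, 6 bytes per peer."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        a, b, c, d, port = struct.unpack_from("!BBBBH", blob, offset)
        peers.append(("%d.%d.%d.%d" % (a, b, c, d), port))
    return peers
```

The tracker's reply is a bencoded dictionary, so in practice the value of its peers key would be decoded first and then passed to parse_compact_peers when the compact form is in use.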




5.2.1 Scraping

        Scraping is the process of querying the state of a given torrent (or all torrents) that the
tracker is managing. The result is known as a "scrape page". To get the scrape, you must start
with the announce URL, find the last '/' and if the text immediately following the '/' is
'announce', then this can be substituted for 'scrape' to find the scrape page.

Examples:


 Announce URL                                               Scrape URL

 http://example.com/announce                                http://example.com/scrape
 http://example.com/a/announce                              http://example.com/a/scrape
 http://example.com/announce.php                            http://example.com/scrape.php
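
The substitution rule can be written as a short function. This is a sketch under the convention described above; the name scrape_url is chosen for this example, and returning None when the rule does not apply is a design choice made here to signal that the tracker is assumed not to support scraping.

```python
def scrape_url(announce_url):
    """Derive the scrape URL from an announce URL, or return None."""
    head, sep, tail = announce_url.rpartition("/")
    # The text after the last '/' must begin with 'announce'.
    if not sep or not tail.startswith("announce"):
        return None
    return head + "/scrape" + tail[len("announce"):]
```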




The tracker then responds with a "text/plain" document with the following bencoded keys:

    •   files: A dictionary containing one key-value pair for each torrent. Each key is made
        up of a 20-byte binary hash value. The value of that key is then a nested dictionary
        with the following keys:
            •   complete: the number of peers with the entire file (seeds)
            •   downloaded: the total number of times the entire file has been downloaded
            •   incomplete: the number of active downloaders (leechers)
            •   name: (optional) the torrent name




5.3 Peers

        Peers are other users participating in a torrent who have either part of the file or
the complete file (in which case they are known as seeds). Pieces are requested from peers,
but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP
(Transmission Control Protocol) ports 6881-6889 to send messages and data between peers, and
unlike other protocols, does not use UDP (User Datagram Protocol).

5.3.1 Piece Selection

        Peers continuously queue up the pieces which they require for download. The tracker
only supplies the list of peers in the swarm; which peers hold which pieces is learned from
the peers themselves, through the bitfield and have messages described in Section 5.3.10.
Which piece is requested next depends upon the BitTorrent client. There are three stages of
piece selection, and the one in use depends on which stage of completion a peer is at.

5.3.2 Random First Piece

        When downloading first begins, as the peer has nothing to upload, a piece is selected
at random to get the download started. Random pieces are then chosen until the first piece is
completed and checked. Once this happens, the 'rarest first' strategy begins.

5.3.3 Rarest First

        When a peer selects which piece to download next, the rarest piece will be chosen
from the current swarm, i.e. the piece held by the lowest number of peers. This means that the
most common pieces are left until later, and focus goes to replication of rarer pieces.

        At the beginning of a torrent, there will be only one seed with the complete file. There
would be a possible bottleneck if multiple downloaders were trying to access the same piece.
Rarest first avoids this because different peers have different pieces. As more peers connect,
rarest first will take some load off the original seed, as peers begin to download from one
another.

        Eventually the original seed will disappear from a torrent. This could be because of
cost reasons, or most commonly because of bandwidth issues. Losing a seed runs the risk of
pieces being lost if no current downloaders have them. Rarest first works to prevent the loss
of pieces by replicating the pieces most at risk as quickly as possible. If the original seed goes



before at least one other peer has the complete file, then no one will reach completion, unless
a seed re-connects.
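
The rarest first choice can be sketched as a count over the pieces that connected peers advertise. This is a simplified illustration: the function name pick_rarest and the representation of each peer's pieces as a set of piece indexes are assumptions made for this example, and ties are broken at random here although real clients may differ.

```python
import random
from collections import Counter


def pick_rarest(needed, peer_pieces, rng=random):
    """Return the needed piece held by the fewest peers, or None.

    needed      -- set of piece indexes this peer still lacks
    peer_pieces -- dict mapping a peer to the set of piece indexes it has
    """
    counts = Counter()
    for have in peer_pieces.values():
        # Only count pieces that we still need and that somebody can supply.
        counts.update(have & needed)
    if not counts:
        return None
    rarest = min(counts.values())
    candidates = [piece for piece, n in counts.items() if n == rarest]
    return rng.choice(candidates)  # assumption: ties are broken at random
```

With three peers holding pieces {0, 1}, {0, 2} and {0, 1}, piece 2 is held by only one peer and is therefore selected before pieces 0 and 1.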

5.3.4 Endgame Mode

        When a download nears completion, it may be delayed while the peer waits for a piece
from a peer with slow transfer rates. To prevent this, the remaining sub-pieces are
requested from all peers in the current swarm.

5.3.5 Peer Distribution

        The role of the tracker ends once peers have 'found each other'. From then on,
communication is done directly between peers, and the tracker is not involved. The set of
peers a BitTorrent client is in communication with is known as a swarm. To maintain the
integrity of the data which has been downloaded, a peer does not report that they have a piece
until they have checked its hash against the one contained in the metainfo file. Peers will
continue to download data from all available peers that they can, i.e. peers that possess the
required pieces. Peers can block others from downloading data if necessary. This is known as
choking.

5.3.6 Choking

        When a peer receives a request for a piece from another peer, it can opt to refuse to
transmit that piece. If this happens, the requesting peer is said to be choked. This can be
done for different reasons, but the most common is that a client will only maintain a limited
number of simultaneous uploads (max_uploads). All further requests to the client will be
marked as choked. Usually the default for max_uploads is 4.




Fig 5.3 : Choking by a peer

        The peer will then remain choked until an unchoke message is sent. Another example
of when a peer is choked would be when downloading from a seed, and the seed requires no
pieces. To ensure fairness between peers, there is a system in place which rotates which peers
are downloading. This is known as optimistic unchoking.

5.3.7 Optimistic Unchoking

        To ensure that connections with the best data transfer rates are not the only ones
favoured, each peer has a reserved 'optimistic unchoke' which is left unchoked regardless of
the current transfer rate. The peer which is assigned to this is rotated every 30 seconds.
This is enough time for the upload / download rates to reach maximum capacity. The peers then
cooperate using the tit for tat strategy, where the downloader responds in one period with the
same action the uploader used in the last period.
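
The interplay between regular unchoking and the optimistic unchoke can be sketched as below. This is a simplified model, not the behaviour of any particular client: the function names are invented for this example, peers are assumed to be identified by strings, and the optimistic slot is assumed to count as one of the max_uploads slots. A client would call rotate_optimistic roughly every 30 seconds.

```python
import random


def choose_unchoked(download_rates, interested, optimistic, max_uploads=4):
    """Return the set of peers to unchoke for the current round.

    download_rates -- dict mapping a peer to the rate it is sending us data
    interested     -- set of peers that want data from us
    optimistic     -- the peer currently holding the optimistic slot, or None
    """
    # Tit for tat: the regular slots go to the peers giving us the best rates.
    ranked = sorted((p for p in interested if p != optimistic),
                    key=lambda p: download_rates.get(p, 0), reverse=True)
    unchoked = set(ranked[:max_uploads - 1])
    # The optimistic slot is kept regardless of its current transfer rate.
    if optimistic in interested:
        unchoked.add(optimistic)
    return unchoked


def rotate_optimistic(interested, unchoked, rng=random):
    """Pick a new optimistic unchoke from the peers that are currently choked."""
    candidates = sorted(set(interested) - set(unchoked))
    return rng.choice(candidates) if candidates else None
```

Giving the optimistic slot to a randomly chosen choked peer is what lets a newcomer with nothing to offer receive its first pieces and start reciprocating.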

5.3.8 Communication Between Peers

        Peers which are exchanging data are in constant communication. Connections are
symmetrical, and therefore messages can be exchanged in both directions. These messages
are made up of a handshake, followed by a never-ending stream of length-prefixed messages.

5.3.9 Handshaking

Handshaking is performed as follows:




     1. The handshake starts with character 19 (base 10) followed by the 19-character string
          'BitTorrent protocol'. Eight reserved bytes follow, which are normally all zero.
     2. A 20 byte SHA1 hash of the bencoded info value from the metainfo is then sent. If
          this does not match between peers the connection is closed.
     3. A 20 byte peer id is sent, the same id that is used in tracker requests. If the peer
          id does not match the one expected, the connection is closed.
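
A handshake can be packed and checked with Python's struct module, as in the sketch below. The function names are chosen for this example. The layout assumed here is the common wire format: one length byte, the 19-byte protocol string, 8 reserved zero bytes, then the two 20-byte fields, for 68 bytes in total.

```python
import struct

PSTR = b"BitTorrent protocol"       # 19 bytes
HANDSHAKE_FORMAT = "!B19s8x20s20s"  # length, string, 8 reserved, hash, peer id


def build_handshake(info_hash, peer_id):
    """Return the 68-byte handshake for the given torrent and peer id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, info_hash, peer_id)


def parse_handshake(data, expected_info_hash):
    """Validate a received handshake and return the remote peer id."""
    if len(data) != struct.calcsize(HANDSHAKE_FORMAT):
        raise ValueError("handshake must be 68 bytes")
    pstrlen, pstr, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("not a BitTorrent handshake")
    if info_hash != expected_info_hash:
        # The peers are talking about different torrents: close the connection.
        raise ValueError("info_hash mismatch")
    return peer_id
```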


5.3.10 Message Stream

          This constant stream of messages allows all peers in the swarm to send data, and
control interactions with other peers.




 ID   Message          Structure                                  Additional information

 0    choke            <len=0001><id=0>                           Fixed length, no payload. Enables a peer to
                                                                  block another peer's requests for data.

 1    unchoke          <len=0001><id=1>                           Fixed length, no payload. Unblocks the peer;
                                                                  if it is still interested in the data,
                                                                  uploading will begin.

 2    interested       <len=0001><id=2>                           Fixed length, no payload. A user is
                                                                  interested if a peer has the data they
                                                                  require.

 3    not interested   <len=0001><id=3>                           Fixed length, no payload. The peer does not
                                                                  have any data required.

 4    have             <len=0005><id=4><piece index>              Fixed length. Payload is the zero-based
                                                                  index of a piece that the sending peer
                                                                  now has.

 5    bitfield         <len=0001+X><id=5><bitfield>               Sent immediately after handshaking.
                                                                  Optional, and only sent if the client has
                                                                  pieces. Variable length, X is the length of
                                                                  the bitfield. Payload represents pieces that
                                                                  have been successfully downloaded.

 6    request          <len=0013><id=6><index><begin><length>     Fixed length, used to request a block
                                                                  within a piece. The payload contains
                                                                  integer values specifying the piece index,
                                                                  begin location and length.

 7    piece            <len=0009+X><id=7><index><begin><block>    Sent in response to request messages.
                                                                  Variable length, X is the length of the
                                                                  block. The payload contains integer values
                                                                  specifying the piece index and begin
                                                                  location, followed by the block data.

 8    cancel           <len=0013><id=8><index><begin><length>     Fixed length, used to cancel block
                                                                  requests. Payload is the same as 'request'.
                                                                  Typically used during 'end game' mode.




        A peer will be 'interested' in another peer if that peer has pieces which it requires.
Data is transferred only when the downloading peer is interested and the uploading peer is not
choking it. After handshaking, by default, connections start out as choked and not interested.
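
The length-prefixed framing of these messages can be sketched as follows. This is an illustration with names chosen for the example; it assumes a 4-byte big-endian length prefix followed by a 1-byte message id, and it treats a zero-length message as a keep-alive to be skipped, which is an assumption taken from common protocol descriptions rather than from the table above.

```python
import struct

(CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED,
 HAVE, BITFIELD, REQUEST, PIECE, CANCEL) = range(9)


def encode_message(msg_id, payload=b""):
    """Frame a message as <4-byte length><1-byte id><payload>."""
    return struct.pack("!IB", 1 + len(payload), msg_id) + payload


def encode_request(index, begin, length):
    """Build a request for `length` bytes at offset `begin` of piece `index`."""
    return encode_message(REQUEST, struct.pack("!III", index, begin, length))


def decode_messages(buffer):
    """Split received bytes into complete (id, payload) messages.

    Returns (messages, leftover); leftover holds an incomplete trailing
    message that should be kept until more data arrives.
    """
    messages = []
    pos = 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from("!I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break                      # message not fully received yet
        body = buffer[pos + 4:pos + 4 + length]
        pos += 4 + length
        if length == 0:
            continue                   # assumption: zero length is a keep-alive
        messages.append((body[0], body[1:]))
    return messages, buffer[pos:]
```

Because TCP delivers a byte stream rather than discrete messages, the decoder has to tolerate partial messages at the end of a read, which is why it returns the unconsumed bytes.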


5.4 Data

        BitTorrent is very versatile, and can be used to transfer a single file, or multiple
files of any type, contained within any number of directories. File sizes can vary hugely,
from kilobytes to hundreds of gigabytes.




5.4.1 Piece Size

        Data is split into smaller pieces which are sent between peers using the BitTorrent
protocol. These pieces are of a fixed size, which makes it simple for peers to keep track of
who has which pieces of data. This also breaks the file into verifiable pieces: each piece is
assigned a hash code, which can be checked by the downloader for data integrity. These
hashes are stored as part of the metainfo file, which is discussed in Section 5.1.

        The size of the pieces remains constant throughout all files in the torrent except for
the final piece, which is usually shorter. The piece size a torrent is allocated depends on the
amount of data. Piece sizes which are too large will cause inefficiency when downloading
(a larger amount of data must be fetched again when a piece fails its integrity check),
whereas if the piece sizes are too small, more hash checks will need to be run.

        As the number of pieces increases, more hash codes need to be stored in the metainfo
file. Therefore, as a rule of thumb, the piece size should be selected so that the metainfo
file is no larger than 50-75 KB. The main reason for this is to limit the amount of hosting
storage and bandwidth needed by indexing servers. The most common piece sizes are 256 KB,
512 KB and 1 MB. The number of pieces is therefore: total length / piece size, rounded up.
Pieces may overlap file boundaries.

        For example, a 1.4 MB (1400 KB) file could be split into the following pieces. This
shows 5 * 256 KB pieces, and a final piece of 120 KB.




        Fig 5.4 : Pieces of a file
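
The piece arithmetic and the per-piece hashes can be reproduced in a few lines. This is a minimal sketch with function names chosen for the example; it assumes the whole payload fits in memory, which a real client would avoid by reading one piece at a time.

```python
import hashlib


def piece_lengths(total_length, piece_length):
    """Return the length of every piece; only the last one may be shorter."""
    full, last = divmod(total_length, piece_length)
    return [piece_length] * full + ([last] if last else [])


def piece_hashes(data, piece_length):
    """Return the concatenated 20-byte SHA-1 digests of each piece of data."""
    return b"".join(
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    )
```

For the example above, piece_lengths(1400 * 1024, 256 * 1024) yields five pieces of 262144 bytes and a final piece of 122880 bytes (120 KB). Since each piece contributes 20 bytes of hash, the size of the metainfo file grows in step with the number of pieces.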




5.5 BitTorrent Clients

        A BitTorrent client is an executable program which implements the BitTorrent
protocol. It runs on top of the operating system on a user's machine, and handles
interactions with the tracker and peers. The client relies on the operating system for
reading / writing files, opening sockets etc.

        A metainfo file must be opened by the client to start partaking in a torrent. Once the
file is read, the necessary data is extracted, and a socket must be opened to contact the
tracker. BitTorrent clients use TCP ports 6881-6889. To find an available port, the client will
start at the lowest port, and work upwards until it finds one it can use. This means the client
will only use one port, and opening another BitTorrent client will use another port. A client
can handle multiple torrents running concurrently.
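
The port search can be sketched with the standard socket module. This is an illustration only: the function name find_listen_port is invented for the example, and the port range is simply a parameter whose default is the 6881-6889 range quoted in Section 5.3.

```python
import socket


def find_listen_port(first=6881, last=6889, host=""):
    """Bind a listening TCP socket to the lowest free port in the range.

    Returns (socket, port); raises OSError if every port is taken.
    """
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()   # port already in use: try the next one
            continue
        sock.listen()
        return sock, port
    raise OSError("no free port between %d and %d" % (first, last))
```

A second client started on the same machine would find the first port occupied and settle on the next free one, which matches the behaviour described above.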

         Clients come in many flavours, and can range from basic applications with few
features to very advanced, customisable ones. For example, some advanced features are
metainfo file wizards and inbuilt trackers. These additional features mean that different
clients behave very differently, and may use multiple ports, depending on the number of
processes they are running. As all applications implement the same protocol, there are no
incompatibility issues. However, because of various tweaks and improvements between clients,
a peer may experience better performance from peers running the same client.




5.6 Sub Protocols :

         BitTorrent can be described in terms of two sub-protocols: one which describes
interactions between the tracker and all clients, and one which describes all client-to-client
interactions.



5.6.1 THP: Tracker HTTP Protocol
         The tracker protocol is implemented on top of HTTP/HTTPS. This means that the
machine running the tracker runs an HTTP or HTTPS server, and has the behaviour described
below:

1. The client sends a GET request to the tracker URL, with certain CGI variables and
values added to the URL. This is done in the standard way, i.e., if the base URL is
“http://some.url.com/announce”, the full URL would be of this form:
“http://some.url.com/announce?var1=value1&var2=value2&var3=value3”.
2. The tracker responds with a “text/plain” document, containing a bencoded dictionary.
This dictionary has all the information required for the client.
3. The client then sends re-requests, either on regular intervals, or when an event occurs,
and the tracker responds.




The CGI variables and values added to the base URL by the client sending a GET
request are:

        info_hash: The 20 byte SHA1 hash calculated from whatever value the info key maps
        to in the metainfo file.
        peer_id: A 20 character long id of the downloading client, randomly generated at the
        start of every download. There is no formal definition on how to generate this id, but
        some client applications have adopted some semiformal standards on how to generate
        this id.
        ip: This is an optional variable, giving the IP address of the client. This can usually be
        extracted from the TCP connection, but this field is useful if the client and tracker are
        on the same machine, or behind the same NAT gateway. In both cases, the tracker
        might otherwise publish an unroutable IP address to other clients.
        port: The port number that the client is listening on. This is usually in the range
        6881-6889.
        uploaded: The amount of data uploaded so far by the client. There is no official
        definition on the unit, but generally bytes are used.
        left: How much the user has left for the download to be complete, in bytes.
        event: An optional variable, corresponding to one of four possibilities:
               •   started: Sent when the client starts the download
               •   stopped: Sent when the client stops downloading
               •   completed: Sent when the download is complete. If the download is complete
                   at start up, this message should not be sent.
              •    empty: Has the same effect as if the event key is nonexistent. In either case,
                   the message in question is one of the messages sent with regular intervals.
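
The way these variables are appended to the base URL can be sketched in Python as follows. This is a minimal illustration, not a complete client: the function name build_announce_url is hypothetical, only the variables listed above are included, and the standard-library urlencode call is relied on to percent-escape the raw bytes of info_hash and peer_id.

```python
from urllib.parse import urlencode


def build_announce_url(base_url, info_hash, peer_id, port, uploaded, left, event=None):
    """Append the tracker request variables to the tracker's base URL."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes long")
    params = {
        "info_hash": info_hash,  # raw bytes; urlencode percent-escapes them
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "left": left,
    }
    if event:  # left out for the regular periodic re-requests
        params["event"] = event
    return base_url + "?" + urlencode(params)
```

A client announcing the start of a download might call build_announce_url("http://some.url.com/announce", info_hash, peer_id, 6881, 0, total_size, "started"), where info_hash, peer_id and total_size are values the client already holds.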


        There are some optional variables that can be sent along with the GET request that are
not specified in the official description of the protocol, but are implemented by some tracker
servers:
        numwant: The number of peers the client wants in the response.
        key: An identification key that is not published to other peers. peer_id is public, and
        is thus useless for authorization. key is used to prove the peer’s
        identity to the tracker if the peer changes IP address.
        trackerid: If a tracker previously gave its trackerid, this should be given here.

As mentioned earlier, the response is a “text/plain” response with a bencoded dictionary.
This dictionary contains the following keys:
        failure reason: If this key is present, no other keys are included. The value mapped to
        this key is a human-readable string giving the reason why the request failed.
        interval: The number of seconds that the client should wait between regular
        re-requests.
        peers: Maps to a list of dictionaries, that each represent a peer, where each dictionary
        has the keys:
            •   peer_id: The id of the peer in question. The tracker obtained this by the
                peer_id variable in the GET request sent to the tracker.
            •   ip: The address of the peer, either the IP address or the DNS domain name.
            •   port: The port number that the peer listens on.
        These are the keys required by the official protocol specification, but here as well
there are optional extensions:
        min interval: If present, the client must not send re-requests more often than this.
        warning message: Has the same information as failure reason, but the other keys in
        the dictionary are present.
        tracker id: A string identifying the tracker. A client should resend it in the
        trackerid variable to the tracker.
        complete: The number of peers that have the complete file available for upload.
        incomplete: The number of peers that do not yet have the complete file.
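
To make the response format concrete, the following Python sketch decodes a bencoded document and pulls out the interval and the peer list. It is an illustration under stated assumptions rather than a reference implementation: the names bdecode and parse_tracker_response are hypothetical, input validation is minimal, and only the ip and port keys of each peer dictionary are read.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)  # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def parse_tracker_response(body):
    """Return (interval, [(ip, port), ...]) or raise if the tracker reported a failure."""
    reply, _ = bdecode(body)
    if b"failure reason" in reply:
        raise RuntimeError(reply[b"failure reason"].decode("utf-8", "replace"))
    peers = [(peer[b"ip"].decode(), peer[b"port"]) for peer in reply[b"peers"]]
    return reply[b"interval"], peers
```

Dictionary keys are kept as byte strings here, because bencoded strings carry no character encoding of their own.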


5.6.2 PWP: Peer Wire Protocol
        The peer wire (peer-to-peer) protocol runs over TCP. Message passing is symmetric,
i.e. the same messages are sent in both directions. When a client wants to initiate a
connection, it sets up the TCP connection and sends a handshake message to the other peer. If
the message is acceptable, the receiving side sends a handshake message back. If the initiator
accepts this handshake, message passing can begin, and continues indefinitely. All integers
are encoded as four-byte big-endian values, except the first length prefix in the handshake.

Handshake message
The handshake message consists of five parts:

        A single byte, containing the decimal value 19. This is the length of the character
        string following this byte.
        A character string “BitTorrent protocol”, which describes the protocol. Newer
        protocols should follow this convention to facilitate easy identification of protocols.
        Eight reserved bytes for further extension of the protocol. All bytes are zero in current
        implementations.
        A 20 byte SHA1 hash of the value mapping to the info key in the torrent file. This is
        the same hash sent to the tracker in the info_hash variable.
        The 20 byte character string representing the peer id. This is the same value sent to
        the tracker.
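
The five parts add up to 1 + 19 + 8 + 20 + 20 = 68 bytes. A minimal Python sketch of building and checking such a handshake is given here; the function names build_handshake and parse_handshake are hypothetical, and the reserved bytes default to zero as described above.

```python
import struct

PSTR = b"BitTorrent protocol"
HANDSHAKE_FORMAT = "!B19s8s20s20s"  # length byte, protocol string, reserved, info_hash, peer_id


def build_handshake(info_hash, peer_id, reserved=bytes(8)):
    """Pack the five handshake parts into a 68-byte message."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id need 20 bytes, reserved needs 8")
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, reserved, info_hash, peer_id)


def parse_handshake(data):
    """Split a received handshake into (reserved, info_hash, peer_id)."""
    if len(data) != struct.calcsize(HANDSHAKE_FORMAT):
        raise ValueError("a handshake is exactly 68 bytes long")
    pstrlen, pstr, reserved, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("unexpected protocol string")
    return reserved, info_hash, peer_id
```

The receiving side would compare the returned info_hash against the torrents it serves before replying.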
        If a peer receives the initial handshake and the info_hash doesn’t match any
torrent it is serving, it should break the connection. If the initiator of the connection receives a
handshake in which the peer id doesn’t match the id received from the tracker, the
connection should be dropped. Each peer needs to keep the state of each connection. The
state consists of two values, interested and choking. A peer can be either interested or not in
another peer, and either choke or not choke the other peer. Choking means that no requests
will be answered, and interested means that the peer is interested in downloading pieces of
the file from the other peer.
        This means that each peer needs four Boolean values for each connection to keep
track of the state.
    •   am_interested
    •   am_choking
    •   peer_interested
    •   peer_choking
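
One way to hold these four values per connection is sketched below in Python. The class and method names are hypothetical; the defaults reflect the initial state the protocol prescribes (both sides choking and not interested), and the two helper methods simply restate the definitions of choking and interested given above.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    """Per-connection flags; every connection starts choked and not interested."""
    am_choking: bool = True
    am_interested: bool = False
    peer_choking: bool = True
    peer_interested: bool = False

    def can_download(self):
        # Our requests are answered only if we want pieces and the peer is not choking us.
        return self.am_interested and not self.peer_choking

    def can_upload(self):
        # We answer the peer's requests only if it wants pieces and we are not choking it.
        return self.peer_interested and not self.am_choking
```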

        All connections start out as not interested and choking for both peers. Clients should keep
the am_interested value updated continuously, and report changes to the other peer. The
messages sent after the handshake are structured as: [message length as an integer] [single
byte describing message type] [payload]. Keep-alive messages are sent at regular intervals,
and they are simply a message with length 0, and no type or payload.
        Type 0, 1, 2, 3 are choke, unchoke, interested and not interested respectively. All of
them have length 1 and no payload. These messages simply describe changes in state.
        Type 4 is a have. This message has length = 5, and a payload that is a single integer,
giving the integer index of which piece of the file the peer has successfully downloaded and
verified.
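
The framing described so far (a four-byte big-endian length, a one-byte type, then the payload) can be sketched in Python as follows. The function names encode_message and decode_message are hypothetical, and the decoder is only a sketch: it returns None while the buffer does not yet hold a complete message, and reports a keep-alive as a message whose type is None.

```python
import struct


def encode_message(msg_type=None, payload=b""):
    """Frame one message; with no type it produces a keep-alive (length 0)."""
    if msg_type is None:
        return struct.pack("!I", 0)
    return struct.pack("!IB", 1 + len(payload), msg_type) + payload


def decode_message(buffer):
    """Return (msg_type, payload, remaining_bytes), or None if the message is incomplete."""
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("!I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    body, rest = buffer[4:4 + length], buffer[4 + length:]
    if length == 0:  # keep-alive: no type and no payload
        return None, b"", rest
    return body[0], body[1:], rest
```

For example, encode_message(1) yields an unchoke message, and encode_message(4, struct.pack("!I", 7)) yields a have message for piece 7.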

        Type 5 is bitfield. This message is only sent directly after the handshake. It contains a
bitfield representation of which pieces the peer has. The payload is of variable length, and
consists of a bitmap, where byte 0 corresponds to pieces 0-7, byte 1 to pieces 8-15, etc. A bit set
to 1 represents having the piece. Peers that have no pieces may omit this message.
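
A short Python sketch of reading and building such a bitmap follows. The function names are hypothetical. The bit order within each byte is not stated in the text above; the sketch follows the published protocol description, in which the high bit of byte 0 stands for piece 0.

```python
def has_piece(bitfield, index):
    """Check whether the bit for piece `index` is set (high bit of byte 0 is piece 0)."""
    byte_index, bit_offset = divmod(index, 8)
    if byte_index >= len(bitfield):
        return False
    return bool(bitfield[byte_index] & (0x80 >> bit_offset))


def make_bitfield(piece_count, owned):
    """Build the bitfield payload for a torrent with `piece_count` pieces."""
    field = bytearray((piece_count + 7) // 8)  # spare bits in the last byte stay zero
    for index in owned:
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)
```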
        Type 6 is a request. The payload consists of three integers, piece index, begin and
length. The piece index decides within which piece the client wants to download, begin gives
the byte offset within the piece, and length gives the number of bytes the client wants to
download. Length is usually a power of two.
        Type 7 is a block. This message follows a request. The payload contains the piece index,
the byte offset (begin) within the piece, and the requested data itself.
        Type 8 is cancel. This message has the same
payload as a request message, and it is used to cancel a request already made. Peers should
continuously update their interested status to neighbours, so that clients know which peers
will begin downloading when unchoked.
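
Since a request names a byte range inside a piece, downloading one piece usually takes several request messages. The Python sketch below splits a piece into such requests. The function name request_messages is hypothetical, and the 16 KiB block size is an assumption chosen as a typical power of two, not a value taken from this report.

```python
import struct

BLOCK_SIZE = 2 ** 14  # 16 KiB, an assumed block length


def request_messages(piece_index, piece_length, block_size=BLOCK_SIZE):
    """Return the framed request messages (type 6) that cover one whole piece."""
    messages = []
    for begin in range(0, piece_length, block_size):
        length = min(block_size, piece_length - begin)  # the last block may be shorter
        payload = struct.pack("!III", piece_index, begin, length)
        messages.append(struct.pack("!IB", 1 + len(payload), 6) + payload)
    return messages
```

A cancel message for any of these blocks would carry the same three integers with type 8 instead of 6.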




Chapter 6
                     VULNERABILITIES OF BITTORRENT



6.1 Attacks on BitTorrent


        As we have seen so far, BitTorrent is one of the most favoured file transfer protocols in
today’s world. But it has been exposed to various attacks in the recent past, as
attackers exploit its vulnerabilities. Here are some of the
attacks that are commonly seen.



6.1.1 Pollution attack
            1. The peers receive the peer list from the tracker.
            2. One peer contacts the attacker for a chunk of the file.
            3. The attacker sends back a false chunk.
            4. This false chunk will fail its hash and will be discarded.
            5. The attacker requests all chunks from the swarm and wastes the peers’ upload bandwidth.
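
Step 4 relies on the receiving peer checking each chunk against the SHA1 hashes carried in the metainfo file. A minimal Python sketch of that check is given here; the function name piece_is_valid is hypothetical, and it assumes the hashes are available as one byte string of concatenated 20-byte digests, one per piece.

```python
import hashlib


def piece_is_valid(piece_data, piece_index, piece_hashes):
    """Compare the SHA1 of a received piece with the expected 20-byte digest."""
    expected = piece_hashes[20 * piece_index:20 * piece_index + 20]
    return len(expected) == 20 and hashlib.sha1(piece_data).digest() == expected
```

A polluted chunk fails this comparison and is discarded, but the bandwidth spent downloading it is already lost.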


        Pollution attacks have become increasingly popular and have been used by
anti-piracy groups. In 2005 HBO used pollution attacks to prevent people from downloading
their show Rome.

6.1.2 DDOS attack
        DDOS stands for distributed denial of service. This attack is possible because
the BitTorrent tracker has no mechanism for validating peers. This means there is
no way to trace the culprit in this kind of attack. Attacks of this kind are also possible
because the client software can be modified.
            1. The attacker downloads a large number of torrent files from a web server.
            2. The attacker parses the torrent files with a modified BitTorrent client and,
                when announcing that he is joining the swarm, substitutes the victim’s IP address
                and port number for his own.




            3. As the tracker receives requests for a list of participating peers from other
                clients, it sends out the victim’s IP address and port number.
            4. The peers then attempt to connect to the victim to try to download a chunk of
                the file.




6.1.3 Bandwidth Shaping
         Many ISPs discourage their users from using BitTorrent. This is because
         BitTorrent is usually used to transfer large files, which greatly increases the traffic
         on the ISPs’ networks. To avoid such a surge in traffic, many
         ISPs have started to throttle or block BitTorrent traffic. This is done by
         inspecting the packets that pass through and detecting whether they conform to the
         BitTorrent protocol. ISPs use filters to identify such packets and block them from
         passing through their networks. This has resulted in many failed file transfers across the
         world.



6.2 Solutions
 Many of the attacks that BitTorrent suffers have been dealt with and some measures have
been taken to avoid such attacks. Here are a few solutions to the attacks that were discussed
above.

6.2.1 Pollution attack
         The peers that perform such attacks are identified by tracing their IP addresses. These
addresses are then blacklisted, and connections between blacklisted addresses and other peers
are refused. This is done with software such as
PeerGuardian or MoBlock, which downloads the list of blacklisted addresses from the Internet.
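
The blacklist check itself is simple, as the following Python sketch suggests. It does not reproduce how the tools named above work; the function names and the plain list of address ranges are assumptions made for illustration, and the addresses in the usage note come from the ranges reserved for documentation.

```python
import ipaddress


def load_blocklist(entries):
    """Turn textual entries (single addresses or CIDR ranges) into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(address, blocklist):
    """Return True if the peer address falls inside any blacklisted range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in blocklist)
```

With blocklist = load_blocklist(["192.0.2.0/24", "203.0.113.7"]), a client would refuse a connection from 192.0.2.55 but accept one from 198.51.100.1.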

6.2.2 DDOS attack
         The main solution to this kind of attack is to have clients parse the response from the
tracker. If a host (tracker) does not respond to a peer’s request with a valid
BitTorrent protocol message, it should be inferred that this host is not running BitTorrent. The
peer should then exclude that address from its tracker list, or set a high retry interval for that
specific tracker. Another fix would be for web sites hosting torrents to check and report
whether all trackers are active, or even remove the non-responding trackers from the tracker
list in the torrent. Another measure could be to restrict the size of the tracker list to reduce the
effectiveness of such an attack.
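
The client-side part of this remedy (raise the retry interval after an invalid reply, and eventually drop the tracker) could look like the Python sketch below. The class name TrackerList and all thresholds (60-second base interval, one-hour ceiling, five failures) are assumptions chosen for illustration; they are not taken from any specification.

```python
class TrackerList:
    """Tracks invalid replies per tracker URL; all thresholds are illustrative."""

    def __init__(self, urls, base_interval=60, max_interval=3600, max_failures=5):
        self.failures = {url: 0 for url in urls}
        self.base_interval = base_interval
        self.max_interval = max_interval
        self.max_failures = max_failures

    def report_valid(self, url):
        """A valid tracker reply resets the failure count."""
        if url in self.failures:
            self.failures[url] = 0

    def report_invalid(self, url):
        """Record a missing or non-BitTorrent reply; drop the tracker after too many."""
        if url not in self.failures:
            return
        self.failures[url] += 1
        if self.failures[url] >= self.max_failures:
            del self.failures[url]

    def retry_interval(self, url):
        """Seconds to wait before contacting this tracker again (doubles per failure)."""
        return min(self.base_interval * 2 ** self.failures[url], self.max_interval)

    def active(self):
        return list(self.failures)
```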

6.2.3 Bandwidth Shaping
        There are broadly two approaches to countering this kind of filtering. The first
is to encrypt the packets sent using the BitTorrent protocol. The
filters that inspect packets are then unable to recognise them as belonging to the BitTorrent
protocol, so the encrypted packets can
pass through. The second approach is to make use of tunnels. A tunnel is a dedicated
path, set up with VPN software, that connects to an unfiltered
network and so avoids the filters. The packets bypass the filters and can then be
transmitted across networks.




Chapter 7
                                    CONCLUSION

BitTorrent pioneered mesh-based file distribution that effectively utilizes all the uplinks of
participating nodes. Most follow-on research used similar distributed and randomized
algorithms for peer and piece selection, but with different emphasis or twists. This work takes
a different approach to the mesh-based file distribution problem by considering it as a
scheduling problem, and strives to derive an optimal schedule that could minimize the total
elapsed time. By comparing the total elapsed time of BitTorrent and CSFD in a wide variety
of scenarios, we are able to determine how close BitTorrent is to the theoretical optimum. In
addition, the study of the applicability of BitTorrent to real-time media streaming applications
shows that, with minor modifications, BitTorrent can serve as an effective media streaming
tool as well. BitTorrent’s application in this information-sharing age is almost priceless.
However, it is not yet perfect, as it is still prone to malicious attacks and acts of misuse.
Moreover, the lifespan of each torrent is still not satisfactory, which means that
a file remains available for distribution only for a limited period of time. Thus, further analysis
and a more thorough study of the protocol will enable one to discover more ways to improve it.









Mais conteúdo relacionado

Destaque

Authority To Refill Existing Prescriptions Mo
Authority To Refill Existing Prescriptions MoAuthority To Refill Existing Prescriptions Mo
Authority To Refill Existing Prescriptions MoSameh Khalfia
 
6th Grade Orientation
6th Grade Orientation6th Grade Orientation
6th Grade Orientationmplibrarylady
 
Dyu各學系網站2014招生行銷力提升對策 20140207
Dyu各學系網站2014招生行銷力提升對策 20140207Dyu各學系網站2014招生行銷力提升對策 20140207
Dyu各學系網站2014招生行銷力提升對策 20140207gctarng gctarng
 
Group red 4 presentation.pptx
Group red 4 presentation.pptxGroup red 4 presentation.pptx
Group red 4 presentation.pptxmariogomezprieto
 
Introduction to Educational Media Production
Introduction to Educational Media ProductionIntroduction to Educational Media Production
Introduction to Educational Media ProductionRachabodin Suwannakanthi
 
Sustainability, More Than Survival - ISA Workshop, June 2009, with notes
Sustainability, More Than Survival - ISA Workshop, June 2009,  with notesSustainability, More Than Survival - ISA Workshop, June 2009,  with notes
Sustainability, More Than Survival - ISA Workshop, June 2009, with notesMason International Business Group
 
Progress on NECTEC’s e-Museum activities: a field experience
Progress on NECTEC’s e-Museum activities: a field experienceProgress on NECTEC’s e-Museum activities: a field experience
Progress on NECTEC’s e-Museum activities: a field experienceRachabodin Suwannakanthi
 
JdbcTemplate aus Spring
JdbcTemplate aus SpringJdbcTemplate aus Spring
JdbcTemplate aus Springtutego
 
Technologies for Modern Museums and Libraries
Technologies for Modern Museums and LibrariesTechnologies for Modern Museums and Libraries
Technologies for Modern Museums and LibrariesRachabodin Suwannakanthi
 
Summary of Digital Archive Package Tools Research and Development Project
Summary of Digital Archive Package Tools Research and Development ProjectSummary of Digital Archive Package Tools Research and Development Project
Summary of Digital Archive Package Tools Research and Development ProjectRachabodin Suwannakanthi
 

Destaque (20)

Authority To Refill Existing Prescriptions Mo
Authority To Refill Existing Prescriptions MoAuthority To Refill Existing Prescriptions Mo
Authority To Refill Existing Prescriptions Mo
 
6th Grade Orientation
6th Grade Orientation6th Grade Orientation
6th Grade Orientation
 
Coniche Ppt
Coniche PptConiche Ppt
Coniche Ppt
 
Beekman5 std ppt_04
Beekman5 std ppt_04Beekman5 std ppt_04
Beekman5 std ppt_04
 
Beekman5 std ppt_14
Beekman5 std ppt_14Beekman5 std ppt_14
Beekman5 std ppt_14
 
Dyu各學系網站2014招生行銷力提升對策 20140207
Dyu各學系網站2014招生行銷力提升對策 20140207Dyu各學系網站2014招生行銷力提升對策 20140207
Dyu各學系網站2014招生行銷力提升對策 20140207
 
Erasmus+ uppgift
Erasmus+ uppgiftErasmus+ uppgift
Erasmus+ uppgift
 
Group red 4 presentation.pptx
Group red 4 presentation.pptxGroup red 4 presentation.pptx
Group red 4 presentation.pptx
 
Technology For Botanical Garden
Technology For Botanical GardenTechnology For Botanical Garden
Technology For Botanical Garden
 
Technology For Museum
Technology For MuseumTechnology For Museum
Technology For Museum
 
Introduction to Educational Media Production
Introduction to Educational Media ProductionIntroduction to Educational Media Production
Introduction to Educational Media Production
 
Sustainability, More Than Survival - ISA Workshop, June 2009, with notes
Sustainability, More Than Survival - ISA Workshop, June 2009,  with notesSustainability, More Than Survival - ISA Workshop, June 2009,  with notes
Sustainability, More Than Survival - ISA Workshop, June 2009, with notes
 
Presentaci ã³n4 (1) (1)
Presentaci ã³n4 (1) (1)Presentaci ã³n4 (1) (1)
Presentaci ã³n4 (1) (1)
 
Moral Psychology
Moral PsychologyMoral Psychology
Moral Psychology
 
Progress on NECTEC’s e-Museum activities: a field experience
Progress on NECTEC’s e-Museum activities: a field experienceProgress on NECTEC’s e-Museum activities: a field experience
Progress on NECTEC’s e-Museum activities: a field experience
 
JdbcTemplate aus Spring
JdbcTemplate aus SpringJdbcTemplate aus Spring
JdbcTemplate aus Spring
 
Real Time Image Processing
Real Time Image ProcessingReal Time Image Processing
Real Time Image Processing
 
Technologies for Modern Museums and Libraries
Technologies for Modern Museums and LibrariesTechnologies for Modern Museums and Libraries
Technologies for Modern Museums and Libraries
 
Archives and Digital Archives
Archives and Digital ArchivesArchives and Digital Archives
Archives and Digital Archives
 
Summary of Digital Archive Package Tools Research and Development Project
Summary of Digital Archive Package Tools Research and Development ProjectSummary of Digital Archive Package Tools Research and Development Project
Summary of Digital Archive Package Tools Research and Development Project
 

Semelhante a Bittorrent Seminar Report by Shyam Prakash

TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdf
TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdfTITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdf
TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdfBuddyGeneral
 
Electronics for-you-projects-and-ideas-2000
Electronics for-you-projects-and-ideas-2000Electronics for-you-projects-and-ideas-2000
Electronics for-you-projects-and-ideas-2000nonshahid
 
Electronics for you projects and ideas 2000 (malestrom)
Electronics for you projects and ideas 2000 (malestrom)Electronics for you projects and ideas 2000 (malestrom)
Electronics for you projects and ideas 2000 (malestrom)Rohit Chintu
 
Digital underground cable fault locator (dufcl).
Digital underground cable fault locator (dufcl).Digital underground cable fault locator (dufcl).
Digital underground cable fault locator (dufcl).ITODO Victory
 
Oracle 10-g-recommendations-v1 2
Oracle 10-g-recommendations-v1 2Oracle 10-g-recommendations-v1 2
Oracle 10-g-recommendations-v1 2unixadminrasheed
 
Configuração modbus yokogawa
Configuração modbus yokogawaConfiguração modbus yokogawa
Configuração modbus yokogawaJohn de Carvalho
 
Tr electronic products assembly and servicing nc ii
Tr electronic products assembly and servicing nc iiTr electronic products assembly and servicing nc ii
Tr electronic products assembly and servicing nc iiMarlon Sibayan
 
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC II
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC IITR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC II
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC IIReden Pagdato
 
NTC 409 RANK Become Exceptional--ntc409rank.com
NTC 409 RANK Become Exceptional--ntc409rank.comNTC 409 RANK Become Exceptional--ntc409rank.com
NTC 409 RANK Become Exceptional--ntc409rank.comshanaabe69
 
NTC 409 RANK Introduction Education--ntc409rank.com
NTC 409 RANK Introduction Education--ntc409rank.comNTC 409 RANK Introduction Education--ntc409rank.com
NTC 409 RANK Introduction Education--ntc409rank.comGVlaxmi16
 
Microsoft Word Mobile Multi Media Applications
Microsoft Word   Mobile Multi Media ApplicationsMicrosoft Word   Mobile Multi Media Applications
Microsoft Word Mobile Multi Media Applicationskkkseld
 
Femtocells wp architecture_1009_qualcomm
Femtocells wp architecture_1009_qualcommFemtocells wp architecture_1009_qualcomm
Femtocells wp architecture_1009_qualcommVarun Katial
 
ใบงานที่ 4
ใบงานที่ 4ใบงานที่ 4
ใบงานที่ 4KaRn Tik Tok
 
BE Project Final Report on IVRS
BE Project Final Report on IVRSBE Project Final Report on IVRS
BE Project Final Report on IVRSAbhishek Nadkarni
 

Semelhante a Bittorrent Seminar Report by Shyam Prakash (20)

Project report1
Project report1Project report1
Project report1
 
TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdf
TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdfTITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdf
TITLE_PAGE_DESIGN_AND_IMPLEMENTATION_OF.pdf
 
Electronics for-you-projects-and-ideas-2000
Electronics for-you-projects-and-ideas-2000Electronics for-you-projects-and-ideas-2000
Electronics for-you-projects-and-ideas-2000
 
Electronics for you projects and ideas 2000 (malestrom)
Electronics for you projects and ideas 2000 (malestrom)Electronics for you projects and ideas 2000 (malestrom)
Electronics for you projects and ideas 2000 (malestrom)
 
Digital underground cable fault locator (dufcl).
Digital underground cable fault locator (dufcl).Digital underground cable fault locator (dufcl).
Digital underground cable fault locator (dufcl).
 
iPDC Report Nitesh
iPDC Report NiteshiPDC Report Nitesh
iPDC Report Nitesh
 
Final Report
Final ReportFinal Report
Final Report
 
Oracle 10-g-recommendations-v1 2
Oracle 10-g-recommendations-v1 2Oracle 10-g-recommendations-v1 2
Oracle 10-g-recommendations-v1 2
 
Configuração modbus yokogawa
Configuração modbus yokogawaConfiguração modbus yokogawa
Configuração modbus yokogawa
 
Tr electronic products assembly and servicing nc ii
Tr electronic products assembly and servicing nc iiTr electronic products assembly and servicing nc ii
Tr electronic products assembly and servicing nc ii
 
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC II
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC IITR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC II
TR ELECTRONICS PRODUCTS ASSEMBLY AND SERVICING NC II
 
thesis_SaurabhPanda
thesis_SaurabhPandathesis_SaurabhPanda
thesis_SaurabhPanda
 
NTC 409 RANK Become Exceptional--ntc409rank.com
NTC 409 RANK Become Exceptional--ntc409rank.comNTC 409 RANK Become Exceptional--ntc409rank.com
NTC 409 RANK Become Exceptional--ntc409rank.com
 
NTC 409 RANK Introduction Education--ntc409rank.com
NTC 409 RANK Introduction Education--ntc409rank.comNTC 409 RANK Introduction Education--ntc409rank.com
NTC 409 RANK Introduction Education--ntc409rank.com
 
24319102
2431910224319102
24319102
 
Microsoft Word Mobile Multi Media Applications
Microsoft Word   Mobile Multi Media ApplicationsMicrosoft Word   Mobile Multi Media Applications
Microsoft Word Mobile Multi Media Applications
 
Femtocells wp architecture_1009_qualcomm
Femtocells wp architecture_1009_qualcommFemtocells wp architecture_1009_qualcomm
Femtocells wp architecture_1009_qualcomm
 
iPDC Report Kedar
iPDC Report KedariPDC Report Kedar
iPDC Report Kedar
 
ใบงานที่ 4
ใบงานที่ 4ใบงานที่ 4
ใบงานที่ 4
 
BE Project Final Report on IVRS
BE Project Final Report on IVRSBE Project Final Report on IVRS
BE Project Final Report on IVRS
 

Último

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Bittorrent Seminar Report by Shyam Prakash

  • 1. BITTORRENT Seminar Report. Submitted in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science Engineering of Cochin University of Science and Technology by SHYAM PRAKASH (12080079). DIVISION OF COMPUTER SCIENCE, SCHOOL OF ENGINEERING, COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, KOCHI-682022. SEPTEMBER 2010
  • 2. DIVISION OF COMPUTER SCIENCE, SCHOOL OF ENGINEERING, COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, KOCHI-682022. Certificate: Certified that this is a bona fide record of the seminar entitled “BITTORRENT” presented by the following student, “SHYAM PRAKASH”, of the VII semester, Computer Science and Engineering, in the year 2010, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Cochin University of Science and Technology. Ms. SHEKHA S (SEMINAR GUIDE), Dr. DAVID PETER S (HEAD OF DIVISION)
  • 3. ACKNOWLEDGEMENT
    I thank God Almighty for guiding me throughout the seminar. I would like to thank all those who contributed to the completion of the seminar and helped me with valuable suggestions for improvement. I am extremely grateful to Dr. David Peter, Head of Division, Division of Computer Science, for providing me with the best facilities and atmosphere for creative work, and for his guidance and encouragement. I would like to thank my coordinator, Mr. Sudeep Elayidom, and my seminar guide, Ms. Shekha S, Lecturer, Division of Computer Science, for all the help and support extended to me. I thank all the staff members of my college and my friends for extending their cooperation during my seminar. Above all, I would like to thank my parents, without whose blessings I would not have been able to accomplish my goal. SHYAM PRAKASH
  • 4. ABSTRACT
    BitTorrent is the name of a peer-to-peer (P2P) file distribution protocol and also of a free software implementation of that protocol. The protocol was originally designed and created by programmer Bram Cohen and is now maintained by BitTorrent Inc. BitTorrent is designed to distribute large amounts of data widely without incurring the corresponding consumption of costly server and bandwidth resources. CacheLogic suggests that BitTorrent traffic accounts for 55% of all traffic on the Internet, while other sources are skeptical. The original BitTorrent client was written in Python. Its source code, as of version has been released under the BitTorrent Open Source License, which is a modified version of the Jabber Open Source License. There are numerous compatible clients, written in a variety of programming languages and running on a variety of computing platforms.
  • 5. Table of contents
    Chapter 1 INTRODUCTION
      1.1 Overview
      1.2 History
    Chapter 2 BITTORRENT AND OTHER APPROACHES
      2.1 Other P2P methods
      2.2 A Typical HTTP File Transfer
      2.3 The DAP method
      2.4 The BitTorrent Approach
    Chapter 3 WORKING OF BITTORRENT
    Chapter 4 TERMINOLOGY
    Chapter 5 ARCHITECTURE OF BITTORRENT
      5.1 Metainfo file
        5.1.1 Bencoding
        5.1.2 Metainfo file distribution
      5.2 Tracker
        5.2.1 Scraping
      5.3 Peers
        5.3.1 Piece selection
        5.3.2 Random first piece
        5.3.3 Rarest first
        5.3.4 Endgame mode
        5.3.5 Peer distribution
        5.3.6 Choking
        5.3.7 Optimistic unchoking
        5.3.8 Communication between peers
        5.3.9 Handshaking
  • 6. Table of contents (continued)
        5.3.10 Message stream
      5.4 Data
        5.4.1 Piece size
      5.5 BitTorrent clients
      5.6 Sub protocols
        5.6.1 THP: Tracker HTTP Protocol
        5.6.2 PWP: Peer Wire Protocol
    Chapter 6 VULNERABILITIES OF BITTORRENT
      6.1 Attacks on BitTorrent
        6.1.1 Pollution attack
        6.1.2 DDoS attack
        6.1.3 Bandwidth shaping
      6.2 Solutions
        6.2.1 Pollution attack
        6.2.2 DDoS attack
        6.2.3 Bandwidth shaping
    Chapter 7 CONCLUSION
    Chapter 8 REFERENCES
  • 7. Chapter 1 INTRODUCTION
    1.1 Overview
    BitTorrent is a peer-to-peer file sharing protocol and one of the most common protocols for transferring large files. It takes a different approach from conventional file transfer methods: a user does not only obtain a file but also shares the parts already obtained with others. A user can also download multiple files simultaneously without a considerable loss of transfer rate. Because every user who is obtaining a file also uploads it, other users who want the same file find it easier to get, and they in turn are drawn into the sharing process. Thus the larger the number of users demanding a file, the more easily it can be transferred among them.
    The BitTorrent protocol makes it possible to distribute large amounts of data without a high-capacity server or expensive bandwidth. This is its most striking feature. A transfer never depends on a single source holding the original copy of the file; instead, the load is distributed across a number of sources. Not only the sources but also the clients who want the file take part in the transfer. This spreads the load across the users and partially relieves the main source, which reduces the network traffic imposed on it.
    Because of this, BitTorrent has become one of the most popular file transfer mechanisms in use today. Although the mechanism is not as simple as an ordinary file transfer protocol, it has gained its popularity because of the sharing policy it imposes on its users. Recent surveys by various organizations attribute 35% of overall Internet traffic to BitTorrent, which indicates how large the volume of files transferred and shared through BitTorrent is.
  • 8. 1.2 History
    BitTorrent was created by a programmer named Bram Cohen. After inventing this new technology he said, "I decided I finally wanted to work on a project that people would actually use, would actually work and would actually be fun". Before BitTorrent there were other techniques for file sharing, but they did not use bandwidth effectively, and bandwidth became the bottleneck. Other peer-to-peer file sharing systems such as Napster and Kazaa also involved users in the sharing process, but they required only a subset of users to share files, not all of them. Most users could simply download files without having to upload. This again put a heavy network load on the original sources and on a small number of users, and left the bandwidth of the remaining users unused.
    Cohen's main intention was to make maximum use of the bandwidth of every user involved in sharing a file. To achieve this, every person who wants to download a file also has to contribute to uploading it. This novel concept gave birth to a new peer-to-peer file sharing protocol called BitTorrent. Cohen invented the protocol in April 2001. The first usable version of BitTorrent appeared in October 2002, but the system needed a lot of fine-tuning. BitTorrent really started to take off in early 2003, when it was used to distribute a new version of Linux and fans of Japanese anime started relying on it to share cartoons.
    The most important property of the protocol is that it allows people with limited bandwidth to supply very popular files. If you are a small software developer, you can put up a package, and if it turns out that millions of people want it, they can get it from each other in an automated way. Thus bandwidth, which was the bottleneck in previous systems, no longer poses a problem.
  • 9. Chapter 2 BITTORRENT AND OTHER APPROACHES
    2.1 Other P2P methods
    The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP work. The clients speak only to the server and never to each other. The main advantages of this method are that it is simple to set up and that the files are usually available, since the servers tend to be dedicated to the task of serving and are always on and connected to the Internet. However, this model has a significant problem with files that are large, very popular, or both: it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. You may have tried to download a demo of a newly released game, or CD images of a new Linux distribution, and found that all the servers report "too many users" or that there is a long queue to wait through. Mirrors partially address this shortcoming by distributing the load across multiple servers, but setting up an efficient network of mirrors requires a lot of coordination and effort, and it is usually feasible only for the busiest of sites.
    Another method of transferring files has become popular recently: peer-to-peer networks such as Kazaa, eDonkey, Gnutella and Direct Connect. In most of these networks, ordinary Internet users trade files by connecting directly, one-to-one. The advantage is that files can be shared without access to a proper server, and because of this there is little accountability for the contents of the files. Hence these networks tend to be very popular for illicit files such as music, movies and pirated software. Typically a downloader receives a file from a single source, although the newest versions of some clients allow a single file to be downloaded from multiple sources for higher speeds. The problem of popular downloads discussed above is somewhat mitigated, because there is a greater chance that a popular file will be offered by a number of peers. The breadth of files available tends to be fairly good, though download speeds for obscure files tend to be low. Another common problem with these systems is the significant protocol overhead of passing search queries among the peers, which often limits the number of peers that one can reach. Partially downloaded files are usually not available to other peers, although some newer clients may offer this functionality. Availability generally depends on the goodwill of the users, to the extent that some of these networks have tried to enforce rules or restrictions regarding send/receive ratios.
  • 10. Use of the Usenet binary newsgroups is yet another method of file distribution, one that is substantially different from the other methods. Files transferred over Usenet are often subject to minuscule windows of opportunity. The typical retention time of binary news servers is often as low as 24 hours, and having a posted file available for a week is considered a long time. However, the Usenet model is relatively efficient, in that messages are passed around a large web of peers from one news server to another and finally fanned out to the end user from there. Often the end user connects to a server provided by his or her ISP, resulting in further bandwidth savings. Usenet is also one of the more anonymous forms of file sharing, and it too is often used for illicit files of almost any nature. Due to the nature of NNTP, a file's popularity has little to do with its availability, and hence downloads from Usenet tend to be quite fast regardless of content. The downsides of this method are that it comes with a set of rules and procedures and requires a certain amount of effort and understanding from the user. Patience is often required to get a complete file, because big files are split into a huge number of smaller posts. Finally, access to Usenet often must be purchased due to the extremely high volume of messages in the binary groups.
    BitTorrent is closest to Usenet. It is best suited to newer files in which a number of people are interested. Obscure or older files tend not to be available. Perhaps as the software matures a more suitable means of keeping torrents seeded will emerge, but currently the client is quite resource-intensive, which makes it cumbersome to share a number of files. BitTorrent also deals well with files that are in high demand, especially compared to the other methods.
    2.2 A Typical HTTP File Transfer
    The most common type of file transfer is through an HTTP server. In this method, an HTTP server listens for client requests and serves them. The client depends entirely on the lone server that provides the file, so the download is bounded by the limitations of that server. This kind of transfer also has a single point of failure: if the server crashes, the whole download process will cease. A single server can handle many clients and serve the requested file to all of them simultaneously. The file is served as one single piece, which means that if the download stops abruptly in the middle, the whole file has to be downloaded again.
  • 11. The BitTorrent protocol overcomes the shortcomings of this type of transfer and is therefore more robust, which is why many people choose it over the traditional method.
    Fig 2.1 : HTTP/FTP File Transfer
    2.3 The DAP method
    Download Accelerator Plus (DAP) is a popular download accelerator. DAP's key features include the ability to accelerate the downloading of files over FTP and HTTP, to pause and resume downloads, and to recover from dropped Internet connections. On the Internet the same file is often hosted on numerous mirror sites, such as at universities and on ISP servers. DAP senses when a user begins downloading a file and identifies available mirror sites that host the requested file. As soon as it is triggered, DAP's client-side optimization determines, in real time, which mirror sites offer the fastest response for the specific user's location. The file is downloaded in several segments simultaneously, through multiple connections from the most responsive server(s), and reassembled at the user's PC. This results in better utilization of the user's available bandwidth and ensures that each available mirror server is used to serve the users who benefit most. This in turn balances the load among the available servers across the World Wide Web and reduces download times for users while letting them get the maximum benefit from their available bandwidth. DAP's resume functionality, together with its ability to continue downloading even when one of the participating connections has dropped, also gives users a more reliable download experience.
  • 12. 2.4 The BitTorrent Approach
    In BitTorrent, the data to be shared is divided into many equal-sized portions called pieces. Each piece is further subdivided into equal-sized sub-pieces called blocks. All clients interested in sharing this data are grouped into a swarm, each of which is managed by a central entity called the tracker. BitTorrent has changed the way files are shared between people. It does not require a user to download a file completely from a single server. Instead, a file can be downloaded from many users who are themselves downloading the same file. A user who has the complete file, called the seed, initiates the distribution by transferring pieces of the file to other users. Once a user has a reasonable number of pieces, he too can start sharing them with users who have yet to receive those pieces. This frees a client from depending completely on a server and reduces the overall load on the server.
    Fig 2.2 : BitTorrent File Transfer
    Each client independently obtains a file, called a torrent, that contains the location of the tracker along with a hash of each piece. Clients keep each other updated on the status of their downloads. Clients download blocks from other (randomly chosen) clients who claim to have the corresponding data, and they also send data that they have previously downloaded to other clients. Once a client has received all the blocks of a given piece, he can verify the hash of that piece against the hash provided in the torrent.
  • 13. Thus, once a client has downloaded and verified all pieces, he can be confident that he has the complete data.
    Both BitTorrent and DAP download files from multiple sources, and both divide files into pieces. But BitTorrent has features that DAP lacks, which have made it the more popular of the two. In BitTorrent the users participate actively in sharing files, alongside the servers. This is what makes the protocol unique. It also requires a dedicated server, called the tracker, to handle the peers connected in the network. File transfer in DAP takes place over traditional HTTP or FTP, which means that the transfer rate is always limited by the servers' bandwidth. If these servers are flooded with requests, they break down and the transfer terminates. This is not the case in BitTorrent, since the process does not depend on servers alone: the load is distributed across the network between peers and servers. This makes BitTorrent far better than competing approaches such as DAP.
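The piece and hash check described in section 2.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the piece size of 4 bytes and the helper names are hypothetical, SHA-1 is used because that is the hash in the original BitTorrent protocol, and in practice the last piece of a file may be shorter than the others.

```python
import hashlib

PIECE_SIZE = 4  # hypothetical, tiny value chosen only for illustration


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut the data into consecutive pieces; the final piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """Compute the per-piece hashes that a torrent would carry."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check a downloaded piece against the hash published for its index."""
    return hashlib.sha1(piece).digest() == expected[index]


# The publisher side: split the data and record one hash per piece.
data = b"hello bittorrent"
pieces = split_into_pieces(data, PIECE_SIZE)
hashes = piece_hashes(pieces)

# The downloader side: accept a piece only if its hash matches.
good = verify_piece(0, pieces[0], hashes)
bad = verify_piece(0, b"xxxx", hashes)
```

A client that finds a mismatch would discard the piece and request it again, which is why a complete set of verified pieces guarantees a complete file.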
  • 14. Chapter 3 WORKING OF BITTORRENT
    As previously explained, BitTorrent's design makes it extremely efficient at sharing large data files among interested peers. Under the hood, BitTorrent is a protocol of some complexity, and modeling is useful for understanding its performance. BitTorrent scales well and is a superior method for transferring and disseminating files between interested peers while limiting free riding (peers who download but do not upload) among those same peers. BitTorrent is based on a “tit for tat” reciprocity agreement between users that ultimately results in Pareto efficiency. Pareto efficiency is an economic concept describing an allocation of resources in which no peer can be made better off without making another worse off. It is the crown jewel of BitTorrent and the driving force behind the protocol's popularity and success. Cohen's vision of peers simultaneously helping each other by uploading and downloading has been realized by the BitTorrent system.
    Fig 3.1 : A Typical BitTorrent System
    The protocol shares data through what are known as torrents. For a torrent to be alive, or active, it needs several key components: a tracker server, a .torrent file, a web server where the .torrent file is stored, and a complete copy of the file being exchanged. Each of these components is described in the following paragraphs.
  • 15. The file being exchanged is the essence of the torrent, and a peer holding a complete copy is referred to as a seed. A seed is a peer in the BitTorrent network willing to share a file with other peers in the network. Why seed owners choose to share their files is debatable, as the BitTorrent protocol does not reward seed behavior. In fact, some researchers believe the protocol lacks any incentive mechanism for encouraging seeds to remain in torrents, and some argue that this lack of incentive is a fundamental design flaw that leads to the punishment of seeds. Peers that lack the file and seek it from seeds are called leechers. While seeds only upload to leechers, leechers may both download from seeds and upload to other leechers. The protocol is designed so that leeching peers seek each other out for data transfer in a process known as “optimistic unchoking”. Together, the seeds and leechers engaged in a file transfer are referred to as a swarm. A swarm is coordinated by a tracker server serving the particular torrent, and interested peers find the tracker via metadata known as a .torrent file. Since BitTorrent has no built-in search functionality, .torrent files are usually located via HTTP through search engines or trackers.
    The first step in a BitTorrent exchange occurs when a peer downloads a .torrent file from a server. The role of .torrent files is to provide the metadata that allows the protocol to function; they can be viewed as surrogates for the files being shared. A .torrent file contains the key data needed for the exchange, including the file length, the assigned name, hashing information about the file and the URL of the tracker coordinating the torrent activity. Torrent files can be created using a program such as MakeTorrent, another open source tool available under the free software model.
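As a rough illustration of the metadata listed above, the sketch below builds the content of a single-file .torrent as a Python dictionary. The key names (`announce`, `info`, `name`, `length`, `piece length`, `pieces`) follow the published BitTorrent specification; the function name, the tracker URL, the file name and the tiny piece length are hypothetical, and the bencoded on-disk encoding of the dictionary is deliberately left out.

```python
import hashlib


def build_metainfo(name: str, data: bytes, tracker_url: str,
                   piece_length: int) -> dict:
    """Assemble the fields a single-file .torrent carries (not yet bencoded)."""
    # Concatenate the 20-byte SHA-1 digest of every piece, in order.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,           # URL of the coordinating tracker
        "info": {
            "name": name,                  # assigned (suggested) file name
            "length": len(data),           # file length in bytes
            "piece length": piece_length,  # bytes per piece
            "pieces": pieces,              # hashing information
        },
    }


# Hypothetical example values, chosen only for illustration.
meta = build_metainfo("demo.bin", b"x" * 10,
                      "http://tracker.example/announce", 4)
num_pieces = len(meta["info"]["pieces"]) // 20
```

Note that the dictionary holds only hashes and bookkeeping data, which is why a .torrent file stays small however large the shared file is.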
    When a .torrent file is opened by the peer's client software, the peer connects to the tracker server responsible for coordinating activity for that specific torrent. The tracker and client communicate by a protocol layered on top of HTTP, and the tracker's key role is to coordinate peers seeking the same file. As Cohen envisioned it, “The tracker’s responsibilities are strictly limited to helping peers find each other”. In reality the tracker's role is a bit more complex, as many trackers collect data about the peers engaged in a swarm. Additionally, some of the newer tracker software integrates the functions of the tracker and the .torrent server. Leechers and seeds are coordinated by the tracker server, and the peers periodically update the tracker on their status, which gives the tracker a global view of the system. The data monitored by the tracker can include peer IP addresses, the amount of data uploaded and downloaded by specific peers, data transfer rates among peers, the percentage of the total file downloaded, the length of time connected to the tracker, and the ratio of sharing among peers.
  • 16. Usually a tracker coordinates multiple torrents, and the most popular trackers coordinate thousands of swarms simultaneously. It should be noted that .torrent files are not the actual file being shared; rather, they are the metadata that allows trackers and peers to coordinate their activities. As previously mentioned, the complete file is stored on the seed nodes and not on the tracker server. Since .torrent files are small and require little space to store, one server can easily host thousands of them without prohibitive server or bandwidth requirements. Hosting a tracker does consume bandwidth, however, especially if the tracker becomes popular and begins to see heavy usage. Regardless, the tracker's bandwidth requirements are much lower than those of hosting the complete files in a traditional client-server model, such as an FTP site.
    While trackers and .torrent files assist the BitTorrent protocol, the actual transfer of data is handled by the peers engaged in the swarm. “Tit for tat” is the sole incentive measure Cohen saw as necessary for the protocol's success. Peers seek tit for tat behavior from others and discourage free riding through a “choke/unchoke” policy. This policy uses a process known as “optimistic unchoking” to constantly seek other swarm peers who may have more beneficial connections to offer an interested peer. There has been some research on the tit for tat algorithm that models rational users and studies their behavior. This work defined rational users as those peer nodes that manipulate their client software beyond its default settings.
    The fact that many newer BitTorrent clients allow upload and download speeds to be tuned indicates that the original tit for tat coding was perhaps too good, and thus detrimental to other functions of the peer node such as normal HTTP traffic. Some BitTorrent FAQs recommend limiting uploads to approximately 80% of known capacity, and personal tests indicate that this strategy does benefit download speeds.
    The final important aspect of the BitTorrent protocol's architecture is its use of a “rarest piece first” algorithm when a peer downloads a file. The goal of the rarest first algorithm is a uniform distribution of the data across peers. (It should not be confused with “endgame mode”, which applies only to the last few blocks of a download.) Under rarest first, a peer requests the pieces that are least common in the swarm, and a seed uploads new file chunks (those not yet uploaded to the swarm) to the newest peer connecting to a torrent. This policy encourages the file to spread further across peer nodes. The rarest first algorithm is an interesting aspect of BitTorrent that, when combined with optimistic unchoking, may explain why the protocol has achieved such success.
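The rarest first idea can be illustrated with a short Python sketch. It is a simplified model rather than the logic of any particular client: the function name and the peer names are hypothetical, each peer is represented only by the set of piece indices it claims to hold, and ties are broken by piece index to keep the result deterministic, whereas real clients typically break ties at random.

```python
from collections import Counter


def rarest_first(needed: set[int],
                 peer_pieces: dict[str, set[int]]) -> list[int]:
    """Order the pieces we still need from rarest to most common."""
    counts: Counter = Counter()
    for have in peer_pieces.values():
        # Count, for each needed piece, how many peers hold it.
        counts.update(have & needed)
    # Pieces that no connected peer holds cannot be requested at all.
    available = [piece for piece in needed if counts[piece] > 0]
    return sorted(available, key=lambda piece: (counts[piece], piece))


# Hypothetical swarm: piece 2 is held by one peer, piece 0 by all three.
peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0}}
order = rarest_first({0, 1, 2, 3}, peers)  # -> [2, 1, 0]
```

Requesting piece 2 first means that, if peer "a" leaves the swarm, the only copy of that piece has already been replicated.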
Chapter 4
TERMINOLOGY

These are the common terms that one would come across while making a typical BitTorrent file transfer.

Torrent : The small metadata file received from the web server (the one that ends in .torrent). Metadata here means that the file contains information about the data you want to download, not the data itself.

Peer : Another computer on the Internet that you connect to and exchange data with. Generally a peer does not have the complete file.

Leeches : They are similar to peers in that they do not have the complete file. The main difference between the two is that a leech will not upload once the file is downloaded.

Seed : A computer that has a complete copy of a certain torrent. Once a client has downloaded a file completely, it can continue to upload the file, which is called seeding. This is good practice in the BitTorrent world since it allows other users to obtain the file easily.

Reseed : When there are zero seeds for a given torrent, eventually all the peers will get stuck with an incomplete file, since no one in the swarm has the missing pieces. When this happens, a seed must connect to the swarm so that those missing pieces can be transferred. This is called reseeding.

Swarm : The group of machines that are collectively connected for a particular file.

Tracker : A server on the Internet that coordinates the actions of BitTorrent clients. The clients are in constant touch with this server to learn about the peers in the swarm.

Share ratio : The ratio of the amount of a file uploaded to the amount downloaded. A ratio of 1 means that one has uploaded the same amount of a file as has been downloaded.

Distributed copies : Sometimes the peers in a swarm will collectively have a complete file. Such copies are called distributed copies.
Choked : The state of an uploader that does not want to send anything on its link. In such cases, the connection is said to be choked.

Interested : The state of a downloader which indicates that the other end has some pieces that the downloader wants. The downloader is then said to be interested in the other end.

Snubbed : If the client has not received anything after a certain period, it marks a connection as snubbed, meaning that the peer on the other end has chosen not to send for a while.

Optimistic unchoking : Periodically, the client shakes up the list of uploaders and tries sending on different connections that were previously choked, while choking the connections it was just using. This is called optimistic unchoking.
Chapter 5
ARCHITECTURE OF BITTORRENT

The BitTorrent protocol can be split into the following five main components:
• Metainfo File - A file which contains all details necessary for the protocol to operate.
• Tracker - A server which helps manage the BitTorrent protocol.
• Peers - Users exchanging data via the BitTorrent protocol.
• Data - The files being transferred across the protocol.
• Client - The program which sits on a peer's computer and implements the protocol.

Peers use TCP (Transmission Control Protocol) to communicate and send data. This protocol is preferable over other protocols such as UDP (User Datagram Protocol) because TCP guarantees reliable and in-order delivery of data from sender to receiver. UDP cannot give such guarantees, and data can become scrambled or lost altogether.

Fig 5.1 : Architecture of a BitTorrent System

The tracker allows peers to discover which other peers are taking part in a torrent, and allows them to begin communication. Peers communicate with the tracker in plain text via HTTP (Hypertext
Transfer Protocol). Fig 5.1 illustrates how peers interact with each other and also communicate with a central tracker.

5.1 Metainfo File

When someone wants to publish data using the BitTorrent protocol, they must create a metainfo file. This file is specific to the data they are publishing, and contains all the information about a torrent, such as the data to be included and the URL of the tracker to connect to. A tracker is a server which 'manages' a torrent, and is discussed in Section 5.2. The file is given a '.torrent' extension, and the data is extracted from the file by a BitTorrent client. This is a program which runs on the user's computer and implements the BitTorrent protocol.

Every metainfo file must contain the following information (or 'keys'):
• info: A dictionary which describes the file(s) of the torrent, either the single file or the directory structure for multiple files. SHA-1 hashes for every data piece are stored here.
• announce: The announce URL of the tracker, as a string.

The following are optional keys which can also be used:
• announce-list: Used to list backup trackers.
• creation date: The creation time of the torrent as a UNIX time stamp (integer seconds since 1-Jan-1970 00:00:00 UTC).
• comment: Any comments by the author.
• created by: Name and version of the program used to create the metainfo file.

These keys are structured in the metainfo file as follows:

    {'info': {'piece length': 131072,
              'length': 38190848L,
              'name': 'Cory_Doctorow_Microsoft_Research_DRM_talk.mp3',
              'pieces': '\xcb\xfazr\x9b\xe1\x9a\xe1\x83\x91~\xed@.....'},
     'announce': 'http://tracker.var.cc:6969/announce',
     'creation date': 1089749086L}
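As a worked example, the 'length' and 'piece length' values in the structure above determine how many pieces the torrent has and how long the 'pieces' string must be. The short Python sketch below does the arithmetic; it relies only on the fact that each SHA-1 digest is 20 bytes long, and the variable names are illustrative.

```python
piece_length = 131072   # 'piece length' from the example (128 * 1024 bytes)
length = 38190848       # 'length' from the example, in bytes

# Round up: the final, shorter piece still counts as a piece.
num_pieces = (length + piece_length - 1) // piece_length

# Size of the final piece, which is smaller than piece_length here.
last_piece = length - (num_pieces - 1) * piece_length

# One 20-byte SHA-1 digest is stored per piece in the 'pieces' string.
pieces_field_bytes = num_pieces * 20

print(num_pieces, last_piece, pieces_field_bytes)
```

For this torrent the result is 292 pieces, a final piece of 48,896 bytes, and a 'pieces' string of 5,840 bytes.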
Instead of transmitting the keys in plain text format, the keys contained in the metainfo file are encoded before they are sent. Encoding is done using a BitTorrent-specific method known as 'bencoding'.

5.1.1 Bencoding :

Bencoding is used by BitTorrent to send loosely structured data between the BitTorrent client and a tracker. Bencoding supports byte strings, integers, lists and dictionaries. Bencoding uses the beginning delimiters 'i' / 'l' / 'd' for integers, lists and dictionaries respectively. Ending delimiters are always 'e'. Delimiters are not used for byte strings.

Bencoding Structure:
• Byte Strings: <string length in base ten ASCII>:<string data>
• Integers: i<base ten ASCII>e
• Lists: l<bencoded values>e
• Dictionaries: d<bencoded string><bencoded element>e

Negative integers are allowed, but prefixing the number with a zero is not permitted. However, '0' itself is allowed.

Examples of bencoding:
    4:spam              // represents the string "spam"
    i3e                 // represents the integer 3
    l4:spam4:eggse      // represents the list of two strings: ["spam", "eggs"]
    d4:spaml1:a1:bee    // represents the dictionary {"spam" => ["a", "b"]}

5.1.2 Metainfo File Distribution :

Because all the information needed for the torrent is included in a single file, this file can easily be distributed via other protocols, and as the file is replicated, the number of peers can increase very quickly. The most popular method of distribution is a public indexing site which hosts the metainfo files. A seed will upload the file, and others can then download a copy of the file over HTTP and participate in the torrent.
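The bencoding rules of Section 5.1.1 are small enough to sketch in a few lines of Python. The functions below are a minimal illustration rather than a hardened implementation: the names bencode and bdecode are chosen for this example, malformed input is not validated, and dictionary keys are emitted in sorted order, which the official specification requires although the text above does not mention it.

```python
def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using the rules above."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])   # keys in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    # Otherwise a byte string: <length>:<data>
    colon = data.index(b":", pos)
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


encoded = bencode({"spam": ["a", "b"]})
decoded, _ = bdecode(encoded)
```

Decoding returns byte strings rather than text, since bencoded strings may hold arbitrary binary data such as the piece hashes.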
5.2 Tracker

A tracker is used to manage the users participating in a torrent (known as peers). It stores statistics about the torrent, but its main role is to allow peers to 'find each other' and start communication, i.e. to find peers with the data they require. Peers know nothing of each other until a response is received from the tracker.

Fig 5.2 : Tracker

Whenever a peer contacts the tracker, it reports its progress: how much it has uploaded and downloaded, and how much it still has left to download. When another peer queries the tracker, the tracker can provide a random list of peers who are participating in the torrent. A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of the tracker managing a torrent is specified in the metainfo file, and a single tracker can manage
multiple torrents. Multiple trackers can also be specified as backups, which are handled by the BitTorrent client running on the user's computer.

BitTorrent clients communicate with the tracker using HTTP GET requests, which is a standard CGI method. This consists of appending a "?" to the URL, and separating parameters with a "&". The parameters accepted by the tracker are:
• info_hash: 20-byte SHA1 hash of the info key from the metainfo file.
• peer_id: 20-byte string used as a unique ID for the client.
• port: The port number the client is listening on.
• uploaded: The total amount uploaded since the client sent the 'started' event to the tracker, in base ten ASCII.
• downloaded: The total amount downloaded since the client sent the 'started' event to the tracker, in base ten ASCII.
• left: The number of bytes the client still has to download, in base ten ASCII.
• compact: Indicates that the client accepts compact responses. The peer list can then be replaced by a string of 6 bytes per peer. The first 4 bytes are the host, and the last 2 bytes are the port.
• event: If specified, must be one of the following: started, stopped, completed.
• ip: (optional) The IP address of the client machine, in dotted format.
• numwant: (optional) The number of peers the client wishes to receive from the tracker.
• key: (optional) Allows a client to identify itself if its IP address changes.
• trackerid: (optional) If a previous announce contained a tracker id, it should be set here.

The tracker then responds with a "text/plain" document with the following keys:
• failure message: If present, then no other keys are included. The value is a human readable error message explaining why the request failed.
• warning message: Similar to failure message, but the response still gets processed.
• interval: The number of seconds a client should wait between sending regular requests to the tracker.
• min interval: Minimum announce interval.
• tracker id: A string that the client should send back with its next announce.
• complete: Number of peers with the complete file.
• incomplete: Number of non-seeding peers (leechers).
• peers: A list of dictionaries including the peer id, IP and port of each peer.

5.2.1 Scraping

Scraping is the process of querying the state of a given torrent (or all torrents) that the tracker is managing. The result is known as a "scrape page". To get the scrape page, start with the announce URL and find the last '/'. If the text immediately following that '/' is 'announce', it can be replaced with 'scrape' to obtain the scrape URL.

Examples:

    Announce URL                        Scrape URL
    http://example.com/announce         http://example.com/scrape
    http://example.com/a/announce       http://example.com/a/scrape
    http://example.com/announce.php     http://example.com/scrape.php

The tracker then responds with a "text/plain" document with the following bencoded keys:
• files: A dictionary containing one key/value pair for each torrent. Each key is a 20-byte binary hash value. The value of that key is a nested dictionary with the following keys:
    • complete: The number of peers with the entire file (seeds).
    • downloaded: The total number of times the entire file has been downloaded.
    • incomplete: The number of active downloaders (leechers).
    • name: (optional) The torrent name.
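The announce-to-scrape substitution described in Section 5.2.1 can be sketched as a small Python function. The function name is illustrative, and returning None when the rule does not apply is an assumption of this sketch; the text only defines the case where 'announce' follows the last '/'.

```python
def scrape_url(announce_url):
    """Derive the scrape URL from an announce URL, or return None.

    Only the text right after the last '/' is examined, as described above.
    """
    head, sep, tail = announce_url.rpartition("/")
    if not sep or not tail.startswith("announce"):
        return None   # tracker does not follow the scrape convention
    return head + "/scrape" + tail[len("announce"):]


examples = [
    "http://example.com/announce",
    "http://example.com/a/announce",
    "http://example.com/announce.php",
]
results = [scrape_url(u) for u in examples]
```

Applied to the three announce URLs in the table above, the function yields the three scrape URLs listed beside them.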
5.3 Peers

Peers are other users participating in a torrent. A peer holds either a partial copy of the file or the complete file (in which case it is known as a seed). Pieces are requested from peers, but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP (Transmission Control Protocol) ports 6881-6889 to send messages and data between peers and, unlike other protocols, does not use UDP (User Datagram Protocol).

5.3.1 Piece Selection

Peers continuously queue up the pieces which they require for download. The tracker supplies the list of peers in the swarm, and each peer announces the pieces it holds through the bitfield and have messages (Section 5.3.10). Which piece is requested next depends upon the BitTorrent client. There are three stages of piece selection, which change depending on which stage of completion a peer is at.

5.3.2 Random First Piece

When downloading first begins, the peer has nothing to upload, so a piece is selected at random to get the download started. Random pieces are then chosen until the first piece is completed and checked. Once this happens, the 'rarest first' strategy begins.

5.3.3 Rarest First

When a peer selects which piece to download next, the rarest piece in the current swarm is chosen, i.e. the piece held by the lowest number of peers. This means that the most common pieces are left until later, and the focus goes to replication of rarer pieces. At the beginning of a torrent, there will be only one seed with the complete file. There would be a possible bottleneck if multiple downloaders were trying to access the same piece. Rarest first avoids this because different peers have different pieces. As more peers connect, rarest first will take some load off the original seed, as peers begin to download from one another. Eventually the original seed will disappear from a torrent. This could be for cost reasons, or most commonly because of bandwidth issues. Losing a seed runs the risk of pieces being lost if no current downloaders have them.
Rarest first works to prevent the loss of pieces by replicating the pieces most at risk as quickly as possible. If the original seed goes
before at least one other peer has the complete file, then no one will reach completion unless a seed re-connects.

5.3.4 Endgame Mode

When a download nears completion and the peer is waiting for a piece from a peer with a slow transfer rate, completion may be delayed. To prevent this, the remaining sub-pieces are requested from all peers in the current swarm.

5.3.5 Peer Distribution

The role of the tracker ends once peers have 'found each other'. From then on, communication is done directly between peers, and the tracker is not involved. The set of peers a BitTorrent client is in communication with is known as a swarm. To maintain the integrity of the data which has been downloaded, a peer does not report that it has a piece until it has checked the piece's hash against the one contained in the metainfo file. Peers will continue to download data from all available peers that they can, i.e. peers that possess the required pieces. Peers can block others from downloading data if necessary. This is known as choking.

5.3.6 Choking

When a peer receives a request for a piece from another peer, it can opt to refuse to transmit that piece. If this happens, the requesting peer is said to be choked. This can be done for different reasons, but the most common is that a client will only maintain a limited number of simultaneous uploads (max_uploads). All further requests to the client will be marked as choked. Usually the default for max_uploads is 4.
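The max_uploads rule can be sketched in Python as follows. The text above only fixes the number of upload slots; ranking peers by the rate at which they are currently sending data to us is an assumption of this sketch (in the spirit of tit for tat), and the function and variable names are hypothetical.

```python
MAX_UPLOADS = 4   # default number of simultaneous uploads mentioned above


def choose_unchoked(download_rates, max_uploads=MAX_UPLOADS):
    """Split peers into (unchoked, choked) sets.

    download_rates maps a peer name to the rate (bytes/s) at which that
    peer is currently sending data to us. The fastest max_uploads peers
    get an upload slot; every other peer is choked.
    """
    ranked = sorted(download_rates, key=lambda peer: download_rates[peer],
                    reverse=True)
    return set(ranked[:max_uploads]), set(ranked[max_uploads:])


# Hypothetical measurements for six connected peers.
rates = {"a": 50, "b": 10, "c": 40, "d": 30, "e": 20, "f": 5}
unchoked, choked = choose_unchoked(rates)
```

A real client would re-run such a selection periodically, since transfer rates change over the life of a connection.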
Fig 5.3 : Choking by a peer

The peer will then remain choked until an unchoke message is sent. Another example of a peer being choked arises when it is downloading from a seed, since the seed requires no pieces in return. To ensure fairness between peers, there is a system in place which rotates which peers are downloading. This is known as optimistic unchoking.

5.3.7 Optimistic Unchoking

To ensure that the connections with the best data transfer rates are not the only ones favoured, each peer has a reserved 'optimistic unchoke' which is left unchoked regardless of the current transfer rate. The peer which is assigned to this slot is rotated every 30 seconds. This is enough time for the upload / download rates to reach maximum capacity. The peers then cooperate using the tit for tat strategy, where the downloader responds in one period with the same action the uploader used in the last period.

5.3.8 Communication Between Peers

Peers which are exchanging data are in constant communication. Connections are symmetrical, and therefore messages can be exchanged in both directions. The exchange consists of a handshake, followed by a never-ending stream of length-prefixed messages.

5.3.9 Handshaking

Handshaking is performed as follows:
1. The handshake starts with character 19 (base 10) followed by the string 'BitTorrent protocol'. Eight reserved bytes follow the string (see Section 5.6.2).
2. A 20 byte SHA1 hash of the bencoded info value from the metainfo file is then sent. If this does not match between peers, the connection is closed.
3. A 20 byte peer id is sent, which is the same id used in tracker requests. If the peer id does not match the one expected, the connection is closed.

5.3.10 Message Stream

This constant stream of messages allows all peers in the swarm to send data and to control interactions with other peers.

    Prefix  Message         Structure                                 Additional Information
    0       choke           <len=0001><id=0>                          Fixed length, no payload. Enables a peer to block another peer's request for data.
    1       unchoke         <len=0001><id=1>                          Fixed length, no payload. Unblocks a peer; if it is still interested in the data, upload will begin.
    2       interested      <len=0001><id=2>                          Fixed length, no payload. A user is interested if a peer has the data they require.
    3       not interested  <len=0001><id=3>                          Fixed length, no payload. The peer does not have any data required.
    4       have            <len=0005><id=4><piece index>             Fixed length. Payload is the zero-based index of a piece that the peer currently has.
    5       bitfield        <len=0001+X><id=5><bitfield>              Sent immediately after handshaking. Optional, and only sent if the client has pieces. Variable length, X is the length of the bitfield. Payload represents the pieces that have been successfully downloaded.
    6       request         <len=0013><id=6><index><begin><length>    Fixed length, used to request a block within a piece. The payload contains integer values specifying the index, begin location and length.
    7       piece           <len=0009+X><id=7><index><begin><block>   Sent in response to request messages. Variable length, X is the length of the block. The payload contains integer values specifying the index and begin location, followed by the block data.
    8       cancel          <len=0013><id=8><index><begin><length>    Fixed length, used to cancel block requests. Payload is the same as for 'request'. Typically used during 'end game' mode.

A peer will be 'interested' in another peer if that peer has pieces it requires. If the connection to the peer which has this data is not choked, then data will be transferred. After handshaking, by default, connections start out as choked and not interested.

5.4 Data

BitTorrent is very versatile, and can be used to transfer a single file, or multiple files of any type contained within any number of directories. File sizes can vary hugely, from kilobytes to hundreds of gigabytes.

5.4.1 Piece Size

Data is split into smaller pieces which are sent between peers using the BitTorrent protocol. These pieces are of a fixed size, which enables peers to keep track of who has which pieces of data. This also breaks the file into verifiable pieces: each piece can then be
assigned a hash code, which can be checked by the downloader for data integrity. These hashes are stored as part of the metainfo file, which is discussed in Section 5.1.

The size of the pieces remains constant throughout all files in the torrent, except for the final piece, which is irregular. The piece size a torrent is allocated depends on the amount of data. Piece sizes which are too large cause inefficiency when downloading (a larger risk of data corruption in larger pieces due to fewer integrity checks), whereas if the piece sizes are too small, more hash checks need to be run.

As the number of pieces increases, more hash codes need to be stored in the metainfo file. Therefore, as a rule of thumb, the piece size should be selected so that the metainfo file is no larger than 50 to 75 kB. The main reason for this is to limit the amount of hosting storage and bandwidth needed by indexing servers. The most common piece sizes are 256 kB, 512 kB and 1 MB. The number of pieces is therefore: total length / piece size, rounded up to the next whole number. Pieces may overlap file boundaries.

For example, a 1.4 MB file could be split into the following pieces. This shows 5 * 256 kB pieces, and a final piece of 120 kB.

Fig 5.4 : Pieces of a file

5.5 BitTorrent Clients

A BitTorrent client is an executable program which implements the BitTorrent protocol. It runs on top of the operating system on a user's machine, and handles interactions with the tracker and peers. The client is responsible for controlling the reading / writing of files, opening sockets etc.

A metainfo file must be opened by the client to start taking part in a torrent. Once the file is read, the necessary data is extracted, and a socket must be opened to contact the tracker. BitTorrent clients use TCP ports 6881-6889. To find an available port, the client will
start at the lowest port, and work upwards until it finds one it can use. This means the client will only use one port, and opening another BitTorrent client will use another port. A client can handle multiple torrents running concurrently.

Clients come in many flavours, and can range from basic applications with few features to very advanced, customisable ones. For example, some advanced features are metainfo file wizards and inbuilt trackers. These additional features mean that different clients behave very differently, and may use multiple ports, depending on the number of processes they are running. As all applications implement the same protocol, there are no incompatibility issues; however, because of various tweaks and improvements between clients, a peer may experience better performance from peers running the same client.

5.6 Sub Protocols :

BitTorrent can be described in terms of two sub-protocols: one which describes interactions between the tracker and all clients, and one which describes all client-to-client interactions.

5.6.1 THP: Tracker HTTP Protocol

The tracker protocol is implemented on top of HTTP/HTTPS. This means that the machine running the tracker runs an HTTP or HTTPS server, and has the behaviour described below:
1. The client sends a GET request to the tracker URL, with certain CGI variables and values added to the URL. This is done in the standard way, i.e., if the base URL is “http://some.url.com/announce”, the full URL would be of this form: “http://some.url.com/announce?var1=value1&var2=value2&var3=value3”.
2. The tracker responds with a “text/plain” document, containing a bencoded dictionary. This dictionary has all the information required for the client.
3. The client then sends re-requests, either at regular intervals or when an event occurs, and the tracker responds.
The CGI variables and values added to the base URL by the client sending a GET request are:

info_hash: The 20 byte SHA1 hash calculated from whatever value the info key maps to in the metainfo file.

peer_id: A 20 character long id of the downloading client, randomly generated at the start of every download. There is no formal definition of how to generate this id, but some client applications have adopted semiformal conventions for generating it.

ip: This is an optional variable, giving the IP address of the client. This can usually be extracted from the TCP connection, but this field is useful if the client and tracker are on the same machine, or behind the same NAT gateway. In both cases, the tracker might otherwise publish an unroutable IP address for the client.

port: The port number that the client is listening on. This is usually in the range 6881-6889.

uploaded: The amount of data uploaded so far by the client. There is no official definition of the unit, but generally bytes are used.

left: How much the user has left to download before the download is complete, in bytes.

event: An optional variable, corresponding to one of four possibilities:
• started: Sent when the client starts the download.
• stopped: Sent when the client stops downloading.
• completed: Sent when the download is complete. If the download is already complete at start up, this message should not be sent.
• empty: Has the same effect as if the event key is nonexistent. In either case, the message in question is one of the messages sent at regular intervals.

There are some optional variables that can be sent along with the GET request that are not specified in the official description of the protocol, but are implemented by some tracker servers:

numwant: The number of peers the client wants in the response.

key: An identification key that is not published to other peers. peer_id is public, and is thus useless as authorization.
key is used to prove the peer's identity to the tracker if the peer changes its IP address.

trackerid: If a tracker previously gave its trackerid, this should be given here.
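The GET request described in this section can be assembled with Python's standard urllib.parse module. The sketch below is illustrative only: the function name, the sample peer id and the all-zero-to-nineteen info_hash are made-up values, and the 'downloaded' variable is taken from the parameter list in Section 5.2 rather than from the list above.

```python
from urllib.parse import urlencode


def build_announce_url(base_url, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None):
    """Append the tracker CGI variables to the announce URL."""
    params = {
        "info_hash": info_hash,    # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,        # 20 characters
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if event:                      # started, stopped or completed
        params["event"] = event
    return base_url + "?" + urlencode(params)


# Hypothetical values for illustration.
url = build_announce_url(
    "http://some.url.com/announce",
    info_hash=bytes(range(20)),
    peer_id="-XX0001-abcdefghijkl",
    port=6881,
    uploaded=0,
    downloaded=0,
    left=38190848,
    event="started",
)
```

Passing the hash as bytes matters: it is binary data, so every byte outside the unreserved URL characters has to be percent-encoded rather than treated as text.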
As mentioned earlier, the response is a “text/plain” response with a bencoded dictionary. This dictionary contains the following keys:

failure reason: If this key is present, no other keys are included. The value mapped to this key is a human readable string giving the reason why the request failed.

interval: The number of seconds that the client should wait between regular re-requests.

peers: Maps to a list of dictionaries, each of which represents a peer, where each dictionary has the keys:
• peer id: The id of the peer in question. The tracker obtained this from the peer_id variable in the GET request sent to the tracker.
• ip: The address of the peer, either the IP address or the DNS domain name.
• port: The port number that the peer listens on.

These are the keys required by the official protocol specification, but here as well there are optional extensions:

min interval: If present, the client must not send re-requests more often than this.

warning message: Has the same information as failure reason, but the other keys in the dictionary are present.

tracker id: A string identifying the tracker. A client should resend it in the trackerid variable to the tracker.

complete: The number of peers that have the complete file available for upload.

incomplete: The number of peers that do not have the complete file yet.

5.6.2 PWP: Peer Wire Protocol

The peer wire (peer to peer) protocol runs over TCP. Message passing is symmetric, i.e. the same messages are sent in both directions. When a client wants to initiate a connection, it sets up the TCP connection and sends a handshake message to the other peer. If the message is acceptable, the receiving side sends a handshake message back. If the initiator accepts this handshake, message passing can begin, and continues indefinitely. All integers are encoded as four byte big-endian, except the first length prefix in the handshake.
Handshake message

The handshake message consists of five parts:
1. A single byte, containing the decimal value 19. This is the length of the character string following this byte.
2. A character string “BitTorrent protocol”, which describes the protocol. Newer protocols should follow this convention to facilitate easy identification of protocols.
3. Eight reserved bytes for further extension of the protocol. All bytes are zero in current implementations.
4. A 20 byte SHA1 hash of the value mapping to the info key in the torrent file. This is the same hash sent to the tracker in the info_hash variable.
5. The 20 byte character string representing the peer id. This is the same value sent to the tracker.

If a peer is the recipient of a handshake, and the info_hash doesn’t match any torrent it is serving, it should break the connection. If the initiator of the connection receives a handshake where the peer id doesn’t match the id received from the tracker, the connection should be dropped.

Each peer needs to keep the state of each connection. The state consists of two values, interested and choking. A peer can be either interested or not in another peer, and either choke or not choke the other peer. Choking means that no requests will be answered, and interested means that the peer is interested in downloading pieces of the file from the other peer. This means that each peer needs four Boolean values for each connection to keep track of the state:
• am_interested
• am_choking
• peer_interested
• peer_choking

All connections start out as not interested and choking for both peers. Clients should keep the am_interested value updated continuously, and report changes to the other peer.

The messages sent after the handshaking are structured as:

    [message length as an integer] [single byte describing message type] [payload]

Keep alive messages are sent at regular intervals, and they are simply a message with length 0, and no type or payload. Type 0, 1, 2, 3 are choke, unchoke, interested and not interested respectively.
All of them have length 1 and no payload. These messages simply describe changes in state.

Type 4 is a have. This message has length = 5, and a payload that is a single integer, giving the index of a piece of the file that the peer has successfully downloaded and verified.
Type 5 is bitfield. This message is only sent directly after the handshake. It contains a bitfield representation of which pieces the peer has. The payload is of variable length, and consists of a bitmap, where byte 0 corresponds to pieces 0-7, byte 1 to pieces 8-15 etc. A bit set to 1 represents having the piece. Peers that have no pieces can neglect to send this message.

Type 6 is a request. The payload consists of three integers: piece index, begin and length. The piece index identifies the piece from which the client wants to download, begin gives the byte offset within the piece, and length gives the number of bytes the client wants to download. Length is usually a power of two.

Type 7 is a piece, i.e. a block of data. This message follows a request. The payload contains the piece index, the begin offset and the data itself that was requested.

Type 8 is cancel. This message has the same payload as request messages, and it is used to cancel requests made.

Peers should continuously update their interested status to neighbours, so that clients know which peers will begin downloading when unchoked.
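The handshake layout and the length-prefixed message framing of Section 5.6.2 can be sketched with Python's struct module. The helper names, the sample info_hash and peer id, and the 16384-byte request length are assumptions made for this illustration; only the byte layout comes from the description above.

```python
import struct

PSTR = b"BitTorrent protocol"   # 19 bytes, as described above


def build_handshake(info_hash, peer_id):
    """Length byte, protocol string, 8 reserved bytes, info_hash, peer id."""
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(data):
    """Split a handshake into (pstr, reserved, info_hash, peer_id)."""
    n = data[0]
    return (data[1:1 + n], data[1 + n:9 + n],
            data[9 + n:29 + n], data[29 + n:49 + n])


def build_message(msg_type=None, payload=b""):
    """Frame a message: 4-byte big-endian length, type byte, payload."""
    if msg_type is None:
        return struct.pack(">I", 0)          # keep alive: length 0, no type
    return struct.pack(">IB", 1 + len(payload), msg_type) + payload


def build_request(index, begin, length):
    """Type 6: three big-endian integers (piece index, begin, length)."""
    return build_message(6, struct.pack(">III", index, begin, length))


# Hypothetical identifiers for illustration.
info_hash = b"\x11" * 20
peer_id = b"-XX0001-abcdefghijkl"
handshake = build_handshake(info_hash, peer_id)
request = build_request(index=1, begin=0, length=16384)
```

A receiver would check the info_hash returned by parse_handshake against the torrents it is serving before accepting any further messages, as described above.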
Chapter 6
VULNERABILITIES OF BITTORRENT

6.1 Attacks on BitTorrent

As we have seen so far, BitTorrent is one of the most favoured file transfer protocols in today’s world. But it has been exposed to various attacks in the recent past, due to vulnerabilities that are being exploited by the hacker community. Here are some of the attacks that are commonly seen.

6.1.1 Pollution attack

1. The peers receive the peer list from the tracker.
2. One peer contacts the attacker for a chunk of the file.
3. The attacker sends back a false chunk.
4. This false chunk will fail its hash check and will be discarded.
5. The attacker requests all chunks from the swarm and wastes their upload bandwidth.

Pollution attacks have become increasingly popular and have been used by anti-piracy groups. In 2005 HBO used pollution attacks to prevent people from downloading their show Rome.

6.1.2 DDOS attack

DDOS stands for distributed denial of service. This attack is possible because the BitTorrent tracker has no mechanism for validating peers. This means there is no way to trace the culprit in this kind of attack. Attacks of this nature are also possible because of the modifications that can be made to the client software.

1. The attacker downloads a large number of torrent files from a web server.
2. The attacker parses the torrent files with a modified BitTorrent client and, as he announces that he is joining the swarm, spoofs his IP address and port number with the victim's.
  • 38. 3. As the tracker receives requests for a list of participating peers from other clients, it sends them the victim’s IP address and port number.
    4. The peers then attempt to connect to the victim to try to download a chunk of the file.
    6.1.3 Bandwidth Shaping
    Many ISPs discourage their users from using BitTorrent. This is because BitTorrent is usually used to transfer large files, which greatly increases the traffic on the ISP’s network. To avoid such exploding traffic, many ISPs have started to restrict BitTorrent traffic. This is done by sniffing the packets that pass through and detecting whether they conform to the BitTorrent protocol. ISPs use filters to identify such packets and block them from passing through their servers. This has resulted in many failed file transfers across the world.
    6.2 Solutions
    Many of the attacks that BitTorrent suffers from have been addressed, and some measures have been taken to prevent them. Here are a few solutions to the attacks discussed above.
    6.2.1 Pollution attack
    The peers that perform such attacks are identified by tracing their IP addresses. These addresses are then blacklisted, and connections between them and other peers are denied to avoid further communication. This is done using software such as PeerGuardian or MoBlock, which download the list of blacklisted IP addresses from the Internet.
    6.2.2 DDOS attack
    The main solution to this kind of attack is to have clients parse the response from the tracker. If a host listed as a tracker does not respond to a peer’s request with a valid BitTorrent protocol message, it should be inferred that this host is not running BitTorrent. The peer should then exclude that address from its tracker list, or set a high retry interval for that specific tracker. Another fix would be for web sites hosting torrents to check and report
  • 39. whether all trackers are active, or even remove the non-responding trackers from the tracker list in the torrent. Another measure could be to restrict the size of the tracker list to reduce the effectiveness of such an attack.
    6.2.3 Bandwidth Shaping
    There are broadly two approaches to countering this kind of filtering. The first is to encrypt the packets sent using the BitTorrent protocol. The filters that sniff packets can then no longer recognise them as BitTorrent traffic, so the encrypted packets pass through. The second approach is to use tunnels. A tunnel is a dedicated path, set up with VPN software, that carries the traffic to an unfiltered network. The filters are bypassed and the packets can travel across networks without being blocked.
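The client-side rule from Section 6.2.2 (a host whose reply is not a valid tracker response should be treated as not running BitTorrent and dropped from the tracker list) can be sketched as follows. This is a minimal Python sketch, assuming tracker replies are bencoded dictionaries carrying either a `failure reason` key or the `interval` and `peers` keys, as in the BitTorrent specification. The function names and the example URLs are hypothetical, and a real client would also handle timeouts and retry intervals.

```python
def bdecode(data):
    """Decode one bencoded value; raise ValueError on malformed input."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after bencoded value")
    return value


def _decode(data, i):
    try:
        c = data[i:i + 1]
        if c == b"i":                      # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c in (b"l", b"d"):              # list or dictionary, closed by e
            items, i = [], i + 1
            while data[i:i + 1] != b"e":
                item, i = _decode(data, i)
                items.append(item)
            if c == b"l":
                return items, i + 1
            if len(items) % 2:
                raise ValueError("dictionary key without a value")
            return dict(zip(items[0::2], items[1::2])), i + 1
        if c.isdigit():                    # string: <length>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            stop = start + int(data[i:colon])
            if stop > len(data):
                raise ValueError("string runs past end of input")
            return data[start:stop], stop
    except (ValueError, IndexError, TypeError, RecursionError) as exc:
        raise ValueError("malformed bencoding") from exc
    raise ValueError("malformed bencoding")


def is_valid_tracker_response(body):
    """True if body looks like a tracker reply rather than, say, a web page."""
    try:
        reply = bdecode(body)
    except ValueError:
        return False
    if not isinstance(reply, dict):
        return False
    return b"failure reason" in reply or (
        b"interval" in reply and b"peers" in reply)


def prune_trackers(replies):
    """Keep only the tracker URLs whose reply was a valid tracker response."""
    return [url for url, body in replies.items()
            if is_valid_tracker_response(body)]
```

A host that answers an announce with an HTML error page fails `is_valid_tracker_response`, so `prune_trackers` removes it from the list. Setting a long retry interval for that host instead of removing it would be the gentler variant mentioned in the text.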
  • 40. Chapter 7 CONCLUSION
    BitTorrent pioneered mesh-based file distribution that effectively utilizes all the uplinks of participating nodes. Most follow-on research used similar distributed and randomized algorithms for peer and piece selection, but with different emphasis or twists. This work takes a different approach to the mesh-based file distribution problem by treating it as a scheduling problem, and strives to derive an optimal schedule that minimizes the total elapsed time. By comparing the total elapsed time of BitTorrent and CSFD in a wide variety of scenarios, we are able to determine how close BitTorrent is to the theoretical optimum. In addition, the study of the applicability of BitTorrent to real-time media streaming shows that, with minor modifications, BitTorrent can serve as an effective media streaming tool as well.
    BitTorrent’s value in this information-sharing age is hard to overstate. However, it is not yet perfect, as it remains prone to malicious attacks and misuse. Moreover, the lifespan of each torrent is still unsatisfactory: a file remains available for distribution only for a limited period of time. Further analysis and a more thorough study of the protocol will therefore reveal more ways to improve it.