This document summarizes the CAREPO checkpoint project, which aims to build a content-addressable repository for NDN content. It covers segmenting files with Rabin fingerprinting, intra-file and inter-file similarity, a signature-based trust model, and a quick performance test comparing a content-addressable repository with a traditional one.
3. Workload
• Files: CCNx releases at http://www.ccnx.org/releases/
• 29 versions from 0.1.0 to 0.8.1, uncompressed TAR
• Platform: Ubuntu 12.04, NDNx 0.2
4. Segment Size
• Rabin fingerprint chunking: variable segment size
• On the network
• Small packets waste resources: router state, packet header overhead, etc.
• Packet size is limited by the OS kernel: 8800 octets
• Rabin configuration
• sliding window size: 31 octets
• average block size: 4096 octets
• min/max block size: [1024,8192] octets
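The configuration above can be illustrated with a minimal content-defined chunking sketch. It is not the project's implementation: it swaps in a simple polynomial rolling hash modulo a prime in place of a true Rabin fingerprint, and the names `chunk`, `BASE`, `MOD` and the cut condition (low 12 bits of the hash all set) are assumptions chosen for illustration. Only the window size, the target average size and the min/max bounds come from the slide.

```python
WINDOW = 31            # sliding window size in octets (from the slide)
AVG_SIZE = 4096        # target average block size; a power of two in this sketch
MIN_SIZE = 1024        # minimum block size (from the slide)
MAX_SIZE = 8192        # maximum block size (from the slide)

BASE = 257             # assumption: polynomial base of the rolling hash
MOD = (1 << 61) - 1    # assumption: large prime modulus
MASK = AVG_SIZE - 1    # a cut is declared when the low 12 bits are all ones
OUT_FACTOR = pow(BASE, WINDOW - 1, MOD)  # weight of the octet leaving the window


def chunk(data: bytes) -> list:
    """Split data into variable-size segments at content-defined boundaries."""
    chunks = []
    start = 0   # index of the first octet of the current segment
    h = 0       # rolling hash over the last WINDOW octets of the segment
    for i, b in enumerate(data):
        filled = i - start                 # octets already in this segment
        if filled >= WINDOW:
            # Drop the octet that slides out of the window.
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD
        h = (h * BASE + b) % MOD
        size = filled + 1
        at_boundary = size >= MIN_SIZE and (h & MASK) == MASK
        if at_boundary or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])        # final, possibly short, segment
    return chunks
```

Because a boundary depends only on the last 31 octets, an insertion early in a file tends to shift boundaries only locally, which is what lets later segments keep the same hash. The min/max clamps mean the observed average size differs somewhat from 4096 octets.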
5. Intra-file Similarity
• 2.6% of segments are duplicates within a file, e.g. license boilerplate, #include
• When downloading from a remote repository, each unique segment needs to be downloaded only once, under any of its segment numbers.
[Figure: total chunks vs. unique chunks; vertical axis from 0 to 15000]
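A minimal sketch of how the duplicate fraction within one file could be measured, given its list of segments. SHA-256 is an assumption (the slides do not name the hash function), and `dedup_stats` is a hypothetical helper name.

```python
import hashlib


def dedup_stats(segments):
    """Return (total segments, unique segments, duplicate fraction)."""
    digests = [hashlib.sha256(s).digest() for s in segments]  # assumed hash
    total = len(digests)
    unique = len(set(digests))
    dup_fraction = (total - unique) / total if total else 0.0
    return total, unique, dup_fraction


# Toy input: the license text appears twice, so one of four segments is redundant.
print(dedup_stats([b"license", b"code-a", b"license", b"code-b"]))  # (4, 3, 0.25)
```

A downloader only needs to fetch the unique digests; repeated segment numbers can be filled in locally from the copy already retrieved.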
6. Inter-file Similarity
• Client has ALL previous versions: 55.3% of segments need to be downloaded.
• Client has ONE prior version: 60.3% of segments need to be downloaded.
• Duplicate segment percentage varies with each version.
[Figure: total chunks, new chunks from ONE prior version, and new chunks from ALL previous versions; vertical axis from 0 to 14000]
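The two scenarios above (client holds ALL previous versions vs. ONE prior version) can be sketched as a set difference over segment digests. This is an illustration under assumptions: SHA-256 as the hash, the hypothetical helper `fraction_to_download`, and the choice to count each missing digest once and divide by the target's total segment count. The slides do not state how the percentages were computed.

```python
import hashlib


def digest_set(segments):
    """Set of digests for a list of segments (SHA-256 is an assumption)."""
    return {hashlib.sha256(s).digest() for s in segments}


def fraction_to_download(target_segments, held_versions):
    """Fraction of the target's segments the client does not already hold."""
    held = set()
    for version in held_versions:
        held |= digest_set(version)
    needed = digest_set(target_segments) - held
    return len(needed) / len(target_segments)


# Toy versions: each letter stands for one segment's content.
v1 = [b"a", b"b", b"c", b"d"]
v2 = [b"a", b"b", b"e", b"d"]
v3 = [b"a", b"c", b"e", b"g"]

print(fraction_to_download(v3, [v1, v2]))  # ALL previous versions: 0.25
print(fraction_to_download(v3, [v2]))      # ONE prior version:     0.5
```

In the toy data, segment `c` was dropped in `v2` and reappears in `v3`, which is why holding every earlier version helps more than holding only the immediate predecessor.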
7. Trust Model
• Metadata is signed by publisher.
• Strong signatures are unnecessary on segments.
• A segment can be verified against its hash as listed in the metadata, regardless of whether the segment is retrieved by Hash Request or Name Request.
• ndngetfile expects valid signatures on segments.
• If we want to be compatible with legacy downloaders, we must sign segments.
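The hash-based check described above can be sketched as follows. The sketch assumes the metadata has already been retrieved and its publisher signature verified (not shown), that the metadata carries one digest per segment in order, and that the hash is SHA-256; `segment_digest` and `verify_segment` are hypothetical names, not part of caget or ndngetfile.

```python
import hashlib
import hmac


def segment_digest(payload: bytes) -> bytes:
    """Digest of a segment payload (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).digest()


def verify_segment(payload: bytes, index: int, metadata_digests: list) -> bool:
    """Check a retrieved segment against the digest listed in signed metadata.

    The check is the same whether the payload arrived via a Hash Request
    or a Name Request, because only the content is examined.
    """
    if not 0 <= index < len(metadata_digests):
        return False
    return hmac.compare_digest(segment_digest(payload), metadata_digests[index])


# Toy metadata: the publisher lists one digest per segment.
segments = [b"header", b"body", b"trailer"]
metadata_digests = [segment_digest(s) for s in segments]

print(verify_segment(b"body", 1, metadata_digests))      # True
print(verify_segment(b"tampered", 1, metadata_digests))  # False
```

Since trust flows from the one signed metadata object, per-segment signatures add nothing to integrity here; they matter only for downloaders that insist on them.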
8. Implementation Status
• caput, publisher program: implemented
• caget, downloader program: implemented
• car, repository program: not implemented
• We can publish segments and metadata to a regular ndnr repository, show their contents, and download from a remote repository.
• The current implementation benefits from intra-file similarity only, and incurs the overhead of a hash request for every segment.
9. Quick Performance Test
• ccnx-0.5.0.tar, 47MB
• Client connects to server directly over UDP tunnel (2 IP hops)
• client to server: 0.5Mbps, 20ms delay
• server to client: 2.5Mbps, 20ms delay
• MTU: 9000 octets
• Content-addressable repository
• caput: 9911 segments, 9696 unique segments
• caget: download time 183 seconds
• Traditional repository
• ndnputfile: 11921 segments, fixed 4096 octets per segment
• ndngetfile: download time 194 seconds
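A back-of-the-envelope check of these numbers, as a sketch. It assumes 1 Mbps means 10^6 bits per second, takes 11921 × 4096 octets as an upper bound on the file size (the last fixed-size segment may be partial), and ignores packet headers and request traffic on the uplink. All variable names are illustrative.

```python
# Figures taken from the slide.
FIXED_SEGMENTS = 11921        # ndnputfile segment count
FIXED_SEGMENT_SIZE = 4096     # octets per fixed segment
DOWNLINK_BPS = 2.5e6          # server to client; assumes Mbps = 10**6 bit/s
CA_SEGMENTS = 9911            # caput total segments
CA_UNIQUE = 9696              # caput unique segments
CAGET_SECONDS = 183
NDNGETFILE_SECONDS = 194

# Upper bound on file size, and the payload-only transfer time it implies.
file_octets = FIXED_SEGMENTS * FIXED_SEGMENT_SIZE
min_transfer_s = file_octets * 8 / DOWNLINK_BPS

# Segments that caget does not need to fetch again within this file.
duplicate_segments = CA_SEGMENTS - CA_UNIQUE
duplicate_fraction = duplicate_segments / CA_SEGMENTS

# Relative download-time saving of caget over ndngetfile.
time_saving = (NDNGETFILE_SECONDS - CAGET_SECONDS) / NDNGETFILE_SECONDS

print(f"payload-only transfer time: {min_transfer_s:.1f} s")
print(f"duplicate segments: {duplicate_segments} ({duplicate_fraction:.1%})")
print(f"download time saving: {time_saving:.1%})".replace("%)", "%"))
```

Under these assumptions the payload alone needs roughly 156 seconds on the 2.5 Mbps downlink, so both measured times are within a few tens of seconds of the link limit. The 215 duplicate segments are about 2.2% of the total, while the measured time saving is about 5.7%; the slides do not say what accounts for the rest.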