12. Why It Matters
• We rely on fresh data to make decisions
– Google searches
– Facebook profiles
– Twitter, LinkedIn
• Outdated data has a big impact on users
– Wrong profile information: confusion, embarrassment
– Old search results: bad business decisions, embarrassment
– Old document versions: costly business decisions, regulatory issues
14. Design:
Cloud Storage API
• Block Device
– Fixed block size (1MB)
– Write(block number, block)
– Read(block number) → block
• Easy to reason about the security
• File systems operate on top of this abstraction
B1 B2 B3 B4
Disk divided into 1MB blocks
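The block device abstraction above can be sketched in a few lines of Python. This is only an illustration: the class name BlockDevice, the in-memory dictionary, and the zero-filled default for unwritten blocks are assumptions, not part of the slides.

```python
BLOCK_SIZE = 1 << 20  # 1MB, the fixed block size from the slide


class BlockDevice:
    """In-memory stand-in for the Write/Read block API (hypothetical class)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._blocks = {}  # sparse store; unwritten blocks read back as zeros

    def write(self, block_number, block):
        self._check(block_number)
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        self._blocks[block_number] = bytes(block)

    def read(self, block_number):
        self._check(block_number)
        return self._blocks.get(block_number, bytes(BLOCK_SIZE))

    def _check(self, block_number):
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")
```

Because the interface has only two operations on fixed-size blocks, the security argument only has to cover those two calls; a file system can be layered on top without changing it.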
15. Design:
System Architecture
[Diagram components: Client; Internet (Untrusted); System Bus (Untrusted); CPU (Untrusted); Disk (Untrusted); RAM (Untrusted); Network Card (Untrusted); FPGA / ASIC (Trusted); Secure NVRAM Chip]
16. Design:
Trusted Storage on Untrusted Disks
• 160-bit hash in trusted memory authenticates 1TB disk
• Leaves hash their blocks: h1=h(B1), h2=h(B2), h3=h(B3), h4=h(B4)
• Nodes hash their children: h5=h(h1||h2), h6=h(h3||h4)
• Root Hash: h7=h(h5||h6)
• Root hash matches iff all blocks match
• 20 levels
B1 B2 B3 B4
Disk divided into 1MB blocks
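A minimal sketch of the root hash computation follows. SHA-1 is an assumption (the slide only states that the hash is 160 bits wide), and the toy blocks are a few bytes each rather than 1MB. With 1MB blocks, a 1TB disk has 2^20 blocks, which is in line with the tree depth quoted on the slide.

```python
import hashlib


def h(data):
    # SHA-1 is assumed here; the slide only says the hash is 160 bits.
    return hashlib.sha1(data).digest()


def root_hash(blocks):
    """Root of a binary hash tree over blocks (count must be a power of two)."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("number of blocks must be a power of two")
    level = [h(b) for b in blocks]          # leaves hash their blocks
    while len(level) > 1:                   # nodes hash their children
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"B1", b"B2", b"B3", b"B4"]       # toy blocks, not 1MB each
trusted_root = root_hash(blocks)            # the value kept in trusted memory
tampered = [b"B1", b"B2", b"BX", b"B4"]
print(root_hash(blocks) == trusted_root)    # True
print(root_hash(tampered) == trusted_root)  # False
```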
17. Design:
Hash Tree Caching
• The FPGA caches hash tree nodes
• The untrusted OS is free to choose the caching policy, for maximum performance

Node number   Hash                   Verified   Left child   Right child
1             fabe3c05d8ba995af93e   Y          Y            N
2             e6fc9bc13d624ace2394   Y          Y            Y
4             53a81fc2dcc53e4da819   Y          N            N
5             b2ce548dfa2f91d83ec6   Y          N            N

1
2 3
4 5 6 7
18. Design:
Hash Tree Cache
• Server stores entire hash tree in RAM
• FPGA has a cache that stores a subset of nodes
• Server tells FPGA what nodes to store
Cache management commands

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   Y
5      b2ce…   Y

1
2 3
4 5 6 7
19. Design:
Hash Tree Cache - Load
• Server tells the FPGA to load a node into a cache entry
• The cache entry is unverified right after a load
Before the load (cached nodes: 1, 2, 4):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N

After loading node 5 (cached nodes: 1, 2, 4, 5):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N
5      b2ce…   N
20. Design:
Hash Tree Cache - Verify
• Server tells the FPGA to use a node to verify its children
• FPGA checks that the parent’s hash matches the hash of its children’s hashes
Before the verify (cached nodes: 1, 2, 4, 5):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N
5      b2ce…   N

After using node 2 to verify its children:

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   Y
5      b2ce…   Y
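The load and verify commands can be modeled with a small Python class. This is a sketch under stated assumptions: the class name HashTreeCache, heap-style node numbering (children of node n are 2n and 2n + 1, as in the slide's tree), SHA-1 as the 160-bit hash, and the omission of cache eviction are all illustrative choices, not details taken from the slides.

```python
import hashlib


def h(data):
    return hashlib.sha1(data).digest()  # assumed 160-bit hash


class HashTreeCache:
    """Toy model of the trusted cache: node number -> [hash, verified]."""

    def __init__(self, root_hash):
        # The root comes from trusted memory, so it starts out verified.
        self.entries = {1: [root_hash, True]}

    def load(self, node, node_hash):
        if node in self.entries:
            raise ValueError("node is already cached")
        self.entries[node] = [node_hash, False]  # unverified right after a load

    def verify(self, parent):
        left, right = 2 * parent, 2 * parent + 1
        if not all(n in self.entries for n in (parent, left, right)):
            return False
        parent_hash, parent_verified = self.entries[parent]
        if not parent_verified:
            return False                         # only a verified parent can vouch
        if h(self.entries[left][0] + self.entries[right][0]) != parent_hash:
            return False
        self.entries[left][1] = self.entries[right][1] = True
        return True


# Untrusted server side: the full tree lives in RAM.
tree = {n: h(b"block %d" % n) for n in (4, 5, 6, 7)}
tree[2], tree[3] = h(tree[4] + tree[5]), h(tree[6] + tree[7])
tree[1] = h(tree[2] + tree[3])

cache = HashTreeCache(tree[1])
cache.load(2, tree[2])
cache.load(3, tree[3])
cache.verify(1)          # nodes 2 and 3 become verified
cache.load(4, tree[4])
cache.load(5, tree[5])
cache.verify(2)          # nodes 4 and 5 become verified
```

A hash supplied by the untrusted server only gains the verified flag after a verified parent vouches for it, so a wrong or stale hash stays unverified.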
21. Design:
Hash Tree Cache - Efficiency
• Checking leaf 33 requires 10 node loads for a cold cache on
this toy example (38 loads on the real FPGA tree)
• Remember the root is always loaded in the cache
1
2 3
4 5
8 9
16 17
32 33
22. Design:
Hash Tree Cache - Efficiency
• Checking leaf 38 requires only 4 node loads, because 9 is already in
the cache and verified
• Server can predict client requests and manage cache for
high performance
1
2 3
4 5
8 9
16 17 18 19
32 33 38 39
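The load counts on the two efficiency slides can be reproduced with a short calculation. The function name nodes_to_load is hypothetical, and the sketch assumes heap-style numbering (children of node n are 2n and 2n + 1), no eviction, and that siblings are always loaded and verified together.

```python
def nodes_to_load(leaf, verified):
    """Nodes the server must load so that `leaf` can be verified."""
    loads = []
    node = leaf
    while node not in verified:
        if node == 1:
            raise ValueError("the root must always be in the verified set")
        loads += [node, node ^ 1]   # the node and its sibling share a parent
        node //= 2                  # move up to the parent
    return loads


cold = {1}                                # only the root is cached
first = nodes_to_load(33, cold)
print(len(first))                         # 10
warm = cold | set(first)                  # nodes loaded and verified so far
print(len(nodes_to_load(38, warm)))       # 4
print(2 * (20 - 1))                       # 38, for a tree with 20 levels
```

The walk for leaf 38 stops at node 9, which was already verified while checking leaf 33, so only nodes 38, 39, 18 and 19 are loaded.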
24. Results:
System Architecture
[Diagram components: Client; Internet (Untrusted); System Bus (Untrusted); CPU (Untrusted); Disk (Untrusted); RAM (Untrusted); Network Card (Untrusted); FPGA / ASIC (Trusted); Secure NVRAM Chip]
31. Results: Performance Block Diagram
• Read / Write 1MB Data Block to Disk
– Limit: Disk I/O Speed
• Hash 1MB Data Block
– Limit: Hash Engine Speed
– Limit: FPGA Data Bus
• Load & Verify Hash Tree Nodes
– Limit: Hash Engine Speed
– Limit: Dependencies
• Update Hash Tree (Writes Only)
– Limit: Hash Engine Speed
– Limit: Dependencies
• HMAC (Sign) Result
– Limit: Hash Engine Speed
33. Results: Prototype Performance (est.)
Disk I/O           Throughput
7,200 RPM HDD      70 MB/s
10,000 RPM HDD     100 MB/s
15,000 RPM HDD     130 MB/s
SSD                250 MB/s

1 MB = 1 block
35. Results: Prototype Performance (est.)
Operation               Throughput
Block Hash              800 MB/s
Pipelined Block Hash    3,200 MB/s

Transport               Throughput
PCI Express x16         4,096 MB/s
SATA II                 384 MB/s
PCI Express x1          250 MB/s
Ethernet                125 MB/s

1 MB = 1 block
37. Results: Prototype Performance (est.)
Operation                    Throughput
Tree Node Hash               1.25 M/s
Pipelined Tree Node Hash     5.0 M/s
Tree Operations              62.5 k/s
Optimized Tree Operations    2.5 M/s

Transport                    Throughput
PCI Express x16              4,096 MB/s
SATA II                      384 MB/s
PCI Express x1               250 MB/s
Ethernet                     125 MB/s

1 MB = 1 block
39. Results: Prototype Performance (est.)
Operation                    Throughput
Tree Node Hash               1.25 M/s
Pipelined Tree Node Hash     5.0 M/s
Tree Operations              62.5 k/s

Transport                    Throughput
PCI Express x16              4,096 MB/s
SATA II                      384 MB/s
PCI Express x1               250 MB/s
Ethernet                     125 MB/s

1 MB = 1 block
41. Results: Prototype Performance (est.)
Operation            Throughput
Node HMAC            1.25 M/s

Transport            Throughput
PCI Express x16      4,096 MB/s
SATA II              384 MB/s
PCI Express x1       250 MB/s
Ethernet             125 MB/s

1 MB = 1 block
42. Results: Performance Block Diagram
• Steps are performed in parallel (pipelined), because they are in
different system components
• However, the slowest step is the bottleneck for the entire system
• Each step can be made faster by adding more hardware (e.g. more
disks), assuming cache policies can scale up
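The bottleneck argument can be illustrated with the estimated throughputs from the prototype performance slides. The function name bottleneck and the stage labels are hypothetical, and the twelve-disk case assumes that disk throughput scales linearly with the number of disks, which the slides do not state.

```python
def bottleneck(stages):
    """Return (name, throughput in MB/s) of the slowest pipeline stage."""
    return min(stages.items(), key=lambda item: item[1])


# Estimated throughputs in MB/s, taken from the performance slides.
stages = {
    "disk I/O, one 7,200 RPM HDD": 70,
    "block hash": 800,
    "transport, PCI Express x16": 4096,
}
print(bottleneck(stages))   # ('disk I/O, one 7,200 RPM HDD', 70)

# Adding disks (assumed linear scaling) moves the bottleneck to the hash engine.
scaled = dict(stages)
del scaled["disk I/O, one 7,200 RPM HDD"]
scaled["disk I/O, twelve 7,200 RPM HDDs"] = 12 * 70
print(bottleneck(scaled))   # ('block hash', 800)
```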
44. Results: Photo Gallery Workload
• Modeled after data on photo applications
• Real-Life
– Facebook’s #1 Feature
– Google Picasa
– Flixter
• Special policy inspired by Facebook Haystack classifies photos, loads
cache predictively
[Plot: Block (0 to 10) vs. Time (0 to 20)]
45. Results: Map-Reduce Workload
• Index-generating Map-Reduce
• Real-Life
– Google Pagerank
– Facebook friend graph (EdgeRank)
• Special policy that takes advantage of Map-Reduce access pattern
[Plot: Block (0 to 30) vs. Time (0 to 10)]
46. Results: Cache Hit Rates
• Applications: 2 users collaborating on a file (ping-pong), photo gallery
browsing, Map-Reduce job
• Cache policies: Speculative Least-Recently Used (Spec LRU),
Facebook Haystack’s policy optimized for caching (Haystack),
policy optimized for Map-Reduce access patterns (MR-Aware)
• Conclusion: no policy works well on all applications, so app server
must drive policy
[Plot: cache hit rate (0.5 to 1) for the Spec LRU, Haystack, and MR-Aware policies]
47. Results: Protocol Overhead
• Client – Server Bandwidth overhead: 0.002%
– Operation: 1 HMAC (20 bytes) per 1MB = 0.002%
– Handshake: extra secret exchange piggybacks on SSL: 5%
• Latency overhead (1 client): 4%
– Without security: 8.2ms / request
– With security: 8.5ms / request
– Latency overhead = the latency of a very fast Internet hop
• No throughput overhead (N-clients)
– With or without security: 100MB/s
– Need 40 HDDs to saturate PCI-E x16, 52 HDDs to saturate FPGA
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
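The overhead figures on this slide can be checked with a few lines of arithmetic. The sketch assumes a binary megabyte (2^20 bytes) for the block size; the 52-HDD figure for the FPGA is not derived here because the slide does not give the FPGA throughput it is based on.

```python
hmac_bytes = 20
block_bytes = 1 << 20                    # 1MB block, binary megabyte assumed
bandwidth_overhead = hmac_bytes / block_bytes
print(f"{bandwidth_overhead:.4%}")       # 0.0019%

latency_overhead = (8.5 - 8.2) / 8.2     # ms per request, with vs. without security
print(f"{latency_overhead:.1%}")         # 3.7%

pcie_x16, hdd = 4096, 100                # MB/s
print(pcie_x16 / hdd)                    # 40.96
```

The results round to the 0.002% and 4% quoted on the slide, and about 40 disks at 100 MB/s each are needed to fill a PCI Express x16 link.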
48. Results: Protocol Overhead
• Protocol is simple
enough to implement
on browser side
– Chrome
– Firefox
– Internet Explorer 10
• Easy integration in
existing Web
applications
• End-to-end security
50. Other Applications
• FPGA can be used to load user-specified circuits and
perform arbitrary computation with security guarantees
• Applications: encrypted image search, financial calculations
• Potential applications in highly regulated industries, e.g.
medical record keeping and processing, secure financial
services
51. Secure Computation:
Overview
[Diagram: a Task on a Cloud Machine is split into untrusted computation (a VM image, run on CPU cores) and trusted computation (a circuit spec, run on FPGA LUTs)]
• Most code is untrusted, executes in a VM
• Trusted code is broken up into kernels which become
circuits deployed onto an FPGA
• If efficiency is not an issue, deploy a processor on the
FPGA, execute software securely
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY 6/9/2011
52. Secure Computation: Challenge
• Multi-tenancy is the key to the cloud’s cost effectiveness
• FPGA can host different applications running in parallel
• Challenge: isolation between applications, just like a hypervisor
[Diagram: a VM Hypervisor hosts the Client 1, Client 2 and Client 3 VMs; across PCI Express, an FPGA controller hosts the Client 1, Client 2 and Client 3 Applications]
54. Design:
FPGA Boot Sequence
random nonce
PKcard + Manufacturer Certificate
Check certificate against e-fuses
Check PKcard against certificate
PUFsyndrome + SignPKcard(PUFsyndrome)
Compute SKfpga from PUFsyndrome
Root Hash + SignPKcard(nonce || Root Hash)
Verify signature
EncSKfpga(SKcard) + MACSKfpga(nonce || SKcard)
Verify MAC
55. Design:
Client Trust Model
• Each FPGA – NVRAM pair has an Endorsement Key (EK)
• Manufacturer certifies the public EK
• Client uses the public EK to encrypt an HMAC key, which
becomes its shared secret with the trusted hardware
[Diagram: the Manufacturer signs the Endorsement Certificate for PubEK; the Client verifies the certificate, generates an HMAC key, and encrypts it with PubEK; the trusted hardware decrypts the Encrypted HMAC key with PrivEK to recover the HMAC key]
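Once the HMAC key is shared, the trusted hardware can sign each result and the client can check it. The sketch below shows only that step, using Python's standard hmac module. HMAC-SHA1 is an assumption (it yields the 20-byte tag size quoted in the results), the function names are hypothetical, and the message layout (a client nonce followed by the block) is an illustrative choice rather than the protocol from the slides. The encryption of the key under PubEK is omitted.

```python
import hashlib
import hmac
import os

# Shared secret. In the slide's flow it would reach the trusted hardware
# encrypted under PubEK; that step is omitted in this sketch.
hmac_key = os.urandom(20)


def sign_result(key, nonce, block):
    """Trusted-hardware side: HMAC over a client nonce and the block read."""
    return hmac.new(key, nonce + block, hashlib.sha1).digest()


def check_result(key, nonce, block, tag):
    """Client side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign_result(key, nonce, block), tag)


nonce = os.urandom(16)
block = b"contents of the requested block"
tag = sign_result(hmac_key, nonce, block)
print(len(tag))                                        # 20
print(check_result(hmac_key, nonce, block, tag))       # True
print(check_result(hmac_key, nonce, b"stale", tag))    # False
```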
56. Design:
Hash Tree Security
1. Computationally infeasible to come up with a block B1’ such that B1 ≠ B1’
but h(B1) = h(B1’)
2. Computationally infeasible to come up with a node hash h1’ such that
h1 ≠ h1’ but h(h1||h2) = h(h1’||h2)
Therefore, the root hash authenticates the entire contents of
the tree.
57. Design:
FPGA Boot Sequence Security
• Server OS transfers messages between FPGA and Trusted
Memory over an untrusted channel
• FPGA authenticates Trusted Memory using Manufacturer
Certificate, whose public key is burned into FPGA’s e-fuses
• Trusted Memory authenticates FPGA using its Physically
Unclonable Function (PUF)
• At manufacturing time, FPGA is paired with memory chip
• FPGA can be paired with new memory chip if necessary
58. Design:
Hash Tree Cache Security
• Server OS responsible for loading and verifying tree nodes
• Parent node hash verifies children nodes
• Reading a block requires the block’s leaf to be verified
• Writing a block requires the path from the block’s leaf to the
root to be loaded and verified
• A node can be loaded in at most one cache line, to prevent
replay attacks using stale node hashes