“Experiences with billions of blobs across many blobstore providers.”
How Maginatics used Apache jclouds and architected MagFS to achieve broad blobstore portability and high scalability for the Maginatics Cloud Storage Platform, a cloud-optimized NAS filer.
Presented by Andrew Gaul at the Apache jclouds meetup on March 4, 2014.
3. What is blob storage?
Blobstores offer key-value storage that is:
• Scalable: tens of TB with a few nodes, hundreds of PB with
thousands of nodes
• Inexpensive: built on commodity hardware
• Available/durable: tolerates hardware failures
Blobstores do not offer the guarantees that block storage and
file systems provide:
• Limited interface: get, put, delete
• Eventual consistency: blob reads may return stale or no data
for some limited time
Maginatics
4. jclouds supports many providers
Multiple public and private implementations allow customer
trade-offs.
[Figure: provider logos grouped under "Public Object Storage" and "Private Object Storage"]
5. Blobstore compatibility
jclouds abstracts differences between APIs, but semantic differences
remain:
• Atmos: cannot overwrite a blob
• AWS-S3: cannot mutate or append to a blob, cannot put a blob
without an explicit size
• Swift: eventually consistent
Portable applications must restrict themselves to
lowest-common-denominator functionality:
• Write each blob exactly once, never mutate or append
• Read blobs at any time, but retry because of eventual
consistency
• After deleting a blob, never reuse its name
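The read-with-retry and never-reuse-a-name rules above can be sketched in plain Java. This is a minimal illustration, not code from MCSP or jclouds: the class EventualRead, its method names, and the backoff policy are all assumptions. The reader is any function that returns null while the blob is not yet visible.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.function.Supplier;

/** Hypothetical helper illustrating write-once naming and retried reads. */
final class EventualRead {
    private EventualRead() {}

    /** Returns a fresh name so that a deleted blob's name is never reused. */
    static String freshBlobName() {
        return UUID.randomUUID().toString();
    }

    /**
     * Calls reader until it returns a non-null value or attempts run out,
     * doubling the sleep between attempts (assumed backoff policy).
     */
    static <T> Optional<T> readWithRetry(Supplier<T> reader, int maxAttempts,
                                         long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = reader.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                delay *= 2;
            }
        }
        return Optional.empty();
    }
}
```

A caller would pass a lambda that performs the actual blob read. A read that still returns nothing after the last attempt is reported as empty rather than as an error, leaving the policy decision to the application.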
6. Maginatics Cloud Storage Platform (MCSP)
• Virtualized, cloud-based storage system
• Layers network file system semantics on top of blob storage
• Runs any application on a variety of platforms, including
multi-client file sharing
• MCSP is a cloud-optimized NAS filer
• Smart client gives LAN performance over WANs
• Flexible deployment options: public, private, hybrid cloud
• Refer to SNIA SDC 2013 slides for technical background
7. Scaling Throughput
MCSP supports thousands of clients reading and writing
simultaneously.
A single server could become a bottleneck, especially on smaller
instance sizes.
Instead, MCSP vends signed URLs to clients, giving them direct
access to the blobstore:
• Cryptographically signed URLs allow read or write access to
a specific blob for a specified time
• Can embed other properties like content length and hash
This technique allows a single MCSP server to mediate many
Gbit/s of throughput!
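The idea behind a signed URL can be sketched with an HMAC over the request method, blob name, and expiry time. This is an illustrative scheme only: the class SignedUrlSketch, the query parameter names, and the string-to-sign layout are assumptions and do not match any particular provider's signing format. Blob names are assumed to be URL-safe already.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical signer: grants access to one blob, one method, until an expiry. */
final class SignedUrlSketch {
    private SignedUrlSketch() {}

    /** HMAC-SHA256 over method, blob name, and expiry, URL-safe Base64 encoded. */
    static String signature(byte[] secret, String method, String blobName,
                            long expiresEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            String toSign = method + "\n" + blobName + "\n" + expiresEpochSeconds;
            byte[] raw = mac.doFinal(toSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Builds the URL the server hands to a client (assumed layout). */
    static String signedUrl(byte[] secret, String baseUrl, String method,
                            String blobName, long expiresEpochSeconds) {
        return baseUrl + "/" + blobName + "?expires=" + expiresEpochSeconds
                + "&signature=" + signature(secret, method, blobName, expiresEpochSeconds);
    }

    /** What the storage side would check: not expired, and signature matches. */
    static boolean verify(byte[] secret, String method, String blobName,
                          long expiresEpochSeconds, String presented,
                          long nowEpochSeconds) {
        if (nowEpochSeconds > expiresEpochSeconds) {
            return false;
        }
        byte[] expected = signature(secret, method, blobName, expiresEpochSeconds)
                .getBytes(StandardCharsets.US_ASCII);
        // Constant-time comparison avoids leaking how many bytes matched.
        return MessageDigest.isEqual(expected, presented.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Because the server only computes a signature and never touches the blob bytes, its cost per transfer is small. Extra properties such as content length or a content hash could be appended to the signed string in the same way.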
8. Scaling Number of Blobs
MCSP manages 100 TB of blob data across 1 billion blobs.
Some providers require specific naming or sharding for best
performance:
• Atmos: no more than 100,000 blobs per directory, shard across
directories
• AWS-S3: name blobs with unique prefixes
• Swift: no more than 1 million blobs per container, shard across
containers
• GCS & HPC: remove the Expect: 100-continue header
• Other quirks: Cleversafe performs better with container listing
disabled
Surprisingly challenging workload: removing all blobs from a large
container.
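One way to meet the per-container limits and unique-prefix advice above is to derive both the shard and a name prefix from a hash of the blob name. The sketch below is an assumption about how this could be done, not the MCSP implementation. BlobSharding, the "%s-%04d" container pattern, and the four-hex-digit prefix are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical naming scheme that spreads blobs across shards and prefixes. */
final class BlobSharding {
    private BlobSharding() {}

    private static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Maps a blob name to a shard in [0, shardCount) using its first 4 hash bytes. */
    static int shardIndex(String blobName, int shardCount) {
        byte[] d = md5(blobName);
        int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        return Math.floorMod(h, shardCount);
    }

    /** Container (or directory) that should hold the blob, e.g. "data-0008". */
    static String containerFor(String baseName, String blobName, int shardCount) {
        return String.format(Locale.ROOT, "%s-%04d", baseName, shardIndex(blobName, shardCount));
    }

    /** Prepends four hex digits of the hash so keys do not share a common prefix. */
    static String prefixedName(String blobName) {
        byte[] d = md5(blobName);
        return String.format(Locale.ROOT, "%02x%02x-%s", d[0] & 0xff, d[1] & 0xff, blobName);
    }
}
```

The shard count would be chosen so that the expected number of blobs per container stays below the provider's limit. For example, 1 billion blobs at no more than 1 million per container needs at least 1,000 containers.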
9. Scaling Blob Sizes
Most MCSP blobs have small sizes, but some use cases require
larger ones.
jclouds supports blobs of up to 2 GB across all blobstores:
• Could support 5 GB with Java 7
AWS-S3, Azure, and Swift support multi-part upload, tested
with 40 GB blobs.
Large blobs increase chances of transient network errors and
failures:
• Use a repeatable Payload like ByteSource to allow jclouds to
retry
• Always include an MD5 checksum so that corrupted transfers
are detected
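Both points can be illustrated with the standard library alone. In the sketch below, StreamSource is a hypothetical stand-in for a repeatable payload such as Guava's ByteSource: every call to open() yields a fresh stream, so the same data can be hashed first and then re-read for each upload attempt. The class and method names are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper: hashes a payload that can be opened more than once. */
final class RepeatableChecksum {
    private RepeatableChecksum() {}

    /** Stand-in for a repeatable payload: each call opens a fresh stream. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Streams the payload once and returns its 16-byte MD5 digest. */
    static byte[] md5(StreamSource source) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            try (InputStream in = source.open()) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Content-MD5 header value: Base64 of the raw digest bytes. */
    static String contentMd5(byte[] digest) {
        return Base64.getEncoder().encodeToString(digest);
    }

    /** Lowercase hex form, as commonly shown in ETag-style comparisons. */
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }
}
```

A one-shot InputStream cannot be rewound after a failed attempt, which is why a retry needs a source that can be reopened.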
10. Lessons Learned
Cross-provider support required substantial effort:
• Long tail of issues with authentication, configuration, error
codes, timeouts, etc.
• S3- and Swift-compatible clones are like snowflakes, no two
are alike
Measuring performance is difficult:
• Blob naming and sharding are important
• Public providers will reshard very active containers for
better performance
• Private blobstores require configuration and tuning
Mock blobstores (filesystem and transient) helped testing.
11. Future Directions
More diagnostic tools, especially for private blobstores.
• Maginatics will contribute a benchmark tool and a
compatibility tester
Modernize with Guava additions, e.g., ByteSource, Hashing,
MediaType.
Simplify implementation:
• De-async?
• Remove annotations?
New providers:
• Modernized Swift (in progress)
• Google Cloud Storage (GSoC 2014?)
• Amazon Glacier
• Joyent Manta?
12. Recap
• jclouds can provide portability between blobstore providers
if your application does not strongly depend on blobstore
semantics
• Applications can scale with the correct architecture and
implementation choices
• More work remains to make jclouds an inviting platform for
all Java developers
• The jclouds community has helped Maginatics over the last
three years, and we look forward to continuing to contribute