SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
Brought to you by
Compacting Small Objects
in Cloud for High Yield
Tejas Chopra
Senior Software Engineer at
Agenda
■ Introduction
■ Problem statement & overview
■ Compaction basics
■ Data structures: blobs, blob tables
■ File system operations
■ Results
■ Conclusion
Introduction
■ Senior Software Engineer, Netflix
■ Keynote speaker: Distributed
systems, Cloud, Blockchain
■ Senior Software Engineer, Box
■ Datrium, Samsung, Cadence,
Tensilica
Overview
■ Cloud FS: Provide FS interface to objects
● Simple approach: Each file → object
● Cloud workloads: Many small files, bursty
■ Small sized files, not beneficial for Cloud
Storage
■ Request Level costs
● To upload 1GiB file with 4KiB PUTs is 57x the cost of
storing the file
■ Throttling
● S3 Best Practice Guideline: If > 300
PUT/DELETE/LIST/s or > 800 GET/s, S3 may rate limit
● 1 4MiB PUT is 1000x faster than 1000 4KiB PUTs
S3 Alternatives
■ EFS: Generic File system: Charges only for storage: $0.3/GB/month
■ DynamoDB: NoSQL, $0.25/GB/month for 1 write/s/month.
$0.09/GB/month for 1 read/s/month
■ S3: $0.023/GB/month, 0.0005c/PUT
■ To store 1 TiB worth of 4KiB files, 100 writes/s:
● S3 (as is): $1366; operation cost: 98%; 5 yrs of storage to match operation cost
● EFS: $307, operation cost: 0%
● DynamoDB: $303, operation cost: 16%; 5 days of storage to match operation cost
● S3 (with compaction): $24, operation cost: 0.001%; 27 seconds of storage to match
operation cost.
What is compaction?
■ Pluggable module on the client side
■ Module packs data into GiB sized blobs and an index of where the file is
within the blob
■ Redundant client side global index of all files in all blobs
■ Embedded index in every blob is for fault-tolerance
Data Structures: Blob
■ Single, immutable object on cloud
■ Data from multiple files is packed into a blob. Footer contains information
about the byte ranges of a file within the blob
■ By reading all footers of all compacted blobs, we can recreate the world
■ Could have used SSTables, but it requires sorting before persistence
Data Structures: Blob Table
■ Optimization over reading all blob footers
■ Global table that contains information of all files in all blobs
■ Redundant - can be recreated
■ Consistency is a challenge because Blob creation is distributed.
Compaction
■ Sufficient data to be accumulated before it is compacted
■ Data is buffered on client worker nodes, and is not durable until pushed to
cloud
■ Compaction policy is specified at mount time
■ Once data is compacted, embedded indices within the blob get updated
■ Blob name, objectId: <clientIP>:<FooterOffset>:<Date>
■ After blob is pushed to cloud, blob table is updated with new extents for a
file
■ Global blob table always maintains the updated loc for each file
● If file is buffered on worker, contains the worker’s IP
● If file is blob on cloud, contains the blob’s objectId and file’s extent in the blob
Reading Data
■ Depends on whether data is buffered or pushed to cloud
■ Blob Table contains the location of data
● If on a client worker, worker-worker transfer for read
■ Range reads for cloud blobs
● May want to fetch more than just what’s needed - essence of prefetching
Deletes and Renames
■ Deletes
● When client issues a delete, remove entry from Blob table
● Periodically check the liveness of a blob
● If live files are less than a threshold, repack the blob
■ Renames
● Atomic update of file’s name in Blob table
● Piggyback on future blobs’ embedded indices
● For recovery from embedded indices, blobs are read in order of creation, so latest blobs
will contain the rename information
Fault Tolerance
■ Master
● Periodically backup global Blob table to cloud
● On a crash, read last checkpointed Blob table and all the blobs written after that time
■ Worker
● Files buffered for compaction are lost
● Can be still recovered if persisted on the local disk
● If compacted and pushed to cloud, Blob table contains all information for recovery
● If compacted and pushed to cloud, but the Blob table not updated, the embedded indices
contain information for recovery
Results
Conclusion
■ Cloud file systems should aggressively compact small objects
■ Significant costs for operations and rate limiting
■ Throughput increased by 60x, costs reduced by 25000x
■ Backups of smaller files can be archived using this strategy to improve
backup times and costs
■ Compacting policies can compact objects by create-time, per-application,
etc. We could apply learning to choose objects to compact, objects with
similar read profiles can benefit from prefetching the entire blob
Brought to you by
Tejas Chopra
chopratejas@gmail.com
@chopra_tejas

Mais conteúdo relacionado

Mais procurados

OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?
ScyllaDB
 

Mais procurados (20)

Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And Beyond
 
Data Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at ScaleData Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at Scale
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
Understanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at ScaleUnderstanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at Scale
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VRISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & Observability
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
How to Measure Latency
How to Measure LatencyHow to Measure Latency
How to Measure Latency
 
[POSS 2019] OVirt and Ceph: Perfect Combination.?
[POSS 2019] OVirt and  Ceph: Perfect Combination.?[POSS 2019] OVirt and  Ceph: Perfect Combination.?
[POSS 2019] OVirt and Ceph: Perfect Combination.?
 
OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Continuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnitContinuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnit
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another Workload
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
Avoiding Data Hotspots at Scale
Avoiding Data Hotspots at ScaleAvoiding Data Hotspots at Scale
Avoiding Data Hotspots at Scale
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for All
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
 

Semelhante a Object Compaction in Cloud for High Yield

Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
DoKC
 
Optimizing E-Business Suite Storage Using Oracle Advanced Compression
Optimizing E-Business Suite Storage Using Oracle Advanced CompressionOptimizing E-Business Suite Storage Using Oracle Advanced Compression
Optimizing E-Business Suite Storage Using Oracle Advanced Compression
Andrejs Karpovs
 

Semelhante a Object Compaction in Cloud for High Yield (20)

week1slides1704202828322.pdf
week1slides1704202828322.pdfweek1slides1704202828322.pdf
week1slides1704202828322.pdf
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Ceph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdfCeph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdf
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
 
Optimizing E-Business Suite Storage Using Oracle Advanced Compression
Optimizing E-Business Suite Storage Using Oracle Advanced CompressionOptimizing E-Business Suite Storage Using Oracle Advanced Compression
Optimizing E-Business Suite Storage Using Oracle Advanced Compression
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
To Serverless and Beyond
To Serverless and BeyondTo Serverless and Beyond
To Serverless and Beyond
 
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
 
Simplify Consolidation with Oracle Database 12c
Simplify Consolidation with Oracle Database 12cSimplify Consolidation with Oracle Database 12c
Simplify Consolidation with Oracle Database 12c
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 

Mais de ScyllaDB

Mais de ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Object Compaction in Cloud for High Yield

  • 1. Brought to you by Compacting Small Objects in Cloud for High Yield Tejas Chopra Senior Software Engineer at
  • 2. Agenda ■ Introduction ■ Problem statement & overview ■ Compaction basics ■ Data structures: blobs, blob tables ■ File system operations ■ Results ■ Conclusion
  • 3. Introduction ■ Senior Software Engineer, Netflix ■ Keynote speaker: Distributed systems, Cloud, Blockchain ■ Senior Software Engineer, Box ■ Datrium, Samsung, Cadence, Tensilica
  • 4. Overview ■ Cloud FS: Provide FS interface to objects ● Simple approach: Each file → object ● Cloud workloads: Many small files, bursty ■ Small sized files, not beneficial for Cloud Storage ■ Request Level costs ● To upload 1GiB file with 4KiB PUTs is 57x the cost of storing the file ■ Throttling ● S3 Best Practice Guideline: If > 300 PUT/DELETE/LIST/s or > 800 GET/s, S3 may rate limit ● 1 4MiB PUT is 1000x faster than 1000 4KiB PUTs
  • 5. S3 Alternatives ■ EFS: Generic File system: Charges only for storage: $0.3/GB/month ■ DynamoDB: NoSQL, $0.25/GB/month for 1 write/s/month. $0.09/GB/month for 1 read/s/month ■ S3: $0.023/GB/month, 0.0005c/PUT ■ To store 1 TiB worth of 4KiB files, 100 writes/s: ● S3 (as is): $1366; operation cost: 98%; 5 yrs of storage to match operation cost ● EFS: $307, operation cost: 0% ● DynamoDB: $303, operation cost: 16%; 5 days of storage to match operation cost ● S3 (with compaction): $24, operation cost: 0.001%; 27 seconds of storage to match operation cost.
  • 6. What is compaction? ■ Pluggable module on the client side ■ Module packs data into GiB sized blobs and an index of where the file is within the blob ■ Redundant client side global index of all files in all blobs ■ Embedded index in every blob is for fault-tolerance
  • 7. Data Structures: Blob ■ Single, immutable object on cloud ■ Data from multiple files is packed into a blob. Footer contains information about the byte ranges of a file within the blob ■ By reading all footers of all compacted blobs, we can recreate the world ■ Could have used SSTables, but it requires sorting before persistence
  • 8. Data Structures: Blob Table ■ Optimization over reading all blob footers ■ Global table that contains information of all files in all blobs ■ Redundant - can be recreated ■ Consistency is a challenge because Blob creation is distributed.
  • 9. Compaction ■ Sufficient data to be accumulated before it is compacted ■ Data is buffered on client worker nodes, and is not durable until pushed to cloud ■ Compaction policy is specified at mount time ■ Once data is compacted, embedded indices within the blob get updated ■ Blob name, objectId: <clientIP>:<FooterOffset>:<Date> ■ After blob is pushed to cloud, blob table is updated with new extents for a file ■ Global blob table always maintains the updated loc for each file ● If file is buffered on worker, contains the worker’s IP ● If file is blob on cloud, contains the blob’s objectId and file’s extent in the blob
  • 10. Reading Data ■ Depends on whether data is buffered or pushed to cloud ■ Blob Table contains the location of data ● If on a client worker, worker-worker transfer for read ■ Range reads for cloud blobs ● May want to fetch more than just what’s needed - essence of prefetching
  • 11. Deletes and Renames ■ Deletes ● When client issues a delete, remove entry from Blob table ● Periodically check the liveness of a blob ● If live files are less than a threshold, repack the blob ■ Renames ● Atomic update of file’s name in Blob table ● Piggyback on future blobs’ embedded indices ● For recovery from embedded indices, blobs are read in order of creation, so latest blobs will contain the rename information
  • 12. Fault Tolerance ■ Master ● Periodically backup global Blob table to cloud ● On a crash, read last checkpointed Blob table and all the blobs written after that time ■ Worker ● Files buffered for compaction are lost ● Can be still recovered if persisted on the local disk ● If compacted and pushed to cloud, Blob table contains all information for recovery ● If compacted and pushed to cloud, but the Blob table not updated, the embedded indices contain information for recovery
  • 14. Conclusion ■ Cloud file systems should aggressively compact small objects ■ Significant costs for operations and rate limiting ■ Throughput increased by 60x, costs reduced by 25000x ■ Backups of smaller files can be archived using this strategy to improve backup times and costs ■ Compacting policies can compact objects by create-time, per-application, etc. We could apply learning to choose objects to compact, objects with similar read profiles can benefit from prefetching the entire blob
  • 15. Brought to you by Tejas Chopra chopratejas@gmail.com @chopra_tejas