Couchbase Architecture and IT Cost Management at UPS – Connect Silicon Valley 2018

Speaker: Mike Ryder, Data Architect, UPS

Demanding requirements, the right technology to meet them: everything is coming together, but what if the solution costs so much that it starts eroding the business case? How can a more economical solution be found, one with fewer Couchbase nodes yet matching or exceeding the capabilities of a larger cluster? The speakers share practical experience in creative Couchbase economics. Oh, and don't take our word for it – you can experience the result on your mobile device before you leave the room!

Key points
1) UPS use case + demo
2) Presenting a problem of node sprawl
3) Is all document content created equal? Indexing vs. payload
4) Compression and sharding techniques
5) Wins
6) Couchbase is evolving … feature wishlist to help contain node sprawl
7) Lessons learned

  1. Friend or Foe? Couchbase Architecture and IT Cost Management: A Case Study in Controlling Node Sprawl. Konstantin Tadenev, UPS; Mike Ryder, UPS
  2. What are we going to cover today?
     • The use case at UPS
     • Problems associated with node sprawl
     • The architectural choices made and why
     • Wins
     • Features we'd like to see added
     • Lessons learned
  3. Package tracking use case - Demo
  4. Quality of Service Requirements
     • Changing state and serving inquiries for billions of packages
     • Top performance
     • Near-linear scalability
     • Flexibility in supporting new requirements
     • Billions of documents
     • Tens of terabytes of data
     • Tens of thousands of operations per second
     • 30+ million document inserts and updates per day
  5. Why Couchbase (three generations – tracking evolution)?
     • Generation 1: traditional RDBMS storage and retrieval
     • Legacy, but better solutions available
     (Diagram: client, network, application, database)
  6. Why Couchbase (three generations – tracking evolution)?
     • Generation 2: addition of caching
     • Legacy relational with a non-persistent elastic caching array
     (Diagram: client, network, application, write-aside cache, database)
  7. Why Couchbase (three generations – tracking evolution)?
     • Generation 3: persistent caching with Couchbase
     • The data sleeps in relational databases; the data plays in Couchbase
     • The relational databases are the system of record; Couchbase is the system of engagement
     (Diagram: source, publish/subscribe, Couchbase, network, client)
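
To make the generation-3 flow concrete, here is a minimal sketch of the system-of-engagement feed, assuming the Java SDK 2.x of the period and a hypothetical publish/subscribe callback (the class and method names are illustrative, not from the deck):

      // Sketch: each change event published by the relational system of record
      // is upserted into Couchbase, the system of engagement (Java SDK 2.x).
      import com.couchbase.client.java.Bucket;
      import com.couchbase.client.java.document.JsonDocument;
      import com.couchbase.client.java.document.json.JsonObject;

      public class EngagementFeed {
          private final Bucket bucket;

          public EngagementFeed(Bucket bucket) { this.bucket = bucket; }

          // Hypothetical callback invoked by the publish/subscribe pipeline.
          public void onChange(String trackingKey, String jsonBody) {
              bucket.upsert(JsonDocument.create(trackingKey, JsonObject.fromJson(jsonBody)));
          }
      }
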
  8. What Drove the Number of Nodes?
     • Optimal Couchbase node size depends on the percentage of data cached. The following is true up to Couchbase 5.0:
        • In maintenance operations such as rebalance, if cached data >= 90% of all data, then the data is streamed directly from memory to the target
        • If cached data < 90%, then the data is paged from storage into memory before being sent to the target. This background paging process is referred to as backfills in Couchbase
     • Backfills are most stable when nodes have < 2 TB of data and at least 15% of that data fits in memory
        • 15% of 2 TB is ~307 GB, which conservatively maps to the typical 256 GB RAM configuration
     • In our case cached data was below 90%, which led to limiting each node to 256 GB of RAM
     • The smaller the RAM per node, the more nodes are needed, even if the compute is over-allocated
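
The sizing arithmetic on this slide can be sketched as follows; the 2 TB and 15% thresholds come from the slide, while the 40 TB total ("tens of terabytes" on slide 4) and the helper itself are illustrative assumptions:

      // Sketch of the node-sizing arithmetic described above (pre-5.0 guidance).
      public class NodeSizing {
          static final double MAX_DATA_PER_NODE_TB = 2.0; // backfill stability limit
          static final double MIN_RESIDENCY = 0.15;       // >= 15% of data in memory

          public static void main(String[] args) {
              double totalDataTb = 40.0; // assumed example; a real cluster also carries replicas
              int minNodes = (int) Math.ceil(totalDataTb / MAX_DATA_PER_NODE_TB);
              double ramPerNodeGb = MAX_DATA_PER_NODE_TB * 1024 * MIN_RESIDENCY; // ~307 GB
              System.out.printf("nodes >= %d, RAM per node ~%.0f GB (typically 256 GB)%n",
                      minNodes, ramPerNodeGb);
          }
      }
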
  9. Ripple Effects of a Growing Number of Nodes in a Cluster
     • More nodes → higher probability of concurrent failures (servers, network, etc.) → more replicas are needed to safeguard against them → even more nodes
     • Typically, clusters with fewer than 10 nodes can be safely operated with one replica. It is a good practice to configure larger clusters with a greater number of replicas
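
A small worked example of why the replica count chases the node count, under the simplifying assumptions of independent node failures and an illustrative 1% per-node failure probability (neither figure is from the deck):

      // Illustrative failure math: the chance of >= 2 concurrent failures,
      // the event a single replica cannot cover, grows with cluster size.
      public class FailureOdds {
          public static void main(String[] args) {
              double p = 0.01; // assumed per-node failure probability per window
              for (int n : new int[] {10, 20, 50, 100}) {
                  double pNone = Math.pow(1 - p, n);
                  double pOne  = n * p * Math.pow(1 - p, n - 1);
                  double pTwoPlus = 1 - pNone - pOne; // ~0.004 at n=10, ~0.26 at n=100
                  System.out.printf("n=%3d  P(>=2 concurrent failures) = %.4f%n", n, pTwoPlus);
              }
          }
      }
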
  10. Not all document content is equal
     • Few elements are used as search predicates
     • The remaining elements are not interrogated; they are written and read as payload only

     {
       "recordReference": "testRecordResponse",
       "movement": {
         "senderNumber": "75AXXX",
         "shipmentUnitNumber": "1Z75A1E303662XXXXXX",
         "collectionDate": "20180123",
         "serviceName": "UPS Super",
         "serviceCode": "509",
         "billType": "P/P",
         "": "Prepaid (aka PRE)",
         "declaredBillTypeText": { "declaredValueFlag": "" },
         "inquiryID": { "code": "81", "value": "1Z75A1E30366XXXXXX" },
         "invoice": {},
         "itemCounts": {
           "originalItemCount": "1",
           "expectedItemCount": "1",
           "actualItemCount": "1",
           "voidItemCount": "0"
         },
         "referenceValue": {},
         "service": {},
         "movementType": { "code": "01", ... }
       }
     }
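
As an illustration of the predicate/payload split, a hedged sketch using the SDK 2.x JsonObject; which fields count as predicates is an assumption based on the sample above:

      // Sketch: separate the few indexable predicates from the opaque payload.
      import com.couchbase.client.java.document.json.JsonObject;

      public class PredicateSplit {
          public static void main(String[] args) {
              // Small, indexable document: only fields that queries filter on
              // (the selection here is illustrative).
              JsonObject predicates = JsonObject.create()
                      .put("shipmentUnitNumber", "1Z75A1E303662XXXXXX")
                      .put("senderNumber", "75AXXX")
                      .put("collectionDate", "20180123");

              // Everything else rides along as uninterrogated payload text.
              String payload = "{ \"serviceName\": \"UPS Super\", \"serviceCode\": \"509\" }";
              System.out.println(predicates + " | payload bytes: " + payload.length());
          }
      }
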
  11. UPS Solution
     • Data size + Couchbase guidelines resulted in ~100 nodes
     • The compression idea:
        • Compress the "payload" prior to ingestion into Couchbase and treat it as a key-value pair. Note that the storage-side Snappy compression (Couchbase 5.0) is not useful for in-memory operations
        • Maintain all search predicates in indexable, uncompressed documents
     • The sharding idea:
        • Logically shard the data across multiple physical clusters so as to maintain a one-replica configuration (#shards >= #clusters)
        • Configure the clusters to be flexible for data growth
        • Limit the number of nodes in each cluster even with future data growth
        • Retain multiple smaller clusters to afford better maintainability
        • Balance workloads by moving shards among clusters
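
A minimal sketch of both ideas together, assuming the pre-5.5 Java SDK (2.x) and the xerial Snappy library; the cluster addresses, bucket name, shard count, and shard-to-cluster map are all placeholders:

      // Sketch: client-side compression of the payload plus logical sharding
      // of keys across multiple physical clusters (#shards >= #clusters).
      import com.couchbase.client.java.Bucket;
      import com.couchbase.client.java.CouchbaseCluster;
      import com.couchbase.client.java.document.ByteArrayDocument;
      import org.xerial.snappy.Snappy;
      import java.nio.charset.StandardCharsets;

      public class CompressAndShard {
          static final String[] CLUSTERS = {"cb-cluster-a.example.com", "cb-cluster-b.example.com"};
          static final int SHARDS = 8; // more shards than clusters, so shards can move

          static int clusterFor(String key) {
              int shard = Math.floorMod(key.hashCode(), SHARDS);
              return shard % CLUSTERS.length; // trivial shard -> cluster map
          }

          public static void main(String[] args) throws Exception {
              String key = "1Z75A1E303662XXXXXX";
              String payload = "{ ...full tracking payload... }";

              // Compress on the client before ingestion; server-side Snappy (5.0)
              // does not help in-memory residency, client-side compression does.
              byte[] compressed = Snappy.compress(payload.getBytes(StandardCharsets.UTF_8));

              Bucket bucket = CouchbaseCluster.create(CLUSTERS[clusterFor(key)])
                      .openBucket("tracking");
              bucket.upsert(ByteArrayDocument.create(key + "::payload", compressed));
          }
      }
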
  12. Wins
     • Reduction in the number of nodes from ~100 to ~20
     • Other benefits:
        • Data compression of ~80%
        • Parallel maintenance activities can be performed
        • Replica count kept to one
        • Compressed documents reduce the infrastructure footprint
        • Sharding provides for growth without adding replicas or compromising maintainability
  13. Features we'd like to see (1 of 3)
     • Native in-memory compression
        • UPS requested this capability in May 2017
        • Couchbase 5.5 introduced this feature in July 2018, using the Snappy library for compression; XDCR traffic is also compressed
     • UPS would like to see further improvements:
        • Better compression ratio
        • While maintaining high performance
  14. Features we'd like to see (2 of 3)
     • SDK to manage sharding across multiple clusters
        • UPS submitted requirements for client-side distribution of KV data across clusters, similar to how data is distributed over vBuckets within a cluster, as well as an SDK-level scatter/gather implementation for View and N1QL queries. Couchbase request JDBC-1084
        • Couchbase has not yet targeted this feature for a release
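
In the absence of the requested SDK feature, a hand-rolled scatter/gather over several clusters might look like this sketch (Java SDK 2.x; how the per-cluster buckets are opened is omitted):

      // Sketch: send the same N1QL statement to every cluster, merge the rows.
      import com.couchbase.client.java.Bucket;
      import com.couchbase.client.java.query.N1qlQuery;
      import com.couchbase.client.java.query.N1qlQueryResult;
      import com.couchbase.client.java.query.N1qlQueryRow;
      import java.util.ArrayList;
      import java.util.List;

      public class ScatterGather {
          // One opened bucket per physical cluster.
          static List<N1qlQueryRow> queryAll(List<Bucket> buckets, String statement) {
              List<N1qlQueryRow> rows = new ArrayList<>();
              for (Bucket b : buckets) {                 // scatter: query each cluster
                  N1qlQueryResult result = b.query(N1qlQuery.simple(statement));
                  rows.addAll(result.allRows());         // gather: merge partial results
              }
              return rows;
          }
      }
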
  15. Features we'd like to see (3 of 3)
     • Increased node density
        • Customer requirements exist to enable Couchbase nodes with 1% or less RAM residency. A critical factor in this scenario is rebalance speed and stability, which must improve substantially to achieve higher node density. Couchbase request MB-23243
        • UPS would also like to see improvements in rebalance speed at RAM residency levels of 15-25%
        • Couchbase has not yet targeted this feature for a release
  16. Lessons learned
     • In order to optimize a Couchbase cluster, one needs to consider:
        • Quality of Service requirements (e.g., data size, performance, responsiveness, etc.)
        • Couchbase best practices (e.g., RAM per node, number of replicas in relation to number of nodes, etc.)
     • Deployment considerations may drive Couchbase node sprawl:
        • Limiting RAM per node leads to more nodes
        • As the number of nodes grows, so does the optimal number of replicas, driving the number of nodes even higher
        • The number of nodes affects the cost of the solution, which in turn may erode the business case
     • Possible answers to node sprawl:
        • Compression
        • Sharding across multiple clusters
  17. Questions & Comments
