Scalable Persistent Storage for Erlang: Theory and Practice


The RELEASE project at Glasgow University aims to scale Erlang onto commodity architectures with 100,000 cores.

Such architectures require scalable and available persistent storage on up to 100 hosts. The talk describes the provision of scalable persistent storage options for Erlang.

We outline the theory and apply it to popular Erlang distributed database management systems (DBMSs): Mnesia, CouchDB, Riak, and Cassandra. We identify Dynamo-style NoSQL DBMSs as suitable scalable persistent storage technologies. To evidence the scalability we benchmark Riak in practice, measuring the scalability and elasticity of Riak on a 100-node cluster with 800 cores.


1. Scalable Persistent Storage for Erlang: Theory and Practice
   Amir Ghaffari, Jon Meredith, Natalia Chechina, Phil Trinder
   London Riak Meetup - October 22, 2013
   http://www.release-project.eu
2. Outline
   • RELEASE Project
   • General principles of scalable DBMSs
   • NoSQL DBMSs for Erlang
   • Riak 1.1.1 Scalability in Practice
   • Investigating the scalability of distributed Erlang
   • Riak Elasticity
   • Conclusion & Future work
3. RELEASE Project
   • RELEASE is a European project aiming to scale Erlang onto commodity architectures with 100,000 cores.
4. RELEASE Project
   The RELEASE consortium works at the following levels:
   • Virtual machine
   • Language
   • Scalable computation model
   • Scalable in-memory data structures
   • Scalable persistent data structures
   • Infrastructure levels
   • Profiling and refactoring tools
5. General Principles of Scalable DBMSs: Data Fragmentation
   1. Decentralized model (e.g. P2P model)
   2. Systematic load balancing (makes life easier for the developer)
   3. Location transparency
   [Figure: 20K of data fragmented among 10 nodes in 2K ranges: 0-2K, 2K-4K, ..., 18K-20K]
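To make the fragmentation idea concrete, here is a minimal Erlang sketch of hash-based key placement with location transparency: callers name only the key, and the hash decides which node owns it. The module name, node list, and the use of erlang:phash2/2 are illustrative assumptions, not the scheme of any particular DBMS discussed here.

    %% Minimal sketch: map a key to the node that owns its fragment.
    -module(fragment).
    -export([node_for_key/2]).

    %% Hash Key into 0..length(Nodes)-1 and pick the matching node,
    %% so the same key always routes to the same node without any
    %% central directory.
    node_for_key(Key, Nodes) when Nodes =/= [] ->
        Index = erlang:phash2(Key, length(Nodes)),
        lists:nth(Index + 1, Nodes).

For example, fragment:node_for_key(<<"user42">>, Nodes) is stable for a fixed node list; production systems use consistent hashing instead, so that adding a node relocates only a small fraction of the keys.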
6. General Principles of Scalable DBMSs: Replication
   1. Decentralized model (e.g. P2P model)
   2. Location transparency
   3. Asynchronous replication (a write is considered complete as soon as one node acknowledges it)
   [Figure: key X replicated on three of the nodes in the ring]
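As a concrete illustration of asynchronous replication, a sketch using the riak-erlang-client (riakc): with the option {w, 1} the put call returns once a single replica has acknowledged the write, and the remaining replicas catch up in the background. Host, port, bucket, key, and value here are assumptions.

    %% Sketch: a write that completes after one acknowledgement.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Obj = riakc_obj:new(<<"bucket">>, <<"key">>, <<"hello">>),
    %% w = 1: complete as soon as one replica acknowledges; the
    %% other replicas of the key are updated asynchronously.
    ok = riakc_pb_socket:put(Pid, Obj, [{w, 1}]).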
7. General Principles of Scalable DBMSs: Consistency
   CAP theorem: a distributed system cannot simultaneously guarantee all three of:
   • Partition tolerance: the system continues to operate even when nodes can't talk to each other
   • Availability: every request receives a response
   • Consistency: all nodes see the same data at the same time
   All three together are not achievable because network failures are inevitable.
   Solution: eventual consistency, reconciling conflicts via data versioning.
   (ACID = Atomicity, Consistency, Isolation, Durability)
   [Figure: CAP triangle relating Consistency, Availability, and Partition Tolerance, positioning ACID systems and eventually consistent systems]
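Data versioning shows up in Riak as siblings: with allow_mult enabled, vector clocks let the store detect concurrent updates and hand all conflicting values back to the client for reconciliation. A hedged sketch using riakc; the merge rule (longest value wins) is an arbitrary placeholder for application-specific logic.

    -module(reconcile).
    -export([resolve/1]).

    %% Collapse sibling values left by concurrent updates into one.
    resolve(Obj) ->
        case riakc_obj:get_values(Obj) of
            [_Only] -> Obj;   % a single value: nothing to reconcile
            Values ->
                [Winner | _] =
                    lists:sort(fun(A, B) -> byte_size(A) >= byte_size(B) end,
                               Values),
                riakc_obj:update_value(Obj, Winner)
        end.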
8. NoSQL DBMSs for Erlang: Mnesia, CouchDB, Riak, Cassandra

   Fragmentation:
   • Mnesia: explicit placement; client-server; automatic by using a hash function
   • CouchDB: explicit placement; multi-server; the Lounge is not part of each CouchDB node
   • Riak: implicit placement; peer-to-peer; automatic by using the consistent hashing technique
   • Cassandra: implicit placement; peer-to-peer; automatic by using the consistent hashing technique

   Replication:
   • Mnesia: explicit placement; client-server; asynchronous (dirty operations)
   • CouchDB: explicit placement; multi-server; asynchronous
   • Riak: implicit placement; peer-to-peer; asynchronous
   • Cassandra: implicit placement; peer-to-peer; asynchronous

   Partition tolerance:
   • Mnesia: strong consistency
   • CouchDB: eventual consistency; Multi-Version Concurrency Control for reconciliation
   • Riak: eventual consistency; vector clocks for reconciliation
   • Cassandra: eventual consistency; timestamps for reconciliation

   Query processing & backend storage:
   • Mnesia: the largest possible Mnesia table is 4 GB
   • CouchDB: no size limitation; supports Map/Reduce queries
   • Riak: the Bitcask backend has a memory limitation, LevelDB has none; supports Map/Reduce queries
   • Cassandra: no size limitation; supports Map/Reduce queries
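Mnesia's "explicit placement" entries are visible in its API: fragmentation happens only when the developer configures it per table, in contrast to Riak's and Cassandra's automatic consistent hashing. A sketch using mnesia_frag's frag_properties; the table name, record fields, and fragment count are assumptions.

    %% Sketch: explicitly fragmenting a Mnesia table over the cluster.
    {atomic, ok} = mnesia:create_table(users,
        [{attributes, [id, name]},
         {frag_properties, [{n_fragments, 10},
                            {node_pool, [node() | nodes()]}]}]),
    %% Fragmented tables are accessed through the mnesia_frag module:
    mnesia:activity(transaction,
                    fun() -> mnesia:write({users, 42, <<"amir">>}) end,
                    [], mnesia_frag).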
9. Initial Evaluation Results
   [Table: Mnesia, CouchDB, Riak, and Cassandra evaluated against the general principles above]
   Scalable persistent storage for SD Erlang can be provided by Dynamo-style DBMSs such as Riak and Cassandra.
10. Riak Scalability in Practice
    • Basho Bench: a benchmarking tool for Riak
    • We run Basho Bench on the 348-node Kalkyl cluster
    • Scalability: how does adding more Riak nodes affect the throughput?
    • There are two kinds of nodes in a cluster:
      • Traffic generators
      • Riak nodes
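Basho Bench is driven by a file of Erlang terms. The following sketch shows the shape of such a configuration; every concrete value (duration, worker count, key space, value size, operation mix, target IPs) is an illustrative assumption, not the parameters used in these experiments.

    %% riak_bench.config -- illustrative values only.
    {mode, max}.                             % issue requests as fast as possible
    {duration, 30}.                          % run length in minutes
    {concurrent, 90}.                        % worker processes on this generator
    {driver, basho_bench_driver_riakc_pb}.   % Riak protocol-buffers driver
    {key_generator, {uniform_int, 100000}}.
    {value_generator, {fixed_bin, 1024}}.    % 1 KB values
    {riakc_pb_ips, [{127,0,0,1}]}.           % Riak nodes this generator targets
    {operations, [{get, 4}, {update, 4}, {insert, 2}]}.

Each traffic generator runs one such file; the heuristic on the next slide sizes the generator pool so the generators themselves do not become the bottleneck.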
11. Node Organisation
    Heuristic: one traffic generator per 3 Riak nodes
12. Traffic Generator
13. Riak 1.1.1 Scalability
    Benchmark on a 100-node cluster (800 cores)
14. Failures
15. Profiling Resource Usage: CPU Usage
16. Profiling Resource Usage: Disk Usage
17. Profiling Resource Usage: Memory Usage
18. Profiling Resource Usage: Network Traffic of Generator Nodes
19. Profiling Resource Usage: Network Traffic of Riak Nodes
20. Bottleneck for Riak Scalability
    CPU, RAM, disk, and network profiling reveals that none of these resources is the bottleneck for Riak scalability.
    Is the Riak scalability limit due to limits in distributed Erlang? To find out, let's measure the scalability of distributed Erlang.
21. DE-Bench
    • DE-Bench: a benchmarking tool for distributed Erlang
    • It is based on Basho Bench
    • Measures the throughput of a cluster of Erlang nodes
    • Records the latency of distributed Erlang commands individually
22. Distributed Erlang Commands
    • Spawn/RPC: peer-to-peer commands
    • register_name: global name tables located on every node
    • unregister_name: global name tables located on every node
    • whereis_name: a lookup in the local table
    [Figure: register/unregister propagating to the global name table held in every Erlang VM]
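A short sketch of these command classes using only standard OTP calls; the peer node name and the registered name are assumptions. The point is the cost model: the two global operations write to every node's name table, whereas whereis_name reads only the local copy.

    -module(de_commands).
    -export([demo/1]).

    %% Exercise each command class against a peer,
    %% e.g. de_commands:demo('node2@host').
    demo(Peer) ->
        %% Point-to-point: only this node and Peer communicate.
        _Child = spawn(Peer, fun() -> ok end),
        {ok, _Cwd} = rpc:call(Peer, file, get_cwd, []),
        %% Global: writes to the name table replicated on every
        %% connected node, so cost grows with cluster size.
        yes = global:register_name(my_service, self()),
        %% Local: reads only this node's copy of the table.
        Pid = global:whereis_name(my_service),
        %% Global again: removes the name from every node's table.
        _ = global:unregister_name(my_service),
        Pid.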
23. DE-Bench's P2P Design
    [Figure: DE-Bench's peer-to-peer design across two physical hosts]
24. Frequency of Global Operations

    Frequency of global operations | Throughput peaks at
    1%                             | 30 nodes
    0.5%                           | 50 nodes
    0.33%                          | 70 nodes
    0%                             | 1600 nodes

    Global operations limit the scalability of distributed Erlang.
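Expressing "X% global operations" in a benchmark loop amounts to issuing one global register/unregister pair per fixed number of cheap local operations. A minimal sketch of that mix (not DE-Bench's actual code); Every = 100 corresponds to the 1% row above.

    -module(mix).
    -export([run/2]).

    %% run(Total, Every): perform Total operations, making every
    %% Every-th one a global register/unregister pair.
    run(0, _Every) ->
        ok;
    run(N, Every) when N rem Every =:= 0 ->
        Name = {tmp, N},
        yes = global:register_name(Name, self()),
        _ = global:unregister_name(Name),
        run(N - 1, Every);
    run(N, Every) ->
        _ = global:whereis_name(no_such_name),   % cheap, local-only lookup
        run(N - 1, Every).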
25. Riak Software Scalability
    • Monitoring the global.erl module from the OTP library shows that Riak does NOT use any global operations.
    • Instrumenting the gen_server.erl module reveals that:
      • Of the 15 most time-consuming operations, only the time of rpc:call grows with cluster size.
      • Moreover, of the five Riak RPC calls, only the start_put_fsm function from the riak_kv_put_fsm_sup module grows with cluster size.
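The per-call timing behind such a finding can be approximated with a standard timer:tc wrapper; this is a sketch of the idea, not the actual instrumentation applied to gen_server.erl.

    %% Sketch: time a remote call to see whether its latency grows
    %% as nodes are added. timer:tc/3 is standard OTP.
    timed_rpc(Node, Mod, Fun, Args) ->
        {Micros, Result} = timer:tc(rpc, call, [Node, Mod, Fun, Args]),
        io:format("rpc:call ~p ~p:~p/~p took ~p us~n",
                  [Node, Mod, Fun, length(Args), Micros]),
        Result.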
26. Eliminating the Bottlenecks
    • Independently, Basho identified that two supervisor processes, riak_kv_get_fsm_sup and riak_kv_put_fsm_sup, become bottlenecks under heavy load, exhibiting a build-up in message queue length.
    • To improve Riak scalability in versions 1.3 and 1.4, Basho applied a number of techniques and introduced the new library sidejob (https://github.com/basho/sidejob).
27. Riak 1.1.1 Elasticity
    Time-line shows the Riak cluster losing and gaining nodes
28. Riak 1.1.1 Elasticity
    How the Riak cluster deals with nodes leaving and joining
29. Observations
    • Number of failures: 37
    • Number of successful operations: approximately 3.41 million
    • When failed nodes come back up, the throughput recovers and grows, which shows that Riak 1.1.1 has good elasticity.
30. Conclusion and Future Work
    • Our benchmark confirms that Riak has good elasticity.
    • We establish, for the first time scientifically, the scalability limit of Riak 1.1.1 as 60 nodes.
    • We have shown how global operations limit the scalability of distributed Erlang. The Riak scalability bottlenecks are eliminated in Riak version 1.3 and later versions.
    • In RELEASE, we are working to scale up distributed Erlang by grouping nodes into smaller partitions.
31. References
    • Benchmarking Riak: https://github.com/amirghaffari/benchmark_riak
    • Basho Bench: http://docs.basho.com/riak/latest/ops/building/benchmarking/
    • DE-Bench: https://github.com/amirghaffari/DEbench
    • A. Ghaffari, N. Chechina, P. Trinder, and J. Meredith. Scalable Persistent Storage for Erlang: Theory and Practice. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, pages 73-74, September 2013. ACM Press.
    • Clusters at UPPMAX: http://www.uppmax.uu.se/hardware
    • Sidejob: https://github.com/basho/sidejob
