O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Distribute Storage System May-2014

974 visualizações

Publicada em

Architect document for build distributed storage system and the good sample distributed storage from author

Publicada em: Tecnologia
  • I have done a couple of papers through ⇒⇒⇒WRITE-MY-PAPER.net ⇐⇐⇐ they have always been great! They are always in touch with you to let you know the status of paper and always meet the deadline!
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui

Distribute Storage System May-2014

  1. 1. DISTRIBUTED STORAGE SYSTEM Mr. Dương Công Lợi Company: VNG-Corp Tel: +84989510016/+84908522017 Email:loidc@vng.com.vn/loiduongcong@gmail.com
  2. 2. CONTENTS  1. What is distributed-computing system?  2. Principle of distributed database/storage system  3. Distributed storage system paradigm  4. Canonical problems in distributed systems  5. Common solution for canonical problems in distributed systems  6. UniversalDistributedStorage  7. Appendix
  3. 3. 1. WHAT IS DISTRIBUTED-COMPUTING SYSTEM?  Distributed-Computing is the process of solving a computational problem using a distributed system.  A distributed system is a computing system in which a number of components on multiple computers cooperate by communicating over a network to achieve a common goal.
  4. 4. DISTRIBUTED DATABASE/STORAGE SYSTEM  A distributed database system, the database is stored on several computers .  A distributed database is a collection of multiple , logic computer network .
  5. 5. DISTRIBUTED SYSTEM ADVANCE  Advance  Avoid bottleneck & single-point-of-failure  More Scalability  More Availability  Routing model  Client routing: client request to appropriate server to read/write data  Server routing: server forward request of client to appropriate server and send result to this client * can combine the two model above into a system
  6. 6. DISTRIBUTED STORAGE SYSTEM  Store some data {1,2,3,4,6,7,8} into 1 server  And store them into 3 distributed server 1,2,3,4, 6,7,8 1,2,3 4,6 7,8
  7. 7. 2. PRINCIPLE OF DISTRIBUTED DATABASE/STORAGE SYSTEM  Shard data key and store it into appropriate server use Distributed Hash Table (DHT)  DHT must be consistent hashing:  Uniform distribution of generation  Consistent  Jenkins, Murmur are the good choice;some else: MD5, SHA slower
  8. 8. 3. DISTRIBUTED STORAGE SYSTEM PARADIGM  Data Hashing/Addressing  Determine server for data store in  Data Replication  Store data into multi server node for more availability, fault-tolerance
  9. 9. DISTRIBUTED STORAGE SYSTEM ARCHITECT  Data Hashing/Addressing  Use DHT to addressing server (use server-name) to a number, performing it on one circle called the keys space  Use DHT to addressing data and find server store it by successor(k)=ceiling(addressing(k))  successor(k): server store k 0 server3 server1 server2
  10. 10. DISTRIBUTED STORAGE SYSTEM ARCHITECT  Addressing – Virtual node  Each server node is generated to more node-id for evenly distributed, load balance Server1: n1, n4, n6 Server2: n2, n7 Server3: n3, n5, n8 0 server3 server1 server2 n7 n1 n5 n2 n4 n8 n3 n6
  11. 11. DISTRIBUTED STORAGE SYSTEM ARCHITECT  Data Replication Data k1 store in server1 as master and store in server2 as slave 0 server3 server1 server2 k1
  12. 12. 4. CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS  Distributed transactions: ACID (Atomicity, Consistency, Isolation, Durability) requirement  Distributed data independence  Fault tolerance  Transparency
  13. 13. 5. COMMON SOLUTION FOR CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS  Atomicity and Consistency with Two Phase Commit protocal  Distributed data independence with consistent hashing algorithm  Fault tolerance with leader election, multi master and data replication  Transparency with server routing, client seen distributed system as a single server
  14. 14. TWO PHASE COMMIT PROTOCAL  What is this?  Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers.  Two-phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.  With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers. The commit process proceeds as follows:
  15. 15. TWO PHASE COMMIT PROTOCAL  Phase1: Obtaining a Decision  Step 1  Coordinator asks all participants to prepare to commit transaction Ti.  Ci adds the records <prepare T> to the log and forces log to stable storage (a log is a file which maintains a record of all changes to the database)  sends prepare T messages to all sites where T executed
  16. 16. TWO PHASE COMMIT PROTOCAL  Phase1: Making a Decision  Step 2  Upon receiving message, transaction manager at site determines if it can commit the transaction  if not: add a record <no T> to the log and send abort T message to Ci  if the transaction can be committed, then: 1). add the record <ready T> to the log 2). force all records for T to stable storage 3). send ready T message to Ci
  17. 17. TWO PHASE COMMIT PROTOCAL  Phase 2: Recording the Decision  Step 1  T can be committed of Ci received a ready T message from all the participating sites: otherwise T must be aborted.  Step 2  Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces record onto stable storage. Once the record is in stable storage, it cannot be revoked (even if failures occur)  Step 3  Coordinator sends a message to each participant informing it of the decision (commit or abort)  Step 4  Participants take appropriate action locally.
  18. 18. TWO PHASE COMMIT PROTOCAL  Costs and Limitations  If one database server is unavailable, none of the servers gets the updates.  This is correctable through network tuning and correctly building the data distribution through database optimization techniques.
  19. 19. LEADER ELECTION  Some leader election algorithm can use: LCR (LeLann-Chang-Roberts), Pitterson, HS (Hirschberg-Sinclair)
  20. 20. LEADER ELECTION  Bully Leader Election algorithm
  21. 21. MULTI MASTER  Multi-master replication  Problem of multi-master replication
  22. 22. MULTI MASTER  Solution, 2 candicate model:  Two phase commit (always consistency)  Asynchronize sync data among multi node  Still active despite some node dies  Faster than 2PC
  23. 23. MULTI MASTER  Asynchronize sync data  Data store to main master (called sub-leader), then this data post to queue to sync to other master.
  24. 24. MULTI MASTER  Asynchronize sync data req1 req2 Server1 (leader ) Server2 data queue req2: forward X
  25. 25. UNIVERSALDISTRIBUTEDSTORAGE a distributed storage system
  26. 26. 6. UNIVERSALDISTRIBUTEDSTORAGE  UniversalDistributedStorage is a distributed storage system develop for:  Distributed transactions (ACID)  Distributed data independence  Fault tolerance  Leader election (decision for join or leave server node)  Replicate with multiple master replication  Transparency
  27. 27. UNIVERSALDISTRIBUTEDSTORAGE ARCHITECTURE  Overview Bussiness Layer Distrib uted Layer Storage Layer Bussiness Layer Distrib uted Layer Storage Layer Bussiness Layer Distrib uted Layer Storage Layer Server
  28. 28. UNIVERSALDISTRIBUTEDSTORAGE ARCHITECTURE  Internal Overview Business Layer Distributed Layer StorageLayer dataLocate(), dataRemote() Result(s) localData() Result{s} Client request(s) remote queuing
  30. 30. UNIVERSALDISTRIBUTEDSTORAGE FEATURE  Data hashing/addressing  Use Murmur hashing function
  31. 31. UNIVERSALDISTRIBUTEDSTORAGE FEATURE  Leader election  Use Bully Leader Election algorithm
  32. 32. UNIVERSALDISTRIBUTEDSTORAGE FEATURE  Multi-master replication  Use asynchronize sync data among server nodes
  33. 33. UNIVERSALDISTRIBUTEDSTORAGE STATISTIC  System information:  3 machine 8GB Ram, core i5 3,220GHz  LAN/WAN network  7 physical servers on 3 above mechine  Concurrence write 16500000 items in 3680s, rate~ 4480req/sec (at client computing)  Concurrence read 16500000 items in 1458s, rate~ 11320req/sec (at client computing) * It doesn’t limit of this system, it limit at clients (this test using 3 client thread)
  34. 34. Q & A Contact: Duong Cong Loi loidc@vng.com.vn loiduongcong@gmail.com https://www.facebook.com/duongcong.loi
  35. 35. 7. APPENDIX
  36. 36. APPENDIX - 001  How to join/leave server(s) 1. join/leave 2. join/leave:forward Leaderserver 4. broadcast result 3. process join/leave Server A Server B Server C
  37. 37. APPENDIX - 002  How to move data when join/leave server(s)  Make appropriate data for the moving  Async data for the moving by thread, and control speed of the moving
  38. 38. APPENDIX - 003  How to detect Leader or sub-leader die  Easy dectect by polling connection
  39. 39. APPENDIX - 004  How to make multi virtual node for one server  Easy generate multi virtual node for one server by hash server-name  Ex: make 200 virtual node for server ‘photoTokyo’: use hash value of: photoTokyo1, photoTokyo2, …, photoTokyo200
  40. 40. APPENDIX - 005  For fast moving data  Use bloomfilter for dectect exist hash value of data- key  Use a storage for store all data-key for this local server
  41. 41. APPENDIX - 006  How to avoid network turnning  Use client connection pool with screening strategy before, it avoid many connection hanging when call remote via network between two server