DISTRIBUTED STORAGE
SYSTEM
Mr. Dương Công Lợi
Company: VNG-Corp
Tel: +84989510016/+84908522017
Email:loidc@vng.com.vn/loiduongcong@gmail.com
CONTENTS
 1. What is distributed-computing system?
 2. Principle of distributed database/storage
system
 3. Distributed storage system paradigm
 4. Canonical problems in distributed systems
 5. Common solution for canonical problems in
distributed systems
 6. UniversalDistributedStorage
 7. Appendix
1. WHAT IS DISTRIBUTED-COMPUTING
SYSTEM?
 Distributed-Computing is the process of solving a
computational problem using a distributed
system.
 A distributed system is a computing system in
which a number of components on multiple
computers cooperate by communicating over a
network to achieve a common goal.
DISTRIBUTED DATABASE/STORAGE
SYSTEM
 In a distributed database system, the database is
stored on several computers.
 A distributed database is a collection of multiple,
logically interrelated databases distributed over a
computer network.
DISTRIBUTED SYSTEM ADVANTAGES
 Advantages
 Avoids bottlenecks and single points of failure
 More scalability
 More availability
 Routing model
 Client routing: the client sends each request to the
appropriate server to read/write data
 Server routing: a server forwards the client's request to
the appropriate server and sends the result back to the client
* The two models above can be combined in one system
DISTRIBUTED STORAGE SYSTEM
 Store the data {1,2,3,4,6,7,8} on 1 server
 Or store it across 3 distributed servers
[Figure: one server holding 1,2,3,4,6,7,8; three servers holding 1,2,3 / 4,6 / 7,8]
2. PRINCIPLE OF DISTRIBUTED
DATABASE/STORAGE SYSTEM
 Shard data by key and store each item on the
appropriate server using a Distributed Hash Table (DHT)
 The DHT must use consistent hashing:
 Uniform distribution of generated hash values
 Consistent
 Jenkins and Murmur are good choices; others such as
MD5 and SHA are slower
3. DISTRIBUTED STORAGE SYSTEM
PARADIGM
 Data Hashing/Addressing
 Determine which server stores each data item
 Data Replication
 Store data on multiple server nodes for higher
availability and fault tolerance
DISTRIBUTED STORAGE SYSTEM
ARCHITECTURE
 Data Hashing/Addressing
 Use the DHT to map each server (by server name) to a
number, and place it on a circle called the key
space
 Use the DHT to map each data key k and find the server
that stores it: successor(k)=ceiling(addressing(k)), the
first server at or after k's position on the circle
 successor(k): the server that stores k
[Figure: key-space circle starting at 0 with server1, server2, and server3 placed on it]
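The successor lookup on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the `address` helper and the `Ring` class are hypothetical names, the key space is assumed to be 32 bits wide, and MD5 from the Python standard library stands in for Murmur, which the deck recommends but which is not in the standard library.

```python
import bisect
import hashlib


def address(name: str) -> int:
    """Map a string to a position on the circle (assumed 32-bit key space).

    MD5 is only a stand-in here; the deck recommends Murmur or Jenkins.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    """Hypothetical key-space circle holding one point per server."""

    def __init__(self, servers):
        self._points = sorted((address(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def successor(self, key: str) -> str:
        """Return the first server at or after the key's position."""
        idx = bisect.bisect_left(self._positions, address(key))
        if idx == len(self._points):
            idx = 0  # wrap around past the largest position back to 0
        return self._points[idx][1]


ring = Ring(["server1", "server2", "server3"])
owner = ring.successor("some-data-key")
```

One property worth noting: when a server is removed, only the keys that server owned move to another server; every other key keeps its successor.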
DISTRIBUTED STORAGE SYSTEM
ARCHITECTURE
 Addressing – Virtual node
 Each server is assigned several node IDs (virtual
nodes) so that keys are evenly distributed and load is balanced
Server1: n1, n4, n6
Server2: n2, n7
Server3: n3, n5, n8
[Figure: key-space circle starting at 0 with virtual nodes n1 to n8 of server1, server2, and server3 spread around it]
DISTRIBUTED STORAGE SYSTEM
ARCHITECTURE
 Data Replication
Data k1 is stored on server1 as master and on
server2 as slave
[Figure: key-space circle starting at 0 with server1, server2, server3, and key k1]
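One common way to pick the master and slave for a key is to take the key's successor as master and the next distinct server clockwise as slave. The slide only states that k1 has a master and a slave, so the clockwise rule below is an assumption, as are the `address` and `replicas` names and the MD5 stand-in hash.

```python
import bisect
import hashlib


def address(name: str) -> int:
    """Position on an assumed 32-bit key-space circle (MD5 as a stand-in hash)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")


def replicas(key, servers, copies=2):
    """Return up to `copies` distinct servers, walking clockwise from the key.

    The first entry plays the master role, the rest play the slave role.
    """
    points = sorted((address(s), s) for s in servers)
    positions = [pos for pos, _ in points]
    start = bisect.bisect_left(positions, address(key))
    chosen = []
    for step in range(len(points)):
        server = points[(start + step) % len(points)][1]
        if server not in chosen:
            chosen.append(server)
        if len(chosen) == copies:
            break
    return chosen


master, slave = replicas("k1", ["server1", "server2", "server3"])
```

If the master fails, a read for the key can fall back to the slave, which is what gives the availability benefit described on the paradigm slide.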
4. CANONICAL PROBLEMS IN DISTRIBUTED
SYSTEMS
 Distributed transactions: ACID (Atomicity,
Consistency, Isolation, Durability) requirements
 Distributed data independence
 Fault tolerance
 Transparency
5. COMMON SOLUTION FOR CANONICAL
PROBLEMS IN DISTRIBUTED SYSTEMS
 Atomicity and Consistency with the Two Phase
Commit protocol
 Distributed data independence with a consistent
hashing algorithm
 Fault tolerance with leader election, multi
master, and data replication
 Transparency with server routing: the client sees the
distributed system as a single server
TWO PHASE COMMIT PROTOCOL
 What is this?
 Two-phase commit is a transaction protocol designed for
the complications that arise with distributed resource
managers.
 Two-phase commit technology is used for hotel and
airline reservations, stock market transactions, banking
applications, and credit card systems.
 With a two-phase commit protocol, the distributed
transaction manager employs a coordinator to manage
the individual resource managers. The commit process
proceeds as follows:
TWO PHASE COMMIT PROTOCOL
 Phase 1: Obtaining a Decision
 Step 1: Coordinator Ci asks all participants to prepare
to commit transaction T.
 Ci adds the record <prepare T> to the log and
forces the log to stable storage (a log is a file that
maintains a record of all changes to the database)
 Ci sends prepare T messages to all sites where T
executed
TWO PHASE COMMIT PROTOCOL
 Phase 1: Making a Decision
 Step 2: Upon receiving the message, the transaction
manager at each site determines whether it can commit the
transaction
 if not:
add a record <no T> to the log and send an abort T
message to Ci
 if the transaction can be committed,
then:
1). add the record <ready T> to the log
2). force all records for T to stable storage
3). send a ready T message to Ci
TWO PHASE COMMIT PROTOCOL
 Phase 2: Recording the Decision
 Step 1: T can be committed if Ci received a ready T
message from all the participating sites; otherwise T
must be aborted.
 Step 2: The coordinator adds a decision record, <commit
T> or <abort T>, to the log and forces the record onto stable
storage. Once the record is in stable storage, it cannot
be revoked (even if failures occur).
 Step 3: The coordinator sends a message to each
participant informing it of the decision (commit or abort).
 Step 4: Participants take the appropriate action locally.
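The two phases can be traced with a small in-memory simulation. This is a sketch under simplifying assumptions: the `Participant` class and `two_phase_commit` function are hypothetical, Python lists stand in for logs forced to stable storage, method calls stand in for network messages, and timeouts and crash recovery are left out.

```python
class Participant:
    """Hypothetical site taking part in transaction T."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []  # stands in for a log forced to stable storage
        self.state = "active"

    def prepare(self, txn):
        """Phase 1, step 2: vote ready or abort and log the vote."""
        if self.can_commit:
            self.log.append(f"<ready {txn}>")
            return "ready"
        self.log.append(f"<no {txn}>")
        return "abort"

    def finish(self, txn, decision):
        """Phase 2, step 4: apply the coordinator's decision locally."""
        self.log.append(f"<{decision} {txn}>")
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(txn, participants):
    """Run both phases and return (decision, coordinator log)."""
    coordinator_log = [f"<prepare {txn}>"]            # phase 1, step 1
    votes = [p.prepare(txn) for p in participants]    # collect every vote
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(f"<{decision} {txn}>")     # decision record is logged first
    for p in participants:                            # then sent to each participant
        p.finish(txn, decision)
    return decision, coordinator_log


sites = [Participant("siteA"), Participant("siteB", can_commit=False)]
decision, log = two_phase_commit("T", sites)
```

In this run siteB votes abort, so the coordinator logs an abort decision and both sites end in the aborted state.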
TWO PHASE COMMIT PROTOCOL
 Costs and Limitations
 If one database server is unavailable, none of the
servers gets the updates.
 This can be mitigated through network tuning and
by building the data distribution correctly with
database optimization techniques.
LEADER ELECTION
 Leader election algorithms that can be used include LCR
(LeLann-Chang-Roberts), Peterson, and HS
(Hirschberg-Sinclair)
LEADER ELECTION
 Bully Leader Election algorithm
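The slide only names the Bully algorithm, so the following is a simplified sketch of its core rule: the live node with the highest ID becomes leader. The `bully_election` function is hypothetical, node IDs are assumed to be comparable integers, and message passing and timeouts are replaced by a lookup in a set of live IDs.

```python
def bully_election(initiator, alive_ids):
    """Return the ID that ends up as leader when `initiator` starts an election.

    Simplification: in the full algorithm every live higher node answers and
    starts its own election; here one higher node at a time takes over.
    """
    if initiator not in alive_ids:
        raise ValueError("initiator must be alive")
    candidate = initiator
    while True:
        higher = [node for node in alive_ids if node > candidate]
        if not higher:
            # No higher node answered, so the candidate announces itself.
            return candidate
        # A higher node answers and takes over the election.
        candidate = min(higher)


leader = bully_election(1, {1, 2, 3, 5})
# After node 5 dies, node 2 notices and starts a new election.
new_leader = bully_election(2, {1, 2, 3})
```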
MULTI MASTER
 Multi-master replication
 Problem of multi-master replication
MULTI MASTER
 Solution: two candidate models:
 Two phase commit (always consistent)
 Asynchronous data sync among multiple nodes
 Still active even if some nodes die
 Faster than 2PC
MULTI MASTER
 Asynchronous data sync
 Data is stored on the main master (called the sub-leader), then
posted to a queue to be synced to the other masters.
MULTI MASTER
 Asynchronous data sync
[Figure: labels req1, req2, Server1 (leader), Server2, data queue, "req2: forward", X]
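The queue-based sync can be sketched with a background thread per master. This is an illustration only: the `Master` class is hypothetical, direct dictionary writes stand in for network calls to the other masters, and conflict resolution between concurrent writes on different masters is not addressed.

```python
import queue
import threading


class Master:
    """Hypothetical master node that syncs its writes to peers asynchronously."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []
        self.outbox = queue.Queue()  # the data queue from the slide
        worker = threading.Thread(target=self._sync_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        self.data[key] = value         # store on this master first
        self.outbox.put((key, value))  # then queue the update for the others

    def _sync_loop(self):
        while True:
            key, value = self.outbox.get()
            for peer in self.peers:
                peer.data[key] = value  # stands in for a network call
            self.outbox.task_done()

    def flush(self):
        """Block until every queued update has been applied to the peers."""
        self.outbox.join()


server1, server2 = Master("Server1"), Master("Server2")
server1.peers = [server2]
server1.write("k1", "v1")  # returns without waiting for Server2
server1.flush()            # only needed here to make the example deterministic
```

The write returns as soon as the local store and the queue are updated, which is why this model is faster than 2PC and keeps working while a peer is down, at the cost of peers being briefly out of date.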
UNIVERSALDISTRIBUTEDSTORAGE
a distributed storage system
6. UNIVERSALDISTRIBUTEDSTORAGE
 UniversalDistributedStorage is a distributed
storage system developed for:
 Distributed transactions (ACID)
 Distributed data independence
 Fault tolerance
 Leader election (decides when server nodes join or leave)
 Replication with multiple masters
 Transparency
UNIVERSALDISTRIBUTEDSTORAGE
ARCHITECTURE
 Overview
[Figure: three servers, each made up of a Business Layer, a Distributed Layer, and a Storage Layer]
UNIVERSALDISTRIBUTEDSTORAGE
ARCHITECTURE
 Internal Overview
[Figure: ARCHITECTURE OVERVIEW. Labels: Client request(s), Business Layer, Distributed Layer, StorageLayer, dataLocate(), dataRemote(), localData(), Result(s), remote queuing]
UNIVERSALDISTRIBUTEDSTORAGE
FEATURE
 Data hashing/addressing
 Use Murmur hashing function
UNIVERSALDISTRIBUTEDSTORAGE
FEATURE
 Leader election
 Use Bully Leader Election algorithm
UNIVERSALDISTRIBUTEDSTORAGE
FEATURE
 Multi-master replication
 Use asynchronous data sync among server nodes
UNIVERSALDISTRIBUTEDSTORAGE
STATISTIC
 System information:
 3 machines, each with 8 GB RAM and a Core i5 at 3.220 GHz
 LAN/WAN network
 7 physical servers on the 3 machines above
 Concurrent write: 16,500,000 items in 3,680 s, rate ≈
4,480 req/sec (measured at the client)
 Concurrent read: 16,500,000 items in 1,458 s, rate ≈
11,320 req/sec (measured at the client)
* This is not the limit of the system; the limit is at the clients (this
test used 3 client threads)
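The quoted rates follow directly from the item counts and durations on this slide. A quick check, using only the figures given above:

```python
items = 16_500_000

write_seconds = 3680
read_seconds = 1458

write_rate = items / write_seconds  # requests per second for writes
read_rate = items / read_seconds    # requests per second for reads

print(round(write_rate), round(read_rate))  # prints "4484 11317"
```

Both values agree with the slide's rounded figures of about 4480 and 11320 req/sec.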
Q & A
Contact:
Duong Cong Loi
loidc@vng.com.vn
loiduongcong@gmail.com
https://www.facebook.com/duongcong.loi
7. APPENDIX
APPENDIX - 001
 How to join/leave server(s)
1. join/leave
2. join/leave: forward
3. process join/leave
4. broadcast result
[Figure: Leader server, Server A, Server B, Server C]
APPENDIX - 002
 How to move data when server(s) join/leave
 Select the data that has to move
 Move the data asynchronously in a separate thread, and control
the speed of the move
APPENDIX - 003
 How to detect that the leader or a sub-leader has died
 Easily detected by polling the connection
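Polling a connection can be sketched as a TCP connect attempt with a timeout, repeated a few times before a node is declared dead. The function names, the one-second timeout, and the three-miss threshold are assumptions chosen for illustration; the slide does not specify them.

```python
import socket


def is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_failure(probe, max_misses=3):
    """Call `probe()` up to `max_misses` times.

    The node is reported dead only if every attempt fails, so a single
    dropped packet does not trigger a new leader election.
    """
    for _ in range(max_misses):
        if probe():
            return False
    return True


# Hypothetical usage against a leader at 10.0.0.1:9000:
# leader_dead = detect_failure(lambda: is_alive("10.0.0.1", 9000))
```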
APPENDIX - 004
 How to make multiple virtual nodes for one server
 Multiple virtual nodes are easy to generate for one server by
hashing its server name with a suffix
 Ex:
to make 200 virtual nodes for server ‘photoTokyo’,
use the hash values of: photoTokyo1, photoTokyo2, …,
photoTokyo200
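The naming scheme above can be turned into a ring as follows. This is a sketch, not the author's code: `photoOsaka` and `photoKyoto` are made-up server names added so the ring has more than one server, `build_ring` and `lookup` are hypothetical helpers, and MD5 again stands in for Murmur.

```python
import bisect
import hashlib
from collections import Counter


def address(name: str) -> int:
    """Position on an assumed 32-bit key-space circle (MD5 as a stand-in hash)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")


def build_ring(servers, vnodes=200):
    """Hash server1 .. serverN style names into sorted (position, server) points."""
    return sorted(
        (address(f"{server}{i}"), server)
        for server in servers
        for i in range(1, vnodes + 1)
    )


def lookup(ring, key):
    """Return the server owning the first virtual node at or after the key."""
    idx = bisect.bisect_left(ring, (address(key), ""))
    return ring[idx % len(ring)][1]  # the modulo wraps past the last point


ring = build_ring(["photoTokyo", "photoOsaka", "photoKyoto"])
load = Counter(lookup(ring, f"photo-{n}") for n in range(6000))
```

With 200 virtual nodes per server, `load` should show the 6000 sample keys split fairly evenly across the three servers; with a single node per server the split is usually much more uneven.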
APPENDIX - 005
 For fast moving data
 Use a Bloom filter to detect whether the hash value of a data
key exists
 Use a store that holds all data keys for the local
server
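A Bloom filter answers "is this key possibly stored here?" without touching the key store: a negative answer is always correct, while a positive answer can occasionally be wrong. The sketch below is a minimal version with assumed parameters (8192 bits, 4 hash functions derived from SHA-256); the deck does not say which sizes or hash functions the real system uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


local_keys = BloomFilter()
for n in range(100):
    local_keys.add(f"data-key-{n}")
```

During a data move, a server can skip the lookup in its key store whenever `might_contain` returns False, and consult the store only on a positive answer.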
APPENDIX - 006
 How to avoid network tuning
 Use a client connection pool with a screening strategy
applied beforehand; it avoids many hanging connections when calling
remotely over the network between two servers
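The slide does not define the screening strategy. One plausible reading, assumed here, is that the pool checks each connection before handing it out and replaces any that fail the check. `ConnectionPool`, `factory`, and `is_healthy` are hypothetical names, and plain dictionaries stand in for real connections in the usage lines.

```python
import queue


class ConnectionPool:
    """Fixed-size pool that screens connections before lending them out."""

    def __init__(self, factory, is_healthy, size=4):
        self._factory = factory        # creates a new connection
        self._is_healthy = is_healthy  # the screening check (an assumption)
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=1.0):
        """Return a screened connection; raise queue.Empty if none frees up in time."""
        conn = self._idle.get(timeout=timeout)
        if not self._is_healthy(conn):
            conn = self._factory()  # drop the broken connection, open a new one
        return conn

    def release(self, conn):
        self._idle.put(conn)


pool = ConnectionPool(lambda: {"ok": True}, lambda c: c["ok"], size=1)
conn = pool.acquire()
conn["ok"] = False       # simulate the connection hanging or breaking
pool.release(conn)
fresh = pool.acquire()   # screening replaces the broken connection
```

Bounding the pool size and failing fast on `acquire` keeps callers from piling up on connections that will never answer.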
Distribute Storage System May-2014

  • 1. DISTRIBUTED STORAGE SYSTEM Mr. Dương Công Lợi Company: VNG-Corp Tel: +84989510016/+84908522017 Email:loidc@vng.com.vn/loiduongcong@gmail.com
  • 2. CONTENTS  1. What is distributed-computing system?  2. Principle of distributed database/storage system  3. Distributed storage system paradigm  4. Canonical problems in distributed systems  5. Common solution for canonical problems in distributed systems  6. UniversalDistributedStorage  7. Appendix
  • 3. 1. WHAT IS DISTRIBUTED-COMPUTING SYSTEM?  Distributed-Computing is the process of solving a computational problem using a distributed system.  A distributed system is a computing system in which a number of components on multiple computers cooperate by communicating over a network to achieve a common goal.
  • 4. DISTRIBUTED DATABASE/STORAGE SYSTEM  In a distributed database system, the database is stored on several computers.  A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.
  • 5. DISTRIBUTED SYSTEM ADVANTAGES  Advantages  Avoids bottlenecks and single points of failure  More scalability  More availability  Routing model  Client routing: the client sends each request directly to the appropriate server to read/write data  Server routing: a server forwards the client's request to the appropriate server and returns the result to the client * The two models above can be combined in one system
  • 6. DISTRIBUTED STORAGE SYSTEM  Storing the data {1,2,3,4,6,7,8} on 1 server: {1,2,3,4,6,7,8}  Storing the same data on 3 distributed servers: {1,2,3}, {4,6}, {7,8}
  • 7. 2. PRINCIPLE OF DISTRIBUTED DATABASE/STORAGE SYSTEM  Shard data by key and store each item on the appropriate server using a Distributed Hash Table (DHT)  The DHT must use consistent hashing:  Uniform distribution of generated hash values  Consistent  Jenkins and Murmur are good choices; others such as MD5 and SHA are slower
  • 8. 3. DISTRIBUTED STORAGE SYSTEM PARADIGM  Data Hashing/Addressing  Determine the server on which each data item is stored  Data Replication  Store data on multiple server nodes for more availability and fault tolerance
  • 9. DISTRIBUTED STORAGE SYSTEM ARCHITECTURE  Data Hashing/Addressing  Use the DHT to map each server (by server name) to a number and place it on a circle called the key space  Use the DHT to address data and find the server that stores it: successor(k) = ceiling(addressing(k))  successor(k): the server that stores k [Figure: key-space ring starting at 0 with server1, server2 and server3]
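The ring lookup described on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the class name `HashRing`, the 32-bit key space, and the use of MD5 (picked only because it ships with Python; the slides recommend Jenkins or Murmur) are all assumptions.

```python
import bisect
import hashlib

KEY_SPACE = 2 ** 32  # assumed size of the circular key space


def addressing(name):
    """Map a server name or data key to a point on the ring."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % KEY_SPACE


class HashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs represent the ring.
        self._ring = sorted((addressing(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def successor(self, key):
        """Return the first server at or after the key's position, wrapping past 0."""
        idx = bisect.bisect_left(self._positions, addressing(key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["server1", "server2", "server3"])
owner = ring.successor("k1")  # the server that stores key "k1"
```

With this layout, removing one server only reassigns the keys that server owned; every other key keeps its successor.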
  • 10. DISTRIBUTED STORAGE SYSTEM ARCHITECTURE  Addressing – Virtual node  Each server node is assigned several node IDs so that data is evenly distributed and load is balanced. Server1: n1, n4, n6; Server2: n2, n7; Server3: n3, n5, n8 [Figure: key-space ring starting at 0 with virtual nodes n1 to n8 belonging to server1, server2 and server3]
  • 11. DISTRIBUTED STORAGE SYSTEM ARCHITECTURE  Data Replication  Data k1 is stored on server1 as master and on server2 as slave [Figure: key-space ring starting at 0 with server1, server2, server3 and key k1]
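One common way to choose the master and slave for a key is to take the key's successor on the ring as master and the next distinct server clockwise as slave. The slide does not say how the slave is picked, so that rule is an assumption, as are the function names and the MD5 hash in this sketch.

```python
import bisect
import hashlib


def addressing(name):
    """Map a server name or data key to a ring position (assumed MD5-based)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")


def replica_servers(key, servers, copies=2):
    """Return [master, slave, ...]: the key's successor, then the servers after it."""
    ring = sorted((addressing(s), s) for s in servers)
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_left(positions, addressing(key))
    chosen = []
    for step in range(len(ring)):
        server = ring[(start + step) % len(ring)][1]
        if server not in chosen:
            chosen.append(server)
        if len(chosen) == copies:
            break
    return chosen


master, slave = replica_servers("k1", ["server1", "server2", "server3"])
```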
  • 12. 4. CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS  Distributed transactions: ACID (Atomicity, Consistency, Isolation, Durability) requirement  Distributed data independence  Fault tolerance  Transparency
  • 13. 5. COMMON SOLUTION FOR CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS  Atomicity and consistency with the Two-Phase Commit protocol  Distributed data independence with a consistent hashing algorithm  Fault tolerance with leader election, multi-master and data replication  Transparency with server routing: the client sees the distributed system as a single server
  • 14. TWO PHASE COMMIT PROTOCOL  What is this?  Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers.  Two-phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.  With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers. The commit process proceeds as follows:
  • 15. TWO PHASE COMMIT PROTOCOL  Phase 1: Obtaining a Decision  Step 1  Coordinator Ci asks all participants to prepare to commit transaction T.  Ci adds the record <prepare T> to the log and forces the log to stable storage (a log is a file which maintains a record of all changes to the database)  Ci sends prepare T messages to all sites where T executed
  • 16. TWO PHASE COMMIT PROTOCOL  Phase 1: Making a Decision  Step 2  Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction  If not: add a record <no T> to the log and send an abort T message to Ci  If the transaction can be committed: 1) add the record <ready T> to the log; 2) force all records for T to stable storage; 3) send a ready T message to Ci
  • 17. TWO PHASE COMMIT PROTOCOL  Phase 2: Recording the Decision  Step 1  T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.  Step 2  Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage, it cannot be revoked (even if failures occur)  Step 3  Coordinator sends a message to each participant informing it of the decision (commit or abort)  Step 4  Participants take appropriate action locally.
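The two phases above can be sketched as a small in-memory simulation. It is only an illustration under simplifying assumptions: the class names `Participant` and `Coordinator` are hypothetical, Python lists stand in for logs on stable storage, and message loss, timeouts and recovery are not modelled.

```python
class Participant:
    """A resource manager at one site (hypothetical in-memory model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []          # stands in for the log on stable storage
        self.state = "active"

    def prepare(self, txn):
        # Phase 1, step 2: log the vote, then answer the coordinator.
        if not self.can_commit:
            self.log.append(f"<no {txn}>")
            return "abort"
        self.log.append(f"<ready {txn}>")
        return "ready"

    def finish(self, txn, decision):
        # Phase 2, step 4: apply the coordinator's decision locally.
        self.log.append(f"<{decision} {txn}>")
        self.state = "committed" if decision == "commit" else "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self, txn):
        # Phase 1, step 1: log <prepare T>, then ask every site to prepare.
        self.log.append(f"<prepare {txn}>")
        votes = [p.prepare(txn) for p in self.participants]
        # Phase 2: commit only if every site voted ready, and log the decision.
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        self.log.append(f"<{decision} {txn}>")
        for p in self.participants:
            p.finish(txn, decision)
        return decision
```

For example, `Coordinator([Participant("a"), Participant("b", can_commit=False)]).run("T1")` returns `"abort"`, because a single site that cannot commit forces the whole transaction to abort.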
  • 18.
  • 19. TWO PHASE COMMIT PROTOCOL  Costs and Limitations  If one database server is unavailable, none of the servers gets the updates.  This is correctable through network tuning and by correctly building the data distribution through database optimization techniques.
  • 20. LEADER ELECTION  Leader election algorithms that can be used: LCR (LeLann-Chang-Roberts), Peterson, HS (Hirschberg-Sinclair)
  • 21. LEADER ELECTION  Bully Leader Election algorithm
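The slide only names the Bully algorithm, so the following is a simplified, synchronous simulation of its core rule (the reachable node with the highest id becomes leader), not a networked implementation. The function name and the representation of live nodes as a set of integer ids are assumptions.

```python
def bully_election(initiator, alive):
    """Simulate one Bully election and return the id of the elected leader.

    `alive` is the set of node ids that currently respond;
    `initiator` is the live node that noticed the leader was gone.
    """
    candidate = initiator
    while True:
        # The candidate sends ELECTION to every live node with a higher id.
        higher = sorted(n for n in alive if n > candidate)
        if not higher:
            # Nobody higher answered: the candidate announces itself as leader.
            return candidate
        # A higher node answers and takes over the election;
        # this simulation follows the lowest responder.
        candidate = higher[0]
```

For instance, `bully_election(2, {1, 2, 3, 5})` returns `5`: nodes 3 and 5 outrank node 2, and node 5 has no live node above it.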
  • 22.
  • 23. MULTI MASTER  Multi-master replication  Problem of multi-master replication
  • 24. MULTI MASTER  Solution: 2 candidate models:  Two-phase commit (always consistent)  Asynchronous data sync among multiple nodes  Still active even if some nodes die  Faster than 2PC
  • 25. MULTI MASTER  Asynchronous data sync  Data is stored on the main master (called the sub-leader), then posted to a queue to be synced to the other masters.
  • 26. MULTI MASTER  Asynchronous data sync [Figure: req1, req2, Server1 (leader), Server2, data queue; req2: forward]
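A queue-based asynchronous sync of this kind might look like the sketch below. It is a single-process approximation under stated assumptions: the `Master` class and its method names are hypothetical, peers are plain Python objects rather than remote servers, and conflict resolution between masters is not addressed.

```python
import queue
import threading


class Master:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = queue.Queue()  # writes waiting to be synced to peers
        self.peers = []

    def write(self, key, value):
        # Apply locally and return at once; replication happens in the background.
        self.data[key] = value
        self.outbox.put((key, value))

    def apply_replicated(self, key, value):
        # Replicated writes are applied without re-queuing, to avoid sync loops.
        self.data[key] = value

    def sync_worker(self):
        while True:
            item = self.outbox.get()
            if item is None:  # sentinel: stop the worker
                break
            for peer in self.peers:
                peer.apply_replicated(*item)
            self.outbox.task_done()


server1, server2 = Master("server1"), Master("server2")
server1.peers = [server2]
worker = threading.Thread(target=server1.sync_worker, daemon=True)
worker.start()
server1.write("k1", "v1")
server1.outbox.join()     # wait until the queued write has been delivered
server1.outbox.put(None)  # stop the worker
worker.join()
```

The write returns before the peers are updated, which is why this model stays available when a peer is down and is faster than 2PC, at the cost of temporary inconsistency between masters.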
  • 28. 6. UNIVERSALDISTRIBUTEDSTORAGE  UniversalDistributedStorage is a distributed storage system developed for:  Distributed transactions (ACID)  Distributed data independence  Fault tolerance  Leader election (decides when server nodes join or leave)  Multi-master replication  Transparency
  • 33. UNIVERSALDISTRIBUTEDSTORAGE FEATURE  Leader election  Use Bully Leader Election algorithm
  • 34. UNIVERSALDISTRIBUTEDSTORAGE FEATURE  Multi-master replication  Uses asynchronous data sync among server nodes
  • 35. UNIVERSALDISTRIBUTEDSTORAGE STATISTICS  System information:  3 machines, each with 8GB RAM and a Core i5 3,220GHz CPU  LAN/WAN network  7 physical servers on the 3 machines above  Concurrent write of 16500000 items in 3680s, rate ~4480 req/sec (measured at the client)  Concurrent read of 16500000 items in 1458s, rate ~11320 req/sec (measured at the client) * This is not the limit of the system; the limit is at the clients (this test used 3 client threads)
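The quoted rates are the item count divided by the elapsed time. A quick check of that arithmetic, with illustrative variable names:

```python
ITEMS = 16_500_000

write_rate = ITEMS / 3680  # items per second for the write test
read_rate = ITEMS / 1458   # items per second for the read test

print(f"write: {write_rate:.0f} req/sec, read: {read_rate:.0f} req/sec")
```

Both values agree with the rounded figures on the slide to within a few requests per second.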
  • 36. Q & A Contact: Duong Cong Loi loidc@vng.com.vn loiduongcong@gmail.com https://www.facebook.com/duongcong.loi
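  • 38. APPENDIX - 001  How server(s) join/leave: 1. A server sends a join/leave request 2. The join/leave request is forwarded to the leader server 3. The leader server processes the join/leave 4. The leader server broadcasts the result [Figure: Server A, Server B, Server C and the leader server]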
  • 39. APPENDIX - 002  How to move data when server(s) join/leave  Select the appropriate data to move  Move the data asynchronously in a separate thread, and control the speed of the move
  • 40. APPENDIX - 003  How to detect that the leader or sub-leader has died  Easily detected by polling the connection
  • 41. APPENDIX - 004  How to make multiple virtual nodes for one server  Multiple virtual nodes for one server are easily generated by hashing the server name  Ex: to make 200 virtual nodes for server ‘photoTokyo’, use the hash values of: photoTokyo1, photoTokyo2, …, photoTokyo200
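The naming scheme in this example can be sketched as follows. The helper names, the MD5-based hash, and the two extra server names `photoHanoi` and `photoOsaka` are assumptions made only for illustration.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(name):
    """Hash a name to a ring position (MD5 is an assumed stand-in hash)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")


def build_ring(servers, vnodes=200):
    """Return sorted (position, server) pairs with `vnodes` points per server."""
    ring = []
    for server in servers:
        for i in range(1, vnodes + 1):
            # Virtual node names: photoTokyo1, photoTokyo2, ..., photoTokyo200
            ring.append((ring_position(f"{server}{i}"), server))
    ring.sort()
    return ring


def lookup(ring, key):
    """Return the physical server owning the first virtual node at or after the key."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(key))
    return ring[idx % len(ring)][1]


ring = build_ring(["photoTokyo", "photoHanoi", "photoOsaka"])
load = Counter(lookup(ring, f"key{i}") for i in range(3000))
```

`load` then holds how many of the 3000 sample keys landed on each physical server; more virtual nodes per server generally bring these counts closer together.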
  • 42. APPENDIX - 005  For fast data moving  Use a Bloom filter to detect whether the hash value of a data key exists  Use a store that holds all data keys for the local server
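A Bloom filter answers "is this key possibly present?" using a fixed-size bit array. The sketch below is a generic, minimal version, not the filter used in UniversalDistributedStorage; the class name, the default sizes and the SHA-256-based hashing are assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _indexes(self, key):
        # Derive num_hashes bit positions by salting the key with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for idx in self._indexes(key):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(key))
```

A negative answer is always correct, so a server can skip the lookup in its local key store for any key the filter rejects; a positive answer still has to be confirmed against the stored data keys.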
  • 43. APPENDIX - 006  How to avoid network tuning  Use a client connection pool with a screening strategy beforehand; it avoids many hanging connections when making remote calls over the network between two servers