Nilesh Salpe
A collection of computers that appears to its users as one computer.
Characteristics
 The computers operate concurrently.
 The computers fail independently.
 The computers do not share a global clock.
 Examples
 Amazon.com
 Cassandra Database
 Is a multi-core processor a distributed system?
 Is a single-core computer with peripherals (WiFi, printer, multiple displays, etc.) a distributed
system?
 Distributed Storage
 Relational databases, MongoDB, Cassandra, HDFS (Hadoop Distributed File System), HBase, Redis
 Distributed Computation
 Hadoop, Spark, Storm, Akka, Apache Flink
 Distributed Synchronization
 NTP (Network Time Protocol), Vector Clocks
 Distributed Consensus
 Paxos, ZooKeeper
 Distributed Messaging
 Apache Kafka, RabbitMQ
 Load Balancers
 Round robin, weighted round robin, min load, weighted load, session-aware, etc.
 Serialization
 Protocol Buffers, Thrift, Avro, etc.
 Single-master storage
 A single powerful machine; scaling up (vertical scaling).
 Types of load
Read-heavy load
Write-heavy load
Mixed read/write load
 Scaling Strategies / Data distribution
 Read Replication (scaling out)
 Sharding (scaling out)
Master node – All updates must pass through the master node.
Follower nodes – Receive asynchronous replication (data propagation) from the master.
Read requests can be fulfilled by follower nodes, which increases the overall read I/O of the system.
Problems with this design:
 Increased replication complexity.
 No guarantee of strong consistency.
 Read-after-write scenarios are not guaranteed to see the latest value.
 The master node can become a bottleneck for writes.
The model is suitable for read-heavy workloads.
Examples: the Google search engine; a relational database with a read-replica cluster.
[Diagram: the master holds X = 5 after write F1; the followers still hold X = 3 until the update propagates.]
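The read-after-write problem above can be sketched in a few lines. This is a hypothetical toy model (the `Master`/`Follower` classes are made up for illustration, not any particular database's API): a follower read that races asynchronous replication returns the old value.

```python
class Follower:
    """Serves reads from its local, possibly stale, copy."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master:
    """All writes pass through the master; replication is asynchronous."""
    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value  # acknowledged before followers see it

    def replicate(self):
        # In a real system this happens in the background, with lag.
        for f in self.followers:
            f.data.update(self.data)


follower = Follower()
follower.data["X"] = 3          # previously replicated value
master = Master([follower])
master.write("X", 5)            # write F1 goes through the master
stale = follower.read("X")      # read before replication arrives -> 3
master.replicate()
fresh = follower.read("X")      # read after replication arrives -> 5
```

The acknowledged write and the stale read coexist, which is exactly why this model gives no read-after-write guarantee.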
Data distribution techniques
Sharding
 Used in relational and distributed databases.
 Ranges from manual to completely automated, depending on the scheme.
Consistent hashing
 Used in distributed databases.
 It is automated.
 Used to partition data across multiple nodes based on some key or attribute.
 Techniques range from manual to fully automated sharding.
 Functional Partitioning (burden on client)
Example: store all user data on one node and transaction data on another node.
 Horizontal partitioning (popular)
 Ranges
 Hashes
 Directory
 Vertical partitioning (less popular)
 Data belonging to the same set/table/relation is distributed across nodes.
[Diagram: a shard router receives write F1 (X = 10) and routes it to one of the nodes N1–N4 based on the key.]
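A minimal sketch of the shard-router idea from the diagram, using the node names N1–N4. The MD5-based routing function and the in-process `STORE` dict are assumptions for illustration only; real routers also track topology and cluster membership.

```python
import hashlib

NODES = ["N1", "N2", "N3", "N4"]
STORE = {n: {} for n in NODES}   # each node's local key space

def route(key: str) -> str:
    """The routing layer: the node is derived from the key alone, so
    every read/write/update/delete request must carry the shard key."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def put(key, value):
    STORE[route(key)][key] = value

def get(key):
    return STORE[route(key)].get(key)

put("X", 10)   # the router sends X = 10 to exactly one of N1..N4
```

Because `route` is deterministic, a later `get("X")` lands on the same node; this is also why a second access key requires a second, de-normalized copy of the data.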
 A shard routing layer distributes writes/reads.
 More complexity
 Routing layer, awareness of network topology, handling a dynamic cluster.
 Limited data model
 Every data model must have a key, which is used for routing.
 Limited data access patterns
 All read/write/update/delete queries must include the key.
 Redundant data for additional access patterns
 To access data by more than one key, the data must be stored under each of those keys, i.e.
multiple copies or de-normalized data.
 Data access patterns must be considered before designing the models.
 Too much scatter-gather for aggregations; reads might slow down the system.
 OLTP workloads will slow down the system.
 The number of shards must be decided early in the system design.
 With an in-memory hash map, when the load factor crosses a certain threshold we
have to re-hash all keys. So a modular hash function (hash mod N) does not work
when the number of buckets changes dynamically.
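A quick demonstration of why hash mod N breaks when the bucket count changes (the key counts here are arbitrary illustration values):

```python
def bucket(key_hash: int, n_buckets: int) -> int:
    """Modular hashing: bucket = hash mod N."""
    return key_hash % n_buckets

keys = range(1000)
before = {k: bucket(k, 4) for k in keys}
after = {k: bucket(k, 5) for k in keys}   # grow from 4 to 5 buckets
moved = sum(1 for k in keys if before[k] != after[k])
# Growing the table by a single bucket moves 800 of the 1000 keys.
```

Almost every key changes bucket after the resize, so nearly the whole key space must be reshuffled; consistent hashing exists to avoid exactly this.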
 Consistent hashing is a technique used to limit the reshuffling of keys when the hash
table structure is rebalanced (e.g. when the number of buckets changes dynamically).
 The hash space is shared by the key hash space and the virtual-node hash space.
 Keys and virtual nodes hash to the same values regardless of the number of physical
nodes; the only difference is which physical node they are stored on.
 Advantages
 Avoids re-hashing all keys when nodes leave and join.
 Example: in Cassandra there are 128 virtual nodes per physical node.
Let's say we have a hash function that gives a 6-bit hash, so the hash space is 2^6 = 64.
hash(John) = 111100 (60), hash(Jane) = 011000 (24). Each hash is assigned to the node
segment ahead of it in the clockwise direction.

A, B are physical nodes mapped to 8 virtual nodes:

Node | Hash ranges
A    | (0-8), (16-24), (32-40), (48-56)
B    | (8-16), (24-32), (40-48), (56-64)

A, B, C, D are physical nodes mapped to 8 virtual nodes:

Node | Hash ranges
A    | (0-8), (32-40)
B    | (8-16), (40-48)
C    | (16-24), (48-56)
D    | (24-32), (56-64)

Add nodes C, D: 50% of keys are affected.
Remove nodes C, D: 50% of keys are affected.
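The two tables above can be reproduced with a simplified model: the 8 segment boundaries stay fixed and only the segment owners cycle through the physical nodes. (Real systems such as Cassandra place virtual nodes at hashed, effectively random ring positions; the fixed, evenly spaced layout here is an assumption to match the example.)

```python
SPACE = 64              # 6-bit hash space, as in the example
SLOTS = 8               # 8 virtual-node segments
WIDTH = SPACE // SLOTS  # each segment covers 8 hash values

def owner(key_hash, nodes):
    """Segment boundaries never move; only the owner of each segment
    cycles through the physical nodes (A,B,A,B,... or A,B,C,D,...)."""
    slot = (key_hash % SPACE) // WIDTH
    return nodes[slot % len(nodes)]

two, four = ["A", "B"], ["A", "B", "C", "D"]
john = 0b111100   # hash(John) = 60, in segment (56-64)
jane = 0b011000   # hash(Jane) = 24, in segment (24-32)
moved = sum(1 for h in range(SPACE)
            if owner(h, two) != owner(h, four)) / SPACE
# moved == 0.5: adding (or removing) C and D affects 50% of the keys,
# and the other 50% stay exactly where they were -- no global re-hash.
```

With two nodes, John (60) lands on B, matching B's (56-64) range in the first table; with four nodes the same hash lands on D, matching the second table.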
 Eventual Consistency
 Consistency Tuning
R + W > N
R – number of replicas that must respond for a successful read
W – number of replicas that must respond for a successful write/update
N – total number of replicas
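The R + W > N rule can be stated directly: it forces every read quorum to overlap every write quorum in at least one replica, so a successful read always touches a replica holding the latest acknowledged write. A small sketch with some typical tunings for N = 3:

```python
N = 3  # total number of replicas

def overlap_guaranteed(r: int, w: int) -> bool:
    """R + W > N means any r-replica read set must intersect any
    w-replica write set in at least one replica."""
    return r + w > N

quorum      = overlap_guaranteed(2, 2)   # balanced quorum reads/writes
write_all   = overlap_guaranteed(3, 1)   # hmm: R=3 reads all, W=1 fast writes
fast_sloppy = overlap_guaranteed(1, 1)   # fast both ways, eventual only
```

`quorum` and `write_all` are True (strongly consistent reads), while `fast_sloppy` is False: with R = W = 1 a read may hit a replica the write never reached, which is the eventual-consistency regime.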
 Failures
 Node offline
 Network latency
 GC-like processes making a node unresponsive
 Hinted handoffs, read repairs
 Huge impact on design (write-then-read scenarios)
 CAP
 Consistency – Every read gets the most recent value after a write/update/delete.
 Availability – Every request receives a response without error (with no guarantee
that it is the most recent value).
 Partition Tolerance – The system continues to respond despite an arbitrary
number of node failures (nodes unable to communicate with other nodes, temporarily
or permanently, due to network partitions, congestion, communication delays, or
GC pauses in the case of the JVM).
 Ground Reality
 Partition tolerance is a must; nobody wants data loss.
 The practical choice is always between consistency and availability.
 Example:
 Amazon S3 chooses availability over consistency, so it is an AP system.
 In relational databases
 Two-phase commit in distributed relational databases (throughput suffers).
 ACID properties of transactions in relational databases:
A – Atomicity: a transaction (a bundle of statements) completes all-or-nothing.
C – Consistency: the database is kept in a valid state before and after the transaction.
I – Isolation: each transaction acts as if it were working on the data alone (isolation levels:
serializable, repeatable reads, read committed, read uncommitted (dirty reads, phantom reads)).
D – Durability: once a transaction is committed, the changes are permanent.
 A transaction can be rolled back as if it never happened.
 Options in distributed storage systems
 Lighter transactions are supported, e.g. update-if-present.
 Write-off (no money back, no guarantee of delivery)
 Retry (with exponentially increasing intervals)
 Compensating actions (say, reverting a credit-card payment)
 Distributed transactions (2PC) (slow you down)
 The main reason to sacrifice transactions is availability.
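The retry option above is typically implemented as exponential backoff with jitter. A minimal sketch (the `retry` helper and its parameters are illustrative, not any library's API):

```python
import random
import time

def retry(op, attempts=5, base=0.05):
    """Retry op with exponentially increasing intervals plus jitter."""
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise  # give up after the last attempt
            # Double the wait each round; jitter avoids thundering herds.
            time.sleep(base * (2 ** i) * random.uniform(0.5, 1.5))
```

By contrast, a write-off policy would simply swallow the failure, and a compensating action would enqueue a reversal (e.g. a payment refund) instead of retrying.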
 Impact on the design of applications using distributed storage.
 Aspects to consider
 Scale
 Transactional needs
 High availability
 Designing for failure
Storage options and scenarios
 Relational databases
 Strong transactional requirements (OLTP systems)
 NoSQL
 A giant distributed hash table (one that cannot fit on a single machine) with
nested keys.
 Key-value stores: Map<K, V>
 Document databases: Map<K, {k1:v1, k2:v2, …}>, where the value is generally JSON or some
serializable/de-serializable format, or a binary file.
 Columnar databases: SortedMap<(K1, K2, K3, …), V>
 Graph databases: AdjacencyMap<K, [K1, K2, K3, …]>, lots of small relations or links.
 Search engines: lots of indexes based on search requirements, Map<K1, K>, Map<K2, K>,
Map<K3, K>, …, plus the actual raw document storage Map<K, V>.
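The map shapes above can be sketched with plain Python dicts (all keys and values are made-up illustrations, not real data):

```python
# Each NoSQL family is, roughly, a distributed map of a different shape.
key_value = {"user:42": b"opaque-bytes"}                 # Map<K, V>
document  = {"user:42": {"name": "Ana", "age": 31}}      # Map<K, {k1:v1, k2:v2}>
columnar  = {("users", "42", "name"): "Ana",             # SortedMap<(K1,K2,K3), V>
             ("users", "42", "age"): 31}
graph     = {"user:42": ["user:7", "user:99"]}           # AdjacencyMap<K, [K1, K2, ...]>
search    = {"ana": ["user:42"]}                         # inverted index Map<term, [K]>
raw_docs  = key_value                                    # search engines also keep Map<K, V>
```

Seeing each family as a map shape makes the earlier sharding constraints concrete: whatever appears on the left of the map is the routing key, and every query must supply it.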
 Distributed Computation …
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed Storage

  • 2. A collection of computers that appear to its user as one computer. Characteristics  The computers operate concurrently.  The computers fail independently.  The computers do not share global clock.  Examples  Amazon.com  Cassandra Database  Is multi-core processor distributed system ?  Is single core computer with peripherals (Wifi , printer , multiple displays etc. )distributed system ?
  • 3.  Distributed Storage  Relational , Mongo , Cassandra, HDFS (Hadoop Distributed File System), Hbase , Redis  Distributed Computation  Hadoop, Spark, Storm , Akka , Apache Flink  Distributed Synchronization  NTP (Network time protocol ) , Vector Clocks  Distributed Consensus  Paxos, Zookeeper  Distributed Messaging  Apache Kafka , RabbitMQ  Load Balancers  Round Robin , weighted Round Robin , min load , weighted load , Session Aware etc.  Serialization  Protocol Buffers , Thrift , Avro etc.
  • 4.  Single-Master storage  Powerful machine and scaling up.  Type of loads Read heavy load Write heavy load Mixed read write load  Scaling Strategies / Data distribution  Read Replication (scaling out)  Sharding (scaling out)
  • 5. Master Node – Updates must pass through master node. Followers Nodes – Gets asynchronous replication or data propagation from master to follower nodes. Read requests can be fulfilled by follower nodes. It increases over-all I/O of system. Problems of such design :  Increased complexity of replication.  No guarantee of strong consistency.  Read after write scenarios will not guarantee of latest value.  Master node could be bottleneck for write request. Model is suitable for read heavy work load. Example : Google search engine , Relational Data base with cluster.
  • 7. Data distribution techniques Sharding  Used in relational databases and distributed databases.  Could be manual to completely automated based on scheme. Consistent hashing  Used in distributed data bases.  It is automated .
  • 8.  Used to partition data across multiple nodes based on some key or aspect.  Techniques ranges from manual to automated sharding  Functional Partitioning (burden on client) Example : Store all user data on one node and transaction data on another node.  Horizontal partitioning (popular)  Ranges  Hashes  Directory  Vertical partitioning (less popular)
  • 9.  Data belongs to same set/table/relation is distributed across nodes.
  • 10. Value Node 10 F1 N1 N2 X = 10 N3 N4 Shard Router
  • 11.
  • 12.  Shard routing layer to distribute writes/reads  More complexity  Routing layer , awareness of network topology , handle dynamic cluster  Limited data model  Every data model should have key which is used for routing.  Limited data access patterns  All read/write/update/delete queries must include the key.  Redundant data for more access models.  For accessing data with more than one key , Data need to be stored by those keys so multiple copies or de- normalized data .  Need to consider data access patterns before designing the models.  Too much scatter gather for aggregations , read might slow down system.  OLTP will slow down the system.  Number of shards need to be decided early in system design.
  • 13.  In case of hash function of in-memory map , when we re-hash after load factor increases after certain threshold we have to re-hash all keys . So modular hash function will not work in case when number of buckets of map are changing dynamically .  Consistent hashing is technique used to limit the reshuffling of keys when a hash table structure is rebalanced .(Dynamically changing say number of buckets ).  Hash space is shared by key hash space and virtual node hash space.  Keys/virtual nodes are hashed to same value despite of number of physical nodes. Only difference is they stored on different physical nodes.  Advantages  Avoids re-hashing of all keys when nodes leave and join.  Example : For Cassandra there are 128 virtual nodes per physical node.
  • 14. Say we have a hash function that gives a 6-bit hash, so the hash space is 2^6 = 64. hash(John) = 111100 (60), hash(Jane) = 011000 (24). A key is assigned to the virtual-node segment immediately ahead of it in the clockwise direction.
     A, B are physical nodes mapped to 8 virtual nodes:
       Node  Hash ranges
       A     (0-8), (16-24), (32-40), (48-56)
       B     (8-16), (24-32), (40-48), (56-64)
     A, B, C, D are physical nodes mapped to 8 virtual nodes:
       Node  Hash ranges
       A     (0-8), (32-40)
       B     (8-16), (40-48)
       C     (16-24), (48-56)
       D     (24-32), (56-64)
     Add nodes C, D: 50% of keys are affected. Remove nodes C, D: 50% of keys are affected.
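A minimal sketch of such a ring with virtual nodes (node and key names are hypothetical; MD5 stands in for whatever hash function the store actually uses). The key property checked at the end: when nodes join, the only keys that move are the ones that land on the new nodes.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes. A key is owned by the
    first virtual node at or after its hash, moving clockwise."""

    def __init__(self, vnodes_per_node=8):
        self.vnodes = vnodes_per_node
        self.ring = []  # sorted list of (hash, node) pairs

    def _hash(self, s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        # First vnode clockwise from the key's hash (wrap around at the end).
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(vnodes_per_node=8)
ring.add("A"); ring.add("B")
keys = [f"user:{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}

ring.add("C"); ring.add("D")
after = {k: ring.node_for(k) for k in keys}

# Keys only move onto the newly added nodes; everything else stays put.
moved = [k for k in keys if before[k] != after[k]]
assert all(after[k] in ("C", "D") for k in moved)
```

With a plain modular hash, adding two nodes would have remapped almost every key; here only the keys falling into C's and D's segments move.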
  • 15.  Eventual Consistency
     Consistency tuning: R + W > N, where
       R – number of replicas that must respond for a successful read
       W – number of replicas that must respond for a successful write/update
       N – total number of replicas
     Failures
       Node offline
       Network latency
       A GC-like process making a node unresponsive
     Hinted handoffs, read repairs
     Huge impact on design (write-then-read scenarios)
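The R + W > N rule can be checked by brute force for small N: a read quorum and a write quorum are guaranteed to overlap in at least one replica (so a read always sees the latest write) exactly when R + W > N. A small sketch:

```python
from itertools import combinations

def always_overlaps(r: int, w: int, n: int) -> bool:
    """True if EVERY set of r replicas shares at least one replica
    with EVERY set of w replicas, out of n replicas total."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

# With N = 3 replicas:
assert always_overlaps(2, 2, 3) and (2 + 2 > 3)        # quorum reads/writes
assert not always_overlaps(1, 1, 3) and not (1 + 1 > 3)  # may read stale data
```

R = W = 1 with N = 3 maximizes availability but allows stale reads; R = W = 2 trades some latency for read-after-write consistency.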
  • 16.  CAP
     Consistency – every request gets the most recent value after a write/update/delete.
     Availability – every request receives a response without error (with no guarantee of the most recent value).
     Partition tolerance – the system continues to respond despite an arbitrary number of node failures (nodes that cannot communicate with other nodes, temporarily or permanently, due to network partition, congestion, communication delays, or GC pauses in the case of the JVM).
     Ground reality
       Partition tolerance is a must; nobody wants data loss.
       The practical choice is always between consistency and availability.
     Example: Amazon S3 chooses availability over consistency, so it is an AP system.
  • 17.
  • 18.  In relational databases
       Two-phase commit in distributed relational databases (suffers in throughput).
       ACID properties of transactions:
         A – Atomicity: a transaction (a bundle of statements) completes all or nothing.
         C – Consistency: the database is kept in a valid state before and after the transaction.
         I – Isolation: a transaction acts as if it were working on the data alone (serializable, repeatable reads, read committed, read uncommitted (dirty reads, phantom reads)).
         D – Durability: once a transaction is committed, the changes are permanent.
       A transaction can be rolled back as if it had never happened.
     Options in distributed storage systems
       Lightweight transactions are supported, such as update-if-present.
       Write-off (no money back, no guarantee of delivery).
       Retry (retry with exponentially increasing intervals).
       Compensating actions (say, reverting a credit-card payment).
       Distributed transactions (2PC) (slow you down).
     The main reason to sacrifice transactions is availability.
     This impacts the design of applications that use distributed storage.
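The retry option above can be sketched as a small helper that backs off exponentially between attempts (a minimal sketch; the flaky operation is a stand-in for any transient remote failure, and the caller would run a compensating action if the final attempt still fails):

```python
import time

def retry(op, attempts=4, base_delay=0.01):
    """Call op(), retrying with exponentially growing intervals.
    Re-raises after the final attempt so the caller can compensate."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...

# A flaky operation that fails twice, then succeeds on its third call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")
    return "ok"

assert retry(flaky) == "ok" and calls["n"] == 3
```

In practice the delays would be seconds rather than milliseconds, often with jitter added so that retrying clients do not stampede the recovering service at the same instant.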
  • 19.  Aspects to consider
     Scale
     Transactional needs
     High availability
     Design for failures
  • 20. Storage options and scenarios
     Relational databases
       Strong transactional requirements (OLTP systems)
     NoSQL
       A giant distributed hash table (one that cannot fit on a single machine) with nested keys.
       Key-value stores: Map<K,V>
       Document databases: Map<K, {k1:v1, k2:v2, …}>, where the value is generally JSON or some other serializable/de-serializable format, or a binary file.
       Columnar databases: SortedMap<<K1,K2,K3…>, V>
       Graph databases: AdjacencyMap<K, [K1,K2,K3…]>, lots of small relations or links
       Search engines: lots of indexes based on search requirements, Map<K1,K>, Map<K2,K>, Map<K3,K>, … plus raw document storage Map<K,V>

Editor's Notes

  1. The computers operate concurrently – no shared resources such as CPU, GPU, memory, etc. The computers fail independently – computers might fail due to hardware failures, power outages, etc.
  2. Rough overview of all of these in the next session. Messaging is a key part of the observer/pub-sub design pattern. Paxos – a protocol to resolve conflicts over values between multiple machines.
  3. Let's start with our own e-commerce site with a small number of customers.
  4. By range: easy, but no even distribution – it depends on the data characteristics. By hash: better distribution if the hash function is good, but requires re-hashing of all keys and transfer of most keys, which increases internal network traffic. By directory: create a virtual directory mapping, say, servers to objects, but management becomes challenging as the data grows.
  5. Used for tabular or relational data. Less common, as relations grow in the number of rows, not columns. Binary-object columns might be vertically partitioned.
  6. Example: Riak uses the SHA-1 160-bit hash function, so the hash space could be 0 to 2^160 − 1.
  7. Companies do not use two-phase commit, for speed and throughput. XA transactions are an implementation of distributed transactions: a coordinator node and participant nodes.
     Commit request (voting/prepare) phase: send the query to all participants; each executes the query but does not commit, then votes yes or no.
     Commit phase: if all say yes, commit; otherwise abort.
     Transaction example: receive order, process payment, enqueue order, process order, deliver the order – parallel, separate systems.
     Failures: payment failure, out of stock, hardware failures, service failures.
     Global locks: a hold on a critical resource affects the whole chain.
     Lightweight transactions – such as update-if-exists.
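The voting and commit phases described in this note can be sketched in-process (a toy model, not an XA implementation; the class and method names are hypothetical):

```python
class Participant:
    """One participant: votes in the prepare phase, then commits or aborts."""

    def __init__(self, name, vote_yes=True):
        self.name, self.vote_yes, self.state = name, vote_yes, "init"

    def prepare(self):
        # Execute the work but do NOT commit yet; just report the vote.
        self.state = "prepared" if self.vote_yes else "abort-voted"
        return self.vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    if all(p.prepare() for p in participants):
        for p in participants:   # Phase 2: all voted yes -> commit everywhere
            p.commit()
        return "committed"
    for p in participants:       # any "no" vote aborts everyone
        p.abort()
    return "aborted"
```

The throughput cost the note mentions is visible even in this sketch: nothing commits until the slowest participant has answered the prepare request, and a stalled participant blocks the whole chain.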