Sql or NoSql: that is the question...

Description

Aims to give you enough information about how NoSQL databases work to let you answer the question: "Does it make sense to consider a NoSQL database for my next project?"

Transcript

  1. That is the question
  2. { "_id": "555ae00a475a9b259281b21a", "name": "Nicola Galgano", "alias": "alikon", "gender": "male", "work": "DB consultant on banking systems", "company": "looking for a new one", "email": "info@alikonweb.it", "twitter": "@alikon", "address": "Roma, Italy, EU", "current_hobby": "run away from dentist" }
  3. Henri Poincaré: ipse dixit
  4. (Image-only slide)
  5. What is Big Data? Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data processing applications. (From Wikipedia)
  6. How much is Big Data?
      - DVD: 4.7 GB
      - Human brain: 2.5 PB
      - LHC: 1 PB/s
      - Net traffic: 1 ZB/year
  7. Internet of Everything: IPv6 = 2^128 ≈ 3.4e+38 addresses. IPv6 can address every quark in the world.
  8. (Image-only slide)
  9. Structured / unstructured data; volume
  10. Volume, Velocity, Variety, Veracity
  11. Availability and allowed downtime:

      | Availability       | Downtime/year | Downtime/month | Downtime/week |
      |--------------------|---------------|----------------|---------------|
      | 90 % (1 nine)      | 36.5 days     | 72 hours       | 16.8 hours    |
      | 99 % (2 nines)     | 3.65 days     | 7.20 hours     | 1.68 hours    |
      | 99.9 % (3 nines)   | 8.76 hours    | 43.8 minutes   | 10.1 minutes  |
      | 99.99 % (4 nines)  | 52.56 minutes | 4.38 minutes   | 1.01 minutes  |
      | 99.999 % (5 nines) | 5.26 minutes  | 25.9 seconds   | 6.05 seconds  |
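The downtime figures on the availability slide follow from simple arithmetic: the unavailable fraction multiplied by the length of the period. The sketch below reproduces the yearly and weekly columns, assuming a 365-day year and a 7-day week. The function name `downtime_seconds` is illustrative and not from the slides. The monthly column is left out because its cells do not all appear to use the same month length.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Allowed downtime, in seconds, over a period of the given length."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


# Print the yearly and weekly downtime for 1 to 5 nines.
for nines in range(1, 6):
    availability = 100.0 * (1.0 - 10.0 ** -nines)
    per_year = downtime_seconds(availability, 365)  # assumption: 365-day year
    per_week = downtime_seconds(availability, 7)
    print(
        f"{availability:.3f} %: "
        f"{per_year / 3600:.2f} hours/year, {per_week / 60:.2f} minutes/week"
    )
```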
  12. (Image-only slide)
  13. Next Generation Databases mostly addressing some of the points (from www.nosql-database.org):
      - non-relational
      - distributed
      - horizontally scalable
      - open-source
  14. Key/value, Column, Document, Graph
  15. A data model is a representation that we use to perceive and manipulate data.
      - Logic model
      - Normalization (1NF, 2NF, 3NF, ...)
      - E-R
      - Schema (rigid)
      - Algebra of sets
      - Impedance mismatch
  16. Schemaless (dynamic/implicit); denormalization; aggregate. Aggregates are the basic element of data storage.
  17. Simple data model with blob/opaque values. Key and value can be complex. Only 3 API functions:
      - Get(key)
      - Set(key, value)
      - Delete(key)
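The three-function API on the key/value slide can be sketched with an in-memory dictionary. This is a minimal illustration, not how any particular product implements it. The class name `KeyValueStore` and the choice to return `None` for a missing key are assumptions.

```python
from typing import Any, Dict, Hashable, Optional


class KeyValueStore:
    """Hypothetical in-memory store exposing only Get, Set and Delete."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        # The store never inspects the value: it is opaque to the database.
        return self._data.get(key)

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: Hashable) -> bool:
        """Remove the key; report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# A composite key and an opaque blob as the value.
store.set(("user", 42), b'{"alias": "alikon"}')
print(store.get(("user", 42)))
```

Because the value is opaque, any filtering on its content has to happen in the application, which is the main trade-off of this model.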
  18. More transparent: JSON (JavaScript Object Notation), a lightweight data interchange format that is easy for humans and machines to read and write.
  19. Column: a sparse, semi-structured, sorted map with a flexible number of columns. Column keys can be grouped into families. How it is stored.
  20. Graph theory model G = (V, E): store, map and query relationships.
      - Nodes connected by edges
      - Complex relationships
      - Recommend products
      - ACID
      - Queries = graph traversal
  21. MapReduce refers to 2 separate and distinct tasks. The map job takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. Tasks run in parallel.
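The map and reduce jobs described on this slide can be illustrated with the classic word count. The sketch runs sequentially in a single process, whereas a real framework would run many map and reduce tasks in parallel on different nodes. The function names `map_job`, `shuffle` and `reduce_job` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_job(line: str) -> List[Tuple[str, int]]:
    """Break one input element down into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key so each key reaches a single reduce call."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_job(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine the tuples for one key into a single, smaller tuple."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_job(line)]
    return dict(reduce_job(key, values) for key, values in shuffle(mapped).items())


print(word_count(["SQL or NoSQL", "NoSQL or not"]))
```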
  22. (Image-only slide)
  23. Schemaless vs. normalized model:
      - There are multiple ways to model data
      - How the data is going to be accessed
      - Read intensive or write intensive
      - Complex queries
  24. Vertical (up): add more power (RAM/CPU/disk). Horizontal (out): add more commodity systems.
  25. The fallacies of distributed computing:
      1. The network is reliable.
      2. Latency is zero.
      3. Bandwidth is infinite.
      4. The network is secure.
      5. Topology doesn't change.
      6. There is one administrator.
      7. Transport cost is zero.
      8. The network is homogeneous.
  26. Sharding:
      - Split up data into multiple chunks
      - Store each chunk in a separate data node
      - Partitioning strategy: "the shard key"
      - Multi-shard operations (join/aggregate)
      - Load balancing
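One common partitioning strategy is to hash the shard key and take the result modulo the number of shards. The sketch below assumes that strategy. The slides do not say which one a given database uses, and the names `shard_for` and `ShardedStore` are hypothetical. Each shard is a plain dictionary standing in for a separate data node.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(shard_key: str, shard_count: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical store that spreads records over several 'nodes'."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def put(self, shard_key: str, record: Any) -> None:
        self.shards[shard_for(shard_key, len(self.shards))][shard_key] = record

    def get(self, shard_key: str) -> Optional[Any]:
        # A single-key read touches exactly one shard.
        return self.shards[shard_for(shard_key, len(self.shards))].get(shard_key)

    def count_all(self) -> int:
        # A multi-shard operation has to visit every shard and combine results.
        return sum(len(shard) for shard in self.shards)


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user-{i}", {"n": i})
print([len(shard) for shard in store.shards])
```

With plain modulo hashing, changing the number of shards remaps most keys, which is one reason the choice of partitioning strategy matters.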
  27. Replication:
      - Master/slave
      - Multi-master
      - Synchronous
      - Asynchronous
      - Provide redundancy
      - Increase availability
      - Failover (automatic)
  28. (Timeline diagram: Maria and Nick each Get(X) from the data store at T0 and T1, then each Put(X) at T2 and T3.)
  29. Transaction: a sequence of operations that form a single unit of work. Transactions have 4 properties:
      - Atomic
      - Consistent
      - Isolated
      - Durable
  30. ACID - Atomicity. Transfer 100 € from A to B: 1. Read(A); 2. If A > 100; 3. A = A - 100; 4. Write(A); 5. Read(B); 6. B = B + 100; 7. Write(B)
  31. ACID - Consistency. Transfer 100 € from A to B: 1. Read(A); 2. If A > 100; 3. A = A - 100; 4. Write(A); 5. Read(B); 6. B = B + 100; 7. Write(B)
  32. ACID - Isolation. Transfer 100 € from A to B: 1. Read(A); 2. If A > 100; 3. A = A - 100; 4. Write(A); 5. Read(B); 6. B = B + 100; 7. Write(B)
  33. ACID - Durability. Transfer 100 € from A to B: 1. Read(A); 2. If A > 100; 3. A = A - 100; 4. Write(A); 5. Read(B); 6. B = B + 100; 7. Write(B)
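The seven transfer steps can be wrapped in a single transaction so that either both writes happen or neither does. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout, the helper names and the `fail_after_debit` switch are assumptions made for illustration. It checks that the balance covers the amount, rather than copying the slide's `A > 100` test literally. An in-memory database does not demonstrate durability.

```python
import sqlite3
from typing import Dict


def open_bank(balances: Dict[str, int]) -> sqlite3.Connection:
    """Create an in-memory database with one row per account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", list(balances.items()))
    conn.commit()
    return conn


def balance_of(conn: sqlite3.Connection, account: str) -> int:
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()
    return row[0]


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int,
             fail_after_debit: bool = False) -> bool:
    """Move `amount` from source to target as one unit of work."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        if balance_of(conn, source) < amount:
            return False  # nothing was written, so there is nothing to undo
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, source))
        if fail_after_debit:
            # Simulated crash between Write(A) and Write(B).
            raise RuntimeError("simulated crash")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, target))
    return True


bank = open_bank({"A": 150, "B": 0})
transfer(bank, "A", "B", 100)
print(balance_of(bank, "A"), balance_of(bank, "B"))
```

If the simulated crash is triggered, the debit of A is rolled back, so no money disappears between the two writes.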
  34. BASE:
      - Basically Available: there will be a response to any request, and a fast one even if some replicas are slow or have crashed.
      - Soft State: the state of the system could change over time. It is the application's task to guarantee consistency.
      - Eventually Consistent: the system will eventually become consistent once it stops receiving input. The data will propagate everywhere.
  35. Eventual consistency example:
      - Nick finds a cool photo and shares it with Maria by posting it on her Facebook wall.
      - Nick asks Maria to check it out.
      - Maria logs in to her account and checks her Facebook wall, but nothing is there! (x apart)
      - Nick tells Maria to wait a bit and check again later.
      - Maria waits for a minute or so and checks back: she finds the photo Nick shared with her!
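The Nick and Maria story can be simulated with two replicas and a queue of updates that have not been delivered yet. This is a toy model under stated assumptions: the class name `EventuallyConsistentWall`, the explicit `propagate` call and the file name are all hypothetical, and real systems propagate in the background and must also resolve conflicting writes.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class EventuallyConsistentWall:
    """Toy replicated store: writes reach other replicas only later."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        # Updates still in flight: (target replica, key, value).
        self.pending: Deque[Tuple[int, str, str]] = deque()

    def post(self, key: str, value: str, replica: int = 0) -> None:
        # The write is acknowledged as soon as one replica has it.
        self.replicas[replica][key] = value
        for other in range(len(self.replicas)):
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, key: str, replica: int) -> Optional[str]:
        return self.replicas[replica].get(key)

    def propagate(self) -> None:
        """Deliver every pending update, as happens once input stops."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            self.replicas[replica][key] = value


wall = EventuallyConsistentWall(replica_count=2)
wall.post("photo", "cool.jpg", replica=0)  # Nick posts via replica 0
print(wall.read("photo", replica=1))       # Maria reads replica 1: None
wall.propagate()                           # "wait a bit"
print(wall.read("photo", replica=1))       # now prints cool.jpg
```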
  36. It is impossible for a distributed computer system to simultaneously provide all three of these guarantees:
      - Consistency: all nodes see the same data at the same time
      - Availability: all clients can always read and write
      - Partition tolerance: the system will work on failure*
      - A distributed system can satisfy only 2 at the same time
  37. (Diagram: Nick in the EU and Maria in the US. Who will take the next flight?)
  38. ATM example:
      - An ATM will allow you to withdraw money even if the machine is partitioned from the network
      - Higher availability means higher revenue
      - However, it puts a limit on the amount you can withdraw
      - The bank might also charge you a fee when an overdraft happens
  39. In the absence of partitions, how does the system trade off latency (L) and consistency (C)?
  40. (Image-only slide)
  41. ACID vs. BASE:

      | ACID (RDBMS)                     | BASE (NoSQL)                    |
      |----------------------------------|---------------------------------|
      | Strong consistency               | Weak consistency (stale data)   |
      | Isolation                        | Last write wins                 |
      | Transaction                      | Program managed                 |
      | Mature technology                | New technology                  |
      | SQL                              | No standard                     |
      | Available & consistent           | Available & partition tolerant  |
      | Scale up (limited)               | Scale out (unlimited*)          |
      | Shared something (disk/RAM/proc) | Shared nothing (parallelizable) |
