Getting Started with NoSQL Cloud Database Service on AWS

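This session begins by introducing non-relational (NoSQL) databases and comparing them with relational (SQL) databases. It then explores the features and benefits of DynamoDB in detail, discusses how to get the most out of a DynamoDB database, shows how to design efficient indexes, scans, and queries, and takes a closer look at several recently released features such as JSON document support and DynamoDB Streams. By the end of the session, you will know the basics of DynamoDB as well as the common use cases and benefits of this high-performance NoSQL database.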


  1. Getting Started with NoSQL Cloud Database Service on AWS. 蔣宗恩, Technical Account Manager, AWS Enterprise Support, 2017/06. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda: (1) What is NoSQL? (2) Relational (SQL) vs. non-relational? (3) What is DynamoDB? (4) DynamoDB Tables & Indexes (5) Scaling (6) Integration Capabilities (7) Demo
  3. What is NoSQL?
  4. Data volume since 2010
     • 90% of stored data was generated in the last 2 years
     • 1 terabyte of data in 2010 equals 6.5 petabytes today
     • Linear correlation between data pressure and technical innovation
     • No reason these trends will not continue over time
  5. Timeline of database technology (chart; axis label: data pressure)
  6. What is NoSQL? NoSQL is a term to describe data stores that trade full ACID compliance for high availability and scale.
     • Atomicity: single row/single item only
     • Consistency: eventual consistency
     • Isolation: dirty reads
     • Durability: data replication on commodity storage
  7. Why NoSQL? Dirty reads? Eventual consistency? Single-row transactions only? Why would anybody trade ACID compliance for this?
  8. Relational (SQL) vs. non-relational?
  9. Relational vs. non-relational databases (diagram). Traditional SQL: a primary and a secondary DB, scale up. NoSQL: many DB nodes, scale out.
  10. Scale Up vs. Scale Out
  11. The CAP Theorem. Network partitions will happen in distributed systems. The diagram shows Consistency (C), Availability (A), and Partition Tolerance (P), with the pairwise combinations CA, CP, and AP.
  12. SQL vs. NoSQL schema design. NoSQL design optimizes for compute instead of storage.
  13. Why NoSQL?
     • SQL: optimized for storage. NoSQL: optimized for compute.
     • SQL: normalized/relational. NoSQL: denormalized/hierarchical.
     • SQL: ad-hoc queries. NoSQL: instantiated views.
     • SQL: scale vertically. NoSQL: scale horizontally.
     • SQL: good for OLAP. NoSQL: built for OLTP at scale.
  14. What is DynamoDB?
  15. Amazon’s Path to DynamoDB: from RDBMS to DynamoDB
  16. Amazon DynamoDB. DynamoDB is a fully managed NoSQL document and key-value data store: predictable performance, highly available, massively scalable, fully managed, low cost.
  17. Predictable performance: consistently low latency at scale.
  18. High availability and durability
     • Writes: replicated continuously to 3 Availability Zones; persisted to disk (custom SSD)
     • Reads: strongly or eventually consistent; no latency trade-off
     • Designed to support 99.99% availability
     • Built for high durability
  19. High availability and durability: DynamoDB automatically partitions data
     • Partition key spreads data (and workload) across partitions 1..N
     • Automatically partitions as data grows and throughput needs increase
     • High-scale app = large number of unique hash keys + uniform distribution of workload across hash keys
  20. Fully managed service = automated operations (comparison: DB hosted on premises vs. DynamoDB)
  21. DynamoDB Tables & Indexes
  22. DynamoDB table structure: a table contains items, and items contain attributes
     • Partition key: mandatory; key-value access pattern; determines data distribution
     • Sort key: optional; models 1:N relationships; enables rich query capabilities
     • Query capabilities: all items for a key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values
  23. Partition keys
     • Partition key uniquely identifies an item
     • Partition key is used for building an unordered hash index
     • Allows the table to be partitioned for scale
     • Example over the key space 00 to FF (boundaries at 54/55 and A9/AA): Id = 1, Name = Jim, Hash(1) = 7B; Id = 2, Name = Andy, Dept = Eng, Hash(2) = 48; Id = 3, Name = Kim, Dept = Ops, Hash(3) = CD
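The mapping from a hashed partition key to a partition can be sketched in a few lines. This is a minimal illustration only: the three ranges are read off the slide's key space (00 to 54, 55 to A9, AA to FF), while `partition_for`, `toy_hash`, and the use of MD5 are assumptions made for the example. DynamoDB's actual hash function is internal and not documented here.

```python
import hashlib

# Partition boundaries read off the slide's key space (00..FF, split three ways).
PARTITION_RANGES = [
    (0x00, 0x54, 1),
    (0x55, 0xA9, 2),
    (0xAA, 0xFF, 3),
]


def partition_for(hash_byte: int) -> int:
    """Return the partition whose hash range contains hash_byte."""
    for low, high, partition in PARTITION_RANGES:
        if low <= hash_byte <= high:
            return partition
    raise ValueError(f"hash value {hash_byte:#x} is outside the key space 00..FF")


def toy_hash(partition_key) -> int:
    """Hypothetical stand-in hash: the first byte of the key's MD5 digest.

    This is NOT DynamoDB's hash function; it only shows that any uniform
    hash spreads keys across the key space without preserving key order.
    """
    return hashlib.md5(str(partition_key).encode("utf-8")).digest()[0]


if __name__ == "__main__":
    # Hash values taken from the slide: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD.
    for item_id, hash_byte in [(1, 0x7B), (2, 0x48), (3, 0xCD)]:
        print(f"Id = {item_id} lands in partition {partition_for(hash_byte)}")
```

Because the hash index is unordered, neighbouring Ids end up on unrelated partitions, which is what lets the table spread load as it grows.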
  24. Partition:Sort key
     • Uses two attributes together to uniquely identify an item
     • Within the unordered hash index, data is arranged by the sort key
     • No limit on the number of items (∞) per partition key, except if you have local secondary indexes
     • Partition 1 (00:0 to 54:∞), Hash(2) = 48: Customer# = 2, Order# = 10, Item = Pen; Customer# = 2, Order# = 11, Item = Shoes
     • Partition 2 (55 to A9:∞), Hash(1) = 7B: Customer# = 1, Order# = 10, Item = Toy; Customer# = 1, Order# = 11, Item = Boots
     • Partition 3 (AA to FF:∞), Hash(3) = CD: Customer# = 3, Order# = 10, Item = Book; Customer# = 3, Order# = 11, Item = Paper
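A table with a partition key and a sort key, like the Customer#/Order# example, could be declared roughly as follows. This sketch only builds the parameter dictionary and makes no AWS call. The table name `Orders`, the attribute names `CustomerId` and `OrderId`, and the capacity values are hypothetical. The field names (`KeySchema`, `AttributeDefinitions`, `ProvisionedThroughput`) follow the DynamoDB CreateTable request shape.

```python
def build_table_params(table_name, partition_key, sort_key=None, rcu=5, wcu=5):
    """Build CreateTable-style parameters for a table with numeric key attributes.

    The partition key uses KeyType HASH; the optional sort key uses KeyType RANGE.
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "N"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "N"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        },
    }


# Hypothetical names for the slide's Customer# / Order# example.
params = build_table_params("Orders", "CustomerId", sort_key="OrderId")
print(params["KeySchema"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `client.create_table(**params)`; that step is left out so the sketch runs offline.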
  25. Partitions are three-way replicated: the items of Partition 1 through Partition N (Id = 1, Name = Jim; Id = 2, Name = Andy, Dept = Eng; Id = 3, Name = Kim, Dept = Ops) are stored identically on Replica 1, Replica 2, and Replica 3.
  26. Local secondary index (LSI)
     • Alternate sort key attribute
     • Index is local to a partition key
     • Table: A1 (partition), A2 (sort), A3, A4, A5
     • LSI with KEYS_ONLY projection: A1 (partition), A3 (sort), A2 (item key)
     • LSI with INCLUDE A3 projection: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
     • LSI with ALL projection: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
     • 10 GB max per partition key, i.e. LSIs limit the number of range keys!
  27. Global secondary index (GSI)
     • Alternate partition and/or sort key
     • Index is across all partition keys
     • Use composite sort keys for compound indexes
     • Table: A1 (partition), A2, A3, A4, A5
     • GSI with KEYS_ONLY projection: A2 (partition), A1 (item key)
     • GSI with INCLUDE A3 projection: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
     • GSI with ALL projection: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
     • RCUs/WCUs provisioned separately for GSIs
     • Online indexing
  28. How do GSI updates work? The client writes to the primary table, and the global secondary index then receives an asynchronous update (step 2 in the diagram, shown in progress). If GSIs don’t have enough write capacity, table writes will be throttled!
  29. LSI or GSI?
     • An LSI can be modeled as a GSI
     • If data size in an item collection > 10 GB, use a GSI
     • If eventual consistency is okay for your scenario, use a GSI!
  30. Scaling
  31. Scaling
     • Throughput: provision any amount of throughput to a table
     • Size: add any number of items to a table; max item size is 400 KB; LSIs limit the number of range keys due to the 10 GB limit
     • Scaling is achieved through partitioning
  32. Throughput: provisioned at the table level
     • Write capacity units (WCUs) are measured in 1 KB per second
     • Read capacity units (RCUs) are measured in 4 KB per second
     • RCUs measure strongly consistent reads
     • Eventually consistent reads cost 1/2 of consistent reads
     • Read and write throughput limits are independent
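The unit sizes above translate into a simple capacity estimate. The sketch below assumes each item is rounded up to the next 1 KB (writes) or 4 KB (reads) before counting units, which matches the documented DynamoDB behaviour. The function names and sample workloads are made up for illustration.

```python
import math


def required_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """Write capacity units: one WCU covers one write of up to 1 KB per second."""
    units_per_write = math.ceil(item_size_kb / 1)
    return units_per_write * writes_per_second


def required_rcu(item_size_kb: float, reads_per_second: int,
                 strongly_consistent: bool = True) -> int:
    """Read capacity units: one RCU covers one strongly consistent read of up to
    4 KB per second; an eventually consistent read costs half as much."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)
    return total


# Hypothetical workload: 2.5 KB items written 10 times/s, 6 KB items read 10 times/s.
print(required_wcu(2.5, 10))
print(required_rcu(6, 10))
print(required_rcu(6, 10, strongly_consistent=False))
```

Because of the rounding, item size matters as much as request rate: a 4.1 KB item costs two RCUs per strongly consistent read, the same as an 8 KB item.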
  33. Partitioning math (in the future, these details might change)
     • Number of partitions by capacity: (Total RCU / 3000) + (Total WCU / 1000)
     • Number of partitions by size: Total Size / 10 GB
     • Total partitions: CEILING(MAX(Capacity, Size))
  34. Partitioning example: table size = 8 GB, RCUs = 5000, WCUs = 500
     • Number of partitions by capacity: (5000 / 3000) + (500 / 1000) = 2.17
     • Number of partitions by size: 8 / 10 = 0.8
     • Total partitions: CEILING(MAX(2.17, 0.8)) = 3
     • RCUs per partition = 5000 / 3 = 1666.67
     • WCUs per partition = 500 / 3 = 166.67
     • Data per partition = 8 / 3 = 2.67 GB
     • RCUs and WCUs are uniformly spread across partitions
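The partitioning math can be checked with a short script. It is a direct transcription of the slide's formula, using the slide's constants (3000 RCUs, 1000 WCUs, and 10 GB per partition), which the deck itself warns may change. The function names are chosen for this example.

```python
import math

# Per-partition limits as stated on the slide; they may differ today.
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimate_partitions(total_rcu: float, total_wcu: float, size_gb: float) -> int:
    """CEILING(MAX(partitions by capacity, partitions by size))."""
    by_capacity = total_rcu / RCU_PER_PARTITION + total_wcu / WCU_PER_PARTITION
    by_size = size_gb / GB_PER_PARTITION
    return math.ceil(max(by_capacity, by_size))


def per_partition_share(total_rcu: float, total_wcu: float, size_gb: float) -> dict:
    """Throughput and data are spread uniformly, so each partition gets 1/N."""
    n = estimate_partitions(total_rcu, total_wcu, size_gb)
    return {
        "partitions": n,
        "rcu": total_rcu / n,
        "wcu": total_wcu / n,
        "size_gb": size_gb / n,
    }


# The slide's example: 8 GB table, 5000 RCUs, 500 WCUs.
share = per_partition_share(5000, 500, 8)
print(share["partitions"], round(share["rcu"], 2), round(share["wcu"], 2))
```

The per-partition figures matter more than the table totals: a single hot key can only use its own partition's share of the provisioned throughput.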
  35. What causes throttling? Sustained throughput that goes beyond the provisioned throughput per partition.
     • Non-uniform workloads: hot keys/hot partitions; very large bursts
     • Mixing hot data with cold data: use a table per time period
     • From the example before: table created with 5000 RCUs and 500 WCUs, so RCUs per partition = 1666.67 and WCUs per partition = 166.67
     • If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests. Solution: increase provisioned throughput
  36. To learn more, please attend: Deep Dive on Amazon DynamoDB, 3:55 p.m. to 4:35 p.m.
  37. Integration Capabilities
  38. Integration Capabilities
     • DynamoDB Streams: stream of table updates; asynchronous; exactly once; strictly ordered; 24-hr lifetime per item
     • DynamoDB Triggers: implemented as an AWS Lambda function; your code scales automatically; Java, Node.js, and Python
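A trigger of this kind might look like the following Python Lambda handler. It is a sketch under assumptions: the `Records`, `eventName`, and `dynamodb.Keys` fields follow the DynamoDB Streams event shape delivered to Lambda, while the grouping logic and `SAMPLE_EVENT` are invented for illustration and trimmed to the fields the handler reads.

```python
def handler(event, context):
    """Group the keys of changed items by stream event type.

    Each stream record carries an eventName of INSERT, MODIFY, or REMOVE and
    the item's primary key under record["dynamodb"]["Keys"].
    """
    changes = {"INSERT": [], "MODIFY": [], "REMOVE": []}
    for record in event.get("Records", []):
        event_name = record.get("eventName")
        if event_name in changes:
            changes[event_name].append(record["dynamodb"]["Keys"])
    return changes


# Hand-written sample event for local testing (not captured from a real stream).
SAMPLE_EVENT = {
    "Records": [
        {
            "eventName": "INSERT",
            "eventSource": "aws:dynamodb",
            "dynamodb": {
                "Keys": {"Id": {"N": "1"}},
                "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}},
            },
        },
        {
            "eventName": "REMOVE",
            "eventSource": "aws:dynamodb",
            "dynamodb": {"Keys": {"Id": {"N": "2"}}},
        },
    ]
}

if __name__ == "__main__":
    print(handler(SAMPLE_EVENT, None))
```

Calling the handler locally with a sample event, as in the last lines, is a cheap way to test trigger logic before wiring it to a real stream.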
  39. Integration Capabilities
     • IAM: fine-grained access control via AWS IAM; table-, item-, and attribute-level access control
     • Elasticsearch integration: full-text queries; add search to a mobile app; monitor IoT sensor status codes; app telemetry pattern discovery using regular expressions
  40. Reference Architecture
  41. Demo
  42. Architecture of a simple serverless web application (diagram components: users, internet, JavaScript in an Amazon S3 bucket, API Gateway, Lambda, DynamoDB, AWS Identity & Access Management)
  43. bit.ly/NoSQLDesignPatterns