SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

Presentation for http://strataconf.com/strata2012/public/schedule/detail/22693

Many of the new online and device-oriented application models require a high degree of operational and development agility, such as unlimited elastic scale and flexible data models. The nascent NoSQL market aims to address these requirements but is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing NoSQL solutions require specialized, low-level knowledge that does not easily carry over from one vendor’s product to another. The SQL market, on the other hand, has a high level of maturity and at least conceptual standardization, but relational database systems were not originally designed for these requirements.

However, in contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus NoSQL. While the NoSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for “small” data as well. On the other hand, it is possible to scale relational SQL databases.

In this presentation, I will provide a short introduction to some architectural patterns that SQL-based solutions have been using to achieve scale and operational agility, contrast them with the NoSQL paradigms and show how SQL can be augmented with NoSQL paradigms at the platform level by using SQL Azure Federations as an example. I will also show how NoSQL offerings can benefit from the lessons learned with SQL.

This means that NoSQL, BigData, and SQL are not in conflict like good and evil. Instead, they are sometimes overlapping but often complementary solutions that address different requirements, benefit from common paradigms, and can and will coexist.

  1. SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN
     Michael Rys, Microsoft Corp., @SQLServerMike
     Strata 2012 Conference, March 2012
     © 2012 Microsoft
  2. AGENDA
     • Scaling out your business is important!
     • NoSQL Paradigms and NoSQL Platforms
     • SQL learns from NoSQL (with a demo of SQL Azure Federations)
     • NoSQL learns from SQL
     • Scalable Data Processing Platform of the Future
  3. THE WEB 2.0 BUSINESS ARCHITECTURE
     [Diagram: an Online Business Application at the center, surrounded by three activities.]
     • Attract Individual Consumers:
       - Provide interesting service
       - Provide mobility
       - Provide social
     • Monetize the Social:
       - Improve individual experience
       - Re-sell Aggregate Data (e.g., Advertisers)
     • Monetize Individual:
       - Upsell service
       - VIP
       - Speed
       - Extra Capabilities
  4. SOCIAL NETWORKING: THE BUSINESS PROBLEM
     • 100s of millions of users
       • 10s of millions of users concurrently
     • Terabytes to petabytes of data
       • Structured and unstructured
     • Required (eventual) data consistency across users
       • E.g., show your updated state in your friends’ profile pages
  5. SOLUTION
     • Shard/Partition user data across hundreds to thousands of SQL Databases
     • Propagate data changes from one DB to other DBs using reliable, async Message Service
       • Managing routes from each DB to every other DB would be too complex
       • Global Transactions would hinder scale and availability
     • Provide a caching layer for performance
     • And also used for
       o Clean-up state (e.g. on account close)
       o Deploy business logic (stored procedures)
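  6. EXAMPLE ARCHITECTURE
     [Diagram: a Web Tier and a Data Tier. A status change ("I change my status") for userId=1024 passes through a Dispatcher, and "My DB gets updated". Async Messages from the Async Message Service reach the other databases. The databases are labeled by user ID range: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000. Transactions are labeled TX1 to TX5.]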
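The routing and propagation pattern from the Solution and Example Architecture slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system shown in the talk: the shard ranges come from the diagram, but `shard_for`, `update_status`, `drain`, the in-memory dictionaries standing in for databases, and the `queue.Queue` standing in for the reliable async message service are all hypothetical.

```python
import bisect
import queue

# Inclusive upper bounds of the user-ID ranges shown in the diagram:
# 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000.
SHARD_UPPER_BOUNDS = [1000, 2000, 3000, 4000, 5000, 6000]


def shard_for(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    if not 1 <= user_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)


def update_status(shards, outbox, user_id, status, friend_ids):
    """Write to the user's own shard, then queue one message per friend."""
    shards[shard_for(user_id)][user_id] = status      # local write only
    for friend_id in friend_ids:
        outbox.put((friend_id, user_id, status))      # delivered later


def drain(outbox, feeds):
    """Stand-in for the message service: apply queued updates per target shard."""
    while not outbox.empty():
        friend_id, user_id, status = outbox.get()
        feeds[shard_for(friend_id)].setdefault(friend_id, {})[user_id] = status


shards = [dict() for _ in SHARD_UPPER_BOUNDS]   # hypothetical per-shard status store
feeds = [dict() for _ in SHARD_UPPER_BOUNDS]    # hypothetical per-shard friend feeds
outbox = queue.Queue()

update_status(shards, outbox, 1024, "at Strata", friend_ids=[42, 3500])
drain(outbox, feeds)
print(shard_for(1024))  # 1, i.e. the 1001-2000 shard
```

Each write touches exactly one shard, so no global transaction is needed. Friends on other shards see the change only after the queue is drained, which is the eventual consistency the slides describe.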
  7. MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS
     • Patterns
       • Sharding and reliable messaging
       • Sharding and fan-out query layer
       • Caching layer
     • Customer Examples
       • Social Networking: Facebook, MySpace, etc.
       • Online electronic stores (cannot give names)
       • Travel reservation systems (e.g. Choice International)
       • MSN Casual Gaming
       • etc.
  8. LESSONS LEARNED FROM THESE SCENARIOS
     • Require high availability
     • Be able to scale out:
       • Functional and Data Partitioning Architecture
       • Provide scale-out processing:
         o Function shipping
         o Fanout and Map/Reduce processing
       • Be able to deal with failures:
         o Quorum
         o Retries
         o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
     • Be able to quickly grow and change:
       • Elastic scale
       • Flexible, open schema
       • Multi-version schema support
     Move better support for these patterns into the Data Platform!
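As an illustration of the "Quorum" bullet above, here is a small Python sketch of the usual quorum rule: with N replicas, a write quorum W and a read quorum R are guaranteed to overlap when W + R > N. The slides do not spell this rule out; the function names and the `(version, value)` reply format are assumptions made for the example.

```python
def quorums_overlap(n_replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return write_quorum + read_quorum > n_replicas


def quorum_read(responses, read_quorum):
    """Return the newest value among the replies, or fail if too few replicas answered.

    `responses` is a list of (version, value) pairs from the replicas that replied.
    """
    if len(responses) < read_quorum:
        raise RuntimeError("not enough replicas answered; the caller may retry")
    return max(responses, key=lambda pair: pair[0])[1]


print(quorums_overlap(3, 2, 2))                  # True
print(quorums_overlap(3, 1, 1))                  # False
print(quorum_read([(7, "new"), (6, "old")], 2))  # new
```

Lowering W or R trades consistency for latency and availability, which is one way to read the "tunable consistency" that appears on later slides.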
  9. WHAT IS NOSQL ABOUT?
     • NoSQL = operational and developer agility at low CapEx and OpEx!
     • Low Cost
       • Free Open Source Stores, Community Support
       • Scale CapEx cost below customer growth rate
       • Web friendly developer model and tool chain, ease of use
     • Processing Paradigms
       • High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)
       • Scale-out (Sharding, Map-Reduce, Elasticity)
       • Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)
       • Tunable/Eventual Consistency
     • Data Model Paradigms
       • Data first: Flexible Schema
       • Low-impedance mismatch between programming and data model:
         o Key-Documents and Objects (BLOBS, JSON, XML, POJO)
         o Key-Wide Sparse Column Sets
         o Graphs (e.g., RDF)
     • Range from devices, over OLTP Web 2.0 applications to BigData Analytics
  10. DATA MODELS
      Data Model: Example Stores (apologies to the ones I did not list)
      • Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
      • Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
      • BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
      • JSON Documents: MongoDB, CouchBase, Riak, RavenDB
      • Graph: Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
      • Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
      • Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
  11. WHAT CAN SQL LEARN FROM NOSQL?
      • Low CapEx, Low OpEx
      • Built-in tunable High-Availability
      • Data scale-out (Sharding)
      • Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
      • Flexible Data Models
        • JSON (& XML) support
        • Sparse columns/Column sets
      • Integrate with BigData Analytics (e.g., Hadoop)
      Many Relational Database Systems are incorporating these learnings!
  12. EXAMPLE: SQL AZURE FEDERATIONS
      • Provides Data Partitioning/Sharding at the Data Platform
      • Enables applications to build elastic scale-out applications
      • Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
      • Auto-connect to right shard based on sharding key value
      • Provides SPLIT resilient query mode
  13. SQL AZURE FEDERATION CONCEPTS
      • Federation: Represents the data being sharded
      • Federation Root: Database that logically houses federations, contains federation meta data
      • Federation Key: Value that determines the routing of a piece of data (defines a Federation Distribution)
      • Atomic Unit: All rows with the same federation key value: always together!
      • Federation Member (aka Shard): A physical container for a set of federated tables for a specific key range and reference tables
      • Federated Table: Table that contains only atomic units for the member’s key range
      • Reference Table: Non-sharded table
      [Diagram: a Sharded Application connects through a Connection Gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) has three members, each holding atomic units (AU):
        Member: PK [min, 100) with AU PK=5, PK=25, PK=35
        Member: PK [100, 488) with AU PK=105, PK=235, PK=365
        Member: PK [488, max) with AU PK=555, PK=2545, PK=3565]
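The federation concepts above (key ranges, atomic units, members, SPLIT) can be modeled with a toy Python class. This is a sketch for intuition only and is not the SQL Azure Federations API: the `Federation` class and its methods are hypothetical, each member is a plain dictionary, and the non-blocking behaviour of the real SPLIT is not modeled. The key ranges and key values are taken from the diagram.

```python
import bisect


class Federation:
    """Toy model of a range-partitioned federation (not the SQL Azure API)."""

    def __init__(self, split_points):
        # Sorted lower bounds of every member except the first.
        # [100, 488] yields members [min, 100), [100, 488), [488, max).
        self.split_points = sorted(split_points)
        self.members = [dict() for _ in range(len(self.split_points) + 1)]

    def member_index(self, key):
        """Route a federation key value to the member whose range contains it."""
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        # All rows sharing a key form one atomic unit and stay in one member.
        self.members[self.member_index(key)].setdefault(key, []).append(row)

    def split(self, at):
        """Split the member containing `at` into [low, at) and [at, high)."""
        if at in self.split_points:
            raise ValueError(f"{at} is already a split point")
        idx = self.member_index(at)
        old = self.members[idx]
        low = {k: v for k, v in old.items() if k < at}
        high = {k: v for k, v in old.items() if k >= at}
        self.split_points.insert(idx, at)
        self.members[idx:idx + 1] = [low, high]


orders_fed = Federation([100, 488])
for customer_id in (5, 25, 35, 105, 235, 365, 555, 2545, 3565):
    orders_fed.insert(customer_id, {"CustomerID": customer_id})

print(orders_fed.member_index(365))  # 1, the [100, 488) member
orders_fed.split(250)
print(orders_fed.member_index(365))  # 2, the new [250, 488) member
```

Because atomic units are never divided, a split only moves whole key groups, and routing by key keeps working with the updated split points.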
  14. DEMO: MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS
      • Sharded GamesInfo table using SQL Azure Federations
      • Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
      • Mapper and Reducer are specified using SQL
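The demo code itself is not part of this transcript. To show what "Mapper and Reducer are specified using SQL" can look like, here is a minimal Python sketch that uses in-memory SQLite databases as stand-ins for federation members. Everything apart from the `GamesInfo` table name is an assumption: the column names, the sample rows, the `Mapped` staging table, and the `map_reduce` helper are hypothetical and unrelated to the C# library used in the talk.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory database standing in for a federation member."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE GamesInfo (CustomerID INTEGER, Game TEXT, Score INTEGER)")
    db.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return db


# Mapper: runs on every shard and returns partial aggregates.
MAPPER_SQL = """
    SELECT Game, COUNT(*) AS Plays, SUM(Score) AS Total
    FROM GamesInfo GROUP BY Game
"""

# Reducer: runs once over the collected mapper output.
REDUCER_SQL = """
    SELECT Game, SUM(Plays), SUM(Total)
    FROM Mapped GROUP BY Game ORDER BY Game
"""


def map_reduce(shards, mapper_sql, reducer_sql):
    """Fan the mapper out to each shard, then reduce the combined rows."""
    collector = sqlite3.connect(":memory:")
    collector.execute("CREATE TABLE Mapped (Game TEXT, Plays INTEGER, Total INTEGER)")
    for shard in shards:
        partial = shard.execute(mapper_sql).fetchall()
        collector.executemany("INSERT INTO Mapped VALUES (?, ?, ?)", partial)
    return collector.execute(reducer_sql).fetchall()


shards = [
    make_shard([(5, "chess", 10), (25, "chess", 20), (35, "go", 7)]),
    make_shard([(105, "go", 3), (235, "chess", 5)]),
]
print(map_reduce(shards, MAPPER_SQL, REDUCER_SQL))
# [('chess', 3, 35), ('go', 2, 10)]
```

The mapper returns partial counts and sums rather than raw rows, so only small aggregates cross shard boundaries and the reducer just adds them up.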
  15. WHAT CAN NOSQL LEARN FROM SQL?
      • Flexible data is good, but:
        • Provide optional schema in data platform to help with constraints and optimizations
      • Procedural Scale-Out processing is good, but:
        • Develop a declarative language suited for and across the data models (e.g., coSQL)
        • Standardize suitable abstractions and languages
      • Eventual Consistency is good, but:
        • Provide users the choice
      • Simple Queries are good, but:
        • Provide me with secondary indexes
        • It will be more efficient to join between two collections of JSON documents in the query engine than in the application layer
      Many NoSQL Database Systems are starting to incorporate these learnings!
  16. THE WEB 2.0 BUSINESS ARCHITECTURE
      [Diagram: an Online Business Application at the center, surrounded by three activities.]
      • Attract Individual Consumers:
        - Provide interesting service
        - Provide mobility
        - Provide social
      • Monetize the Social:
        - Improve individual experience
        - Re-sell Aggregate Data (e.g., Advertisers)
      • Monetize Individual:
        - Upsell service
        - VIP
        - Speed
        - Extra Capabilities
  17. SCALE-OUT DATA PLATFORM ARCHITECTURE
      [Diagram: a SQL or NoSQL Store made of shards, each with a Primary Copy and Readable Replicas, serving three kinds of workloads and queries.]
      • OLTP Workloads
        • Highly Available
        • High Scale
        • High Flexibility
        • Mostly touching 1 to low number of shards
      • Traditional OLAP Workloads
        • Known schema
        • Data warehouse, “Star joins”
      • Dynamic OLAP Workloads
        • 3Vs (Volume, Velocity, Variety)
        • Exploratory
        • Scale-out queries, often using eventual consistent scale-out frameworks like Hadoop
  18. BIG DATA REQUIRES AN END-TO-END APPROACH
  19. CALL TO ACTION
      • Familiarize yourself with the NoSQL genes in the Microsoft Online Platform
        • Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com
      • Engage with us throughout Strata
        Presentation (Speaker; Date and Time)
        • Do We Have the Tools We Need to Navigate the New World of Data? (Dave Campbell; 2/29 9:00am PST)
        • Onsite Interview * (Tim O’Reilly, Dave Campbell; 2/29 10:15am PST)
        • Unleash Insights on All Data With Microsoft Big Data (Alexander Stojanovic; 2/29 11:30am PST)
        • Office Hours (Q&A session) (Dave Campbell; 2/29 1:30pm PST)
        • Hadoop + Javascript: What We Learned (Asad Khan; 2/29 2:20pm PST)
        • Democratizing BI at Microsoft: 40,000 Users and Counting (Kirkland Barrett; 3/1 10:40am PST)
        • Data Marketplaces For Your Extended Enterprise (Piyush Lumba; 3/1 2:20pm PST)
      • Download slides with additional information and related resources: http://www.slideshare.net/MichaelRys/presentations
  20. APPENDIX
  21. RELATED RESOURCES
      • Scale-Out with SQL Databases
        • http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/
        • Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
        • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
        • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations
      • NoSQL and the Windows Azure Platform
        • Whitepaper: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
        • SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
      • Contact me
        • @SQLServerMike
        • http://sqlblog.com/blogs/michael_rys/default.aspx