Mais conteúdo relacionado


NoSql - mayank singh

  1. Good Morning
  2. SQL Mayank Singh 1316110115 CSE - sec ’B’
  3. Agenda • What is NoSQL? • NoSQL Technology Breakdown • Where is NoSQL a Killer App? • What Good is Relational? • NoSQL and CouchDB • NoSQL, Relational, or Both?
  4. What is NoSQL ? •Common traits • Non-relational, Scalable • Non-schematized/schema-free • Eventual consistency • Open source • Distributed • “Web scale” • Developed at big Internet companies
  5. Consistency • CAP Theorem Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance • NoSQL does not offer “ACID” guarantees Atomicity, consistency, isolation and durability • Instead offers “eventual consistency” Similar to DNS propagation
  6. • Indexing • Most NoSQL databases are indexed by key • Some allow so-called “secondary” indexes • Often the primary key indexes are clustered • Hbase uses Hadoop Distributed File System, which is append-only • Writes are logged • Logged writes are batched • File is re-created and sorted • You get control back quikly • Queries • Typically no query language • Instead, create procedural program • Sometimes SQL is supported • Sometimes MapReduce code is used…
  7. MapReduce • Map step: split the query up • Reduce Most typical of Hadoop andstep: merge the results • used with Wide Column Stores, esp. Hbase • Amazon Web Services’ Elastic MapReduce (EMR) can read/write DynamoDB, S3, Relational Database Service (RDS) • “Hive”add-on to Hadoop offers a HiveQL (SQL-like) abstraction over MR • Use with Hive tables • Use with Hbase
  8. NoSQL Technology Breakdown • Key-Value Stores • The most common; not necessarily the most popular • Has rows, each with something like a big dictionary/associative array • Schema may differ from row to row • Common on Cloud platforms • e.g. Amazon SimpleDB, Azure Table Storage • MemcacheDB, Voldemort • DynamoDB (AWS), Dynomite, Redis and Riak
  9. Key-Value Stores
  10. Wide Column Stores • Has tables with declared column families • Each column family has “columns” which are KV pair that can vary from row to row • These are the most foundational for large sites • Big Table (Google) • Hbase (Originally part of Yahoo-dominated Hadoop project) • Cassandra (Facebook) • Calls column families “super columns” and tables “super column families” • They are the most “Big Data”-ready • Especially Hbase + Hadoop
  11. Wide Column Stores
  12. Document Stores • Have “databases,” which are akin to tables • Have “documents,” akin to rows • Documents are typically JSON objects • Each document has properties and values • Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects - Allows for hierarchical storage) • Can have attachments as well • Old versions are retained • So Doc Stores work well for content management • Some view doc stores as specialized KV stores • Most popular with developers, startups, VCs • The biggies: • CouchDB • MongoDB
  13. Document Stores
  14. Document Store Application Orientation • Documents can each be addressed by URIs • CouchDB supports full REST interface • Very geared towards JavaScript and JSON • Documents are JSON objects • CouchDB/MongoDB use JavaScript as native language • In CouchDB, “view functions” also have unique URIs and they return HTML • So you can build entire applications in the database
  15. Graph Databases • Great for social network applications and others where relationships are important • Nodes and edges • Edge like a join • Nodes like rows in a table • Nodes can also have properties and values • Neo4j is a popular graph db
  16. Graph Databases
  17. • Source: SlideShare
  18. Where is NoSQL a Killer App? • Content Management • Document databases work really well here • Regular KV pairs can store meta data • Can also store text-based content • Attachments can store file-based or binary content • Versioning and URI addressability help as well • CouchDB gets called a “Web database” • Product Catalogs • Products in a catalog tend to have many attributes in common and then various others that are class-specific • Common • Product ID • Name • Description • Price • Key Value Stores and Wide Column Stores work well here
  19. • Social • Graph databases work best here • Great for tracking: • Networks • Followers • Group membership • Threaded interactions (comments, likes/favorites) • Great for Membership, Ownership • Avoids the self-joins and many-to-many tables necessary in relational DBs • Big Data • Wide Column and Key-Value Stores work best here • MapReduce is designed for this scenario • Hadoop and Hbase come up a lot • Sharding and append-only help here • Premise of analytics is reading data, not maintaining it.
  20. • Miscellaneous • Event-driven data (i.e. logs) • User profiles, preferences • Mail, status message streams • Other Web data • Automobile directions • Info for sites on maps (category, name, description, photo) • User reviews • Etc.
  21. What Good is Relational ? Transactional • support transactions • Business systems require atomic transactions • You can’t process an order without decrementing inventory • You can’t register a credit without its corresponding debit • No exceptions, no excuses Formal Schema • Regular processes have regular data • Stocks, trades • PO line items • Personnel records • Insurance policies • These need relational databases with declared schema • These don’t need MapReduce, document or graph representation
  22. • Banded Reporting • Operational reporting is based on detail and group sections with predictable, consistent layout, based on known schema • Very hard to design pixel-perfect reports against indeterminate schema • This highlights how operational business processes almost always require relational databases • Data Size • A well-defined query language • Mature development and administration tools • Denormalize the database
  23. NoSQL and CouchDB
  24. • Source: Microsoft Azure
  25. NoSQL, Relational, or Both? • Type of App • Really a question of consistency versus massive scale • Is this an internal system or a public one? • Is it an application for the data or data for a system? • Below a certain threshold of concurrent usage, NoSQL may be slower than relational • Skill Sets and Investment • Does your staff have RDBMS skills already? • Do you have significant investment in relational database hw/sw? • Lots of apps that use an RDBMS? • Do you want to support both? • Are you a startup? • Employ developers who possess NoSQL skills and prefer NoSQL?
  26. • Relational databases use Structured Querying Language (SQL), making them a good choice for applications that involve the management of several transactions. • NoSQL Limitations • In non-relational databases like Mongo, there are no joins like there would be in relational databases. • It also doesn’t automatically treat operations as transactions the way a relational database does, you must manually choose to create a transaction and then manually verify it, manually commit it or roll it back.
  27. Companies Using NoSQL DB
  28. References