

Introduction to MongoDB

  1. .NET User Group Bern: Roger Rudin, bbv Software Services AG
  2. Agenda – What is NoSQL – Understanding the Motivation behind NoSQL – MongoDB: A Document Oriented Database – NoSQL Use Cases
  3. What is NoSQL? NoSQL = Not only SQL
  4. NoSQL Definition: Next-generation databases mostly addressing some of the following points: being non-relational, distributed, open source, and horizontally scalable. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often further characteristics apply, such as: schema-free, easy replication support, a simple API, eventually consistent/BASE (not ACID), huge data volumes, and more. So the misleading term "nosql" (which the community now mostly translates as "not only sql") should be seen as an alias for something like the definition above.
  5. Who Uses NoSQL? • Twitter uses FlockDB/MySQL and Cassandra • Cassandra is an open source project from Facebook • Digg and Reddit use Cassandra • foursquare, SourceForge, and the New York Times use MongoDB • Adobe, Alibaba, and eBay use Hadoop
  7. Why SQL sucks... • O/R mapping (also known as the impedance mismatch) • Data-model changes are hard and expensive • SQL databases are designed for high throughput, not low latency • SQL databases do not scale out well • Microsoft, Oracle, and IBM charge big bucks for databases – and then you need to hire a database admin • Put yourself in the position of Google, Twitter, Facebook, and Amazon: – Your databases are among the biggest in the world, and nobody pays you for that feature – Wasted profit!!!
  8. What has NoSQL done? • Implemented the most common use cases as a piece of software • Designed for scalability and performance
  9. Visual Guide To NoSQL
  10. NoSQL Data Models • Key-Value • Document-Oriented • Column Oriented/Tabular
  12. NoSQL Data Model: Document Oriented • Data is stored as "documents" • We are not talking about Word documents • Comparable to aggregates in DDD • Mostly schema-free, structured data • Can be queried • Maps easily onto OO systems (domain model, DDD) • No joins: they have to be implemented in application code
  13. Network Communications • REST/JSON • TCP/BSON (client driver) BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
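To make "binary-encoded JSON" concrete, here is a minimal sketch that encodes a flat document by hand, following the layout in the BSON specification: an int32 total length, then one element per field (type byte, NUL-terminated key, value), then a closing 0x00 byte. The function name `encodeFlatDocument` is hypothetical and not a driver API; it assumes Node.js `Buffer` and supports only UTF-8 strings (type 0x02) and 32-bit integers (type 0x10).

```javascript
// Minimal BSON encoder sketch (hypothetical helper, not a driver API).
// Supports only flat documents with UTF-8 strings (0x02) and int32 (0x10).
function encodeFlatDocument(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // e_name is a C string
    if (typeof value === "string") {
      const str = Buffer.from(value, "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length + 1); // string byte length incl. trailing NUL
      parts.push(Buffer.from([0x02]), name, len, str, Buffer.from([0x00]));
    } else if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
      const int = Buffer.alloc(4);
      int.writeInt32LE(value); // little-endian, like all BSON integers
      parts.push(Buffer.from([0x10]), name, int);
    } else {
      throw new TypeError(`unsupported value for key "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1); // length prefix + elements + terminator
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const bytes = encodeFlatDocument({ hello: "world" });
console.log(bytes.length, bytes.toString("hex"));
// 22 160000000268656c6c6f0006000000776f726c640000
```

The 22-byte result matches the `{"hello": "world"}` example given in the BSON specification. Real drivers handle every BSON type (Date, BinData, ObjectId, nested documents, arrays), which this sketch deliberately omits.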
  14. Client Drivers (Apache License) • MongoDB currently has client support for the following programming languages: • C • C++ • Erlang • Haskell • Java • JavaScript • .NET (C#, F#, PowerShell, etc.) • Perl • PHP • Python • Ruby • Scala
  15. Collections (the counterpart of tables in SQL) vs. Capped Collections • Collections • blog.posts • blog.comments • forum.users • etc. • Capped collections (fixed-size ring buffer: once full, the oldest documents are overwritten) • Logging • Caching • Archiving db.createCollection("log", {capped: true, size: <bytes>, max: <docs>});
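The ring-buffer behaviour of a capped collection can be sketched in memory: documents stay in insertion order, and the oldest ones are evicted as soon as either the `max` document count or the `size` byte budget would be exceeded. This is a minimal sketch under stated assumptions: the class `CappedLog` is hypothetical, and document size is approximated by the UTF-8 length of the JSON text, not the real BSON size the server uses.

```javascript
// Sketch of capped-collection semantics (hypothetical class, not a MongoDB API).
// Byte size is approximated with JSON length here, NOT the real BSON size.
class CappedLog {
  constructor({ size, max }) {
    this.size = size; // byte budget (approximate)
    this.max = max;   // maximum number of documents
    this.docs = [];
    this.bytes = 0;
  }

  static byteLength(doc) {
    return Buffer.byteLength(JSON.stringify(doc), "utf8");
  }

  insert(doc) {
    const len = CappedLog.byteLength(doc);
    if (len > this.size) throw new RangeError("document larger than the collection");
    this.docs.push(doc);
    this.bytes += len;
    // Evict from the front (oldest first) until both limits hold again.
    while (this.docs.length > this.max || this.bytes > this.size) {
      const oldest = this.docs.shift();
      this.bytes -= CappedLog.byteLength(oldest);
    }
  }

  find() {
    return [...this.docs]; // natural order = insertion order
  }
}

const log = new CappedLog({ size: 1024, max: 3 });
for (let i = 1; i <= 5; i++) log.insert({ seq: i, msg: `event ${i}` });
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

Because eviction always removes the oldest entries, a capped collection never needs a separate cleanup job, which is why it suits logging and caching.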
  16. Indexes • Every field in the document can be indexed • Simple indexes: db.cities.ensureIndex({city: 1}); • Compound indexes: db.cities.ensureIndex({city: 1, zip: 1}); • Unique indexes: db.cities.ensureIndex({city: 1, zip: 1}, {unique: true}); • Sort order: 1 = ascending, -1 = descending • (ensureIndex is the older shell method; newer versions use db.cities.createIndex(...) with the same arguments)
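How a compound key pattern orders entries can be illustrated with a plain comparator: fields are compared left to right, and each field's direction (1 ascending, -1 descending) decides the order. A minimal sketch, assuming the indexed fields hold comparable values of one type; `compareByKeyPattern` and the sample cities are hypothetical. It relies on JavaScript preserving insertion order for non-numeric object keys, so the pattern's field order is kept.

```javascript
// Sketch: order documents as a compound index with this key pattern would.
// Hypothetical helper; assumes same-type, directly comparable field values.
function compareByKeyPattern(pattern) {
  const fields = Object.entries(pattern); // field order = key pattern order
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every indexed field
  };
}

const cities = [
  { city: "Bern", zip: 3011 },
  { city: "Basel", zip: 4051 },
  { city: "Bern", zip: 3027 },
];
cities.sort(compareByKeyPattern({ city: 1, zip: -1 }));
console.log(cities.map((c) => `${c.city} ${c.zip}`));
// [ 'Basel 4051', 'Bern 3027', 'Bern 3011' ]
```

The first field dominates: `zip` only breaks ties between documents with the same `city`, which is also why a compound index on `{city: 1, zip: 1}` can serve queries on `city` alone but not on `zip` alone.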
  17. Relations • Manual reference (ObjectId): db.users.insert( {name: "Umbert", car_id: ObjectId("<GUID>")}); • DBRef: db.users.insert( {name: "Umbert", car: new DBRef("cars", ObjectId("<GUID>"))}); db.users.findOne( {name: "Umbert"}).car.fetch().name;
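Since there are no server-side joins, a manual reference such as `car_id` is resolved in application code with a second lookup. A minimal in-memory sketch, assuming plain string ids in place of ObjectId values; the `cars`/`users` data and the `withCar` helper are hypothetical.

```javascript
// Sketch of resolving a manual reference in application code (client-side "join").
// Plain string ids stand in for ObjectId values (assumption).
const cars = [
  { _id: "c1", name: "Beetle" },
  { _id: "c2", name: "Golf" },
];
const users = [
  { name: "Umbert", car_id: "c2" },
  { name: "Anna", car_id: "c9" }, // dangling reference: no such car
];

// Index the referenced collection by _id once...
const carsById = new Map(cars.map((car) => [car._id, car]));

// ...then resolve each reference, like the extra query a driver would issue.
function withCar(user) {
  return { ...user, car: carsById.get(user.car_id) ?? null };
}

const resolved = users.map(withCar);
console.log(resolved.map((u) => `${u.name}: ${u.car ? u.car.name : "(missing)"}`));
// [ 'Umbert: Golf', 'Anna: (missing)' ]
```

Nothing enforces referential integrity here, so the application has to handle dangling references itself, as the `Anna` entry shows.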
  18. Queries (1)
  19. Queries (Regular Expressions) {field: /regular.*expression/i}
      // get all cities that start with "atl" and end with "a" (e.g. Atlanta)
      db.cities.count({city: /^atl.*a$/i});
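The shell uses ordinary JavaScript regular expressions, so the effect of anchoring can be checked locally. A short sketch with made-up city names: without `^` and `$`, a pattern only requires "atl" followed somewhere later by "a"; with them, the whole value must start with "atl" and end with "a".

```javascript
// Compare an unanchored and an anchored pattern on sample city names.
const names = ["Atlanta", "Atlantic City", "Seattle", "Santa Atlanta Springs"];

const unanchored = /atl.*a/i; // "atl" followed somewhere later by "a"
const anchored = /^atl.*a$/i; // starts with "atl" and ends with "a"

console.log(names.filter((n) => unanchored.test(n)));
// [ 'Atlanta', 'Atlantic City', 'Santa Atlanta Springs' ]
console.log(names.filter((n) => anchored.test(n)));
// [ 'Atlanta' ]
```

On the server side, anchoring also matters for performance: a case-sensitive pattern with a `^` prefix can use an index efficiently, while a case-insensitive one like `/i` generally cannot.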
  20. Queries (2): LINQ
      • Equals: x => x.Age == 21 translates to {"Age": 21}
      • Greater Than ($gt): x => x.Age > 18 translates to {"Age": {$gt: 18}}
      • Greater Than Or Equal ($gte): x => x.Age >= 18 translates to {"Age": {$gte: 18}}
      • Less Than ($lt): x => x.Age < 18 translates to {"Age": {$lt: 18}}
      • Less Than Or Equal ($lte): x => x.Age <= 18 translates to {"Age": {$lte: 18}}
      • Not Equal ($ne): x => x.Age != 18 translates to {"Age": {$ne: 18}}
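The translation table above is mechanical enough to express as a lookup: a comparison operator on one field becomes either a plain value (equality) or a nested operator document. A minimal sketch; `toQueryDocument` is a hypothetical helper, not part of the C# driver or any MongoDB library, and it only covers single-field comparisons.

```javascript
// Sketch of the LINQ-style translation table (hypothetical helper).
const OPERATORS = {
  "==": null, // equality is written as a plain value
  ">": "$gt",
  ">=": "$gte",
  "<": "$lt",
  "<=": "$lte",
  "!=": "$ne",
};

function toQueryDocument(field, op, value) {
  if (!Object.prototype.hasOwnProperty.call(OPERATORS, op)) {
    throw new Error(`unsupported operator ${op}`);
  }
  const mongoOp = OPERATORS[op];
  return { [field]: mongoOp === null ? value : { [mongoOp]: value } };
}

console.log(JSON.stringify(toQueryDocument("Age", ">=", 18))); // {"Age":{"$gte":18}}
console.log(JSON.stringify(toQueryDocument("Age", "==", 21))); // {"Age":21}
```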
  21. Atomic Operations (Optimistic Locking) • Update if current: • Fetch the object. • Modify the object locally. • Send an update request that says "update the object to this new value if it still matches its old value".
  22. Atomic Operations: Sample
      > t = db.inventory
      > s = t.findOne({sku: 'abc'})
      {"_id" : "49df4d3c9664d32c73ea865a" , "sku" : "abc" , "qty" : 1}
      > t.update({sku: "abc", qty: {$gt: 0}}, { $inc : { qty : -1 } } );
      > db.$cmd.findOne({getlasterror: 1})
      {"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it has worked
      > t.update({sku: "abcz", qty: {$gt: 0}}, { $inc : { qty : -1 } } );
      > db.$cmd.findOne({getlasterror: 1})
      {"err" : null , "updatedExisting" : false , "n" : 0 , "ok" : 1} // did not work: no document matched
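The "update if current" pattern can be sketched against an in-memory collection: the match condition and the modification happen in one synchronous step, standing in for the server applying a single-document update atomically. The class `InventorySketch` and its methods are hypothetical. `decrementIfInStock` mirrors the `{sku: ..., qty: {$gt: 0}}` update above, and `replaceIfCurrent` mirrors the fetch, modify locally, write-back-if-unchanged steps from the previous slide.

```javascript
// Sketch of optimistic "update if current" on an in-memory collection
// (hypothetical class; the results mimic the n/updatedExisting fields above).
class InventorySketch {
  constructor(docs) {
    this.docs = docs;
  }

  // Decrement qty of the first matching doc, but only if qty > 0.
  decrementIfInStock(sku) {
    const doc = this.docs.find((d) => d.sku === sku && d.qty > 0);
    if (!doc) return { n: 0, updatedExisting: false };
    doc.qty -= 1;
    return { n: 1, updatedExisting: true };
  }

  // Compare-and-set: write `next` only if the stored doc still equals `expected`.
  // JSON comparison assumes both objects list their keys in the same order.
  replaceIfCurrent(id, expected, next) {
    const i = this.docs.findIndex((d) => d._id === id);
    if (i === -1 || JSON.stringify(this.docs[i]) !== JSON.stringify(expected)) {
      return { n: 0, updatedExisting: false }; // changed meanwhile: re-fetch and retry
    }
    this.docs[i] = next;
    return { n: 1, updatedExisting: true };
  }
}

const t = new InventorySketch([{ _id: 1, sku: "abc", qty: 1 }]);
console.log(t.decrementIfInStock("abc"));  // { n: 1, updatedExisting: true }
console.log(t.decrementIfInStock("abcz")); // { n: 0, updatedExisting: false }
console.log(t.decrementIfInStock("abc"));  // { n: 0, updatedExisting: false } (qty is 0)

// Fetch, modify locally, then write back only if nobody changed it.
const snapshot = { ...t.docs[0] };
const modified = { ...snapshot, qty: snapshot.qty + 5 };
console.log(t.replaceIfCurrent(1, snapshot, modified)); // { n: 1, updatedExisting: true }
```

When the compare-and-set fails, the client re-fetches the document and retries, instead of holding a lock while it works.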
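  23. Atomic Operations: multiple items
      db.products.update(
        {cat: "boots", $atomic: 1},
        {$inc: {price: 10.0}},
        false, // no upsert
        true   // update multiple
      );
      ($atomic keeps other operations from interleaving with this multi-document update, but it is not all-or-nothing; it was later renamed $isolated and removed in MongoDB 4.0.)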
  24. Replica set (1) • Automatic failover • Automatic recovery of servers that were offline • Distribution across more than one data center • Automatic election of a new primary in case of a failure • Up to 7 servers in one replica set
  25. Replica set (2) [Diagram: replica set member states PRIMARY, DOWN, RECOVERING]
  26. Mongo Sharding • Partitioning data across multiple physical servers to provide application scale-out • Can distribute databases, collections or objects in a collection • Choose how you partition data (shardkey) • Balancing, migrations, management all automatic • Range based • Can convert from single master to sharded system with 0 downtime • Often works in conjunction with object replication (failover)
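Range-based partitioning means the shard-key space is split into contiguous chunks `[min, max)`, each owned by one shard; routing a query on the shard key is then a range lookup. A minimal sketch, assuming a numeric shard key; the chunk bounds and shard names are made up, and `-Infinity`/`Infinity` stand in for the `MinKey`/`MaxKey` bounds MongoDB uses.

```javascript
// Sketch of range-based sharding with made-up chunks on a numeric shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
]; // sorted, non-overlapping, covering the whole key space

// Binary search for the chunk whose [min, max) range contains the key.
function shardFor(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

console.log([42, 1000, 4999, 123456].map(shardFor));
// [ 'shardA', 'shardB', 'shardB', 'shardC' ]
```

In this model, balancing amounts to splitting a chunk and reassigning its `shard` field; the lookup logic stays the same.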
  27. Sharding-Cluster
  28. Map Reduce • A two-step computation: the map step processes each document and emits key/value pairs, and the reduce step combines all values emitted for the same key into a summary
  29. Map Reduce Sample
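The tag-counting example from the speaker notes (note 6) can be simulated in plain JavaScript to show the data flow: map runs once per document with `this` bound to it, emitted values are grouped by key, and reduce combines each group. A sketch under assumptions: `mapReduce` here is a hypothetical local function, `emit` is passed as an argument instead of being a shell global, and results are sorted by key only to make the output deterministic.

```javascript
// Plain-JavaScript simulation of map-reduce data flow (not the server's scheduler).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has only one emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "dog", "cat"] },
  { _id: 4, tags: [] },
];
const m = function (emit) {
  this.tags.forEach((z) => emit(z, { count: 1 }));
};
const r = function (key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
};

console.log(JSON.stringify(mapReduce(things, m, r)));
// [{"_id":"cat","value":{"count":3}},{"_id":"dog","value":{"count":2}},{"_id":"mouse","value":{"count":1}}]
```

The reduce function returns the same shape it receives (`{count: n}`), so it can be re-applied to partial results; MongoDB relies on this when it reduces a key in several passes.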
  30. Map Reduce using LINQ • LINQ is by far an easier way to compose map-reduce functions.
      // Compose a map reduce to get the sum of everyone's ages.
      var sum = collection.AsQueryable().Sum(x => x.Age);
      // Compose a map reduce to get the age range of everyone grouped by the first letter of their last name.
      var ageRanges = from p in collection.AsQueryable()
                      group p by p.LastName[0] into g
                      select new { FirstLetter = g.Key, AverageAge = g.Average(x => x.Age), MinAge = g.Min(x => x.Age), MaxAge = g.Max(x => x.Age) };
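To show what these two LINQ queries compute, here is a client-side equivalent over an in-memory array. It is only an illustration of the result shape, not how a LINQ provider executes the query; the sample people are invented, and the property names are kept from the C# code.

```javascript
// Client-side equivalent of the two LINQ queries (invented sample data).
const people = [
  { LastName: "Meier", Age: 34 },
  { LastName: "Muster", Age: 28 },
  { LastName: "Keller", Age: 41 },
];

// Sum of everyone's ages.
const sum = people.reduce((acc, p) => acc + p.Age, 0);

// Group ages by the first letter of the last name.
const groups = new Map();
for (const p of people) {
  const letter = p.LastName[0];
  if (!groups.has(letter)) groups.set(letter, []);
  groups.get(letter).push(p.Age);
}
const ageRanges = [...groups].map(([FirstLetter, ages]) => ({
  FirstLetter,
  AverageAge: ages.reduce((a, b) => a + b, 0) / ages.length,
  MinAge: Math.min(...ages),
  MaxAge: Math.max(...ages),
}));

console.log(sum); // 103
console.log(ageRanges);
```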
  31. Store large Files: GridFS • The database supports native storage of binary data within BSON objects (the BSON document size limit is 16 MB; it was 4 MB before version 1.8). • GridFS is a specification for storing larger files in MongoDB by splitting them into chunks • Comparable to the Amazon S3 online storage service when used in combination with replication and sharding
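GridFS stores a file as one metadata document (in `fs.files`) plus numbered chunk documents (in `fs.chunks`), each holding at most `chunkSize` bytes. A minimal sketch of that layout using plain objects instead of database writes; the helper names are hypothetical, and the 255 KiB default reflects current drivers (older versions used 256 KiB).

```javascript
// Sketch of the GridFS layout with plain objects (hypothetical helpers;
// a real driver writes these documents to fs.files and fs.chunks).
const DEFAULT_CHUNK_SIZE = 255 * 1024; // current GridFS default: 255 KiB

function toGridFsDocuments(filesId, filename, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0, offset = 0; offset < data.length; n++, offset += chunkSize) {
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: filesId, filename, length: data.length, chunkSize, uploadDate: new Date() };
  return { file, chunks };
}

// Reassemble the file by concatenating chunks in ascending n order.
function fromGridFsDocuments({ chunks }) {
  return Buffer.concat([...chunks].sort((a, b) => a.n - b.n).map((c) => c.data));
}

const payload = Buffer.alloc(600 * 1024, 7); // 600 KiB of dummy bytes
const { file, chunks } = toGridFsDocuments("f1", "video.bin", payload);
console.log(file.length, chunks.length, chunks.map((c) => c.data.length));
// 614400 3 [ 261120, 261120, 92160 ]
```

Because every chunk is an ordinary document, chunks are replicated and sharded like any other data, which is what makes the S3 comparison plausible.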
  32. Performance On MySQL, SourceForge was reaching the limits of its performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found that MongoDB could handle the current user load easily. In fact, after some testing, they found their site could now handle 100 times the number of users it currently supports. This means you can charge a lot less per user of your application and get the same revenue. Think about it.
  33. Performance • It’s the inserts where the differences are most obvious between MongoDB and SQL Server (about 30x-50x faster than SQL Server)
  34. Administration: MongoVUE (Windows)
  35. Administration: Monitoring • MongoDB Monitoring Service
  37. Use Cases: Well suited • Archiving and event logging • Document and content management systems • E-commerce • Gaming: high-performance small reads/writes, geospatial indexes • High-volume problems • Mobile: specifically, the server-side infrastructure of mobile systems • Projects using iterative/agile development methodologies • Real-time stats/analytics
  38. Use Cases: Less Well Suited • Systems with a heavy emphasis on complex transactions such as banking systems and accounting (multi-object transactions) • Traditional Non-Realtime Data Warehousing • Problems requiring SQL
  39. Questions?

Speaker Notes

  1. We are entering an age where data is live and hardware is cheap, and we need a new programming paradigm to access and process the data
  2. The new theory is based on the idea that RAM is the storage and the hard disk a backup, and that you keep tens, hundreds, if not thousands of servers in a LAN. In the end, this results in blazing-fast access times and incredible uptimes.
  3. Regex options: i = case-insensitive, m = multiline, x = extended mode
  4. No upsert: upsert controls whether the document should be created if it was not found.
  5. A sharded cluster consists of shards, usually made up of replica sets; config servers, which manage the cluster's metadata; and mongos processes, which act as routers.
  6. use techday
      db.things.insert( { _id: 1, tags: ['dog', 'cat'] } );
      db.things.insert( { _id: 2, tags: ['cat'] } );
      db.things.insert( { _id: 3, tags: ['mouse', 'dog', 'cat'] } );
      db.things.insert( { _id: 4, tags: [] } );
      // map function
      m = function(){
        this.tags.forEach( function(z){ emit(z, {count: 1} ); } );
      };
      // reduce function
      r = function(key, values){
        var total = 0;
        for (var i = 0; i < values.length; i++)
          total += values[i].count;
        return {count: total};
      };
      res = db.things.mapReduce(m, r, {out: {inline: 1}});
      res.find()
      res.drop()
  7. Like a poor man's Amazon S3
  8. With MySQL, SourceForge had reached the limit of the required performance at its current user load. They then replaced MySQL with MongoDB and, using the scale-out option, could easily handle the same workload. After some testing, they even found that they could now handle 100 times as many users. This means lower costs per user of the application at the same revenue!
  9. Show import from SQL Server
  10. Systems with a heavy emphasis on complex transactions