DMDW Extra Lesson - NoSql and MongoDB

STUDY AND TAKE OFF.
Author: Dipl.-Inf. (FH) Johannes Hoppe
Date: 06.05.2011

NoSQL and MongoDB

Trends: Data
- Facebook had 60k servers in 2010
- Google had 450k servers in 2006 (speculated)
- Microsoft: between 100k and 500k servers (since Azure)
- Amazon: likely has similar numbers, too (S3)
[Figure: Facebook server footprint]

Trends
- Trend 1: increasing data sizes
- Trend 2: more connectedness ("Web 2.0")
- Trend 3: more individualization (less structure)

NoSQL: Database paradigms
- Relational (RDBMS)
- NoSQL
  - Key-value stores
  - Document databases
  - Wide column stores (BigTable and clones)
  - Graph databases
  - Other

NoSQL: Some use cases
1. Massive data volumes: a massively distributed architecture is required to store the data (Google, Amazon, Yahoo, Facebook, ...)
2. Extreme query workload: joins cannot be done efficiently at that scale with an RDBMS
3. Schema evolution: schema flexibility (migration) is not trivial at large scale; with NoSQL, schema changes can be introduced gradually

NoSQL - CAP theorem
Requirements for distributed systems:
- Consistency
- Availability
- Partition tolerance

NoSQL - CAP theorem: Consistency
- The system is in a consistent state after an operation
- All clients see the same data
- Strong consistency (ACID) vs. eventual consistency (BASE)
  - ACID: Atomicity, Consistency, Isolation, Durability
  - BASE: Basically Available, Soft state, Eventually consistent

NoSQL - CAP theorem: Availability
- The system is "always on", no downtime
- Node failure tolerance: all clients can find some available replica
- Software/hardware upgrade tolerance

NoSQL - CAP theorem: Partition tolerance
- The system continues to function even when it is split into disconnected subsets (by a network disruption)
- Not only for reads, but for writes as well!

NoSQL - CAP theorem (E. Brewer, N. Lynch)
You can satisfy at most 2 of the 3 requirements.


NoSQL - CAP theorem: CA
- Single-site clusters (easier to ensure that all nodes are always in contact)
- When a partition occurs, the system blocks
- e.g. usable for two-phase commits (2PC), which already require/use blocking
Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to decide between consistency and availability.
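To make the CA case concrete, here is a minimal single-process sketch of two-phase commit in plain JavaScript (Node.js). The participant objects and their `reachable`/`canCommit` flags are hypothetical stand-ins for real nodes; no network is involved. When a participant cannot be reached, this sketch chooses to block, leaving the participants prepared so far waiting, rather than decide without it. Real 2PC implementations may also time out and abort; that is omitted here.

```javascript
// Minimal two-phase commit (2PC) sketch: one process, no real network.
// Participants are hypothetical objects: { reachable, canCommit, state }.
function twoPhaseCommit(participants) {
  // Phase 1 (prepare): ask every participant to vote.
  for (const p of participants) {
    if (!p.reachable) {
      // Partition: this node cannot vote. The sketch blocks instead of
      // deciding without it; nodes prepared so far keep waiting.
      return "blocked";
    }
    if (!p.canCommit) {
      // A single "no" vote aborts the transaction on all participants.
      participants.forEach((q) => { q.state = "aborted"; });
      return "aborted";
    }
    p.state = "prepared";
  }
  // Phase 2 (commit): every participant voted "yes".
  participants.forEach((p) => { p.state = "committed"; });
  return "committed";
}

const nodeA = { reachable: true, canCommit: true };
const nodeB = { reachable: false, canCommit: true };
console.log(twoPhaseCommit([nodeA, nodeB]), nodeA.state); // blocked prepared
```

Once `nodeB` becomes reachable again, rerunning the commit lets both nodes commit.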

NoSQL - CAP theorem: CP
- Some data may be inaccessible (availability is sacrificed), but the rest is still consistent/accurate
- e.g. a sharded database
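A minimal sketch of the CP behaviour in plain JavaScript (Node.js): each key is routed to exactly one shard by a toy string hash, and a shard cut off by a partition rejects reads and writes instead of serving stale data. `ShardedStore`, the hash function and the `up` flag are illustrative assumptions, not how MongoDB actually routes data between shards.

```javascript
// CP sketch: every key lives on exactly one shard.
// If that shard is unreachable, operations on its keys fail (availability
// is sacrificed), while keys on reachable shards stay consistent.
class ShardedStore {
  constructor(shardCount) {
    this.shards = Array.from({ length: shardCount }, () => ({ up: true, data: new Map() }));
  }

  // Toy string hash (illustrative only, not MongoDB's sharding scheme).
  shardFor(key) {
    let h = 0;
    for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    return this.shards[h % this.shards.length];
  }

  put(key, value) {
    const shard = this.shardFor(key);
    if (!shard.up) throw new Error(`shard for ${key} unavailable`);
    shard.data.set(key, value);
  }

  get(key) {
    const shard = this.shardFor(key);
    if (!shard.up) throw new Error(`shard for ${key} unavailable`);
    return shard.data.get(key);
  }
}

const store = new ShardedStore(2);
store.put("a", 1); // hash 97 -> shard 1
store.put("b", 2); // hash 98 -> shard 0
store.shardFor("a").up = false; // a partition cuts off the shard holding "a"
```

After the partition, reading "a" throws, while "b" is still served correctly.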

NoSQL - CAP theorem: AP
- The system is still available under partitioning, but some of the data returned may be inaccurate
- A conflict resolution strategy is needed
- e.g. master/slave replication
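The need for conflict resolution can be illustrated with last-write-wins, which is only one of several possible strategies. In this sketch both hypothetical `Replica` objects keep accepting writes while partitioned, so it is closer to multi-master than to the master/slave example above. The timestamps are logical values supplied by the caller, an assumption made to keep the example deterministic.

```javascript
// AP sketch: replicas accept writes during a partition and may diverge.
class Replica {
  constructor() { this.data = new Map(); } // key -> { value, ts }
  write(key, value, ts) { this.data.set(key, { value, ts }); }
  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
}

// After the partition heals: every key takes the entry with the newest
// timestamp ("last write wins"); on a tie, r1's entry is kept.
function mergeLastWriteWins(r1, r2) {
  const keys = new Set([...r1.data.keys(), ...r2.data.keys()]);
  for (const key of keys) {
    const a = r1.data.get(key);
    const b = r2.data.get(key);
    const winner = !a ? b : !b ? a : (b.ts > a.ts ? b : a);
    r1.data.set(key, winner);
    r2.data.set(key, winner);
  }
}

const r1 = new Replica();
const r2 = new Replica();
r1.write("x", "old", 1); // written on one side of the partition
r2.write("x", "new", 2); // written on the other side
console.log(r1.read("x")); // old (still available, but inaccurate)
mergeLastWriteWins(r1, r2);
```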

NoSQL: RDBMS
- Guarantee ACID by CA (two-phase commits)
- SQL
- Mature

NoSQL: NoSQL DBMS
- No relational tables
- No fixed table schemas
- No joins
- No risk, no fun!
- CP and AP (and sometimes even AP on top of CP → MongoDB*)
* This is damn cool!

NoSQL: Key-value
- One key → one value, very fast
- Key: hash (no duplicates)
- Value: binary object ("BLOB"); the DB does not understand your content
- Players: Amazon Dynamo, Memcached, ...

NoSQL: Key → value
Key: customer_22
Value: ?=PQ)“§VN? =§(Q$U%V§W=(BN W§(=BU&W§$()= W§$(=%
GIVE ME A MEANING!
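What "the DB does not understand your content" means in practice: in this sketch (plain Node.js, a `Map` standing in for the store and a `Buffer` for the BLOB), the store only supports put/get/delete by key, and turning the bytes back into a customer is left to the application. `KeyValueStore` is a hypothetical class, not the API of Dynamo or Memcached.

```javascript
// Key-value sketch: a unique key maps to an opaque byte buffer.
// The store never looks inside the value.
class KeyValueStore {
  constructor() { this.map = new Map(); } // Map keys are unique, like the hash key
  put(key, bytes) { this.map.set(key, Buffer.from(bytes)); }
  get(key) { return this.map.get(key); }
  delete(key) { return this.map.delete(key); }
}

const kv = new KeyValueStore();
// The application serializes the value; the store only sees bytes.
kv.put("customer_22", JSON.stringify({ Name: "Norbert", Invoiced: 2222 }));
// Giving the bytes a meaning is the application's job.
const customer = JSON.parse(kv.get("customer_22").toString("utf8"));
```

Asking the store for "all customers named Norbert" is impossible here; only the key can be looked up.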

NoSQL: Document databases
- Key-value store, too
- The value is "understood" by the DB
- Querying the data is possible (not just retrieving the key's content)
- Players: Amazon SimpleDB, CouchDB, MongoDB, ...

NoSQL: Key → value
Key: customer_22
Value: { Type: "Customer", Name: "Norbert", Invoiced: 2222 }

NoSQL: Key → value / documents
Key: customer_22
Value:
  {
    Type: "Customer",
    Name: "Norbert",
    Invoiced: 2222,
    Messages: [
      { Title: "Hello", Text: "World" },
      { Title: "Second", Text: "message" }
    ]
  }
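Because a document store understands the value, it can answer questions about fields, including fields inside nested arrays. The following in-memory sketch illustrates the idea with a hypothetical `findKeys` helper and a second, made-up customer document; it is not MongoDB's query API.

```javascript
// Document-store sketch: values are structured documents, so they can be
// queried by field, not only looked up by key.
const docs = new Map([
  ["customer_22", {
    Type: "Customer", Name: "Norbert", Invoiced: 2222,
    Messages: [{ Title: "Hello", Text: "World" }, { Title: "Second", Text: "message" }],
  }],
  // Hypothetical second document for the example.
  ["customer_23", { Type: "Customer", Name: "Hans", Invoiced: 50000, Messages: [] }],
]);

// Return the keys of all documents for which `predicate` returns true.
function findKeys(collection, predicate) {
  const result = [];
  for (const [key, doc] of collection) {
    if (predicate(doc)) result.push(key);
  }
  return result;
}

// "Which customers have a message titled 'Hello'?" looks into the nested array.
const withHello = findKeys(docs, (d) => d.Messages.some((m) => m.Title === "Hello"));
```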

NoSQL: (Wide) column stores
- Often referred to as "BigTable clones"
- Each key is associated with many attributes (columns)
- NoSQL column stores are actually hybrid row/column stores, different from "pure" relational column stores!
- Players: Google BigTable, Cassandra (Facebook), HBase, ...

NoSQL
Won't be stored as (row by row):
  22;Norbert;22222
  23;Hans;50000
  24;Franz;44000
It will be stored as (column by column):
  22;23;24
  Norbert;Hans;Franz
  22222;50000;44000
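The two layouts above are transposes of each other. This short sketch converts the row records into the column lines shown on the slide. It also hints at why the column layout suits analytics: summing one attribute only touches one line. The function name `toColumns` is illustrative.

```javascript
// Row layout: one record per line.
const rows = [
  [22, "Norbert", 22222],
  [23, "Hans", 50000],
  [24, "Franz", 44000],
];

// Column layout: one line per attribute, i.e. the transpose of the rows.
function toColumns(rowList) {
  return rowList[0].map((_, col) => rowList.map((row) => row[col]));
}

const columns = toColumns(rows);
// Serialize each column as on the slide: values joined by ';'.
const columnLines = columns.map((c) => c.join(";"));
// Aggregating one attribute reads a single column only.
const total = columns[2].reduce((sum, v) => sum + v, 0);
```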

NoSQL: Graph databases
- Multi-relational graphs
- SPARQL query language (W3C Recommendation!)
- Players: Neo4j, InfoGrid, ...
(Note: graph DBs are special and somewhat the "black sheep" of the NoSQL world; the following PROs/CONs don't apply very well.)
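A multi-relational graph is a set of nodes joined by edges that carry a relationship label. The sketch below keeps such edges in a plain array (hypothetical data) and finds everything reachable from a node along one relationship type with a breadth-first search. Real graph databases use their own storage engines and query languages; this only illustrates the data model.

```javascript
// Multi-relational graph sketch: labelled edges between nodes.
const edges = [
  { from: "Norbert", label: "knows", to: "Hans" },
  { from: "Hans", label: "knows", to: "Franz" },
  { from: "Norbert", label: "buys", to: "Book" },
];

// Breadth-first search that follows only edges with the given label.
function reachable(start, label) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const e of edges) {
      if (e.from === node && e.label === label && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push(e.to);
      }
    }
  }
  seen.delete(start); // report only the other nodes
  return [...seen];
}

const friendsOfFriends = reachable("Norbert", "knows");
```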

NoSQL: PROs (& promises)
- Schema-free / semi-structured data
- Massive data stores
- Scaling is easy
- Very, very high availability
- Often simpler to implement (and O/R mappers aren't required)
- "Web 2.0 ready"

NoSQL: CONs
- NoSQL implementations are often "alpha"; no standards
- Data consistency: no transactions
- Insufficient access control
- SQL is strong for dynamic, cross-table queries (JOIN)
- Relationships aren't enforced (convention over constraints, except for graph DBs, of course)
- Premature optimization: scalability (don't build for scalability if you never need it!)

NoSQL: Let's rock!
MongoDB Quick Reference Cards: http://www.10gen.com/reference

Basic Deployment
- Create the default data directory c:\data\db
- Start mongod.exe
  Optionally: mongod.exe --dbpath c:\data\db --port 27017 --logpath c:\data\mongodb.log
- Start the shell: mongo.exe

Data Import
  cd c:\dba-training-data\data
  mongoimport -d twitter -c tweets twitter.json
  cd c:\dba-training-data\data\dump\training
  mongorestore -d training -c scores scores.bson
  cd c:\dba-training-data\data\dump
  mongorestore -d digg digg

MongoDB Documents (in the shell)
  use digg
  db.stories.findOne();

JSON → BSON
All JSON documents are stored in a binary format called BSON. BSON supports a richer set of types than JSON.
http://bsonspec.org

CRUD – Create (in the shell)
  db.people.save({name: 'Smith', age: 30});
To see how the save command works, type its name without parentheses; the shell prints the function's source:
  db.foo.save
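To clarify what `save` does, here is an in-memory simulation of the shell helper's behaviour: a document without `_id` is inserted and receives a fresh `_id`; a document with an `_id` replaces the stored document with that `_id` (or is inserted if none exists). The integer counter is an assumption standing in for MongoDB's ObjectId, and `save` here is a plain function, not the shell method.

```javascript
// Simulated save(): insert when there is no _id, otherwise replace by _id.
let nextId = 1; // stands in for ObjectId generation (assumption)

function save(collection, doc) {
  if (doc._id === undefined) {
    doc._id = nextId++; // new document: assign an _id
  }
  collection.set(doc._id, doc); // Map keyed by _id: replaces any previous version
  return doc._id;
}

const people = new Map();
const id = save(people, { name: "Smith", age: 30 });
save(people, { _id: id, name: "Smith", age: 31 }); // same _id: replaces the document
```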

CRUD – Create
How training.scores was created:
  for (var i = 0; i < 1000; i++) {
    ['quiz', 'essay', 'exam'].forEach(function(name) {
      var score = Math.floor(Math.random() * 50) + 50;
      db.scores.save({student: i, name: name, score: score});
    });
  }
  db.scores.count();

CRUD – Read
Queries are specified using a document-style syntax!
  use training
  db.scores.find({score: 50});
  db.scores.find({score: {"$gte": 70}});
find() returns a cursor!
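A query document reads either as "field equals value" or as "field satisfies these operators". The sketch below re-implements that reading for a handful of comparison operators on an in-memory array, purely as an illustration: `matches` and `OPS` are hypothetical names, and MongoDB itself supports many more operators and evaluates queries on the server. The sample scores are made up but stay in the 50–99 range produced by the generator script.

```javascript
// Hypothetical mini-matcher for a few MongoDB-style query operators.
const OPS = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $in: (v, arg) => arg.includes(v),
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const isOperatorDoc = cond !== null && typeof cond === "object" && !Array.isArray(cond);
    if (!isOperatorDoc) return doc[field] === cond; // {score: 50}: plain equality
    // {score: {$gte: 70}}: every listed operator must hold
    return Object.entries(cond).every(([op, arg]) => {
      if (!Object.prototype.hasOwnProperty.call(OPS, op)) {
        throw new Error("unsupported operator " + op);
      }
      return OPS[op](doc[field], arg);
    });
  });
}

const scores = [
  { student: 0, name: "quiz", score: 50 },
  { student: 0, name: "exam", score: 72 },
  { student: 1, name: "essay", score: 91 },
];
const highScores = scores.filter((d) => matches(d, { score: { $gte: 70 } }));
```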

Exercises
- Find all scores less than 65.
- Find the lowest quiz score.
- Find the highest quiz score.
- Write a query to find all digg stories where the view count is greater than 1000.
- Query for all digg stories whose media type is either 'news' or 'images' and where the topic name is 'Comedy'. (For extra practice, construct two queries using different sets of operators to do this.)
- Find all digg stories where the topic name is 'Television' or the media type is 'videos'. Skip the first 5 results, and limit the result set to 10.

CRUD – Update
  use digg;
  db.people.update({name: 'Smith'}, {'$set': {interests: []}});
  db.people.update({name: 'Smith'}, {'$push': {interests: 'chess'}});
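The difference between `$set` and `$push` is easy to see on a plain object. This sketch applies both operators to an in-memory document. `applyUpdate` is a hypothetical helper, and unlike MongoDB it does not reject `$push` on a field that exists but is not an array. It also shows why pushing `['chess']` would store a nested array instead of the string 'chess'.

```javascript
// Hypothetical in-memory versions of the $set and $push update operators.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      if (op === "$set") {
        doc[field] = value; // overwrite (or create) the field
      } else if (op === "$push") {
        if (doc[field] === undefined) doc[field] = [];
        doc[field].push(value); // append the value as ONE new element
      } else {
        throw new Error("unsupported operator " + op);
      }
    }
  }
  return doc;
}

const smith = { name: "Smith", age: 30 };
applyUpdate(smith, { $set: { interests: [] } });
applyUpdate(smith, { $push: { interests: "chess" } });
// smith.interests is now ["chess"]
```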

Exercises
- Set the proper 'grade' attribute for all scores. For example, users with scores greater than 90 get an 'A'. Set the grade to 'B' for scores falling between 80 and 90.
- You're being nice, so you decide to add 10 points to every score on every "final" exam whose score is lower than 60. How do you do this update?

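CRUD – Delete
  db.dropDatabase();   // drops the current database
  db.foo.drop();       // drops the collection foo
  db.foo.remove({});   // removes all documents from foo, but keeps the collection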

"Map Reduce is the Uzi of aggregation tools. Everything described with count, distinct and group can be done with MapReduce, and more."
Kristina Chodorow, Michael Dirolf in MongoDB: The Definitive Guide

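MapReduce
To use map-reduce, you first write a map function. It emits one value per story, keyed by the user's name:
  map = function() {
    emit(this.user.name, {diggs: this.diggs, posts: 1});
  }

MapReduce
The reduce function then aggregates those documents by key. It must return a value of the same shape as the emitted values, because MongoDB may call it again on its own output and skips it entirely for keys with a single value; therefore it sums the posts fields instead of counting values:
  reduce = function(key, values) {
    var diggs = 0;
    var posts = 0;
    values.forEach(function(doc) {
      diggs += doc.diggs;
      posts += doc.posts;
    });
    return {diggs: diggs, posts: posts};
  }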

MapReduce
Now both are used to perform custom aggregation:
  db.stories.mapReduce(map, reduce, {out: 'digg_users'});
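To see the whole flow end to end without a server, here is an in-memory simulation in plain JavaScript. It mimics three documented aspects of MongoDB's mapReduce: `map` runs with `this` bound to each document, emitted values are grouped by key, and `reduce` is only called for keys with more than one value. Two differences to note: `emit` is passed in as a parameter here (MongoDB provides it as a global), and the story documents are hypothetical. Because this `map` emits `posts: 1` and `reduce` sums the `posts` fields, the result is correct even when a key has a single value or `reduce` is applied again to its own output.

```javascript
// In-memory simulation of the map/reduce flow (not MongoDB's engine).
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  const out = {};
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce for keys with a single emitted value.
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

const map = function (emit) {
  emit(this.user.name, { diggs: this.diggs, posts: 1 });
};

const reduce = function (key, values) {
  let diggs = 0;
  let posts = 0;
  values.forEach((doc) => { diggs += doc.diggs; posts += doc.posts; });
  return { diggs, posts };
};

// Hypothetical stories.
const stories = [
  { user: { name: "alice" }, diggs: 10 },
  { user: { name: "alice" }, diggs: 5 },
  { user: { name: "bob" }, diggs: 7 },
];
const diggUsers = mapReduce(stories, map, reduce);
```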

THANK YOU FOR YOUR ATTENTION
