Dr. Pouria Amirian
June 2014
Big Data Project Manager and Data Scientist
University of Oxford
Pouria.Amirian@ndm.ox.ac.uk; Pouria.Amirian@gmail.com
@pouriaamirian
• “By 2015, 4.4 million IT jobs globally will be created to support Big Data.
  But there is a challenge. There is not enough talent in the industry.
  Our public and private education systems are failing us.
  Therefore only one-third of the IT jobs will be filled.
  These jobs are the future of the new information economy.”
• Three major areas of demand in computer science and IT:
  • Big Data, mobile, and social computing
    (the foundation of these three topics is cloud computing)
• SQL
• Advantages and Disadvantages
• NoSQL
• History
• Common Traits
• Categories
• Examples
• Trends
Relational Databases
(Figure labels: Table, Row, Column, Key)
• Keys
  • Single/multi-column key
• Operations on tables:
  • select, join (SQL)
• Relationship on key
  • Primary key
  • Foreign key
• Proven and well known, with available talent
• Many programmers are already familiar with it.
• Transactions and ACID make development easy.
• Lots of tools to use.
• Scalable
• Free and commercial production support
• SQL (a general, high-level query language)
• Create a database for the posts of a weblog
• Each post is authored by a user
• Each post can have multiple comments from other users
• Users can vote for a post (stars 0-5)
• Users can like comments
• Posts have a date, comments have a date
How can I cast an object to an interface in C#?
I have to work with a COM-based system, and the only way to
work with the system is through interfaces. The problem
is that when I worked in VB 6.0 the compiler could automatically
cast any object to an interface. However, since C# is more
type-safe, this is not done automatically. So how can I
convert an object to an interface in C#?
Joe, “2011-07-26”
Tags: C#, Cast, Interface
James, “2011-07-26”
use the cast operator of C#
Ana, “2011-07-27”
you can use the ‘as’ keyword, look at the following code:
IInterface myInterface = myObj as IInterface;
What are the posts by “Joe”? How many stars did they get?
What are the comments written by “James”?
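The two questions above can be answered with joins in a relational schema. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (users, posts, comments, votes) are assumptions made for illustration and do not come from the slides, and the "likes on comments" requirement is left out to keep it short.

```python
import sqlite3

# Hypothetical schema for the weblog example; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE posts    (id INTEGER PRIMARY KEY,
                       author_id INTEGER REFERENCES users(id),
                       title TEXT, created TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       post_id INTEGER REFERENCES posts(id),
                       author_id INTEGER REFERENCES users(id),
                       body TEXT, created TEXT);
CREATE TABLE votes    (post_id INTEGER REFERENCES posts(id),
                       user_id INTEGER REFERENCES users(id),
                       stars INTEGER CHECK (stars BETWEEN 0 AND 5),
                       PRIMARY KEY (post_id, user_id));
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Joe"), (2, "James"), (3, "Ana"), (4, "John")])
conn.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
             (1, 1, "How can I cast an object to an interface in C#?", "2011-07-26"))
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "use the cast operator of C#", "2011-07-26"),
    (2, 1, 3, "you can use the 'as' keyword", "2011-07-27"),
])
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", [(1, 2, 4), (1, 4, 5)])

# Posts by Joe and the total number of stars each one received.
posts_by_joe = conn.execute("""
    SELECT p.title, COALESCE(SUM(v.stars), 0)
    FROM posts p
    JOIN users u ON u.id = p.author_id
    LEFT JOIN votes v ON v.post_id = p.id
    WHERE u.name = ?
    GROUP BY p.id
""", ("Joe",)).fetchall()

# Comments written by James.
comments_by_james = conn.execute("""
    SELECT c.body
    FROM comments c
    JOIN users u ON u.id = c.author_id
    WHERE u.name = ?
""", ("James",)).fetchall()

print(posts_by_joe)
print(comments_by_james)
```

Each question needs at least one join, because the author name, the votes, and the comments live in separate tables.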
{
  "_id" : ObjectId("4e2e3f92268cdda473b628f6"),
  "title" : "How can I cast an Object to an Interface in C#?",
  "when" : new Date("2011-07-26"),
  "author" : "joe",
  "text" : "I have to work with COM-based system and the only
            way to work with the system is to work with interfaces. the
            problem is ….",
  "tags" : ["C#", "Cast", "Interface"],
  "voters" : [["James", "11-07-26", 4], ["John", "11-07-26", 5]],
  "comments" : [
    {"by" : "James", "text" : "use the cast operator of C#",
     "when" : "11-07-26"},
    {"by" : "Ana", "text" : "you can use the ‘as’ keyword …",
     "when" : "11-07-27"}]
}
// posts by joe, newest first
db.posts.find({"author" : "joe"}).sort({"when" : -1})
// posts that have a comment written by James
db.posts.find({"comments.by" : "James"})
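To make the dotted-path lookup in the queries above concrete, here is a small sketch in plain Python that treats the post as a nested dictionary. The helpers `values_at` and `find` are hypothetical and written only for this illustration; they are not part of any MongoDB driver, and the `_id` is stored as a plain string.

```python
# The post from the slide as a nested Python dict (a "document").
post = {
    "_id": "4e2e3f92268cdda473b628f6",
    "title": "How can I cast an Object to an Interface in C#?",
    "when": "2011-07-26",
    "author": "joe",
    "tags": ["C#", "Cast", "Interface"],
    "voters": [["James", "11-07-26", 4], ["John", "11-07-26", 5]],
    "comments": [
        {"by": "James", "text": "use the cast operator of C#", "when": "11-07-26"},
        {"by": "Ana", "text": "you can use the 'as' keyword", "when": "11-07-27"},
    ],
}
posts = [post]


def values_at(doc, path):
    """Yield every value found at a dotted path, descending into lists."""
    head, _, rest = path.partition(".")
    items = doc if isinstance(doc, list) else [doc]
    for item in items:
        if isinstance(item, dict) and head in item:
            value = item[head]
            if rest:
                yield from values_at(value, rest)
            elif isinstance(value, list):
                yield from value
            else:
                yield value


def find(collection, path, wanted):
    """Return the documents in which some value at `path` equals `wanted`."""
    return [doc for doc in collection if wanted in values_at(doc, path)]


posts_by_joe = find(posts, "author", "joe")
commented_by_james = find(posts, "comments.by", "James")
total_stars = sum(vote[2] for vote in post["voters"])

print(len(posts_by_joe), len(commented_by_james), total_stars)
```

Because comments and votes are embedded in the post, both questions are answered from a single document without a join.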
• Rigid schema design
• Hard to scale (very limited scalability)
• Hard and complex joins across multiple nodes
• Hard to handle data growth (schema change, high volume of data, high volume of transactions, …)
• Need for an interface for data access (another layer of complexity)
• Impedance mismatch
  • Mapping between relational storage and object-based computing
    (object-relational mapping does not work very well)
• Relational databases are no longer one-size-fits-all
• Examples
  • Content management systems
  • Network data (social networking, location-based applications)
  • Spatial data management systems
  • High frequency of change (huge amount of reads and writes)
Data model → type of data store
• Tuples (rows) → Relational DBMS (SQL)
• Key/Value pairs → Key/Value databases (NoSQL)
• Documents → Document data stores (NoSQL)
• Columns → Column-family stores (NoSQL)
• Graphs → Graph databases (NoSQL)
• The needs of modern applications do not always match what relational databases provide.
• Success stories of Big Data management at internet giants such as Google, Amazon, Facebook, LinkedIn, …
• These companies faced unique challenges and developed their own custom solutions
• The Google File System, October 2003
• MapReduce, December 2004
• BigTable, November 2006
• …
Google’s massively scalable infrastructure for:
• Google Search Engine
• Google Maps and Google Earth
• Gmail, …
• Open source developers have tried to replicate each piece of Google’s technology stack
• The Hadoop project and its subprojects were born at Yahoo!
Google infrastructure → Hadoop universe
• Google File System (GFS) → Hadoop Distributed File System (HDFS)
• MapReduce → Hadoop MapReduce
• BigTable → HBase
• Dynamo: Amazon’s Highly Available Key/Value Store, 2007
• Then use cases from eBay, Facebook, Netflix, Yahoo, IBM, …
2004 BigTable (Google; development started, paper published in 2006)
2007 Dynamo (Amazon)
2008 Cassandra (Facebook)
In 2009, in San Francisco, Eric Evans proposed the name “NoSQL” to
describe the growing non-relational movement.
In 1998, Carlo Strozzi used the word “NoSQL” to describe a relational database
that did not expose a SQL interface.
• Not based on the relational model
• Flexible schema
• Supports distributed database architectures
• Provides high scalability, high availability, and fault tolerance
• Supports very large amounts of sparse data
• Geared toward performance rather than consistency
• Examples
Key/Value store (figure): K1 → V1, K2 → V2, K3 → V2
• Memcached – key/value store.
• Membase – Memcached with persistence and improved consistent hashing.
• AppFabric Cache – multi-region cache.
• Redis – data structure server.
• Riak – based on Amazon’s Dynamo.
• Project Voldemort – eventually consistent key/value store, auto scaling.
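Consistent hashing, mentioned above for Membase, is a common way for key/value stores to spread keys over nodes so that adding a node moves only a fraction of the keys. The sketch below is a generic illustration and not the implementation of any listed product; the class `HashRing`, the node names, and the choice of SHA-256 with 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted positions on the ring
        self._owner = {}    # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # A key belongs to the first node position clockwise from its hash.
        idx = bisect.bisect(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"session:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add("cache-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]

print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

With a plain `hash(key) % number_of_nodes` scheme, going from three to four nodes would relocate most keys; on the ring, only keys that now fall on the new node's positions move.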
• Schema free.
• Usually a JSON-like interchange model.
• Query model: JavaScript or custom.
• Aggregations: Map/Reduce.
• Indexes are done via B-Trees.
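As an illustration of the Map/Reduce aggregation style listed above, the following sketch counts how often each tag is used across a small collection of posts. It runs in plain Python in a single process; the function names and the second post are made up for the example, and a real document database would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict

# Two small documents; the second one is invented for the example.
posts = [
    {"title": "How can I cast an Object to an Interface in C#?",
     "tags": ["C#", "Cast", "Interface"]},
    {"title": "Hypothetical second post", "tags": ["C#", "LINQ"]},
]


def map_post(post):
    """Map step: emit one (tag, 1) pair per tag of a post."""
    return [(tag, 1) for tag in post["tags"]]


def reduce_counts(tag, counts):
    """Reduce step: combine all values emitted for one key."""
    return sum(counts)


def map_reduce(docs, mapper, reducer):
    # Group the emitted values by key, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


tag_counts = map_reduce(posts, map_post, reduce_counts)
print(tag_counts)
```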
{
  "_id" : ObjectId("4e2e3f92268cdda473b628f6"),
  "title" : "How can I cast an Object to an Interface in C#?",
  "when" : new Date("2011-07-26"),
  "author" : "joe",
  "text" : "I have to work with COM-based system and the only
            way to work with the system is to work with interfaces. the
            problem is ….",
  "tags" : ["C#", "Cast", "Interface"],
  "voters" : [["James", "11-07-26", 4], ["John", "11-07-26", 5]],
  "comments" : [
    {"by" : "James", "text" : "use the cast operator of C#",
     "when" : "11-07-26"},
    {"by" : "Ana", "text" : "you can use the ‘as’ keyword …",
     "when" : "11-07-27"}]
}
Row oriented (Relational)
Id  Username  Email         Department
1   John      john@foo.com  Sales
2   Mary      mary@foo.com  Marketing
3   Yoda      yoda@foo.com  IT

Column oriented
Id:          1, 2, 3
Username:    John, Mary, Yoda
Email:       john@foo.com, mary@foo.com, yoda@foo.com
Department:  Sales, Marketing, IT
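The difference between the two layouts can be sketched in a few lines of Python. This is only an in-memory illustration of how the same table is arranged, not how any particular column store lays data out on disk; the helper `to_columns` is hypothetical.

```python
# Row-oriented layout: one record per row, all fields stored together.
rows = [
    {"id": 1, "username": "John", "email": "john@foo.com", "department": "Sales"},
    {"id": 2, "username": "Mary", "email": "mary@foo.com", "department": "Marketing"},
    {"id": 3, "username": "Yoda", "email": "yoda@foo.com", "department": "IT"},
]


def to_columns(records):
    """Pivot a list of rows into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field stored together.
columns = to_columns(rows)

# Reading a whole record is natural in the row layout ...
mary = rows[1]
# ... while scanning a single attribute only touches one list in the column layout.
departments = columns["department"]

print(mary)
print(departments)
```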
• Based on graph theory.
• Scale vertically, no clustering.
• You can use graph algorithms easily.
Social Network
• Relational model
• Who are Bob’s friends?
• Find all friends of Alice’s friends
• In a sample social network containing 1,000,000 nodes (people), each with approximately 50 edges (relationships):

Depth  RDBMS time (s)  Graph time (s)  Returned records
2      0.016           0.01            ~2,500
3      30.267          0.168           ~110,000
4      1543.505        1.359           ~600,000
5      Unfinished      2.132           ~800,000
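The "friends of friends" query at a given depth is a breadth-first traversal when the data is held as a graph. The sketch below uses a tiny adjacency list in plain Python; Alice and Bob come from the slides, while the other people, the friendships, and the function `within_depth` are invented for the example. It does not reproduce the benchmark above.

```python
from collections import deque

# Hypothetical social graph as an adjacency list.
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave", "Eve"],
    "Dave": ["Bob", "Carol"],
    "Eve": ["Carol"],
}


def within_depth(graph, start, max_depth):
    """Breadth-first search: {person: hops} for everyone within max_depth hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance


reachable = within_depth(friends, "Alice", 2)
direct_friends = sorted(p for p, hops in reachable.items() if hops == 1)
friends_of_friends = sorted(p for p, hops in reachable.items() if hops == 2)

print(direct_friends)
print(friends_of_friends)
```

Each extra level of depth is one more round of following edges, whereas in a relational model it typically means one more self-join on the friendship table.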
1- Non-relational
• No tables
• No joins
• No ACID transactions *
• No support for SQL *
• *: a few NoSQL databases support ACID and SQL
2- Schema Free
• In a data collection:
  • There can be records with completely different data items (fields)
    ▪ Book 1 {name, publicationYear}
    ▪ Book 2 {author, publisher}
• The schema is in:
  • the data itself (e.g., JSON), or
  • usually in the application, not in the database
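A short sketch of what "the schema is in the application" can look like: the collection accepts records with different fields, and the code that reads them decides which fields it needs. The field values and the helpers `fields_in_use` and `missing_fields` are made up for the illustration.

```python
# One collection, two records with completely different fields.
books = [
    {"name": "Book 1", "publicationYear": 2012},          # values are invented
    {"author": "A. Writer", "publisher": "Some Press"},   # values are invented
]


def fields_in_use(collection):
    """The effective schema is whatever fields the records happen to carry."""
    seen = set()
    for record in collection:
        seen.update(record)
    return seen


def missing_fields(record, required):
    """Application-side check: which required fields does a record lack?"""
    return [field for field in required if field not in record]


print(sorted(fields_in_use(books)))
for book in books:
    print(missing_fields(book, ["name", "publicationYear"]))
```

The database stores both records without complaint; whether the second one is usable is a decision made in application code.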
3- Horizontal Scalability
• Vertical (scale up)
• Horizontal (scale out)
4- Web Scale Applications:
• Simple requests (the underlying database seems to be unsophisticated)
• However:
  • Sheer volume of data
  • Huge number of users (millions of users)
5- Open source, but from large internet companies:
• Google
• Facebook
• Twitter
• LinkedIn
• Yahoo
Volume
• Huge amounts of data collected and generated by organizations or individuals
• Need for huge amounts of storage and processing power
Velocity
• Frequency at which data is generated, captured, shared, and processed
• Need for real-time retrieval and processing of data for a large number of users
Variety
• Many formats, structures, and sources
• Need for new types of storage and processing for structured and unstructured data
• Big Data: many different types of tools, techniques,
  technologies, algorithms, and computation models for the
  collection, generation, storage, management, analysis,
  and visualization of high-volume (in size), high-velocity
  (of change), and high-variety (in nature) data sets.
• Management
• Processing
• Also known as Brewer’s theorem, after Prof. Eric Brewer of the
  University of California, Berkeley, who presented it in 2000.
• “Of three properties of a shared data system: data
  consistency, system availability and tolerance to
  network partitions, only two can be achieved at any
  given moment.”
• Proven by Seth Gilbert and Nancy Lynch (MIT) in 2002.
• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• Consistency: all clients have the same view of the data
• Availability: each client can always read and write data
• Partition tolerance: the system works well despite physical network partitions
• The “CAP theorem” says a database may excel at only two of the CAP attributes
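• ACID (Atomicity, Consistency, Isolation, Durability)
// Pseudocode: the four operations succeed or fail as one unit
try {
    Transaction.Begin();
    insert(data1);
    update(data2);
    insert(data3);
    delete(data4);
    Transaction.Commit();    // make all changes permanent
}
catch (Exception) {
    Transaction.Rollback();  // undo every change made since Begin()
}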
• Atomicity: all or nothing.
• Consistency: the data is always in a consistent state.
• Isolation: transactions are isolated from each other.
• Durability: once the transaction is committed, its state will be durable.
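Atomicity can be observed directly with Python's built-in sqlite3 module. In the sketch below, a transfer that would make a balance negative violates a CHECK constraint, and the whole transaction, including the credit that had already been applied, is rolled back. The accounts table, the names, and the amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))

print(ok, failed, balances)
```

In the failed transfer the credit to bob is executed first, so the unchanged balance afterwards shows that the rollback undid it.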
Any data store can achieve atomicity, isolation, and
durability, but do you always need consistency? No.
By giving up some ACID properties, one can achieve higher
performance and scalability.
• CAP in SQL databases: CA (not distributed) or CP (distributed, but not always available)
• ACID is guaranteed
• The DBMS keeps users waiting (in order to propagate all the changes to all nodes)
• CAP in NoSQL databases: AP or CP
• The DBMS guarantees consistency eventually, but meanwhile it gives
  control back to the application (no waiting for users)
• The NoSQL database does not commit the changes right away (it buffers them)
• The data will be eventually consistent
• BASE: an acronym contrived to be the opposite of ACID
  • Basically Available,
  • Soft state,
  • Eventually consistent
• Basically Available
  • Parts of the system may fail, but never the whole system
• Soft state
  • Copies of a data item may be inconsistent
• Eventual consistency
  • When no updates occur for a long period of time, eventually all
    updates will propagate through the system and all the nodes will
    be consistent
  • Copies become consistent at some later time if there are no
    more updates to that data item
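Eventual consistency can be simulated with a few in-memory replicas. In the sketch below, writes land on individual replicas, so reads disagree for a while (soft state); once updates stop and the replicas exchange their state, they converge. The class `Replica`, the last-write-wins rule based on a timestamp, and the `anti_entropy` function are assumptions chosen for the illustration; real systems use a variety of reconciliation strategies.

```python
class Replica:
    """Hypothetical replica that keeps the newest version of each key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last write wins: keep the value with the highest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Every replica pushes its state to every other replica."""
    for source in replicas:
        for target in replicas:
            if source is not target:
                for key, (timestamp, value) in source.data.items():
                    target.write(key, value, timestamp)


a, b, c = Replica("a"), Replica("b"), Replica("c")

# Two updates reach different replicas; nothing has been propagated yet.
a.write("cart:42", ["book"], timestamp=1)
b.write("cart:42", ["book", "pen"], timestamp=2)
stale_reads = [replica.read("cart:42") for replica in (a, b, c)]

# No more updates: after one exchange round all copies agree.
anti_entropy([a, b, c])
converged_reads = [replica.read("cart:42") for replica in (a, b, c)]

print(stale_reads)
print(converged_reads)
```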
ACID:
• Strong consistency.
• Less availability.
• Pessimistic concurrency.
• Complex.
BASE:
• Availability is the most important thing; willing to
  sacrifice consistency for it (CAP).
• Weaker consistency (eventual).
• Simple and fast.
• Optimistic concurrency.
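Optimistic concurrency means that writers do not take locks; instead, each write states which version it was based on and is rejected if the data has changed in the meantime. The following is a minimal single-process sketch; `VersionedStore`, `VersionConflict`, and the stock example are hypothetical and not taken from any specific database.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version."""


class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def read(self, key):
        return self._items.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._items.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._items[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("stock:book", 10, expected_version=0)

# Two clients read the same version without locking anything.
v1, qty1 = store.read("stock:book")
v2, qty2 = store.read("stock:book")

# Client 1 writes first and succeeds.
store.write("stock:book", qty1 - 1, expected_version=v1)

# Client 2's write is based on a stale version, so it must re-read and retry.
try:
    store.write("stock:book", qty2 - 1, expected_version=v2)
    conflict = False
except VersionConflict:
    conflict = True
    v2, qty2 = store.read("stock:book")
    store.write("stock:book", qty2 - 1, expected_version=v2)

final = store.read("stock:book")
print(conflict, final)
```

A pessimistic scheme would instead make the second client wait on a lock before reading, which avoids the retry but reduces availability under contention.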
• Massive write performance
• Fast key/value lookups
• No single point of failure
• Fast prototyping and development
• Out-of-the-box scalability (horizontally scalable)
• Easy maintenance
• Simple APIs
  • C# example: db.collection.save(myDocument);
• Seamless language integration
  • No impedance mismatch (see the C# example above)
• Designed to be horizontally scalable (elastic)
• Flexible data model and schema
• Mostly free and/or open source
• There are more than 140 NoSQL products
  • Many are not proven
• Lack of SQL (the most-missed feature)
  • Proprietary query languages
• Lack of skilled people
  • Do you know a DBA for MarkLogic?
• Lack of tools for modeling, documenting, reporting, …
  (usually there are no good visual tools)
• Lack of standards (this is the biggest threat)
e-Commerce application, single data store (figure):
• Web/application server → SQL DB (shopping cart data, orders, session data)
e-Commerce application, multiple data stores (figure):
• Orders → SQL DB
• Shopping cart data → Key/Value DB
• Session data → Key/Value DB
• Customer social graph → Graph DB
• It is not necessary for the application to use a single
  data store for all of its needs, since different databases
  are built for different purposes and not all problems
  can be elegantly solved by a single database.
• Using different data storage technologies for
  varying data storage needs
• Key/value stores:
  • Processing a constant stream of small reads and writes.
• Document databases:
  • Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.
• RDBMS:
  • OLTP. SQL. Transactions. Relations.
• Columnar:
  • Handles size well. Massive write loads. High availability.
    Multiple data centers. MapReduce.
• Graph:
  • Graph algorithms and relations.
Thanks for your attention

NoSQL (Not Only SQL)

  • 1. Dr. Pouria Amirian June 2014 Dr. Pouria Amirian Big Data Project Manager and Data Scientist University of Oxford Pouria.Amirian@ndm.ox.ac.uk; Pouria.Amirian@gmail.com @pouriaamirian
  • 2. 2
  • 3.  “By 2015, 4.4 million IT jobs globally will be created to support Big Data.  But there is a challenge. There is not enough talent in the industry. Our public and private education systems are failing us. Therefore only one-third of the IT jobs will be filled.These jobs are the future of the new information economy.”  Three Major areas of demand in Computer Science and IT:  Big Data, Mobile and SocialComputing (the foundation of theses three topics is Cloud Computing) 3
  • 4.  SQL  Advantages and Disadvantages  NoSQL  History  CommonTraits  Categories  Examples  Trends 4
  • 6. 6 Row Column  Keys  Single/Multi-column Key  Operations on tables:  select, join (SQL)  Relationship on key  Primary Key  Foreign Key Table Key
  • 7.  Proven and Available talent /Well-known  Many programmers are already familiar with it.  Transactions and ACID make development easy.  Lots of tools to use.  Scalable  Free and Commercial production support  SQL (general and high-level query language) 7
  • 8.  Create a database for posts of a weblog  Each post is authored by a user  Each post can have multiple comments from other users  Users can vote for a post (stars 0-5)  Users can like comments  Posts have date, comments have date
  • 9. How Can I Cast an object to an Interface in C#? I have to work with COM-based system and the only way to work with the system is to work with interfaces. the problem is when I worked in VB 6.0 the compiler could automatically cast any object to an interface. However since C# is more type-safe it is not provided automatically. So how can I convert an Obj to an Interface in C#? Joe “2011-07-26” Tags: C#, Cast, Interface James “2011-07-26” use the cast operator of C# Ana, “11-07-27” you can use the ‘as’ keyword, look at the following code: Iinterface myInterface= myObj as Iinterface
  • 10.
  • 11. What are the posts by “Joe”? How many Stars they got? What are the comments written by “James”?
  • 12. 12 { “_id” : ObjectId("4e2e3f92268cdda473b628f6"), “title” : “How can I cast an Object to an Interface in C#?”, “when” : Date(“2011-07-26”), “author” : “joe”, “text” : “I have to work with COM-based system and the only way to work with the system is to work with interfaces. the problem is ….”, “tags” : [“C#”, “Cast”, “Interface”], “voters” : [“James”, “11-07-26”, 4],[“John”, “11-07-26”,5], “comments” : [ {“by”:“James”, “text”:“use the cast operator of C#”, “when”:”11-07-26”}, {“by”:“Ana”, “text”:“you can use the ‘as’ keyword …”, “when”:”11-07-27”}] } db.posts.find({“author” : “joe”}).sort() db.posts.find({“comments.by” : “James”})
  • 13.  Rigid schema design  Hard to scale (very limited scalability)  Joins across multiple nodes are hard and complex  Hard to handle data growth (schema changes, high volume of data, high volume of transactions, …)  Need for an interface for data access (another layer of complexity)  Impedance mismatch  Mapping between relational storage and object-based computing (Object-Relational Mapping doesn't work very well)
  • 14.  Relational databases are no longer one-size-fits-all  Examples  Content Management Systems  Network Data (Social Networking, Location-Based Applications)  Spatial Data Management Systems  High frequency of change (huge number of reads and writes)
  • 15. Data models and the database types built on them:  Tuples (rows): Relational DBMS  Key/Value Pairs: Key/Value Databases  Documents: Document Data Stores  Columns: Column-Family Stores  Graphs: Graph Databases
  • 16. SQL:  Tuples (rows): Relational DBMS. NoSQL:  Key/Value Pairs: Key/Value Databases  Documents: Document Data Stores  Columns: Column-Family Stores  Graphs: Graph Databases
  • 17.  The needs of modern applications do not always match what relational databases provide.  Success stories of Big Data management at internet giants such as Google, Amazon, Facebook, LinkedIn, …  These companies each faced unique challenges and developed custom solutions
  • 18.  The Google File System, October 2003  MapReduce, December 2004  BigTable, November 2006  … Google's massively scalable infrastructure for:  Google Search Engine  Google Maps and Google Earth  Gmail, …
  • 19.  Open source developers have tried to replicate each piece of Google’s technology stack  Project Hadoop and its subprojects were born at Yahoo! Google Infrastructure → Hadoop Universe:  Google File System (GFS) → Hadoop Distributed File System (HDFS)  MapReduce → Hadoop  BigTable → HBase
  • 20.  Dynamo: Amazon’s Highly Available Key/Value Store, 2007  Followed by use cases from eBay, Facebook, Netflix, Yahoo, IBM and …
  • 21. 1998: Carlo Strozzi uses the word “NoSQL” to describe a relational database that did not expose a SQL interface. 2004: BigTable (Google). 2007: Dynamo (Amazon). 2008: Cassandra (Facebook). 2009: in San Francisco, Eric Evans proposes the name “NoSQL” to describe the growing non-relational movement.
  • 22.  Not based on the relational model  Flexible Schema  Supports distributed database architectures  Provides high scalability, high availability, and fault tolerance  Supports very large amounts of sparse data  Geared toward performance rather than consistency
  • 25.  Memcached – Key/value store.  Membase – Memcached with persistence and improved consistent hashing.  AppFabric Cache – Multi-region cache.  Redis – Data structure server.  Riak – Based on Amazon’s Dynamo.  Project Voldemort – Eventually consistent key/value store, auto scaling.
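Consistent hashing, mentioned above for Membase, is how many key/value stores spread keys over nodes. The following is a generic sketch of the technique, not the implementation used by any of the listed products; the class name `HashRing`, the MD5-based hash, and the `replicas` parameter (virtual nodes per server) are choices made for this illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # A stable hash, so key placement is the same on every run.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps keys to nodes; adding a node only moves keys onto that node."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at several "virtual" positions.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (_hash(key),)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With plain modulo hashing, adding a fourth node would remap most keys; here only the keys taken over by `node-d` change owner.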
  • 26.  Schema Free.  Usually JSON like interchange model.  Query Model: JavaScript or custom.  Aggregations: Map/Reduce.  Indexes are done via B-Trees.
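The Map/Reduce aggregation style listed above can be sketched in a few lines of plain Python. This is a single-process illustration of the idea only; the function names (`map_tags`, `reduce_counts`, `map_reduce`) and the sample documents are invented for this example and do not correspond to a specific database API.

```python
from collections import defaultdict


def map_tags(doc):
    """Map step: emit (tag, 1) for every tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    # Group emitted values by key, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


docs = [
    {"tags": ["C#", "Cast", "Interface"]},
    {"tags": ["C#", "NoSQL"]},
    {"title": "a document without tags"},  # schema-free: the field may be absent
]
tag_counts = map_reduce(docs, map_tags, reduce_counts)
```

In a distributed store the map step runs on each node over its local documents, and only the grouped intermediate values travel over the network.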
  • 27. { "_id" : ObjectId("4e2e3f92268cdda473b628f6"), "title" : "How can I cast an Object to an Interface in C#?", "when" : new Date("2011-07-26"), "author" : "joe", "text" : "I have to work with COM-based system and the only way to work with the system is to work with interfaces. the problem is ….", "tags" : ["C#", "Cast", "Interface"], "voters" : [["James", "11-07-26", 4], ["John", "11-07-26", 5]], "comments" : [ {"by" : "James", "text" : "use the cast operator of C#", "when" : "11-07-26"}, {"by" : "Ana", "text" : "you can use the 'as' keyword …", "when" : "11-07-27"} ] }
  • 28. Row oriented (Relational):
      Id | Username | Email        | Department
      1  | John     | john@foo.com | Sales
      2  | Mary     | mary@foo.com | Marketing
      3  | Yoda     | yoda@foo.com | IT
    Column oriented:
      Id:         1, 2, 3
      Username:   John, Mary, Yoda
      Email:      john@foo.com, mary@foo.com, yoda@foo.com
      Department: Sales, Marketing, IT
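The two layouts hold the same data and can be converted into each other. A small sketch in plain Python, assuming every row has the same fields; the helper names `to_columns` and `to_rows` are made up for this illustration.

```python
rows = [
    {"id": 1, "username": "John", "email": "john@foo.com", "department": "Sales"},
    {"id": 2, "username": "Mary", "email": "mary@foo.com", "department": "Marketing"},
    {"id": 3, "username": "Yoda", "email": "yoda@foo.com", "department": "IT"},
]


def to_columns(rows):
    """Row layout -> column layout: one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns):
    """Column layout -> row layout: zip the value lists back together."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = to_columns(rows)
# Reading a single field touches only one list in the column layout.
departments = columns["department"]
```

Scanning one attribute over many records favours the column layout, while fetching a whole record favours the row layout.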
  • 29. 29
  • 30.  Based on graph theory.  Scale vertically, no clustering.  You can use graph algorithms easily.
  • 31.
  • 32.  Social network in the relational model  Who are Bob’s friends?
  • 33.  Find all friends of Alice’s friends
  • 34.  In a sample social network containing 1,000,000 nodes (people), each with approximately 50 edges (relationships). Time in seconds:
      Depth | RDBMS      | Graph | Returned records
      2     | 0.016      | 0.01  | ~2,500
      3     | 30.267     | 0.168 | ~110,000
      4     | 1543.505   | 1.359 | ~600,000
      5     | Unfinished | 2.132 | ~800,000
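A friends-of-friends query is a bounded breadth-first traversal. The sketch below runs it over an in-memory adjacency list; the function name `friends_within` and the tiny sample graph are assumptions for illustration and say nothing about how a particular graph database implements traversal.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                frontier.append((friend, depth + 1))
    return found


# Hypothetical social graph stored as adjacency lists.
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave": ["Bob", "Erin"],
    "Erin": ["Dave"],
}
```

Each extra level of depth follows stored edges directly, whereas in the relational model it typically costs another self-join on the friendship table.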
  • 35. 35
  • 36. 1- Non-relational  No Tables  No Joins  No ACID Transactions *  No support for SQL *  *: a few NoSQL databases support ACID and SQL
  • 37. 2- Schema Free  In a data collection:  There can be records with completely different data items (fields) ▪ Book 1 {name, publicationYear} ▪ Book 2 {author, publisher}  The schema is in:  the data itself (e.g. JSON), or  usually the application, not the database
  • 38. 3- Horizontal Scalability  Vertical (Scale up)  Horizontal (Scale out)
  • 39. 4- Web Scale Applications:  Simple requests (the underlying database seems to be unsophisticated)  However:  Sheer volume of data  Huge number of users (millions of users)
  • 40. 5- Open Source but from large internet companies:  Google  Facebook  Twitter  LinkedIn  Yahoo
  • 41. 41
  • 42. Volume: • Huge amount of data collected and generated by organizations or individuals • Need for huge amounts of storage and processing power. Velocity: • Frequency at which data is generated, captured, shared and processed • Need for real-time retrieval and processing of data for a large number of users. Variety: • Many formats, structures and sources • Need for new types of storage and processing for structured and unstructured data
  • 43. 43
  • 44.  Many different types of tools, techniques, technologies, algorithms and computation models for the collection, generation, storage, management, analysis and visualization of high-volume (in size), high-velocity (of change) and high-variety (in nature) data sets.
  • 45. 45
  • 47. 47
  • 48.  Also known as Brewer’s Theorem, after Prof. Eric Brewer (University of California, Berkeley), who presented it in 2000.  “Of three properties of a shared data system: data consistency, system availability and tolerance to network partitions, only two can be achieved at any given moment.”  Proven by Seth Gilbert and Nancy Lynch (MIT) in 2002.  http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  • 49.  Consistency: All clients have the same view of the data  Availability: Each client can always read and write data  Partition tolerance: The system works well despite physical network partitions  The “CAP theorem” says a database may excel at only two of the CAP attributes
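The trade-off can be made concrete with a toy model of two replicas. This is a deliberately simplified sketch, not a real replication protocol; the class names and the `mode` and `partitioned` attributes are invented for this illustration.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store: 'CP' refuses writes during a partition, 'AP' accepts them."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                return False  # give up availability to stay consistent
            self.replicas[node].value = value  # stay available, allow divergence
            return True
        for replica in self.replicas:  # healthy network: update both copies
            replica.value = value
        return True

    def read(self, node):
        return self.replicas[node].value


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write(0, "v1")
    store.partitioned = True
cp_accepted = cp.write(0, "v2")  # rejected: both replicas still hold "v1"
ap_accepted = ap.write(0, "v2")  # accepted: replica 1 still holds "v1"
```

During the partition the CP store answers consistently but rejects writes, while the AP store keeps accepting writes and lets its replicas disagree until they can synchronize again.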
  • 50.  ACID (Atomicity, Consistency, Isolation, Durability) try { Transaction.Begin(); insert(data1); update(data2); insert(data3); delete(data4); Transaction.Commit(); } catch (Exception) { Transaction.Rollback(); }
  • 51.  Atomicity: All or nothing.  Consistency: Consistent state of data.  Isolation: Transactions are isolated from each other.  Durability: When the transaction is committed, its state will be durable. Any data store can achieve Atomicity, Isolation and Durability, but do you always need Consistency? No. By giving up ACID properties, one can achieve higher performance and scalability.
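The "all or nothing" behaviour can be seen with Python's built-in `sqlite3` module. This is a minimal sketch; the `accounts` table, the `transfer` helper and the balance check are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
                    name TEXT PRIMARY KEY,
                    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit was undone too


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer larger than the source balance violates the `CHECK` constraint on the second statement, and the rollback also undoes the first statement.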
  • 52.  CAP in SQL databases: CA (when not distributed) or CP (when distributed, at the cost of availability)  ACID is guaranteed  The DBMS keeps users waiting (in order to propagate all the changes to all nodes)
  • 53.  CAP in NoSQL databases: AP or CP  The DBMS guarantees consistency eventually, but in the meantime it gives control back to the application (no waiting for users)  The NoSQL database doesn’t commit the changes right away (it buffers them)  The data will be eventually consistent
  • 54.  Acronym contrived to be the opposite of ACID  Basically Available,  Soft state,  Eventually Consistent
  • 55.  Basically Available  Faults are possible, but not a fault of the whole system  Soft state  Copies of a data item may be inconsistent  Eventual Consistency  When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent  Copies become consistent at some later time if there are no more updates to that data item
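Eventual consistency can be illustrated with replicas that accept writes independently and later exchange state. The sketch below uses last-write-wins merging with caller-supplied timestamps, which is only one of several reconciliation strategies and assumes timestamps are unique; `Replica` and `anti_entropy` are names chosen for this example.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the entry with the newest timestamp (last write wins).
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Background sync: every replica pushes its entries to every other one."""
    for source in replicas:
        for target in replicas:
            if source is not target:
                for key, (timestamp, value) in list(source.data.items()):
                    target.write(key, value, timestamp)


a, b, c = Replica(), Replica(), Replica()
a.write("x", "old", timestamp=1)  # accepted locally, not yet propagated
b.write("x", "new", timestamp=2)  # a later update lands on another replica
stale_read = c.read("x")          # soft state: c has not seen any update yet
anti_entropy([a, b, c])           # once updates stop, the copies converge
```

Between the writes and the sync, reads can return different answers depending on the replica, which is the "soft state" in BASE.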
  • 56. ACID: • Strong consistency. • Less availability. • Pessimistic concurrency. • Complex. BASE: • Availability is the most important thing; willing to sacrifice consistency for it (CAP). • Weaker consistency (eventual). • Simple and fast. • Optimistic concurrency.
  • 57. 57
  • 58. 58
  • 59.  Massive write performance  Fast key/value lookups  No single point of failure  Fast prototyping and development  Out-of-the-box scalability (horizontally scalable)  Easy maintenance
  • 60.  Simple APIs  C# Example: db.collection.save(myDocument);  Seamless language integration  No impedance mismatch (look at the above C# example)  Designed to be horizontally scalable (elastic)  Flexible data model and schema  The majority are free and/or open source
  • 61.  There are more than 140 NoSQL products  Many are not proven  Lack of SQL (the most-missed feature)  Proprietary query languages  Lack of skilled people  Do you know a DBA for MarkLogic?  Lack of tools for modeling, documenting, reporting, … (usually there are no good visual tools)  Lack of standards (the biggest threat)
  • 62. 62
  • 63. e-Commerce application → Web/Application Server → SQL DB: Shopping Cart Data, Orders, Session Data
  • 64. e-Commerce application → SQL DB: Shopping Cart Data, Orders, Session Data
  • 66. e-Commerce application → SQL DB: Orders; Key/Value DB: Shopping Cart Data; Key/Value DB: Session Data; Graph DB: Customer Social Graph
  • 67.  An application does not need to use a single data store for all of its needs: different databases are built for different purposes, and not all problems can be elegantly solved by a single database.  Use different data storage technologies for varying data storage needs
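The idea of routing each kind of data to a store that suits it can be sketched as a thin facade. In this illustration the three "stores" are plain in-memory structures standing in for a key/value database, a SQL database and a graph database; the class `PolyglotStore` and its methods are hypothetical.

```python
class PolyglotStore:
    """Facade that sends each kind of data to a different backing store."""

    def __init__(self):
        self.key_value = {}  # stand-in for a key/value DB (sessions, carts)
        self.orders = []     # stand-in for a SQL table (orders)
        self.friends = {}    # stand-in for a graph DB (customer social graph)

    def save_session(self, session_id, data):
        self.key_value[("session", session_id)] = data

    def save_cart(self, customer, items):
        self.key_value[("cart", customer)] = list(items)

    def place_order(self, customer):
        # The cart leaves the key/value store and becomes a durable order.
        items = self.key_value.pop(("cart", customer), [])
        if not items:
            raise ValueError("cannot place an order with an empty cart")
        self.orders.append({"customer": customer, "items": items})
        return len(self.orders)  # order number

    def add_friend(self, a, b):
        self.friends.setdefault(a, set()).add(b)
        self.friends.setdefault(b, set()).add(a)


store = PolyglotStore()
store.save_cart("joe", ["book", "pen"])
order_id = store.place_order("joe")
store.add_friend("joe", "ana")
```

The application code talks to one interface, while short-lived data, transactional data and relationship data can each live in a different technology.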
  • 68.  Key-value stores:  Processing a constant stream of small reads and writes.  Document databases:  Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.  RDBMS:  OLTP. SQL. Transactions. Relations.  Columnar:  Handles size well. Massive write loads. High availability. Multiple data centers, MapReduce.  Graph:  Graph algorithms and relations.
  • 69. Thanks for your attention