2. 1999: Sony delivers the floppy disk's final innovation, a disk that could hold 200 MB.
Since 1999: more than a billion people worldwide have entered the middle class, which means more money, better literacy and an information explosion.
Today Walmart handles a million transactions per hour, and Facebook stores 40 billion user photos. Just as financial inflation has eroded 1s, 10s and 100s, data growth has eroded Kilo, Mega and Giga. The base has shifted.
Now, we have Tera, Peta, Exa.
3. Entity Relationship Model: our minds perceive any problem and its solution space in terms of entities and relationships.
Who is the wealthiest person with the maximum number of published patents?
Person = Name, Age, Country
Wealth = Bank Account, Stock, Convertibles
Publish = Scope, Discipline, Applicability
Patents = Patent Office, Disputes, Prior Work
Relationships: assets, wealth from publishing, wealth from patents, authored patents, published work
1970s: BIRTH of RDBMS: normalize data into tables, then inter-connect them; find by SQL.
Trivia: IBM Research rejected E. F. Codd’s paper on relational algebra, which is the basis of RDBMS.
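To make the RDBMS approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (person, wealth, patent, authored), the sample rows and the reading of the slide's question (among the people with the most patents, pick the wealthiest, with wealth taken as the sum of the three Wealth attributes) are all assumptions for illustration. Even this one question needs joins across three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized tables: one per entity, plus a link table for the "authored" relationship.
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, country TEXT);
    CREATE TABLE wealth (person_id INTEGER REFERENCES person(id),
                         bank_account REAL, stock REAL, convertibles REAL);
    CREATE TABLE patent (id INTEGER PRIMARY KEY, patent_office TEXT);
    CREATE TABLE authored (person_id INTEGER REFERENCES person(id),
                           patent_id INTEGER REFERENCES patent(id));
    """
)
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                 [(1, "Asha", 45, "India"), (2, "Bruno", 52, "Brazil"), (3, "Chen", 38, "China")])
conn.executemany("INSERT INTO wealth VALUES (?, ?, ?, ?)",
                 [(1, 100.0, 500.0, 0.0), (2, 900.0, 100.0, 50.0), (3, 50.0, 20.0, 0.0)])
conn.executemany("INSERT INTO patent VALUES (?, ?)", [(i, "USPTO") for i in range(1, 6)])
conn.executemany("INSERT INTO authored VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4), (3, 5)])

# Inter-connect the tables with joins:
# among the people with the most patents, pick the one with the largest total wealth.
QUERY = """
WITH counts AS (
    SELECT person_id, COUNT(*) AS n FROM authored GROUP BY person_id
)
SELECT p.name, c.n, w.bank_account + w.stock + w.convertibles AS total
FROM person AS p
JOIN counts AS c ON c.person_id = p.id
JOIN wealth AS w ON w.person_id = p.id
WHERE c.n = (SELECT MAX(n) FROM counts)
ORDER BY total DESC
LIMIT 1
"""
result = conn.execute(QUERY).fetchone()
print(result)  # ('Bruno', 2, 1050.0)
```

Asha and Bruno tie on patent count, so the wealth join breaks the tie.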
4. The data explosion has exposed the limitations of RDBMS
•SQL joins are expensive
Inconsistent indexing strategies across database vendors and unintentional schema design
errors degrade performance.
•Difficult to manage & query data stored across machines
Horizontal scaling (adding nodes to the system) restricts which SQL features remain usable. Vertical scaling (adding more resources to a node) leaves a single point of failure, so fault tolerance is poor.
Efficiency is lost when clusters of rows of the same table are stored on different nodes.
Storing different columns of the same table on different machines is nearly impossible.
•Data tightly coupled to schema
The turnaround time to capture changes in business operations is high.
•Large documents cannot be stored and queried efficiently
CLOB & BLOB fields are suitable only for data under 1 MB. The XML datatype field in Oracle is just a wrapper over CLOB; it's not a true XML database.
•Loose fit to programming models
Several important concepts of OO design, such as inheritance and hierarchy, are workarounds or special-purpose implementations in an RDBMS. This explains why ORM frameworks are so popular today.
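•Limited free text search options
Apart from vendor-specific full-text extensions, the only standard SQL pattern-matching predicates are the LIKE wildcards for “any sequence” (%) and “single character” (_).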
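The sketch below, again using sqlite3 with made-up rows, shows what LIKE pattern matching can and cannot do: % matches any run of characters, _ matches exactly one character, and there is no stemming or relevance ranking, so a search for "scale" misses "scaling". (SQLite's LIKE is case-insensitive for ASCII by default; other engines differ.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO doc (body) VALUES (?)",
    [
        ("relational databases scale vertically",),
        ("NoSQL stores scale horizontally",),
        ("a database of scaling stories",),
    ],
)


def like(pattern):
    """Return the bodies matching a LIKE pattern, in insertion order."""
    rows = conn.execute("SELECT body FROM doc WHERE body LIKE ? ORDER BY id", (pattern,))
    return [body for (body,) in rows]


print(like("%scal%"))       # '%' = any run of characters: all three rows
print(like("%scale%"))      # no stemming: the row containing 'scaling' is missed
print(like("%databas_ %"))  # '_' = exactly one character: 'database ' matches, 'databases ' does not
```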
5. Internet powerhouses Google, Facebook, Amazon, eBay, Digg, Twitter and Netflix would not have been born had they not moved beyond the RDBMS.
Google creates Bigtable and MapReduce, Facebook creates Cassandra (now Apache Cassandra) and Twitter comes up with FlockDB.
"We shall not cease from exploration, and the end of all our exploring will be to arrive where we started, and know the place for the first time." - T. S. Eliot
Back to the original approach: use one system for the entities and another for their inter-relationships.
2003: BIRTH of NOSQL
6. NOSQL: Product Paradigm
Three primary species
Key-Value Stores
Put simply, it stores everything as a multi-dimensional hash map: all attributes of an entity are dumped into a single bucket. This means there is nothing to join, a closer fit to programming models, and easy scaling. Facebook can collate user information for India on a commodity server (cloud) in Singapore and for South America in São Paulo. There is no single point of entry to the database engine, so by nature this architecture is fault-tolerant.
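A toy illustration of the idea, assuming a hypothetical key scheme of region:user-id and a hypothetical placement table that sends Indian users to a Singapore node and South American users to a São Paulo node. Each node is just a Python dict, and each value holds every attribute of the entity, so a read is a single lookup with no join. Dynamo-style stores such as Cassandra place keys with consistent hashing rather than a fixed table like this one.

```python
# Hypothetical placement table: region prefix of the key -> node name.
PLACEMENT = {"IN": "singapore", "BR": "sao-paulo", "AR": "sao-paulo"}


class RegionalKVStore:
    """Toy key-value store; each node is a dict mapping key -> attribute bucket."""

    def __init__(self, placement):
        self.placement = placement
        self.nodes = {node: {} for node in set(placement.values())}

    def node_for(self, key):
        # Keys look like "IN:42"; the region prefix decides which node holds the key.
        region = key.split(":", 1)[0]
        return self.placement[region]

    def put(self, key, attributes):
        # Dump every attribute of the entity into a single bucket.
        self.nodes[self.node_for(key)][key] = dict(attributes)

    def get(self, key):
        # One lookup on one node; nothing to join. Missing keys return None.
        return self.nodes[self.node_for(key)].get(key)


store = RegionalKVStore(PLACEMENT)
store.put("IN:42", {"name": "Asha", "city": "Pune", "friends": 312})
store.put("BR:7", {"name": "Bruno", "city": "Recife"})
print(store.node_for("BR:7"), store.get("IN:42")["city"])  # sao-paulo Pune
```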
Graph database
One of the first paradigms of computer design has made a late entry. Store the system as you would draw it on a whiteboard: create nodes, draw edges between them, add a direction if required, and colour the nodes and edges. Traverse the graph to get beautiful insights from the data and answer interesting questions, such as: find all my friends who like Chinese food and have a taste for single malt. In fact, the syntax of the SPARQL query language reads like a natural way of asking questions.
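A minimal sketch of the whiteboard model, assuming a toy in-memory adjacency map (the Graph class and the edge labels friend and likes are hypothetical, not any product's API). It answers the slide's example question by following friend edges from one node, then checking each friend's likes edges.

```python
class Graph:
    """Directed, labelled edges stored as an adjacency map."""

    def __init__(self):
        self._adj = {}  # (node, label) -> set of neighbour nodes

    def add_edge(self, src, label, dst):
        self._adj.setdefault((src, label), set()).add(dst)

    def out(self, src, label):
        return self._adj.get((src, label), set())


def friends_who_like(graph, person, *tastes):
    # Walk 'friend' edges from person; keep friends with a 'likes' edge to every taste.
    return sorted(
        friend
        for friend in graph.out(person, "friend")
        if all(taste in graph.out(friend, "likes") for taste in tastes)
    )


g = Graph()
for a, b in [("me", "ana"), ("me", "raj"), ("me", "li")]:
    g.add_edge(a, "friend", b)  # friendship is mutual, so add both directions
    g.add_edge(b, "friend", a)
g.add_edge("ana", "likes", "chinese food")
g.add_edge("ana", "likes", "single malt")
g.add_edge("raj", "likes", "chinese food")
g.add_edge("li", "likes", "single malt")

print(friends_who_like(g, "me", "chinese food", "single malt"))  # ['ana']
```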
Document store
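An extension of key-value stores in which the value is a structured document (for example JSON, or BSON, the binary encoding used by MongoDB) that the database can index and query, rather than an opaque blob.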
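To illustrate the difference from an opaque key-value value, this hypothetical DocumentStore keeps each value as JSON text but parses it at query time, so it can match on fields inside the document. It is a sketch under that assumption, not how any particular product indexes documents; a real store would use indexes instead of scanning every value.

```python
import json


class DocumentStore:
    """Hypothetical document store: key -> JSON document."""

    def __init__(self):
        self._docs = {}  # key -> serialized JSON text

    def put(self, key, document):
        self._docs[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, **criteria):
        # Parse each value and match on fields inside it (a full scan, for clarity).
        matches = []
        for key in sorted(self._docs):
            doc = json.loads(self._docs[key])
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(key)
        return matches


docs = DocumentStore()
docs.put("user:1", {"name": "Asha", "country": "India", "patents": 2})
docs.put("user:2", {"name": "Bruno", "country": "Brazil", "patents": 2})
docs.put("user:3", {"name": "Chen", "country": "India"})
print(docs.find(country="India"))             # ['user:1', 'user:3']
print(docs.find(country="India", patents=2))  # ['user:1']
```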
All of them support multi-tenancy and make extensive use of in-memory operation.
7. Combine the best and go for a hybrid model. Why?
There might not be a population explosion, but people will post status updates in a frenzy.
Store the relationships in a graph and everything else in a key-value store. Connect them via an “id”.
GRAPH DB = Person id and Org id nodes, linked by associated, member, knows and founder edges
KEY VALUE STORE = names, emails, balance sheets, addresses, org type, products, status feeds
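A minimal sketch of the hybrid model, assuming ids such as person:1 and org:10, with a Python set standing in for the graph database and a dict standing in for the key-value store. The graph side holds only ids and labelled edges; the attributes live on the key-value side, and the two are connected through the shared id.

```python
# Graph side: only ids and labelled, directed edges (src, label, dst).
EDGES = {
    ("person:1", "knows", "person:2"),
    ("person:1", "founder", "org:10"),
    ("person:2", "member", "org:10"),
    ("person:3", "associated", "org:11"),
}

# Key-value side: everything else, stored under the same ids.
KV = {
    "person:1": {"name": "Asha", "email": "asha@example.com"},
    "person:2": {"name": "Bruno", "email": "bruno@example.com"},
    "person:3": {"name": "Chen", "email": "chen@example.com"},
    "org:10": {"org_type": "startup", "products": ["search"]},
    "org:11": {"org_type": "university", "products": []},
}


def related(node_id, labels):
    """Graph query: ids with an edge whose label is in `labels` pointing at node_id."""
    return sorted(src for src, label, dst in EDGES if dst == node_id and label in labels)


def people_in_org(org_id):
    # 1) traverse the graph for ids, 2) fetch attributes by id from the key-value store.
    return [KV[pid]["name"] for pid in related(org_id, {"founder", "member"})]


print(people_in_org("org:10"))  # ['Asha', 'Bruno']
```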