No sql3 rmoug
1.
2. I'm from California – where mountain biking and
startups were invented. My friends work at
Facebook, eBay, LinkedIn, and often I'm the only
DBA they will talk to. That is how I hear about the
decision model around NoSQL usage.
3. We are a managed service AND a solution provider of elite database and
System Administration skills in Oracle, MySQL and SQL Server
4.
5. MySQL for front-end and ad serving
Oracle as a data warehouse
Hadoop for analytics and ETL
Hive as a more structured Hadoop frontend
Cassandra for mailbox search
While an excellent RDBMS such as Oracle can solve 90% of the problems, we
need multiple, special purpose databases for the other 10%.
Every developer knows more than one language, and most of them will happily
learn more if the job requires it. The good ones are “software engineers” and
not “Java programmers”. We need to turn “Senior Oracle DBAs” into
“Database Engineers”.
6. * Marketing term. These days everything is NoSQL
(including Oracle!)
* Anything from file-system to cache can be called
NoSQL
* Key-value stores, document stores, column stores,
OLTP or DW, RAM or disk: the label covers them all
7. Some people say: Why worry about scale before you have even 100
users?
Not true. Some startups, like eBay or LinkedIn, have a scale-or-fail business
model from the beginning. They know that if they don't get 250M users,
they will fail. So they plan for 250M from the beginning.
Initially, most NoSQL databases are easier for developers, thanks to
simpler data models and easier access methods than JDBC+SQL.
Eventually, though, NoSQL databases lack many of the services an RDBMS
provides, forcing your developers to do more work.
8. You can do – pk access, range scan, group by – but
not joins
You may be able to update a single row
(“document”, “column family”) as an atomic
operation. But that is the absolute limit.
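A minimal sketch of what "no joins" means for the developer, assuming a toy in-memory key-value store. The `users` and `orders` tables and all function names are hypothetical and do not come from any real NoSQL client:

```python
# Toy in-memory key-value "tables"; stand-ins for a NoSQL store (hypothetical).
users = {
    "u1": {"name": "Ann"},
    "u2": {"name": "Bob"},
}
orders = {
    "o1": {"user_id": "u1", "total": 30},
    "o2": {"user_id": "u2", "total": 12},
    "o3": {"user_id": "u1", "total": 5},
}


def get_by_key(table, key):
    """Primary-key access: one lookup, one row."""
    return table[key]


def range_scan(table, start, end):
    """Range scan over sorted keys, inclusive on both ends."""
    return [(k, table[k]) for k in sorted(table) if start <= k <= end]


def join_orders_with_users():
    """The store has no join, so the application does one lookup per order."""
    return [
        (order_id, get_by_key(users, order["user_id"])["name"], order["total"])
        for order_id, order in sorted(orders.items())
    ]


for row in join_orders_with_users():
    print(row)
```

The join logic, and its cost of one extra lookup per row, now lives in application code instead of in the database.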
10. Note that these are not traditional RDBMS
problems:
Checkout requires access by key only.
Monitoring is write-heavy: lots of writes, very few queries.
Page-rank and “People you might know” require
quick updates, and selects are done with batch
offline jobs.
Word completion is just set selection.
11. Hadoop – so big it deserves its own presentation
16. ... or when node 3 crashes?
You need to remap every single datapoint to a new
node, causing lots of data copying and scanning.
That is lots of extra work, and some of it may require locking.
Actually, when you add node #5, a smarter partitioning
scheme only needs to move 3000/5 datapoints, not all 3000.
Obviously, the more nodes you have, the more advantage
there is to a smarter way of partitioning.
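The difference can be sketched by counting how many of 3000 datapoints change nodes when node #5 joins. This assumes naive partitioning means `hash(key) % node_count` and that the smarter scheme is a consistent-hash ring with virtual nodes. The node names, the 100 virtual nodes per server, and the SHA-256 hash are illustrative assumptions:

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per run)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_node(key, nodes):
    """Naive partitioning: hash the key, take it modulo the node count."""
    return nodes[h(key) % len(nodes)]


class Ring:
    """Consistent-hash ring; each node owns several points on the ring."""

    def __init__(self, nodes, vnodes=100):
        self.points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[idx][1]


keys = [f"datapoint-{i}" for i in range(3000)]
old_nodes = ["node1", "node2", "node3", "node4"]
new_nodes = old_nodes + ["node5"]

moved_modulo = sum(modulo_node(k, old_nodes) != modulo_node(k, new_nodes) for k in keys)

old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys)

print(f"modulo hashing moved {moved_modulo} of {len(keys)} datapoints")
print(f"consistent hashing moved {moved_ring} of {len(keys)} datapoints")
```

With the ring, the only datapoints that move are the ones the new node takes over, which is roughly one fifth of them. With modulo partitioning, most datapoints land on a different node.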
23. When you decide to go with a distributed and
replicated model, there's an obvious question:
What do I do when some of the nodes needed for
the operation are not available (due either to
network issues or to crashes)?
1. Fail the operation
2. Wait for the node to come back
3. Perform the operation on reachable nodes and
update the extra node when it's back.
24. Writes don't get lost, because at least one node
keeps them and attempts to communicate them to
other nodes in the system.
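Option 3, together with the point that a reachable node keeps the write and passes it on later, can be sketched as below. This is a toy illustration often called hinted handoff. The `Node` class and the `write` and `replay_hints` functions are hypothetical and do not model any specific product:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target_node, key, value) writes held for unreachable nodes


def write(replicas, key, value):
    """Write to every reachable replica; keep hints for the ones that are down."""
    reachable = [n for n in replicas if n.up]
    if not reachable:
        raise RuntimeError("no replica reachable: fail the operation")
    for node in reachable:
        node.data[key] = value
    # The first reachable node remembers the write for each replica that missed it.
    for node in replicas:
        if not node.up:
            reachable[0].hints.append((node, key, value))


def replay_hints(node):
    """Deliver held writes to targets that are back; keep the rest for later."""
    remaining = []
    for target, key, value in node.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    node.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False                      # node c crashes or is cut off
write([a, b, c], "cart:42", ["book"])
c.up = True                       # node c comes back
replay_hints(a)                   # a forwards the write that c missed
print(c.data)
```

Between the write and the replay, a read served by node `c` would return stale data. That window is the price of staying available.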
25. Important – the application must know how to
resolve conflicts. If you don't have a good method
of resolving conflicts – don't do eventual
consistency.
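As a rough illustration of what "knowing how to resolve conflicts" can look like, here are two common application-level strategies, assuming two replicas accepted different writes for the same key while they could not see each other. Both functions and the sample data are hypothetical:

```python
def resolve_by_union(versions):
    """Merge divergent sets (e.g. shopping carts) by keeping every item seen."""
    merged = set()
    for version in versions:
        merged |= version
    return merged


def resolve_last_write_wins(versions):
    """versions: list of (timestamp, value); keep the value with the newest timestamp."""
    return max(versions, key=lambda v: v[0])[1]


# Two replicas diverged during a network partition.
cart_on_replica_1 = {"book", "lamp"}
cart_on_replica_2 = {"book", "pen"}
print(resolve_by_union([cart_on_replica_1, cart_on_replica_2]))

email_versions = [(1001, "old@example.com"), (1007, "new@example.com")]
print(resolve_last_write_wins(email_versions))
```

Neither strategy is free. A union can bring back an item the user deleted on one replica, and last-write-wins silently discards the older write. If no rule like these is acceptable for your data, that is the signal not to use eventual consistency.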
28. Storage nodes are the physical servers.
They contain “partitions”. Keys are mapped to partitions. Partitions are
grouped into “replication groups”, each containing a set number of
partitions on separate servers, and if needed – separate data centers.
All partitions in a replication group contain identical data.
One partition in a replication group is designated the “master”. Writes are
done on the master only. If the master fails, a new master is elected in
the group.
Client drivers keep track of the hash map: which key maps to which
partition, which node is the master of each replication group, and the load
on each node in the group. This allows the client to send each request to the
right node.
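The routing described above can be sketched roughly as follows. This is not the actual driver API. The class names, the partition count, the `sn1`..`sn6` node names, and the "promote the first replica" election are all simplifying assumptions:

```python
import hashlib
from dataclasses import dataclass, field

NUM_PARTITIONS = 12  # assumption for illustration


@dataclass
class ReplicationGroup:
    master: str
    replicas: list
    load: dict = field(default_factory=dict)  # node name -> outstanding requests

    def nodes(self):
        return [self.master] + self.replicas

    def elect_new_master(self):
        """Stand-in for a real election: promote the first surviving replica."""
        self.master = self.replicas.pop(0)


class ClientDriver:
    def __init__(self, groups):
        self.groups = groups

    def partition_for(self, key):
        """Hash the key to a partition number."""
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

    def group_for(self, key):
        """Partitions are spread over replication groups round-robin (assumed)."""
        return self.groups[self.partition_for(key) % len(self.groups)]

    def node_for_write(self, key):
        """Writes go to the master of the key's replication group."""
        return self.group_for(key).master

    def node_for_read(self, key):
        """Reads can go to the least-loaded node in the group."""
        group = self.group_for(key)
        return min(group.nodes(), key=lambda n: group.load.get(n, 0))


groups = [
    ReplicationGroup("sn1", ["sn2", "sn3"], {"sn1": 9, "sn2": 2, "sn3": 5}),
    ReplicationGroup("sn4", ["sn5", "sn6"], {"sn4": 1, "sn5": 7, "sn6": 3}),
]
driver = ClientDriver(groups)
print(driver.node_for_write("user42"), driver.node_for_read("user42"))
```

Because the client holds the map, a request reaches the right storage node in one hop, without a proxy or coordinator in between.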
31. The major key controls the location of the key. This
means that all keys with the same major key are kept
on the same replica, and can be updated in one
transaction. It also means that many different
major keys should be used to fully utilize all
storage nodes.
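A small sketch of the idea, assuming a key is a (major, minor) pair and only the major part is hashed to choose the location. The hash function, the partition count, and the sample keys are illustrative assumptions:

```python
import hashlib

NUM_PARTITIONS = 8  # assumption for illustration


def partition_for(major_key):
    """Only the major key is hashed; the minor key never affects placement."""
    digest = hashlib.sha256("/".join(major_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Full key = (major path, minor path)
records = [
    (("user", "alice"), ("profile",)),
    (("user", "alice"), ("inbox", "msg1")),
    (("user", "bob"), ("profile",)),
]
placement = {(major, minor): partition_for(major) for major, minor in records}

for (major, minor), partition in placement.items():
    print("/".join(major), "-", "/".join(minor), "-> partition", partition)
```

Both of alice's records land in the same place, so they can be changed together. If every record shared a single major key, all of them would sit in one partition and the other storage nodes would stay idle.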
37. * New products = lots of bugs, few features
Oracle is at 11gR2. MS SQL Server is the equivalent
of Oracle 8i (maybe 9?), MySQL is somewhere
between 6 and 7. NoSQL is between 2 and 3.
* Open source = no support
* Many companies decide to build their own – most
of the algorithms are published, you can use
existing code, there is no support anyway, solving
specific problems is easier