Most developers get started building applications without giving a lot of thought to their database. Either they get told that the company does everything with Database X, or they Google around and end up using MySQL or MS Access. Neither of these is the wrong choice, but it's often not the best choice. I was one of those guys, first with Access and then MySQL. As I've moved through my career, I've used a lot of database systems, both relational and non-relational ("NoSQL"), and my go-to choice has become PostgreSQL.
I'm not going to spend much time on the "SQL vs NoSQL" debate. It's sort of a straw man argument, because they're solving a variety of fundamentally different problems. It's also much better in discussion format, for that same reason: so much variety.
Like everything else in technology, there is no one-size-fits-all solution, but I want to show why I think PostgreSQL is a great first choice for most things, and why a lot of other options fall short. Your choice of database can shape a lot of how you build your application, how well it performs, and how it can grow over time. It can be a productivity boon, or force you to constantly work around its limitations. As application developers, we should be spending our time solving business problems, not on low-level technology plumbing.
This session is intended for people who have built a few database-backed applications, and are curious about what options are out there and how to go about choosing between them.
1. The RDBMS You Should Be Using
Barney Boisvert - dev.Objective() 2016
About Me
Software Craftsman
Soccer
Scotch
Woodworking
Tooling
2. What is a Database?
What is a Database?
a place to store things "permanently"
a place applications collaborate
a place to cache complex results
a place to write audit logs
3. What is a Database?
durable storage (not Redis!)
structured data (not MongoDB!)
concurrent interaction (not SQLite!)
query language (not Couchbase!)
What is a Database?
query language
durable storage
4. Lingua Franca
success OR error
Relational database
SQL as query language
durable storage
relational model
ACID transactions
CAP Theorem
Consistency: one single data state
Availability: all requests get responses
Partition Tolerance: network partitioning
5. ACID
Atomic: all or nothing
Consistent: all constraints are met before commit
Isolated: serialized behaviour under concurrency
Durable: once committed, always committed
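To make "all or nothing" concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The account table, the names, and the transfer helper are all hypothetical, invented for this illustration. The CHECK constraint plays the "consistent" role, and the transaction plays the "atomic" role.

```python
import sqlite3

# Hypothetical schema for illustration: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # both updates are kept
failed = transfer(conn, "alice", "bob", 500)  # CHECK fails, the credit is undone
```

Under these assumptions the first transfer leaves alice at 70 and bob at 30. The second one credits bob, fails the CHECK on alice, and rolls back, so the balances do not change.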
DDL & DML
Data Definition Language: the portion of SQL used to manipulate the structure of your database
Data Manipulation Language: the portion of SQL used to manipulate the data in your database
6. why not both?!
DDL & DML
ACID applies to both. Except in MariaDB (née MySQL), where each of these statements causes an implicit commit:
ALTER DATABASE ... UPGRADE DATA DIRECTORY NAME, ALTER EVENT,
ALTER FUNCTION, ALTER PROCEDURE, ALTER SERVER,
ALTER TABLE, ALTER VIEW, ANALYZE TABLE,
BEGIN, CACHE INDEX, CHANGE MASTER TO,
CHECK TABLE, CREATE DATABASE, CREATE EVENT,
CREATE FUNCTION, CREATE INDEX, CREATE PROCEDURE,
CREATE ROLE, CREATE SERVER, CREATE TABLE,
CREATE TRIGGER, CREATE USER, CREATE VIEW,
DROP DATABASE, DROP EVENT, DROP FUNCTION,
DROP INDEX, DROP PROCEDURE, DROP TABLE,
DROP TRIGGER, DROP ROLE, DROP SERVER,
DROP USER, DROP VIEW, FLUSH,
GRANT, LOAD INDEX INTO CACHE, LOCK TABLES,
OPTIMIZE TABLE, RENAME TABLE, RENAME USER,
REPAIR TABLE, RESET, REVOKE,
SET PASSWORD, SHUTDOWN, START SLAVE,
START TRANSACTION, STOP SLAVE, TRUNCATE TABLE
https://mariadb.com/kb/en/mariadb/sql-statements-that-cause-an-implicit-commit/
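For contrast, here is a rough sketch of what "ACID applies to both" looks like when DDL does take part in the transaction. It uses SQLite through Python's built-in sqlite3 module purely because it runs anywhere; PostgreSQL can likewise roll back DDL inside a transaction. The customer and invoice tables and the abandoned "migration" are hypothetical.

```python
import sqlite3

# isolation_level=None turns off the driver's implicit transaction handling,
# so the BEGIN / ROLLBACK below are exactly what the database sees.
conn = sqlite3.connect(":memory:", isolation_level=None)


def table_names(conn):
    """List the tables that currently exist."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ada')")

# A hypothetical migration that mixes DDL and DML, then is abandoned.
conn.execute("BEGIN")
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")     # DDL
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY)")  # DDL
conn.execute("UPDATE customer SET email = 'ada@example.com'")  # DML
conn.execute("ROLLBACK")

tables_after = table_names(conn)
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

After the ROLLBACK, only the customer table exists and it has its original two columns. On a database where the statements listed above commit implicitly, a half-applied migration like this one would be left in place.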
Programmability
in application or in database?
surrogate key generation
constraints
cascade deletes
calculated fields
"hot" materialized views
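As one way to picture the "in database" side of that question, the sketch below pushes the first four items into the schema itself. It uses SQLite via Python's sqlite3 module as a convenient stand-in; the orders and order_line tables and the order_total view are hypothetical, and "hot" materialized views are not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY  -- surrogate key, assigned by the database
    );
    CREATE TABLE order_line (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL
            REFERENCES orders (id) ON DELETE CASCADE,    -- cascade delete
        quantity INTEGER NOT NULL CHECK (quantity > 0),  -- constraint
        unit_cents INTEGER NOT NULL CHECK (unit_cents >= 0)
    );
    -- calculated field, exposed through a view
    CREATE VIEW order_total AS
        SELECT order_id, SUM(quantity * unit_cents) AS total_cents
        FROM order_line GROUP BY order_id;
    """
)

# The database hands back the generated key.
order_id = conn.execute("INSERT INTO orders DEFAULT VALUES").lastrowid
conn.executemany(
    "INSERT INTO order_line (order_id, quantity, unit_cents) VALUES (?, ?, ?)",
    [(order_id, 2, 500), (order_id, 1, 250)],
)
total = conn.execute(
    "SELECT total_cents FROM order_total WHERE order_id = ?", (order_id,)
).fetchone()[0]

# The CHECK constraint refuses a zero-quantity line, whatever the app thinks.
try:
    conn.execute(
        "INSERT INTO order_line (order_id, quantity, unit_cents) VALUES (?, 0, 100)",
        (order_id,),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the order removes its lines without any application code.
conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
remaining = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

The application code never generates a key, re-checks the quantity rule, or cleans up child rows. Whether that is the right split is exactly the question the slide asks.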
7. Databases as Artifacts
test queries
query tuning
mid-transaction inspection via debugger
ad hoc reports
troubleshooting production
Concurrency Control
pray (MS Access)
table locking (MyISAM)
row locking (InnoDB)
MVCC (PostgreSQL, Oracle, SQL Server)
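The idea behind MVCC can be sketched in a few lines: writers add new versions instead of overwriting, and each reader sees the newest version that was committed before it started. The toy below is only an illustration of that idea. ToyMVCCStore and its methods are invented for this example, it has no write-conflict detection or cleanup of old versions, and it is not how any of the databases named above implement it.

```python
import itertools


class ToyMVCCStore:
    """Keeps every committed version of a value instead of overwriting it."""

    def __init__(self):
        self._clock = itertools.count(1)  # source of commit timestamps
        self._last_commit = 0
        self._versions = {}               # key -> [(commit_ts, value), ...]

    def begin(self):
        """Start a reader: it will see data as of the latest commit so far."""
        return self._last_commit

    def commit(self, writes):
        """Publish a batch of writes as new versions under one timestamp."""
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def read(self, snapshot, key):
        """Return the newest version committed at or before the snapshot."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot:
                return value
        return None


store = ToyMVCCStore()
store.commit({"stock": 10})
reader = store.begin()       # snapshot taken here
store.commit({"stock": 9})   # a writer commits without waiting for the reader
old_view = store.read(reader, "stock")
new_view = store.read(store.begin(), "stock")
```

The long-running reader still sees 10 while a fresh reader sees 9, and neither one blocked the writer. That is the contrast with the table-locking and row-locking entries in the list.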
8. this is changing!
Administration Overhead
shared environments
local environment
MariaDB is super simple
PostgreSQL is pretty easy
Oracle, SQL Server licensing
SQL Server is Windows-Only
Monetary Cost
Oracle will cost you your firstborn
SQL Server is expensive, but has free Express Edition
small (<10 GB) databases only
MariaDB and PostgreSQL are free
9. the last nail for MariaDB
because Node doesn't help in-DB
Elephants Are Cool?
Same roots as SQL Server (and Sybase)
Heroku keeps adding features
clustering/replication
upsert
JSON (and hstore) for "loose" models
plv8
inheritance
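For the upsert item, here is a small sketch of "insert, or update if it is already there" as a single statement. PostgreSQL spells this INSERT ... ON CONFLICT ... DO UPDATE; the example runs the same clause against SQLite through Python's sqlite3 module, which assumes a bundled SQLite of 3.24 or newer. The page_hit table and record_hit helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_hit (path TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
)


def record_hit(conn, path):
    """Insert a counter row, or bump it if the path already exists."""
    with conn:
        conn.execute(
            "INSERT INTO page_hit (path, hits) VALUES (?, 1) "
            "ON CONFLICT (path) DO UPDATE SET hits = hits + 1",
            (path,),
        )


for path in ["/", "/about", "/"]:
    record_hit(conn, path)

hits = dict(conn.execute("SELECT path, hits FROM page_hit"))
```

Without upsert, the application has to SELECT first and then choose between INSERT and UPDATE, which leaves a race between the two steps under concurrency.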
Elephants Aren't Cool?
Existing knowledge/experience
Cost isn't a differentiator
Downtime for upgrades is fine
"NoSQL" use cases
Facebook
key-value caching
unstructured data
why not both?
10. * I'm going to read this one verbatim. Sorry.
Elephants Are Cool!
ACID DDL
concurrency control
query tuning
price
ancestry
open source contributions
flexibility
Soapbox *
If you're not using version control, start. Before you write another line of code.
If you write everything yourself, stop. Leverage libraries available to you.
Learn about your tools. They're all far more powerful than you believe.