12. MongoDB
□Document-oriented database
JSON-style documents: Lists, Maps, primitives
Schema-less
□Transaction = update of a single document
□Rich query language for dynamic queries
□Tunable writes: speed vs. reliability
□Highly scalable and available
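The "JSON-style documents: Lists, Maps, primitives" point above maps directly onto plain Java collections; a minimal sketch of what a MongoDB-style document looks like to a client (field names and values are illustrative, and this does not use MongoDB's driver API):

```java
import java.util.*;

public class DocumentSketch {
    public static void main(String[] args) {
        // A JSON-style document built from Maps, Lists, and primitives.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", 1);
        doc.put("name", "checkin");
        doc.put("tags", Arrays.asList("mobile", "geo"));

        // Documents nest freely - no schema is declared anywhere.
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("lat", 37.57);
        location.put("lng", 126.98);
        doc.put("location", location);

        System.out.println(doc.get("tags"));
    }
}
```

A "transaction = update of a single document" then means: one atomic write to one such nested structure.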
13. MongoDB Use Cases
□Use cases
High volume writes
Complex data
Semi-structured data
□Major customers
Foursquare
Bit.ly, Intuit
SourceForge, NY Times
GILT Groupe, Evite, SugarCRM
14. Apache Cassandra
□Column-oriented database/Extensible row store
Think Row ~= java.util.SortedMap
□Transaction = update of a row
□Fast writes = append to a log
□Tunable reads/writes: consistency / availability
□Extremely scalable
Transparent and dynamic clustering
Rack and datacenter aware data replication
□CQL = “SQL”-like DDL and DML
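The "Think Row ~= java.util.SortedMap" analogy above can be made concrete; a minimal sketch (row key, column names, and values are illustrative, not Cassandra API calls):

```java
import java.util.*;

public class RowSketch {
    public static void main(String[] args) {
        // A Cassandra-style row: a sorted map of column name -> value.
        SortedMap<String, String> row = new TreeMap<>();
        row.put("city", "Seoul");
        row.put("email", "user@example.com");
        row.put("name", "Alice");

        // The table: row key -> row; "transaction = update of a row"
        // means one atomic write to one of these entries.
        Map<String, SortedMap<String, String>> table = new HashMap<>();
        table.put("user:1001", row);

        // Columns come back in sorted column-name order, as in a row store.
        System.out.println(table.get("user:1001").firstKey()); // prints "city"
    }
}
```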
15. Apache Cassandra Use Cases
□Use cases
Big data
Multiple Data Center distributed database
Persistent cache
(Write intensive) Logging
High-availability (writes)
□Major customers
Digg, Facebook, Twitter, Reddit, Rackspace
Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
“The largest production cluster has over 100 TB of data in
over 150 machines.” – Cassandra web site
16. Solutions – Use NewSQL
□New relational databases
□Retain SQL and ACID transactions
□New and improved distributed architectures
□Support excellent scalability and performance
□NewSQL vendors: TokuDB, ScaleDB,
NimbusDB, ..., VoltDB
18. NewSQL Definition – Wikipedia
NewSQL is a class of modern relational
database management systems that seek to
provide the same scalable performance of
NoSQL systems for OLTP workloads while still
maintaining the ACID guarantees of a
traditional single-node database system
http://en.wikipedia.org/wiki/NewSQL
19. NewSQL Definition – 451 Group
A DBMS that delivers the scalability and
flexibility promised by NoSQL while retaining
the support for SQL queries and/or ACID, or
to improve performance for appropriate
workloads.
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
20. NewSQL Definition – Stonebraker
SQL as the primary interface.
ACID support for transactions
Non-locking concurrency control.
High per-node performance.
Parallel, shared-nothing architecture.
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
21. NewSQL Category
New Database
New MySQL Storage Engines
Transparent Clustering
25. New Database
□ Newly designed from scratch to achieve
scalability and performance
One of the key considerations in improving the performance is
making non-disk (memory) or new kinds of disks (flash/SSD) the
primary data store.
Some (hopefully minor) changes to the code will be required, and
data migration is still needed.
□Solutions
Software-Only: VoltDB, NuoDB, Drizzle, Google Spanner
Supported as an appliance: Clustrix, Translattice.
http://www.linuxforu.com/2012/01/newsql-handle-big-data/
26. New MySQL Storage Engines
□Highly optimized storage engines for MySQL
□Scale better than built-in engines, such as
InnoDB.
Upside: keeps the familiar MySQL interface
Downside: data migration from other databases is required
□Solutions
TokuDB, MemSQL, Xeround, Akiban, NDB
http://www.linuxforu.com/2012/01/newsql-handle-big-data/
27. Transparent Clustering
□Retain the OLTP databases in their original
format, but provide a pluggable feature
Cluster transparently
Ensure Scalability
□Avoids rewriting code or performing any data
migration
□Solutions
Cluster transparently: Schooner MySQL, Continuent
Tungsten, ScalArc
Ensure Scalability: ScaleBase, dbShards
http://www.linuxforu.com/2012/01/newsql-handle-big-data/
30. VoltDB
□ VoltDB, 2010, AGPLv3/VoltDB Proprietary License, Java/C++
□ Type: NewSQL, New Database
□ Main Point: In-memory Database, Java Stored Procedure, VoltDB
implements the design of the academic H-Store project
□ Protocol: SQL
□ Transaction: Yes
□ Data Storage: Memory
□ Features
□ in-memory relational database
□ Durability thru replication, snapshots, logging
□ Transparent partitioning
□ ACID-level consistency
□ Synchronous multi-master replication
□ Database Replication
http://voltdb.com/products-services/products, http://www.slideshare.net/chris.e.richardson/polygot-persistenceforjavadevs-jfokus2012reorgpptx
31. VoltDB- Technical Overview
“OLTP Through the Looking Glass”
VoltDB avoids the overheads of traditional databases
– Data and its associated processing are partitioned
together and distributed across CPU cores
Shared-Nothing Architecture
Horizontally Scale
– Data is kept in main memory
No buffer management needed
– Transactions execute sequentially in memory
No locking & latching
– Synchronous multi-master replication: HA
– Command Logging: replaces “write-ahead” data logging
to deliver high performance
Source: http://odbms.org/download/VoltDBTechnicalOverview.pdf, http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf
32. VoltDB - Partitions (1/3)
1 partition per physical CPU core
– Each physical server has multiple VoltDB partitions
Data - Two types of tables
– Partitioned
Single column serves as partitioning key
Rows are spread across all VoltDB partitions by partition column
Transactional data (high frequency of modification)
– Replicated
All rows exist within all VoltDB partitions
Relatively static data (low frequency of modification)
Code - Two types of work – both ACID
– Single-Partition
All insert/update/delete operations within single partition
Majority of transactional workload
– Multi-Partition
CRUD against partitioned tables across multiple partitions
Insert/update/delete on replicated tables
Source: http://strataconf.com/stratany2013/public/schedule/detail/31731
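The row-to-partition mapping described above (rows spread across partitions by the partitioning column) can be sketched as a simple hash on the column value; the hash scheme below is illustrative only, not VoltDB's actual algorithm:

```java
public class PartitionSketch {
    // Map a partition-column value to one of N partitions.
    static int partitionFor(String partitionKey, int partitionCount) {
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(partitionKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 8; // e.g., one per physical CPU core
        // Every row with the same key value lands in the same partition,
        // so a single-partition transaction never has to cross partitions.
        int p1 = partitionFor("Korean", partitions);
        int p2 = partitionFor("Korean", partitions);
        System.out.println(p1 == p2); // deterministic placement
    }
}
```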
34. VoltDB - Partitions (3/3)
Looking inside a VoltDB partition…
– Each partition contains data and an
execution engine.
– The execution engine contains a queue
for transaction requests.
– Requests are executed sequentially
(single threaded).
[Figure: inside a partition – a work queue feeds a single-threaded execution engine over table data and index data]
– Complete copy of all replicated tables
– Portion of rows (about 1/partitions) of all partitioned tables
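The queue-driven, single-threaded execution described above can be sketched as follows (a toy model, not VoltDB code):

```java
import java.util.*;

public class EngineSketch {
    public static void main(String[] args) {
        // The partition's work queue: transaction requests in arrival order.
        Queue<Runnable> workQueue = new ArrayDeque<>();
        int[] balance = {100};

        workQueue.add(() -> balance[0] -= 30); // debit transaction
        workQueue.add(() -> balance[0] += 50); // credit transaction

        // A single thread drains the queue sequentially - no locks or
        // latches are needed because no two transactions run concurrently.
        while (!workQueue.isEmpty()) {
            workQueue.poll().run();
        }
        System.out.println(balance[0]); // 120
    }
}
```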
35. VoltDB - Compiling
The database is constructed from
– The schema (DDL)
– The work load (Java stored procedures)
– The Project (users, groups, partitioning)
VoltCompiler creates application
catalog
– Copy to servers along with 1 .jar and
1 .so
– Start servers
CREATE TABLE HELLOWORLD (
HELLO CHAR(15),
WORLD CHAR(15),
DIALECT CHAR(15),
PRIMARY KEY (DIALECT)
);
import org.voltdb.*;

@ProcInfo(
    partitionInfo = "HELLOWORLD.DIALECT: 2",
    singlePartition = true
)
public class Insert extends VoltProcedure {
    public final SQLStmt sql =
        new SQLStmt("INSERT INTO HELLOWORLD VALUES (?, ?, ?);");

    public VoltTable[] run(String hello, String world, String dialect)
            throws VoltAbortException {
        voltQueueSQL(sql, hello, world, dialect);
        return voltExecuteSQL();
    }
}
Project.xml
<?xml version="1.0"?>
<project>
  <database name='database'>
    <schema path='ddl.sql' />
    <partition table='HELLOWORLD' column='DIALECT' />
  </database>
</project>
36. VoltDB - Transactions
All access to VoltDB is via Java stored procedures (Java + SQL)
A single invocation of a stored procedure is a transaction
(committed on success)
Limits round trips between DBMS and application
High performance client applications communicate
synchronously with VoltDB
37. VoltDB - Clusters/Durability
Scalability
– Increase RAM in servers to add capacity
– Add servers to increase performance / capacity
– Consistently measuring 90% of single-node performance increase per additional
node
High availability
– K-safety for redundancy
Snapshots
– Scheduled, continuous, on demand
Spooling to data warehouse
Disaster Recovery/WAN replication (Future)
– Asynchronous replication
39. Google Spanner Overview
Google's scalable, multi-version, globally distributed, synchronously replicated
database
Google, 2012, Paper, C++
Type: NewSQL, New Database
Main Point: Distributed multiversion database
Features
– General-purpose transactions (ACID)
– SQL query language
– Schematized tables
– Semi-relational data model
Running in production
– Storage for Google’s ad data(F1)
– Replaced a sharded MySQL database
Source: http://research.google.com/archive/spanner.html
40. Google Spanner - Key Features
Temporal Multi-version database
Externally consistent global write-transactions with synchronous
replication.
Transactions across Datacenters.
Lock-free read-only transactions.
Schematized, semi-relational (tabular) data model.
SQL-like query interface.
Auto-sharding, auto-rebalancing, automatic failure response.
Exposes control of data replication and placement to user/application.
Enables transaction serialization via global timestamps
Acknowledges clock uncertainty and guarantees a bound on it
Uses novel TrueTime API to accomplish concurrency control
Uses GPS devices and Atomic clocks to get accurate time
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
41. Google Spanner - Design Goals
Source: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
42. Google Spanner - Architecture
• Universe: the entire Spanner deployment
• Zone: a unit that can operate physically independently
• Consists of one zonemaster plus hundreds to thousands of spanservers
• The zonemaster assigns data to spanservers; spanservers store and serve the actual data
• Location proxies: invoked by clients to tell them which spanserver holds the data they need
Source: http://helloworld.naver.com/helloworld/216593
43. Google Spanner - Software Stack (1/3)
• A spanserver manages 100–1000 tablets.
• Tablet: stores many mappings of the form ‘(key:string, timestamp:int64) → string’, giving Spanner its multi-version database character
• Tablet state is stored on CFS (Colossus) as B-tree-like files and a write-ahead log (WAL)
• Paxos state machines are used to support data replication across spanservers
Source: http://helloworld.naver.com/helloworld/216593
44. Google Spanner - Software Stack (2/3)
(key:string, timestamp:int64) → string
Back End: Colossus (successor to GFS)
Paxos State Machine on top of each tablet stores metadata
and logs of the tablet.
A leader is chosen among the replicas in each Paxos group, and all
write requests for replicas in that group initiate at the leader.
Transaction Leader
–Is Paxos Leader if transaction involves one Paxos group
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
45. Google Spanner - Software Stack (3/3)
Directory – a contiguous set of keys that share a common prefix
– Similar to a bucket in BigTable
– The smallest unit of data placement
– The smallest unit of geographic replica placement
If a directory grows too large, it can be split into multiple
fragments
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
46. Google Spanner - Data Model
One or more databases supported in Spanner Universe
Database can contain unlimited schematized tables
Not purely relational
– Requires rows to have names
– Names are nothing but a set (possibly a singleton) of primary keys
– In a way, it’s a key value store with primary keys mapped to non-key columns as values
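The "key-value store with primary keys mapped to non-key columns as values" view above can be sketched as follows (table and column names are illustrative):

```java
import java.util.*;

public class DataModelSketch {
    public static void main(String[] args) {
        // Row name = the (possibly compound) primary key;
        // value = the row's non-key columns.
        Map<List<Object>, Map<String, Object>> users = new HashMap<>();

        Map<String, Object> columns = new HashMap<>();
        columns.put("name", "Alice");
        columns.put("email", "alice@example.com");
        users.put(List.of(1001L), columns); // primary key -> non-key columns

        System.out.println(users.get(List.of(1001L)).get("name"));
    }
}
```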
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
47. Google Spanner - TrueTime
“Global wall-clock time” with bounded uncertainty
Novel API behind Spanner’s core innovation
Leverages hardware features like GPS and Atomic Clocks
Implemented via TrueTime API.
– Key method being now() which not only returns current system time but
also another value (ε) which tells the maximum uncertainty in the time
returned
A set of time master servers per datacenter and a time slave
daemon per machine.
Most time masters are GPS-fitted; a few others are fitted with
atomic clocks (Armageddon masters).
Daemons poll a variety of masters and reach a consensus
about the correct timestamp.
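The now() behavior described above can be sketched as an interval type (a toy model; the real TrueTime API is internal to Google, and the epsilon value below is an assumption):

```java
public class TrueTimeSketch {
    // TT.now() returns an interval [earliest, latest] guaranteed to
    // contain the true wall-clock time; its width is 2 * epsilon.
    record Interval(long earliest, long latest) {}

    static final long EPSILON_MS = 4; // assumed uncertainty bound

    static Interval now() {
        long t = System.currentTimeMillis();
        return new Interval(t - EPSILON_MS, t + EPSILON_MS);
    }

    public static void main(String[] args) {
        Interval tt = now();
        // Commit-wait idea: a write timestamp is safe to expose once
        // the 'latest' bound of the assigning call is definitely past.
        System.out.println(tt.latest() - tt.earliest()); // 2 * epsilon
    }
}
```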
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
48. Google Spanner - True Time
[Figure: TT.now() returns an interval [earliest, latest] of width 2*ε that contains the true time]
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
49. Google Spanner - TrueTime Transaction
Read-Write – requires lock.
Read-Only – lock free.
– Requires declaration before start of transaction.
– Reads information that is up to date
Snapshot Read – read information from the past by specifying a
timestamp or bound
– The user specifies a specific past timestamp, or a timestamp bound, so
that data up to that point is read.
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
50. Google Spanner - Evaluation(1/2)
Evaluated for replication, transactions and availability.
Results on epsilon of TrueTime
Benchmarked on Spanner System with
–50 Paxos groups
–250 Directories
–Clients (applications) and Zones are at a network distance of
1ms
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
51. Google Spanner - Evaluation (2/2)
Effect of killing servers on throughput
Source: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt
Distribution of TrueTime Epsilon values
56. Three Trends in the Database Industry
□NoSQL databases:
Designed to meet requirements such as the scalability of distributed
architectures, and to fit schema-less data management needs.
□NewSQL databases:
Designed to meet the same distributed-scalability requirements, or to
improve performance for workloads that do not need horizontal scaling.
□Data Grid/Cache products:
Designed to keep data in memory to boost application and database
performance.
57. Conclusions
□ Many solutions exist for storing data
□ Drop the assumption that Oracle and MySQL are the only options
□ First understand your system's data characteristics and requirements (CAP, ACID/BASE)
□ Apply multiple solutions within a single system
Small-scale, complex relational data: RDBMS
Large-scale real-time data: NoSQL, NewSQL
Large-scale archival data: Hadoop, etc.
□ Choose the appropriate solution
□ Always validate against issues that can arise in operation before adopting
□ Most NewSQL solutions are still in beta (a hasty choice can backfire)
□ Validation down to the solution's source-code level is needed
□ Ensure the stability of the NewSQL solution
□ The solution's own stability must be verified; it does not yet match the stability of
today's DBMSs
□ Adopt only after securing a reliable data storage plan
□ Developers with operational and development experience are hard to find
□ Select a NewSQL solution that fits your requirements
□ Pilot it first rather than applying it to critical systems from the start
□ Validate the selected solution and internalize the expertise
60. Early – 2000s
□All the big players were heavyweight
and expensive.
Oracle, DB2, Sybase, SQL Server, etc.
□Open-source databases were missing
important features.
Postgres, mSQL, and MySQL.
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
61. Early – 2000s : eBay Architecture
http://highscalability.com/ebay-architecture
62. Early – 2000s : eBay Architecture
Push functionality to the application:
Joins
Referential integrity
Sorting done in the application
No distributed transactions
http://highscalability.com/ebay-architecture
63. Mid– 2000s
□MySQL + InnoDB is widely adopted by
new web companies:
Supported transactions, replication,
recovery.
Still must use custom middleware to scale
out across multiple machines.
Memcache for caching queries.
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
65. Mid – 2000s : Facebook Architecture
Scale out using custom middleware.
Store ~75% of database in Memcache.
No distributed transactions.
http://www.techthebest.com/2011/11/29/technology-used-in-facebook/
66. Late – 2000s
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
67. Late – 2000s : MongoDB Architecture
http://sett.ociweb.com/sett/settAug2011.html
68. Late – 2000s : MongoDB Architecture
Easy to use.
Becoming more like a DBMS over time.
No transactions.
http://sett.ociweb.com/sett/settAug2011.html
69. Early – 2010s
□New DBMSs that can scale across multiple machines
natively and provide ACID guarantees.
MySQL Middleware
Brand New Architectures
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf
71. Database SPRAIN
□“An injury to ligaments... caused by being
stretched beyond normal capacity”
□Six key drivers for NoSQL/NewSQL/DDG
adoption
Scalability
Performance
Relaxed consistency
Agility
Intricacy
Necessity
72. Database SPRAIN - Scalability
□Associated sub-driver: Hardware economics
Scale-out across clusters of commodity servers
□Example project/service/vendor
BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop
Amazon RDS, Xeround, SQL Azure, NimbusDB
Data grid/cache
□Associated use case:
Large-scale distributed data storage
Analysis of continuously updated data
Multi-tenant PaaS data layer
73. Database SPRAIN - Scalability
□User: StumbleUpon
□Problem:
Scaling problems with recommendation engine on MySQL
□Solution: HBase
Started using Apache HBase to provide real-time analytics on Su.pr
MySQL lacked the performance headroom and scale
Multiple benefits including avoiding declaring schema
Enables the data to be used for multiple applications and use cases
74. Database SPRAIN - Performance
□Associated sub-driver: MySQL limitations
Inability to perform consistently at scale
□Example project/service/vendor
Hypertable, Couchbase Membrain, MongoDB, Redis
Data grid/cache
VoltDB, Clustrix
□Associated use case:
Real time data processing of mixed read/write workloads
Data caching
Large-scale data ingestion
75. Database SPRAIN - Performance
□User: AOL Advertising
□Problem:
Real-time data processing to support targeted advertising
□Solution: Membase Server
Segmentation analysis runs in CDH, results passed into Membase
Make use of its sub-millisecond data delivery
More time for analysis as part of a 40ms ad-targeting and response window
Also real time log and event management
76. Database SPRAIN – Relaxed Consistency
□Associated sub-driver: CAP theorem
The need to relax consistency in order to maintain availability
□Example project/service/vendor:
Dynamo, Voldemort, Cassandra
Amazon SimpleDB
□Associated use case:
Multi-data center replication
Service availability
Non-transactional data off-load
77. Database SPRAIN – Relaxed Consistency
□User: Wordnik
□Problem:
MySQL too consistent – blocked access to data during inserts and
created numerous temp files to stay consistent.
□Solution: MongoDB
Single word definition contains multiple data items from various
sources
MongoDB stores data as a complete document
Reduced the complexity of data storage
78. Database SPRAIN – Agility
□ Associated sub-driver: Polyglot persistence
Choose most appropriate storage technology for app in development
□Example project/service/vendor
MongoDB, CouchDB, Cassandra
Google App Engine, SimpleDB, SQL Azure
□Associated use case:
Mobile/remote device synchronization
Agile development
Data caching
79. Database SPRAIN – Agility
□ User: Dimagi BHOMA (Better Health Outcomes
through Mentoring and Assessments) project
□Problem:
Deliver patient information to clinics despite a lack of reliable Internet
connections
□Solution: Apache CouchDB
Replicates data from the regional to the national database
when an Internet connection (and power) is available
Uploads patient data from cell phones to the local clinic
80. Database SPRAIN – Intricacy
□ Associated sub-driver: Big data, total data
Rising data volume, variety and velocity
□Example project/service/vendor
Neo4j GraphDB, InfiniteGraph
Apache Cassandra, Hadoop,
VoltDB, Clustrix
□Associated use case:
Social networking applications
Geo-locational applications
Configuration management database
81. Database SPRAIN – Intricacy
□User: Evident Software
□Problem:
Mapping infrastructure dependencies for application performance
management
□Solution: Neo4j
Apache Cassandra stores performance data
Neo4j used to map the correlations between different elements
Enables users to follow relationships between resources while
investigating issues
82. Database SPRAIN – Necessity
□ Associated sub-driver: Open source
The failure of existing suppliers to address the performance,
scalability and flexibility requirements of large-scale data processing
□ Example project/service/vendor
BigTable, Dynamo, MapReduce, Memcached
Hadoop HBase, Hypertable, Cassandra, Membase
Voldemort, Riak, BigCouch
MongoDB, Redis, CouchDB, Neo4J
□Associated use case:
All of the above