SlideShare uma empresa Scribd logo
1 de 83
Baixar para ler offline
[SSA] Big Data Analytics

NewSQL Database

민형기
hg.min@samsung.com
2014. 2. 5.
Contents
I. Why NewSQL?
II. NewSQL 기본 개념
III. NewSQL 종류
IV.NewSQL 정리

1
Why NewSQL?

2
Thinking – Extreme Data

Qcon London 2012

3
Thinking - Traffic Explosion

출처 : Netflix in the Cloud (http://www.slideshare.net/adrianco/netflix-in-the-cloud-2011)

4
Organizations need deeper insights

Qcon London 2012

5
Solutions

□Buy High end Technology
□Higher more developers
□Using NoSQL
□Using NewSQL

6
Solution – Buy High End Technology

Oracle, IBM

7
Solution – Higher more developers

□Application Level Sharding
□Build your replication middleware
□…

http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5

8
Solutions – Use NoSQL

□새로운 비 관계형 데이터 베이스
□분산 아키텍처
□수평 확장성
□고정된 테이블 스키마가 없음
□Join, UPDATE, DELETE 연산이 없음
□트랜잭션이 없음
□SQL 지원이 없음

9
NoSQL Ecosystems

451 group

10
MongoDB

□Document-oriented database
 JSON-style documents: Lists, Maps, primitives
 Schema-less

□Transaction = update of a single document
□Rich query language for dynamic queries
□Tunable writes: speed reliability
□Highly scalable and available

11
MongoDB 사용예

□Use cases
 High volume writes
 Complex data
 Semi-structured data

□주요 고객






Foursquare
Bit.ly Intuit
SourceForge, NY Times
GILT Groupe, Evite,
SugarCRM
12
Apache Cassandra
□Column-oriented database/Extensible row store


Think Row ~= java.util.SortedMap

□Transaction = update of a row
□Fast writes = append to a log
□Tunable reads/writes: consistency / availability
□Extremely scalable
 Transparent and dynamic clustering
 Rack and datacenter aware data replication

□CQL = “SQL”-like DDL and DML

13
Apache Cassandra 사용 예

□사용 예






Big data
Multiple Data Center distributed database
Persistent cache
(Write intensive) Logging
High-availability (writes)

□주요 고객
 Digg, Facebook, Twitter, Reddit, Rackspace
 Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
 The largest production cluster has over 100 TB of data in
over 150 machines.“ – Casssandra web site
14
Solutions – Use NewSQL

□새로운 관계형 데이터베이스
□SQL과 ACID 트랜잭션을 유지
□새롭고 개선된 분산 아키텍처
□뛰어난 확장성과 성능을 지원
□NewSQL vendors: TokuDB, ScaleDB,
NimbusDB, ..., VoltDB

15
http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

16
NewSQL 정의 – Wikipedia
NewSQL is a class of modern relational
database management systems that seek to
provide the same scalable performance of

NoSQL systems for OLTP workloads while still
maintaining the ACID guarantees of a
traditional single-node database system

http://en.wikipedia.org/wiki/NewSQL

17
NewSQL 정의 – 451 Group
A DBMS that delivers the scalability and
flexibility promised by NoSQL while retaining
the support for SQL queries and/or ACID, or

to improve performance for appropriate
workloads.

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

18
NewSQL 정의 – Stonbraker
 SQL as the primary interface.
 ACID support for transactions
 Non-locking concurrency control.
 High per-node performance.
 Parallel, shared-nothing architecture.

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

19
NewSQL Category
 New Database
 New MySQL Storage Engines
 Transparent Clustering

20
The evolving database landscape

OSBC

21
MySQL Ecosystem

22
NewSQL Ecosystem

23
New Database
□ Newly designed from scratch to achieve
scalability and performance
 One of the key considerations in improving the performance is
making non-disk (memory) or new kinds of disks (flash/SSD) the
primary data store.
 some (hopefully minor) changes to the code will be required and
data migration is still needed.

□Solutions
 Software-Only: VoltDB, NuoDB, Drizzle, Google Spanner
 Supported as an appliance: Clustrix, Translattice.

http://www.linuxforu.com/2012/01/newsql-handle-big-data/

24
New MySQL Storage Engines

□Highly optimized storage engines for MySQL
□Scale better than built-in engines, such as
InnoDB.
 Good part: the usage of the MySQL interface
 Downside part: data migration from other databases

□Solutions
 TokuDB, MemSQL, Xeround, Akiban, NDB

http://www.linuxforu.com/2012/01/newsql-handle-big-data/

25
Transparent Clustering

□Retain the OLTP databases in their original
format, but provide a pluggable feature
 Cluster transparently
 Ensure Scalability

□Avoid the rewrite code or perform any data
migration
□Solutions
 Cluster transparently: Schooner MySQL, Continuent
Tungsten, ScalArc
 Ensure Scalability: ScaleBase, dbShards

http://www.linuxforu.com/2012/01/newsql-handle-big-data/

26
NewSQL Products
 VoltDB
 Google Spanner
 MySQL Cluster Architecture

27
28
VoltDB
□ VoltDB, 2010, AGPLv3/VoltDB Proprietary License, Java/C++
□ Type: NewSQL, New Database
□ Main Point: In-memory Database, Java Stored Procedure, VoltDB
implements the design of the academic H-Store project
□ Protocol: SQL
□ Transaction: Yes
□ Data Storage: Memory
□ Features
□ in-memory relational database
□ Durability thru replication, snapshots, logging
□ Transparent partitioning
□ ACID-level consistency
□ Synchronous multi-master replication
□ Database Replication

http://voltdb.com/products-services/products, http://www.slideshare.net/chris.e.richardson/polygot-persistenceforjavadevs-jfokus2012reorgpptx

29
VoltDB- Technical Overview
 “OLTP Through the Looking Glass”
 VoltDB는 전통 DB의 오버헤드 회피함

– 데이터 및 관련 프로세싱은 함께 분할 되고, CPU 코어
단위로 분산됨
 Shared-Nothing Architecture
 Horizontally Scale
– 데이터는 메인 메모리에서 유지됨
 버퍼관리가 필요 없음
– Transaction은 메모리에서 순차적으로 실행됨
 no locking & latching
– Synchronous multi-master replication: HA
– Command Logging: “write-ahead” 데이터 로깅을 대체
하여 고성능을 제공

출처: http://odbms.org/download/VoltDBTechnicalOverview.pdf, http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf

30
VoltDB - Partitions (1/3)
 1 partition per physical CPU core

– Each physical server has multiple VoltDB partitions

 Data - Two types of tables
– Partitioned

Single column serves as partitioning key
Rows are spread across all VoltDB partitions by partition column
Transactional data (high frequency of modification)

X

X

– Replicated

All rows exist within all VoltDB partitions Relatively static data (low frequency of
modification)

 Code - Two types of work – both ACID
– Single-Partition

X

X

X

All insert/update/delete operations within single partition
Majority of transactional workload

– Multi-Partition

CRUD against partitioned tables across multiple partitions
Insert/update/delete on replicated tables
출처: http://strataconf.com/stratany2013/public/schedule/detail/31731

31
VoltDB - Partitions (2/3)
 Single-partition vs. Multi-partition
select count(*) from orders where customer_id = 5
single-partition
select count(*) from orders where product_id = 3
multi-partition
insert into orders (customer_id, order_id, product_id) values (3,303,2)
single-partition
update products set product_name = ‘spork’ where product_id = 3
multi-partition

Partition 1
1
1
4

101
101
401

1
2
3

knife
spoon
fork

Partition 2
2
3
2

2
5
5

201
501
502

1
2
3

knife
spoon
fork

Partition 3
1
3
2

3
6
6

201
601
601

1
2
3

knife
spoon
fork

1
1
2

table orders :
(partitioned)

customer_id (partition key)
order_id
product_id

table products : product_id
(replicated)
product_name

32
VoltDB - Partitions (3/3)
 Looking inside a VoltDB partition…
– Each partition contains data and an
execution engine.
– The execution engine contains a queue
for transaction requests.
– Requests are executed sequentially
(single threaded).

Work
Queue

execution engine
Table Data
Index Data

- Complete copy of all replicated tables
- Portion of rows (about 1/partitions) of
all partitioned tables
33
VoltDB - Compiling
Stored Procedures

Schema
 The database is constructed from
– The schema (DDL)
– The work load (Java stored procedures)
– The Project (users, groups, partitioning)

 VoltCompiler creates application
catalog
– Copy to servers along with 1 .jar and
1 .so
– Start servers

CREATE TABLE HELLOWORLD (
HELLO CHAR(15),
WORLD CHAR(15),
DIALECT CHAR(15),
PRIMARY KEY (DIALECT)
);

import org.voltdb. * ;
import org.voltdb. * ;

import
@ProcInfo( org.voltdb. * ;
@ProcInfo(
partitionInfo = "HELLOWORLD.DIA
partitionInfo true "HE
@ProcInfo(
singlePartition = =
partitionInfo = "HELLOWORLD.DIA
)singlePartition = t
singlePartition = true
)
public class Insert extends VoltPr

public final SQLStmt sql =
public final SQLStmt
public class Insert extends VoltPr
new SQLStmt("INSERT INTO HELLO
public VoltTable[] sql =
public final SQLStmt run
new SQLStmt("INSERT INTO HELLO
public VoltTable[] run( String hel
public VoltTable[] run( String hel

Project.xml
<?xml version="1.0"?>
<project>
<database name='data
<schema path='ddl.
<partition table=‘
</database>
</project>

34
VoltDB - Transactions
 All access to VoltDB is via Java stored procedures (Java + SQL)
 A single invocation of a stored procedure is a transaction
(committed on success)

SQL

 Limits round trips between DBMS and application
 High performance client applications communicate
synchronously with VoltDB

35
VoltDB - Clusters/Durability
 Scalability
– Increase RAM in servers to add capacity
– Add servers to increase performance / capacity
– Consistently measuring 90% of single-node performance increase per additional
node

 High availability
– K-safety for redundancy

 Snapshots
– Scheduled, continuous, on demand

 Spooling to data warehouse
 Disaster Recovery/WAN replication (Future)
– Asynchronous replication

36
Google Spanner

37
Google Spanner Overview
구글의 확장성, 다중버전, 전 세계적으로 분산, 동기적으로 복제된
데이터베이스
 Google, 2012, Paper, C++
 Type: NewSQL, New Database
 Main Point: Distributed multiversion database
 특징
– General-purpose transactions (ACID)
– SQL query language
– Schematized tables
– Semi-relational data model

 Running in production
– Storage for Google’s ad data(F1)
– Replaced a sharded MySQL database

출처: http://research.google.com/archive/spanner.html

38
Google Spanner - Key Features
 Temporal Multi-version database
 Externally consistent global write-transactions with synchronous
replication.
 Transactions across Datacenters.
 Lock-free read-only transactions.
 Schematized, semi-relational (tabular) data model.
 SQL-like query interface.
 Auto-sharding, auto-rebalancing, automatic failure response.
 Exposes control of data replication and placement to user/application.
 Enables transaction serialization via global timestamps
 Acknowledges clock uncertainty and guarantees a bound on it
 Uses novel TrueTime API to accomplish concurrency control
 Uses GPS devices and Atomic clocks to get accurate time

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

39
Google Spanner - Design Goals

출처: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

40
Google Spanner - Architecture

•
•
•
•
•

Universe: Spanner 전체 집합
Zone: 물리적으로 독립되어 운영할 수 있는 단위
1대의 존마스터(zonemaster)+ 수백~수천 대의 스팬서버(spanserver)로 구성됨
존마스터는 스팬서버에 데이터를 할당, 스팬서버는 실제 데이터를 저장하고 처리
로케이션 프록시: 클라이언트에 의해 호출되어, 접근해야 할 데이터가 어느 스팬서버에 있는지 알려줌

출처: http://helloworld.naver.com/helloworld/216593

41
Google Spanner - Software Stack (1/3)

•
•
•
•

스팬서버는 100~1000개의 태블릿(tablet)을 관리.
태블릿: ‘(key:string, timestamp:int64) string’ 형태의 매핑을 다수개 저장, 멀티다중 버전 데이터베이스 특성
태블릿의 상태는 B-트리 파일과 WAL로 CFS에 저장됨
스팬서버간 데이터 복제를 지원하기 위해 팍소스 스테이트 머신을 이용

출처: http://helloworld.naver.com/helloworld/216593

42
Google Spanner - Software Stack (2/3)
(key:string, timestamp:int64) → string
 Back End: Colossus (successor to GFS)
 Paxos State Machine on top of each tablet stores meta data
and logs of the tablet.
 Leader among replicas in a Paxos group is chosen and all
write requests for replicas in that group initiate at leader.
 Transaction Leader
–Is Paxos Leader if transaction involves one Paxos group
출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

43
Google Spanner - Software Stack (3/3)
 Directory – 같은 접두어를 사용하는 연속된 키 모음
– 빅테이블(BigTable)의 버킷과 유사
– 데이터 배치의 최소 단위
– 지리적인 리플리카 배치의 최소단위

 디렉터리의 크기가 너무 커질 경우, 하나의 디렉터리를 여러 개의
프래그먼트(fragment)로 분할 가능

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

44
Google Spanner - Data Model
 One or more databases supported in Spanner Universe
 Database can contain unlimited schematized tables
 Not purely relational
– Requires rows to have names
– Names are nothing but a set(can be singleton) of primary keys
– In a way, it’s a key value store with primary keys mapped to non-key columns as values

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

45
Google Spanner - TrueTime
 “Global wall-clock time” with bounded uncertainty
 Novel API behind Spanner’s core innovation
 Leverages hardware features like GPS and Atomic Clocks
 Implemented via TrueTime API.
– Key method being now() which not only returns current system time but
also another value (ε) which tells the maximum uncertainty in the time
returned

 Set of time master server per datacenters and time slave
daemon per machines.
 Majority of time masters are GPS fitted and few others are
atomic clock fitted (Armageddon masters).
 Daemon polls variety of masters and reaches a consensus
about correct timestamp.
출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

46
Google Spanner - True Time

TT.now()

earliest

time
latest

2*ε
출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

47
Google Spanner - TrueTime Transaction
 Read-Write – requires lock.
 Read-Only – lock free.
– Requires declaration before start of transaction.
– Reads information that is up to date

 Snapshot Read – Read information from past by specifying a
timestamp or bound
– Use specifies specific timestamp from past or timestamp bound so that
data till that point will be read.

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

48
Google Spanner - Evaluation(1/2)
Evaluated for replication, transactions and availability.
Results on epsilon of TrueTime
Benchmarked on Spanner System with
–50 Paxos groups
–250 Directories
–Clients(applicatons) and Zones are at a network distance of
1ms

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

49
Google Spanner - Evaluation (2/2)

Effect of killing servers on throughput

출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt

Distribution of TrueTime Epsilon values

50
MySQL Cluster Architecture

http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-overview.html

51
Schooner MySQL Active Cluster

http://www.snia.org/sites/default/files2/SDC2011/presentations/monday/DrJohnBuschHow_ScaleUp_Scale_Out.pdf

52
dbShards Architecture

http://www.linuxforu.com/2012/01/newsql-handle-big-data/

53
NewSQL 정리

54
Database 업계의 3가지 Trends
□NoSQL 데이터베이스:
 분산 아키텍처의 확장성 등의 요구 사항을 충족하며, 스키마 없는 데이터
관리 요구 사항에 부합하도록 설계됨.

□NewSQL 데이터베이스:
 분산 아키텍처의 확장성 등의 요구 사항을 충족하거나 혹은 수평 확장을
필요로하지 않지만 성능을 개선은 되도록 설계됨.

□Data Grid/Cache 제품:
 응용 프로그램 및 데이터베이스 성능을 높이기 위해 메모리에 데이터를
저장하도록 설계됨.

55
결론
□ 데이터 저장을 위한 많은 솔루션이 존재

□ Oracle, MySQL만 있다는 생각은 버려야 함
□ 먼저 시스템의 데이터 속성과 요구사항을 파악(CAP, ACID/BASE)
□ 한 시스템에 여러 솔루션을 적용
 소규모/복잡한 관계 데이터: RDBMS
 대규모 실시간 처리 데이터: NoSQL, NewSQL
 대규모 저장용 데이터: Hadoop 등

□ 적절한 솔루션 선택

□ 반드시 운영 중 발생할 수 있는 이슈에 대해 검증 후 도입 필요
□ 대부분의 NewSQL 솔루션은 베타 상태(섣부른 선택은 독이 될 수 있음)
□ 솔루션의 프로그램 코드 수준으로 검증 필요

□ NewSQL 솔루션에 대한 안정성 확보

□ 솔루션 자체의 안정성은 검증이 필요하며 현재의 DBMS 수준의 안정성은 지원하
지 않음
□ 반드시 안정적인 데이터 저장 방안 확보 후 적용 필요
□ 운영 및 개발 경험을 가진 개발자 확보 어려움
□ 요구사항에 부합되는 NewSQL 선정 필요

□ 처음부터 중요 시스템에 적용하기 보다는 시범 적용 필요
□ 선정된 솔루션 검증, 기술력 내재화

56
감사합니다.

57
Appendix.

58
Early – 2000s
□All the big players were heavyweight
and expensive.
 Oracle, DB2, Sybase, SQL Server, etc.

□Open-source databases were missing
important features.
 Postgres, mSQL, and MySQL.

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

59
Early – 2000s : eBay Architecture

http://highscalability.com/ebay-architecture

60
Early – 2000s : eBay Architecture

 Push functionality to application:
 Joins
 Referential integrity
 Sorting done
 No distributed transactions

http://highscalability.com/ebay-architecture

61
Mid– 2000s

□MySQL + InnoDB is widely adopted by
new web companies:
 Supported transactions, replication,
recovery.
 Still must use custom middleware to scale
out across multiple machines.
 Memcache for caching queries.

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

62
Mid – 2000s : Facebook Architecture

http://www.techthebest.com/2011/11/29/technology-used-in-facebook/

63
Mid – 2000s : Facebook Architecture

 Scale out using custom middleware.
 Store ~75% of database in Memcache.
 No distributed transactions.

http://www.techthebest.com/2011/11/29/technology-used-in-facebook/

64
Late – 2000s
□MySQL + InnoDB is widely adopted by new web
companies:
 Supported transactions, replication, recovery.
 Still must use custom middleware to scale out across
multiple machines.
 Memcache for caching queries.

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

65
Late – 2000s : MongoDB Architecture

http://sett.ociweb.com/sett/settAug2011.html

66
Late – 2000s : MongoDB Architecture

 Easy to use.
 Becoming more like a DBMS over time.
 No transactions.

http://sett.ociweb.com/sett/settAug2011.html

67
Early – 2010s
□New DBMSs that can scale across multiple machines
natively and provide ACID guarantees.
 MySQL Middleware
 Brand New Architectures

http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf

68
Database SPRAIN
(by 451Group)

69
Database SPRAIN

□“An injury to ligaments... caused by being
stretched beyond normal capacity”
□Six key drivers for NoSQL/NewSQL/DDG
adoption







Scalability
Performance
Relaxed consistency
Agility
Intricacy
Necessity
70
Database SPRAIN - Scalability
□Associated sub-driver: Hardware economics
 Scale-out across clusters of commodity servers

□Example project/service/vendor
 BigTable HBase Riak MongoDB Couchbase, Hadoop
 Amazon RDS, Xeround, SQL Azure, NimbusDB
 Data grid/cache

□Associated use case:
 Large-scale distributed data storage
 Analysis of continuously updated data
 Multi-tenant PaaS data layer

71
Database SPRAIN - Scalability
□User: StumbleUpon
□Problem:
 Scaling problems with recommendation engine on MySQL

□Solution: HBase





Started using Apache HBase to provide real-time analytics on Su.pr
MySQL lacked the performance headroom and scale
Multiple benefits including avoiding declaring schema
Enables the data to be used for multiple applications and use cases

72
Database SPRAIN - Performance
□Associated sub-driver: MySQL limitations
 Inability to perform consistently at scale

□Example project/service/vendor
 Hypertable Couchbase Membrain MongoDB Redis
 Data grid/cache
 VoltDB, Clustrix

□Associated use case:
 Real time data processing of mixed read/write workloads
 Data caching
 Large-scale data ingestion

73
Database SPRAIN - Performance
□User: AOL Advertising
□Problem:
 Real-time data processing to support targeted advertising

□Solution: Membase Server





Segmentation analysis runs in CDH, results passed into Membase
Make use of its sub-millisecond data delivery
More time for analysis as part of a 40ms targeted and response time
Also real time log and event management

74
Database SPRAIN – Relaxed Consistency
□Associated sub-driver: CAP theorem
 The need to relax consistency in order to maintain availability

□Example project/service/vendor:
 Dynamo, Voldemort, Cassandra
 Amazon SimpleDB

□Associated use case:
 Multi-data center replication
 Service availability
 Non-transactional data off-load

75
Database SPRAIN – Relaxed Consistency
□User: Wordnik
□Problem:
 MySQL too consistent –blocked access to data during inserts and
created numerous temp files to stay consistent.

□Solution: MongoDB
 Single word definition contains multiple data items from various
sources
 MongoDB stores data as a complete document
 Reduced the complexity of data storage

76
Database SPRAIN – Agility
□ Associated sub-driver: Polyglot persistence
 Choose most appropriate storage technology for app in development

□Example project/service/vendor
 MongoDB, CouchDB, Cassandra
 Google App Engine, SimpleDB, SQL Azure

□Associated use case:
 Mobile/remote device synchronization
 Agile development
 Data caching

77
Database SPRAIN – Agility
□ User: Dimagi BHOMA (Better Health Outcomes
through Mentoring and Assessments) project
□Problem:
 Deliver patient information to clinics despite a lack of reliable Internet
connections

□Solution: Apache CouchDB
 Replicates data from regional to national database
 When Internet connection, and power, is available
 Upload patient data from cell phones to local clinic

78
Database SPRAIN – Intricacy
□ Associated sub-driver: Big data, total data


Rising data volume, variety and velocity

□Example project/service/vendor
 Neo4j GraphDB, InfiniteGraph
 Apache Cassandra, Hadoop,
 VoltDB, Clustrix

□Associated use case:
 Social networking applications
 Geo-locational applications
 Configuration management database

79
Database SPRAIN – Intricacy
□User: Evident Software
□Problem:
 Mapping infrastructure dependencies for application performance
management

□Solution: Neo4j
 Apache Cassandra stores performance data
 Neo4j used to map the correlations between different elements
 Enables users to follow relationships between resources while
investigating issues

80
Database SPRAIN – Necessity
□ Associated sub-driver: Open source


The failure of existing suppliers to address the performance,
scalability and flexibility requirements of large-scale data processing

□ Example project/service/vendor





BigTable, Dynamo, MapReduce, Memcached
Hadoop HBase, Hypertable, Cassandra, Membase
Voldemort, Riak, BigCouch
MongoDB, Redis, CouchDB, Neo4J

□Associated use case:
 All of the above

81
Database SPRAIN – Necessity
□BigTable: Google
□Dynamo: Amazon
□Cassandra: Facebook
□HBase: Powerset
□Voldemort: LinkedIn
□Hypertable: Zvents
□Neo4j: Windh Technologies
 Yahoo: Apache Hadoop and Apache HBase
 Digg: Apache Cassandra
 Twitter: Apache Cassandra, Apache Hadoop and FlockDB

82

Mais conteúdo relacionado

Mais procurados

Minio ♥ Go
Minio ♥ GoMinio ♥ Go
Minio ♥ GoMinio
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Azure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseAzure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseMohamed Tawfik
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystifiedOmid Vahdaty
 
MySQL Architecture and Engine
MySQL Architecture and EngineMySQL Architecture and Engine
MySQL Architecture and EngineAbdul Manaf
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQueryRyuji Tamagawa
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
 
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...confluent
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
NewSQL - The Future of Databases?
NewSQL - The Future of Databases?NewSQL - The Future of Databases?
NewSQL - The Future of Databases?Elvis Saravia
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetescraigbox
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database ReplicationMehdi Valikhani
 

Mais procurados (20)

Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
 
The Google Bigtable
The Google BigtableThe Google Bigtable
The Google Bigtable
 
Minio ♥ Go
Minio ♥ GoMinio ♥ Go
Minio ♥ Go
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Azure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseAzure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data Warehouse
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
MySQL Architecture and Engine
MySQL Architecture and EngineMySQL Architecture and Engine
MySQL Architecture and Engine
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...
Troubleshooting as Your Kafka Clusters Grow (Krunal Vora, Tinder) Kafka Summi...
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
NewSQL - The Future of Databases?
NewSQL - The Future of Databases?NewSQL - The Future of Databases?
NewSQL - The Future of Databases?
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetes
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 

Destaque

Cloud Music v1.0
Cloud Music v1.0Cloud Music v1.0
Cloud Music v1.0Steve Min
 
Scala overview
Scala overviewScala overview
Scala overviewSteve Min
 
An introduction to hadoop
An introduction to hadoopAn introduction to hadoop
An introduction to hadoopMinJae Kang
 
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)Matthew (정재화)
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Steve Min
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)Steve Min
 
빅데이터_ISP수업
빅데이터_ISP수업빅데이터_ISP수업
빅데이터_ISP수업jrim Choi
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Steve Min
 
Cloud Computing v1.0
Cloud Computing v1.0Cloud Computing v1.0
Cloud Computing v1.0Steve Min
 
Apache Htrace overview (20160520)
Apache Htrace overview (20160520)Apache Htrace overview (20160520)
Apache Htrace overview (20160520)Steve Min
 
Expanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoExpanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoMatthew (정재화)
 
Gruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter
 
[SSA] 01.bigdata database technology (2014.02.05)
[SSA] 01.bigdata database technology (2014.02.05)[SSA] 01.bigdata database technology (2014.02.05)
[SSA] 01.bigdata database technology (2014.02.05)Steve Min
 

Destaque (17)

Cloud Music v1.0
Cloud Music v1.0Cloud Music v1.0
Cloud Music v1.0
 
Scala overview
Scala overviewScala overview
Scala overview
 
An introduction to hadoop
An introduction to hadoopAn introduction to hadoop
An introduction to hadoop
 
vertica_tmp_4.5
vertica_tmp_4.5vertica_tmp_4.5
vertica_tmp_4.5
 
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)
Hadoop과 SQL-on-Hadoop (A short intro to Hadoop and SQL-on-Hadoop)
 
BigData, Hadoop과 Node.js
BigData, Hadoop과 Node.jsBigData, Hadoop과 Node.js
BigData, Hadoop과 Node.js
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
 
빅데이터_ISP수업
빅데이터_ISP수업빅데이터_ISP수업
빅데이터_ISP수업
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Cloud Computing v1.0
Cloud Computing v1.0Cloud Computing v1.0
Cloud Computing v1.0
 
Apache Htrace overview (20160520)
Apache Htrace overview (20160520)Apache Htrace overview (20160520)
Apache Htrace overview (20160520)
 
Html5 video
Html5 videoHtml5 video
Html5 video
 
RESTful API Design, Second Edition
RESTful API Design, Second EditionRESTful API Design, Second Edition
RESTful API Design, Second Edition
 
Expanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoExpanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with Tajo
 
Gruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigData
 
[SSA] 01.bigdata database technology (2014.02.05)
[SSA] 01.bigdata database technology (2014.02.05)[SSA] 01.bigdata database technology (2014.02.05)
[SSA] 01.bigdata database technology (2014.02.05)
 

Semelhante a [SSA] 03.newsql database (2014.02.05)

VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
Database As A Service: OEM + ODA (OOW 15 Presentation)
Database As A Service: OEM + ODA (OOW 15 Presentation)Database As A Service: OEM + ODA (OOW 15 Presentation)
Database As A Service: OEM + ODA (OOW 15 Presentation)Bobby Curtis
 
MySQL's NoSQL -- Texas Linuxfest August 22nd 2015
MySQL's NoSQL  -- Texas Linuxfest August 22nd 2015MySQL's NoSQL  -- Texas Linuxfest August 22nd 2015
MySQL's NoSQL -- Texas Linuxfest August 22nd 2015Dave Stokes
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
The Oracle RAC Family of Solutions - Presentation
The Oracle RAC Family of Solutions - PresentationThe Oracle RAC Family of Solutions - Presentation
The Oracle RAC Family of Solutions - PresentationMarkus Michalewicz
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfScyllaDB
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierKellyn Pot'Vin-Gorman
 
Mysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New FeaturesMysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New FeaturesTarique Saleem
 
Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support
 Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support
Mysql User Camp : 20-June-14 : Mysql New features and NoSQL SupportMysql User Camp
 
Ohio Linux Fest -- MySQL's NoSQL
Ohio Linux Fest -- MySQL's NoSQLOhio Linux Fest -- MySQL's NoSQL
Ohio Linux Fest -- MySQL's NoSQLDave Stokes
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features" Anar Godjaev
 
Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Ted Wennmark
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8 Ted Wennmark
 
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...Principled Technologies
 
MV2ADB - Move to Oracle Autonomous Database in One-click
MV2ADB - Move to Oracle Autonomous Database in One-clickMV2ADB - Move to Oracle Autonomous Database in One-click
MV2ADB - Move to Oracle Autonomous Database in One-clickRuggero Citton
 
plug-into-cloud-wp-12c-1896100
plug-into-cloud-wp-12c-1896100plug-into-cloud-wp-12c-1896100
plug-into-cloud-wp-12c-1896100Prithvi Rajkumar
 

Semelhante a [SSA] 03.newsql database (2014.02.05) (20)

VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Database As A Service: OEM + ODA (OOW 15 Presentation)
Database As A Service: OEM + ODA (OOW 15 Presentation)Database As A Service: OEM + ODA (OOW 15 Presentation)
Database As A Service: OEM + ODA (OOW 15 Presentation)
 
MySQL's NoSQL -- Texas Linuxfest August 22nd 2015
MySQL's NoSQL  -- Texas Linuxfest August 22nd 2015MySQL's NoSQL  -- Texas Linuxfest August 22nd 2015
MySQL's NoSQL -- Texas Linuxfest August 22nd 2015
 
Copy Data Management for the DBA
Copy Data Management for the DBACopy Data Management for the DBA
Copy Data Management for the DBA
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
The Oracle RAC Family of Solutions - Presentation
The Oracle RAC Family of Solutions - PresentationThe Oracle RAC Family of Solutions - Presentation
The Oracle RAC Family of Solutions - Presentation
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next Frontier
 
Mysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New FeaturesMysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New Features
 
Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support
 Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support
Mysql User Camp : 20-June-14 : Mysql New features and NoSQL Support
 
Ohio Linux Fest -- MySQL's NoSQL
Ohio Linux Fest -- MySQL's NoSQLOhio Linux Fest -- MySQL's NoSQL
Ohio Linux Fest -- MySQL's NoSQL
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features"
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8
 
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...
Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID con...
 
MV2ADB - Move to Oracle Autonomous Database in One-click
MV2ADB - Move to Oracle Autonomous Database in One-clickMV2ADB - Move to Oracle Autonomous Database in One-click
MV2ADB - Move to Oracle Autonomous Database in One-click
 
plug-into-cloud-wp-12c-1896100
plug-into-cloud-wp-12c-1896100plug-into-cloud-wp-12c-1896100
plug-into-cloud-wp-12c-1896100
 

Último

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

[SSA] 03.newsql database (2014.02.05)

  • 1. [SSA] Big Data Analytics NewSQL Database 민형기 hg.min@samsung.com 2014. 2. 5.
  • 2. Contents I. Why NewSQL? II. NewSQL 기본 개념 III. NewSQL 종류 IV.NewSQL 정리 1
  • 4. Thinking – Extreme Data Qcon London 2012 3
  • 5. Thinking - Traffic Explosion 출처 : Netflix in the Cloud (http://www.slideshare.net/adrianco/netflix-in-the-cloud-2011) 4
  • 6. Organizations need deeper insights Qcon London 2012 5
  • 7. Solutions □Buy High end Technology □Higher more developers □Using NoSQL □Using NewSQL 6
  • 8. Solution – Buy High End Technology Oracle, IBM 7
  • 9. Solution – Higher more developers □Application Level Sharding □Build your replication middleware □… http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5 8
  • 10. Solutions – Use NoSQL □새로운 비 관계형 데이터 베이스 □분산 아키텍처 □수평 확장성 □고정된 테이블 스키마가 없음 □Join, UPDATE, DELETE 연산이 없음 □트랜잭션이 없음 □SQL 지원이 없음 9
  • 12. MongoDB □Document-oriented database  JSON-style documents: Lists, Maps, primitives  Schema-less □Transaction = update of a single document □Rich query language for dynamic queries □Tunable writes: speed reliability □Highly scalable and available 11
  • 13. MongoDB 사용예 □Use cases  High volume writes  Complex data  Semi-structured data □주요 고객      Foursquare Bit.ly Intuit SourceForge, NY Times GILT Groupe, Evite, SugarCRM 12
  • 14. Apache Cassandra □Column-oriented database/Extensible row store  Think Row ~= java.util.SortedMap □Transaction = update of a row □Fast writes = append to a log □Tunable reads/writes: consistency / availability □Extremely scalable  Transparent and dynamic clustering  Rack and datacenter aware data replication □CQL = “SQL”-like DDL and DML 13
  • 15. Apache Cassandra 사용 예 □사용 예      Big data Multiple Data Center distributed database Persistent cache (Write intensive) Logging High-availability (writes) □주요 고객  Digg, Facebook, Twitter, Reddit, Rackspace  Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX  The largest production cluster has over 100 TB of data in over 150 machines.“ – Casssandra web site 14
  • 16. Solutions – Use NewSQL □새로운 관계형 데이터베이스 □SQL과 ACID 트랜잭션을 유지 □새롭고 개선된 분산 아키텍처 □뛰어난 확장성과 성능을 지원 □NewSQL vendors: TokuDB, ScaleDB, NimbusDB, ..., VoltDB 15
  • 18. NewSQL 정의 – Wikipedia NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for OLTP workloads while still maintaining the ACID guarantees of a traditional single-node database system http://en.wikipedia.org/wiki/NewSQL 17
  • 19. NewSQL 정의 – 451 Group A DBMS that delivers the scalability and flexibility promised by NoSQL while retaining the support for SQL queries and/or ACID, or to improve performance for appropriate workloads. http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 18
  • 20. NewSQL 정의 – Stonbraker  SQL as the primary interface.  ACID support for transactions  Non-locking concurrency control.  High per-node performance.  Parallel, shared-nothing architecture. http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 19
  • 21. NewSQL Category  New Database  New MySQL Storage Engines  Transparent Clustering 20
  • 22. The evolving database landscape OSBC 21
  • 25. New Database □ Newly designed from scratch to achieve scalability and performance  One of the key considerations in improving the performance is making non-disk (memory) or new kinds of disks (flash/SSD) the primary data store.  some (hopefully minor) changes to the code will be required and data migration is still needed. □Solutions  Software-Only: VoltDB, NuoDB, Drizzle, Google Spanner  Supported as an appliance: Clustrix, Translattice. http://www.linuxforu.com/2012/01/newsql-handle-big-data/ 24
  • 26. New MySQL Storage Engines □Highly optimized storage engines for MySQL □Scale better than built-in engines, such as InnoDB.  Good part: the usage of the MySQL interface  Downside part: data migration from other databases □Solutions  TokuDB, MemSQL, Xeround, Akiban, NDB http://www.linuxforu.com/2012/01/newsql-handle-big-data/ 25
  • 27. Transparent Clustering □Retain the OLTP databases in their original format, but provide a pluggable feature  Cluster transparently  Ensure Scalability □Avoid the rewrite code or perform any data migration □Solutions  Cluster transparently: Schooner MySQL, Continuent Tungsten, ScalArc  Ensure Scalability: ScaleBase, dbShards http://www.linuxforu.com/2012/01/newsql-handle-big-data/ 26
  • 28. NewSQL Products  VoltDB  Google Spanner  MySQL Cluster Architecture 27
  • 29. 28
  • 30. VoltDB □ VoltDB, 2010, AGPLv3/VoltDB Proprietary License, Java/C++ □ Type: NewSQL, New Database □ Main Point: In-memory Database, Java Stored Procedure, VoltDB implements the design of the academic H-Store project □ Protocol: SQL □ Transaction: Yes □ Data Storage: Memory □ Features □ in-memory relational database □ Durability thru replication, snapshots, logging □ Transparent partitioning □ ACID-level consistency □ Synchronous multi-master replication □ Database Replication http://voltdb.com/products-services/products, http://www.slideshare.net/chris.e.richardson/polygot-persistenceforjavadevs-jfokus2012reorgpptx 29
  • 31. VoltDB- Technical Overview  “OLTP Through the Looking Glass”  VoltDB는 전통 DB의 오버헤드 회피함 – 데이터 및 관련 프로세싱은 함께 분할 되고, CPU 코어 단위로 분산됨  Shared-Nothing Architecture  Horizontally Scale – 데이터는 메인 메모리에서 유지됨  버퍼관리가 필요 없음 – Transaction은 메모리에서 순차적으로 실행됨  no locking & latching – Synchronous multi-master replication: HA – Command Logging: “write-ahead” 데이터 로깅을 대체 하여 고성능을 제공 출처: http://odbms.org/download/VoltDBTechnicalOverview.pdf, http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf 30
  • 32. VoltDB - Partitions (1/3)  1 partition per physical CPU core – Each physical server has multiple VoltDB partitions  Data - Two types of tables – Partitioned Single column serves as partitioning key Rows are spread across all VoltDB partitions by partition column Transactional data (high frequency of modification) X X – Replicated All rows exist within all VoltDB partitions Relatively static data (low frequency of modification)  Code - Two types of work – both ACID – Single-Partition X X X All insert/update/delete operations within single partition Majority of transactional workload – Multi-Partition CRUD against partitioned tables across multiple partitions Insert/update/delete on replicated tables 출처: http://strataconf.com/stratany2013/public/schedule/detail/31731 31
  • 33. VoltDB - Partitions (2/3)  Single-partition vs. Multi-partition select count(*) from orders where customer_id = 5 single-partition select count(*) from orders where product_id = 3 multi-partition insert into orders (customer_id, order_id, product_id) values (3,303,2) single-partition update products set product_name = ‘spork’ where product_id = 3 multi-partition Partition 1 1 1 4 101 101 401 1 2 3 knife spoon fork Partition 2 2 3 2 2 5 5 201 501 502 1 2 3 knife spoon fork Partition 3 1 3 2 3 6 6 201 601 601 1 2 3 knife spoon fork 1 1 2 table orders : (partitioned) customer_id (partition key) order_id product_id table products : product_id (replicated) product_name 32
  • 34. VoltDB - Partitions (3/3)  Looking inside a VoltDB partition… – Each partition contains data and an execution engine. – The execution engine contains a queue for transaction requests. – Requests are executed sequentially (single threaded). Work Queue execution engine Table Data Index Data - Complete copy of all replicated tables - Portion of rows (about 1/partitions) of all partitioned tables 33
  • 35. VoltDB - Compiling Stored Procedures Schema  The database is constructed from – The schema (DDL) – The work load (Java stored procedures) – The Project (users, groups, partitioning)  VoltCompiler creates application catalog – Copy to servers along with 1 .jar and 1 .so – Start servers CREATE TABLE HELLOWORLD ( HELLO CHAR(15), WORLD CHAR(15), DIALECT CHAR(15), PRIMARY KEY (DIALECT) ); import org.voltdb. * ; import org.voltdb. * ; import @ProcInfo( org.voltdb. * ; @ProcInfo( partitionInfo = "HELLOWORLD.DIA partitionInfo true "HE @ProcInfo( singlePartition = = partitionInfo = "HELLOWORLD.DIA )singlePartition = t singlePartition = true ) public class Insert extends VoltPr public final SQLStmt sql = public final SQLStmt public class Insert extends VoltPr new SQLStmt("INSERT INTO HELLO public VoltTable[] sql = public final SQLStmt run new SQLStmt("INSERT INTO HELLO public VoltTable[] run( String hel public VoltTable[] run( String hel Project.xml <?xml version="1.0"?> <project> <database name='data <schema path='ddl. <partition table=‘ </database> </project> 34
  • 36. VoltDB - Transactions  All access to VoltDB is via Java stored procedures (Java + SQL)  A single invocation of a stored procedure is a transaction (committed on success) SQL  Limits round trips between DBMS and application  High performance client applications communicate synchronously with VoltDB 35
  • 37. VoltDB - Clusters/Durability  Scalability – Increase RAM in servers to add capacity – Add servers to increase performance / capacity – Consistently measuring 90% of single-node performance increase per additional node  High availability – K-safety for redundancy  Snapshots – Scheduled, continuous, on demand  Spooling to data warehouse  Disaster Recovery/WAN replication (Future) – Asynchronous replication 36
  • 39. Google Spanner Overview 구글의 확장성, 다중버전, 전 세계적으로 분산, 동기적으로 복제된 데이터베이스  Google, 2012, Paper, C++  Type: NewSQL, New Database  Main Point: Distributed multiversion database  특징 – General-purpose transactions (ACID) – SQL query language – Schematized tables – Semi-relational data model  Running in production – Storage for Google’s ad data(F1) – Replaced a sharded MySQL database 출처: http://research.google.com/archive/spanner.html 38
  • 40. Google Spanner - Key Features  Temporal Multi-version database  Externally consistent global write-transactions with synchronous replication.  Transactions across Datacenters.  Lock-free read-only transactions.  Schematized, semi-relational (tabular) data model.  SQL-like query interface.  Auto-sharding, auto-rebalancing, automatic failure response.  Exposes control of data replication and placement to user/application.  Enables transaction serialization via global timestamps  Acknowledges clock uncertainty and guarantees a bound on it  Uses novel TrueTime API to accomplish concurrency control  Uses GPS devices and Atomic clocks to get accurate time 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 39
  • 41. Google Spanner - Design Goals 출처: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf 40
  • 42. Google Spanner - Architecture • • • • • Universe: Spanner 전체 집합 Zone: 물리적으로 독립되어 운영할 수 있는 단위 1대의 존마스터(zonemaster)+ 수백~수천 대의 스팬서버(spanserver)로 구성됨 존마스터는 스팬서버에 데이터를 할당, 스팬서버는 실제 데이터를 저장하고 처리 로케이션 프록시: 클라이언트에 의해 호출되어, 접근해야 할 데이터가 어느 스팬서버에 있는지 알려줌 출처: http://helloworld.naver.com/helloworld/216593 41
  • 43. Google Spanner - Software Stack (1/3) • • • • 스팬서버는 100~1000개의 태블릿(tablet)을 관리. 태블릿: ‘(key:string, timestamp:int64) string’ 형태의 매핑을 다수개 저장, 멀티다중 버전 데이터베이스 특성 태블릿의 상태는 B-트리 파일과 WAL로 CFS에 저장됨 스팬서버간 데이터 복제를 지원하기 위해 팍소스 스테이트 머신을 이용 출처: http://helloworld.naver.com/helloworld/216593 42
  • 44. Google Spanner - Software Stack (2/3) (key:string, timestamp:int64) → string  Back End: Colossus (successor to GFS)  Paxos State Machine on top of each tablet stores meta data and logs of the tablet.  Leader among replicas in a Paxos group is chosen and all write requests for replicas in that group initiate at leader.  Transaction Leader –Is Paxos Leader if transaction involves one Paxos group 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 43
  • 45. Google Spanner - Software Stack (3/3)  Directory – 같은 접두어를 사용하는 연속된 키 모음 – 빅테이블(BigTable)의 버킷과 유사 – 데이터 배치의 최소 단위 – 지리적인 리플리카 배치의 최소단위  디렉터리의 크기가 너무 커질 경우, 하나의 디렉터리를 여러 개의 프래그먼트(fragment)로 분할 가능 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 44
  • 46. Google Spanner - Data Model  One or more databases supported in Spanner Universe  Database can contain unlimited schematized tables  Not purely relational – Requires rows to have names – Names are nothing but a set(can be singleton) of primary keys – In a way, it’s a key value store with primary keys mapped to non-key columns as values 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 45
  • 47. Google Spanner - TrueTime  “Global wall-clock time” with bounded uncertainty  Novel API behind Spanner’s core innovation  Leverages hardware features like GPS and Atomic Clocks  Implemented via TrueTime API. – Key method being now() which not only returns current system time but also another value (ε) which tells the maximum uncertainty in the time returned  Set of time master server per datacenters and time slave daemon per machines.  Majority of time masters are GPS fitted and few others are atomic clock fitted (Armageddon masters).  Daemon polls variety of masters and reaches a consensus about correct timestamp. 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 46
  • 48. Google Spanner - True Time TT.now() earliest time latest 2*ε 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 47
  • 49. Google Spanner - TrueTime Transaction  Read-Write – requires lock.  Read-Only – lock free. – Requires declaration before start of transaction. – Reads information that is up to date  Snapshot Read – Read information from past by specifying a timestamp or bound – Use specifies specific timestamp from past or timestamp bound so that data till that point will be read. 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 48
  • 50. Google Spanner - Evaluation(1/2) Evaluated for replication, transactions and availability. Results on epsilon of TrueTime Benchmarked on Spanner System with –50 Paxos groups –250 Directories –Clients(applicatons) and Zones are at a network distance of 1ms 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt 49
  • 51. Google Spanner - Evaluation (2/2) Effect of killing servers on throughput 출처: http://www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/9.2-Spanner.ppt Distribution of TrueTime Epsilon values 50
  • 53. Schooner MySQL Active Cluster http://www.snia.org/sites/default/files2/SDC2011/presentations/monday/DrJohnBuschHow_ScaleUp_Scale_Out.pdf 52
  • 56. Database 업계의 3가지 Trends □NoSQL 데이터베이스:  분산 아키텍처의 확장성 등의 요구 사항을 충족하며, 스키마 없는 데이터 관리 요구 사항에 부합하도록 설계됨. □NewSQL 데이터베이스:  분산 아키텍처의 확장성 등의 요구 사항을 충족하거나 혹은 수평 확장을 필요로하지 않지만 성능을 개선은 되도록 설계됨. □Data Grid/Cache 제품:  응용 프로그램 및 데이터베이스 성능을 높이기 위해 메모리에 데이터를 저장하도록 설계됨. 55
  • 57. 결론 □ 데이터 저장을 위한 많은 솔루션이 존재 □ Oracle, MySQL만 있다는 생각은 버려야 함 □ 먼저 시스템의 데이터 속성과 요구사항을 파악(CAP, ACID/BASE) □ 한 시스템에 여러 솔루션을 적용  소규모/복잡한 관계 데이터: RDBMS  대규모 실시간 처리 데이터: NoSQL, NewSQL  대규모 저장용 데이터: Hadoop 등 □ 적절한 솔루션 선택 □ 반드시 운영 중 발생할 수 있는 이슈에 대해 검증 후 도입 필요 □ 대부분의 NewSQL 솔루션은 베타 상태(섣부른 선택은 독이 될 수 있음) □ 솔루션의 프로그램 코드 수준으로 검증 필요 □ NewSQL 솔루션에 대한 안정성 확보 □ 솔루션 자체의 안정성은 검증이 필요하며 현재의 DBMS 수준의 안정성은 지원하 지 않음 □ 반드시 안정적인 데이터 저장 방안 확보 후 적용 필요 □ 운영 및 개발 경험을 가진 개발자 확보 어려움 □ 요구사항에 부합되는 NewSQL 선정 필요 □ 처음부터 중요 시스템에 적용하기 보다는 시범 적용 필요 □ 선정된 솔루션 검증, 기술력 내재화 56
  • 60. Early – 2000s □All the big players were heavyweight and expensive.  Oracle, DB2, Sybase, SQL Server, etc. □Open-source databases were missing important features.  Postgres, mSQL, and MySQL. http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 59
  • 61. Early – 2000s : eBay Architecture http://highscalability.com/ebay-architecture 60
  • 62. Early – 2000s : eBay Architecture  Push functionality to application:  Joins  Referential integrity  Sorting done  No distributed transactions http://highscalability.com/ebay-architecture 61
  • 63. Mid– 2000s □MySQL + InnoDB is widely adopted by new web companies:  Supported transactions, replication, recovery.  Still must use custom middleware to scale out across multiple machines.  Memcache for caching queries. http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 62
  • 64. Mid – 2000s : Facebook Architecture http://www.techthebest.com/2011/11/29/technology-used-in-facebook/ 63
  • 65. Mid – 2000s : Facebook Architecture  Scale out using custom middleware.  Store ~75% of database in Memcache.  No distributed transactions. http://www.techthebest.com/2011/11/29/technology-used-in-facebook/ 64
  • 66. Late – 2000s □MySQL + InnoDB is widely adopted by new web companies:  Supported transactions, replication, recovery.  Still must use custom middleware to scale out across multiple machines.  Memcache for caching queries. http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 65
  • 67. Late – 2000s : MongoDB Architecture http://sett.ociweb.com/sett/settAug2011.html 66
  • 68. Late – 2000s : MongoDB Architecture  Easy to use.  Becoming more like a DBMS over time.  No transactions. http://sett.ociweb.com/sett/settAug2011.html 67
  • 69. Early – 2010s □New DBMSs that can scale across multiple machines natively and provide ACID guarantees.  MySQL Middleware  Brand New Architectures http://www.cs.brown.edu/courses/cs227/slides/newsql/newsql-intro.pdf 68
  • 71. Database SPRAIN □“An injury to ligaments... caused by being stretched beyond normal capacity” □Six key drivers for NoSQL/NewSQL/DDG adoption       Scalability Performance Relaxed consistency Agility Intricacy Necessity 70
  • 72. Database SPRAIN - Scalability □Associated sub-driver: Hardware economics  Scale-out across clusters of commodity servers □Example project/service/vendor  BigTable HBase Riak MongoDB Couchbase, Hadoop  Amazon RDS, Xeround, SQL Azure, NimbusDB  Data grid/cache □Associated use case:  Large-scale distributed data storage  Analysis of continuously updated data  Multi-tenant PaaS data layer 71
  • 73. Database SPRAIN - Scalability □User: StumbleUpon □Problem:  Scaling problems with recommendation engine on MySQL □Solution: HBase     Started using Apache HBase to provide real-time analytics on Su.pr MySQL lacked the performance headroom and scale Multiple benefits including avoiding declaring schema Enables the data to be used for multiple applications and use cases 72
  • 74. Database SPRAIN - Performance □Associated sub-driver: MySQL limitations  Inability to perform consistently at scale □Example project/service/vendor  Hypertable Couchbase Membrain MongoDB Redis  Data grid/cache  VoltDB, Clustrix □Associated use case:  Real time data processing of mixed read/write workloads  Data caching  Large-scale data ingestion 73
  • 75. Database SPRAIN - Performance □User: AOL Advertising □Problem:  Real-time data processing to support targeted advertising □Solution: Membase Server     Segmentation analysis runs in CDH, results passed into Membase Make use of its sub-millisecond data delivery More time for analysis as part of a 40ms targeted and response time Also real time log and event management 74
  • 76. Database SPRAIN – Relaxed Consistency □Associated sub-driver: CAP theorem  The need to relax consistency in order to maintain availability □Example project/service/vendor:  Dynamo, Voldemort, Cassandra  Amazon SimpleDB □Associated use case:  Multi-data center replication  Service availability  Non-transactional data off-load 75
  • 77. Database SPRAIN – Relaxed Consistency □User: Wordnik □Problem:  MySQL too consistent –blocked access to data during inserts and created numerous temp files to stay consistent. □Solution: MongoDB  Single word definition contains multiple data items from various sources  MongoDB stores data as a complete document  Reduced the complexity of data storage 76
  • 78. Database SPRAIN – Agility □ Associated sub-driver: Polyglot persistence  Choose most appropriate storage technology for app in development □Example project/service/vendor  MongoDB, CouchDB, Cassandra  Google App Engine, SimpleDB, SQL Azure □Associated use case:  Mobile/remote device synchronization  Agile development  Data caching 77
  • 79. Database SPRAIN – Agility □ User: Dimagi BHOMA (Better Health Outcomes through Mentoring and Assessments) project □Problem:  Deliver patient information to clinics despite a lack of reliable Internet connections □Solution: Apache CouchDB  Replicates data from regional to national database  When Internet connection, and power, is available  Upload patient data from cell phones to local clinic 78
  • 80. Database SPRAIN – Intricacy □ Associated sub-driver: Big data, total data  Rising data volume, variety and velocity □Example project/service/vendor  Neo4j GraphDB, InfiniteGraph  Apache Cassandra, Hadoop,  VoltDB, Clustrix □Associated use case:  Social networking applications  Geo-locational applications  Configuration management database 79
  • 81. Database SPRAIN – Intricacy □User: Evident Software □Problem:  Mapping infrastructure dependencies for application performance management □Solution: Neo4j  Apache Cassandra stores performance data  Neo4j used to map the correlations between different elements  Enables users to follow relationships between resources while investigating issues 80
  • 82. Database SPRAIN – Necessity □ Associated sub-driver: Open source  The failure of existing suppliers to address the performance, scalability and flexibility requirements of large-scale data processing □ Example project/service/vendor     BigTable, Dynamo, MapReduce, Memcached Hadoop HBase, Hypertable, Cassandra, Membase Voldemort, Riak, BigCouch MongoDB, Redis, CouchDB, Neo4J □Associated use case:  All of the above 81
  • 83. Database SPRAIN – Necessity □BigTable: Google □Dynamo: Amazon □Cassandra: Facebook □HBase: Powerset □Voldemort: LinkedIn □Hypertable: Zvents □Neo4j: Windh Technologies  Yahoo: Apache Hadoop and Apache HBase  Digg: Apache Cassandra  Twitter: Apache Cassandra, Apache Hadoop and FlockDB 82