Better (rather than Best) practice of Cassandra
Fundamentals of Cassandra data modeling for data that does not suit Cassandra
Hayato Tsutsumi
Works Applications
Speaker
Hayato Tsutsumi (堤 勇人)
Site Reliability Engineering Div., Works Applications Co., Ltd
Cassandra experience: 7 years (since ver. 0.6)
Certification for Apache Cassandra Administrator
Data size: about 40 TB
Nodes: about 40 (will increase soon...)
Twitter: 2t3
Target
● Mid-range systems
Data size: 1 TB ~ 1 PB
Data amount: 10 million ~ 100 billion
+ high-speed processing
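What is better?
Best practice = Right people, right place
Putting each thing where it fits is certainly the best approach.
But not all data is suitable
So what about the parts that are not a best fit?
[Figure: "Our system" holds data that is suitable for Cassandra and data that is unsuitable for Cassandra]
Use both Cassandra and RDB?
[Figure: the suitable data in Cassandra, the rest in O* or M*]
You may use only RDB...
[Figure: all data in O* or M*]
Another way: manage data only with Cassandra
[Figure: all data in Cassandra, including the unsuitable part]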
3 data models that are unlike NoSQL
● Historical data
● Tree structure
● Summarized data
Prerequisite: how Cassandra reads data, in 3 minutes
Partition key & Clustering key
CREATE TABLE test_table (
  pkA text,
  pkB text,
  ckA text,
  ckB text,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB) // (pkA, pkB): partition key; ckA, ckB: clustering key
);
On the table:
pkA   pkB   ckA   ckB   v   w
pkA1  pkB1  ckA1  ckB1  v1  w1
pkA1  pkB1  ckA1  ckB2  v2  w2
pkA1  pkB2  ckA3  ckB3  v3  w3
On Cassandra:
Partition hash(pkA1, pkB1)
  Column  ckA1:ckB1:v  ckA1:ckB1:w  ckA1:ckB2:v  ckA1:ckB2:w
  Value   v1           w1           v2           w2
Partition hash(pkA1, pkB2)
  Column  ckA3:ckB3:v  ckA3:ckB3:w
  Value   v3           w3
Cassandra can search by column name.
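The mapping from the table view to the stored layout can be sketched in Python. This is only an illustrative model, not Cassandra's storage engine: the `to_partitions` helper is hypothetical, and the real partitioner hashes the partition key to a token instead of using the tuple directly.

```python
from collections import defaultdict

# Rows as they appear in the table view: (pkA, pkB, ckA, ckB, v, w).
rows = [
    ("pkA1", "pkB1", "ckA1", "ckB1", "v1", "w1"),
    ("pkA1", "pkB1", "ckA1", "ckB2", "v2", "w2"),
    ("pkA1", "pkB2", "ckA3", "ckB3", "v3", "w3"),
]

def to_partitions(rows):
    """Group rows by partition key; inside a partition, name each cell
    'ckA:ckB:column' and keep the cells sorted by that name."""
    partitions = defaultdict(dict)
    for pk_a, pk_b, ck_a, ck_b, v, w in rows:
        cells = partitions[(pk_a, pk_b)]
        cells[f"{ck_a}:{ck_b}:v"] = v
        cells[f"{ck_a}:{ck_b}:w"] = w
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}

layout = to_partitions(rows)
for pk, cells in layout.items():
    print(pk, cells)
```

Because the clustering key values lead each cell name, a lookup by clustering key prefix becomes a lookup over a contiguous run of sorted names inside one partition.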
Partition key & Clustering key
CREATE TABLE test_table (
  pkA text,
  pkB int,
  ckA int,
  ckB int,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB) // (pkA, pkB): partition key; ckA, ckB: clustering key
);
where pkA = 'pkA1'; //NG
where pkA = 'pkA1' and pkB = 1; //OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1; //OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1 and ckB = 1; //OK
where pkA = 'pkA1' and ckA = 1; //NG
where pkA = 'pkA1' and pkB = 1 and ckB = 1; //NG
where pkA = 'pkA1' and pkB >= 1; //NG
where pkA = 'pkA1' and pkB = 1 and ckA >= 1; //OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1 and ckB >= 1; //OK
where pkA = 'pkA1' and pkB = 1 and ckA >= 1 and ckB = 1; //NG
where pkA = 'pkA1' and pkB = 1 and ckA >= 1 and ckA < 2; //OK
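The OK/NG pattern above follows two rules, which can be written down as a small checker. This is a sketch under assumptions: `is_supported` is a hypothetical helper, it only looks at key columns, and it ignores secondary indexes and ALLOW FILTERING.

```python
PARTITION_KEYS = ["pkA", "pkB"]
CLUSTERING_KEYS = ["ckA", "ckB"]

def is_supported(restrictions):
    """restrictions maps a column name to 'eq' or 'range'.

    Returns True when every partition key column has an equality
    restriction and the clustering columns are restricted in order,
    with at most one range restriction that must come last."""
    # Rule 1: the full partition key must be given with '='.
    if any(restrictions.get(col) != "eq" for col in PARTITION_KEYS):
        return False
    # Rule 2: clustering columns form an '=' prefix, optionally ending in a range.
    closed = False  # True once a column was skipped or range-restricted
    for col in CLUSTERING_KEYS:
        kind = restrictions.get(col)
        if kind is None:
            closed = True
        elif closed:
            return False
        elif kind == "range":
            closed = True
    return True

print(is_supported({"pkA": "eq", "pkB": "eq", "ckA": "range"}))               # OK case
print(is_supported({"pkA": "eq", "pkB": "eq", "ckA": "range", "ckB": "eq"}))  # NG case
```

Two bounds on the same column, as in the last query above, count as a single range restriction on that column.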
Historical data
photo by Bryan Wright
(https://secure.flickr.com/photos/spidermandragon5/2922128673/)
Employee transfer history
[Figure: timeline of each employee's division. emp001: A Div. → B Div. → C Div.; emp002: A Div. → C Div. → D Div.; emp003: D Div. → E Div. The date axis includes 2/21, 4/1 and 4/16.]
Division as of 3/25:
emp001  emp002  emp003
B Div.  A Div.  E Div.
Division as of 4/25:
emp001  emp002  emp003
C Div.  D Div.  E Div.
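emp_history table
CREATE TABLE emp_history (
  id text,
  no text,
  s date, // start of the assignment
  e date, // end of the assignment
  div text,
  PRIMARY KEY (id, s, e, no)
);
select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; //NG
// s already carries a range restriction, so the following clustering column e cannot be restricted.
How can this be queried?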
emp_history table: use a custom index
CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON emp_history (e)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; //OK
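What the query computes is an "as of" lookup: for a given day, pick the rows whose interval [s, e) contains it. A minimal Python sketch of that semantics follows. The rows and every date in them are invented for illustration (chosen only so that the results match the 3/25 and 4/25 snapshots shown earlier), and `as_of` is a hypothetical helper, not part of any driver.

```python
from datetime import date

OPEN_END = date.max  # stands in for "still assigned"

# Hypothetical rows (no, s, e, div); all dates are made up for this sketch.
history = [
    ("emp001", date(2017, 1, 1), date(2017, 3, 1), "A Div."),
    ("emp001", date(2017, 3, 1), date(2017, 4, 1), "B Div."),
    ("emp001", date(2017, 4, 1), OPEN_END, "C Div."),
    ("emp002", date(2017, 1, 1), date(2017, 4, 1), "A Div."),
    ("emp002", date(2017, 4, 1), date(2017, 4, 16), "C Div."),
    ("emp002", date(2017, 4, 16), OPEN_END, "D Div."),
    ("emp003", date(2017, 1, 1), date(2017, 2, 21), "D Div."),
    ("emp003", date(2017, 2, 21), OPEN_END, "E Div."),
]

def as_of(history, day):
    """Return {employee no: division} for rows with s <= day < e."""
    return {no: div for no, s, e, div in history if s <= day < e}

print(as_of(history, date(2017, 3, 25)))
print(as_of(history, date(2017, 4, 25)))
```

The two conditions in the filter correspond to the `s <= ...` and `e > ...` restrictions of the CQL query; the index on `e` is what lets Cassandra evaluate the second one.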
Tree structure
Organization tree
A Div.
├─ a Dept.
│  ├─ 1 Sec.
│  └─ 2 Sec.
└─ b Dept.
   ├─ 3 Sec.
   ├─ 4 Sec.
   └─ 5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
Criteria
● JOINs and recursive queries are not available
● Consistency has to be ensured by other means anyway
● Jaywalking (multi-valued columns) and denormalization are normal
Requirements for the tree model
● show ancestors: get every parent from a node up to the root
● show children: expand one level down
● show descendants: expand all levels down
● show siblings
Worth considering! (Path Enumeration and Closure table are examined next.)
Path enumeration
CREATE TABLE pathenum (
  id text,
  fqdn text,
  child text,
  code text,
  PRIMARY KEY (id, fqdn)
);
id    fqdn   child    code
test  A      [a,b]    A
test  A:a    [1,2]    a
test  A:b    [3,4,5]  b
test  A:a:1           1
test  A:a:2           2
test  A:b:3           3
test  A:b:4           4
test  A:b:5           5
Path enumeration: it needs a 'like' search
select * from pathenum
where id = 'test' and fqdn like 'A:%'; //NG
select * from pathenum
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK
// ':' is U+003A and ';' is U+003B, the next code point, so the range from 'A:' up to (but excluding) 'A;' covers every fqdn that starts with 'A:'.
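The trick of replacing a prefix search with a range can be generalized: keep the prefix as the lower bound and bump its last character by one for the upper bound. A minimal sketch, assuming ASCII-only paths (Python compares strings by code point, which agrees with byte order for ASCII); `prefix_range` is a hypothetical helper name.

```python
def prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper holds exactly
    for the strings s that start with prefix (ASCII prefixes only)."""
    if not prefix or not prefix.isascii():
        raise ValueError("prefix must be a non-empty ASCII string")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

fqdns = ["A", "A:a", "A:b", "A:a:1", "A:a:2", "A:b:3", "A:b:4", "A:b:5"]

lower, upper = prefix_range("A:")
descendants = [f for f in sorted(fqdns) if lower <= f < upper]
print(lower, upper, descendants)
```

For the prefix 'A:' this yields the bounds 'A:' and 'A;' used in the query above, and the filter keeps every path except the root 'A' itself.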
Path enumeration: queries
//show ancestors (done in the application, no query needed)
fqdn.split(":");
//show children of a
select child from pathenum where id = 'test' and fqdn = 'A:a';
//show descendants of A
select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;';
//show siblings of a (= the children of a's parent, A)
select child from pathenum where id = 'test' and fqdn = 'A';
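The application-side part of these operations is plain string handling on the path. A small sketch, with hypothetical helper names `ancestors` and `parent`, assuming ':' never occurs inside a node code:

```python
def ancestors(fqdn, sep=":"):
    """Return the paths of all ancestors, from the root down to the parent."""
    parts = fqdn.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts))]

def parent(fqdn, sep=":"):
    """Return the parent's path, or None for the root."""
    head, found, _ = fqdn.rpartition(sep)
    return head if found else None

print(ancestors("A:a:1"))
print(parent("A:a"))
```

The parent path is also what the siblings query needs: read the `child` column of the row whose `fqdn` equals `parent(...)`.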
Path enumeration: pros & cons
pros
- one access per request
cons
- hot spot (a whole tree lives in a single partition)
- relies on range slices
- updates are complex
Closure table
CREATE TABLE closure_main (
  id text,
  v text,
  PRIMARY KEY (id)
);
CREATE TABLE closure_path (
  p text,
  c text,
  d int,
  PRIMARY KEY (p, d, c)
);
closure_main:
id  v
A   A Div.
a   a Dept.
b   b Dept.
1   1 Sec.
2   2 Sec.
3   3 Sec.
4   4 Sec.
5   5 Sec.
closure_path:
p  c  d
A  A  0
A  a  1
A  b  1
A  1  2
A  2  2
A  3  2
A  4  2
A  5  2
a  a  0
a  1  1
a  2  1
1  1  0
2  2  0
b  b  0
b  3  1
b  4  1
b  5  1
3  3  0
4  4  0
5  5  0
Closure table: queries
CREATE CUSTOM INDEX fn_c ON test.closure_path (c)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
//show ancestors
select p from closure_path where c = '1';
select * from closure_main where id in ?;
//show children of a
select c from closure_path where p = 'a' and d = 1;
select * from closure_main where id in ?;
//show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in ?;
//show siblings of a
//load a's parent (the row with d = 1) = A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d = 1;
select * from closure_main where id in ?;
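The closure_path rows can be derived mechanically from a parent map: each node gets one row per ancestor plus one row for itself at distance 0. A minimal sketch in Python, using the organization tree from the slides; `closure_rows` is a hypothetical helper, not something the deck defines.

```python
# Parent of each node in the organization tree (the root A has no parent).
parent_of = {"a": "A", "b": "A", "1": "a", "2": "a", "3": "b", "4": "b", "5": "b"}
nodes = ["A", "a", "b", "1", "2", "3", "4", "5"]

def closure_rows(nodes, parent_of):
    """Return (p, c, d) rows: p is c itself (d = 0) or an ancestor of c
    that is d levels above it."""
    rows = []
    for node in nodes:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, distance))
            ancestor = parent_of.get(ancestor)
            distance += 1
    return sorted(rows)

rows = closure_rows(nodes, parent_of)
print(len(rows))
# Ancestors of section 1, as in "select p from closure_path where c = '1'".
print(sorted(p for p, c, d in rows if c == "1" and d > 0))
```

Moving a subtree means deleting and re-creating these rows for every node in it, which is where the update complexity of this model comes from.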
Closure table: pros & cons
pros
- Distributed (rows are spread over many partitions)
- access by key lookup (get)
cons
- needs an index
- 2 ~ 3 accesses per request
- more data
- updates are complex
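How much does the data grow?
Assuming n children per node and a tree of depth d, each node has one closure_path row per ancestor plus one for itself, so the number of rows is roughly proportional to d times the number of nodes.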
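The growth can be checked with a short calculation. This sketch assumes a complete tree with the root at level 0 and leaves at level d, and counts the distance-0 rows as the sample data does; both function names are hypothetical.

```python
def node_count(n, d):
    """Number of nodes in a complete tree with n children per node, levels 0..d."""
    return sum(n ** k for k in range(d + 1))

def closure_row_count(n, d):
    """Number of closure rows: a node at level k has k ancestors plus itself."""
    return sum((k + 1) * n ** k for k in range(d + 1))

for d in range(1, 5):
    nodes, rows = node_count(3, d), closure_row_count(3, d)
    print(d, nodes, rows, round(rows / nodes, 2))
```

The last column, rows per node, climbs with d, which is the sense in which the data volume is proportional to the depth.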
Summarized data
Aggregation of slips
Dr.      Cr.
A  200   B   50
         C  150
[Figure: parallel batch processing and online streaming both feed the aggregation]
Requirements
● miscalculation = critical (a wrong total is fatal)
● parallel batch processing and online streaming processing are both needed
● high-speed processing is needed
= Consistency!
Summarized data: use a counter...? No.
CREATE TABLE countup (
  id text PRIMARY KEY,
  v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
// Counter increments are not idempotent, so a retried update can be counted twice.
Summarized data: use UPDATE with LWT (lightweight transaction)
CREATE TABLE countup (
  id text PRIMARY KEY,
  v int
);
UPDATE countup SET v = 101 WHERE id = 'test' IF v = 100;
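The conditional update is a compare-and-set: the writer reads the current value, computes the new one, and the write is applied only if nobody changed the value in between; otherwise it retries. The following Python sketch simulates that loop in memory. `CasCell` is a hypothetical stand-in for one row updated with `UPDATE ... IF v = ?`; no Cassandra driver is involved.

```python
import threading

class CasCell:
    """In-memory stand-in for a single row written with a conditional update."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        """Apply the write only if the current value equals expected."""
        with self._lock:
            if self._value != expected:
                return False  # someone else updated first
            self._value = new
            return True

def add(cell, amount):
    """Read, compute, and retry until the conditional write is applied."""
    while True:
        current = cell.read()
        if cell.compare_and_set(current, current + amount):
            return current + amount

cell = CasCell(100)
workers = [threading.Thread(target=add, args=(cell, 1)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(cell.read())
```

Every increment is applied exactly once even with concurrent writers, at the price of retries under contention, which mirrors the consistency versus throughput trade-off of LWT.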
What is the best?
Thanks!
