SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
Best Better practice of Cassandra
Cassandraに不向きなCassandraデータモデリング基礎
Hayato Tsutsumi
Works Applications
Hayato Tsutsumi
堤 勇人
Cassandra experience :
7 years (from ver.0.6)
Certification for Apache Cassandra Administarator
Data size : about 40TB
Nodes:about 40 (would increase soon...)
Twitter : 2t3
Site Reliability Engineering Div.
Works Applications Co., Ltd
自己紹介 Speaker
Target
● Mid-range System
Data size
1TB ~ 1PB
Data amount
10 Mil ~ 100 Bil
+ high speed processing
What is better?
Best practice = Right people, right place
適材適所は確かにベスト
Suitable
Data
But not all data is suitable
じゃあベストじゃない部分は?
Our system
Suitable
Data
Un-suitable data for Cassandra
Use both Cassandra and RDB?
O*
or
M*
Suitable
Data
You may use only RDB...
O*
or
M*
Suitable
Data
Another way : Manage data only with Cassandra
Suitable
Data
3 models unlike NoSQL
Historical data
履歴管理データ
Tree structure
ツリー構造
Summarized data
計上データ
How Cassandra
read data in 3mins
前
提
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB text,
ckA text,
ckB text,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
pkA1 pkB1 ckA1 ckB2 v2 w2
pkA1 pkB2 ckA3 ckB3 v3 w3
on Table
on Cassandra Cassandra can search Column name
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
pkA1 pkB1 ckA1 ckB2 v2 w2
pkA1 pkB2 ckA3 ckB3 v3 w3
on Table
on Cassandra Cassandra can search Column name
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
where pkA = "pkA1"; //NG
where pkA = "pkA1" and pkB = "pkB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB = "ckB1"; //OK
where pkA = "pkA1" and ckA = "ckA1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB >= "pkB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB >= "ckB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckA < "ckA2"; //OK
Historical data
履歴管理データ
photo by Bryan Wright
(https://secure.flickr.com/photos/spidermandragon5/2922128673/)
社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21
社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21
at 3/25
emp001 emp002 emp003
B Div. A Div. E Div.
at 4/25
emp001 emp002 emp003
C Div. D Div. E Div.
emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, e, no)
);
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG
?
emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON
emp_history (e) USING
'org.apache.cassandra.index.sasi.
SASIIndex';
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK
use custom index
Tree structure
ツリー構造
組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
判断のポイント
Criteria
● No join, recursive query
● Anyway need consistency
● Jaywalk or
denormalization is Natural
● JOIN、再帰問い合わせ
不可
● 整合性はどの道別の方
法で取る必要がある
● ジェイウォーク、非正規化
も当たり前
ツリー構造への要求
Requirement to tree model
● show ancestors
● show children
● show descendants
● show sibilings of a
● あるノードからルートまで
の全ての親を取得
● 子供を1段展開
● 子供を全て展開
● 兄弟を取得
組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
Worth considering!
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
id fqdn child code
test A [a,b] A
test A:a [1,2] a
test A:b [3,4,5] b
test A:a:1 1
test A:a:2 2
test A:b:3 3
test A:b:4 4
test A:b:5 5
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
select * from pathenum
where id = 'test' and fqdn like 'A:'; //NG
It needs 'like' search
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
select * from pathenum
where id = 'test' and fqdn like 'A:'; //NG
select * from pathenum
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK
: U+003A
; U+003B
It needs 'like' search
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
//show ancestors
fqdn.split(":");
//show children of a
select child from pathenum where id = 'test' and
fqdn = 'A:a';
//show descendants of A
select * from fqdntest where id = 'test' and
fqdn >= 'A:' and fqdn < 'A;';
//show sibilings of a
select p from fqdntest where
id = 'test' and fqdn = 'A';
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
pros
- one access
cons
- hot spot
- range slice
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
id v
A A Div.
a a Dept.
b b Dept.
1 1 Sec.
2 2 Sec.
3 3 Sec.
4 4 Sec.
5 5 Sec.
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0
//show ancestors
select p from closure_path where c = '1';
select * from closure_main where id in [?];
//show children of a
select c from closure_path where p = 'a' and d
= 1;
select * from closure_main where id in [?];
//show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in [?];
//show sibilings of a
//load a's parent = A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d
= 1;
select * from closure_main where id in [?];
pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
How increase data?
When assume n-children per
node and d-depth tree,
number of data will be
proportional to d.
Summarized
data
計上データ
伝票集計処理
Aggregation of slips
Dr. Cr.
A 200 B 50
C 150
伝票集計処理
Aggregation of slips
parallel batch processing
aggregation
online streaming
要求水準
Requirements
● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる
● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる
= Consistency!
要求水準
Requirements
計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
Use Counter...? No.
計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v int
);
UPDATE countup set v = 101 where id = 'test' if v =
100;
Use update with LWT
What is the best?
Thanks!

Mais conteúdo relacionado

Semelhante a Cassandraに不向きなcassandraデータモデリング基礎

SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Databasewangzhonnew
 
Tests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTapTests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTapRodolphe Quiédeville
 
Building and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CBuilding and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CDavid Wheeler
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Workhorse Computing
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsJan Aerts
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfssuser785ce21
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution planspaulguerin
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsWorkhorse Computing
 
An OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserAn OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserKiwamu Okabe
 
Drupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary EditionDrupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary Editionddiers
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介Masayuki Matsushita
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scalefabxc
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PgDay.Seoul
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoMark Wong
 
Advanced Perl Techniques
Advanced Perl TechniquesAdvanced Perl Techniques
Advanced Perl TechniquesDave Cross
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model Perforce
 
2016年のPerl (Long version)
2016年のPerl (Long version)2016年のPerl (Long version)
2016年のPerl (Long version)charsbar
 

Semelhante a Cassandraに不向きなcassandraデータモデリング基礎 (20)

SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
 
Tests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTapTests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTap
 
Recursive Query Throwdown
Recursive Query ThrowdownRecursive Query Throwdown
Recursive Query Throwdown
 
Building and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CBuilding and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning C
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!
 
Csql Cache Presentation
Csql Cache PresentationCsql Cache Presentation
Csql Cache Presentation
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdf
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plans
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data Records
 
An OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserAn OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parser
 
Drupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary EditionDrupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary Edition
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
 
Advanced Perl Techniques
Advanced Perl TechniquesAdvanced Perl Techniques
Advanced Perl Techniques
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model
 
2016年のPerl (Long version)
2016年のPerl (Long version)2016年のPerl (Long version)
2016年のPerl (Long version)
 

Último

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 

Último (20)

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 

Cassandraに不向きなcassandraデータモデリング基礎

  • 1. Best Better practice of Cassandra Cassandraに不向きなCassandraデータモデリング基礎 Hayato Tsutsumi Works Applications
  • 2. Hayato Tsutsumi 堤 勇人 Cassandra experience : 7 years (from ver.0.6) Certification for Apache Cassandra Administarator Data size : about 40TB Nodes:about 40 (would increase soon...) Twitter : 2t3 Site Reliability Engineering Div. Works Applications Co., Ltd 自己紹介 Speaker
  • 3. Target ● Mid-range System Data size 1TB ~ 1PB Data amount 10 Mil ~ 100 Bil + high speed processing
  • 5. Best practice = Right people, right place 適材適所は確かにベスト Suitable Data
  • 6. But not all data is suitable じゃあベストじゃない部分は? Our system Suitable Data Un-suitable data for Cassandra
  • 7. Use both Cassandra and RDB? O* or M* Suitable Data
  • 8. You may use only RDB... O* or M* Suitable Data
  • 9. Another way : Manage data only with Cassandra Suitable Data
  • 10. 3 models unlike NoSQL Historical data 履歴管理データ Tree structure ツリー構造 Summarized data 計上データ
  • 11. How Cassandra read data in 3mins 前 提
  • 12. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB text, ckA text, ckB text, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 13. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 14. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key where pkA = "pkA1"; //NG where pkA = "pkA1" and pkB = "pkB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB = "ckB1"; //OK where pkA = "pkA1" and ckA = "ckA1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB >= "pkB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB >= "ckB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckA < "ckA2"; //OK
  • 15. Historical data 履歴管理データ photo by Bryan Wright (https://secure.flickr.com/photos/spidermandragon5/2922128673/)
  • 16. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21
  • 17. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21 at 3/25 emp001 emp002 emp003 B Div. A Div. E Div. at 4/25 emp001 emp002 emp003 C Div. D Div. E Div.
  • 18. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, e, no) ); select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG ?
  • 19. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, no) ); CREATE CUSTOM INDEX fn_e ON emp_history (e) USING 'org.apache.cassandra.index.sasi. SASIIndex'; select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK use custom index
  • 21. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table
  • 22. 判断のポイント Criteria ● No join, recursive query ● Anyway need consistency ● Jaywalk or denormalization is Natural ● JOIN、再帰問い合わせ 不可 ● 整合性はどの道別の方 法で取る必要がある ● ジェイウォーク、非正規化 も当たり前
  • 23. ツリー構造への要求 Requirement to tree model ● show ancestors ● show children ● show descendants ● show sibilings of a ● あるノードからルートまで の全ての親を取得 ● 子供を1段展開 ● 子供を全て展開 ● 兄弟を取得
  • 24. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table Worth considering!
  • 25. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); id fqdn child code test A [a,b] A test A:a [1,2] a test A:b [3,4,5] b test A:a:1 1 test A:a:2 2 test A:b:3 3 test A:b:4 4 test A:b:5 5 A a b 1 2 3 4 5
  • 26. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); select * from pathenum where id = 'test' and fqdn like 'A:'; //NG It needs 'like' search A a b 1 2 3 4 5
  • 27. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); select * from pathenum where id = 'test' and fqdn like 'A:'; //NG select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK : U+003A ; U+003B It needs 'like' search A a b 1 2 3 4 5
  • 28. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); //show ancestors fqdn.split(":"); //show children of a select child from pathenum where id = 'test' and fqdn = 'A:a'; //show descendants of A select * from fqdntest where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //show sibilings of a select p from fqdntest where id = 'test' and fqdn = 'A'; A a b 1 2 3 4 5
  • 29. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); pros - one access cons - hot spot - range slice - complex process when update pros & cons
  • 30. 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); id v A A Div. a a Dept. b b Dept. 1 1 Sec. 2 2 Sec. 3 3 Sec. 4 4 Sec. 5 5 Sec. p c d A A 0 A a 1 A b 1 A 1 2 A 2 2 A 3 2 A 4 2 A 5 2 a a 0 a 1 1 p c d a 2 1 1 1 0 2 2 0 b b 0 b 3 1 b 4 1 b 5 1 3 3 0 4 4 0 5 5 0
  • 31. 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex'; p c d A A 0 A a 1 A b 1 A 1 2 A 2 2 A 3 2 A 4 2 A 5 2 a a 0 a 1 1 p c d a 2 1 1 1 0 2 2 0 b b 0 b 3 1 b 4 1 b 5 1 3 3 0 4 4 0 5 5 0 //show ancestors select p from closure_path where c = '1'; select * from closure_main where id in [?]; //show children of a select c from closure_path where p = 'a' and d = 1; select * from closure_main where id in [?]; //show descendants of A select c from closure_path where p = 'A'; select * from closure_main where id in [?]; //show sibilings of a //load a's parent = A select * from closure_path where c = 'a'; select c from closure_path where p = 'A' and d = 1; select * from closure_main where id in [?];
  • 32. pros - Distributed - get access cons - need an index - 2 ~ 3 times access - increase data - complex process when update pros & cons 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex';
  • 33. pros - Distributed - get access cons - need an index - 2 ~ 3 times access - increase data - complex process when update pros & cons 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex'; How increase data? When assume n-children per node and d-depth tree, number of data will be proportional to d.
  • 36. 伝票集計処理 Aggregation of slips parallel batch processing aggregation online streaming
  • 37. 要求水準 Requirements ● miscalculation = critical ● need parallel / streaming processing ● need high speed processing ● 誤計算は死 ● バッチの並列処理、オン ラインによるストリーミン グ処理が必要 ● 高速処理が求められる
  • 38. ● miscalculation = critical ● need parallel / streaming processing ● need high speed processing ● 誤計算は死 ● バッチの並列処理、オン ラインによるストリーミン グ処理が必要 ● 高速処理が求められる = Consistency! 要求水準 Requirements
  • 39. 計上データ Summarized data CREATE TABLE countup ( id text PRIMARY KEY, v counter ); UPDATE countup SET v = v + 1 WHERE id = 'test'; Use Counter...? No.
  • 40. 計上データ Summarized data CREATE TABLE countup ( id text PRIMARY KEY, v int ); UPDATE countup set v = 101 where id = 'test' if v = 100; Use update with LWT
  • 41. What is the best?