41. Stado Cluster #2
• Multiple spindles to take advantage of multi-core
processor even within a single server.
– Multiple tablespaces are assigned to each dedicated disk.
Coordinator
Agent
Postgres
tblspc1 tblspc2 tblspc3
testdb1
testdb2
testdb4
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
42. DBT-3 Schema
• Scale factor “10”
– >25GB, including its indexes.
Table Name Purpose # of Records Size Partition Key
customer 顧客マスタ 1,500,000 289,046,528 c_custkey
lineitem 注文明細 59,986,052 8,727,838,720 l_orderkey
nation 国名マスタ 25 8,192 -
orders 注文 15,000,000 2,074,853,376 o_orderkey
part 部品マスタ 2,000,000 332,816,384 p_partkey
partsupp 部品詳細マスタ 8,000,000 1,400,143,872 ps_partkey
region 地域マスタ 5 8,192 -
supplier 販売業者マスタ 100,000 17,858,560 -
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
43. Stado Test #1
• Data Loading / データロード
• Creating indexes / インデックス作成
• Aggregation (count, sum) / 集約(count, sum)
• Aggregation + GROUP BY (daily basis, country basis) /
集約+GROUP BY (日別売上、国別売上)
• JOIN (local) / 結合(ノード内)
• JOIN (cross-node) / 結合(ノード間)
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
44. Summary – Cluster #1
Stado Performance Comparison
(Stado Cluster #1)
3.50
3.00
(compared to standalone)
2.50
Execution Time
Standalone
2.00 1-node
2-nodes
1.50 4-nodes
8-nodes
1.00
0.50
0.00
Loading Building Query 1 Query 2 Query 3 Query 4 Query 5
Tables Indexes
Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
45. Summary – Cluster #2
Stado Performance Comparison
(Stado Cluster #2)
1.20
1.00
(compared to standalone)
0.80
Execution Time
Standalone
1-disk
0.60
2-disks
4-disks
0.40
0.20
0.00
Loading Building Query 1 Query 2 Query 3 Query 4 Query 5
Tables Indexes
Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
46. Loading Tables
• Loading tables did not take advantage of the cluster.
– Loading 8 tables using gs-loader (for Stado) or psql (for
Standalone)
– Parallel loading seems like highly CPU intensive task.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
3.50 1.20
3.00 1.00
(compared to standalone)
(compared to standalone)
2.50
Standalone 0.80
Execution Time
Execution Time
Standalone
2.00 1-node
1-disk
2-nodes 0.60
2-disks
1.50 4-nodes
4-disks
8-nodes 0.40
1.00
0.20
0.50
0.00 0.00
Loading Tables Loading Tables
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
47. Building Indexes
• Building indexes could benefit by adding servers and/or
spindles.
– Elapsed time shows linear scalability as adding nodes.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Building Indexes Building Indexes
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
48. Aggregation (Query 1, 2)
• Simple scan and aggregation can take advantage of the
cluster.
– Executing SUM() and COUNT() on “orders” table.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Query 1 Query 1
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
49. Aggregation + GROUP BY (Query 3)
• “Aggregation + GROUP BY” as well.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Query 3 Query 3
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
50. JOIN (local) (Query 4)
• “Local Join” between partitioned tables with using a
partition key could scale.
– “orders” and “lineitem” are joined with “orderkey”.
– More CPU power needed.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
2.50 1.20
1.00
2.00
(compared to standalone)
(compared to standalone)
Standalone 0.80
Execution Time
Execution Time
1.50 Standalone
1-node
1-disk
2-nodes 0.60
2-disks
4-nodes
1.00 4-disks
8-nodes 0.40
0.50
0.20
0.00 0.00
Query 4 Query 4
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
51. JOIN (cross-node) (Query 5)
• “Cross-node Join” may not be able to scale.
– Particularly, large “Cross-node Join” must be critical for SQL
performance.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
2.50 1.20
1.00
2.00
(compared to standalone)
(compared to standalone)
Standalone 0.80
Execution Time
Execution Time
1.50 Standalone
1-node
1-disk
2-nodes 0.60
2-disks
1.00 4-nodes
4-disks
8-nodes 0.40
0.50
0.20
0.00 0.00
Query 5 Query 5
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
52. Behind the scene: “cross-node” join
• Collect all rows (Row Shipping) from the entire
cluster into temp table. Node #1
• Process a JOIN which can be done within the
node. customer (1)
• Process a JOIN against the temp table. JOIN (1) orders (1)
lineitem (1)
• Row Shipping is very expensive!
JOIN (2) temp
Row Shipping
Node #2 Node #3 Node #4
customer (2) customer (3) customer (4)
orders (2) orders (3) orders (4)
lineitem (2) lineitem (3) lineitem (4)
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
53. Resource usage: “cross-node” join
http://www2.uptimeforce.com/pgstatview2/20b267567a8f134b970070cc1a212a05/
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
54. Stado Test #2
• DBT-3 Queries
– Q1 - Aggregations on a single partitioned table.
– Q2 - Joining two partitioned tables with a partition key, and also
joining dimension tables.
– Q4 - Joining two partitioned tables with a partition key, and
aggregation.
– Q6 - Aggregations on a single partitioned table, and aggregation.
– Q11 - Joining a partitioned table and dimension tables, and
aggregation.
– Q12 - Joining two partitioned table, and aggregation.
– Q16 - Joining two partitioned tables with a partition key, also
joining dimension tables, and aggregation.
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
55. Summary – Cluster #1
Stado Performance Comparison
(Stado Cluster #1)
2
1.8
1.6
1.4
(compared to standalone)
Execution Time
1.2 Standalone
1-node
1 2-nodes
4-nodes
0.8 8-nodes
0.6
0.4
0.2
0
Q1 Q2 Q4 Q6 Q11 Q12 Q16
Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
56. Summary – Cluster #2
Stado Performance Comparison
(Stado Cluster #2)
1.40
1.20
1.00
(compared to standalone)
Execution Time
0.80 Standalone
1-disk
2-disks
0.60 4-disks
0.40
0.20
0.00
Q1 Q2 Q4 Q6 Q11 Q12 Q16
Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
57. Q1
• Simple aggregation with conditional scan on a
partitioned table, “lineitem”.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Q1 Q1
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
58. Q2
• Joining partitioned talbes, “partsupp” and “partsupp”,
with a partition key, and dimension tables.
– But, the 1-node and 8-node AWS clusters shown some strange
behaviors.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
3.00 1.20
2.50 1.00
(compared to standalone)
(compared to standalone)
2.00 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
1.50 2-nodes 0.60
2-disks
4-nodes
4-disks
1.00 8-nodes 0.40
0.50 0.20
0.00 0.00
Q2 Q2
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
59. Q4
• Joining two partitioned tables, “orders” and “lineitem”,
with a partition key “orderkey”.
– Q4 failed to complete on the AWS cluster.
– Q4 was much faster even with 1-disk on the single server.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.00 1.20
0.90
1.00
0.80
(compared to standalone)
(compared to standalone)
0.70
Standalone 0.80
Execution Time
Execution Time
0.60 Standalone
1-node
1-disk
0.50 2-nodes 0.60
2-disks
0.40 4-nodes
4-disks
8-nodes 0.40
0.30
0.20
0.20
0.10
0.00 0.00
Q4 Q4
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
60. Q6
• Simple conditional scan on a partitioned table,“lineitem”.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Q6 Q6
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
61. Q11
• Joining a partitioned table and dimension tables,
“partsupp”, “supplier” and “nation”, with aggregation and
sort.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Q11 Q11
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
62. Q12
• Joining two partitioned tables, “orders” and “lineitem”,
with aggregation and sort.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.20 1.20
1.00 1.00
(compared to standalone)
(compared to standalone)
0.80 Standalone 0.80
Execution Time
Execution Time
Standalone
1-node
1-disk
0.60 2-nodes 0.60
2-disks
4-nodes
4-disks
0.40 8-nodes 0.40
0.20 0.20
0.00 0.00
Q12 Q12
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
63. Q16
• Joining another two partitioned tables, “part” and
“partsupp”, with aggregation and sort.
Stado Performance Comparison Stado Performance Comparison
(Stado Cluster #1) (Stado Cluster #2)
1.40 1.20
1.20 1.00
(compared to standalone)
(compared to standalone)
1.00
Standalone 0.80
Execution Time
Execution Time
Standalone
0.80 1-node
1-disk
2-nodes 0.60
2-disks
0.60 4-nodes
4-disks
8-nodes 0.40
0.40
0.20 0.20
0.00 0.00
Q16 Q16
Workload Type Workload Type
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
64. その他
• SQLの互換性はほぼ問題なし
– DBT-3のクエリ22本はほとんどそのままの形で実行可能
– 一部、明示的な型キャストや優先度の指定が必要(parse errorなどが発生)
• statement_timeout相当の処理がStadoには無いので注意
– SET statement_timeout コマンドを送れない
– Row Shipping中(COPYコマンド中)にはpostgresql.confの
statement_timeoutも効かなかった(調査中)
• EXPLAINは、Row Shippingの発生および量を明確には教えてくれない
ので、頭の中で理解しておく
– “Even though, EXPLAIN is still your friend.”
• 今後の課題
– ログ出力の強化、改善
– 管理コマンドの強化(パーティションの再配置等)
– ドキュメントの強化
Copyright 2012 Uptime Technologies, LLC. All rights reserved.