Presto in Treasure Data
(Updated)
Mitsunori Komatsu
About me
• Mitsunori Komatsu, Software engineer @ Treasure Data.
• Presto, Hive, PlazmaDB, td-android-sdk, td-ios-sdk, Mobile SDK backend, embedded-sdk
• github:komamitsu, msgpack-java committer, Presto contributor, etc…
Today’s talk
• What's Presto?
• Pros & Cons
• Architecture
• Recent updates
• Who uses Presto?
• How do we use Presto?
What’s Presto?
Fast
• Distributed SQL query engine (MPP)
• Low latency and good performance
• No disk IO
• Pipelined execution (not Map Reduce)
• Compile a query plan down to byte code
• Off heap memory
• Suitable for ad-hoc query
Pluggable
• Pluggable backends (“connectors”)
• Cassandra / Hive / JMX / Kafka / MySQL /
PostgreSQL / System / TPCH
• We can add a new connector by extending the SPI
• Treasure Data has developed a connector to access our storage
What kind of SQL
• Supports ANSI SQL (Not HiveQL)
• Easy to use Presto compared to HiveQL
• Structural type: Map, Array, JSON, Row
• Window functions
• Approximate queries
• http://blinkdb.org/
Limitations
• Fails with huge JOIN or DISTINCT
• In memory only (broadcast / distributed JOIN)
• No grace / hybrid hash join
• No fault tolerance
• Coordinator is SPOF
• No “cost based” optimization
• No authentication / authorization
• No native ODBC => Prestogres
https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Architectural overview
https://prestodb.io/overview.html
With Hive connector
Query plan
Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
  - _col1 := count
    Exchange[GATHER] => nationkey:bigint, count:bigint
        Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
          - count := "count"("count_15")
            Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
                Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
                  - count_15 := "count"("expr")
                    Project => [nationkey:bigint, expr:bigint]
                      - expr := 1
                        InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
                            Project => [custkey:bigint]
                                Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
                                    TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
                                      - custkey := tpch:custkey:1
                                      - orderpriority := tpch:orderpriority:5
                            Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
                                TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
                                  - custkey_0 := tpch:custkey:0
                                  - nationkey := tpch:nationkey:3

select
  c.nationkey,
  count(1)
from orders o
join customer c
  on o.custkey = c.custkey
where
  o.orderpriority = '1-URGENT'
group by c.nationkey
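The plan above counts rows in two steps, Aggregate(PARTIAL) and Aggregate(FINAL), with Exchange[REPARTITION] between them. The following is a minimal Python sketch of that idea only, not Presto's implementation; the function names, the list-of-lists input and the partition count are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def partial_aggregate(nationkeys):
    """Aggregate(PARTIAL): count rows per nationkey inside one task."""
    counts = Counter()
    for nationkey in nationkeys:
        counts[nationkey] += 1
    return counts


def repartition(partials, num_partitions):
    """Exchange[REPARTITION]: send every partial count for a key to one partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for key, count in partial.items():
            partitions[hash(key) % num_partitions][key].append(count)
    return partitions


def final_aggregate(partition):
    """Aggregate(FINAL): sum the partial counts of each key."""
    return {key: sum(counts) for key, counts in partition.items()}


def run(task_inputs, num_partitions=2):
    partials = [partial_aggregate(rows) for rows in task_inputs]
    merged = {}
    for partition in repartition(partials, num_partitions):
        # Exchange[GATHER]: collect the final results in one place.
        merged.update(final_aggregate(partition))
    return merged


# Three hypothetical tasks, each holding the nationkey values of its joined rows.
result = run([[1, 1, 2], [2, 3], [1, 3, 3]])
```

Because each key lands in exactly one partition, the final step can sum partial counts without coordinating with other partitions.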
Query, stage, task and split
Query
  Stage 0: Task 0.0
  Stage 1: Task 1.0, Task 1.1, Task 1.2
  Stage 2: Task 2.0, Task 2.1, Task 2.2
Each task processes one or more splits.
For example…
TableScan (FROM) → Aggregation (GROUP BY) → Output
@worker#2, @worker#3, @worker#0
(The same query plan and SQL are repeated on the following slides, marking Stage 3, Stage 2, Stage 1 and Stage 0 in turn.)
What each component does?
Presto Cli
Coordinator
  - Parse Query
  - Analyze Query
  - Create Query Plan
  - Execute Query
    - Contains Stages
  - Execute Stages
    - Contains Tasks
  - Issue Tasks
Discovery Service
Worker
  - Execute Tasks
  - Convert Query to Java Bytecode (Operator)
  - Execute Operator
Connector
  - MetaData
    - Table, Column…
  - SplitManager
    - Split, …
Connector
  - RecordSetProvider
    - RecordSet
    - RecordCursor
    - Read Storage
Storage
External Metadata?
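The connector roles listed above (MetaData, SplitManager, RecordSetProvider, RecordCursor) can be pictured with a toy in-memory connector. This is a hedged Python sketch and not the Presto SPI, which is a set of Java interfaces; the class name, method names and split layout here are hypothetical.

```python
class InMemoryConnector:
    """Toy connector; tables maps a table name to (column names, rows)."""

    def __init__(self, tables, rows_per_split=2):
        self.tables = tables
        self.rows_per_split = rows_per_split

    def list_columns(self, table):
        # MetaData role: describe tables and columns.
        return list(self.tables[table][0])

    def get_splits(self, table):
        # SplitManager role: cut the table into independently readable ranges.
        row_count = len(self.tables[table][1])
        return [
            (table, start, min(start + self.rows_per_split, row_count))
            for start in range(0, row_count, self.rows_per_split)
        ]

    def cursor(self, split, columns):
        # RecordSetProvider / RecordCursor role: read the rows of one split.
        table, start, end = split
        names, rows = self.tables[table]
        indexes = [names.index(column) for column in columns]
        for row in rows[start:end]:
            yield tuple(row[i] for i in indexes)


connector = InMemoryConnector(
    {"users": (("id", "name"), [(1, "a"), (2, "b"), (3, "c")])}
)
splits = connector.get_splits("users")
names = [row for split in splits for row in connector.cursor(split, ["name"])]
```

In this sketch, splits are produced once up front and each one can then be read by a different worker.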
Multi connectors
Presto
  MySQL
    - test.users
  PostgreSQL
    - public.accesses
    - public.codes
  Raptor
    - default.user_codes

create table raptor.default.user_codes
as
select c.text, u.name, count(1) as count
from postgres.public.accesses a
join mysql.test.users u
  on cast(a.user as bigint) = u.id
join postgres.public.codes c
  on a.code = c.code
where a.time < 1200000000
group by c.text, u.name

- etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://127.0.0.1:3306
connection-user=root

- etc/catalog/postgres.properties
connector.name=postgresql
connection-url=jdbc:postgresql://127.0.0.1:5432/postgres
connection-user=komamitsu

- etc/catalog/raptor.properties
connector.name=raptor
metadata.db.type=h2
metadata.db.filename=var/data/db/MetaStore
Multi connectors
- log message
:
2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir.4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT `id`, `name` FROM `test`.`users`
:
2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir.5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "text", "code" FROM "public"."codes"
:
2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir.3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000))
:
Condition pushdown
Join pushdown isn’t supported
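The log lines show that each remote table is read with its own SELECT, and that only the filter on "time" reaches PostgreSQL. A small Python sketch of how such per-table statements could be assembled follows; it is an illustration under assumptions, not the JDBC plugin's code, and `build_remote_sql` and its parameters are hypothetical names.

```python
def quote(identifier, quote_char='"'):
    """Wrap an identifier in the remote database's quote character."""
    return f"{quote_char}{identifier}{quote_char}"


def build_remote_sql(schema, table, columns, predicates=(), quote_char='"'):
    """Build one single-table SELECT; predicates are (column, operator, value).

    Values are interpolated directly, which is only acceptable in this
    sketch because the example uses numeric literals.
    """
    column_list = ", ".join(quote(c, quote_char) for c in columns)
    sql = (
        f"SELECT {column_list} "
        f"FROM {quote(schema, quote_char)}.{quote(table, quote_char)}"
    )
    if predicates:
        conditions = " AND ".join(
            f"({quote(column, quote_char)} {op} {value})"
            for column, op, value in predicates
        )
        sql += f" WHERE ({conditions})"
    return sql


accesses_sql = build_remote_sql(
    "public", "accesses", ["user", "code", "time"], [("time", "<", 1200000000)]
)
users_sql = build_remote_sql("test", "users", ["id", "name"], quote_char="`")
```

The join itself never appears in these statements, so it runs inside Presto after the rows have been fetched.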
Recent updates (0.108~0.117)
New functions:
normalize(), from_iso8601_timestamp(),
from_iso8601_date(), to_iso8601()
slice(), md5(), array_min(), array_max(), histogram()
element_at(), url_encode(), url_decode()
sha1(), sha256(), sha512()
multimap_agg(), checksum()
Teradata compatibility functions:
index(), char2hexint(), to_char(),
to_date(), to_timestamp()
Recent updates (0.108~0.117)
Cluster Resource Management.
- query.max-memory
- query.max-memory-per-node
- resources.reserved-system-memory
“Big Query” option was removed
Semi-joins are hash-partitioned if distributed_join is turned on.
Add support for partial cast from JSON.
For example, json can be cast to array<json>, map<varchar, json>, etc.
Use JSON_PARSE() and JSON_FORMAT() instead of CAST
Add query_max_run_time session property and query.max-run-time config.
Queries fail after the specified duration.
optimizer.optimize-hash-generation and distributed-joins-enabled are both enabled by default now.
Who uses Presto?
• Facebook
  http://www.slideshare.net/dain1/presto-meetup-2015
• Dropbox
• Airbnb
• Qubole
  • SaaS
• Treasure Data
  • SaaS
• Teradata
  • commercial support
As a service…
Today’s talk
• What's Presto?
• How do we use Presto?
• What’s Treasure Data?
• System architecture
• How we manage Presto
What’s Treasure Data?
and more…
• Collect, store and analyze data
• With multi-tenancy
• With a schema-less structure
• In the cloud, while mitigating outages
Founded in 2011 in the U.S.
Over 85 employees
Collect! Store! Analyze!
Time to Value
Acquire
  • Sources: Web Log, App Log, Sensor, CRM, ERP, RDBMS, POS
  • Streaming Collector: Treasure Agent (Server), SDK (JS, Android, iOS, Unity)
  • Bulk Uploader: Embulk, TD Toolbelt
Store
  • Plazma DB: Flexible, Scalable, Columnar Storage
  • @AWS or @IDCF
Analyze
  • SQL-based query: SQL, Pig
  • Batch / Reliability
  • Ad-hoc / Low latency
  • REST API, ODBC / JDBC
Result Push (send query result)
  • KPI Dashboard, BI Tools: Metric Insights, Tableau, Motion Board etc.
  • Other Products: RDBMS, Google Docs, AWS S3, FTP Server, etc.
Connectivity, Economy & Flexibility, Simple & Supported
Integrations?
Treasure Data in figures
Some of our customers
System architecture
Components in Treasure Data
api server
worker queue (MySQL)
td worker process
  select user_id, count(1) from …
Presto coordinator
Presto worker (× 4)
td-presto connector
plazmadb (PostgreSQL + S3/RiakCS)
result bucket (S3/RiakCS)
Notes on the diagram:
• Retry failed query if needed
• Authentication / Authorization
• Columnar file format. Schema-less.
Schema on read
access_logs table
time                 code   method  user_id
2015-06-01 10:07:11  200    GET
2015-06-01 10:10:12  “200”  GET
2015-06-01 10:10:20  200    GET
2015-06-01 10:11:30  200    POST
2015-06-01 10:20:45  200    GET
2015-06-01 10:33:50  400    GET     206
2015-06-01 10:40:11  200    GET     852
2015-06-01 10:51:32  200    PUT     1223
2015-06-01 10:58:02  200    GET     5118
2015-06-01 11:02:11  404    GET     12
2015-06-01 11:14:27  200    GET     3447
• The user added a new column “user_id” in the imported data
• The user can select this column just by adding it to the schema (without rebuilding the table)
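Schema on read can be sketched in Python as applying a schema to stored records at query time. This is a minimal illustration and not PlazmaDB's behaviour: the record layout, `read_with_schema`, and the choice to cast the string “200” to an integer are assumptions.

```python
# Stored records keep whatever fields and types were imported.
records = [
    {"time": "2015-06-01 10:07:11", "code": 200, "method": "GET"},
    {"time": "2015-06-01 10:10:12", "code": "200", "method": "GET"},
    {"time": "2015-06-01 10:33:50", "code": 400, "method": "GET", "user_id": 206},
]


def read_with_schema(record, schema):
    """Project a record onto the schema; absent fields become None."""
    row = {}
    for column, cast in schema.items():
        value = record.get(column)
        row[column] = cast(value) if value is not None else None
    return row


schema_v1 = {"time": str, "code": int, "method": str}
# Adding user_id to the schema needs no rewrite of the stored records.
schema_v2 = dict(schema_v1, user_id=int)

rows_v1 = [read_with_schema(record, schema_v1) for record in records]
rows_v2 = [read_with_schema(record, schema_v2) for record in records]
```

Older records simply yield None for `user_id`, which matches the empty cells in the table.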
Columnar file format
The access_logs table is stored per column: time, code, method, user_id
This query accesses only the code column:
select code,
  count(1) from tbl
group by code
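The benefit for this query can be shown with a small Python sketch that stores a few rows of the table column by column. It only illustrates the layout; the dict-of-lists representation and the helper names are assumptions, not the actual file format.

```python
column_names = ("time", "code", "method", "user_id")
rows = [
    ("2015-06-01 10:07:11", 200, "GET", None),
    ("2015-06-01 10:33:50", 400, "GET", 206),
    ("2015-06-01 10:40:11", 200, "GET", 852),
    ("2015-06-01 11:02:11", 404, "GET", 12),
]


def to_columns(rows, names):
    """Turn row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def count_by(columns, name):
    """Equivalent of: select <name>, count(1) ... group by <name>."""
    counts = {}
    for value in columns[name]:  # only this one column is read
        counts[value] = counts.get(value, 0) + 1
    return counts


columns = to_columns(rows, column_names)
code_counts = count_by(columns, "code")
```

The time, method and user_id lists are never touched by `count_by`, which is the saving a columnar layout offers.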
td-presto connector
• MessagePack v07
• off heap
• avoiding “TypeProfile”
• Async IO with Jetty-client
• Scheduling & Resource management
How we manage Presto
• Blue-Green Deployment
• Stress test tool
• Monitoring with DataDog
Blue-Green Deployment
Components: api server, worker queue (MySQL), td worker process (select user_id, count(1) from …), plazmadb (PostgreSQL + S3/RiakCS), result bucket (S3)
Two Presto clusters, each with one coordinator and four workers:
1. production and rc run side by side
2. rc: Test Test Test!
3. rc is promoted to production! Release stable cluster, no downtime
Stress test tool
• Collect queries that have ever caused issues.
• Add a new query just by adding an entry like this:
- job_id: 28889999
• The tool issues the query, gets the result and fills in the calculated digest automatically:
- job_id: 28889999
- result: 227d16d801a9a43148c2b7149ce4657c
• We can send all the queries, including very heavy ones (around 6000 stages), to Presto.
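The digest check can be sketched as follows in Python. This is an assumption-laden illustration, not the actual tool: the digest on the slide is 32 hex characters, so MD5 is used here as a guess, and `result_digest`, `check_entry` and the order-insensitive row hashing are hypothetical.

```python
import hashlib


def result_digest(rows):
    """Hash the result rows in a stable order so row order does not matter."""
    md5 = hashlib.md5()
    for line in sorted(repr(row) for row in rows):
        md5.update(line.encode("utf-8"))
    return md5.hexdigest()


def check_entry(entry, rows):
    """Record the digest on first run; compare against it on later runs."""
    digest = result_digest(rows)
    if "result" not in entry:
        entry["result"] = digest
        return True
    return entry["result"] == digest


entry = {"job_id": 28889999}
first_run = check_entry(entry, [(1, "a"), (2, "b")])
same_result = check_entry(entry, [(2, "b"), (1, "a")])
changed_result = check_entry(entry, [(1, "a")])
```

Storing only a digest keeps each entry small even when the query returns a large result.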
Monitoring with DataDog
Presto process (Presto coordinator, Presto workers)
- /v1/jmx/mbean
- /v1/query
- /v1/node
td-agent
- in_presto_metrics
- out_metricsense
DataDog

Query stalled time
- Most important for us.
- It triggers alert calls to us…
- It is mainly increased by td-presto connector problems, most of which are race conditions.
How many queries processed
• More than 20000 queries / day
• Most queries finish within 1 min
Analytics Infrastructure, Simplified in the Cloud
We’re hiring! https://jobs.lever.co/treasure-data

Mais conteĂşdo relacionado

Mais procurados

Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~HommasSlide
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
10分で分かるデータストレージ
10分で分かるデータストレージ10分で分かるデータストレージ
10分で分かるデータストレージTakashi Hoshino
 
Apache Arrow
Apache ArrowApache Arrow
Apache ArrowKouhei Sutou
 
MySQLを割と一人で300台管理する技術
MySQLを割と一人で300台管理する技術MySQLを割と一人で300台管理する技術
MySQLを割と一人で300台管理する技術yoku0825
 
JVMに裏から手を出す!JVMTIに触れてみよう(オープンソースカンファレンス2020 Online/Hiroshima 講演資料)
JVMに裏から手を出す!JVMTIに触れてみよう(オープンソースカンファレンス2020 Online/Hiroshima 講演資料)JVMに裏から手を出す!JVMTIに触れてみよう(オープンソースカンファレンス2020 Online/Hiroshima 講演資料)
JVMに裏から手を出す!JVMTIに触れてみよう(オープンソースカンファレンス2020 Online/Hiroshima 講演資料)NTT DATA Technology & Innovation
 
MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)Colin Charles
 
HA環境構築のベスト・プラクティス
HA環境構築のベスト・プラクティスHA環境構築のベスト・プラクティス
HA環境構築のベスト・プラクティスEnterpriseDB
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?Pradeep Kumar
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
How Prometheus Store the Data
How Prometheus Store the DataHow Prometheus Store the Data
How Prometheus Store the DataHao Chen
 
Laravelで作成したアプリ紹介
Laravelで作成したアプリ紹介Laravelで作成したアプリ紹介
Laravelで作成したアプリ紹介伸幸 茂木
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François GagnÊ
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
ZFSでストレージ
ZFSでストレージZFSでストレージ
ZFSでストレージ悟 宮崎
 
Githubを使って簡単に helm repoを公開してみよう
Githubを使って簡単に helm repoを公開してみようGithubを使って簡単に helm repoを公開してみよう
Githubを使って簡単に helm repoを公開してみようShingo Omura
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage InternalsDataWorks Summit
 
OpenZFS send and receive
OpenZFS send and receiveOpenZFS send and receive
OpenZFS send and receiveMatthew Ahrens
 
「Oracle Database + Java + Linux」 環境における性能問題の調査手法 ~ミッションクリティカルシステムの現場から~ Part.1
「Oracle Database + Java + Linux」環境における性能問題の調査手法 ~ミッションクリティカルシステムの現場から~ Part.1「Oracle Database + Java + Linux」環境における性能問題の調査手法 ~ミッションクリティカルシステムの現場から~ Part.1
「Oracle Database + Java + Linux」 環境における性能問題の調査手法 ~ミッションクリティカルシステムの現場から~ Part.1Shogo Wakayama
 

Presto in Treasure Data (presented at db tech showcase Sapporo 2015)

  • 1. Presto in Treasure Data (Updated) Mitsunori Komatsu
  • 2. About me • Mitsunori Komatsu, Software engineer @ Treasure Data. • Presto, Hive, PlazmaDB, td-android-sdk, td-ios-sdk, Mobile SDK backend, embedded-sdk • github:komamitsu, msgpack-java committer, Presto contributor, etc…
  • 3. Today’s talk • What's Presto? • Pros & Cons • Architecture • Recent updates • Who uses Presto? • How do we use Presto?
  • 5. Fast • Distributed SQL query engine (MPP) • Low latency and good performance • No disk I/O • Pipelined execution (not MapReduce) • Compiles a query plan down to bytecode • Off-heap memory • Suitable for ad-hoc queries
  • 6. Pluggable • Pluggable backends (“connectors”) • Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / System / TPCH • We can add a new connector by extending the SPI • Treasure Data has developed a connector to access our storage
  • 7. What kind of SQL • Supports ANSI SQL (not HiveQL) • Easier to use than HiveQL • Structural types: Map, Array, JSON, Row • Window functions • Approximate queries • http://blinkdb.org/
  • 8. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres
  • 9. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
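The "In memory only (broadcast / distributed JOIN)" bullet can be illustrated with a small sketch. This is not Presto's implementation; it is a minimal Python illustration that assumes rows are plain dicts and that "workers" are simply list partitions in one process. The function names (`hash_join`, `broadcast_join`, `partitioned_join`) are hypothetical and chosen only for this example. In both strategies the build side has to fit in memory, which is why a huge JOIN can fail when there is no grace or hybrid hash join to fall back on.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, key):
    """In-memory hash join: build a hash table from build_rows, then probe it."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    joined = []
    for probe in probe_rows:
        # .get() avoids creating empty entries for keys with no match
        for build in table.get(probe[key], []):
            joined.append({**probe, **build})
    return joined


def broadcast_join(probe_partitions, build_rows, key):
    """Broadcast join: every worker receives a full copy of the build side."""
    joined = []
    for partition in probe_partitions:  # one partition per simulated worker
        joined.extend(hash_join(partition, build_rows, key))
    return joined


def partitioned_join(probe_rows, build_rows, key, workers):
    """Distributed join: both sides are hash-partitioned on the join key,
    so each worker only holds its own slice of the build side."""
    probe_parts = [[] for _ in range(workers)]
    build_parts = [[] for _ in range(workers)]
    for row in probe_rows:
        probe_parts[hash(row[key]) % workers].append(row)
    for row in build_rows:
        build_parts[hash(row[key]) % workers].append(row)
    joined = []
    for probe_part, build_part in zip(probe_parts, build_parts):
        joined.extend(hash_join(probe_part, build_part, key))
    return joined
```

Broadcasting keeps the probe side where it is but costs one full copy of the build side per worker; partitioning moves both sides over the network but divides the memory needed for the build side across workers.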
  • 11. Query plan
    Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
      - _col1 := count
      Exchange[GATHER] => nationkey:bigint, count:bigint
        Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
          - count := "count"("count_15")
          Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
            Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
              - count_15 := "count"("expr")
              Project => [nationkey:bigint, expr:bigint]
                - expr := 1
                InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
                  Project => [custkey:bigint]
                    Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
                      TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
                        - custkey := tpch:custkey:1
                        - orderpriority := tpch:orderpriority:5
                  Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
                    TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
                      - custkey_0 := tpch:custkey:0
                      - nationkey := tpch:nationkey:3

    select
      c.nationkey,
      count(1)
    from orders o join customer c
      on o.custkey = c.custkey
    where o.orderpriority = '1-URGENT'
    group by c.nationkey
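The plan above computes `count(1) ... group by c.nationkey` in two steps, Aggregate(PARTIAL) and Aggregate(FINAL), separated by Exchange[REPARTITION] and finished by Exchange[GATHER]. The following is a minimal Python sketch of that data flow, not Presto's code: workers are simulated as list partitions in one process, and every function name here is a hypothetical stand-in for the plan node named in its docstring.

```python
from collections import Counter


def partial_count(rows, key):
    """Aggregate(PARTIAL): each worker counts its own local rows per key."""
    return Counter(row[key] for row in rows)


def repartition(partials, workers):
    """Exchange[REPARTITION]: send all partial counts for a key to one worker."""
    buckets = [[] for _ in range(workers)]
    for partial in partials:
        for k, count in partial.items():
            buckets[hash(k) % workers].append((k, count))
    return buckets


def final_count(bucket):
    """Aggregate(FINAL): sum the partial counts received for each key."""
    totals = Counter()
    for k, count in bucket:
        totals[k] += count
    return totals


def gather(finals):
    """Exchange[GATHER]: collect every worker's final result in one place."""
    result = {}
    for final in finals:
        result.update(final)  # keys are disjoint after repartitioning
    return result


def distributed_count(partitions, key, workers=2):
    """Run the four steps in plan order, bottom to top."""
    partials = [partial_count(p, key) for p in partitions]
    finals = [final_count(b) for b in repartition(partials, workers)]
    return gather(finals)
```

Counting locally first means only one (key, count) pair per key per worker crosses the network at the repartition step, instead of every joined row.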
  • 12. Query, stage, task and split [Diagram: a Query is divided into Stage 0 (Task 0.0), Stage 1 (Task 1.0, Task 1.1, Task 1.2) and Stage 2 (Task 2.0, Task 2.1, Task 2.2); each task processes one or more Splits. For example… TableScan (FROM), Aggregation (GROUP BY), Output. Tasks run on workers: @worker#0, @worker#2, @worker#3.]
  • 14. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3
  • 15. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2
  • 16. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1
 • 17. Query plan
 Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count
 Exchange[GATHER] => nationkey:bigint, count:bigint
 Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15")
 Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
 Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
 - count_15 := "count"("expr")
 Project => [nationkey:bigint, expr:bigint]
 - expr := 1
 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
 Project => [custkey:bigint]
 Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
 TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5
 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
 TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3
 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey
 where o.orderpriority = '1-URGENT'
 group by c.nationkey
 Stage labels on the plan: Stage 3, Stage 2, Stage 1, Stage 0
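 One way to read the stage labels is that each Exchange in the plan separates two stages. The sketch below applies that rule to a hand-built copy of the plan tree. It is a simplified illustration, not Presto's planner: the `PlanNode` class, the shortened operator names, and the traversal-order stage numbering are assumptions, and the transcript does not show exactly where each label sat on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def assign_stages(root: PlanNode) -> Dict[int, List[str]]:
    """Group operators into stages, starting a new stage below every Exchange."""
    stages: Dict[int, List[str]] = {}
    next_id = 0

    def visit(node: PlanNode, stage_id: int) -> None:
        nonlocal next_id
        if node.name.startswith("Exchange"):
            # Stage boundary: everything feeding the exchange is a new stage.
            next_id += 1
            child_stage = next_id
            stages.setdefault(child_stage, [])
            for child in node.children:
                visit(child, child_stage)
        else:
            stages.setdefault(stage_id, []).append(node.name)
            for child in node.children:
                visit(child, stage_id)

    visit(root, 0)
    return stages


# The plan from the slide, with operator details trimmed.
PLAN = PlanNode("Output", [
    PlanNode("Exchange[GATHER]", [
        PlanNode("Aggregate(FINAL)", [
            PlanNode("Exchange[REPARTITION]", [
                PlanNode("Aggregate(PARTIAL)", [
                    PlanNode("Project", [
                        PlanNode("InnerJoin", [
                            PlanNode("Project", [
                                PlanNode("Filter", [PlanNode("TableScan[orders]")]),
                            ]),
                            PlanNode("Exchange[REPLICATE]", [
                                PlanNode("TableScan[customer]"),
                            ]),
                        ]),
                    ]),
                ]),
            ]),
        ]),
    ]),
])
```

 With this rule, `assign_stages(PLAN)` yields four groups: the output, the final aggregation, the partial aggregation with the join and the orders scan, and the customer scan that is replicated to the join.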
 • 18. What does each component do?
 Presto CLI
 Coordinator
 - Parse Query
 - Analyze Query
 - Create Query Plan
 - Execute Query
 - Contains Stages
 - Execute Stages
 - Contains Tasks
 - Issue Tasks
 Discovery Service
 Worker
 - Execute Tasks
 - Convert Query to Java Bytecode (Operator)
 - Execute Operator
 Connector
 - MetaData
 - Table, Column…
 - SplitManager
 - Split, …
 Connector
 - RecordSetProvider
 - RecordSet
 - RecordCursor
 - Read Storage
 Storage
 External Metadata?
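 The connector roles named on the slide (MetaData, SplitManager, RecordSetProvider with its cursor) can be illustrated with a toy in-memory connector. This is only a sketch of the division of responsibilities: the class and method names are hypothetical, the data is made up, and none of this is the actual Java SPI.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]
SplitInfo = Tuple[str, int, int]  # (table, start row, end row)

# Toy storage: table name -> (column names, rows). Made up for the example.
STORAGE: Dict[str, Tuple[List[str], List[Row]]] = {
    "users": (["id", "name"], [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]),
}


class ToyMetadata:
    """Metadata role: which tables and columns exist."""

    def list_tables(self) -> List[str]:
        return sorted(STORAGE)

    def columns(self, table: str) -> List[str]:
        return STORAGE[table][0]


class ToySplitManager:
    """SplitManager role: divide a table into independently readable splits."""

    def splits(self, table: str, rows_per_split: int) -> List[SplitInfo]:
        total = len(STORAGE[table][1])
        return [
            (table, start, min(start + rows_per_split, total))
            for start in range(0, total, rows_per_split)
        ]


class ToyRecordSetProvider:
    """RecordSetProvider role: hand out a cursor over the rows of one split."""

    def cursor(self, split: SplitInfo) -> Iterator[Row]:
        table, start, end = split
        return iter(STORAGE[table][1][start:end])


def scan(table: str, rows_per_split: int = 2) -> List[Row]:
    # Plan the splits first, then read every split through a cursor.
    provider = ToyRecordSetProvider()
    splits = ToySplitManager().splits(table, rows_per_split)
    return [row for split in splits for row in provider.cursor(split)]
```

 In this sketch, planning (metadata and splits) and reading (cursors) are separate objects, mirroring how the slide lists them under two different Connector boxes.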
 • 23. Multi connectors
 Presto
 MySQL
 - test.users
 PostgreSQL
 - public.accesses
 - public.codes
 Raptor
 - default.user_codes
 create table raptor.default.user_codes as
 select c.text, u.name, count(1) as count
 from postgres.public.accesses a
 join mysql.test.users u on cast(a.user as bigint) = u.id
 join postgres.public.codes c on a.code = c.code
 where a.time < 1200000000
 group by c.text, u.name
 • 25. Multi connectors
 - log message
 :
 2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir.4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT `id`, `name` FROM `test`.`users`
 :
 2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir.5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "text", "code" FROM "public"."codes"
 :
 2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir.3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000))
 :
 Condition pushdown
 Join pushdown isn’t supported
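 The logged statements show what condition pushdown means in practice: each table is fetched with its own SELECT, a filter on that table travels with it, and the joins happen afterwards inside Presto. The helper below only mimics the shape of the PostgreSQL statements in the log. It is a hypothetical function written for illustration, not code from the JDBC plugin.

```python
from typing import List, Optional


def build_remote_sql(schema: str, table: str, columns: List[str],
                     predicate: Optional[str] = None) -> str:
    """Build the SELECT sent to the remote database for one table scan."""
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {cols} FROM "{schema}"."{table}"'
    if predicate:
        # Pushed-down condition: the remote database filters before sending rows.
        sql += f" WHERE (({predicate}))"
    return sql


# One statement per table; no join appears in any of them.
ACCESSES_SQL = build_remote_sql(
    "public", "accesses", ["user", "code", "time"], '"time" < 1200000000'
)
CODES_SQL = build_remote_sql("public", "codes", ["text", "code"])
```

 `ACCESSES_SQL` and `CODES_SQL` reproduce the two PostgreSQL statements from the log; only the first carries a WHERE clause because only `accesses` has a filter in the query.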
 • 26. Recent updates (0.108~0.117)
 New functions:
 normalize(), from_iso8601_timestamp(), from_iso8601_date(), to_iso8601()
 slice(), md5(), array_min(), array_max(), histogram()
 element_at(), url_encode(), url_decode()
 sha1(), sha256(), sha512()
 multimap_agg(), checksum()
 Teradata compatibility functions:
 index(), char2hexint(), to_char(), to_date(), to_timestamp()
 • 27. Recent updates (0.108~0.117)
 Cluster Resource Management.
 - query.max-memory
 - query.max-memory-per-node
 - resources.reserved-system-memory
 “Big Query” option was removed.
 Semi-joins are hash-partitioned if distributed_join is turned on.
 Add support for partial cast from JSON. For example, json can be cast to array<json>, map<varchar, json>, etc.
 Use JSON_PARSE() and JSON_FORMAT() instead of CAST.
 Add query_max_run_time session property and query.max-run-time config. Queries fail after the specified duration.
 optimizer.optimize-hash-generation and distributed-joins-enabled are both enabled by default now.
  • 29. Who uses Presto? • Facebook http://www.slideshare.net/dain1/presto-meetup-2015
 • 32. Who uses Presto? As a service… • Qubole: SaaS • Treasure Data: SaaS • Teradata: commercial support
  • 33. Today’s talk • What's Presto? • How do we use Presto? • What’s Treasure Data? • System architecture • How we manage Presto
 • 35. What’s Treasure Data? • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On the cloud, but mitigating outages. Founded in 2011 in the U.S. Over 85 employees. and more…
 • 40. Time to Value (architecture diagram)
 Acquire: Web Log, App Log, Sensor, CRM, ERP, RDBMS, POS; Treasure Agent (Server), SDK (JS, Android, iOS, Unity), Streaming Collector; Bulk Uploader: Embulk, TD Toolbelt
 Store: Plazma DB (Flexible, Scalable, Columnar Storage), @AWS or @IDCF
 Analyze: SQL-based query (SQL, Pig); Batch / Reliability; Ad-hoc / Low latency; REST API, ODBC / JDBC
 Result Push (send query result): KPI Dashboard; BI Tools (Metric Insights, Tableau, Motion Board etc.); Other Products (RDBMS, Google Docs, AWS S3, FTP Server, etc.)
 Connectivity, Economy & Flexibility, Simple & Supported
 Collect! Store! Analyze!
 • 47. Treasure Data in figures
  • 48. Some of our customers
 • 50. Components in Treasure Data
 api server, worker queue (MySQL), td worker process
 Presto coordinator, Presto worker (x4), td-presto connector
 plazmadb (PostgreSQL + S3/RiakCS), result bucket (S3/RiakCS)
 select user_id, count(1) from …
 - Retry failed query if needed
 - Authentication / Authorization
 - Columnar file format. Schema-less.
 • 58. Schema on read
 access_logs table
 time                 code   method  user_id
 2015-06-01 10:07:11  200    GET
 2015-06-01 10:10:12  “200”  GET
 2015-06-01 10:10:20  200    GET
 2015-06-01 10:11:30  200    POST
 2015-06-01 10:20:45  200    GET
 2015-06-01 10:33:50  400    GET     206
 2015-06-01 10:40:11  200    GET     852
 2015-06-01 10:51:32  200    PUT     1223
 2015-06-01 10:58:02  200    GET     5118
 2015-06-01 11:02:11  404    GET     12
 2015-06-01 11:14:27  200    GET     3447
 The user added a new column “user_id” to the imported data.
 The user can select this column just by adding it to the schema (without rebuilding the table).
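 Schema on read can be sketched as applying the schema while rows are being read, rather than when they are stored. In the example below, stored records keep whatever fields they arrived with, and adding `user_id` to the schema is enough to select it. The function name is hypothetical, and casting the quoted “200” to a number is this sketch's assumption; the slide does not say how Treasure Data treats that value.

```python
from typing import Any, Dict, List, Optional

# Stored records keep whatever fields they were imported with (rows from the slide).
RECORDS: List[Dict[str, Any]] = [
    {"time": "2015-06-01 10:10:12", "code": "200", "method": "GET"},
    {"time": "2015-06-01 10:33:50", "code": 400, "method": "GET", "user_id": 206},
    {"time": "2015-06-01 11:02:11", "code": 404, "method": "GET", "user_id": 12},
]


def read_with_schema(records: List[Dict[str, Any]],
                     schema: Dict[str, type]) -> List[Dict[str, Optional[Any]]]:
    """Apply the schema at read time: cast present values, None for missing ones."""
    rows = []
    for rec in records:
        row: Dict[str, Optional[Any]] = {}
        for column, col_type in schema.items():
            value = rec.get(column)
            row[column] = col_type(value) if value is not None else None
        rows.append(row)
    return rows


# Before and after the user adds "user_id" to the schema; the stored data is untouched.
OLD_SCHEMA = {"time": str, "code": int, "method": str}
NEW_SCHEMA = {**OLD_SCHEMA, "user_id": int}
```

 Reading with `NEW_SCHEMA` returns `None` for `user_id` on rows imported before the column existed and the stored value on later rows.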
 • 59. Columnar file format
 access_logs table (same rows as on the previous slide), stored column by column: time, code, method, user_id
 This query accesses only the code column:
 select code, count(1) from tbl group by code
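 The benefit of a columnar layout for this query can be shown with a small column store that records which columns were touched. This is a minimal sketch, not the PlazmaDB file format: the `ColumnStore` class is hypothetical, only two of the four columns are included, and the quoted “200” from the table is treated as the number 200.

```python
from collections import Counter
from typing import Dict, List

# Column-oriented layout: one list per column, values taken from the access_logs table.
COLUMNS: Dict[str, List] = {
    "code": [200, 200, 200, 200, 200, 400, 200, 200, 200, 404, 200],
    "method": ["GET", "GET", "GET", "POST", "GET", "GET",
               "GET", "PUT", "GET", "GET", "GET"],
}


class ColumnStore:
    def __init__(self, columns: Dict[str, List]):
        self._columns = columns
        self.columns_read: List[str] = []  # track which columns were accessed

    def read_column(self, name: str) -> List:
        self.columns_read.append(name)
        return self._columns[name]


def count_by_code(store: ColumnStore) -> Dict[int, int]:
    """Equivalent of: select code, count(1) from tbl group by code"""
    return dict(Counter(store.read_column("code")))
```

 After running `count_by_code`, the store's `columns_read` list contains only `code`, which is the point of the slide: the other columns are never read.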
 • 60. td-presto connector • MessagePack v07 • off heap • avoiding “TypeProfile” • Async IO with Jetty-client • Scheduling & Resource management
  • 61. How we manage Presto • Blue-Green Deployment • Stress test tool • Monitoring with DataDog
 • 62. Blue-Green Deployment
 api server, worker queue (MySQL), td worker process, plazmadb (PostgreSQL + S3/RiakCS), result bucket (S3)
 select user_id, count(1) from …
 Two Presto coordinators and eight Presto workers, labeled production and rc
 • 63. Blue-Green Deployment (same diagram): production, rc. Test Test Test!
 • 64. Blue-Green Deployment (same diagram): production! Release stable cluster. No downtime.
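 The three slides describe a switch-over: queries go to the production cluster, the rc cluster is tested, and then the rc cluster becomes production. A minimal sketch of that switch follows. The `BlueGreenRouter` class and the cluster names are hypothetical and say nothing about how Treasure Data actually routes queries.

```python
class BlueGreenRouter:
    """Routes queries to the production cluster and can promote the rc cluster."""

    def __init__(self, production: str, rc: str):
        self.production = production
        self.rc = rc

    def route(self) -> str:
        # Customer queries always go to whichever cluster is production now.
        return self.production

    def release(self, rc_tests_passed: bool) -> bool:
        # Promote rc only after its tests passed; otherwise keep things as they are.
        if not rc_tests_passed:
            return False
        self.production, self.rc = self.rc, self.production
        return True
```

 Because the switch is a single swap of references, queries keep flowing to a working cluster before and after the release, which is where the "No downtime" claim comes from.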
 • 65. Stress test tool
 • Collect queries that have ever caused issues.
 • Add a new query just by adding an entry like this.
 • The tool issues the query, gets the result, and fills in the calculated digest automatically.
 • We can send all the queries, including very heavy ones (around 6000 stages), to Presto.
 - job_id: 28889999
 - result: 227d16d801a9a43148c2b7149ce4657c
 - job_id: 28889999
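 The workflow on this slide (run the query, record a digest on first run, compare on later runs) can be sketched as follows. The 32-character hex value on the slide looks like an MD5 digest, but the slide does not say so; the MD5 choice, the row serialization, and the function names are all assumptions of this sketch.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence


def result_digest(rows: Sequence[Sequence[object]]) -> str:
    """MD5 over a simple text serialization of the result rows (assumed scheme)."""
    md5 = hashlib.md5()
    for row in rows:
        md5.update("\t".join(str(v) for v in row).encode("utf-8"))
        md5.update(b"\n")
    return md5.hexdigest()


def check_entry(entry: Dict[str, Optional[str]],
                run_query: Callable[[str], List[List[object]]]) -> bool:
    """Run one entry; fill in the digest if it is missing, otherwise compare."""
    digest = result_digest(run_query(entry["job_id"]))
    if entry.get("result") is None:
        entry["result"] = digest  # first run: record the digest automatically
        return True
    return entry["result"] == digest
```

 Here `run_query` stands in for whatever reissues the query of a given job against Presto. Note that this digest is sensitive to row order, so it only suits queries with a deterministic ordering.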
 • 69. Monitoring with DataDog
 Presto coordinator and Presto workers (Presto process)
 td-agent: in_presto_metrics (/v1/jmx/mbean, /v1/query, /v1/node), out_metricsense
 DataDog
 • 70. Monitoring with DataDog Query stalled time - Most important for us. - It triggers alert calls to us… - It is mainly increased by td-presto connector problems, most of which are race conditions.
  • 71. How many queries processed More than 20000 queries / day
 • 72. How many queries processed Most queries finish within 1 min
 • 73. Analytics Infrastructure, Simplified in the Cloud We’re hiring! https://jobs.lever.co/treasure-data