Presto At LINE
Presto Conference Tokyo 2019
2019/07/11
Wataru Yukawa & Yuya Ebihara
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Data flow
Diagram: Ingest → Data → Report. Sources (Log, RDBMS) are ingested into Hadoop, with Ranger for access control; data is reported and analyzed through Tableau, OASIS, Yanagishima, LINE Analytics, and Aquarium.
OASIS
● Web-based data analysis platform, like Apache Zeppelin or Jupyter
● Primarily uses Spark, but Presto can also be used
OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
https://www.slideshare.net/linecorp/oasis-data-analysis-platform-for-multitenant-hadoop-cluster
904 UU
67k PV
LINE Analytics
● Analysis tool similar to Google Analytics
○ Dashboard
○ Basic Summary
○ Realtime
○ Page Contents
○ Event Tracking
○ User Environment
○ Tools
● Backend is Presto
Why LINE's Front-end Development Team Built the Web Tracking System
https://www.slideshare.net/linecorp/why-lines-frontend-development-team-built-the-web-tracking-system
433 UU
7k PV
Aquarium
● Metadata catalog tool
○ Contacts
○ Note
○ Columns
○ Location
○ HDFS
○ Relationship
○ Reference
○ DDL
Efficient And Invincible Big Data Platform In LINE
https://www.slideshare.net/linecorp/efficient-and-invincible-big-data-platform-in-line/25
481 UU
10k PV
Yanagishima
1.3k UU
11k PV
● Web UI for
○ Presto
○ Hive
○ Spark SQL
○ Elasticsearch
● Started in 2015
● There are many similar tools, such as Hue, Airpal, and Shib,
but I wanted to build one of my own
https://github.com/yanagishima/yanagishima
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Yanagishima features
● Share query with permanent link
● Handle multiple Presto clusters
● Input parameters
● Pretty print for json & map
● Chart
● Pivot table
● Show EXPLAIN result as Text and Graphviz
● Desktop notification
Input parameters
Pretty print
● Supported chart type
○ Line
○ Stacked Area
○ Full-Stacked Area
○ Column
○ Stacked Column
Chart
Pivot table
Explain
● Text
● Graphviz
Desktop notification
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Yanagishima Components
API server
○ Written in Java
○ Stores queries in SQLite3 or MySQL
○ Stores query results on the filesystem
○ No built-in authentication; an in-house auth system in a proxy server is used
○ No built-in authorization; Apache Ranger is used
SPA
○ Written in jQuery at first
○ Frontend engineers in our department replaced it with Vue.js in 2017
○ Clean and modern code thanks to their major refactoring
How to process query
● Asynchronous processing flow
○ User submits query
○ Get query id
○ Track with query id by client side polling
○ Users can see progress and kill the query
● Easy to implement thanks to Presto REST API
● Not as easy to implement for Hive and Spark due to the lack of such an API
e.g., it is not easy to get the YARN application id
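As a rough sketch of that asynchronous flow against the Presto REST API with curl (the coordinator host, user, and table names are placeholders, not our actual setup):

$ curl -s -X POST -H 'X-Presto-User: yukawa' \
       --data 'SELECT count(*) FROM hive.default.access_log' \
       http://presto-coordinator:8080/v1/statement
# → JSON containing "id", "stats" (progress) and "nextUri"
$ curl -s <nextUri>              # poll until no "nextUri" remains; "data" carries the result rows
$ curl -s -X DELETE <nextUri>    # kills the running query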
Dependency
● Depends on presto-cli, not JDBC, because of performance and features
● Yanagishima needs not only the query result but also the column names in a single Presto request
● DatabaseMetaData#getColumns is slow, more than 10s, due to a system.jdbc.columns table scan
● Presto didn’t support JDBC cancel in 2015 but supports it now
● We chose presto-cli, but it has drawbacks
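(For context, DatabaseMetaData#getColumns is served from the system.jdbc.columns table; a sketch of an equivalent query, with an illustrative table name:)

presto> SELECT column_name, type_name
        FROM system.jdbc.columns
        WHERE table_cat = 'hive' AND table_schem = 'default' AND table_name = 'access_log';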
Compatibility issue
● Unfortunately, presto-cli >= 0.205 can’t connect to old Presto servers because of the ROW type change #224
→ Bundled both the new & old presto-cli without shading because the package names differ
(io.prestosql & com.facebook.presto)
● We may switch to JDBC because it’s better not to use presto-cli, as PSF mentioned in the
above issue
Cluster     Version   Workers   Auth
Analysis    315       76        -
Datachain   314       100       LDAP
Shonan      306       36        -
Dataopen    0.197     9         LDAP
Datalake2   0.188     200       LDAP
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Presto error query analysis
SemanticErrorName is available since 313 #790
Flow: Syntactic Analysis → (fail: Syntax Error) → Semantic Analysis → (fail: Semantic Error) → Pass
Thank you for the kind code review 🙏🏻
Classification of USER_ERROR
● Many syntax errors
● A typical semantic error is a user accessing a non-existent column/schema
SYNTAX_ERROR
● mismatched input ... expecting
● Hive views are not supported
● ...
Semantic Error Name Count
null 743
MISSING_ATTRIBUTE 273
MISSING_SCHEMA 169
MUST_BE_AGGREGATE_OR_GROUP_BY 111
TYPE_MISMATCH 87
FUNCTION_NOT_FOUND 72
MISSING_TABLE 53
MISSING_CATALOG 23
INVALID_LITERAL 4
AMBIGUOUS_ATTRIBUTE 4
CANNOT_HAVE_AGGREGATIONS_WINDOWS_OR_GROUPING 3
INVALID_ORDINAL 2
NOT_SUPPORTED 2
ORDER_BY_MUST_BE_IN_SELECT 2
NESTED_AGGREGATION 1
WINDOW_REQUIRES_OVER 1
REFERENCE_TO_OUTPUT_ATTRIBUTE_WITHIN_ORDER_BY_AGGREGATION 1
Error Name Count
SYNTAX_ERROR 635
NOT_FOUND 47
HIVE_EXCEEDED_PARTITION_LIMIT 26
INVALID_FUNCTION_ARGUMENT 19
INVALID_CAST_ARGUMENT 6
PERMISSION_DENIED 3
SUBQUERY_MULTIPLE_ROWS 3
ADMINISTRATIVELY_KILLED 2
INVALID_SESSION_PROPERTY 1
DIVISION_BY_ZERO 1
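For illustration, hypothetical queries (the table and column names are made up) that would fall into the two largest buckets above:

presto> SELECT no_such_column FROM hive.default.access_log;   -- MISSING_ATTRIBUTE (semantic error)
presto> SELECT count(* FROM hive.default.access_log;          -- SYNTAX_ERROR (mismatched input)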
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
● Execute queries with Presto because of its speed and rich UDFs
○ Users want to run a query quickly and check the data roughly
● Implement batch jobs with Spark SQL in OASIS or with Hive in the console
○ Users want to create stable batch jobs
● Yanagishima is like Gist, OASIS is like GitHub
Typical use case in Yanagishima & OASIS
Why we don’t use Presto in batch
● Lack of Hive metastore impersonation
○ Support Impersonation in Metastore communication #43
● Less stable than Hive or Spark
○ We want to prioritize stability over latency in batch jobs
○ Need to handle huge amounts of data
Impersonation
● Presto does not support impersonating the end user when accessing the Hive metastore
● SELECT queries are no problem, but CREATE/DROP/ALTER/INSERT/DELETE queries can be a problem
● If the Presto process runs as the presto user and yukawa creates a table, Presto accesses the Hive metastore as the
presto user, not yukawa. This means another user can drop the table if the presto user has write permission
● We don’t allow the presto user to write to HDFS, enforced with Apache Ranger
● HMS impersonation is available in Starburst Distribution of Presto
● Support for impersonation will be a game changer
Diagram: yukawa runs CREATE TABLE line.ad and ebihara runs DROP TABLE line.ad; both reach the Hive Metastore as the presto user, which has WRITE permission (hadoop.proxyuser.presto.groups=*, hadoop.proxyuser.presto.hosts=*). Ranger restricts what the presto user can write on Hadoop.
Less stable than Hive/Spark
● A single query/worker crash can be a bottleneck
● An automatic worker restart mechanism may be necessary
● Presto workers, DataNodes, and NodeManagers are deployed on the same machines
● Enabling cgroups may be necessary because, for example, PySpark Python processes can use a lot of CPU
● For example, yarn.nodemanager.resource.percentage-physical-cpu-limit : 70%
Hive/Spark batch jobs are more stable, but it’s not easy to convert queries from Presto to
Hive/Spark due to differences in date functions, syntax, …
Hard to convert query

Presto                                      Hive / Spark SQL
json_extract_scalar                         get_json_object
date_format(now(), '%Y%m%d')                date_format(current_timestamp(), 'yyyyMMdd')
cross join unnest () as t ()                lateral view explode() t
url functions like url_extract_parameter    -
try                                         -
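As a concrete, hypothetical illustration (the table and column names are made up), the same daily aggregation written for each dialect:

-- Presto
SELECT json_extract_scalar(payload, '$.event') AS event, count(*) AS cnt
FROM access_log
WHERE dt = date_format(now(), '%Y%m%d')
GROUP BY json_extract_scalar(payload, '$.event');

-- Hive / Spark SQL
SELECT get_json_object(payload, '$.event') AS event, count(*) AS cnt
FROM access_log
WHERE dt = date_format(current_timestamp(), 'yyyyMMdd')
GROUP BY get_json_object(payload, '$.event');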
Confusing Spark SQL error message
Spark allows “DISTINCT” as a column name (SPARK-27170)
It’s difficult to understand the error message
It will be improved in Spark 3.0 (SPARK-27901)
cannot resolve '`distinct`' given input columns: [...]; line 1 pos 7;
'GlobalLimit 100
+- 'LocalLimit 100
   +- 'Project ['distinct, ...]
      +- Filter (...)
         +- SubqueryAlias ...
            +- HiveTableRelation ...
SELECT distinct
,a
,b
,c
FROM test_table LIMIT 100
Confusing...💨
DEBUG
Recent Issues
More than 100,000 partitions error occurred in 307 #619
● The fixed version was released within one day
Partition location does not exist in hive external table #620
● Caused by upgrading the Hadoop library from 2.7.7 to 3.2.0
● Ongoing https://issues.apache.org/jira/browse/HDFS-14466
Create table failed when using viewfs #10099
● It’s a known issue and not fatal for now because we use Presto in read-only mode
Handle repeated predicate pushdown into Hive connector #984
● Performance regression, already fixed in 315
Great!
Schema mismatch of parquet file #9156
● Our old cluster faced this issue recently, already fixed in 0.203
Scale
Table # Partitions
Table A 1,588,031
Table B 1,429,047
Table C 1,429,046
Table D 1,116,130
Table E 772,725
● Daily queries: ~20K
● Daily processed data: 330TB
● Daily processed rows: 4 Trillion rows
● Partitions
hive.max-partitions-per-scan (default: 100,000)
Maximum number of partitions for a single table scan
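With tables holding well over a million partitions, the default limit has to be raised in the Hive catalog properties; a sketch (the value shown is only an illustration, not our exact production setting):

# etc/catalog/hive.properties
hive.max-partitions-per-scan=2000000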
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Already fixed in 308
More than 100,000 partitions error
More than 100,000 partitions error occurred in 307 #619
→ Query over table 'default.test_left' can potentially read more than 100000 partitions
at io.prestosql.plugin.hive.HiveMetadata.getPartitionsAsList(HiveMetadata.java:601)
at io.prestosql.plugin.hive.HiveMetadata.getTableLayouts(HiveMetadata.java:1645)
....
Steps to Reproduce
● Start hadoop-master docker image
$ presto-product-tests/conf/docker/singlenode/compose.sh up -d hadoop-master
$ presto-product-tests/conf/docker/singlenode/compose.sh up -d
● Create a table and populate rows
presto> CREATE TABLE test_part (col int, part_col int) with (partitioned_by = ARRAY['part_col']);
presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(1, 100)) AS u(id);
presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(101, 150)) AS u(id);
hive.max-partitions-per-scan=100 in product test
hive.max-partitions-per-writers=100 (default)
● Execute reproducible query (TestHivePartitionsTable.java)
presto> SELECT a.part_col FROM
(SELECT * FROM test_part WHERE part_col = 1) a, (SELECT * FROM test_part WHERE part_col = 1) b
WHERE a.col = b.col
Frames and Variables
● Migration to remove table layout was ongoing
● “TupleDomain” is one of the key classes involved in predicate pushdown
Fix
● Fixed EffectivePredicateExtractor.visitTableScan method
Actually, it’s a workaround until the migration is completed
● Timeline
○ Created Issue April 11, 4PM
○ Merged commit April 12, 7AM
○ Released 308 April 12, 3PM
Released within one day 🎉
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Webhdfs partition location does not exist
Partitioned webhdfs table throws “Partition location does not exist” error #620
● Webhdfs isn’t supported (at least not tested) due to missing classes #957
→ Added the missing jar to the plugin directory
● Create a table with a webhdfs location in Hive
hive> CREATE TABLE test_part_webhdfs (col1 int) PARTITIONED BY (dt int)
LOCATION 'webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs';
hive> INSERT INTO test_part_webhdfs PARTITION(dt=1) VALUES (1);
presto> SELECT * FROM test_part_webhdfs;
→ Partition location does not exist:
webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs/dt=1
Remote Debugger
● Edit jvm.config
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
● Remote debugger configuration in IntelliJ IDEA
Run→Edit Configurations...→+→Remote
Step into Hadoop library
● The arguments passed to the Hadoop library are the same
○ We can also step into dependent libraries as if they were local code
Different internal call
● Hadoop 2.7.7 (Presto 306)
http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt=1?op=LISTSTATUS&user.name=x
● Hadoop 3.2.0 (Presto 307)
http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt%253D1?op=LISTSTATUS&user.name=x
{
"RemoteException":{
"exception":"FileNotFoundException",
"javaClassName":"java.io.FileNotFoundException",
"message":"File /user/hive/warehouse/test_part/dt%3D1 does not exist."
}
}
HDFS-14466
FileSystem.listLocatedStatus for a path including '=' encodes it and returns FileNotFoundException
The equals sign is double-encoded
dt=1 → dt%3D1 → dt%253D1
HADOOP-16258.001.patch by Iwasaki-san
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Already fixed in 0.203
Schema mismatch of parquet
● Failed to access table created by Spark
presto> SELECT * FROM default.test_parquet WHERE dt='20190101'
Error opening Hive split hdfs://cluster/apps/hive/warehouse/test_parquet/dt=20190101/20190101.snappy.parquet
(offset=503316480, length=33554432):
Schema mismatch, metastore schema for row column col1.element has 13 fields but parquet schema has 12 fields
The problematic column type is ARRAY<STRUCT<...>>
● Hive metastore returns 13 fields
● Parquet schema returns 12 fields
parquet-tools
● Supported options
○ cat
○ head
○ schema
○ meta
○ dump
○ merge
○ rowcount
○ size
https://github.com/apache/parquet-mr/tree/master/parquet-tools
● Inspect the schema of a Parquet file
$ parquet-tools schema sample.parquet
message spark_schema {
optional group @metadata {
optional binary beat (UTF8);
optional binary topic (UTF8);
optional binary type (UTF8);
optional binary version (UTF8);
}
optional binary @timestamp (UTF8);
…
We can analyze even large files of over a GB within a few seconds
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Contributions
● General
○ Retrieve semantic error name
○ Fix partition pruning regression on 307
○ COMMENT ON TABLE
○ DROP COLUMN
○ Column alias in CTAS
○ Non-ascii date_format function argument
● Cassandra connector
○ INSERT statement
○ Auto-discover protocol version
○ Materialized View
○ Smallint, tinyint, date type
○ Nested collection type
● Hive connector
○ CREATE TABLE property
■ textfile_skip_header_line_count
■ textfile_skip_footer_line_count
● MySQL connector
○ Map JSON to Presto JSON
● CLI
○ Output format
■ JSON
■ CSV_UNQUOTED
■ CSV_HEADER_UNQUOTED
Thanks PSF, Starburst and ex-Teradata ✨
THANK YOU
  • 48. Contributions ● General ○Retrieve semantic error name ○Fix partition pruning regression on 307 ○COMMENT ON TABLE ○DROP COLUMN ○Column alias in CTAS ○Non-ascii date_format function argument ● Cassandra connector ○INSERT statement ○Auto-discover protocol version ○Materialized View ○Smallint, tinyint, date type ○Nested collection type ● Hive connector ○CREATE TABLE property ■ textfile_skip_header_line_count ■ textfile_skip_footer_line_count ● MySQL connector ○Map JSON to Presto JSON ● CLI ○Output format ■ JSON ■ CSV_UNQUOTED ■ CSV_HEADER_UNQUOTED Thanks PSF, Starburst and ex-Teradata ✨