Taro L. Saito, Ph.D.
Arm Treasure Data
June 29, 2019
Scala Matsuri 2019 - Tokyo
How To Use Scala At Work
Airframe In Action At Arm Treasure Data
Using Scala at Work: Airframe Use Cases at Arm Treasure Data

Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
About Me: Taro L. Saito (Leo)
● Principal Software Engineer at Arm Treasure Data
● Building a distributed query engine service
● Living in the US for 4 years
● DBMS & Data Science Background
● Ph.D. in Computer Science
● Database Systems and Genome Sciences Research
● Assistant Professor at the University of Tokyo
● OSS Projects Around Scala
● sbt-sonatype: used for releasing 3000+ Scala projects
● snappy-java: a compression library used in Spark, Parquet, etc.
Self-introduction

New Release from O’Reilly Japan
● Helped with the Japanese translation of Designing Data-Intensive Applications
● Techniques and concepts around distributed data processing systems
● Available at Amazon.co.jp and the O’Reilly Japan website
● Will be published on July 18, 2019
The translation of the definitive introduction to distributed data systems goes on sale next month

Arm Treasure Data
● 400+ customers
● Founded in 2011
● Raised $54M
● Security
● Acquired by Arm / Softbank in 2018
Overview of Arm Treasure Data

The Architecture of Arm Treasure Data
[Diagram: three stages. Data Collection (2 million records / sec.) -> Cloud Storage (130 trillion records) -> Distributed Data Processing (1 billion rows processed / sec.). Other labels: Logs, Device Data, Batch Data, PlazmaDB, Table Schema, Jobs, Job Management, SQL Editor, Scheduler, Workflows, Machine Learning. Legend: Treasure Data OSS, Third Party OSS.]
System architecture of Treasure Data. Where is Scala?

[Diagram: Airframe module map. Modules: airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, airframe DI. Roles and data: Module Mix-In, Packaging, HTTP Requests and Responses, Launch HTTP Services, Debug Logs, Schema-On-Read Mapping, Generate Mapping Codec, Monitor Runtime States, Metrics & Log Data, Scala Objects, Table Data (CSV, TSV), JSON, JDBC ResultSets, and a YAML config sample (production: port: 10010, user: xxxx, ...).]
The Airframe modules (written in Scala) used behind the service

Our OSS Strategy Around Scala
● Gather the best practices of Scala into Airframe OSS
● Get real experience by operating 24/7 services
[Diagram, three columns: Programming (Knowledge, Experiences, Design Decisions), OSS (Airframe), Outcome (Products, 24/7 Services, Business Values)]
Our OSS strategy around Scala, with Airframe at its core

3 Years Ago...
● Various internal and third-party Scala/Java libraries
● Managed in different repositories, with different release cycles
● High learning cost
■ The knowledge was confined to engineers’ brains
[Diagram, three columns: Programming (Knowledge, Experiences, Design Decisions), Various Libraries, Outcome (Products, 24/7 Services, Business Values)]
Three years ago, Airframe did not exist and a mix of various libraries was in use

[Diagram labels for "Various Libraries": logger, launcher, object mapper, JDBC reader, json4s, jackson, ...]
5 Years Ago...
● No Scala engineer in the company
● Scala in 2014: Scala 2.9.x
● Was not good enough to use:
■ e.g., no string interpolation like s"... ${x}..."
[Diagram, three columns: Programming (Knowledge, Experiences, Design Decisions), Ruby, Java, Outcome (Products, 24/7 Services, Business Values)]
Five years ago, there were no Scala engineers and no Scala code

Today’s Agenda
● How to introduce Scala to your company
● Learn the best practices of using Scala at work
● From 20 Airframe modules
What this talk covers

How Can We Introduce Scala?
● Saying “I want to use Scala”
● It will not work, especially if you or your team are not familiar with Scala
● Your managers need more information on whether Scala is good enough
● Even if you are a tech lead:
● You need some confidence in using Scala in production
● How can we establish such confidence in using Scala?
How do we introduce Scala? How do we gain the confidence that it is OK to use Scala?

Start With A Small Investment in Scala
● Guidelines
● Think about how you can save your time with Scala
● If you can save 1 minute a day, you can spend 6 hours on this improvement
■ Save 1 minute / day = 365 minutes / year = 6 hour investment
■ Save 10 minutes / week = 520 minutes / year = 8.7 hour investment
■ Save 1 hour / week = 52 hours / year = 2.2 day investment
● Time is your most valuable asset
● Save your time by using Scala
Start making “small investments” that save time “by using Scala”
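The break-even arithmetic on this slide can be checked with a few lines of Scala. This is only an illustrative sketch: the object and method names are made up here, and a "day" is taken as 24 hours, which is what the 2.2-day figure implies.

```scala
object TimeBudget {
  // Minutes saved per year, given the minutes saved each time and the number of times per year.
  def minutesPerYear(minutesSaved: Double, timesPerYear: Int): Double =
    minutesSaved * timesPerYear

  // Convert minutes to hours, and to 24-hour days.
  def hours(minutes: Double): Double = minutes / 60.0
  def days(minutes: Double): Double  = minutes / 60.0 / 24.0

  def main(args: Array[String]): Unit = {
    val daily  = minutesPerYear(1, 365) // 1 minute per day
    val weekly = minutesPerYear(10, 52) // 10 minutes per week
    val hourly = minutesPerYear(60, 52) // 1 hour per week
    println(s"1 min/day   -> ${hours(daily)} hours per year")
    println(s"10 min/week -> ${hours(weekly)} hours per year")
    println(s"1 hour/week -> ${days(hourly)} days per year")
  }
}
```

The three cases come out at roughly 6.08 hours, 8.67 hours, and 2.17 days, which is the budget available for building the tool before the investment stops paying for itself within a year.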

The First Scala Code in TD
● prestop (presto + top)
● Non-production service code
● A handy query monitoring tool for Presto, written in Scala
● Displays complex JSON data with fancy ANSI colors
The first Scala program at Treasure Data

airframe-log
● Scala 2.10: My small investment to test Scala macros and string interpolation
● A Modern Logging Library for Scala (at Medium)
● ANSI color and source code location display
● Just add the LogSupport trait to your class
Make program development more efficient with log messages

airframe-launcher
● Needed to handle complex command-line options and nested commands
● e.g., $ prestop -e production monitor (other options …)
● Enabled annotation-based command-line definitions
Make it easy to build complex command-line programs

airframe-config: Application Configuration Flow
● YAML config (embedded into Docker)
● Override credentials, then bind to config objects
YAML
development:
  addr: api-dev.com
production:
  addr: api.com
Config Object
case class ServerConfig(
  addr: String,
  port: Int = 8080,
  password: String
)
[Diagram: the command option -e production selects the production section (addr: api.com); credentials and local configurations are merged in; object mapping, with default parameters (e.g., port = 8080), produces an immutable object]
Turn the application configuration flow into a library
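The flow on this slide (pick an environment, merge overrides such as credentials, bind to an immutable object with defaults) can be sketched in plain Scala. This is not the airframe-config API: the `yaml` map stands in for a parsed YAML file, and `load` and the override keys are hypothetical names chosen for illustration.

```scala
object ConfigFlowSketch {
  // Mirrors the ServerConfig case class on the slide; port has a default value.
  case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Stand-in for the parsed YAML file: environment name -> key/value pairs.
  val yaml: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  // Select an environment, merge overrides (e.g., credentials), and bind to an immutable object.
  // Throws NoSuchElementException if addr or password is missing after the merge.
  def load(env: String, overrides: Map[String, String]): ServerConfig = {
    val merged = yaml.getOrElse(env, Map.empty[String, String]) ++ overrides
    val base   = ServerConfig(addr = merged("addr"), password = merged("password"))
    // Keep the default port unless the merged configuration provides one.
    merged.get("port").map(p => base.copy(port = p.toInt)).getOrElse(base)
  }
}
```

For example, `ConfigFlowSketch.load("production", Map("password" -> "secret"))` yields a config with `addr = "api.com"` and the default `port = 8080`. Keeping credentials out of the YAML file and merging them at load time is what lets the same file be embedded into a Docker image.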

sbt-pack plugin
● An sbt plugin to create standalone Scala packages
● A single-folder package with bin and lib folders containing all dependent JARs
● Generates command-line launcher scripts
● My small investment in 2012 to save packaging time
Package programs with sbt-pack and easily create Docker images

[Diagram labels: airframe-launcher, airframe-config, YAML config file, Standalone Scala Package, sbt-pack, Dockerfile]
Medium-Size Investment: Find A Common Pattern
● Extract a common problem pattern and create a solution
● Data -> Object Mapping
● How many data readers and object mappers do we need?
● How can we save time when handling such varied data types?
YAML -> YAML Parser + Object Mapper -> Config Object
JDBC ResultSet -> Object-Relation Mapper -> Table Object
JSON -> JSON Parser + Object Mapper -> Object
There are many cases where input data must be mapped to Scala objects. This calls for a medium-term investment

airframe-msgpack: MessagePack as Universal Data Format
● MessagePack (msgpack.org)
● Compact JSON-like binary format
● Describes data types and data values at the same time (self-describing)
[Diagram: Object <- Pack / Unpack -> MessagePack <- Pack / Unpack -> JDBC ResultSet, YAML, JSON]
With MessagePack as the intermediate format, only one object mapper implementation is needed

PlazmaDB: MessagePack DBMS
● Fluentd -> MessagePack -> Arm Treasure Data
● Automatically generates table schemas from MessagePack data
● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc.
[Diagram: Data Collection -> Schema-free Data -> Table Reader -> Distributed Data Processing. The Table Schema is updated from incoming data and used to generate a reader set (Int Column Reader, String Column Reader)]
Arm Treasure Data is a MessagePack-based schema-on-read system

Schema-On-Read Data Processing with MessagePack
● Users can store arbitrarily typed data (no table design is required)
● Data can be read in the target type required by the application (e.g., a SQL query)
[Diagram: MessagePack types (Int, Float, Boolean, String, Array, Map, Binary) read as SQL BigInt. Conversion labels: parseInt, toInt, 0 or 1, Error or null. IntCodec example: “100” (string) -> Pack -> Unpack -> 100 (int); 100 (int) -> 100 (int). Input label: Logs]
When reading data, convert it to the type the application requires (schema-on-read)
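A minimal sketch of the schema-on-read idea, assuming the conversion labels in the diagram pair up as string -> parseInt, float -> toInt, boolean -> 0 or 1, and everything else -> error or null. The `Value` hierarchy below is a hypothetical stand-in for MessagePack values, not the msgpack or airframe-codec API.

```scala
object SchemaOnReadSketch {
  // A tiny stand-in for stored MessagePack values (hypothetical types).
  sealed trait Value
  case class IntValue(v: Long)         extends Value
  case class FloatValue(v: Double)     extends Value
  case class BooleanValue(v: Boolean)  extends Value
  case class StringValue(v: String)    extends Value
  case class ArrayValue(v: Seq[Value]) extends Value

  // Read any stored value as an integer, whatever type it was written with.
  // None plays the role of "error or null" in the diagram.
  def readAsLong(value: Value): Option[Long] = value match {
    case IntValue(v)     => Some(v)
    case FloatValue(v)   => Some(v.toLong)                      // truncates toward zero
    case BooleanValue(v) => Some(if (v) 1L else 0L)
    case StringValue(s)  => scala.util.Try(s.trim.toLong).toOption
    case ArrayValue(_)   => None
  }
}
```

With this, both `StringValue("100")` and `IntValue(100)` read back as `Some(100L)`, so the writer never has to agree on a schema up front; the reader decides the target type.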

[Diagram labels, other inputs: CSV, command-line arguments]
airframe-codec: Schema-On-Read Pack/Unpack Interface
● Apply schema-on-read for Scala objects
[Diagram: Input -> Pack -> MessagePack -> Unpack -> Output, with Pack/Unpack available in both directions]
Apply the schema-on-read data conversion interface, via MessagePack, to Scala

Pre-defined Codecs in airframe-codec
● Primitive Codecs
● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
● FloatCodec, DoubleCodec
● StringCodec
● BooleanCodec
● TimeStampCodec
● Collection Codec
● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc.
● OptionCodec
● JsonCodec (airframe-json)
● Java-specific Codec
● FileCodec, ZonedDateTimeCodec, JDBCResultSetCodec, etc.
● Adding Custom Codecs
● Implement MessageCodec[X] interface
Supports mapping to almost every data type needed in Scala

MessageCodec.of[A]: Combination of Codecs
[Diagram: MessageCodec.of[A] combines IntCodec, StringCodec, and DoubleCodec to Pack / Unpack between an object and MessagePack]
Compose codecs according to the type of the object
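To illustrate what "combining codecs" means, here is a hand-wired sketch. It is not the airframe-codec implementation: the `Codec` trait is hypothetical, the packed form is a tab-separated string rather than MessagePack bytes, and the composition that the slide attributes to MessageCodec.of[A] is written out manually for one example class.

```scala
object CodecCompositionSketch {
  // Hypothetical, simplified codec: the packed form is a plain string, not MessagePack.
  trait Codec[A] {
    def pack(a: A): String
    def unpack(s: String): Option[A]
  }

  val intCodec: Codec[Int] = new Codec[Int] {
    def pack(a: Int): String           = a.toString
    def unpack(s: String): Option[Int] = scala.util.Try(s.trim.toInt).toOption
  }
  val stringCodec: Codec[String] = new Codec[String] {
    def pack(a: String): String           = a
    def unpack(s: String): Option[String] = Some(s)
  }
  val doubleCodec: Codec[Double] = new Codec[Double] {
    def pack(a: Double): String           = a.toString
    def unpack(s: String): Option[Double] = scala.util.Try(s.trim.toDouble).toOption
  }

  case class Person(id: Int, name: String, score: Double)

  // A codec for Person assembled from the codecs of its parameter types.
  // Limitation of this sketch: names containing a tab character are not supported.
  val personCodec: Codec[Person] = new Codec[Person] {
    def pack(p: Person): String =
      Seq(intCodec.pack(p.id), stringCodec.pack(p.name), doubleCodec.pack(p.score)).mkString("\t")

    def unpack(s: String): Option[Person] = s.split("\t", -1) match {
      case Array(i, n, d) =>
        for {
          id    <- intCodec.unpack(i)
          name  <- stringCodec.unpack(n)
          score <- doubleCodec.unpack(d)
        } yield Person(id, name, score)
      case _ => None
    }
  }
}
```

The point of the design is that each field is handled by the codec for its own type, so supporting a new class only requires knowing its parameter names and types.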

airframe-surface
● Reading Type Signatures From ScalaSig
● The Scala compiler embeds Scala type signatures (ScalaSig) in class files
● Surface.of[A]
■ returns A’s parameter names and types
[Diagram: source: class A (data:List[B]). javac output: class A, data: List[java.lang.Object] (type erasure removes generic type information). scalac output: class A, data: List[java.lang.Object], plus ScalaSig: data:List[B]. Surface.of[A] recovers data: List[B]. Also labeled: scala.reflect.runtime.universe.TypeTag]
Obtain object type information from ScalaSig

[WIP] Scala.js RPC
● Scala.js
● Compiles Scala code into JavaScript for web browsers
● airframe-codec: Passing model class data between Scala and Scala.js
[Diagram: UserInfo (Scala, server side) <- Pack / Unpack -> MessagePack <- Pack / Unpack -> UserInfo (Scala.js, client side), labeled XML RPC]
airframe-codec can also be used to exchange data with Scala.js (browser side)

[WIP] airframe-sql
● Universal stream SQL engine
● Processing various types of data through MessagePack
[Diagram: MessagePack -> Stream SQL (query processing: Filter/Aggregation/Join, etc.) -> MessagePack]
Process any data format with SQL through MessagePack

[Diagram labels, inputs packed into MessagePack: JDBC ResultSet, YAML, JSON]
Scala In Production
A Technical Debt In TD (2015-2016)
● Prestogres: PostgreSQL gateway to Presto
● Enabled using PostgreSQL JDBC/ODBC drivers to access Presto
● The so-called magic of Sada (founder)
● Was good for the first use cases
● Many Problems:
● Hacks around pgpool-II were hard to debug
● Hard to support customers upon errors
● SQL incompatible with Presto
● Nobody could fix these issues
■ including the creator!
The hack called Prestogres had become technical debt

Replacing Prestogres with Prestobase
● Prototyped in Scala within a week after a quick chat with Sada
● Utilizing Airframe assets
● Deployed as a production service in 3 months
Prototyped Prestobase in Scala. Released as a service three months later

airframe-di
● Created a dependency injection library for Scala
● For Prestobase development
● Scala-friendly syntax
● Useful for combining hundreds of modules
● Based on airframe-surface, airframe-log
● See also:
● Airframe Meetup #1 Report (2018)
Airframe DI for Scala was born during the development of Prestobase

Airframe OSS
● Lightweight Building Blocks for Scala
● Collection of our investments in Scala
● Repackaged into wvlet.airframe in 2016
● airframe-log
● airframe-launcher
● airframe-config
● airframe-surface
● airframe-di
● airframe-codec
● ...
● As of 2019, Airframe has 20 modules
● 35+ releases in 2018
● Already had 17+ releases in 2019
● Contributing to the Scala Community Build
● To test the latest Scala versions
In 2016, the various tools were consolidated as Airframe. 20 modules and a frequent release cycle

Monorepo
● Cross build
● For 3 + 1 Scala versions
■ 2.13, 2.12, 2.11, and Scala.js
● 20 modules
■ 4 x 20 = 80 artifacts!
● Challenge
● Publishing took 3 hours with sbt-release
● Bottleneck
● Sequential run of compile -> test -> publish for all artifacts
Airframe uses a single-repository layout to centralize maintenance

Release Automation on Travis CI
● Single-Step Release
● Triggered by git tag
● Running Tasks In Parallel
● Run tests for each Scala version
● Update doc & release notes
■ Generate release notes from git logs
● Publish
■ sbt-pgp & sbt-sonatype
○ GPG signature
○ Copy to Maven Central
● Finishes in around 10 to 20 minutes
● Blog: 3 Tips For Maintaining Scala Projects
Fully automate releases on Travis CI, enabling frequent releases

sbt-sonatype plugin
● An sbt plugin for releasing projects to Maven Central
● open staging repository -> verify -> close -> promote -> drop
● A small investment
● Made during the 2015 New Year holiday => paid off by saving Airframe release time
● 3000+ Scala projects are using sbt-sonatype
sbt-sonatype is a project created over the New Year holiday. It is used by many Scala libraries

airframe-http
● Created a simple HTTP framework
● Based on Airframe modules:
■ airframe-surface
■ airframe-codec
■ airframe-msgpack
■ etc.
● Blog
● Building Low-Friction Web Service Over Finagle
● Save the time for choosing a web framework:
● Many frameworks exist:
● e.g., Finatra, Finch, akka-http, spring, RESTeasy, open-api, swagger, etc.
Using Airframe assets, a web framework was also easy to create

airframe-http-client
● Error handling of HTTP requests is difficult
● 4xx, 5xx status code
● Should we retry the request?
■ IOException, EOFException
■ TimeoutException
■ InterruptedException
■ SSLException
■ InvocationTargetException
● HTTP client
● request retries
● response mapping
■ JSON, MessagePack format
● airframe-codec
Turn error-prone HTTP request error handling into a library

airframe-control
● Everything can fail …
● Network disconnection
● Server crash
● ...
● Retry
● Exponential backoff
■ 2x, 4x, ...
● Jittering
■ 1 sec., 2 * rand, 4 * rand, …
● Customize error type classifiers
● retryable failures
● non-retryable failures
Turn retry handling into a reusable pattern
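The retry pattern listed above (exponential backoff, jitter, and a classifier for retryable failures) might look roughly like the following in plain Scala. This is a sketch under assumptions, not the airframe-control API: `isRetryable`, `backoffMillis`, and `retry` are hypothetical names, the classifier is deliberately minimal, and jitter is applied to every wait, including the first.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Random, Try}

object RetrySketch {
  // Classify failures: only some are worth retrying (hypothetical, minimal classifier).
  def isRetryable(e: Throwable): Boolean = e match {
    case _: java.io.IOException                   => true
    case _: java.util.concurrent.TimeoutException => true
    case _                                        => false
  }

  // Wait before the next attempt: base, 2x, 4x, ... optionally scaled by a random factor in [0, 1).
  def backoffMillis(attempt: Int, baseMillis: Long, jitter: Boolean, rand: Random): Long = {
    val exponential = baseMillis * (1L << attempt)
    if (jitter) (exponential * rand.nextDouble()).toLong else exponential
  }

  // Run body, retrying retryable failures up to maxRetries times.
  def retry[A](maxRetries: Int, baseMillis: Long, rand: Random = new Random())(body: => A): Try[A] = {
    @tailrec
    def loop(attempt: Int): Try[A] = Try(body) match {
      case Failure(e) if attempt < maxRetries && isRetryable(e) =>
        Thread.sleep(backoffMillis(attempt, baseMillis, jitter = true, rand))
        loop(attempt + 1)
      case other => other // success, non-retryable failure, or retries exhausted
    }
    loop(0)
  }
}
```

Jitter spreads out the retries of many clients that failed at the same moment, so they do not all hit a recovering server at once. Non-retryable failures are returned immediately instead of wasting the retry budget.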

airframe-http-recorder
● Testing against actual web services is time consuming
● Record & Replay HTTP responses
● Reproducible results
● Runnable on small machines (e.g., Travis CI)
Record HTTP requests to make web service testing more efficient

[Diagram, Recording Mode: HTTP Request -> HTTP Recorder -> Request -> Real Web Service -> Response, with the HTTP Recorder recording responses]
[Diagram, Replay Mode: HTTP Request -> HTTP Recorder -> Response, served from recorded responses]
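The record and replay modes can be sketched with an in-memory map. This is only an illustration of the idea, not the airframe-http-recorder API: `Request`, `Response`, and `Recorder` are hypothetical types, and a real recorder would persist the responses so that later test runs can replay them.

```scala
import scala.collection.mutable

object HttpRecorderSketch {
  // Hypothetical minimal request/response model.
  case class Request(method: String, path: String)
  case class Response(status: Int, body: String)

  class Recorder(realService: Request => Response) {
    private val recorded = mutable.Map.empty[Request, Response]

    // Recording mode: forward to the real service and remember the response.
    def record(req: Request): Response = {
      val res = realService(req)
      recorded.update(req, res)
      res
    }

    // Replay mode: answer from recorded responses without touching the real service.
    def replay(req: Request): Option[Response] = recorded.get(req)
  }
}
```

Because replay never calls the real service, tests become reproducible and cheap enough to run on small CI machines, which is the benefit the slide points to.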
Data Analysis with Scala
Data-Driven System Optimization
● TD is one of the biggest users of TD
● Query logs
● Collecting all Presto query logs since 2015
● Query statements, performance statistics, logs, etc.
● Logs are our valuable assets
● To understand user activities and enable data-driven optimizations
[Diagram: cycle of User Query -> Collect Query Logs -> Analyze Query Logs (Machine Learning, Query Optimization) -> Optimize System]
Collecting and analyzing logs is important for system optimization

airframe-fluentd
● Collect Scala Application Logs To Fluentd
● Scala Objects -> MessagePack -> Fluentd
Fluentd accepts MessagePack, so the output of airframe-codec can be passed to it

[Diagram: Scala Objects -> airframe-codec -> airframe-fluentd -> Collect Query Logs -> Analyze Query Logs (Machine Learning, Query Optimization) -> Optimize System]
airframe-jmx
● Add @JMX annotation to your application metrics
● It’s also useful to check the application version, configurations, etc.
● JMX clients can check these metrics
● e.g., jconsole
With JMX, check application state from outside the JVM and collect metrics

airframe-metrics
● Human Readable Data Format (ElapsedTime, DataSize, etc.)
● Handy Time Window String Support
Put time spans, time ranges, and data sizes into human-friendly formats to make log analysis more efficient

Taking Snapshots of Data Analysis Tasks
● Save Long-Running Task Results As MessagePack (binary)
● Save the cost of re-computation
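[Diagram, First Run: Task -> Run -> Compute (e.g., 10 min) -> Result: Seq[A] -> Pack -> MessagePack -> Save -> Storage (Snapshot)]
[Diagram, Second Run: Task -> Load -> Storage -> MessagePack -> Unpack -> Result: Seq[A]]
Use Airframe assets to cache data analysis results and work more efficiently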

[Diagram: Airframe module map (repeated). Modules: airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, airframe DI. Roles and data: Module Mix-In, Packaging, HTTP Requests and Responses, Launch HTTP Services, Debug Logs, Schema-On-Read Mapping, Generate Mapping Codec, Monitor Runtime States, Metrics & Log Data, Scala Objects, Table Data (CSV, TSV), JSON, JDBC ResultSets, and a YAML config sample (production: port: 10010, user: xxxx, ...).]
Code assets are built up around Airframe

Resolving Technical Debt with Airframe Upgrades
● Migrate common programming patterns into Airframe
● Upgrade Airframe Version
● YY.MM.patch versioning: 19.5.x, 19.6.x, …
■ Easy to see how far behind the latest version a project is
● Reduce code and logic duplication across components
[Diagram, three columns: Programming (Knowledge, Experiences, Design Decisions), OSS (Airframe), Outcome (Products, 24/7 Services, Business Values)]
Resolve technical debt each time Airframe is upgraded
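The benefit of YY.MM.patch versioning is that the lag behind the latest release can be read off as a number of months. A small sketch of that calculation follows; the `Version` type and function names are hypothetical and this is not part of Airframe.

```scala
object VersionLagSketch {
  case class Version(year: Int, month: Int, patch: Int)

  // Parse "YY.MM.patch" strings such as "19.6.1"; returns None for anything else.
  def parse(s: String): Option[Version] = s.split('.') match {
    case Array(y, m, p) => scala.util.Try(Version(y.toInt, m.toInt, p.toInt)).toOption
    case _              => None
  }

  // How many months the current version is behind the latest one (patch is ignored).
  def monthsBehind(current: Version, latest: Version): Int =
    (latest.year - current.year) * 12 + (latest.month - current.month)
}
```

For example, a project on 18.12.0 while the latest release is 19.6.1 is six months behind, which is visible from the version strings alone without consulting a changelog.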

Scala At Arm Treasure Data
● Scala is now an official language at Arm Treasure Data
● 0 -> 10+ engineers who can write Scala
● Use cases are growing:
● Query optimization, API, Spark, data analysis, storage systems, service operation, etc.
● We are happy to share our Scala assets through Airframe!
Add Your GitHub Star!
wvlet/airframe
Arm Treasure Data now has a solid group of Scala engineers, and the range of Scala use cases is expanding

Presto Conference Tokyo 2019
● July 11 (Thu), 2019, 13:30 ~ (Free)
● https://techplay.jp/event/733772
● Inviting Presto Creators (Martin, Dain, David)
● Presto Software Foundation
● Talks from big Presto users in Japan
● Yahoo! JAPAN, LINE, Arm Treasure Data
● Presto Source Code Navigation
Presto Conference Tokyo 2019 will be held on Thursday, July 11, from 13:30 (free admission)

Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

How To Use Scala At Work - Airframe In Action at Arm Treasure Data

  • 1. How To Use Scala At Work: Airframe In Action At Arm Treasure Data. Taro L. Saito, Ph.D., Arm Treasure Data. June 29, 2019, Scala Matsuri 2019, Tokyo. Let's use Scala at work: Airframe use cases at Arm Treasure Data.

  • 2. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. About Me: Taro L. Saito (Leo) ● Principal Software Engineer at Arm Treasure Data ● Building a distributed query engine service ● Living in the US for 4 years ● DBMS & Data Science Background ● Ph.D. in Computer Science ● Database Systems and Genome Sciences Research ● Assistant Professor at the University of Tokyo ● OSS Projects Around Scala ● sbt-sonatype: used for releasing 3000+ Scala projects ● snappy-java: a compression library used in Spark, Parquet, etc.

  • 3. New Release from O’Reilly Japan ● Helped with the Japanese translation of Designing Data-Intensive Applications ● Techniques and concepts around distributed data processing systems ● Available at Amazon.co.jp and the O’Reilly Japan web site ● Will be published on July 18, 2019. The translation of the definitive introduction to distributed data systems goes on sale next month.

  • 4. Arm Treasure Data: an overview ● Founded in 2011 ● 400+ customers ● Raised $54M ● Security ● Acquired by Arm / SoftBank in 2018

  • 5. The Architecture of Arm Treasure Data. (Diagram: Data Collection of logs, device data, and batch data at 2 million records / sec. -> Cloud Storage in PlazmaDB with table schemas, 130 trillion records -> Distributed Data Processing of jobs at 1 billion rows processed / sec.; alongside job management, SQL editor, scheduler, workflows, and machine learning; components are marked as Treasure Data OSS or third-party OSS.) This is the system architecture of Treasure Data. Where is Scala?

  • 6. Airframe module map. (Diagram. Modules: airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, Airframe DI. Labels: Module Mix-In, Packaging, HTTP Requests and Responses, Scala Objects, Table Data (CSV, TSV), JSON, JDBC ResultSets, Metrics & Log Data, Debug Logs, Monitor Runtime States, Generate Mapping Codec, Schema-On-Read Mapping, Launch HTTP Services. Sample config: production: port: 10010, user: xxxx, ...) These Airframe modules, written in Scala, are used behind the service.

  • 7. Our OSS Strategy Around Scala ● Gather the best practices of Scala into Airframe OSS ● Gain real experience by operating 24/7 services. (Diagram: a cycle of Programming, Knowledge, Experiences, Design Decisions, OSS (Airframe), Products, 24/7 Services, Outcome, and Business Values.) Our OSS strategy around Scala has Airframe at its core.

  • 8. 3 Years Ago... ● Various internal and third-party Scala/Java libraries ● Managed in different repositories with different release cycles ● High learning cost ■ The knowledge was confined to engineers’ brains. (Diagram: the same cycle, with "Various Libraries" in place of OSS: logger, launcher, object mapper, JDBC reader, json4s, jackson, ….) Three years ago, Airframe did not exist and a variety of libraries were mixed together.

  • 9. 5 Years Ago... ● No Scala engineer in the company ● Scala in 2014: Scala 2.9.x ● Was not good enough to use: ■ e.g., no string interpolation like s"... ${x} ...". (Diagram: the same cycle, with Ruby and Java as the programming languages.) Five years ago, there were no Scala engineers and no Scala code.

  • 10. Today’s Agenda ● How to introduce Scala to your company ● Learn the best practices of using Scala at work ● From 20 Airframe modules

  • 11. How Can We Introduce Scala? ● Saying “I want to use Scala” ● It will not work, especially if you or your team are not familiar with Scala ● Your managers need more information on whether it is good enough ● Even if you are a tech lead: ● You need some confidence in using Scala in production ● How can we establish such confidence in using Scala?

  • 12. Start With A Small Investment in Scala ● Guidelines ● Think about how you can save your time with Scala ● If you can save 1 minute a day, you can spend 6 hours on this improvement ■ Save 1 minute / day = 365 minutes / year = about a 6-hour investment ■ Save 10 minutes / week = 520 minutes / year = about an 8.7-hour investment ■ Save 1 hour / week = 52 hours / year = about a 2.2-day investment ● Time is your most valuable asset ● Save your time by using Scala. Start making "small investments" that save time by using Scala.
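
The break-even arithmetic on this slide can be checked with a few lines of Scala. This is only a sketch: the object and method names (`TimeInvestment`, `breakEvenHours`) are made up for illustration and are not part of Airframe.

```scala
object TimeInvestment {
  /** Hours you can invest up front and still break even within one year.
    *
    * @param minutesSaved minutes saved each time the improvement is used
    * @param timesPerYear how often it is used per year (365 = daily, 52 = weekly)
    */
  def breakEvenHours(minutesSaved: Double, timesPerYear: Int): Double =
    minutesSaved * timesPerYear / 60.0

  /** The same budget expressed in 24-hour days, as on the slide. */
  def breakEvenDays(minutesSaved: Double, timesPerYear: Int): Double =
    breakEvenHours(minutesSaved, timesPerYear) / 24.0
}
```

For example, `TimeInvestment.breakEvenHours(1, 365)` is a little over 6 hours, and `TimeInvestment.breakEvenDays(60, 52)` is a little under 2.2 days.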

  • 13. The First Scala Code in TD (Treasure Data) ● prestop (presto + top) ● Non-production service code ● A handy query monitoring tool for Presto, written in Scala ● Displays complex JSON data with fancy ANSI colors

  • 14. airframe-log ● Scala 2.10: My small investment to test Scala macros and string interpolation ● A Modern Logging Library for Scala (at Medium) ● ANSI color and source code location display ● Just add the LogSupport trait to your class. Log messages make program development more efficient.

  • 15. airframe-launcher ● Needed to handle complex command-line options and nested commands ● e.g., $ prestop -e production monitor (other options …) ● Enabled annotation-based command-line definitions. It makes complex command-line programs easy to build.

  • 16. airframe-config: Application Configuration Flow ● YAML config (embedded into Docker) ● Override credentials, then bind to config objects. (Diagram: a YAML file with development: addr: api-dev.com and production: addr: api.com -> the environment is selected with the airframe-launcher command-line option -e production -> credentials and local configurations are merged -> object mapping produces an immutable config object, case class ServerConfig(addr: String, port: Int = 8080, password: String), with default parameters such as port = 8080 filled in.) The application configuration flow is turned into a library.

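The configuration flow above (pick an environment, merge overrides, bind to an immutable object with defaults) can be sketched in plain Scala. This is not the airframe-config API: the `ConfigFlowSketch` object, the `load` method, and the `Map` standing in for a parsed YAML file are all assumptions made for illustration. Only the `ServerConfig` case class comes from the slide.

```scala
object ConfigFlowSketch {
  // Config object from the slide: port has a default, the others are required
  final case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Hypothetical stand-in for the parsed YAML: environment name -> key/value pairs
  val yaml: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  /** Select an environment, merge credentials or local overrides on top,
    * then bind the merged values to an immutable ServerConfig.
    * Throws NoSuchElementException if a required key is still missing.
    */
  def load(env: String, overrides: Map[String, String]): ServerConfig = {
    // Later entries win, so overrides replace values from the YAML file
    val merged = yaml.getOrElse(env, Map.empty[String, String]) ++ overrides
    val withDefaults = ServerConfig(addr = merged("addr"), password = merged("password"))
    // Keep the default port (8080) unless the merged config provides one
    merged.get("port").fold(withDefaults)(p => withDefaults.copy(port = p.toInt))
  }
}
```

Calling `ConfigFlowSketch.load("production", Map("password" -> "secret"))` mirrors running with `-e production` and a separately supplied credential.
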
  • 17. sbt-pack plugin ● An sbt plugin to create standalone Scala packages ● A single-folder package with bin and lib folders containing all dependent JARs ● Generates command-line launcher scripts ● My small investment in 2012 to save packaging time. (Diagram: airframe-launcher, airframe-config, and a YAML config file -> sbt-pack -> standalone Scala package -> Dockerfile.) Package the program with sbt-pack and create Docker images easily.

  • 18. Medium-Size Investment: Find A Common Pattern ● Extract a common problem pattern and create a solution ● Data -> Object Mapping ● How many data readers and object mappers do we need? ● How can we save the time spent handling such varied data types? (Diagram: YAML -> YAML parser + object mapper -> config object; JDBC ResultSet -> object-relation mapper -> table object; JSON -> JSON parser + object mapper -> object.) Mapping input data to Scala objects is a common need, and it calls for a medium-term investment.

  • 19. airframe-msgpack: MessagePack as Universal Data Format ● MessagePack (msgpack.org) ● Compact JSON-like binary format ● Describes data types and data values at the same time (self-describing). (Diagram: JDBC ResultSet, YAML, and JSON are packed into MessagePack; MessagePack is unpacked into objects and objects are packed back into MessagePack.) With MessagePack as the intermediate format, only one object mapper implementation is needed.

  • 20. PlazmaDB: MessagePack DBMS ● Fluentd -> MessagePack -> Arm Treasure Data ● Automatically generates the table schema from MessagePack data ● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc. (Diagram: schema-free data from Data Collection updates the table schema; the schema generates a reader set of Int column readers, String column readers, and so on; the table reader feeds Distributed Data Processing.) Arm Treasure Data is a MessagePack-based schema-on-read system.

  • 21. Schema-On-Read Data Processing with MessagePack ● Users can store arbitrarily typed data (no table design is required) ● Data can be read in the target type required by the application (e.g., a SQL query). (Diagram: MessagePack values of type Int, Float, Boolean, String, Array, Map, or Binary, originating from logs, CSV, or command-line arguments, are unpacked by IntCodec into a SQL BigInt: String via parseInt, Float via toInt, Boolean as 0 or 1, and other types as an error or null. For example, "100" (string) and 100 (int) are both read as 100 (int).) Data is adapted to the type the application requires at read time (schema-on-read).

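A minimal sketch of the read-time coercion described above, assuming a small hand-written value model instead of real MessagePack bytes. The names `SchemaOnReadSketch`, `Value`, and `readAsLong` are hypothetical; airframe-codec's actual IntCodec may differ in detail.

```scala
import scala.util.Try

object SchemaOnReadSketch {
  // Hypothetical stand-in for self-describing MessagePack values
  sealed trait Value
  final case class IntValue(v: Long)        extends Value
  final case class FloatValue(v: Double)    extends Value
  final case class BooleanValue(v: Boolean) extends Value
  final case class StringValue(v: String)   extends Value
  final case class ArrayValue(v: Seq[Value]) extends Value
  case object NilValue                      extends Value

  /** Read any stored value as an integer, following the slide:
    * strings are parsed, floats are truncated, booleans become 0 or 1,
    * and anything else is reported as None (the "error or null" case).
    */
  def readAsLong(value: Value): Option[Long] = value match {
    case IntValue(v)     => Some(v)
    case FloatValue(v)   => Some(v.toLong)
    case BooleanValue(v) => Some(if (v) 1L else 0L)
    case StringValue(s)  => Try(s.trim.toLong).toOption
    case ArrayValue(_)   => None
    case NilValue        => None
  }
}
```

With this sketch, `readAsLong(StringValue("100"))` and `readAsLong(IntValue(100))` both yield `Some(100)`, so the writer never has to declare a column type up front.
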
  • 22. airframe-codec: Schema-On-Read Pack/Unpack Interface ● Apply schema-on-read to Scala objects. (Diagram: input -> pack -> MessagePack -> unpack -> output, in both directions.) The schema-on-read conversion interface through MessagePack is applied to Scala.

  • 23. Pre-defined Codecs in airframe-codec ● Primitive Codecs ● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec ● FloatCodec, DoubleCodec ● StringCodec ● BooleanCodec ● TimeStampCodec ● Collection Codecs ● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc. ● OptionCodec ● JsonCodec (airframe-json) ● Java-specific Codecs ● FileCodec, ZonedDateTimeCodec, JDBCResultSetCodec, etc. ● Adding Custom Codecs ● Implement the MessageCodec[X] interface. Mapping is supported for almost every data type needed in Scala.

  • 24. MessageCodec.of[A]: Combination of Codecs. (Diagram: MessageCodec.of[A] combines IntCodec, StringCodec, DoubleCodec, and so on to pack to and unpack from MessagePack.) Codecs are composed to match the type of the object.

  • 25. airframe-surface ● Reading Type Signatures From ScalaSig ● The Scala compiler embeds Scala type signatures (ScalaSig) in class files ● Surface.of[A] ■ returns A’s parameter names and types. (Diagram: for class A(data: List[B]), javac emits class A with data: List[java.lang.Object] because type erasure removes generic type information; scalac emits the same erased class plus ScalaSig: data: List[B]; Surface.of[A] recovers data: List[B] using scala.reflect.runtime.universe.TypeTag.) Object type information is obtained from ScalaSig.

  • 26. [WIP] Scala.js RPC ● Scala.js ● Compiles Scala code into JavaScript for web browsers ● airframe-codec: Passing model class data between Scala and Scala.js. (Diagram: UserInfo on the Scala server side is packed into MessagePack and unpacked as UserInfo on the Scala.js client side, and vice versa, over XML RPC.) airframe-codec can also be used to exchange data with Scala.js on the browser side.

  • 27. [WIP] airframe-sql ● Universal stream SQL engine ● Processes various types of data through MessagePack. (Diagram: JDBC ResultSet, YAML, and JSON are packed into a MessagePack stream -> SQL query processing with filter, aggregation, join, etc. -> MessagePack.) Data in any format can be processed with SQL through MessagePack.

  • 28. Scala In Production
  • 29. A Technical Debt In TD (2015-2016) ● Prestogres: PostgreSQL gateway to Presto ● Enabled using PostgreSQL JDBC/ODBC drivers to access Presto ● The so-called magic of Sada (the founder) ● Was good for the first use cases ● Many problems: ● Hacks around pgpool-II were hard to debug ● Hard to support customers upon errors ● Incompatible SQL with Presto ● Nobody could fix these issues ■ including the creator! The Prestogres hack had become technical debt.

  • 30. Replacing Prestogres with Prestobase ● Prototyped in Scala within a week after a quick chat with Sada ● Utilizing Airframe assets ● Deployed as a production service in 3 months

  • 31. airframe-di ● Created a dependency injection library for Scala ● For Prestobase development ● Scala-friendly syntax ● Useful for combining hundreds of modules ● Based on airframe-surface, airframe-log ● See also: ● Airframe Meetup #1 Report (2018). Airframe DI for Scala was born during the development of Prestobase.

  • 32. Airframe OSS ● Lightweight Building Blocks for Scala ● Collection of our investments in Scala ● Repackaged into wvlet.airframe in 2016 ● airframe-log ● airframe-launcher ● airframe-config ● airframe-surface ● airframe-di ● airframe-codec ● ... ● As of 2019, Airframe has 20 modules ● 35+ releases in 2018 ● Already had 17+ releases in 2019 ● Contributing to the Scala Community Build ● To test the latest Scala versions. In 2016 the various tools were consolidated as Airframe: 20 modules and a frequent release cycle.

  • 33. Monorepo ● Cross build ● For 3 + 1 Scala versions ■ 2.13, 2.12, 2.11, and Scala.js ● 20 modules ■ 4 x 20 = 80 artifacts! ● Challenge ● Publishing took 3 hours with sbt-release ● Bottleneck ● Sequential run of compile -> test -> publish for all artifacts. Airframe uses a single repository to consolidate maintenance.

  • 34. Release Automation on Travis CI ● Single-Step Release ● Triggered by git tag ● Running Tasks In Parallel ● Run tests for each Scala version ● Update doc & release notes ■ Generate release notes from git logs ● Publish ■ sbt-pgp & sbt-sonatype ○ GPG signature ○ Copy to Maven Central ● Finishes in around 10 to 20 minutes ● Blog: 3 Tips For Maintaining Scala Projects. Fully automating releases on Travis CI makes frequent releases possible.

  • 35. sbt-sonatype plugin ● An sbt plugin for releasing projects to Maven Central ● open staging repository -> verify -> close -> promote -> drop ● A small investment ● Made during the 2015 New Year holiday => paid off by saving Airframe release time ● 3000+ Scala projects are using sbt-sonatype. sbt-sonatype was built over the New Year holiday and is now used by many Scala libraries.

  • 36. airframe-http ● Created a simple HTTP framework ● Based on Airframe modules: ■ airframe-surface ■ airframe-codec ■ airframe-msgpack ■ etc. ● Blog ● Building Low-Friction Web Service Over Finagle ● Saves the time spent choosing a web framework: ● Many frameworks exist: ● e.g., Finatra, Finch, akka-http, spring, RESTeasy, open-api, swagger, etc. With the Airframe assets, a web framework was also easy to build.

  • 37. airframe-http-client ● Error handling of HTTP requests is difficult ● 4xx, 5xx status codes ● Should we retry the request? ■ IOException, EOFException ■ TimeoutException ■ InterruptedException ■ SSLException ■ InvocationTargetException ● HTTP client ● request retries ● response mapping ■ JSON, MessagePack format ● airframe-codec. The error-prone handling of HTTP request failures is turned into a library.

  • 38. airframe-control ● Everything can fail … ● Network disconnection ● Server crash ● ... ● Retry ● Exponential backoff ■ 2x, 4x, ... ● Jittering ■ 1 sec., 2 * rand, 4 * rand, … ● Customize error type classifiers ● retryable failures ● non-retryable failures. Retry handling is turned into a reusable pattern.
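
The retry pattern on this slide (exponential backoff, jitter, and an error classifier) can be sketched as follows. This is a hand-rolled illustration, not the airframe-control API: `RetrySketch`, `waitMillis`, and `retry` are hypothetical names, and the `sleep` and `rand` parameters are injected only so the behavior can be tested without real delays.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Wait before the next attempt, following the slide:
    * the first wait is the initial interval, later waits are 2x, 4x, ...
    * scaled by a random factor in [0, 1) (jitter).
    */
  def waitMillis(initialMillis: Long, attempt: Int, rand: () => Double): Long =
    if (attempt == 0) initialMillis
    else (initialMillis * (1L << attempt) * rand()).toLong

  /** Run `body`, retrying up to `maxRetries` times when the failure is
    * classified as retryable. Non-retryable failures are rethrown at once.
    */
  def retry[A](
      maxRetries: Int,
      initialMillis: Long,
      isRetryable: Throwable => Boolean,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms),
      rand: () => Double = () => scala.util.Random.nextDouble()
  )(body: => A): A = {
    @annotation.tailrec
    def loop(attempt: Int): A = Try(body) match {
      case Success(v) => v
      case Failure(e) if attempt < maxRetries && isRetryable(e) =>
        sleep(waitMillis(initialMillis, attempt, rand))
        loop(attempt + 1)
      case Failure(e) => throw e
    }
    loop(0)
  }
}
```

A caller might treat `java.io.IOException` as retryable and `IllegalArgumentException` as non-retryable, which matches the split between transient network failures and programming errors.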

  • 39. airframe-http-recorder ● Testing against actual web services is time-consuming ● Record & replay HTTP responses ● Reproducible results ● Runnable on small machines (e.g., Travis CI). (Diagram: in recording mode, the HTTP recorder forwards each HTTP request to the real web service and records the response; in replay mode, it answers requests from the recorded responses.) Recording HTTP requests makes testing web services more efficient.

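The record and replay modes can be illustrated with a small in-memory sketch. Requests and responses are plain strings here and the real service is just a function; `RecorderSketch` and `Recorder` are hypothetical names, and airframe-http-recorder itself works on real HTTP traffic with persistent storage.

```scala
import scala.collection.mutable

object RecorderSketch {
  /** Wraps a "real service" function and remembers its responses. */
  final class Recorder(realService: String => String) {
    private val recorded = mutable.Map.empty[String, String]

    /** Recording mode: forward the request to the real service and store the response. */
    def record(request: String): String = {
      val response = realService(request)
      recorded.update(request, response)
      response
    }

    /** Replay mode: answer from recorded responses only; None if never recorded. */
    def replay(request: String): Option[String] = recorded.get(request)
  }
}
```

In a test suite, the recording pass would run once against the real service, and later runs would use only `replay`, which keeps results reproducible and avoids network access on CI machines.
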
  • 40. Data Analysis with Scala
  • 41. Data-Driven System Optimization ● TD is one of the biggest users of TD ● Query logs ● Collecting all Presto query logs since 2015 ● Query statements, performance statistics, logs, etc. ● Logs are our valuable assets ● To understand user activities and enable data-driven optimizations. (Diagram: user queries -> collect query logs -> analyze query logs -> machine learning and query optimization -> optimize system.) Collecting and analyzing logs is essential for optimizing the system.

  • 42. airframe-fluentd ● Collect Scala application logs to Fluentd ● Scala Objects -> MessagePack -> Fluentd. (Diagram: Scala objects -> airframe-codec -> airframe-fluentd, feeding the cycle of collect query logs -> analyze query logs -> machine learning and query optimization -> optimize system.) Fluentd accepts MessagePack, so the output of airframe-codec can be passed to it directly.

  • 43. airframe-jmx ● Add the @JMX annotation to your application metrics ● It is also useful for checking the application version, configurations, etc. ● JMX clients can check these metrics ● e.g., jconsole. With JMX, you can check the application state from outside the JVM and collect metrics.

  • 44. airframe-metrics ● Human-Readable Data Format (ElapsedTime, DataSize, etc.) ● Handy Time Window String Support. Presenting durations, time ranges, and data sizes in human-friendly formats makes log analysis more efficient.

  • 45. Taking Snapshots of Data Analysis Tasks ● Save long-running task results as MessagePack (binary) ● Save the cost of re-computation. (Diagram: on the first run, the task computes the result Seq[A] (e.g., 10 min), packs it into MessagePack, and saves it to storage as a snapshot; on the second run, the task loads the snapshot and unpacks it instead of recomputing.) With the Airframe assets, data analysis results are cached to make the work more efficient.
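
The snapshot idea can be sketched with a file-backed cache. To stay self-contained, this example stores a `Seq[String]` as newline-separated UTF-8 text rather than MessagePack, and it assumes the individual results contain no newlines; `SnapshotSketch` and `cached` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object SnapshotSketch {
  /** Return the snapshot stored at `path` if it exists; otherwise run the
    * (possibly long-running) computation once and save its result.
    */
  def cached(path: Path)(compute: => Seq[String]): Seq[String] =
    if (Files.exists(path)) {
      // Second run: load the snapshot instead of recomputing
      val text = new String(Files.readAllBytes(path), UTF_8)
      if (text.isEmpty) Seq.empty[String] else text.split("\n", -1).toSeq
    } else {
      // First run: compute, then save the result as a snapshot
      val result = compute
      Files.write(path, result.mkString("\n").getBytes(UTF_8))
      result
    }
}
```

Because `compute` is a by-name parameter, the expensive task is not evaluated at all when a snapshot already exists.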

  • 46. Airframe module map (the diagram from slide 6, repeated): airframe-launcher, airframe-log, airframe-config, airframe-codec, sbt-pack, airframe-fluentd, airframe-json, airframe-surface, airframe-tablet, airframe-jmx, airframe-jdbc, airframe-http, airframe-http-finagle, Airframe DI. A body of code assets has formed around Airframe.

  • 47. Resolving Technical Debts with Airframe Upgrade ● Migrate common programming patterns into Airframe ● Upgrade the Airframe version ● YY.MM.patch versioning: 19.5.x, 19.6.x, … ■ Easy to see how far behind the latest version a project is ● Reduce code and logic duplication across components. (Diagram: the cycle of Programming, Knowledge, Experiences, Design Decisions, OSS (Airframe), Products, 24/7 Services, Outcome, and Business Values.) Technical debt is paid down each time Airframe is upgraded.

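To show why YY.MM.patch versions make it easy to see how far behind a project is, here is a small sketch that parses two such versions and counts the months between them. The `CalVerSketch` object and its methods are illustrative assumptions, not part of Airframe or sbt.

```scala
import scala.util.Try

object CalVerSketch {
  final case class Version(year: Int, month: Int, patch: Int)

  /** Parse "YY.MM.patch" (e.g., "19.6.1"); None if the string does not fit. */
  def parse(s: String): Option[Version] = s.split('.') match {
    case Array(y, m, p) => Try(Version(y.toInt, m.toInt, p.toInt)).toOption
    case _              => None
  }

  /** Number of months the current version lags behind the latest one. */
  def monthsBehind(current: Version, latest: Version): Int =
    (latest.year - current.year) * 12 + (latest.month - current.month)
}
```

For instance, a project on 18.12.0 when 19.6.1 is the latest release is six months behind, which can be read directly off the version strings.
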
  • 48. Scala At Arm Treasure Data ● Scala is now an official language at Arm Treasure Data ● 0 -> 10+ engineers who can write Scala ● Use cases are growing: ● Query optimization, API, Spark, data analysis, storage systems, service operation, etc. ● We are happy to share our Scala assets through Airframe! ● Add Your GitHub Star! wvlet/airframe. Arm Treasure Data now has a solid group of Scala engineers, and the range of Scala use cases keeps expanding.

  • 49. Presto Conference Tokyo 2019 ● July 11 (Thu), 2019, 13:30 ~ (Free) ● https://techplay.jp/event/733772 ● Inviting Presto Creators (Martin, Dain, David) ● Presto Software Foundation ● Talks from big Presto users in Japan ● Yahoo! JAPAN, LINE, Arm Treasure Data ● Presto Source Code Navigation

  • 50. Confidential © Arm 2017. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!