How To Use Scala At Work - Airframe In Action at Arm Treasure Data
1. Taro L. Saito, Ph.D.
Arm Treasure Data
June 29, 2019
Scala Matsuri 2019 - Tokyo
How To Use Scala At Work
Airframe In Action At Arm Treasure Data
Let's use Scala at work: Airframe use cases at Arm Treasure Data
2. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
About Me: Taro L. Saito (Leo)
● Principal Software Engineer at Arm Treasure Data
● Building a distributed query engine service
● Living in the US for 4 years
● DBMS & Data Science Background
● Ph.D. in Computer Science
● Database Systems and Genome Sciences Research
● Assistant Professor at the University of Tokyo
● OSS Projects Around Scala
● sbt-sonatype: used for releasing 3000+ Scala projects
● snappy-java: a compression library used in Spark, Parquet, etc.
Self-introduction
3. New Release from O’Reilly Japan
● Helped with the Japanese translation of Designing Data-Intensive Applications
● Techniques and concepts around distributed data processing systems
● Available on the Amazon.co.jp and O’Reilly Japan web sites
● To be published on July 18, 2019
The translation of the definitive introduction to distributed data systems goes on sale next month
5. The Architecture of Arm Treasure Data
[Diagram: Data collection (logs, device data, batch data; 2 million records / sec.) -> Cloud storage (PlazmaDB with table schema; 130 trillion records) -> Distributed data processing (1 billion rows processed / sec.), with job management (jobs, SQL editor, scheduler, workflows, machine learning). The legend distinguishes Treasure Data OSS from third-party OSS.]
The system architecture of Treasure Data. Where is Scala?
6. Airframe
[Diagram: Airframe modules and their roles: airframe-launcher (command line), airframe-config (YAML configuration, e.g., production: port: 10010, user: xxxx), airframe-log (debug logs), airframe-codec (schema-on-read mapping of Scala objects), airframe-surface (generating mapping codecs), airframe-json (JSON), airframe-tablet (table data: CSV, TSV), airframe-jdbc (JDBC ResultSets), airframe-fluentd (metrics and log data), airframe-jmx (monitoring runtime states), airframe-http and airframe-http-finagle (launching HTTP services, HTTP requests and responses), airframe DI (module mix-in), sbt-pack (packaging)]
Airframe modules (written in Scala) used behind the services
7. Our OSS Strategy Around Scala
● Gather the best practices of Scala into Airframe OSS
● Gain real-world experience by operating 24/7 services
[Diagram: Programming (knowledge, experiences, design decisions) -> OSS (Airframe) -> Outcome (products, 24/7 services, business values)]
An OSS strategy around Scala with Airframe at its core
8. 3 Years Ago...
● Various internal and third-party Scala/Java libraries
● Managed in different repositories with different release cycles
● High learning cost
■ The knowledge was confined to engineers’ brains
[Diagram: Programming (knowledge, experiences, design decisions) -> Various libraries (logger, launcher, object mapper, JDBC reader, json4s, jackson, ...) -> Outcome (products, 24/7 services, business values)]
Three years ago, Airframe did not exist and a mix of various libraries was in use
9. 5 Years Ago...
● No Scala engineers in the company
● Scala in 2014: Scala 2.9.x
● Was not good enough to use:
■ e.g., no string interpolation like s"... ${x} ..."
[Diagram: Programming (knowledge, experiences, design decisions) -> Ruby, Java -> Outcome (products, 24/7 services, business values)]
Five years ago, there were neither Scala engineers nor Scala code
10. Today’s Agenda
● How to introduce Scala to your company
● Learn the best practices of using Scala at work
● From 20 Airframe modules
What this talk covers
11. How Can We Introduce Scala?
● Saying “I want to use Scala”
● It will not work, especially if you or your team are not familiar with Scala
● Your managers need more information to judge whether it is good enough
● Even if you are a tech lead:
● Need some confidence in using Scala in production
● How can we establish such confidence in using Scala?
How do we introduce Scala? How do we gain the confidence that it is fine to use Scala?
12. Start With A Small Investment in Scala
● Guidelines
● Think how you can save your time with Scala
● If you can save 1 minute a day, you can spend 6 hours on this improvement and still break even within a year
■ Save 1 minute / day = 365 minutes / year = 6 hour investment
■ Save 10 minutes / week = 520 minutes / year = 8.6 hour investment
■ Save 1 hour / week = 52 hours / year = 2.2 day investment
● Time is your most valuable asset
● Save your time by using Scala
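The break-even figures above are the time saved per occurrence multiplied by the number of occurrences in a year. A minimal Scala sketch of that arithmetic follows; the object and method names are hypothetical, made up for this illustration.

```scala
object TimeInvestment {
  /** Total minutes saved over one year. */
  def minutesPerYear(minutesSaved: Int, occurrencesPerYear: Int): Int =
    minutesSaved * occurrencesPerYear

  /** Minutes expressed as hours: the budget you can invest and still break even in a year. */
  def toHours(minutes: Int): Double = minutes / 60.0
}
```

For example, `TimeInvestment.minutesPerYear(10, 52)` gives the 520 minutes of the "10 minutes / week" case, which `toHours` turns into roughly 8.6 hours.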
Start making "small investments" that save time by using Scala
13. The First Scala Code in TD
● prestop (presto + top)
● Non-production service code
● A handy query monitoring tool for Presto, written in Scala
● Displays complex JSON data with fancy ANSI colors
The first Scala program at Treasure Data
14. airframe-log
● Scala 2.10: My small investment to test Scala Macros and String interpolation
● A Modern Logging Library for Scala (at Medium)
● ANSI color and source code location display
● Just add LogSupport trait to your class
Make program development more efficient with log messages
15. airframe-launcher
● Needed to handle complex command line options and nested commands
● e.g., $ prestop -e production monitor (other options …)
● Enabled annotation-based command line definitions
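To make the problem concrete, here is a minimal hand-written parser for a command line shaped like `prestop -e production monitor (other options …)`. It is only a sketch of the global-option plus nested-command pattern: it does not use airframe-launcher or its annotations, and `MiniLauncher`, `Invocation`, and the `"development"` default are assumptions made for this example.

```scala
object MiniLauncher {
  /** A parsed command line: the selected environment, the sub-command, and its remaining arguments. */
  final case class Invocation(env: String, command: Option[String], args: List[String])

  /** Parse `[-e <env>] <command> [args...]`; `-e` is only recognized before the command. */
  @annotation.tailrec
  def parse(args: List[String], env: String = "development"): Invocation = args match {
    case "-e" :: Nil           => throw new IllegalArgumentException("-e requires a value")
    case "-e" :: value :: tail => parse(tail, value)
    case command :: tail       => Invocation(env, Some(command), tail)
    case Nil                   => Invocation(env, None, Nil)
  }
}
```

Everything after the sub-command is passed through untouched, so each sub-command can define its own options.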
Make it easy to build complex command-line programs
16. airframe-config: Application Configuration Flow
● YAML config (embedded into Docker)
● Override credentials, then bind to config objects
YAML:
development:
  addr: api-dev.com
production:
  addr: api.com
Config Object:
case class ServerConfig(
  addr: String,
  port: Int = 8080,
  password: String
)
[Diagram: YAML -> select an environment (command-line option: -e production), which yields production: addr: api.com -> merge credentials and local configurations -> object mapping with default parameters (e.g., port = 8080) -> immutable config object]
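The flow on this slide (pick an environment, override it with credentials, fall back to default parameters, bind to an immutable object) can be sketched without any library. The snippet below is an illustration under assumptions: a nested `Map` stands in for the parsed YAML, and `ConfigFlow.load` is a hypothetical helper, not the airframe-config API.

```scala
object ConfigFlow {
  // The config class from the slide; `port` has a default value.
  final case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Stand-in for the parsed YAML file: environment -> (key -> value).
  val yaml: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  /** Select `env`, merge `overrides` on top (overrides win), then bind to ServerConfig. */
  def load(env: String, overrides: Map[String, String]): ServerConfig = {
    val merged = yaml.getOrElse(env, Map.empty[String, String]) ++ overrides
    // Missing mandatory keys fail fast with NoSuchElementException.
    val withDefaults = ServerConfig(addr = merged("addr"), password = merged("password"))
    // `port` is replaced only when the merged configuration provides one.
    merged.get("port").fold(withDefaults)(p => withDefaults.copy(port = p.toInt))
  }
}
```

Keeping credentials out of the YAML file and merging them at load time means the same file can be embedded in a Docker image for every environment.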
Turning the application configuration flow into a library
17. sbt-pack plugin
● An sbt plugin to create standalone Scala packages
● A single folder package with bin and lib folders containing all dependent JARs
● Generates command-line launcher scripts
● My small investment in 2012 to save packaging time
Package programs with sbt-pack and easily build Docker images
[Diagram: airframe-launcher + airframe-config + YAML config file -> sbt-pack -> standalone Scala package -> Dockerfile]
18. Medium-Size Investment: Find A Common Pattern
● Extract a common problem pattern and create a solution
● Data -> Object Mapping
● How many data readers and object mappers do we need?
● How can we save time when handling such a variety of data types?
[Diagram: YAML -> YAML parser + object mapper -> config object; JDBC ResultSet -> object-relation mapper -> table object; JSON -> JSON parser + object mapper -> object]
There are many cases where input data must be mapped to Scala objects, which calls for a medium-term investment
19. airframe-msgpack: MessagePack as Universal Data Format
● MessagePack (msgpack.org)
● Compact JSON-like binary format
● Describes data types and data values at the same time (self-describing)
[Diagram: YAML, JSON, and JDBC ResultSet are packed into MessagePack; MessagePack is unpacked into objects, and objects are packed back into MessagePack]
With MessagePack as the intermediate format, only one object mapper implementation is needed
20. PlazmaDB: MessagePack DBMS
● Fluentd -> MessagePack -> Arm Treasure Data
● Automatically generates a table schema from MessagePack data
● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc.
[Diagram: Data collection -> schema-free data -> update schema -> table schema -> generate reader set (Int column reader, String column reader) -> table reader -> distributed data processing]
Arm Treasure Data is a MessagePack-based schema-on-read system
21. Schema-On-Read Data Processing with MessagePack
● Users can store data of arbitrary types (no table design is required)
● Data can be read in a target type required by the application (e.g., SQL query)
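[Diagram: IntCodec reading MessagePack values (Int, Float, Boolean, String, Array, Map, Binary) as SQL BigInt: Int -> as is; Float -> toInt; Boolean -> 0 or 1; String -> parseInt; other types -> error or null. Example: "100" (string), coming from logs, CSV, or command-line arguments, is packed and then unpacked as 100 (int).]
At read time, data is converted to the type the application requires (schema-on-read)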
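The conversion rules of schema-on-read reading can be sketched as a single pattern match. This is an illustration only: the `Value` hierarchy below is a small stand-in for MessagePack's self-describing values, not the real msgpack or airframe-codec types, and `None` plays the role of "error or null".

```scala
object SchemaOnRead {
  // Hypothetical stand-in for self-describing MessagePack values.
  sealed trait Value
  final case class IntValue(v: Long) extends Value
  final case class FloatValue(v: Double) extends Value
  final case class BooleanValue(v: Boolean) extends Value
  final case class StringValue(v: String) extends Value
  final case class ArrayValue(v: Seq[Value]) extends Value

  /** Read any stored value as a 64-bit integer, whatever type it was written with. */
  def readLong(value: Value): Option[Long] = value match {
    case IntValue(v)     => Some(v)                               // already an integer
    case FloatValue(v)   => Some(v.toLong)                        // truncates the fraction
    case BooleanValue(v) => Some(if (v) 1L else 0L)               // 0 or 1
    case StringValue(v)  => scala.util.Try(v.trim.toLong).toOption // parse, or give up
    case ArrayValue(_)   => None                                  // no sensible integer reading
  }
}
```

The writer never declares a schema; the reader decides the target type, which is why a string such as "100" from a log line can still feed an integer column.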
22. airframe-codec: Schema-On-Read Pack/Unpack Interface
● Apply schema-on-read for Scala objects
[Diagram: Input <-> MessagePack <-> Output, converted by pack and unpack in both directions]
Apply the schema-on-read data conversion interface through MessagePack to Scala
23. Pre-defined Codecs in airframe-codec
● Primitive Codecs
● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
● FloatCodec, DoubleCodec
● StringCodec
● BooleanCodec
● TimeStampCodec
● Collection Codec
● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc.
● OptionCodec
● JsonCodec (airframe-json)
● Java-specific Codec
● FileCodec, ZonedDateTimeCodec, JDBCResultSetCodec, etc.
● Adding Custom Codecs
● Implement the MessageCodec[X] interface
Supports mapping to almost all data types needed in Scala
24. MessageCodec.of[A]: Combination of Codecs
[Diagram: MessageCodec.of[A] combines IntCodec, StringCodec, and DoubleCodec to pack A into MessagePack and to unpack it back]
Codecs are composed according to the type of the object
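The idea of building an object codec out of per-field codecs can be shown with a hand-wired sketch. According to the slides, MessageCodec.of[A] derives this combination from the type; the code below does the wiring manually, uses a `String` as a stand-in for a packed MessagePack value, and all names (`FieldCodec`, `Reading`, and so on) are hypothetical.

```scala
object CodecComposition {
  // Hypothetical field codec; a String stands in for one packed MessagePack value.
  trait FieldCodec[A] {
    def pack(a: A): String
    def unpack(s: String): Option[A]
  }

  val IntField: FieldCodec[Int] = new FieldCodec[Int] {
    def pack(a: Int): String = a.toString
    def unpack(s: String): Option[Int] = scala.util.Try(s.toInt).toOption
  }
  val StringField: FieldCodec[String] = new FieldCodec[String] {
    def pack(a: String): String = a
    def unpack(s: String): Option[String] = Some(s)
  }
  val DoubleField: FieldCodec[Double] = new FieldCodec[Double] {
    def pack(a: Double): String = a.toString
    def unpack(s: String): Option[Double] = scala.util.Try(s.toDouble).toOption
  }

  final case class Reading(id: Int, name: String, value: Double)

  // The object codec is one field codec per constructor parameter, applied in order.
  def pack(r: Reading): Seq[String] =
    Seq(IntField.pack(r.id), StringField.pack(r.name), DoubleField.pack(r.value))

  def unpack(fields: Seq[String]): Option[Reading] = fields match {
    case Seq(id, name, value) =>
      for {
        i <- IntField.unpack(id)
        n <- StringField.unpack(name)
        v <- DoubleField.unpack(value)
      } yield Reading(i, n, v)
    case _ => None // wrong number of fields
  }
}
```

Because each field codec decides how to read its own value, a failure in any field makes the whole `unpack` return `None` instead of producing a half-filled object.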
25. airframe-surface
● Reading Type Signatures From ScalaSig
● The Scala compiler embeds Scala Type Signatures (ScalaSig) into class files
● Surface.of[A]
■ returns A’s parameter names and types
[Diagram: for class A(data: List[B]), the javac output keeps only data: List[java.lang.Object] because type erasure removes generic type information; the scalac output has the same erased field plus ScalaSig (data: List[B]); Surface.of[A] returns data: List[B] (related: scala.reflect.runtime.universe.TypeTag)]
Obtain the type information of objects from ScalaSig
26. [WIP] Scala.js RPC
● Scala.js
● Compiling Scala code into JavaScript for Web Browsers
● airframe-codec: Passing model class data between Scala and Scala.js
[Diagram: UserInfo (Scala, server side) <-> pack/unpack <-> MessagePack <-> pack/unpack <-> UserInfo (Scala.js, client side), exchanged over XML RPC]
airframe-codec can also be used to exchange data with Scala.js (the browser side)
27. [WIP] airframe-sql
● Universal stream SQL engine
● Processing various types of data through MessagePack
[Diagram: YAML, JSON, JDBC ResultSet -> pack -> MessagePack stream -> SQL query processing (filter, aggregation, join, etc.) -> MessagePack]
Process any data format with SQL through MessagePack
28. Scala In Production
29. Technical Debt In TD (2015-2016)
● Prestogres: PostgreSQL gateway to Presto
● Enabled using PostgreSQL JDBC/ODBC drivers to access Presto
● The so-called magic of Sada (founder)
● Was good for the first use cases
● Many Problems:
● Hacks around pgpool-II were hard to debug
● Hard to support customers upon errors
● SQL incompatible with Presto
● Nobody could fix these issues
■ including the creator!
A hack called Prestogres had become technical debt
30. Replacing Prestogres with Prestobase
● Prototyped in Scala within a week after a quick chat with Sada
● Utilizing Airframe assets
● Deployed as a production service in 3 months
A Prestobase prototype was written in Scala and released as a service three months later
31. airframe-di
● Created a dependency injection library for Scala
● For Prestobase development
● Scala-friendly Syntax
● Useful for combining hundreds of modules
● based on airframe-surface, airframe-log
● See also:
● Airframe Meetup #1 Report (2018)
Airframe DI for Scala was born during the development of Prestobase
32. Airframe OSS
● Lightweight Building Blocks for Scala
● Collection of our investments in Scala
● Repackaged into wvlet.airframe in 2016
● airframe-log
● airframe-launcher
● airframe-config
● airframe-surface
● airframe-di
● airframe-codec
● ...
● As of 2019, Airframe has 20 modules
● 35+ releases in 2018
● Already had 17+ releases in 2019
● Contributing to the Scala Community Build
● To test the latest Scala versions
In 2016, the various tools were integrated as Airframe: 20 modules and a frequent release cycle
33. Monorepo
● Cross build
● For 3 + 1 Scala versions
■ 2.13, 2.12, 2.11, and Scala.js
● 20 modules
■ 4 x 20 = 80 artifacts!
● Challenge
● Publishing took 3 hours with sbt-release
● Bottleneck
● Sequential run of compile -> test -> publish for all artifacts
Airframe uses a single repository to consolidate maintenance
34. Release Automation on Travis CI
● Single-Step Release
● Triggered by git tag
● Running Tasks In Parallel
● Run tests for each Scala version
● Update doc & release notes
■ Generate release notes from git logs
● Publish
■ sbt-pgp & sbt-sonatype
○ GPG signature
○ Copy to Maven Central
● Finishes in about 10 to 20 minutes
● Blog: 3 Tips For Maintaining Scala Projects
Fully automate releases on Travis CI to enable frequent releases
35. sbt-sonatype plugin
● An sbt plugin for releasing projects to Maven Central
● open staging repository -> verify -> close -> promote -> drop
● A small investment
● Built during the 2015 New Year holiday => paid off by saving Airframe release time
● 3000+ Scala projects are using sbt-sonatype
sbt-sonatype was created during a New Year holiday and is now used by many Scala libraries
36. airframe-http
● Created a simple HTTP framework
● Based on Airframe modules:
■ airframe-surface
■ airframe-codec
■ airframe-msgpack
■ etc.
● Blog
● Building Low-Friction Web Service Over Finagle
● Save the time spent choosing a web framework:
● Many frameworks exist:
● e.g., Finatra, Finch, akka-http, spring, RESTeasy, open-api, swagger, etc.
Leveraging Airframe assets, even a web framework can be built with little effort
37. airframe-http-client
● Error handling of HTTP requests is difficult
● 4xx, 5xx status code
● Should we retry the request?
■ IOException, EOFException
■ TimeoutException
■ InterruptedException
■ SSLException
■ InvocationTargetException
● HTTP client
● request retries
● response mapping
■ JSON, MessagePack format
● airframe-codec
Turn error-prone HTTP request error handling into a library
38. airframe-control
● Everything can fail …
● Network disconnection
● Server crash
● ...
● Retry
● Exponential backoff
■ 2x, 4x, ...
● Jittering
■ 1 sec., 2 * rand, 4 * rand, …
● Customize error type classifiers
● retryable failures
● non-retryable failures
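The retry policy listed above (exponential backoff, jitter, and a classifier that separates retryable from non-retryable failures) can be sketched in plain Scala. This is not the airframe-control API: `RetrySketch`, `isRetryable`, and the 1-second base wait are assumptions for illustration, and unlike the slide's sequence this sketch applies jitter to every wait, including the first.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Upper bound of the wait before retry number `attempt` (0-based): base, 2x, 4x, ... */
  def backoffMillis(attempt: Int, baseMillis: Long = 1000L): Long =
    baseMillis << attempt

  /** Jittered wait: a fraction `rand` (0.0 to 1.0) of the exponential bound. */
  def jitteredMillis(attempt: Int, rand: Double, baseMillis: Long = 1000L): Long =
    (backoffMillis(attempt, baseMillis) * rand).toLong

  /** Hypothetical classifier: only I/O errors and timeouts are worth retrying. */
  def isRetryable(e: Throwable): Boolean = e match {
    case _: java.io.IOException                   => true
    case _: java.util.concurrent.TimeoutException => true
    case _                                        => false
  }

  /** Run `body`, retrying retryable failures up to `maxRetry` times; `sleep` is injectable for tests. */
  def retry[A](maxRetry: Int, sleep: Long => Unit = (ms: Long) => Thread.sleep(ms))(body: => A): A = {
    @annotation.tailrec
    def loop(attempt: Int): A = Try(body) match {
      case Success(v) => v
      case Failure(e) if attempt < maxRetry && isRetryable(e) =>
        sleep(jitteredMillis(attempt, scala.util.Random.nextDouble()))
        loop(attempt + 1)
      case Failure(e) => throw e // non-retryable, or retries exhausted
    }
    loop(0)
  }
}
```

Keeping the classifier separate from the loop is the point of "customize error type classifiers": the same retry loop can serve an HTTP client, a JDBC reader, or any other caller with its own notion of a transient failure.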
Turn retry handling into a reusable pattern
39. airframe-http-recorder
● Testing against actual web services is time-consuming
● Record & Replay HTTP responses
● Reproducible results
● Runnable on small machines (e.g., Travis CI)
Record HTTP requests to make testing web services more efficient
[Diagram: Recording mode: HTTP request -> HTTP recorder -> real web service; the response is returned to the caller and saved as a recording. Replay mode: HTTP request -> HTTP recorder -> recorded response, without contacting the real web service.]
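The record-and-replay idea can be sketched with an in-memory map. This is an illustration under assumptions rather than the airframe-http-recorder API: `Request`, `Response`, and `Recorder` are hypothetical types, and a real recorder would persist the recordings so that a later test run can replay them.

```scala
import scala.collection.mutable

object RecorderSketch {
  final case class Request(method: String, path: String)
  final case class Response(status: Int, body: String)

  /** Forwards unseen requests to `realService` and remembers the responses. */
  final class Recorder(realService: Request => Response) {
    private val recorded = mutable.Map.empty[Request, Response]

    /** Recording mode: call the real service once per distinct request. */
    def send(request: Request): Response =
      recorded.getOrElseUpdate(request, realService(request))

    /** Replay mode: answer only from recordings, never touching the real service. */
    def replay(request: Request): Option[Response] = recorded.get(request)
  }
}
```

Replayed responses are identical on every run, which is what makes the tests reproducible and cheap enough for small CI machines.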
40. Data Analysis with Scala
41. Data-Driven System Optimization
● TD is one of the biggest users of TD
● Query logs
● Collecting all Presto query logs since 2015
● Query statements, performance statistics, logs, etc.
● Logs are our valuable assets
● To understand user activities and enable data-driven optimizations
[Diagram: User queries -> collect query logs -> analyze query logs (machine learning, query optimization) -> optimize system]
Collecting and analyzing logs is important for optimizing the system
42. airframe-fluentd
● Collect Scala Application Logs To Fluentd
● Scala Objects -> MessagePack -> Fluentd
Fluentd accepts MessagePack, so the output of airframe-codec can be passed to it directly
[Diagram: Scala objects -> airframe-codec -> airframe-fluentd -> collect query logs -> analyze query logs (machine learning, query optimization) -> optimize system]
43. airframe-jmx
● Add @JMX annotation to your application metrics
● It’s also useful to check the application version, configurations, etc.
● JMX clients can check these metrics
● e.g., jconsole
With JMX, check the application state from outside the JVM and collect metrics
44. airframe-metrics
● Human Readable Data Format (ElapsedTime, DataSize, etc.)
● Handy Time Window String Support
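As a rough illustration of what "human readable" means here, the following sketch formats a byte count and an elapsed time. The unit names, thresholds, and two-decimal output are assumptions chosen for this example; they are not the formats produced by airframe-metrics.

```scala
import java.util.Locale

object HumanReadable {
  private val sizeUnits = Seq("B", "kB", "MB", "GB", "TB")

  /** Format a byte count with 1024-based units and two decimals. */
  def dataSize(bytes: Long): String = {
    @annotation.tailrec
    def loop(value: Double, unit: Int): String =
      if (value < 1024 || unit == sizeUnits.size - 1)
        "%.2f%s".formatLocal(Locale.ROOT, value, sizeUnits(unit))
      else loop(value / 1024, unit + 1)
    loop(bytes.toDouble, 0)
  }

  /** Format a duration in milliseconds as ms, seconds, or minutes. */
  def elapsed(millis: Long): String =
    if (millis < 1000) s"${millis}ms"
    else if (millis < 60000) "%.2fs".formatLocal(Locale.ROOT, millis / 1000.0)
    else "%.2fm".formatLocal(Locale.ROOT, millis / 60000.0)
}
```

Formatting with `Locale.ROOT` keeps the decimal separator stable, so log lines stay comparable across machines with different locale settings.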
Put time spans, intervals, and data sizes into human-friendly formats to make log analysis more efficient
45. Taking Snapshots of Data Analysis Tasks
● Save Long-Running Task Results As MessagePack (binary)
● Save the cost of re-computation
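[Diagram: First run: run the task (compute, e.g., 10 min) -> Result: Seq[A] -> pack -> MessagePack -> save to storage as a snapshot. Second run: load from storage -> MessagePack -> unpack -> Result: Seq[A]]
Leverage Airframe assets to cache data analysis results and work more efficiently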
46. Airframe
[Diagram: overview of the Airframe modules (launcher, log, config, codec, surface, json, tablet, jdbc, fluentd, jmx, http, http-finagle, DI) and sbt-pack, as on slide 6]
Code assets have been built up around Airframe
47. Resolving Technical Debt with Airframe Upgrades
● Migrate common programming patterns into Airframe
● Upgrade Airframe Version
● YY.MM.patch versioning: 19.5.x, 19.6.x, …
■ Easy to see how far a project is behind the latest version
● Reduce code and logic duplications across components
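With YY.MM.patch versions, "how far behind" reduces to simple month arithmetic. A minimal sketch, assuming versions are always three dot-separated integers; `CalVer` and its methods are hypothetical helpers, not part of Airframe.

```scala
object CalVer {
  final case class Version(year: Int, month: Int, patch: Int)

  /** Parse "YY.MM.patch", e.g. "19.6.0"; anything else yields None. */
  def parse(s: String): Option[Version] = s.split('.') match {
    case Array(y, m, p) => scala.util.Try(Version(y.toInt, m.toInt, p.toInt)).toOption
    case _              => None
  }

  /** Number of months between two releases, ignoring the patch number. */
  def monthsBehind(current: Version, latest: Version): Int =
    (latest.year - current.year) * 12 + (latest.month - current.month)
}
```

A project pinned to 18.11.x while the latest release is 19.6.x is seven months behind, which is visible from the version strings alone.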
[Diagram: Programming (knowledge, experiences, design decisions) -> OSS (Airframe) -> Outcome (products, 24/7 services, business values)]
Resolve technical debt as part of upgrading Airframe
48. Scala At Arm Treasure Data
● Scala is now an official language at Arm Treasure Data
● 0 -> 10+ engineers who can write Scala
● Use cases are growing:
● Query optimization, API, Spark, data analysis, storage systems, service operation, etc.
● We are happy to share our Scala assets through Airframe!
Add Your GitHub Star! wvlet/airframe
Arm Treasure Data now has a solid base of Scala engineers, and the range of Scala use cases keeps growing
49. Presto Conference Tokyo 2019
● July 11 (Thu), 2019, 13:30 ~ (Free)
● https://techplay.jp/event/733772
● Inviting Presto Creators (Martin, Dain, David)
● Presto Software Foundation
● Talks from big Presto users in Japan
● Yahoo! JAPAN, LINE, Arm Treasure Data
● Presto Source Code Navigation
Presto Conference Tokyo 2019 will be held on July 11 (Thu) from 13:30 (free admission)