
How To Use Scala At Work - Airframe In Action at Arm Treasure Data

ScalaMatsuri 2019 presentation. June 29, 2019

Published in: Technology

  1. How To Use Scala At Work: Airframe In Action At Arm Treasure Data
     Taro L. Saito, Ph.D., Arm Treasure Data
     June 29, 2019, Scala Matsuri 2019, Tokyo
     Caption: Let's use Scala at work: Airframe use cases at Arm Treasure Data

  2. About Me: Taro L. Saito (Leo)
     ● Principal Software Engineer at Arm Treasure Data
     ● Building a distributed query engine service
     ● Living in the US for 4 years
     ● DBMS & Data Science Background
     ● Ph.D. in Computer Science
     ● Database Systems and Genome Sciences Research
     ● Assistant Professor at the University of Tokyo
     ● OSS Projects Around Scala
     ● sbt-sonatype: used for releasing 3000+ Scala projects
     ● snappy-java: a compression library used in Spark, Parquet, etc.
     Caption: Self-introduction
     Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.

  3. New Release from O’Reilly Japan
     ● Helped with the Japanese translation of Designing Data-Intensive Applications
     ● Techniques and concepts around distributed data processing systems
     ● Available at Amazon.co.jp and the O’Reilly Japan web site
     ● Will be published on July 18, 2019
     Caption: The translation of the definitive introduction to distributed data systems goes on sale next month

  4. Arm Treasure Data
     ● 400+ customers
     ● Founded in 2011
     ● Raised $54M
     ● Security
     ● Acquired by Arm / Softbank in 2018
     Caption: Overview of Arm Treasure Data

  5. The Architecture of Arm Treasure Data
     Diagram: Logs, Device Data, and Batch Data enter through Data Collection (2 million records / sec.), are stored in Cloud Storage (PlazmaDB with Table Schema, 130 trillion records), and are queried by Distributed Data Processing (1 billion rows processed / sec.). Job Management (SQL Editor, Scheduler, Workflows, Machine Learning) submits Jobs. Legend: Treasure Data OSS, Third Party OSS.
     Caption: Treasure Data's system architecture. Where is Scala?

  6. Airframe
     Diagram (module map) labels: Module Mix-In (airframe DI); Packaging (sbt-pack); Launch HTTP Services (airframe-http, airframe-http-finagle); HTTP Requests and Responses; airframe-launcher (> _); airframe-log (Debug Logs); airframe-config (production: port: 10010, user: xxxx, ...); airframe-codec (Schema-On-Read Mapping); airframe-surface (Generate Mapping Codec); airframe-json (JSON); airframe-tablet (Table Data: CSV, TSV); airframe-jdbc (JDBC ResultSets); airframe-fluentd (Metrics & Log Data); airframe-jmx (Monitor Runtime States); Scala Objects; Data.
     Caption: The Airframe modules (written in Scala) used behind the service

  7. Our OSS Strategy Around Scala
     ● Gather the best practices of Scala into Airframe OSS
     ● Gain real experience by operating 24/7 services
     Diagram (cycle) labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, OSS (Airframe), Outcome.
     Caption: OSS strategy around Scala, centered on Airframe
  8. 3 Years Ago...
     ● Various internal and third-party Scala/Java libraries
     ● Managed in different repositories with different release cycles
     ● High learning cost
       ■ The knowledge was confined to engineers’ brains
     Diagram (cycle) labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, Various Libraries (logger, launcher, object mapper, JDBC reader, json4s, jackson, ...), Outcome.
     Caption: Three years ago Airframe did not exist, and a variety of libraries were mixed together
  9. 5 Years Ago...
     ● No Scala engineers in the company
     ● Scala in 2014: Scala 2.9.x
     ● Was not good enough to use:
       ■ e.g., no string interpolation like s"... ${x} ..."
     Diagram (cycle) labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, Ruby, Java, Outcome.
     Caption: Five years ago there were no Scala engineers and no Scala code

  10. Today’s Agenda
      ● How to introduce Scala to your company
      ● Learn the best practices of using Scala at work
      ● From 20 Airframe modules
      Caption: What this talk covers
  11. How Can We Introduce Scala?
      ● Saying “I want to use Scala”
      ● It will not work, especially if you or your team are not familiar with Scala
      ● Your managers need more information to judge whether it is good enough
      ● Even if you are a tech lead:
      ● You need some confidence in using Scala in production
      ● How can we establish such confidence in using Scala?
      Caption: How do we introduce Scala? How do we gain the confidence that it is OK to use Scala?

  12. Start With A Small Investment in Scala
      ● Guidelines
      ● Think about how you can save your time with Scala
      ● If you can save 1 minute a day, you can spend 6 hours on this improvement
        ■ Save 1 minute / day = 365 minutes / year = about 6 hours of investment
        ■ Save 10 minutes / week = 520 minutes / year = about 8.6 hours of investment
        ■ Save 1 hour / week = 52 hours / year = about 2.2 days of investment
      ● Time is your most valuable asset
      ● Save your time by using Scala
      Caption: Start a “small investment” to save time “by using Scala”
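
The break-even arithmetic on this slide can be checked with a few lines of Scala. This is only a sketch of the calculation; the object and method names are made up for illustration and do not come from the talk.

```scala
// Hypothetical helper for the "small investment" arithmetic (names are illustrative).
object TimeInvestment {
  // Minutes saved per year, assuming 365 days or 52 weeks in a year.
  def yearlyMinutesFromDaily(minutesPerDay: Double): Double   = minutesPerDay * 365
  def yearlyMinutesFromWeekly(minutesPerWeek: Double): Double = minutesPerWeek * 52

  // Unit conversions for the yearly totals.
  def toHours(minutes: Double): Double = minutes / 60
  def toDays(minutes: Double): Double  = minutes / 60 / 24
}
```

For example, `TimeInvestment.toHours(TimeInvestment.yearlyMinutesFromDaily(1))` is a little over 6, which matches the 6-hour budget on the slide.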

  13. The First Scala Code in TD
      ● prestop (presto + top)
      ● Non-production service code
      ● A handy query monitoring tool for Presto, written in Scala
      ● Displays complex JSON data with fancy ANSI colors
      Caption: The first Scala program at Treasure Data

  14. airframe-log
      ● Scala 2.10: My small investment to test Scala Macros and String interpolation
      ● A Modern Logging Library for Scala (at Medium)
      ● ANSI color and source code location display
      ● Just add the LogSupport trait to your class
      Caption: Make program development more efficient with log messages

  15. airframe-launcher
      ● Needed to handle complex command line options and nested commands
      ● e.g., $ prestop -e production monitor (other options …)
      ● Enabled annotation-based command line definitions
      Caption: Make complex command-line programs easy to build

  16. airframe-config: Application Configuration Flow
      ● YAML config (embedded into Docker)
      ● Override credentials, then bind to config objects
      Diagram:
        YAML: development: addr: api-dev.com; production: addr: api.com
        Command (airframe-launcher): -e production selects production: addr: api.com
        Merge: Credentials and Local Configurations
        Object Mapping to an Immutable Object, with Default Parameters (e.g., port = 8080)
        Config Object: case class ServerConfig(addr: String, port: Int = 8080, password: String)
      Caption: Turning the application configuration flow into a library
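
The flow on this slide (pick an environment, merge overrides, bind to an immutable object with defaults) can be sketched in plain Scala. This is not the airframe-config API: `ConfigFlow`, `load`, and the in-memory `yaml` map are assumptions that stand in for a parsed YAML file. Only `ServerConfig` comes from the slide.

```scala
// Config object from the slide; port falls back to its default of 8080.
final case class ServerConfig(addr: String, port: Int = 8080, password: String)

// Hypothetical sketch of the configuration flow (not the airframe-config API).
object ConfigFlow {
  // Stand-in for the parsed YAML file: environment -> (key -> value).
  val yaml: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  // Select the environment (the "-e production" step), let overrides win,
  // then bind the merged values to an immutable ServerConfig.
  def load(env: String, overrides: Map[String, String]): ServerConfig = {
    val base   = yaml.getOrElse(env, sys.error(s"unknown environment: $env"))
    val merged = base ++ overrides
    def required(key: String): String =
      merged.getOrElse(key, sys.error(s"missing config: $key"))
    val config = ServerConfig(addr = required("addr"), password = required("password"))
    // Apply the port only when it was configured; otherwise keep the default.
    merged.get("port").fold(config)(p => config.copy(port = p.toInt))
  }
}
```

Calling `ConfigFlow.load("production", Map("password" -> "secret"))` yields a `ServerConfig` for `api.com` that keeps the default port, because neither the YAML stand-in nor the overrides set one.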
  17. sbt-pack plugin
      ● An sbt plugin to create standalone Scala packages
      ● A single-folder package with bin and lib folders containing all dependent JARs
      ● Generates command-line launcher scripts
      ● My small investment in 2012 to save packaging time
      Diagram labels: airframe-launcher, airframe-config, YAML config file, Standalone Scala Package, sbt-pack, Dockerfile.
      Caption: Package programs with sbt-pack and easily create Docker images
  18. Medium-Size Investment: Find A Common Pattern
      ● Extract a common problem pattern and create a solution
      ● Data -> Object Mapping
      ● How many data readers and object mappers do we need?
      ● How can we save time when handling such varied data types?
      Diagram:
        YAML -> YAML Parser + Object Mapper -> Config Object
        JDBC ResultSet -> Object-Relation Mapper -> Table Object
        JSON -> JSON Parser + Object Mapper -> Object
      Caption: Mapping input data to Scala objects is a common need; it calls for a medium-term investment

  19. airframe-msgpack: MessagePack as Universal Data Format
      ● MessagePack (msgpack.org)
      ● Compact JSON-like binary format
      ● Describes data types and data values at the same time (self-describing)
      Diagram: JDBC ResultSet, YAML, and JSON are packed into MessagePack; Objects are packed to and unpacked from MessagePack.
      Caption: With MessagePack as the intermediate format, only one object mapper implementation is needed
  20. PlazmaDB: MessagePack DBMS
      ● Fluentd -> MessagePack -> Arm Treasure Data
      ● Automatically generates the table schema from MessagePack data
      ● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc.
      Diagram labels: Data Collection, Schema-free Data, Update Schema, Table Schema, Generate Reader Set, Table Reader (Int Column Reader, String Column Reader), Distributed Data Processing.
      Caption: Arm Treasure Data is a MessagePack-based schema-on-read system

  21. Schema-On-Read Data Processing with MessagePack
      ● Users can store arbitrarily typed data (no table design is required)
      ● Data can be read in the target type required by the application (e.g., SQL query)
      Diagram: Logs, CSV, and command-line arguments are packed as MessagePack values (Int, Float, Boolean, String, Array, Map, Binary). IntCodec unpacks them for a SQL BigInt column: Float via toInt, Boolean as 0 or 1, String via parseInt, anything else as Error or null. Example: “100” (string) and 100 (int) are both read as 100 (int).
      Caption: At read time, data is converted to the type the application requires (schema-on-read)
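
The conversion rules in the diagram can be sketched as a small Scala function. This is an illustration only: the `Value` type and `SchemaOnRead.readInt` are hypothetical stand-ins and not the actual MessagePack or airframe-codec types, and `None` plays the role of "Error or null".

```scala
// Hypothetical stand-in for self-describing stored values (not the MessagePack API).
sealed trait Value
final case class IntValue(v: Int)       extends Value
final case class FloatValue(v: Double)  extends Value
final case class BoolValue(v: Boolean)  extends Value
final case class StringValue(v: String) extends Value
case object NilValue                    extends Value

object SchemaOnRead {
  // Read any stored value as an Int, following the rules in the slide's diagram.
  def readInt(value: Value): Option[Int] = value match {
    case IntValue(v)    => Some(v)
    case FloatValue(v)  => Some(v.toInt)                         // toInt truncates
    case BoolValue(b)   => Some(if (b) 1 else 0)                 // 0 or 1
    case StringValue(s) => scala.util.Try(s.trim.toInt).toOption // parseInt
    case NilValue       => None                                  // Error or null
  }
}
```

With this sketch, `SchemaOnRead.readInt(StringValue("100"))` and `SchemaOnRead.readInt(IntValue(100))` both give `Some(100)`, which is the point of schema-on-read: the stored type does not have to match the type the reader asks for.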
  22. airframe-codec: Schema-On-Read Pack/Unpack Interface
      ● Apply schema-on-read to Scala objects
      Diagram: Input -> Pack -> MessagePack -> Unpack -> Output
      Caption: Applying the MessagePack-based schema-on-read conversion interface to Scala

  23. Pre-defined Codecs in airframe-codec
      ● Primitive Codecs
      ● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
      ● FloatCodec, DoubleCodec
      ● StringCodec
      ● BooleanCodec
      ● TimeStampCodec
      ● Collection Codec
      ● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc.
      ● OptionCodec
      ● JsonCodec (airframe-json)
      ● Java-specific Codec
      ● FileCodec, ZonedDateTimeCodec, JDBCResultSetCodec, etc.
      ● Adding Custom Codecs
      ● Implement the MessageCodec[X] interface
      Caption: Supports mapping to almost every data type needed in Scala

  24. MessageCodec.of[A]: Combination of Codecs
      Diagram: MessageCodec.of[A] combines IntCodec, StringCodec, and DoubleCodec to pack to and unpack from MessagePack.
      Caption: Codecs are composed to match the object's type

  25. airframe-surface
      ● Reading Type Signatures From ScalaSig
      ● The Scala compiler embeds Scala Type Signatures (ScalaSig) in class files
      ● Surface.of[A]
        ■ returns A’s parameter names and types
      Diagram: for class A(data: List[B]), javac produces class A with data: List[java.lang.Object]; scalac produces the same class A plus ScalaSig: data: List[B]. Type erasure removes generic type information. Surface.of[A] recovers data: List[B] (scala.reflect.runtime.universe.TypeTag).
      Caption: Obtain object type information from ScalaSig
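
The type erasure that motivates airframe-surface is easy to observe on the JVM. The sketch below uses only the standard library; `ErasureDemo` is an illustrative name and the example does not use Surface or ScalaSig.

```scala
// Shows that element types are not visible in the runtime class of a generic value.
object ErasureDemo {
  // The static types differ (List[Int] vs List[String]) ...
  val ints: List[Int]       = List(1, 2, 3)
  val strings: List[String] = List("a", "b")

  // ... but the JVM class of the value carries no element type.
  def runtimeClassName(xs: List[_]): String = xs.getClass.getName

  val sameRuntimeClass: Boolean = runtimeClassName(ints) == runtimeClassName(strings)
}
```

`ErasureDemo.sameRuntimeClass` is `true`, so a mapper that only inspects runtime classes cannot tell which codec to use for the list elements. That missing information is what the slide says Surface.of[A] reads back from ScalaSig.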

  26. [WIP] Scala.js RPC
      ● Scala.js
      ● Compiling Scala code into JavaScript for Web Browsers
      ● airframe-codec: Passing model class data between Scala and Scala.js
      Diagram: UserInfo on the Scala server side is packed to MessagePack, sent over XML RPC, and unpacked to UserInfo on the Scala.js client side (and back).
      Caption: airframe-codec can also be used to exchange data with Scala.js (browser side)

  27. [WIP] airframe-sql
      ● Universal stream SQL engine
      ● Processing various types of data through MessagePack
      Diagram: JDBC ResultSet, YAML, and JSON are packed into a MessagePack stream; SQL query processing (Filter/Aggregation/Join, etc.) produces MessagePack output.
      Caption: Process any data format with SQL via MessagePack
  28. Scala In Production
  29. A Technical Debt In TD (2015-2016)
      ● Prestogres: PostgreSQL gateway to Presto
      ● Enabled using PostgreSQL JDBC/ODBC drivers to access Presto
      ● So-called Sada (founder)’s magic
      ● Was good for the first use cases
      ● Many Problems:
      ● Hacks around pgpool-II were hard to debug
      ● Hard to support customers when errors occurred
      ● SQL was incompatible with Presto
      ● Nobody could fix these issues
        ■ including the creator!
      Caption: A hack called Prestogres had become technical debt

  30. Replacing Prestogres with Prestobase
      ● Prototyped in Scala within a week after a quick chat with Sada
      ● Utilizing Airframe assets
      ● Deployed as a production service in 3 months
      Caption: Prototyped Prestobase in Scala; released as a service three months later
  31. airframe-di
      ● Created a dependency injection library for Scala
      ● For Prestobase development
      ● Scala-friendly Syntax
      ● Useful for combining hundreds of modules
      ● Based on airframe-surface, airframe-log
      ● See also:
      ● Airframe Meetup #1 Report (2018)
      Caption: Airframe DI for Scala was born during Prestobase development

  32. Airframe OSS
      ● Lightweight Building Blocks for Scala
      ● Collection of our investments in Scala
      ● Repackaged into wvlet.airframe in 2016
      ● airframe-log
      ● airframe-launcher
      ● airframe-config
      ● airframe-surface
      ● airframe-di
      ● airframe-codec
      ● ...
      ● As of 2019, Airframe has 20 modules
      ● 35+ releases in 2018
      ● Already had 17+ releases in 2019
      ● Contributing to the Scala Community Build
      ● To test the latest Scala versions
      Caption: In 2016 the various tools were consolidated into Airframe: 20 modules and a frequent release cycle
  33. Monorepo
      ● Cross build
      ● For 3 + 1 Scala versions
        ■ 2.13, 2.12, 2.11, and Scala.js
      ● 20 modules
        ■ 4 x 20 = 80 artifacts!
      ● Challenge
      ● Publishing took 3 hours with sbt-release
      ● Bottleneck
      ● Sequential run of compile -> test -> publish for all artifacts
      Caption: Airframe uses a single repository to centralize maintenance

  34. Release Automation on Travis CI
      ● Single-Step Release
      ● Triggered by git tag
      ● Running Tasks In Parallel
      ● Run tests for each Scala version
      ● Update doc & release notes
        ■ Generate release notes from git logs
      ● Publish
        ■ sbt-pgp & sbt-sonatype
          ○ GPG signature
          ○ Copy to Maven Central
      ● Finishes in around 10 to 20 minutes
      ● Blog: 3 Tips For Maintaining Scala Projects
      Caption: Fully automated releases on Travis CI make frequent releases possible

  35. sbt-sonatype plugin
      ● An sbt plugin for releasing projects to Maven Central
      ● open staging repository -> verify -> close -> promote -> drop
      ● A small investment
      ● Made during the 2015 New Year holiday; paid off by saving Airframe release time
      ● 3000+ Scala projects are using sbt-sonatype
      Caption: sbt-sonatype was created over a New Year holiday and is used by many Scala libraries

  36. airframe-http
      ● Created a simple HTTP framework
      ● Based on Airframe modules:
        ■ airframe-surface
        ■ airframe-codec
        ■ airframe-msgpack
        ■ etc.
      ● Blog
      ● Building Low-Friction Web Service Over Finagle
      ● Save the time spent choosing a web framework:
      ● Many frameworks exist:
      ● e.g., Finatra, Finch, akka-http, spring, RESTeasy, open-api, swagger, etc.
      Caption: Using Airframe assets, a web framework was also easy to build

  37. airframe-http-client
      ● Error handling of HTTP requests is difficult
      ● 4xx, 5xx status codes
      ● Should we retry the request?
        ■ IOException, EOFException
        ■ TimeoutException
        ■ InterruptedException
        ■ SSLException
        ■ InvocationTargetException
      ● HTTP client
      ● request retries
      ● response mapping
        ■ JSON, MessagePack format
      ● airframe-codec
      Caption: Error handling for HTTP requests, which is easy to get wrong, turned into a library

  38. airframe-control
      ● Everything can fail …
      ● Network disconnection
      ● Server crash
      ● ...
      ● Retry
      ● Exponential backoff
        ■ 2x, 4x, ...
      ● Jittering
        ■ 1 sec., 2 * rand, 4 * rand, …
      ● Customize error type classifiers
      ● retryable failures
      ● non-retryable failures
      Caption: Retry handling turned into a reusable pattern
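
A retry loop with exponential backoff, jitter, and an error classifier can be sketched with the standard library alone. This is not the airframe-control API: `RetrySketch`, `backoffMillis`, and `retry` are hypothetical names, and the jitter here scales every delay (including the first) by a random factor in [0, 1), which is one common variant and only approximates the schedule on the slide.

```scala
import scala.util.{Failure, Random, Success, Try}

// Hypothetical retry helper (illustrative only; not the airframe-control API).
object RetrySketch {
  // Delay before retry number `attempt` (0-based): base * 2^attempt,
  // optionally scaled by a random factor in [0, 1) as jitter.
  def backoffMillis(attempt: Int, baseMillis: Long, jitter: Boolean, rand: Random): Long = {
    val exponential = baseMillis * (1L << attempt)
    if (jitter) (exponential * rand.nextDouble()).toLong else exponential
  }

  // Runs `body`, retrying up to `maxRetries` times when `isRetryable` accepts the error.
  // `sleep` is a parameter so that callers (and tests) can avoid real waiting.
  def retry[A](
      maxRetries: Int,
      baseMillis: Long,
      isRetryable: Throwable => Boolean,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  )(body: => A): A = {
    val rand = new Random()
    def loop(attempt: Int): A = Try(body) match {
      case Success(value) => value
      case Failure(e) if attempt < maxRetries && isRetryable(e) =>
        sleep(backoffMillis(attempt, baseMillis, jitter = true, rand))
        loop(attempt + 1)
      case Failure(e) => throw e // non-retryable, or retries exhausted
    }
    loop(0)
  }
}
```

A classifier such as `e => e.isInstanceOf[java.io.IOException]` treats I/O failures as retryable and lets everything else fail immediately, which mirrors the retryable and non-retryable split on the slide.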

  39. airframe-http-recorder
      ● Testing against actual web services is time consuming
      ● Record & Replay HTTP responses
      ● Reproducible results
      ● Runnable on small machines (e.g., Travis CI)
      Diagram:
        Recording Mode: HTTP Request -> HTTP Recorder -> Request -> Real Web Service -> Response, while recording responses
        Replay Mode: HTTP Request -> HTTP Recorder -> Response served from the Recorded Responses
      Caption: Record HTTP requests to make web service testing more efficient
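
The two modes in the diagram can be sketched as a small in-memory recorder. This is not the airframe-http-recorder API: `HttpRecorderSketch` is a hypothetical class, requests and responses are plain strings, and the recordings live in memory rather than on disk.

```scala
import scala.collection.mutable

// Hypothetical record-and-replay wrapper (illustrative; not the airframe-http-recorder API).
final class HttpRecorderSketch(realService: String => String) {
  private val recorded = mutable.Map.empty[String, String]

  // Recording mode: forward the request to the real service and keep the response.
  def record(request: String): String = {
    val response = realService(request)
    recorded(request) = response
    response
  }

  // Replay mode: answer from the recordings without touching the real service.
  def replay(request: String): Option[String] = recorded.get(request)
}
```

In a test suite the recording step would run once against the real service, and later runs would call only `replay`, which is what makes the results reproducible on a small CI machine.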
  40. Data Analysis with Scala
  41. Data-Driven System Optimization
      ● TD is one of the biggest users of TD
      ● Query logs
      ● Collecting all Presto query logs since 2015
      ● Query statements, performance statistics, logs, etc.
      ● Logs are our valuable assets
      ● To understand user activities and enable data-driven optimizations
      Diagram (cycle): User Query -> Logs -> Collect Query Logs -> Analyze Query Logs -> Machine Learning / Query Optimization -> Optimize System
      Caption: Collecting and analyzing logs is essential for system optimization

  42. airframe-fluentd
      ● Collect Scala Application Logs To Fluentd
      ● Scala Objects -> MessagePack -> Fluentd
      Diagram: Scala Objects -> airframe-codec -> airframe-fluentd -> Collect Query Logs -> Analyze Query Logs -> Machine Learning / Query Optimization -> Optimize System
      Caption: Fluentd accepts MessagePack, so the output of airframe-codec can be passed to it
  43. airframe-jmx
      ● Add the @JMX annotation to your application metrics
      ● It’s also useful for checking the application version, configurations, etc.
      ● JMX clients can check these metrics
      ● e.g., jconsole
      Caption: With JMX, check application state and collect metrics from outside the JVM

  44. airframe-metrics
      ● Human Readable Data Format (ElapsedTime, DataSize, etc.)
      ● Handy Time Window String Support
      Caption: Time spans, time ranges, and data sizes in human-friendly formats make log analysis more efficient

  45. Taking Snapshots of Data Analysis Tasks
      ● Save Long-Running Task Results As MessagePack (binary)
      ● Save the cost of re-computation
      Diagram:
        First run: Task Run -> Compute (e.g., 10 min) -> Result: Seq[A] -> Pack -> MessagePack -> Save to Storage (Snapshot)
        Second run: Load from Storage -> Unpack -> Result: Seq[A]
      Caption: Using Airframe assets, cache data analysis results to work more efficiently
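
The snapshot idea (compute once, save, and load on later runs) can be sketched with a file-backed cache. To stay self-contained, this example stores a `Seq[String]` as newline-separated UTF-8 text instead of MessagePack; `Snapshot.cached` is a hypothetical helper, not an Airframe function.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical snapshot helper. A plain-text file stands in for the MessagePack
// snapshot described on the slide; entries must not contain newlines.
object Snapshot {
  def cached(path: Path)(compute: => Seq[String]): Seq[String] =
    if (Files.exists(path)) {
      // Second run: load the saved result instead of recomputing it.
      val text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      if (text.isEmpty) Seq.empty else text.split("\n", -1).toSeq
    } else {
      // First run: compute, then save the result for later runs.
      val result = compute
      Files.write(path, result.mkString("\n").getBytes(StandardCharsets.UTF_8))
      result
    }
}
```

Because `compute` is a by-name parameter, the expensive task body is evaluated only when no snapshot file exists yet.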

  46. Airframe
      Diagram (module map, same as slide 6) labels: Module Mix-In (airframe DI); Packaging (sbt-pack); Launch HTTP Services (airframe-http, airframe-http-finagle); HTTP Requests and Responses; airframe-launcher (> _); airframe-log (Debug Logs); airframe-config (production: port: 10010, user: xxxx, ...); airframe-codec (Schema-On-Read Mapping); airframe-surface (Generate Mapping Codec); airframe-json (JSON); airframe-tablet (Table Data: CSV, TSV); airframe-jdbc (JDBC ResultSets); airframe-fluentd (Metrics & Log Data); airframe-jmx (Monitor Runtime States); Scala Objects; Data.
      Caption: Code assets have formed around Airframe

  47. Resolving Technical Debts with Airframe Upgrade
      ● Migrate common programming patterns into Airframe
      ● Upgrade Airframe Version
      ● YY.MM.patch versioning: 19.5.x, 19.6.x, …
        ■ Easy to see how far a project is behind the latest version
      ● Reduce code and logic duplication across components
      Diagram (cycle) labels: Knowledge, Experiences, Design Decisions, Products, 24/7 Services, Business Values, Programming, OSS (Airframe), Outcome.
      Caption: Technical debt is resolved as Airframe is upgraded
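
With YY.MM.patch versions, the gap between two releases can be read off as a number of months. The sketch below illustrates that calculation; `CalVersion` and `VersionGap` are hypothetical names, and the version strings in the example are made up to fit the 19.5.x / 19.6.x pattern on the slide.

```scala
// Hypothetical model of a YY.MM.patch version string (illustrative names).
final case class CalVersion(year: Int, month: Int, patch: Int)

object VersionGap {
  // Parses strings such as "19.6.0"; anything else is rejected.
  def parse(s: String): CalVersion = s.split('.') match {
    case Array(y, m, p) => CalVersion(y.toInt, m.toInt, p.toInt)
    case _              => throw new IllegalArgumentException(s"expected YY.MM.patch: $s")
  }

  // Number of months the current version is behind the latest one.
  def monthsBehind(current: CalVersion, latest: CalVersion): Int =
    (latest.year - current.year) * 12 + (latest.month - current.month)
}
```

For instance, a project on a hypothetical 18.11.1 would be 7 months behind a hypothetical 19.6.0, which is visible from the version strings alone.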
  48. Scala At Arm Treasure Data
      ● Scala is now an official language at Arm Treasure Data
      ● 0 -> 10+ engineers who can write Scala
      ● Use cases are growing:
      ● Query optimization, API, Spark, data analysis, storage systems, service operation, etc.
      ● We are happy to share our Scala assets through Airframe!
      ● Add Your GitHub Star! wvlet/airframe
      Caption: Arm Treasure Data now has a solid base of Scala engineers, and the scope of Scala use keeps growing

  49. Presto Conference Tokyo 2019
      ● July 11 (Thu), 2019, 13:30 ~ (Free)
      ● https://techplay.jp/event/733772
      ● Inviting Presto Creators (Martin, Dain, David)
      ● Presto Software Foundation
      ● Talks from big Presto users in Japan
      ● Yahoo! JAPAN, LINE, Arm Treasure Data
      ● Presto Source Code Navigation
      Caption: Presto Conference Tokyo 2019 will be held on Thursday, July 11, from 13:30 (free admission)

  50. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
      Confidential © Arm 2017
