Introduction to Apache Beam
Dive into Beam's architecture, with a live demo running a data pipeline on different runners such as Google Cloud Dataflow, Flink, and Spark
7. Goal
• Provide an abstraction layer between data processing code and the execution runtime.
• Unify batch processing and streaming jobs in one model.
• The Beam SDK opens the door to write once, run anywhere.*
* including on-premise and non-Google clouds
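The abstraction-layer idea above can be sketched in plain Python. This is not the real Beam API; all names here (Pipeline, DirectRunner, BatchRunner) are hypothetical. The point is that the pipeline is built once as a description of transforms, and a pluggable runner decides how to execute it, which is what lets one pipeline target Dataflow, Flink, or Spark.

```python
class Pipeline:
    """Records transforms instead of executing them immediately."""
    def __init__(self):
        self.transforms = []

    def apply(self, fn):
        self.transforms.append(fn)
        return self  # allow chaining, loosely analogous to Beam's `|` operator


class DirectRunner:
    """Executes the recorded transforms element by element, in process."""
    def run(self, pipeline, data):
        out = list(data)
        for fn in pipeline.transforms:
            out = [fn(x) for x in out]
        return out


class BatchRunner:
    """Same pipeline, different execution strategy: process in chunks."""
    def __init__(self, batch_size=2):
        self.batch_size = batch_size

    def run(self, pipeline, data):
        data = list(data)
        out = []
        for i in range(0, len(data), self.batch_size):
            chunk = data[i:i + self.batch_size]
            for fn in pipeline.transforms:
                chunk = [fn(x) for x in chunk]
            out.extend(chunk)
        return out


# One pipeline definition, two execution engines, same result.
p = Pipeline().apply(lambda x: x * 2).apply(lambda x: x + 1)
print(DirectRunner().run(p, [1, 2, 3]))  # [3, 5, 7]
print(BatchRunner().run(p, [1, 2, 3]))   # [3, 5, 7]
```

Swapping the runner changes how the work executes without touching the pipeline code, which is the "write once, run anywhere" promise in miniature.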
11. Programming tips / Flink
• Use the Flink DataStream API in Java and Scala
• Use the Beam API directly in Java (and soon Python) with the Flink runner
13. For Flink users
• We encourage users to use either the Beam or the Flink API to implement their Flink jobs for stream data processing.
• But the native Flink API offers:
• a backwards-compatible API
• built-in libraries (e.g., CEP and the upcoming SQL support)
• key-value state (with the ability to query that state in the future)
http://data-artisans.com/why-apache-beam/
17. Other things
• BigQuery now has DML support! https://goo.gl/lcZQVZ
• Data Studio Beta is available in Taiwan
• Embulk
• Fluentd v0.14.6 - 2016/09/07