Big data processing with PubSub, Dataflow, and BigQuery
Thuyen Ho
Presented at Google Developer Group Vietnam 2018
1.
BIG DATA PROCESSING WITH PUB/SUB, DATAFLOW AND BIGQUERY
Thuyen Ho, Data Engineer @ KNOREX
© 2018 KNOREX
2.
ABOUT KNOREX
Established in 2010, Knorex provides Precision Performance Marketing products and solutions to leading trading desks, agencies and brands. It has offices and a direct business presence across the US, UK, Australia, China, India and Southeast Asia (SEA).
• 8 offices
• 110+ staff
3.
PROBLEM STATEMENT
Ingest a large volume of streaming user data, transform it based on ever-changing parameters, and store it in a database in real time. This data is used for two purposes:
1. Targeting users in real time for advertising campaigns
2. Aggregating data to estimate campaign reach
Data source: third-party partner, feeding the KNOREX DMP
Ingest stream events
• QPS: ~1500 - 2000 events
• Event size: 50KB - 100KB
• Data volume: ~1TB a day
Historical data
• Reprocess: ~30TB each day
• Aggregate: ~60TB each day
4.
AGENDA
• Quick Introduction to Pub/Sub, Dataflow and BigQuery
• KNOREX Approach
• Q&A
5.
Quick Introduction to Pub/Sub, Dataflow and BigQuery
6.
SERVERLESS STREAM PROCESSING PIPELINE WITH GCP
Data events → Pub/Sub (messaging queue) → Dataflow (stream processing) → processed data → BigQuery (analytics engine)
7.
CLOUD PUB/SUB
Cloud Pub/Sub is an asynchronous messaging service designed to be highly reliable and scalable.
8.
CLOUD PUB/SUB - PULL SUBSCRIPTION
9.
CLOUD PUB/SUB - PUSH SUBSCRIPTION
10.
LAMBDA ARCHITECTURE
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods. (source: wikipedia.org)
To balance:
• Latency
• Throughput
• Fault-tolerance
11.
DATA PROCESSING - TRANSFORMS
Input data → data processing (filter, transform, group, aggregate) → output data → storage
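The four transform stages named on this slide can be sketched in plain Python. This is only an illustration of filter, transform, group and aggregate on an in-memory list, not Beam or Dataflow code; the record layout and the "bot" event type are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (user_id, event_type, size_in_bytes).
events = [
    ("u1", "view", 120),
    ("u2", "click", 80),
    ("u1", "click", 40),
    ("u3", "bot", 999),
    ("u2", "view", 60),
]


def run_pipeline(records):
    # Filter: drop records that should not be processed.
    kept = [r for r in records if r[1] != "bot"]
    # Transform: reshape each record into a (key, value) pair.
    pairs = [(user, size) for user, _event_type, size in kept]
    # Group: collect all values that share a key.
    grouped = defaultdict(list)
    for user, size in pairs:
        grouped[user].append(size)
    # Aggregate: reduce each group to a single value.
    return {user: sum(sizes) for user, sizes in grouped.items()}


totals = run_pipeline(events)
```

Here `totals` maps each remaining user to the sum of their event sizes.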
12.
CLOUD DATAFLOW
Cloud Dataflow is a fully managed, autoscaling execution environment for Beam pipelines.
Beam: implement batch and streaming data processing jobs that run on any execution engine. Beam supports the following language-specific SDKs: Java, Python and Go.
Dataflow: a great execution environment for these pipelines.
13.
BEAM ABSTRACTIONS
Input data → data processing (filter, transform, group, aggregate) → output data → storage
• PCollection: the data, bounded or unbounded
• PTransform: each processing step (filter, transform, group, aggregate)
• Pipeline: the whole chain from input to output
14.
BEAM - FIXED TIME WINDOWS
[Figure: unbounded events plotted along processing time (00:00:00 to 00:01:30) and grouped into consecutive 30s windows: window 0, window 1, window 2.]
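A minimal sketch of fixed windowing in plain Python, assuming events are (timestamp in seconds, value) pairs. The 30s size comes from the slide; the event values and function names are hypothetical, and this shows the idea rather than Beam's implementation.

```python
from collections import defaultdict

WINDOW_SECONDS = 30  # window size shown on the slide


def fixed_window_start(timestamp_s, size_s=WINDOW_SECONDS):
    """Return the start (in seconds) of the fixed window containing the timestamp."""
    return timestamp_s - (timestamp_s % size_s)


def sum_per_fixed_window(events, size_s=WINDOW_SECONDS):
    """Sum event values per window; events are (timestamp_s, value) pairs."""
    sums = defaultdict(int)
    for timestamp_s, value in events:
        sums[fixed_window_start(timestamp_s, size_s)] += value
    return dict(sums)


# Hypothetical events: (seconds since 00:00:00, value).
events = [(3, 1), (12, 7), (29, 2), (30, 1), (45, 8), (61, 3)]
window_sums = sum_per_fixed_window(events)
```

Each event lands in exactly one window, so the windows never overlap.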
15.
BEAM - SLIDING TIME WINDOWS
[Figure: unbounded events plotted along processing time (00:00:00 to 00:01:30) and grouped into 30s windows: window 0, window 1, window 2.]
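Unlike fixed windows, sliding windows overlap, so one event can belong to several windows. The sketch below assumes a 30s window that starts every 10s; the 10s period is an assumption (the slide only gives the 30s size) and the function name is hypothetical.

```python
def sliding_window_starts(timestamp_s, size_s=30, period_s=10):
    """Return the start times of every sliding window that contains the timestamp.

    Windows start at multiples of period_s and cover [start, start + size_s).
    """
    # The latest window that can contain the timestamp starts at or before it.
    start = timestamp_s - (timestamp_s % period_s)
    starts = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp_s - size_s:
        starts.append(start)
        start -= period_s
    return sorted(starts)


starts_for_35s = sliding_window_starts(35)
```

With these assumed settings, an event at 35s falls into the windows starting at 10s, 20s and 30s. Setting the period equal to the size reduces this to fixed windows.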
16.
BEAM - SESSION WINDOWS
[Figure: events plotted along processing time (00:00:00 to 00:01:30) and grouped into session windows 0, 1 and 2, separated by a gap duration.]
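Session windows are driven by the gap duration: a new window starts when no event has arrived for at least that long. A plain-Python sketch for a single key, assuming timestamps in seconds and a hypothetical 30s gap (the slide does not state the gap value):

```python
def session_windows(timestamps, gap_s=30):
    """Group timestamps into sessions.

    A new session starts when the time since the previous event
    is at least gap_s seconds.
    """
    sessions = []
    for timestamp_s in sorted(timestamps):
        if sessions and timestamp_s - sessions[-1][-1] < gap_s:
            # Still within the gap: extend the current session.
            sessions[-1].append(timestamp_s)
        else:
            # Gap exceeded (or first event): open a new session.
            sessions.append([timestamp_s])
    return sessions


sessions = session_windows([0, 5, 12, 50, 58, 120])
```

In a real pipeline, sessions are usually tracked per key (for example per user), which this single-key sketch leaves out.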
17.
CLOUD BIGQUERY
A fast, highly scalable, cost-effective, and fully managed enterprise data warehouse for analytics. Some of the features:
• Serverless
• Real-time Analytics
• Standard SQL
• Storage and Compute Separation
• Flexible Data Ingestion
• Petabyte Scale
18.
BIGQUERY STORAGE IS COLUMNAR
Column1 | Column2 | Column3
Each column is stored separately. No indexes or keys are required.
19.
INGESTION-TIME PARTITIONED TABLE
Columns: Column1, Column2, Column3, plus the pseudo-columns _PARTITIONTIME and _PARTITIONDATE.
_PARTITIONTIME | _PARTITIONDATE
2018-12-01 00:00:00 | 2018-12-01
2018-12-01 00:00:00 | 2018-12-01
2018-12-02 00:00:00 | 2018-12-02
2018-12-02 00:00:00 | 2018-12-02
2018-12-02 00:00:00 | 2018-12-02
2018-12-03 00:00:00 | 2018-12-03
2018-12-03 00:00:00 | 2018-12-03
Query:
SELECT Column1, Column2
FROM `database.table_name`
WHERE _PARTITIONDATE >= "2018-12-01" AND _PARTITIONDATE < "2018-12-03"
21.
PARTITIONED TABLE
Partitioned based on data in a specified TIMESTAMP or DATE column.
Columns: Column1, Column2, Column3 (the partitioning column).
Column3 values: 2018-12-01, 2018-12-01, 2018-12-02, 2018-12-02, 2018-12-02, 2018-12-03, 2018-12-03
Query:
SELECT Column1, Column2
FROM `database.table_name`
WHERE Column3 >= "2018-12-01" AND Column3 < "2018-12-03"
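A filter on the partitioning column lets BigQuery skip partitions outside the range. The sketch below only works out which daily partitions a half-open date range touches; it makes no BigQuery calls, and the function name is hypothetical.

```python
from datetime import date, timedelta


def partitions_scanned(start, end):
    """List the daily partitions matched by: column >= start AND column < end."""
    days = (end - start).days
    # An empty or inverted range matches no partition.
    return [start + timedelta(days=i) for i in range(max(days, 0))]


# Same range as the query on the slide.
scanned = partitions_scanned(date(2018, 12, 1), date(2018, 12, 3))
```

For the range in the query, only the 2018-12-01 and 2018-12-02 partitions qualify; the 2018-12-03 rows fall outside the `<` bound.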
22.
KNOREX APPROACH
23.
ARCHITECTURE - STREAMING PIPELINE
Diagram sections: event ingest, processing and analytics, CMS & RTB engine.
Components:
• Third-party partner
• Cloud Load Balancing
• API: Compute Engine (autoscaling)
• API gateway
• Cookie: Cloud Pub/Sub, cookie topic
• Device: Cloud Pub/Sub, device topic
• Stream processing: Cloud Dataflow (autoscaling)
• Data warehouse: BigQuery (sharding + clustering)
• Segmented users: Cloud Pub/Sub, segment topic
• Python script: Compute Engine (autoscaling)
• Audience: Cloud Bigtable (3 regions)
• CMS
24.
ARCHITECTURE - EVENT INGEST
Compute Engine (GCE) runs the API code on autoscaling instances. It receives 1500 events a second from our partner. The API endpoint puts events into two separate topics: cookie and device.
Flow: partner (1500 events a sec) → Cloud Load Balancing → API on Compute Engine (autoscaling) → Cloud Pub/Sub cookie topic and device topic
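The routing step of the API endpoint might look like the sketch below. It is an assumption-heavy illustration: the `id_type` field, the topic names and the in-memory `published` dictionary are all hypothetical stand-ins, and a real service would hand the encoded bytes to a Pub/Sub publisher client instead.

```python
import json
from collections import defaultdict

# Hypothetical topic names; the slides only say "cookie" and "device" topics.
TOPICS = {"cookie": "cookie-topic", "device": "device-topic"}


def route_event(event):
    """Pick a topic from the event's (hypothetical) 'id_type' field
    and serialize the event to bytes for publishing."""
    id_type = event.get("id_type")
    if id_type not in TOPICS:
        raise ValueError(f"unknown id_type: {id_type!r}")
    return TOPICS[id_type], json.dumps(event).encode("utf-8")


# In-memory stand-in for the two topics, used only for this sketch.
published = defaultdict(list)
for incoming in [
    {"id_type": "cookie", "id": "abc"},
    {"id_type": "device", "id": "xyz"},
    {"id_type": "cookie", "id": "def"},
]:
    topic, payload = route_event(incoming)
    published[topic].append(payload)
```

Keeping the routing decision in a small pure function makes it easy to test without a running Pub/Sub service.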
25.
ARCHITECTURE - PROCESSING AND ANALYTICS
Cloud Dataflow transforms and enriches raw events in real time, inserts the processed data into BigQuery, and sends it to the RTB engine through Pub/Sub. Each region has a subscription that pulls data from the segment topic and inserts it into Bigtable.
BigQuery is the warehouse for analytics. Tables are partitioned by ingestion time, and data is kept for 60 days.
Components:
• Cookie: Cloud Pub/Sub, cookie topic
• Device: Cloud Pub/Sub, device topic
• Stream processing: Cloud Dataflow (autoscaling)
• Data warehouse: BigQuery (partition + clustering)
• Segmented users: Cloud Pub/Sub, segment topic
• Asia region: Compute Engine, Cloud Bigtable
• JP region: Compute Engine, Cloud Bigtable
• US region: Compute Engine, Cloud Bigtable
• CMS
• KNX RTB Engine
26.
ARCHITECTURE - BATCH PIPELINE
Dataflow also reads the past 30 days of data from BigQuery and reprocesses it in a batch job.
Components:
• Pub/Sub
• BigQuery (analytics engine)
• Batch pipeline: Cloud Dataflow (batch processing)
• Batch loads into BigQuery (analytics engine)
27.
DATAFLOW - PIPELINE VISUALIZATION
28.
Q&A
29.
Building Resilient Streaming Systems Lab
30.
THANK YOU
KNOREX.COM