Traveloka’s Journey to No Ops Streaming Analytics
Rendy Bambang Jr., Data Eng Lead - Traveloka
Gaurav Anand, Solutions Engineer - Google
How we use the data
● Business Intelligence
● Analytics
● Personalization
● Fraud Detection
● Ads Optimization
● Cross-selling
● A/B Testing
● etc.
6 offices, incl. Singapore
1,000+ global employees
400+ engineers
Our technology core has enabled us to scale Traveloka rapidly into 6 countries across ASEAN in less than 2 years.
#EnablingMobility
In the beginning...
Initial Data Architecture
[Diagram: streaming and batch paths serving consumers of data. Components: Traveloka App (Android, iOS), Traveloka Services, Kafka, ETL, In-Memory Real-Time DW, NoSQL Realtime DB, Batch Ingest, S3 Data Lake, Hive/Presto query, Data Warehouse, DOMO analytics UI.]
Key Numbers
● Kafka volume: billions of messages/day
● In-memory DB: hundreds of GB of in-memory data
● NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases
● S3: hundreds of TB
● Spark: 20+ nodes, 200+ cores
● Redshift DW: 20+ nodes, tens of TB
● Team: 8 developers + 3 SysOps/DevOps
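A back-of-the-envelope conversion puts the Kafka volume in per-second terms. This is a minimal sketch that assumes the lower bound of "billions" (1 billion messages/day) and a hypothetical 10x promo spike, echoing the "10x?" question in the speaker notes; neither number is a measured figure from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(messages_per_day: float) -> float:
    """Convert a daily message volume into an average per-second rate."""
    return messages_per_day / SECONDS_PER_DAY


def required_peak_rate(messages_per_day: float, peak_to_average: float) -> float:
    """Rate the ingestion layer must sustain if peaks exceed the average by this factor."""
    return average_rate_per_second(messages_per_day) * peak_to_average


# Assumption: "billions" taken at its lower bound of 1e9 messages/day.
daily = 1e9
avg = average_rate_per_second(daily)
peak = required_peak_rate(daily, peak_to_average=10.0)  # hypothetical 10x spike
print(f"average ~ {avg:,.0f} msg/s, 10x peak ~ {peak:,.0f} msg/s")
# prints: average ~ 11,574 msg/s, 10x peak ~ 115,741 msg/s
```

Even the lower bound means sustaining five-digit message rates around the clock, which is the scale the custom Kafka consumers had to keep up with.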
Problems with Initial Data Architecture
● Debugging Kafka issues required a dedicated on-call rotation
● Data warehouse throughput issues under high-frequency loads, caused by coupled storage and compute
● Team well-being: engineers were paged on holiday, even on their honeymoon, for infrastructure issues!
● Scaling issues with the NoSQL DB and the in-memory DB
● Scaling issues with custom-built Java consumers
How do we...?
Ideal Solution
● Fully managed infrastructure that frees engineers to solve business problems
● Autoscaling of storage and compute
● Low end-to-end latency with a guaranteed SLA
● Resilience and end-to-end system availability
Solution Components
● Google Cloud Pub/Sub (event data ingestion)
● Google Cloud Dataflow (stream processing)
● Google BigQuery (analytics)
● Cross-cloud environment (AWS-GCP)
● AWS DynamoDB (operational datastore)
Note: Cloud Datastore was our preferred operational DB, but it was not available in the Singapore (SG) region, so we used DynamoDB.
How did we...?
Analytics Architecture: Reimagined
[Diagram: the existing streaming and batch components (Traveloka App on Android and iOS, Traveloka Services, Kafka, ETL, NoSQL DB, Batch Ingest, S3 Data Lake, Hive/Presto query, Data Warehouse, DOMO analytics UI, consumers of data) alongside new GCP components. Ingest: Cloud Pub/Sub. Pipelines: Cloud Dataflow. Storage: Cloud Storage. Analytics: BigQuery. Plus Monitoring and Logging.]
Developed Two Common Dataflow Engines
● Self-service streaming analytics to BigQuery
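The slides do not say how the self-service engine is configured. One plausible shape, sketched here purely as an assumption, is a registry in which each team declares an event type, a destination table and the fields it needs, so a single shared pipeline can route events without per-team code. StreamConfig, Router, the table name and the dead-letter list are all hypothetical; in a real Beam pipeline the rows would go to a BigQuery sink (for example the Python SDK's WriteToBigQuery transform) rather than into a dict.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical per-team registration: which events go to which table."""
    event_type: str
    table: str                       # illustrative "dataset.table" name
    required_fields: Tuple[str, ...]


@dataclass
class Router:
    configs: Dict[str, StreamConfig]
    rows_by_table: Dict[str, List[dict]] = field(default_factory=dict)
    dead_letter: List[str] = field(default_factory=list)

    def handle(self, raw: str) -> None:
        """Parse one JSON event and append it as a row for its configured table."""
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            self.dead_letter.append(raw)  # unparseable payload
            return
        event_type = event.get("type") if isinstance(event, dict) else None
        config = self.configs.get(event_type) if isinstance(event_type, str) else None
        if config is None or any(f not in event for f in config.required_fields):
            self.dead_letter.append(raw)  # unregistered type or missing fields
            return
        row = {f: event[f] for f in config.required_fields}  # keep declared fields only
        self.rows_by_table.setdefault(config.table, []).append(row)


configs = {"booking": StreamConfig("booking", "analytics.bookings", ("booking_id", "amount"))}
router = Router(configs)
router.handle('{"type": "booking", "booking_id": "b1", "amount": 120, "extra": 1}')
router.handle('{"type": "search", "query": "JKT-SIN"}')
router.handle("not json")
print(router.rows_by_table)    # {'analytics.bookings': [{'booking_id': 'b1', 'amount': 120}]}
print(len(router.dead_letter))  # 2
```

Keeping bad events in a dead-letter list instead of failing the pipeline is one common way to let many teams share a streaming job safely.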
Developed Two Common Dataflow Engines
● Stream processing to DynamoDB, with common features for developers:
○ Combine by key
○ Optimistic concurrency
○ Local-file-based integration tests
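Combine by key and optimistic concurrency work well together: pre-aggregating a bundle of events per key cuts the number of writes, and a version check on each write stops concurrent workers from silently overwriting each other. DynamoDB supports such a version check through conditional writes (a ConditionExpression on PutItem/UpdateItem). The sketch below is a minimal in-memory illustration under that assumption: VersionedStore is a hypothetical stand-in, not the boto3 API, and the retry limit is arbitrary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class ConditionFailed(Exception):
    """Raised when the stored version no longer matches the version the writer read."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a key-value store with conditional writes.

    It mimics a condition on a version attribute; it is not the boto3 API.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (value, version)
        self.write_count = 0

    def get(self, key: str) -> Tuple[int, int]:
        return self._data.get(key, (0, 0))  # missing keys read as value 0, version 0

    def put_if_version(self, key: str, value: int, expected_version: int) -> None:
        _, current_version = self.get(key)
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._data[key] = (value, current_version + 1)
        self.write_count += 1


def combine_by_key(events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum increments per key first, so many events on one key cost a single write."""
    totals: Dict[str, int] = defaultdict(int)
    for key, increment in events:
        totals[key] += increment
    return dict(totals)


def apply_with_occ(store: VersionedStore, key: str, delta: int, max_attempts: int = 5) -> None:
    """Read-modify-write guarded by a version check; re-read and retry on conflict."""
    for _ in range(max_attempts):
        value, version = store.get(key)
        try:
            store.put_if_version(key, value + delta, expected_version=version)
            return
        except ConditionFailed:
            continue  # another writer updated the key after our read
    raise RuntimeError(f"gave up updating {key!r} after {max_attempts} attempts")


store = VersionedStore()
bundle = [("user:1", 1), ("user:2", 1), ("user:1", 1), ("user:1", 1)]  # 4 events
for key, delta in combine_by_key(bundle).items():
    apply_with_occ(store, key, delta)
print(store.get("user:1"), store.write_count)  # (3, 1) 2
```

Four events become two writes, and each write only succeeds if nobody else changed the item since it was read.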
Key Facts/Numbers
● End-to-end pipeline latency: seconds
● Volume: hundreds of GB/day
● Team: 2 developers, 0 ops
● Agility: POC + pilot in 1 month
● Migrated 50+ different stream-processing use cases in 1 month
● BigQuery integration with BI tools: thousands of dashboards, hundreds of users
Awesome Autoscale
Pub/Sub and Dataflow absorbed spiky load just fine!
Our case: a promo
[Charts: Pub/Sub publish count; Dataflow vCPU count]
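The slides do not describe how Dataflow decides how many workers to run, so the rule below is a simplified, hypothetical backlog-based heuristic, not Dataflow's actual autoscaling algorithm. It only illustrates why a spike such as a promo translates into more vCPUs: the pipeline must keep up with the higher input rate and also drain whatever backlog has built up. All throughput figures are made-up examples.

```python
import math


def workers_needed(backlog_msgs: int, incoming_per_s: float, per_worker_per_s: float,
                   drain_seconds: float, min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical sizing rule: keep up with new input AND drain the backlog in drain_seconds."""
    if per_worker_per_s <= 0 or drain_seconds <= 0:
        raise ValueError("per-worker throughput and drain time must be positive")
    required_rate = incoming_per_s + backlog_msgs / drain_seconds
    n = math.ceil(required_rate / per_worker_per_s)
    return max(min_workers, min(max_workers, n))  # clamp to the allowed range


# Made-up figures: each worker handles 2,000 msg/s.
normal = workers_needed(backlog_msgs=0, incoming_per_s=10_000,
                        per_worker_per_s=2_000, drain_seconds=60)
promo = workers_needed(backlog_msgs=600_000, incoming_per_s=100_000,
                       per_worker_per_s=2_000, drain_seconds=60)
print(normal, promo)  # 5 55
```

With a managed autoscaler, this kind of decision (and the scale-down afterwards) is the service's job rather than an on-call engineer's.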
Why Cloud Dataflow (Beam): Tuning Pipelines vs. Managing Servers
The Lambda Architecture
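The speaker notes summarize the Lambda architecture: a low-latency streaming layer runs alongside a high-latency batch layer, and their results are merged at query time. The toy sketch below shows that merge step with event counts; the cutoff timestamp and the key names are illustrative assumptions. In a real deployment the two layers are separate systems with separately maintained code, which is exactly the cost the unified Beam model aims to remove.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (event_time_seconds, key)


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: complete but slow; recomputed over all events older than the cutoff."""
    return Counter(key for ts, key in events if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: fast view of events the batch layer has not covered yet."""
    return Counter(key for ts, key in events if ts >= cutoff)


def serve(batch: Counter, speed: Counter, key: str) -> int:
    """Serving layer: merge both views at query time (missing keys count as 0)."""
    return batch[key] + speed[key]


events = [(10, "JKT"), (20, "SIN"), (30, "JKT"), (95, "JKT")]
cutoff = 90  # hypothetical: the last batch run covered events before t=90
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
print(serve(batch, speed, "JKT"), serve(batch, speed, "SIN"))  # 3 1
```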
Unified Model with Apache Beam
Batch or streaming
http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
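The unified model's core idea is that the same transform describes both bounded (batch) and unbounded (streaming) input; only the execution changes. The stdlib-only sketch below makes that concrete with fixed one-minute event-time windows and per-key counts: one function holds the logic, and two hypothetical runners apply it either to a complete dataset or element by element. It is not the Beam API (in the Beam Python SDK the equivalent would roughly be beam.WindowInto(window.FixedWindows(60)) followed by beam.combiners.Count.PerKey()), and it ignores watermarks, triggers and late data, which a real runner manages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]      # (event_time_seconds, key)
WindowKey = Tuple[int, str]  # (window_start, key)


def window_start(ts: int, size: int) -> int:
    """Assign an event to the fixed event-time window that contains it."""
    return ts - ts % size


def add_event(state: Dict[WindowKey, int], event: Event, size: int) -> None:
    """The single piece of business logic, shared by both execution modes."""
    ts, key = event
    state[(window_start(ts, size), key)] += 1


def run_batch(events: Iterable[Event], size: int) -> Dict[WindowKey, int]:
    """Bounded input: consume everything, then emit final results."""
    state: Dict[WindowKey, int] = defaultdict(int)
    for e in events:
        add_event(state, e, size)
    return dict(state)


def run_streaming(events: Iterator[Event], size: int) -> Iterator[Dict[WindowKey, int]]:
    """Unbounded input: emit an updated snapshot after every element."""
    state: Dict[WindowKey, int] = defaultdict(int)
    for e in events:
        add_event(state, e, size)
        yield dict(state)


events = [(5, "JKT"), (42, "SIN"), (61, "JKT"), (7, "JKT")]  # note: out of order
batch = run_batch(events, size=60)
snapshots = list(run_streaming(iter(events), size=60))
print(batch)                   # {(0, 'JKT'): 2, (0, 'SIN'): 1, (60, 'JKT'): 1}
print(snapshots[-1] == batch)  # True
```

Because windows are keyed by event time rather than arrival order, the late-arriving event at t=7 still lands in the first window, and the streaming run converges to the batch result.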
Unified Model with Apache Beam
[Diagram panels: Optimize, Schedule]
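One example of the optimization a runner performs is fusion: Dataflow documents that it fuses consecutive element-wise steps into a single stage, so elements flow through them without materializing intermediate collections. The toy sketch below contrasts a naive plan with a fused one; both give the same result. Using None to signal a dropped element is a simplification chosen for this sketch, and the step functions are hypothetical.

```python
from typing import Callable, List, Optional

Step = Callable[[str], Optional[str]]  # returns None to drop the element


def run_unfused(steps: List[Step], data: List[str]) -> List[str]:
    """Naive plan: build a full intermediate list after every step."""
    for step in steps:
        data = [y for y in map(step, data) if y is not None]
    return data


def fuse(steps: List[Step]) -> Step:
    """Compose consecutive element-wise steps into one function (a single stage)."""
    def fused(element: str) -> Optional[str]:
        for step in steps:
            result = step(element)
            if result is None:  # a filter-like step dropped the element
                return None
            element = result
        return element
    return fused


def run_fused(steps: List[Step], data: List[str]) -> List[str]:
    """Fused plan: each element passes through every step before the next one starts."""
    stage = fuse(steps)
    return [y for y in map(stage, data) if y is not None]


steps: List[Step] = [
    str.strip,            # hypothetical "parse" step
    lambda s: s or None,  # drop empty records
    str.upper,            # hypothetical "normalize" step
]
data = [" jkt ", "   ", "sin "]
print(run_unfused(steps, data), run_fused(steps, data))  # ['JKT', 'SIN'] ['JKT', 'SIN']
```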
Why Google Cloud...?
Traveloka Data Team Philosophy
● Managed services
● NoOps
● Self-service
Focus on solving complex business problems rather than on infrastructure.
What drove the change?
● Ever-increasing scale
● Ever-increasing operational burden
● New business needs: streaming analytics
What’s Next...?
Next Generation Architecture
[Diagram: collector on a Kubernetes cluster, Cloud Pub/Sub, Cloud Dataflow, BigQuery, Cloud Storage, BI & Analytics UI; labeled "Managed services" and "Simplify!"]
Conclusion
“Our engineering team of 2 builds and maintains as much as a team of 8, thanks to products like Pub/Sub, Dataflow and BigQuery.”
Q&A
Thank You.


Speaker Notes

  1. One of the most strategic parts of Traveloka's business is a streaming data processing pipeline that powers a number of use cases, including fraud detection, personalization, ads optimization, cross-selling, A/B testing, and promotion eligibility. In this talk, we’ll describe how Traveloka recently migrated this pipeline from a legacy architecture to a multi-cloud solution that includes the Google Cloud Platform (GCP) data analytics platform.
  2. Highlight data engineering; highlight the Singapore office. Traveloka is a travel technology company based in Indonesia, Singapore, and India; its goal is to revolutionize human mobility.
  3. Traveloka vision
  4. The purpose of my talk today is to share practical feedback on Pub/Sub, Dataflow, BigQuery and, more broadly, our use of Google Cloud Platform. I'll start by briefly talking about Traveloka and what we do. Then I'll discuss the architecture we used for the past few years and why we decided to investigate new solutions. Finally, I'll present what we put in place and the lessons learned along the way.
  5. Kafka volume: billions of messages/day. In-memory DB: hundreds of GB of in-memory data. NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases. S3: hundreds of TB. Spark: 20+ nodes, 200+ cores. Redshift DW: 20+ nodes, tens of TB. Team: 8 developers + 3 SysOps/DevOps.
  6. Kafka volume: billions of messages/day. In-memory DB: hundreds of GB of in-memory data. NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases. S3: hundreds of TB. Spark: 20+ nodes, 200+ cores. Redshift DW: 20+ nodes, tens of TB. Team: 8 developers + 3 SysOps/DevOps.
  7. Track Session: As Traveloka grew over time, several problems emerged, including:
  8. Track Session: We did our homework on technology that could support these requirements for our use case.
  9. 4 minutes. Highlight each component's role and map similar functions on both sides (e.g., BigQuery). Hybrid, not the end state.
  10. Track Session: We did our homework on technology that could support these requirements for our use case.
  11. Track Session: We did our homework on technology that could support these requirements for our use case.
  12. Touch on: the team (0 ops), agility, the migration, BigQuery integration.
  13. How big is the spike? 10x?
  14. 20th-minute slide.
  15. Basic idea: run a low-latency, weakly consistent streaming system alongside a high-latency, strongly consistent batch system, and somehow merge their results at the end. This provided low-latency, correct results, but at the cost of building, maintaining, and merging the results of two separate systems. So what we set out to do with Apache Beam was to provide a unified model...
  16. ...one that could give you the features of both systems. But even more than that, one that would let you trade off the characteristics of each according to the use case. So after you write your pipeline,
  17. This is the approach we laid out in our 2015 VLDB paper on the Dataflow Model, and if you want to learn more in detail, that’s a good place to start.
  18. ...whether that’s Dataflow, Apache Spark, Apache Flink, Apache Apex, or any other runner we support. And then for our part...
  19. Dynamic load balancing and autoscaling of worker VMs; pipeline optimization.
  20. Remove the AWS rectangle; we will add it only for BI & analytics.
  21. Thank you for them!