CREATE
INTELLIGENCE
FROM DATA
Session 4
14:40 - 15:15
Data Lake API
with BigQuery.
PRESENTERS: Imre Nagi
Software Engineer
Traveloka
Rendy Bambang Jr.
Data System Architect
Traveloka
Data
Provisioning API
with BigQuery.
Traveloka
7 offices
Jakarta, Singapore,
Bangalore, etc
2,000+
Global employees
400+
Engineers
Metrics
● ~4 TiB of data per day goes into PubSub
● ~400 TB (~500 billion rows) of data in BigQuery
● ~250 TB of data in GCS
● >2 PiB of BigQuery data scanned per month (excluding ETLs)
● >60k batch jobs executed per day
● >2500 Dataflow jobs per day
● >1500 charts generated from BigQuery via BI tools
AGENDA
● How we use Data
● Problem Statement
● Data Lake API
● Future work
Data drives product & enables business use cases
Each mission team has unique use cases in terms of data usage. The data team at Traveloka needs to fulfill these needs in order to maximize Traveloka's growth and revenue.
How we use data
● Personalization
● Fraud Detection
● Improving User Experience
● A/B Test
● Giving recommendations
● Review Moderation
● Photo classification
● and many other use cases
1 Understanding the Problem
Data Provisioning in Traveloka
Machine To Machine Machine To Human
Frequent, Small
request
Huge Data, High
Latency
Huge Data, High
Latency
How we previously delivered big data to product teams
1. Product team requests data for a specific use case
2. Data team provides raw or pre-processed data in blob storage
3. Data team grants access to the bucket or tables for the team's microservices
4. Product team pulls the data and does its job
Initial Data Delivery System
[Diagram. Labels: Ingestion System, Data Processing, Kafka, Batch Ingest, S3 Data Lake, PostgreSQL, ETL, Traveloka Services]
This becomes problematic
● Systems are tightly coupled
● No column level access control
● Hard to audit data usage
What do we need?
A standardised way of accessing data
● Clear contract between client and server
● Client is not tightly coupled to internal implementation
● Better access control
2 Async Data Provisioning API
Data Provisioning API Requirements
1. Clear contract with JSON query
2. Asynchronously return large volumes of data
3. Well-defined access control down to column level
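As an illustration of requirements 1 and 3, here is a minimal Python sketch of what a JSON query contract with a column-level access check might look like. The slides do not show the actual schema, so the field names (`table`, `columns`), the `ACL` mapping, and the client and table names are all hypothetical.

```python
import json

# Hypothetical ACL: client id -> table -> columns that client may read.
ACL = {
    "fraud-service": {"bookings": {"booking_id", "amount", "created_at"}},
}

# Fields every query must carry under this assumed contract.
REQUIRED_FIELDS = ("table", "columns")


def validate_query(raw: str) -> dict:
    """Parse a JSON query and check it has the fields the contract requires."""
    query = json.loads(raw)
    for field in REQUIRED_FIELDS:
        if field not in query:
            raise ValueError(f"missing field: {field}")
    if not isinstance(query["columns"], list) or not query["columns"]:
        raise ValueError("columns must be a non-empty list")
    return query


def denied_columns(client_id: str, query: dict) -> list:
    """Return the requested columns that the client is not allowed to read."""
    allowed = ACL.get(client_id, {}).get(query["table"], set())
    return sorted(c for c in query["columns"] if c not in allowed)


request = '{"table": "bookings", "columns": ["booking_id", "email"]}'
query = validate_query(request)
print(denied_columns("fraud-service", query))  # ['email']
```

Because the client only sends a declarative JSON document, the server is free to reject a request before any SQL is generated, which is one way a column-level check could be enforced.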
Technology We Use
● BigQuery
● Cloud Composer
● GKE
● Stackdriver Logging
● Cloud Storage
● Cloud SQL
● RxJava
● Dropwizard
Data Provisioning API underlying architecture
● Consumers: Traveloka Backend Services
● Interface: Data Provision API
● Data Source: BigQuery (Tracking Data)
● Processing Pipeline: BigQuery SQL, Orchestration
● Storage: Cloud Storage (for storing results), Cloud SQL, BigQuery
● Monitoring, Logging
How Data Provisioning API works
[Flow diagram]
● Components: Service Client, Interface, Query Interpreter, Kubernetes Engine, BigQuery, Cloud SQL, Cloud Storage, Monitoring, Logging
● Stages: Query Validation, Job Creation, Query Execution, Write to Permanent Table, Export Permanent Table to GCS, Generate Signed URL, Store the Signed URL; ACL Checks
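The asynchronous flow in this diagram can be sketched roughly as follows. This is an in-memory Python simulation under stated assumptions, not the presenters' implementation: `JOBS` stands in for the job records kept in Cloud SQL, the table and export names are placeholders, and `sign_url` uses a plain HMAC purely for illustration (real Cloud Storage signed URLs use a different scheme). The host `storage.example.com` is made up.

```python
import hashlib
import hmac
import uuid
from concurrent.futures import ThreadPoolExecutor

SECRET = b"demo-secret"   # hypothetical signing key
JOBS = {}                 # stand-in for the job table (Cloud SQL in the slides)
FUTURES = {}              # lets this demo wait for background work
EXECUTOR = ThreadPoolExecutor(max_workers=2)


def sign_url(path: str) -> str:
    """Illustrative HMAC signature over the object path."""
    sig = hmac.new(SECRET, path.encode(), hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{path}?sig={sig}"


def run_job(job_id: str, query: dict) -> None:
    """Background stages: execute, write table, export, sign, store."""
    try:
        JOBS[job_id]["status"] = "RUNNING"
        table = f"results_{job_id}"            # placeholder permanent table
        export_path = f"exports/{table}.csv"   # placeholder exported object
        JOBS[job_id].update(status="DONE", url=sign_url(export_path))
    except Exception as exc:
        JOBS[job_id].update(status="FAILED", error=str(exc))


def submit(query: dict) -> str:
    """Validate, register the job, and return its id without waiting."""
    if "table" not in query:
        raise ValueError("query must name a table")
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "PENDING"}
    FUTURES[job_id] = EXECUTOR.submit(run_job, job_id, query)
    return job_id


job_id = submit({"table": "bookings", "columns": ["booking_id"]})
FUTURES[job_id].result()  # a real client would poll a status endpoint instead
print(JOBS[job_id]["status"])  # DONE
```

The point of the design is that `submit` returns immediately with a job id, so a large result never has to travel through the request that created it; the client later fetches the stored URL and downloads the export directly.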
3 Future Improvement
Future Improvement
● Use a queue to manage the jobs
● Add more capabilities to the query features (complex aggregation, etc.)
● Separate the ACL service to enable service reuse
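To illustrate the first item, here is a minimal sketch of queue-managed jobs using Python's standard-library `queue` and a single worker thread. The slides do not name a queue technology, so this in-process queue is only an assumption chosen for a self-contained example; a production system would more likely use a durable, external queue. Job ids and table names are hypothetical.

```python
import queue
import threading

# Bounded queue: when it is full, producers block instead of piling up work.
job_queue = queue.Queue(maxsize=100)
results = {}


def worker() -> None:
    """Take jobs off the queue one at a time until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        job_id, query = job
        # Placeholder for the real work (query execution, export, signing).
        results[job_id] = f"processed {query['table']}"
        job_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

for i, table in enumerate(["bookings", "searches"]):
    job_queue.put((f"job-{i}", {"table": table}))

job_queue.put(None)   # tell the worker to stop
job_queue.join()      # wait until every queued item has been handled
print(results)
```

Decoupling submission from execution this way lets the number of concurrently running jobs be capped by the number of workers rather than by the number of incoming requests.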
THANK YOU
Jakarta, Indonesia
October 4th, 2018

Data Provision API with BigQuery - Google Cloud Summit Jakarta 18

  • 2. Session 4 14:40 - 15:15 Data Lake API with BigQuery. PRESENTERS: Imre Nagi Software Engineer Traveloka Rendy Bambang Jr. Data System Architect Traveloka
  • 3. Imre Nagi Software Engineer Traveloka Data Provisioning API with BigQuery. Rendy Bambang Jr. Data System Architect Traveloka
  • 4.
  • 5. Traveloka 7 offices Jakarta, Singapore, Bangalore, etc 2,000+ Global employees 400+ Engineers
  • 6. Metrics ● ~4 TiB of data per day goes into PubSub ● ~400 TB (~500 billion rows) of data in BigQuery ● ~250 TB of data in GCS ● >2 PiB of BigQuery data scanned per month (excluding ETLs) ● >60k batch jobs executed per day ● >2500 Dataflow jobs per day ● >1500 charts using BigQuery generated via BI tools
  • 7. AGENDA ● How we use Data ● Problem Statement ● Data Lake API ● Future work
  • 8. Data drives product & enables business use cases. Each mission team has unique use cases in terms of data usage. The data team at Traveloka needs to fulfil these needs in order to maximize Traveloka's growth and revenue.
  • 9. ● Personalization ● Fraud Detection ● Improving User Experience ● A/B Test ● Giving recommendation ● Review Moderation ● Photo classification ● and many other use cases How we use data
  • 11. Data Provisioning in Traveloka Machine To Machine Machine To Human Frequent, Small request Huge Data, High Latency
  • 12. Data Provisioning in Traveloka Machine To Machine Machine To Human Frequent, Small request Huge Data, High Latency Huge Data, High Latency
  • 13. How we previously delivered big data to product teams: 1. Product team requests data for a specific use case 2. Data team provides raw or pre-processed data in blob storage 3. Data team grants access to the bucket or tables for the team's microservices 4. Product team pulls the data and does its job
  • 14. Initial Data Delivery System: Ingestion System, Data Processing, Kafka, Batch Ingest, S3 Data Lake, PostgreSQL, ETL, Traveloka Services
  • 15. This becomes problematic ● Systems are tightly coupled ● No column level access control ● Hard to audit data usage
  • 16. What do we need? A standardised way of accessing data ● Clear contract between client and server ● Client is not tightly coupled to internal implementation ● Better access control
  • 18. Data Provisioning API Requirements 1. Clear contract with JSON query 2. Asynchronously return large volumes of data 3. Well-defined access control down to column level
  • 19. Technology We Use ● BigQuery ● Cloud Composer ● GKE ● Stackdriver Logging ● Cloud Storage ● Cloud SQL ● RxJava ● Dropwizard
  • 20. Data Provisioning API underlying architecture Storage Cloud Storage For Storing Results Cloud SQL BigQuery Interface Data Provision API Consumers Traveloka Backend Services Data Source BigQuery Tracking Data Processing Pipeline BigQuery SQL Orchestration Monitoring Logging Architecture: Data Provision API overall architecture
  • 21. Storage Cloud Storage For Storing Results Cloud SQL BigQuery Interface Data Provision API Consumers Traveloka Backend Services Data Source BigQuery Tracking Data Processing Pipeline BigQuery SQL Orchestration Monitoring Logging
  • 22. How Data Provisioning API works: Query Interpreter, Interface, Monitoring, Logging, Service Client, Kubernetes Engine, BigQuery, Cloud SQL, Cloud Storage. Steps shown: Query Validation, Job Creation, Query Execution, Write to Permanent Table, Export Permanent Table to GCS, Generate Signed URL, Store the Signed URL, ACL Checks
  • 24. Future Improvement ● Use a queue to manage the jobs ● Add more capabilities to the query features (complex aggregation, etc.) ● Separate the ACL service to enable service reuse.
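
The slides name a JSON query contract, column-level access control, and a flow that starts with query validation, ACL checks and job creation, but they do not show the query format itself. The following is a minimal sketch of those first steps only, not the presenters' implementation. The JSON field names (`table`, `columns`, `limit`), the `ACL` mapping, the client id and the function names are all hypothetical, and the later steps (query execution in BigQuery, export to GCS, signed URL generation) are left out because they depend on cloud services.

```python
import json

# Hypothetical ACL store: client id -> table -> columns that client may read.
ACL = {
    "fraud-service": {"bookings": {"booking_id", "amount", "country"}},
}


class QueryRejected(Exception):
    """Raised when a query fails validation or the ACL check."""


def validate(query):
    """Check that the JSON query has the shape this sketch assumes."""
    for field in ("table", "columns"):
        if field not in query:
            raise QueryRejected(f"missing field: {field}")
    if not isinstance(query["columns"], list) or not query["columns"]:
        raise QueryRejected("columns must be a non-empty list")
    # Accept plain identifiers only, so names can be placed in SQL safely.
    for name in [query["table"], *query["columns"]]:
        if not isinstance(name, str) or not name.isidentifier():
            raise QueryRejected(f"invalid identifier: {name!r}")


def check_acl(client_id, query):
    """Reject the query if it touches any column the client may not read."""
    allowed = ACL.get(client_id, {}).get(query["table"], set())
    denied = sorted(set(query["columns"]) - allowed)
    if denied:
        raise QueryRejected("columns not permitted: " + ", ".join(denied))


def to_sql(query):
    """Translate the validated JSON query into a SQL string."""
    sql = f"SELECT {', '.join(query['columns'])} FROM {query['table']}"
    if "limit" in query:
        sql += f" LIMIT {int(query['limit'])}"
    return sql


def submit(client_id, raw_json):
    """Validate, check access, and return a pending job record."""
    query = json.loads(raw_json)
    validate(query)
    check_acl(client_id, query)
    # A real service would persist the job and execute it asynchronously.
    return {"status": "PENDING", "sql": to_sql(query)}
```

A client would call `submit("fraud-service", '{"table": "bookings", "columns": ["booking_id", "amount"]}')` and later poll for the result location. Keeping the ACL check as its own function mirrors the future-work item of splitting access control into a reusable service.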