Ho Nguyen
• Senior Software Engineer
• Technical Interests:
• Solution & code design
• Distributed systems
• Video/Image encoding
• Hobbies
• Movies & music
• Manga & anime (One Piece, Dragon Ball...)
• Coffee lover
Data-intensive problem
Ho Nguyen
Senior Software Engineer
Outline
• Simple problem
• When the data is big
• More problems
• Approaches
Simple problem
Program diagram
Complete code
Face Detection
When the data is big
How big is the data?
• A data set of 2 billion records of unique URLs
• Assuming the previous program needs 2 seconds per URL => concurrency number (throughput) = 0.5 URL/s
$$\frac{2 \times 2 \times 10^{9}}{3600 \times 24} \approx 46296 \text{ days} \approx 127 \text{ years}$$
What is the concurrency number we need to
complete the dataset in X days?
What is the concurrency number we need?
• Goal: X = 7 days
• 2 billion URLs
• Current concurrency: 0.5 URL/s
$$\frac{2 \times 10^{9}}{X \times 3600 \times 24} = \frac{2 \times 10^{9}}{7 \times 3600 \times 24} \approx 3307 \text{ URLs/s}$$
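The two estimates above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not from the talk.

```python
SECONDS_PER_DAY = 3600 * 24


def days_to_finish(total_urls: int, urls_per_second: float) -> float:
    """Days needed to process every URL at a fixed throughput."""
    return total_urls / urls_per_second / SECONDS_PER_DAY


def required_throughput(total_urls: int, days: float) -> float:
    """URLs per second needed to finish within the given number of days."""
    return total_urls / (days * SECONDS_PER_DAY)


TOTAL_URLS = 2_000_000_000  # 2 billion unique URLs

print(round(days_to_finish(TOTAL_URLS, 0.5)))     # 46296 days, about 127 years
print(round(required_throughput(TOTAL_URLS, 7)))  # 3307 URLs/s
```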
How to increase concurrency?
• Optimize code performance
• Increase hardware resources (CPU, RAM, disk, network…), aka scale-up
• Scale-out
• Cloning to multiple processes
(X-Axis)
• Splitting by functions (Y-Axis)
• Data partitioning (Z-Axis)
Optimize code
• Pros
• Most effective if we find a bottleneck whose fix increases performance by about 661,300% (from 0.5 to 3307 URLs/s)
• Save infrastructure cost
• Cons
• Time consuming and uncertain
Scale-up
• Pros
• Easy to apply
• Cons
• Takes time to find a suitable hardware configuration
• Expensive and limited
• Still need to optimize the code and redesign it to take advantage of the hardware once we can no longer scale up
Scale-out by cloning (X-Axis)
• Pros
• Can use all hardware
resources
• Not limited by hardware
• Cons
• More complex than scale-up
• Concurrency problems
[Diagram: the same program cloned on Node 1, Node 2, Node 3]
Scale-out by Splitting (Y-Axis)
Review the workflow
Scale-out by Splitting (Y-Axis)
• Download and resize image using CPU
• Face detection on GPU is faster
Reference: https://sites.google.com/site/facedetectionongpu/
Scale-out by Splitting (Y-Axis)
[Diagram: X-axis (cloning): three "Download and Process Image" instances. Y-axis (splitting): the "Download and Process Image" stage feeds two "Face Detection" instances.]
Scale-out by Splitting (Y-Axis)
• Pro
• Takes advantage of the hardware (CPU for download and resize, GPU for face detection)
• Cons
• Complex
• Concurrency problems
Scale-out by data-partitioning (Z-Axis)
Data schema
ID URL Done
1 https://abc.com/image1.jpg 1
2 https://abc.com/image2.jpg 0
3 https://abc.com/image3.jpg 0
4 https://abc.com/image4.jpg 0
Scale-out by data-partitioning (Z-Axis)
Key hashing
Partition 1
ID URL Done
1 https://abc.com/image1.jpg 0
3 https://abc.com/image3.jpg 0
Partition 2
ID URL Done
2 https://abc.com/image2.jpg 0
4 https://abc.com/image4.jpg 0
Scale-out by data-partitioning (Z-Axis)
Range based
Partition 1
ID URL Done
1 https://abc.com/image1.jpg 0
2 https://abc.com/image2.jpg 0
Partition 2
ID URL Done
3 https://abc.com/image3.jpg 0
4 https://abc.com/image4.jpg 0
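The two partitioning schemes on the slides above can be sketched as follows. This is an illustration under stated assumptions: IDs are integers starting at 1 as in the example table, a plain modulo stands in for a real hash function, and the function names are hypothetical.

```python
from collections import defaultdict


def key_hash_partition(record_id: int, num_partitions: int) -> int:
    """Key hashing: modulo on the ID stands in for a real hash function."""
    return record_id % num_partitions


def range_partition(record_id: int, partition_size: int) -> int:
    """Range based: consecutive IDs share a partition (IDs start at 1)."""
    return (record_id - 1) // partition_size


def group(ids, assign):
    """Group record IDs by the partition number that `assign` returns."""
    buckets = defaultdict(list)
    for record_id in ids:
        buckets[assign(record_id)].append(record_id)
    return dict(buckets)


ids = [1, 2, 3, 4]
by_hash = group(ids, lambda r: key_hash_partition(r, 2))
by_range = group(ids, lambda r: range_partition(r, 2))
print(by_hash)   # IDs 1 and 3 share a partition, IDs 2 and 4 share the other
print(by_range)  # IDs 1 and 2 share a partition, IDs 3 and 4 share the other
```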
Scale-out by data-partitioning (Z-Axis)
• Pros
• Increase database performance
• Reduces locking (partitions can be processed without locking each other)
• Cons
• Increase maintenance and infrastructure cost
• Hard to scale automatically
Summary
• Skip the code optimization approach
• Skip the scale-up approach
• Focus on scale-out approaches
• We can increase the number of
processes/machines to increase the
concurrency number
• We can split into 2 services: Downloader and
Face Detections
• We may need data partition to optimize
database performance
Current approach
High Concurrency
Problems
Race condition
• Cause
• The same URL is processed twice or more
• Impact
• Waste of resources
• Data corruption
• Faking concurrency
Race condition: How to solve?
• Distributed locks
• Pros
• N/a
• Cons
• Pessimistic locking impacts performance
• Hard to apply because we need to synchronize multiple nodes
• Poor fault tolerance
• Data sharding
• Pro
• High performance because the load is shared (physical shards)
• Cons
• Hard to scale
• Increases maintenance & infrastructure cost
• Queue/Worker
• Pros
• Easy to implement
• Easy to scale
• Good fault-tolerance
• Reusable communication
• Con
• The load concentrates on the
queue so it can become a
bottleneck
Race condition: root cause
The race condition only occurs between Downloaders
=> If we find a way to distribute unique URLs to each downloader, it solves the race condition for the whole system.
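One way to hand each downloader a unique URL is the queue/worker option listed earlier. The sketch below is an in-process stand-in, assuming Python's thread-safe `queue.Queue` in place of a real message broker; `run_downloaders` and the placeholder download step are hypothetical.

```python
import queue
import threading


def run_downloaders(urls, num_workers):
    """Distribute URLs to workers through a shared queue.

    Each URL is taken from the queue by exactly one worker, so no two
    downloaders ever process the same URL.
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            # Placeholder for the real download/resize step.
            with lock:
                processed.append(url)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

With a real broker the same shape holds: the queue, not the workers, decides who gets which URL, which is also why the queue can become the bottleneck.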
Fault Tolerance
• Faults
• Network fault
• Network interruption
• IP Blocking
• Service crash
• Problems
• Can data be lost?
• Can the service restart and
continue to work on remaining
tasks?
Fault Tolerance criteria
Given | When | Then
A service crashed | It restarted | No rework (continue on remaining items only)
Downloader service is running | It crashed | All downloaded images should not be lost
FaceDetector service is running | It crashed | All detected results should not be lost
Downloader is downloading an image | Network error happens | Retry
Downloader retries to download an image | Network error is IP blocking | Should rotate proxy to change the IP
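The last two criteria (retry on a network error, rotate the proxy on IP blocking) could look like the sketch below. Everything here is an assumption for illustration: the exception classes, the injected `fetch(url, proxy)` callable, and the `max_attempts` limit are not from the talk.

```python
class NetworkError(Exception):
    """Any transient network failure (hypothetical)."""


class IPBlockedError(NetworkError):
    """The remote host blocked the current IP (hypothetical)."""


def download_with_retry(url, fetch, proxies, max_attempts=3):
    """Retry a download; switch to the next proxy only when the IP is blocked."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    proxy_index = 0
    last_error = None
    for _ in range(max_attempts):
        proxy = proxies[proxy_index % len(proxies)]
        try:
            return fetch(url, proxy)
        except IPBlockedError as err:
            last_error = err
            proxy_index += 1  # rotate proxy to change the IP
        except NetworkError as err:
            last_error = err  # plain network error: retry with the same proxy
    raise last_error
```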
Service communication
• How do the services communicate?
• Do we need a load balancer?
Service communication methods
Type | Method | Pros | Cons
Synchronous | HTTP | Familiar and simple to use | Needs a load balancer; tight coupling; the calling thread blocks while waiting for the response
Synchronous | RPC | Higher performance than HTTP |
Asynchronous | Queue messaging (one-to-one) | High performance; failure isolation; acts as a load balancer; reduced coupling | Extra maintenance cost; the queue may become a bottleneck
Asynchronous | Publish/Subscribe (one-to-many) | Not needed here: we only need one-to-one communication |
Summary
• Find an approach to distribute unique URLs to downloaders
• The approach should pass the fault tolerance criteria
• We can use the communication methods table to choose the final solution
High Concurrency
Approaches
Approach 1: Range based physical shard
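$$
\begin{aligned}
n &: \text{total URLs} \\
m &: \text{number of partitions} \\
i &\in [0 \dots m-1]: \text{partition number} \\
k &= \left\lfloor \frac{n}{m} \right\rfloor: \text{number of URLs in a partition} \\
\mathit{start}_i &= k \cdot i \\
\mathit{end}_i &= \begin{cases} \mathit{start}_i + k, & 0 \le i < m-1 \\ \mathit{start}_i + k + (n \bmod m), & i = m-1 \end{cases}
\end{aligned}
$$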
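The range formula above can be sketched in a few lines. Two assumptions are made here that the slide does not state: k uses integer division, and each range is half-open, [start, end), so that one partition's end is the next partition's start. The function name is illustrative.

```python
def partition_bounds(n: int, m: int, i: int) -> tuple[int, int]:
    """Return (start, end) for partition i of m over n URLs, end exclusive."""
    k = n // m               # URLs per partition
    start = k * i
    end = start + k
    if i == m - 1:
        end += n % m         # the last partition absorbs the remainder
    return start, end


# 10 URLs over 3 partitions: the last partition takes the extra URL.
print([partition_bounds(10, 3, i) for i in range(3)])  # [(0, 3), (3, 6), (6, 10)]
```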
Approach 1: Range based physical shard
Solves race condition: Solved
Fault tolerance
• No rework
• Need to download the image again if a crash happens during face detection
• A partition can be abandoned
Communication types: HTTP/gRPC
Notes
• Pros
• Non-locking at the DB level
• Cons
• Takes time to prepare
• Hard to scale out/adjust
• Needs a load balancer
Approach 2: Logical shard
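$$
\begin{aligned}
n &: \text{total URLs} \\
m &: \text{number of processes} \\
\mathit{id} &\in [0 \dots m-1]: \text{process id} \\
k &= \left\lfloor \frac{n}{m} \right\rfloor \\
i &: \text{index of the URL being processed within a process} \\
i &\in \begin{cases} [0 \dots k-1], & \mathit{id} < m-1 \\ [0 \dots k-1+(n \bmod m)], & \mathit{id} = m-1 \end{cases} \\
f(i) &: \text{the id of the URL we need to pick} \\
\Rightarrow f(i) &= \mathit{id} \cdot k + i
\end{aligned}
$$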
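The mapping f(i) = id * k + i can be sketched as below. It assumes integer division for k and zero-based URL ids (the example table starts at 1, so a real table would need an offset); the function name is illustrative.

```python
def url_ids_for_process(n: int, m: int, process_id: int) -> list[int]:
    """URL ids that one process picks, following f(i) = id * k + i."""
    k = n // m
    # The last process also takes the n mod m leftover URLs.
    count = k + (n % m if process_id == m - 1 else 0)
    return [process_id * k + i for i in range(count)]


# 10 URLs over 3 processes: every id is picked by exactly one process.
for process_id in range(3):
    print(process_id, url_ids_for_process(10, 3, process_id))
```

Because each process derives its ids from its own process id, no coordination or locking between processes is needed, at the price of keeping n, m and the current i as extra state.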
Approach 2: Logical shard
Solves race condition: Solved
Fault tolerance
• No rework
• Need to download the image again if a crash happens during face detection
• A partition can be abandoned
Communication types: HTTP/gRPC
Notes
• Pros
• Non-locking at the DB level
• Simple implementation
• Cons
• Hard to scale out/adjust
• High database throughput
• Extra state to maintain: total URLs, current URL id,…
Approach 3: Queue/Worker x Logical Sharding
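$$
\begin{aligned}
n &: \text{total URLs} \\
m &: \text{number of processes} \\
\mathit{id} &\in [0 \dots m-1]: \text{process id} \\
k &= \left\lfloor \frac{n}{m} \right\rfloor \\
i &: \text{index of the URL being processed within a process} \\
i &\in \begin{cases} [0 \dots k-1], & \mathit{id} < m-1 \\ [0 \dots k-1+(n \bmod m)], & \mathit{id} = m-1 \end{cases} \\
f(i) &: \text{the id of the URL we need to pick} \\
\Rightarrow f(i) &= \mathit{id} \cdot k + i
\end{aligned}
$$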
Approach 3: Queue/Worker x Logical Sharding
Solves race condition: Solved
Fault tolerance
• No rework
• Failure isolation
• Nodes are replaceable
Communication types: Messaging
Notes
• Pros
• Easy to scale
• Easy fault tolerance
• Failure isolation
• Asynchronous
• Cons
• Extra infrastructure
• High throughput on the queue
END
Questions
• How do we measure and debug the services?
• What is the deployment process?
Q&A
THANK YOU FOR YOUR ATTENTION

(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 

Último (20)

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 

Grokking Techtalk #37: Data intensive problem

  • 1. Ho Nguyen • Senior Software Engineer • Technical Interests: • Solution & code design • Distributed systems • Video/Image encoding • Hobbies • Movies & music • Manga & anime (One Piece, Dragon Ball...) • Coffee lover
  • 3. Outline • Simple problem • When the data is big • More problems • Approaches
  • 9. When the data is big
  • 10. How big is the data? • A data set of 2 billion records of unique URLs • Assuming the previous program needs 2 seconds per URL => Concurrency number = 0.5 URL/s • Total time: $\frac{2 \times 2 \times 10^{9}}{3600 \times 24} \approx 46296\ \text{days} \approx 127\ \text{years}$
  • 11. What is the concurrency number we need to complete the dataset in X days?
  • 12. What is the concurrency number we need? • Goal: X = 7 days • 2 billion URLs • Current concurrency: 0.5 URL/s • Required: $\frac{2 \times 10^{9}}{X \times 3600 \times 24} = \frac{2 \times 10^{9}}{7 \times 3600 \times 24} \approx 3307\ \text{URLs/s}$
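The arithmetic on slides 10 and 12 can be checked with a short script. This is only a sketch: the function names are illustrative, and the 365.25 days per year figure is taken from the speaker notes.

```python
SECONDS_PER_DAY = 3600 * 24


def days_to_finish(total_urls: int, seconds_per_url: float) -> float:
    """Days needed when URLs are processed one after another."""
    return total_urls * seconds_per_url / SECONDS_PER_DAY


def required_rate(total_urls: int, deadline_days: float) -> float:
    """URLs per second needed to finish within the deadline."""
    return total_urls / (deadline_days * SECONDS_PER_DAY)


TOTAL_URLS = 2 * 10**9   # 2 billion URLs
CURRENT_RATE = 0.5       # URL/s, i.e. 2 seconds per URL

days = days_to_finish(TOTAL_URLS, seconds_per_url=2.0)
years = days / 365.25
target = required_rate(TOTAL_URLS, deadline_days=7)
increase_pct = (target - CURRENT_RATE) / CURRENT_RATE * 100

print(f"{days:.0f} days, about {years:.0f} years")
print(f"{target:.0f} URLs/s needed, an increase of about {increase_pct:.0f}%")
```

The increase printed here is computed from the unrounded rate, so it comes out slightly below the 661,300% quoted on the slides, which starts from the rounded 3307 URLs/s.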
  • 13. How to increase concurrency? • Optimize code performance • Increase hardware resources (CPU, RAM, Disk, Network…), aka Scale-up • Scale-out • Cloning to multiple processes (X-Axis) • Splitting by functions (Y-Axis) • Data partitioning (Z-Axis)
  • 14. Optimize code • Pros • Most effective if we find a bottleneck whose fix increases performance by about 661,300% • Saves infrastructure cost • Cons • Time-consuming and uncertain
  • 15. Scale-up • Pros • Easy to apply • Cons • Takes time to find a suitable hardware configuration • Expensive and limited • Still need to optimize code and redesign to take advantage of hardware resources once we can no longer scale up
  • 16. Scale-out by cloning (X-Axis) • Pros • Can use all hardware resources • Not limited by hardware • Cons • More complex than scale-up • Concurrency problems • [Diagram: Node 1, Node 2, Node 3]
  • 17. Scale-out by Splitting (Y-Axis) • Review the workflow
  • 18. Scale-out by Splitting (Y-Axis) • Download and resize image using CPU • Face detection on GPU is faster • Reference: https://sites.google.com/site/facedetectionongpu/
  • 19. Scale-out by Splitting (Y-Axis) • [Diagram: X-axis (cloning): three "Download and Process Image" nodes and two "Face Detection" nodes; Y-axis (splitting): "Download and Process Image" separated from "Face Detection"]
  • 20. Scale-out by Splitting (Y-Axis) • Pros • Takes advantage of the hardware best suited to each function • Cons • Complex • Concurrency problems
  • 21. Scale-out by data-partitioning (Z-Axis) • Data schema:
      ID | URL                        | Done
      1  | https://abc.com/image1.jpg | 1
      2  | https://abc.com/image2.jpg | 0
      3  | https://abc.com/image3.jpg | 0
      4  | https://abc.com/image4.jpg | 0
  • 22. Scale-out by data-partitioning (Z-Axis) • Key hashing:
      Partition 1:
      ID | URL                        | Done
      1  | https://abc.com/image1.jpg | 0
      3  | https://abc.com/image3.jpg | 0
      Partition 2:
      ID | URL                        | Done
      2  | https://abc.com/image2.jpg | 0
      4  | https://abc.com/image4.jpg | 0
  • 23. Scale-out by data-partitioning (Z-Axis) • Range based:
      Partition 1:
      ID | URL                        | Done
      1  | https://abc.com/image1.jpg | 0
      2  | https://abc.com/image2.jpg | 0
      Partition 2:
      ID | URL                        | Done
      3  | https://abc.com/image3.jpg | 0
      4  | https://abc.com/image4.jpg | 0
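The two partitioning schemes on slides 22 and 23 can be sketched as follows. This assumes the simplest possible key hash (ID modulo the number of partitions), which reproduces the grouping shown on the slide; the deck does not say which hash function was actually used, and the function names are illustrative.

```python
def hash_partition(record_id: int, num_partitions: int) -> int:
    """Key hashing: here simply the ID modulo the partition count (an assumption)."""
    return record_id % num_partitions


def range_partition(record_id: int, first_id: int, partition_size: int) -> int:
    """Range based: consecutive IDs land in the same partition."""
    return (record_id - first_id) // partition_size


ids = [1, 2, 3, 4]
by_hash: dict[int, list[int]] = {}
by_range: dict[int, list[int]] = {}
for record_id in ids:
    by_hash.setdefault(hash_partition(record_id, 2), []).append(record_id)
    by_range.setdefault(range_partition(record_id, 1, 2), []).append(record_id)

print(by_hash)   # odd IDs together, even IDs together
print(by_range)  # IDs 1-2 together, IDs 3-4 together
```

Key hashing spreads neighbouring IDs across partitions, while range partitioning keeps them together, which matters when records are scanned in ID order.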
  • 24. Scale-out by data-partitioning (Z-Axis) • Pros • Increases database performance • Reduces locking (or avoids it entirely) • Cons • Increases maintenance and infrastructure cost • Hard to scale automatically
  • 25. Summary • Skip the code optimization approach • Skip the scale-up approach • Focus on scale-out approaches • We can increase the number of processes/machines to increase the concurrency number • We can split into 2 services: Downloader and Face Detection • We may need data partitioning to optimize database performance
  • 28. Race condition • Cause • The same URL is processed twice or more • Impact • Waste of resources • Data corruption • Inflated (fake) concurrency number
  • 29. Race condition: How to solve?
      • Distributed locks • Pros • N/A • Cons • Pessimistic locking hurts performance • Hard to apply because multiple nodes must be synchronized • Poor fault tolerance
      • Data sharding • Pros • High performance because the load is shared (physical shard) • Cons • Hard to scale • Increases maintenance & infrastructure cost
      • Queue/Worker • Pros • Easy to implement • Easy to scale • Good fault tolerance • Reusable communication • Cons • The load concentrates on the queue, so it can become a bottleneck
  • 30. Race condition: root cause • The race condition only occurs between Downloaders => If we find a way to give each downloader its own unique URLs, the race condition is solved for the whole system.
  • 31. Fault Tolerance • Faults • Network fault • Network interruption • IP Blocking • Service crash • Problems • Can data be lost? • Can the service restart and continue to work on remaining tasks?
  • 32. Fault Tolerance criteria
      Given | When | Then
      A service crashed | It restarted | No rework (continue on remaining items only)
      Downloader service is running | It crashed | All downloaded images should not be lost
      FaceDetector service is running | It crashed | All detected results should not be lost
      Downloader is downloading an image | Network error happens | Retry
      Downloader retries downloading an image | Network error is IP blocking | Should rotate proxy to change the IP
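The last two criteria (retry on network error, rotate the proxy on IP blocking) could look roughly like the sketch below. Everything here is hypothetical: the exception classes, the `fetch` callable, and the proxy names are stand-ins for illustration, not part of the original program.

```python
from typing import Callable, Optional, Sequence


class NetworkError(Exception):
    """Any transient network failure (hypothetical type for this sketch)."""


class IPBlockedError(NetworkError):
    """The remote host refused us because of our IP (hypothetical type)."""


def download_with_retry(
    url: str,
    fetch: Callable[[str, Optional[str]], bytes],
    proxies: Sequence[str],
    max_attempts: int = 3,
) -> bytes:
    """Retry a download; switch to the next proxy only when the IP is blocked."""
    proxy_index = -1  # -1 means "direct connection, no proxy yet"
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        proxy = proxies[proxy_index] if proxy_index >= 0 else None
        try:
            return fetch(url, proxy)
        except IPBlockedError as err:
            last_error = err
            if proxies:  # rotate to the next proxy to change the IP
                proxy_index = (proxy_index + 1) % len(proxies)
        except NetworkError as err:
            last_error = err  # plain retry over the same route
    raise NetworkError(f"giving up on {url} after {max_attempts} attempts") from last_error
```

A real downloader would likely add a back-off delay between attempts; that is left out to keep the sketch short.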
  • 33. Service communication • How do the services communicate? • Do we need a load balancer?
  • 34. Service communication methods
      Type | Method | Pros | Cons
      Synchronous | HTTP | Familiar and simple to use | Needs a load balancer; tight coupling; blocks a thread while waiting for the response
      Synchronous | RPC | Higher performance than HTTP |
      Asynchronous | Queue messaging (one-to-one) | High performance; failure isolation; acts as a load balancer; reduced coupling | Extra maintenance cost; queue may become a bottleneck
      Asynchronous | Publish/Subscribe (one-to-many) | | We only need one-to-one communication
  • 35. Summary • Find an approach to distribute unique URLs to downloaders • The approach should pass the fault tolerance criteria • We can use the communication methods table to choose the final solution
  • 37. Approach 1: Range based physical shard • $n$: total URLs • $m$: number of partitions • $i \in [0, m-1]$: partition number • $k = \lfloor n/m \rfloor$: number of URLs in a partition • $\mathit{start}_i = k \cdot i$ • $\mathit{end}_i = \begin{cases} \mathit{start}_i + k, & 0 \le i < m-1 \\ \mathit{start}_i + k + (n \bmod m), & i = m-1 \end{cases}$
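The range formula for Approach 1 translates directly into code. A minimal sketch, assuming integer division for k and treating the end of each range as exclusive (the slide does not say whether the end is inclusive); the function name is illustrative.

```python
from typing import Tuple


def partition_range(n: int, m: int, i: int) -> Tuple[int, int]:
    """Return (start, end) for partition i, with end exclusive (an assumption)."""
    if not 0 <= i < m:
        raise ValueError("partition number must be in [0, m-1]")
    k = n // m              # URLs per partition
    start = k * i
    end = start + k
    if i == m - 1:          # the last partition also takes the remainder
        end += n % m
    return start, end


ranges = [partition_range(10, 3, i) for i in range(3)]
print(ranges)  # [(0, 3), (3, 6), (6, 10)]
```

Because the remainder all goes to the last partition, that partition can hold up to m - 1 more URLs than the others.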
  • 38. Approach 1: Range based physical shard
      Solves race condition | Fault tolerance | Communication types | Notes
      Solved | + No rework; + Need to download the image again if a crash happens during face detection; + Partition can be abandoned | HTTP/gRPC | Pros: non-locking at the DB level. Cons: takes time to prepare; hard to scale out/adjust; needs a load balancer
  • 39. Approach 2: Logical shard • $n$: total URLs • $m$: number of processes • $\mathit{id} \in [0, m-1]$: process id • $k = \lfloor n/m \rfloor$ • $i$: index of the URL being processed within a process • $i \in [0, k-1]$ when $\mathit{id} < m-1$ • $i \in [0, k-1+(n \bmod m)]$ when $\mathit{id} = m-1$ • $f(i)$: the ID of the URL we need to pick => $f(i) = \mathit{id} \cdot k + i$
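With logical sharding, each process computes its own URL IDs from its process id instead of reading a pre-split table. A small sketch of $f(i) = \mathit{id} \cdot k + i$, assuming zero-based URL IDs as the formula implies; the function name is illustrative.

```python
from typing import Iterator


def url_ids_for_process(n: int, m: int, process_id: int) -> Iterator[int]:
    """Yield the URL IDs that one process is responsible for."""
    k = n // m
    # The last process also handles the n mod m leftover URLs.
    count = k + (n % m if process_id == m - 1 else 0)
    for i in range(count):
        yield process_id * k + i


assignment = {pid: list(url_ids_for_process(10, 3, pid)) for pid in range(3)}
print(assignment)
```

No two processes ever compute the same ID, which is what removes the race condition without any locking at the database level.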
  • 40. Approach 2: Logical shard
      Solves race condition | Fault tolerance | Communication types | Notes
      Solved | + No rework; + Need to download the image again if a crash happens during face detection; + Partition can be abandoned | HTTP/gRPC | Pros: non-locking at the DB level; simple implementation. Cons: hard to scale out/adjust; high database throughput; extra state to maintain (total URLs, current URL ID, …)
  • 41. Approach 3: Queue/Worker x Logical Sharding • $n$: total URLs • $m$: number of processes • $\mathit{id} \in [0, m-1]$: process id • $k = \lfloor n/m \rfloor$ • $i$: index of the URL being processed within a process • $i \in [0, k-1]$ when $\mathit{id} < m-1$ • $i \in [0, k-1+(n \bmod m)]$ when $\mathit{id} = m-1$ • $f(i)$: the ID of the URL we need to pick => $f(i) = \mathit{id} \cdot k + i$
  • 42. Approach 3: Queue/Worker x Logical Sharding
      Solves race condition | Fault tolerance | Communication types | Notes
      Solved | + No rework; + Failure isolation; + Node is replaceable | Messaging | Pros: easy to scale; easy fault tolerance; failure isolation; asynchronous. Cons: extra infrastructure; high throughput on the queue
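The core property of the Queue/Worker approach is that each URL is handed to exactly one worker. The sketch below illustrates this with Python's in-process `queue.Queue` and threads, which stand in for a real message broker and separate worker nodes; `run_workers` and `handle` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Tuple


def run_workers(
    urls: Iterable[str],
    num_workers: int,
    handle: Callable[[str], object],
) -> Dict[str, List[Tuple[int, object]]]:
    """Process every URL once, recording which worker handled it."""
    tasks: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        tasks.put(url)  # all work is enqueued before the workers start

    results: Dict[str, List[Tuple[int, object]]] = {}
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                url = tasks.get_nowait()  # each item is delivered to one worker only
            except queue.Empty:
                return  # nothing left to do
            outcome = handle(url)
            with lock:
                results.setdefault(url, []).append((worker_id, outcome))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Adding workers only means starting more consumers on the same queue, which is why this approach scores "easy to scale" in the table. This in-memory version does not redeliver a task if a worker dies mid-task; a real broker would need acknowledgements for that.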
  • 43. END
  • 44. Questions • How do we measure and debug the services? • What is the deployment process?
  • 45. Q&A THANK YOU FOR YOUR ATTENTION

Editor's Notes

  1. Hello everyone. Shall we get started? Let me introduce myself: I'm Ho, a Senior Software Engineer at AXON. At AXON, we "Write Code, Save Lives". On the technical side, I enjoy solution design and code design, and I also like digging into video and image encoding. Outside of coding, I'm a "normal" person who enjoys listening to music, watching movies, reading manga, and watching anime.
  2. Has anyone here ever had to think hard about optimizing code to make a program run faster? What made you need to optimize it?
  3. Today I want to share a very simple problem that became quite interesting once the amount of data to process got very large. It is something I ran into myself. The goal is to give you more perspectives for solving problems in your own work ;) I believe that choosing the right technology makes a solution more optimal, but in this talk I won't focus on technology selection. I'm not claiming that the solutions I present are the best ones.
  4. I received a request to write a program like this: take the URL of an image as input, download that image, process it, and find the positions of the faces in it. => To make this easier to follow, let's look at the program's diagram.
  5. I believe everyone here could write this program. => And here is the program's code.
  6. Some of you may say that the face detection part is quite complex without machine learning knowledge. That's true, but fortunately face detection is a fairly common problem and you can use an existing library such as OpenCV or TensorFlow. => And here is the code for the face detection part.
  7. Using existing face detection code.
  8. And here is the result.
  9. You can see the original problem is quite simple, right? But that is the problem with a single URL. What if we have 2 billion URLs? How big is 2 billion URLs? How long would it take to process them all? Those were the questions I asked myself when my boss asked me to use the original program to process all 2 billion image URLs in an existing data set. => So let's analyze it together.
  10. 2 billion images to download. If the earlier program needs 2 seconds to finish, then I need 127 years to process the set of 2 billion images. If I went back to my boss and said I need 127 years to process the data set, what do you think would happen? T = S/V, with 1 year = 365.25 days.
  11. The problem: what concurrency number do we need in order to finish the set of 2 billion images in the number of days we want?
  12. We need to raise the concurrency number from 0.5 to 3307 URL/s, which is an increase of about 661,300%. V = S/T.
  13. There are many ways to increase the concurrency number, but in my experience they fall into 3 main groups. Optimizing code can help you increase concurrency. Increasing hardware, for example a faster CPU, faster disk reads and writes, or more RAM. I used to push customers to increase the IOPS of their database server to speed things up. Scale-out, which has 3 methods: cloning to multiple nodes/processes, splitting by function, and partitioning the data for processing. We will analyze each one.
  14. Optimizing code can give extremely good results if we find an algorithm that is many times more efficient. But we have to spend a lot of time and effort finding the place that needs optimizing.
  15. In the cloud computing era, you can get an extremely powerful server with just a few clicks. But it is very expensive (AZURE Calculator) https://st.ht/M6rGb And you will hit the limit soon. I thought this approach was not very interesting.
  16. Scaling out horizontally, as I said, means taking an existing node and cloning it into many nodes that process data at the same time. With this approach you are not limited by hardware, and fault tolerance is high. But it is hard to deploy, for example because of race conditions. For the current problem, how can a race condition occur?
  17. Scaling out along the vertical axis means splitting one node that contains all the functions into multiple nodes, each with one function. Microservices architecture is based on this kind of split. We need to review the program's functions to understand the main ones and the benefits gained by splitting them apart. Luckily my program's functions are very simple, so I quickly split it into 2 parts. Can anyone help me split it?
  18. By splitting into 2 groups, my system can take advantage of the strengths of the hardware.
  19. This is the diagram of how the nodes are split. At this point you can see that, starting from a simple problem, we are now facing more complex ones. But it doesn't stop here. There are some problems that need solving.
  20. => We have a basic approach, scale-out, but there are still some issues we need to pay attention to.
  21. I call them "High Concurrency Problems" because the goal is to solve the issues that must be handled in order to increase the concurrency number.
  22. Multiple nodes process the same URL. The saved results can get mixed up. So how do we solve the race condition? How do we get the best performance? This is what we need to keep in mind when looking for a solution.
  23. To avoid handing out duplicate URLs, we have 3 solutions. Distributed locks: can be implemented with Redis Set/Get and the NX option. Data sharding: a way of scaling out; we need to find a way to split the data effectively, and it costs preparation effort as well as infrastructure. Queue: easy to implement, easy to scale. To save time, I will skip the distributed locks part.
  24. Fault tolerance is something I always think about when designing a system. Accounting for the failures that can happen helps you reduce the risks your system may face. For example: losing data, or redoing work from scratch.
  25. How do the services communicate with each other? Do we need a load balancer?
  26. Next we will look for ways to solve the problems above and find an approach.
  27. To approach the solution, I will go through each problem from the previous part and how to solve it. First is the race condition.
  28. The cost of preparation and deployment is high. Hard to scale. Adding a new node: we can't assign a new node to an existing partition, so we need to re-shard the data and start again with the new number of partitions. Removing a node (a node crash): the partition of this node can be abandoned.
  30. The load is concentrated on a single database, so the database can become the bottleneck (we can solve this by giving the database node more hardware resources). Hard to scale. Removing a node (or a node crash): we need to recover the node (or add a new one with the same id) if we don't want that node's partition to be abandoned. Adding a new node: restart all nodes in the system with the new "Number Of Processes" value. We also need to maintain the number of remaining URLs (or the processed URLs).
  32. The Delegator/Load Balancer is a service that fetches URLs from the URLs database and pushes them into the Queue, which is consumed by Workers (download & process image). Queue: an abstraction; we can build this queue inside the Delegator/Load Balancer or use an open-source project like RabbitMQ/Kafka. Process (1..N): a Worker that consumes URLs from the Queue for processing, then pushes another queue item for the Face Detection phase. This approach has some advantages. We achieve fault tolerance and scalability naturally: when a worker fails, its URLs will be handled by other workers, and when we add a worker, it will consume outstanding URLs in the queue. We can reuse the Queue system for Face Detection and reuse the result of "Download & Process image": we save the downloaded and processed image in storage before putting the task item into the Face Detection queue. The implementation is the same for a single goroutine or multiple goroutines in a node, so we have flexibility.
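The two-stage flow in this note (save the processed image to storage first, then enqueue the face detection task) can be sketched as below. This is a toy, in-process illustration: `queue.Queue` stands in for RabbitMQ/Kafka, a dict stands in for image storage, and the download and detection steps are replaced by placeholder strings. All names are hypothetical.

```python
import queue
import threading
from typing import Dict, Iterable, Tuple

STOP = object()  # sentinel telling a stage that no more work is coming


def downloader(url_queue: queue.Queue, face_queue: queue.Queue, storage: Dict[str, str]) -> None:
    while True:
        url = url_queue.get()
        if url is STOP:
            face_queue.put(STOP)  # pass the shutdown signal downstream
            return
        storage[url] = f"resized:{url}"  # save the result BEFORE enqueueing the next task
        face_queue.put(url)


def face_detector(face_queue: queue.Queue, storage: Dict[str, str], results: Dict[str, str]) -> None:
    while True:
        url = face_queue.get()
        if url is STOP:
            return
        results[url] = f"faces-in:{storage[url]}"  # reads the stored image, no re-download


def run_pipeline(urls: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    url_queue: queue.Queue = queue.Queue()
    face_queue: queue.Queue = queue.Queue()
    storage: Dict[str, str] = {}
    results: Dict[str, str] = {}
    stages = [
        threading.Thread(target=downloader, args=(url_queue, face_queue, storage)),
        threading.Thread(target=face_detector, args=(face_queue, storage, results)),
    ]
    for stage in stages:
        stage.start()
    for url in urls:  # this loop plays the role of the Delegator
        url_queue.put(url)
    url_queue.put(STOP)
    for stage in stages:
        stage.join()
    return storage, results
```

Because the image is stored before the face detection task is queued, a crash in the face detection stage does not force the image to be downloaded again.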