Stream Upload And Asynchronous Job Processing System
Lê Bá Minh – minhlb@vng.com.vn
Technical Manager – Zalo Team - VNG
Agenda
• 1/ Why do we need an Asynchronous Job Processing System?
• 2/ How does it work?
• 3/ Applications
• 4/ Q&A
Parallel Stream Upload
• Data is split into chunks
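
The chunking idea can be sketched as follows. This is a minimal illustration, not the Zalo implementation: the `Chunk` fields, `upload_id`, and the chunk size are all assumptions introduced here so that chunks can be sent in parallel and put back in order on the server side.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    upload_id: str   # hypothetical identifier tying chunks to one upload
    index: int       # position of this chunk within the upload
    total: int       # total number of chunks in the upload
    payload: bytes


def split_into_chunks(upload_id: str, data: bytes, chunk_size: int) -> List[Chunk]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty upload still yields one (empty) chunk.
    total = max(1, (len(data) + chunk_size - 1) // chunk_size)
    return [
        Chunk(upload_id, i, total, data[i * chunk_size:(i + 1) * chunk_size])
        for i in range(total)
    ]


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the original data from chunks that may arrive out of order."""
    ordered = sorted(chunks, key=lambda c: c.index)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("missing chunks")
    return b"".join(c.payload for c in ordered)
```

Because each chunk carries its own index and the total count, the receiver does not depend on arrival order, which is what makes parallel upload possible.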
Facts
• Zalo Stream Upload
  • Background continuous voice upload
  • Background image upload
  • …
• Facts (now)
  • 1M voices/day
  • 800K images/day
  • Peak: 500 chunks/second
• Expected:
  • Scalable (more than 5000 chunks/second)
  • High performance
What we need
• Asynchronous job processing system
[Diagram: two pipelines built from the stages Collect Data, Processing Data and Response; the second one includes Workers]
What we need
• Asynchronous job processing system
• Batch jobs
• Big data jobs
• Highly reliable: no job missed
• Distributed job processing workers
• High performance
• Persistent
• Load balancing, fail-over, recoverable
Open-source solutions
• Shared-memory workers
  • All workers on one physical server
  • No fail-over
  • Not scalable
• Gearman
  • Good, but does not completely fit our requirements
  • No batch job support
  • Not fully reliable (jobs can be lost)
  • Not fully load-balanced
  • Unstable above 2000 jobs/second
Zalo Async Job Processing System
[Diagram: Clients and Workers 1-3 connect to the Job Server over TCP (connection labels: Short Connection, Long Connection). Job Server components: Worker Manager, Job Caching, Job Manager, Persistent Manager, Job Clean-Up. Storage: Z Database.]
Implementation
• C/C++ for Job Server
• C/C++, Java for client and workers
• Binary Protocol
• Z-Database
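
The slides do not describe the wire format, so the following is only an illustration of what a length-prefixed binary protocol over TCP can look like. The header layout (1-byte job type, 8-byte job id, 4-byte payload length, big-endian) and the function names are assumptions made for this sketch.

```python
import struct

# Hypothetical header layout, NOT the actual Zalo protocol:
# job_type (1 byte), job_id (8 bytes), payload length (4 bytes), big-endian.
HEADER = struct.Struct(">BQI")


def encode_frame(job_type: int, job_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size binary header."""
    return HEADER.pack(job_type, job_id, len(payload)) + payload


def decode_frame(buf: bytes):
    """Parse one frame from the start of buf.

    Returns (job_type, job_id, payload, remaining_bytes).
    Raises ValueError if buf does not yet hold a complete frame.
    """
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    job_type, job_id, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return job_type, job_id, buf[HEADER.size:end], buf[end:]
```

A fixed-size header with an explicit length lets the receiver cut frames out of a TCP byte stream without scanning for delimiters, which is one common reason to prefer a binary protocol at high job rates.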
Job State
[Diagram: job state machine. States: Queuing, Processing, Failed, Time Out, Finished. Transition labels: Started, Deliver to Worker, Worker ACK Failed, Worker ACK Finished, No ACK.]
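
One plausible reading of the job state diagram is sketched below. The state and event names come from the slide, but the mapping of each event to a source and target state is an assumption, since the arrows did not survive extraction. A newly started job is assumed to enter Queuing, and whether Failed or Time Out jobs are re-queued is not stated, so it is not modeled.

```python
from enum import Enum


class JobState(Enum):
    QUEUING = "Queuing"
    PROCESSING = "Processing"
    FAILED = "Failed"
    TIMED_OUT = "Time Out"
    FINISHED = "Finished"


class Event(Enum):
    DELIVER_TO_WORKER = "Deliver to Worker"
    WORKER_ACK_FAILED = "Worker ACK Failed"
    WORKER_ACK_FINISHED = "Worker ACK Finished"
    NO_ACK = "No ACK"


# Assumed transition table; the original arrows are not recoverable.
TRANSITIONS = {
    (JobState.QUEUING, Event.DELIVER_TO_WORKER): JobState.PROCESSING,
    (JobState.PROCESSING, Event.WORKER_ACK_FAILED): JobState.FAILED,
    (JobState.PROCESSING, Event.WORKER_ACK_FINISHED): JobState.FINISHED,
    (JobState.PROCESSING, Event.NO_ACK): JobState.TIMED_OUT,
}

INITIAL_STATE = JobState.QUEUING  # assumed state right after "Started"


def next_state(state: JobState, event: Event) -> JobState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event.value!r} is not valid in state {state.value!r}")
```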
Job Type
• Single Job
  • Simple task
  • Delivered immediately
• Batch Job
  • Multiple tasks
  • Delivered once all tasks have been received
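
The difference between the two job types can be sketched as follows. This is an assumption-laden illustration: `BatchCollector`, `submit_single`, the `deliver` callback, and the `batch_id`/`index`/`total` fields are hypothetical names, and the real system persists jobs rather than holding them in a dictionary.

```python
from typing import Callable, Dict, List

Deliver = Callable[[str, List[bytes]], None]


def submit_single(deliver: Deliver, job_id: str, task: bytes) -> None:
    """A single job carries one task and is delivered immediately."""
    deliver(job_id, [task])


class BatchCollector:
    """Holds the tasks of a batch job until all of them have arrived."""

    def __init__(self, deliver: Deliver) -> None:
        self._deliver = deliver
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def add_task(self, batch_id: str, index: int, total: int, task: bytes) -> bool:
        """Store one task; deliver the batch once `total` distinct tasks exist.

        Returns True if this call completed and delivered the batch.
        """
        tasks = self._pending.setdefault(batch_id, {})
        tasks[index] = task  # a repeated index overwrites, so resends are harmless
        if len(tasks) < total:
            return False
        del self._pending[batch_id]
        self._deliver(batch_id, [tasks[i] for i in sorted(tasks)])
        return True
```

Keying tasks by index means the batch is complete only when every distinct task is present, regardless of arrival order.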
Deployment
[Diagram: Job Server 1 and Job Server 2 are kept synchronized; also shown are the Business Server and Workers 1-3.]
Applications
• Used for all asynchronous job processing in Zalo: voice upload, image upload, feed processing…
• Benchmark (single server)
  • 50K images/second (640x480)
  • 50K voices/second (30s)
• Advantages
  • Batch jobs
  • Jobs are never lost
  • Workers can restart or stop at any time
  • Fail-over, load balancing, quick recovery from failure
• Issue
  • Job duplication (handled by the worker)
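
The slides say only that job duplication is handled by the worker, not how. One common approach is to make the worker idempotent by remembering the ids of jobs it has already processed. The sketch below assumes every job carries a unique `job_id` and keeps results in memory; the class name is hypothetical, and a real worker would likely need a persistent or shared store to survive restarts.

```python
from typing import Callable, Dict


class IdempotentWorker:
    """Runs the handler at most once per job id and replays the stored result."""

    def __init__(self, handler: Callable[[bytes], object]) -> None:
        self._handler = handler
        self._results: Dict[str, object] = {}  # in-memory only (assumption)

    def process(self, job_id: str, payload: bytes) -> object:
        if job_id in self._results:
            # Duplicate delivery: skip the work and return the earlier result.
            return self._results[job_id]
        result = self._handler(payload)
        self._results[job_id] = result
        return result
```

With this pattern the job server can safely redeliver a job when it is unsure whether it finished, so no job is lost, and the cost of that guarantee is paid by the worker as a cheap lookup.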
Q&A
Stream upload and asynchronous job processing  in large scale systems
