SlideShare uma empresa Scribd logo
1 de 29
Coupang Confidential and Proprietary
이 문서는 쿠팡의 대외비이며 지적자산입니다
Journey to the Continuous and
Scalable Big Data Platform
Matthew (정재화), Coupang
Coupang Confidential and Proprietary
About me
02
• Software Development Manager of BigData & DW Platform team
• 8+ years Hadoop experience
• Apache Tajo Committer and PMC
• blrunner78@gmail.com
• Blog : https://blrunner.tistory.com
• The author of Hadoop tech hand book
Coupang Confidential and Proprietary
Agenda
03
1. On-Premise
2. Cloud 1.0
3. Cloud 2.0
4. Airflow as a Service
5. Zeppelin as a Service
Coupang Confidential and Proprietary
Motivation
04
The purpose of a business is to
create and keep a customer
- Peter Drucker -
Coupang Confidential and Proprietary
1. On-Premise
Coupang Confidential and Proprietary
Architecture
06
• Aggregations and Joins
• MapReduce
• Hive/Pig/Spark
• Oozie
Logs
• Client Logs
• Server Logs
• Adhoc Query
• HiveRDBMS
External Data
ETL Cluster Read-Only Cluster
Coupang Confidential and Proprietary
Team's Responsibility
07
• Architect, build and operate our data infrastructure and tools
• Create and maintain company-wide data pipeline
• Troubleshoot and resolve all issues as users arise
Coupang Confidential and Proprietary
Areas of Improvements
08
• Pros
• A wide variety of workloads
• Continuous increase in users
• Cons
• Multiple copies of Data
• Lack of Elasticity
• Operation overhead
Coupang Confidential and Proprietary
2. Cloud 1.0
Coupang Confidential and Proprietary
Architecture : Decouple compute and storage
010
Domain Cluster #N
Domain Cluster #2
Centralized Resources
Hive
Meta store
Cloud Storage
Batch Cluster
HiveServer2
Ad-hoc Cluster
HiveServer2
Domain Cluster #1
HiveServer2
- Batch Jobs
- High throughput
- fault tolerant, ETL
- Ad-hoc Queries
- Low latency
- Interactive Analysis
- In-memory
Coupang Confidential and Proprietary
Team's Responsibility
011
• Architect, build and operate our data infrastructure and tools
• Troubleshoot and resolve all issues as users arise
• Implement company-wide data pipelines
Coupang Confidential and Proprietary
Areas of Improvements
012
• Pros
• Allows Parsing, Enriching of Data for Custom Need
• Independent scale of CPU and storage capacity
• Cons
• Learning Curve for Cloud Infrastructure
• Operation overhead
• Users want latest tools and more features
Coupang Confidential and Proprietary
3. Cloud 2.0
Coupang Confidential and Proprietary
High Level Architecture
014
Storage
Data Processing Tools
Scheduler Tools
Security
Airflow
LDAP Authentication Apache Ranger ACL & Audit
Zeppelin
Monitoring
Computing Clusters
Cloud Storage
Data Platorm
Portal
Coupang Confidential and Proprietary
Various types of Computing Clusters
015
Centralized Resource
Hive
Meta Store
Cloud
Storage
Transient Cluster
- Batch Jobs
Persistent Cluster
- Interactive Queries
Workload Specific Cluster
Coupang Confidential and Proprietary
Team's Responsibility
016
• Architect, build and our data infrastructure and tools
• Create data APIs and data services
• Support users using SLA policies
• Maintaining security and data privacy
• Application Knowledge Support Artifacts, etc.
Coupang Confidential and Proprietary
Areas of Improvements
017
• Pros
• Onboard lots of users and variety of jobs
• Easier management and added features
• Cons
• Unintended infrastructure costs have increased
• A wide variety of client tools and Dev environments
• Various types of users
Coupang Confidential and Proprietary
Lessons & Learnings
018
• Distribute traffic instead of concentrating the one place
• Optimize all types of system resources in clusters
• Enforce the Lifecycle of Hadoop Cluster
• Monitor clusters and send alarms from the efficiency perspective
• Training Users Continuously and building the community culture
Coupang Confidential and Proprietary
4. Airflow as a Service
Coupang Confidential and Proprietary
Why we love Airflow?
020
• Define Workflows as code
• Makes Workflows more maintainable, versionable, and testable
• More flexible execution and workflow generation
• Lots of features
• Sensor
• Workflow Profiling
• SLA alert
• Rich Web Interface
• Scalable Worker Processes
• In-house Airflow
Coupang Confidential and Proprietary
Airflow : deployment process
021
Cloud Storage
Coupang Confidential and Proprietary
5. Zeppelin as a Service
Coupang Confidential and Proprietary
Why we love Zeppelin?
023
• Easy spark development in personal computer
• Customized Presto Interpreter
• Run presto query easily without complex JDBC configuration
• Export the heavy data file to local machine without exception
• Persistent Storage for Notebook
Coupang Confidential and Proprietary
Zeppelin Architecture
024
Coupang Confidential and Proprietary
Areas of Improvements
025
• Users
• Load all notebooks in the main page -> Too slow
• Big notebook can consume most resources -> Zeppelin Pending
• Platform team
• Spark interpreter doesn’t support YARN cluster mode
• Doesn’t support the life cycle for notebooks
• Difficult to upgrade and improve existing zeppelins gracefully
Coupang Confidential and Proprietary
Resolution
026
• Upgrade Zeppelin to 0.8.1
• Main Page Improvements
• Yarn Cluster Mode for Spark Interpreter
• Interpreter Lifecycle manager
• Interpreter Recovery
• Containerized Zeppelin on Kubernetes
Coupang Confidential and Proprietary
Summary
027
• Understand who is the immediate customer
• Focus on the truly important things
• Detect and solve problems immediately
• Leverage the identity of infrastructure
• Best Practice is not best for you
Coupang Confidential and Proprietary
SELECT question FROM you
https://boards.greenhouse.io/coupang/
Coupang Confidential and Proprietary
Thank you

Mais conteúdo relacionado

Mais procurados

Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groupsUnbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groupsIsabelle Van Campenhoudt
 
FOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worldsFOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worldsAndrew Morgan
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartoriwalk2talk srl
 
Oracle WebLogic 12c New Multitenancy features
Oracle WebLogic 12c New Multitenancy featuresOracle WebLogic 12c New Multitenancy features
Oracle WebLogic 12c New Multitenancy featuresMichel Schildmeijer
 
2 Speed IT powered by Microsoft Azure and Minecraft
2 Speed IT powered by Microsoft Azure and Minecraft2 Speed IT powered by Microsoft Azure and Minecraft
2 Speed IT powered by Microsoft Azure and MinecraftSriram Hariharan
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 
Azure database services for PostgreSQL and MySQL
Azure database services for PostgreSQL and MySQLAzure database services for PostgreSQL and MySQL
Azure database services for PostgreSQL and MySQLAmit Banerjee
 
Overcoming 5 Common Docker Challenges: How We Do It at RightScale
Overcoming 5 Common Docker Challenges: How We Do It at RightScaleOvercoming 5 Common Docker Challenges: How We Do It at RightScale
Overcoming 5 Common Docker Challenges: How We Do It at RightScaleRightScale
 
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!Marco Obinu
 
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527Zohar Elkayam
 
Project Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on DockerProject Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on DockerRightScale
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLContinuent
 
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & BeyondAutomated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & BeyondJeremyOtt5
 
Cloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong CodeaholicsCloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong CodeaholicsTaswar Bhatti
 
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...Payara
 
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013RightScale
 
8 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 20188 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 2018Taswar Bhatti
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
 
Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaAmit Banerjee
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 

Mais procurados (20)

Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groupsUnbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
 
FOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worldsFOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worlds
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
 
Oracle WebLogic 12c New Multitenancy features
Oracle WebLogic 12c New Multitenancy featuresOracle WebLogic 12c New Multitenancy features
Oracle WebLogic 12c New Multitenancy features
 
2 Speed IT powered by Microsoft Azure and Minecraft
2 Speed IT powered by Microsoft Azure and Minecraft2 Speed IT powered by Microsoft Azure and Minecraft
2 Speed IT powered by Microsoft Azure and Minecraft
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Azure database services for PostgreSQL and MySQL
Azure database services for PostgreSQL and MySQLAzure database services for PostgreSQL and MySQL
Azure database services for PostgreSQL and MySQL
 
Overcoming 5 Common Docker Challenges: How We Do It at RightScale
Overcoming 5 Common Docker Challenges: How We Do It at RightScaleOvercoming 5 Common Docker Challenges: How We Do It at RightScale
Overcoming 5 Common Docker Challenges: How We Do It at RightScale
 
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!
Implement a disaster recovery solution for your on-prem SQL with Azure? Easy!
 
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
 
Project Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on DockerProject Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on Docker
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
 
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & BeyondAutomated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
 
Cloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong CodeaholicsCloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong Codeaholics
 
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...
10 Strategies for Developing Reliable Jakarta EE & MicroProfile Applications ...
 
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013
Key Design Considerations Private and Hybrid Clouds - RightScale Compute 2013
 
8 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 20188 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 2018
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
 
Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a Ninja
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 

Semelhante a [Coupang] Journey to the Continuous and Scalable Big Data Platform : 지속적으로 확장 가능한 빅데이터 플랫폼을 향한 여정

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
When small problems become big problems
When small problems become big problemsWhen small problems become big problems
When small problems become big problemsAdrian Cole
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Will Du
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksDatabricks
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?TechWell
 
Optimize with Open Source
Optimize with Open SourceOptimize with Open Source
Optimize with Open SourceEDB
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudRoman Weber
 
Designing your API Server for mobile apps
Designing your API Server for mobile appsDesigning your API Server for mobile apps
Designing your API Server for mobile appsMugunth Kumar
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol PerformanceBIOVIA
 
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)Lessons Learned from Building Enterprise APIs (Gustaf Nyman)
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)Nordic APIs
 
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo
 

Semelhante a [Coupang] Journey to the Continuous and Scalable Big Data Platform : 지속적으로 확장 가능한 빅데이터 플랫폼을 향한 여정 (20)

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
When small problems become big problems
When small problems become big problemsWhen small problems become big problems
When small problems become big problems
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with Databricks
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
 
Optimize with Open Source
Optimize with Open SourceOptimize with Open Source
Optimize with Open Source
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloud
 
Location-independent SharePoint
Location-independent SharePointLocation-independent SharePoint
Location-independent SharePoint
 
Designing your API Server for mobile apps
Designing your API Server for mobile appsDesigning your API Server for mobile apps
Designing your API Server for mobile apps
 
Oracle Database Cloud Service
Oracle Database Cloud ServiceOracle Database Cloud Service
Oracle Database Cloud Service
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance
 
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)Lessons Learned from Building Enterprise APIs (Gustaf Nyman)
Lessons Learned from Building Enterprise APIs (Gustaf Nyman)
 
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
 
Apex ace update
Apex ace updateApex ace update
Apex ace update
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
ow.ppt
ow.pptow.ppt
ow.ppt
 
ow.ppt
ow.pptow.ppt
ow.ppt
 
Ow
OwOw
Ow
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 

Último

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

[Coupang] Journey to the Continuous and Scalable Big Data Platform : 지속적으로 확장 가능한 빅데이터 플랫폼을 향한 여정

  • 1. Coupang Confidential and Proprietary 이 문서는 쿠팡의 대외비이며 지적자산입니다 Journey to the Continuous and Scalable Big Data Platform Matthew (정재화), Coupang
  • 2. Coupang Confidential and Proprietary About me 02 • Software Development Manager of BigData & DW Platform team • 8+ years Hadoop experience • Apache Tajo Committer and PMC • blrunner78@gmail.com • Blog : https://blrunner.tistory.com • The author of Hadoop tech hand book
  • 3. Coupang Confidential and Proprietary Agenda 03 1. On-Premise 2. Cloud 1.0 3. Cloud 2.0 4. Airflow as a Service 5. Zeppelin as a Service
  • 4. Coupang Confidential and Proprietary Motivation 04 The purpose of a business is to create and keep a customer - Peter Drucker -
  • 5. Coupang Confidential and Proprietary 1. On-Premise
  • 6. Coupang Confidential and Proprietary Architecture 06 • Aggregations and Joins • MapReduce • Hive/Pig/Spark • Oozie Logs • Client Logs • Server Logs • Adhoc Query • HiveRDBMS External Data ETL Cluster Read-Only Cluster
  • 7. Coupang Confidential and Proprietary Team's Responsibility 07 • Architect, build and operate our data infrastructure and tools • Create and maintain company-wide data pipeline • Troubleshoot and resolve all issues as users arise
  • 8. Coupang Confidential and Proprietary Areas of Improvements 08 • Pros • A wide variety of workloads • Continuous increase in users • Cons • Multiple copies of Data • Lack of Elasticity • Operation overhead
  • 9. Coupang Confidential and Proprietary 2. Cloud 1.0
  • 10. Coupang Confidential and Proprietary Architecture : Decouple compute and storage 010 Domain Cluster #N Domain Cluster #2 Centralized Resources Hive Meta store Cloud Storage Batch Cluster HiveServer2 Ad-hoc Cluster HiveServer2 Domain Cluster #1 HiveServer2 - Batch Jobs - High throughput - fault tolerant, ETL - Ad-hoc Queries - Low latency - Interactive Analysis - In-memory
  • 11. Coupang Confidential and Proprietary Team's Responsibility 011 • Architect, build and operate our data infrastructure and tools • Troubleshoot and resolve all issues as users arise • Implement company-wide data pipelines
  • 12. Coupang Confidential and Proprietary Areas of Improvements 012 • Pros • Allows Parsing, Enriching of Data for Custom Need • Independent scale of CPU and storage capacity • Cons • Learning Curve for Cloud Infrastructure • Operation overhead • Users want latest tools and more features
  • 13. Coupang Confidential and Proprietary 3. Cloud 2.0
  • 14. Coupang Confidential and Proprietary High Level Architecture 014 Storage Data Processing Tools Scheduler Tools Security Airflow LDAP Authentication Apache Ranger ACL & Audit Zeppelin Monitoring Computing Clusters Cloud Storage Data Platorm Portal
  • 15. Coupang Confidential and Proprietary Various types of Computing Clusters 015 Centralized Resource Hive Meta Store Cloud Storage Transient Cluster - Batch Jobs Persistent Cluster - Interactive Queries Workload Specific Cluster
  • 16. Coupang Confidential and Proprietary Team's Responsibility 016 • Architect, build and our data infrastructure and tools • Create data APIs and data services • Support users using SLA policies • Maintaining security and data privacy • Application Knowledge Support Artifacts, etc.
  • 17. Coupang Confidential and Proprietary Areas of Improvements 017 • Pros • Onboard lots of users and variety of jobs • Easier management and added features • Cons • Unintended infrastructure costs have increased • A wide variety of client tools and Dev environments • Various types of users
  • 18. Coupang Confidential and Proprietary Lessons & Learnings 018 • Distribute traffic instead of concentrating the one place • Optimize all types of system resources in clusters • Enforce the Lifecycle of Hadoop Cluster • Monitor clusters and send alarms from the efficiency perspective • Training Users Continuously and building the community culture
  • 19. Coupang Confidential and Proprietary 4. Airflow as a Service
  • 20. Coupang Confidential and Proprietary Why we love Airflow? 020 • Define Workflows as code • Makes Workflows more maintainable, versionable, and testable • More flexible execution and workflow generation • Lots of features • Sensor • Workflow Profiling • SLA alert • Rich Web Interface • Scalable Worker Processes • In-house Airflow
  • 21. Coupang Confidential and Proprietary Airflow : deployment process 021 Cloud Storage
  • 22. Coupang Confidential and Proprietary 5. Zeppelin as a Service
  • 23. Coupang Confidential and Proprietary Why we love Zeppelin? 023 • Easy spark development in personal computer • Customized Presto Interpreter • Run presto query easily without complex JDBC configuration • Export the heavy data file to local machine without exception • Persistent Storage for Notebook
  • 24. Coupang Confidential and Proprietary Zeppelin Architecture 024
  • 25. Coupang Confidential and Proprietary Areas of Improvements 025 • Users • Load all notebooks in the main page -> Too slow • Big notebook can consume most resources -> Zeppelin Pending • Platform team • Spark interpreter doesn’t support YARN cluster mode • Doesn’t support the life cycle for notebooks • Difficult to upgrade and improve existing zeppelins gracefully
  • 26. Coupang Confidential and Proprietary Resolution 026 • Upgrade Zeppelin to 0.8.1 • Main Page Improvements • Yarn Cluster Mode for Spark Interpreter • Interpreter Lifecycle manager • Interpreter Recovery • Containerized Zeppelin on Kubernetes
  • 27. Coupang Confidential and Proprietary Summary 027 • Understand who is the immediate customer • Focus on the truly important things • Detect and solve problems immediately • Leverage the identity of infrastructure • Best Practice is not best for you
  • 28. Coupang Confidential and Proprietary SELECT question FROM you https://boards.greenhouse.io/coupang/
  • 29. Coupang Confidential and Proprietary Thank you