hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and Applications: The Case of Distributed Storage Service for Semiconductor Wafer Fabrication Foundries

Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛), National Cheng Kung University
Track 2: Ecology and Solutions
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/


Bridging the Gap between Big Data System Software Stack and Applications: The Case of Distributed Storage Service for Semiconductor Wafer Fabrication Foundries
Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛)
National Cheng Kung University
Agenda
•Introduction
•Background
•Goal
•Design
•Performance
•Summary
Intro
In the semiconductor manufacturing industry, data volume grows exponentially during the manufacturing process; this data is invaluable for monitoring and improving production quality.
Background
•Heterogeneous storage systems (such as FTP, SQL Server, HDFS, HBase...)
•Data transfer between storages (moving, copying, ETL...)
•Learning curve for each sophisticated storage system
Goal
•Easy administration across storage systems
•Compatibility with underlying storages (communicating with different storages through one protocol)
•Intuitive operation
Design
Design
•HDFS Interface
Design
•HttpServer (Usage Pattern)
http://<hds_host>/access?from=smb://user/a.data&to=ftp:///dir/a.data
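The usage pattern above can be sketched as plain URL construction. The `/access?from=...&to=...` endpoint shape is taken from the slide; the host name and the helper function below are illustrative assumptions, not part of HDS:

```python
from urllib.parse import urlencode

def build_access_url(hds_host: str, src: str, dst: str) -> str:
    """Build an HDS transfer request in the slide's usage pattern.

    The endpoint shape (/access?from=...&to=...) comes from the slide;
    the host name passed in below is a placeholder, not a real service.
    """
    # urlencode percent-escapes the source/destination URIs so they
    # survive as query-string values.
    return f"http://{hds_host}/access?{urlencode({'from': src, 'to': dst})}"

url = build_access_url("hds.example.local",
                       "smb://user/a.data",
                       "ftp:///dir/a.data")
print(url)
```

Percent-escaping matters here because the `from`/`to` values are themselves URIs containing `:` and `/`.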
Design
•Transparency by Mixing HDFS and HBase
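One plausible reading of "mixing HDFS and HBase" is a size-based routing rule: small files become HBase cells (sidestepping HDFS's per-file NameNode overhead), while large files go to HDFS blocks. The slide does not state the actual rule or threshold, so both are assumptions in this sketch:

```python
# Assumed cutoff: files at or below 1 MB count as "small".
# The real HDS threshold, if any, is not stated on the slides.
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024  # bytes

def choose_backend(file_size_bytes: int) -> str:
    """Route small files to HBase and large files to HDFS."""
    return "hbase" if file_size_bytes <= SMALL_FILE_THRESHOLD else "hdfs"

print(choose_backend(1024))          # a tiny wafer-log file
print(choose_backend(16_000 << 20))  # a 16000-MByte file
```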
Design
•Compliance with HDFS Interfaces, and thus the Hadoop Ecosystem
Design
•Load Balancing
Design
•Load Balancing
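The slides show load balancing without spelling out the policy. As an illustration only, a minimal least-loaded dispatch over the transfer nodes could look like:

```python
def pick_least_loaded(in_flight: dict[str, int]) -> str:
    """Return the node currently handling the fewest transfers.

    A minimal least-loaded policy sketch; the actual HDS balancing
    algorithm is not specified on the slides.
    """
    return min(in_flight, key=in_flight.get)

loads = {"node-1": 3, "node-2": 1, "node-3": 2}
print(pick_least_loaded(loads))
```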
Experimental Setup
•Server Spec
CPU: Intel Xeon E7-8850 @ 2 GHz (80 cores)
Mem: 512 GB
Disk: 750 GB × 16
•Virtual Machine Spec (16 nodes)
CPU: 80 × 2 GHz
Mem: 32 GB
Disk: 750 GB
Experimental Setup
•Cluster Settings
Hadoop 2.6.0-cdh5.10.0
HBase 1.2.0-cdh5.10.0
ZooKeeper 3.4.5-cdh5.10.0
Yarn 2.6.0-cdh5.10.0
Hive 1.1.0-cdh5.10.0
Performance Results (Transparency)
Hive and Spark are evaluated over our HDS:
•Hive (r,s): read small files with Hive (SELECT queries over 30000 files, each 0.001 MBytes)
•Hive (r,l): read large files with Hive (SELECT queries over 12 files, each 16000 MBytes)
•Hive (w): write with Hive (Hive consequently generates one 32-GByte file)
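For scale, the data volumes implied by the three Hive workloads above (simple arithmetic from the slide's numbers, assuming 1 GByte = 1024 MBytes for the write case):

```python
# Totals derived from the workload definitions on the slide.
small_read_mb = 30000 * 0.001  # Hive (r,s): 30000 tiny files -> ~30 MB total
large_read_mb = 12 * 16000     # Hive (r,l): 12 large files -> 192000 MB
write_mb = 32 * 1024           # Hive (w): one 32-GByte file

print(small_read_mb, large_read_mb, write_mb)
```

The spread is the point: (r,s) stresses per-file metadata handling with almost no data, while (r,l) and (w) stress raw throughput.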
Performance Results (Transparency)
Performance Results (Load Balancing)
Performance Results (Overheads)
Overhead of HDS compared with native HDFS. The workloads are denoted (1,10000), (100,100), (1000,10), and (10000,1), where (x,y) means replicating y files of x MBytes each from an FTP server to the HDS cluster.
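Note that all four (x,y) workloads move the same total volume, so differences between them isolate per-file overhead rather than throughput; a quick check:

```python
# Each (x, y) pair replicates y files of x MBytes each, so the total
# volume x * y is 10000 MBytes for every workload on the slide.
workloads = [(1, 10000), (100, 100), (1000, 10), (10000, 1)]
totals = {pair: pair[0] * pair[1] for pair in workloads}
print(totals)
```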
Performance Results (Overheads)
Summary
•Solves the small-file problem in HDFS
•Transparency across different storage systems
•Compatible with the Hadoop ecosystem
•Improves the yield rate of semiconductor manufacturing by 1%
Thanks!
