2. Contents
I. About ETRI
II. Introduction to the GLORY-FS
III. Highlights of the GLORY-FS
IV. GLORY-FS use in Korea
V. Future Roadmap
Appendix. Other Projects
3. I. About ETRI
Electronics and Telecommunications Research Institute (www.etri.re.kr)
• Government-funded research institute (since 1976)
• Personnel: 1,894 total (researchers/technical staff: 1,737)
• 8 divisions (Big Data SW, SW-SoC, Broadcasting & Communication, …)
• Project status: 547 projects / 547,618 million won
2011 Transparent display
2010 4G LTE-advanced technology
2007 3.6Gbps 4th generation mobile communication system
2006 Wireless Home Network (UWB)
2005 Terrestrial DMB service
2004 WiBro
1999 IMT-2000 (CDMA2000) STP system
1996 ATM switching system
1995 Commercialization of CDMA
1991 TDX-10, TiCOM II
1989 256M DRAM
1988 8-bit Educational Computer
1982 Korea’s first semiconductor product “32K ROM chip”
Daejeon (大田)
4. II. Introduction to the GLORY-FS
Overview of the GLORY project (2007~2012)
Goal: provide a global Internet service solution specialized for UCC & IPTV by developing the open-source-based GLORY platform
Core technologies:
• Multimedia metadata management/retrieval
• Multimedia data distribution
• Large-scale distributed data management
• Large-scale distributed parallel processing
• Large-scale file management
• Large-scale cluster management
• Low-power platform OS & HW
Performance targets:
• Retrieval: thousands of pages per second (over hundreds of millions of web pages)
• Data parallel processing: 3,000 nodes
• Storage capacity: up to petabytes
• I/O performance: up to 100 Gb/s
• Cluster management: 10,000 nodes
• Power saving: 20% reduction
Major results:
• Content-based retrieval for large-scale video data
• 10,000-node distributed data processing middleware
• Global cluster management
• Global file management
• Linux-based 20% power saving
• Internet service solution testing
Deliverables:
• Global Internet service solution S/W (GLORY-FS, GLORY-DB, GLORY-DP, GLORY-CL)
• Multi-IDC testbed (256 nodes x 3 data centers)
This work was supported by the IT R&D program of MKE/KEIT.
[K1001703, Development of Cost Effective and Large Scale Global Internet Service Solution]
5. II. Introduction to the GLORY-FS
SW Architecture of the GLORY platform
[Architecture diagram] Layers, top to bottom:
• Internet services: UCC retrieval, IPTV, and e-learning services, built on an authoring, tagging, storing, retrieval, and delivery pipeline with dynamic service management
• Internet services community components
• Video data management components: distributed job scheduling, job partitioning & merging, data distribution, service data management, data access and recovery, file metadata management, distributed data storing & replication, remote backup & archiving
• Platform layer: low-cost server platform, node manager, low-power OS & H/W, resource monitoring, cluster orchestration, automatic provisioning
6. II. Introduction to the GLORY-FS
[Architecture diagram] GLORY-FS client filesystems access a single global namespace served by GLORY-FS metadata servers and GLORY-FS data servers.
• Storage TCO minimization by using commodity servers as storage servers
• High performance through linearly scalable I/O performance
• High availability through efficient failure management
• High compatibility by supporting the POSIX-FS standard
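Because GLORY-FS advertises POSIX-FS compatibility, ordinary file APIs should work unchanged on a mounted volume. The sketch below assumes a hypothetical mount point /mnt/glory; the actual mount procedure is not described in these slides.

```python
import os

# Hypothetical mount point; the real GLORY-FS mount procedure is not shown
# in these slides, so this path is only an assumption.
MOUNT_POINT = "/mnt/glory"

def write_and_read_back(relative_path: str, payload: bytes) -> bytes:
    """Plain POSIX file I/O: with a POSIX-compatible filesystem, unmodified
    code like this behaves the same as on a local disk."""
    path = os.path.join(MOUNT_POINT, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)

    with open(path, "wb") as f:      # standard open/write/close
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())         # durability via ordinary fsync

    with open(path, "rb") as f:      # standard read path
        return f.read()

if __name__ == "__main__":
    data = write_and_read_back("home/share/example.bin", b"example payload")
    assert data == b"example payload"
```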
7. II. Introduction to the GLORY-FS
Positioning: Internet application services based on video (UCC/IPTV/e-learning/etc.) require large-scale, highly available file services.
• HPC storage (Lustre, PVFS2, Panasas): high performance, high cost, full POSIX compliance
• Object store (Google FS, Mogile FS, Hadoop FS, Swift FS): high capacity, low cost, no POSIX compliance
• Scale-out NAS (GlusterFS, Isilon (appliance), GLORY-FS (SW)): high throughput, low cost, near POSIX compliance
8. III. Highlights of the GLORY-FS
• Scalability in performance and capacity (up to 150 GB/s, petabytes)
• Availability with commodity HW (x86 servers, SATA HDDs)
• High compatibility with no kernel dependency (compatible with any existing SW, including web servers)
• Minimal management overhead
• I/O and file sharing optimized for Internet services (web-disk, video, and image content services)
Feature summary, grouped by design goal ((*) = experimental):
Scalable capacity:
• Over a petabyte of storage capacity
• Up to 1 billion files managed
• Online storage server expansion/maintenance
• *Storage reconfiguration (migration, rebalance)
• Disk relocation for fast rebalance
• *Highly scalable metadata cluster server (3-billion-file scale)
Scalable performance:
• Up to 100 Gb/s data I/O performance
• Multiple I/O data paths
• I/O optimized for large, sequential, read-intensive workloads
• Lock-free cache consistency control (NFS sharing semantics)
• Hot-spot avoidance (self-tuning)
Scalable availability:
• Synchronous, updatable N-way data replication
• Parallel replica consistency
• Asynchronous MDS H/A
• *Synchronous MDS H/A
• *M+N striped storage (RAID double parity)
• Unattended recovery
• Self-diagnosis on SATA HDDs
• Private replication network support
High compatibility:
• POSIX API
• Software Development Kit
• Windows support
• Virtual metadata management system
Simple management:
• Web-based management tool
• User-defined event handling
9. III. Highlights of the GLORY-FS
[Architecture diagram] GLORY-FS client file systems, metadata servers, and data servers are connected through a 1G/10G Ethernet switch. Clients exchange metadata with the metadata server and exchange file data directly with the data servers. The example namespace shows a volume "/" containing "home" and "share" directories and a file "big.avi".
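A minimal sketch of the data path the diagram implies: one metadata lookup, then direct reads from the data servers. The class names, message shapes, and placement are illustrative assumptions, not the actual GLORY-FS protocol.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ChunkLocation:
    chunk_id: int
    data_servers: List[str]   # servers holding this chunk's replicas

class MetadataServerStub:
    """Stands in for the GLORY-FS metadata server: resolves a path to chunk
    locations, but never carries file contents itself."""
    def lookup(self, path: str) -> List[ChunkLocation]:
        # e.g. /share/big.avi -> chunks spread over several data servers
        return [ChunkLocation(0, ["ds1", "ds2"]),
                ChunkLocation(1, ["ds2", "ds3"])]

class DataServerStub:
    """Stands in for a GLORY-FS data server: serves raw chunk bytes."""
    def read_chunk(self, chunk_id: int) -> bytes:
        return b"<chunk %d bytes>" % chunk_id

def client_read(path: str, mds: MetadataServerStub) -> bytes:
    """Client read path implied by the diagram: one metadata round trip,
    then data fetched directly from the data servers."""
    data = b""
    for loc in mds.lookup(path):            # metadata traffic
        ds = DataServerStub()               # pick any replica holder
        data += ds.read_chunk(loc.chunk_id) # data traffic
    return data

print(client_read("/share/big.avi", MetadataServerStub()))
```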
10. III. Highlights of the GLORY-FS
[Charts] Performance scalability test (throughput vs. number of clients, for data-server cache hit, data-server disk hit, and client cache hit cases) and metadata clustering effect test.
• I/O throughput scales out linearly with the number of clients
• Metadata performance scales out linearly with multiple metadata servers
11. III. Highlights of the GLORY-FS
• Each file is sliced into pieces called CHUNKs, which are stored across multiple data servers
• As CHUNKs are stored, REPLICA chunks are written synchronously to different data servers
• When a data server fails, lost chunks are RECOVERED from their replicas
• All REPLICAs are used for file read access (read load balancing)
• Any range of a file is UPDATABLE at any time (Hadoop FS does not allow file updates)
• Write performance with synchronous replication: less than 5% replication overhead (a sketch of the chunk-and-replicate idea follows below)
[Diagram] A file is split into chunks C0, C1, C2; while buffered in memory, each chunk (e.g., C0) is written to multiple data servers as replicas.
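A minimal sketch of the chunk-and-replicate behaviour described above. The chunk size, server names, and round-robin placement policy are assumptions for illustration, not GLORY-FS internals.

```python
import itertools

CHUNK_SIZE = 64                    # tiny for illustration; real chunks would be MBs
DATA_SERVERS = ["ds1", "ds2", "ds3", "ds4"]
REPLICATION_FACTOR = 2             # N-way replication, here N = 2

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Slice a file's bytes into fixed-size chunks (C0, C1, C2, ...)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks: int):
    """Round-robin placement: each chunk gets REPLICATION_FACTOR distinct
    data servers, so a single server failure never loses a chunk."""
    placement = {}
    servers = itertools.cycle(DATA_SERVERS)
    for chunk_id in range(num_chunks):
        targets = []
        while len(targets) < REPLICATION_FACTOR:
            s = next(servers)
            if s not in targets:
                targets.append(s)
        placement[chunk_id] = targets
    return placement

def synchronous_write(data: bytes):
    """Write returns only after every replica of every chunk is stored,
    mirroring the 'replicas are made synchronously' behaviour above."""
    chunks = split_into_chunks(data)
    placement = place_replicas(len(chunks))
    stores = {s: {} for s in DATA_SERVERS}
    for chunk_id, chunk in enumerate(chunks):
        for server in placement[chunk_id]:
            stores[server][chunk_id] = chunk   # all copies written before return
    return placement, stores

placement, _ = synchronous_write(b"x" * (3 * CHUNK_SIZE))
print(placement)   # e.g. {0: ['ds1', 'ds2'], 1: ['ds3', 'ds4'], 2: ['ds1', 'ds2']}
```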
12. III. Highlights of the GLORY-FS
• A REPLICAted file may be converted into a distributed RAID file when its access rate decreases
• PARITY chunks are generated onto different data servers from the existing CHUNKs
• After parity coding completes, the REPLICAs are removed to save storage space
• When a data server fails, lost chunks are RECOVERED from the surviving CHUNKs and PARITY
• Currently only read access is allowed to RAID files (for update access, the file is reverted to a REPLICAted file); see the parity sketch below
[Diagram/charts] Chunks C0, C1, C2 and their replicas R0, R1, R2 are spread over data servers; a parity chunk P is generated and the replicas are then removed. Charts show the conversion time for 100 GB of files (1 GB each) and the read performance of RAID files.
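A minimal sketch of replica-to-parity conversion, assuming a simple single-parity XOR scheme over equal-size chunks. The slides mention M+N striping with double parity, but the exact coding GLORY-FS uses is not specified here.

```python
def xor_parity(chunks):
    """Compute a single XOR parity chunk over equal-length chunks.
    Any one lost chunk can then be rebuilt from the others plus parity."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def convert_to_raid(replica_store):
    """Replica -> distributed RAID: generate parity, then drop the replicas.
    `replica_store` maps chunk_id -> (chunk_bytes, replica_bytes)."""
    chunks = [c for c, _ in replica_store.values()]
    parity = xor_parity(chunks)
    raid_store = {cid: c for cid, (c, _) in replica_store.items()}  # replicas removed
    raid_store["parity"] = parity
    return raid_store

def recover_lost_chunk(raid_store, lost_id):
    """Rebuild a lost chunk from the surviving chunks and the parity chunk."""
    survivors = [c for cid, c in raid_store.items()
                 if cid not in ("parity", lost_id)]
    return xor_parity(survivors + [raid_store["parity"]])

store = {0: (b"AAAA", b"AAAA"), 1: (b"BBBB", b"BBBB"), 2: (b"CCCC", b"CCCC")}
raid = convert_to_raid(store)
assert recover_lost_chunk(raid, 1) == b"BBBB"
```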
13. III. Highlights of the GLORY-FS
[Diagram] Online capacity rebalancing: when a new (empty) data server is added alongside old data servers that are full of data, chunks are migrated to the new server until capacity is balanced. Service I/O traffic from filesystem clients flows over a Gigabit switch, while data replication/migration traffic uses a separate 1/10 Gigabit switch.
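A minimal sketch of the rebalancing idea in the diagram: chunks move from the fullest server to the emptiest one until usage is roughly even. The tolerance and selection policy are assumptions, not the actual GLORY-FS rebalancer.

```python
def rebalance(capacity_used, tolerance=0.05):
    """Greedy capacity rebalance: repeatedly move one chunk-sized unit from
    the fullest server to the emptiest one until they are within
    `tolerance` of the mean utilisation.

    `capacity_used` maps server name -> used capacity (same units everywhere).
    Returns the list of (src, dst) moves performed."""
    moves = []
    mean = sum(capacity_used.values()) / len(capacity_used)
    while True:
        fullest = max(capacity_used, key=capacity_used.get)
        emptiest = min(capacity_used, key=capacity_used.get)
        if capacity_used[fullest] - capacity_used[emptiest] <= 2 * tolerance * mean:
            return moves
        # Move one unit of data (one chunk) over the replication network.
        capacity_used[fullest] -= 1
        capacity_used[emptiest] += 1
        moves.append((fullest, emptiest))

usage = {"old-ds1": 90, "old-ds2": 88, "new-ds": 0}   # new, empty server joins
plan = rebalance(usage)
print(len(plan), "chunk moves ->", usage)
```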
14. III. Highlights of the GLORY-FS
• For sudden, explosive read access (e.g., a hot movie), hot files are detected and automatically replicated to more data servers so the load is spread across servers (a sketch follows below)
[Diagram/chart] File "H" becomes HOT on two data servers and is then REPLICATED across all five data servers; a YouTube hot-file rank chart illustrates the skewed access pattern.
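A minimal sketch of the hot-file policy described above: when a file's recent read rate crosses a threshold, extra replicas are added so reads can be served from more data servers. The threshold, window, and replica counts are illustrative assumptions.

```python
import time
from collections import defaultdict, deque

HOT_THRESHOLD = 100        # reads per window that mark a file as "hot" (assumed)
WINDOW_SECONDS = 10        # sliding window length (assumed)
EXTRA_REPLICAS = 3         # replicas added for a hot file (assumed)

read_times = defaultdict(deque)            # file -> recent read timestamps
replica_count = defaultdict(lambda: 2)     # every file starts with 2 replicas

def record_read(path, now=None):
    """Record one read and, if the file just became hot, add replicas."""
    now = time.time() if now is None else now
    window = read_times[path]
    window.append(now)
    while window and window[0] < now - WINDOW_SECONDS:
        window.popleft()                   # drop reads outside the window
    if len(window) >= HOT_THRESHOLD and replica_count[path] == 2:
        replica_count[path] += EXTRA_REPLICAS   # replicate to more data servers
        print(f"{path} is HOT: now {replica_count[path]} replicas")

for _ in range(150):                       # simulate a burst of reads on one movie
    record_read("/share/big.avi")
```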
16. III. Highlights of the GLORY-FS
• Built-in monitoring of CPU, disk, NIC, and file access statistics
• Accumulated statistics are visualized with MRTG-like charts (daily, weekly, monthly, and yearly views are also provided)
17. IV. GLORY-FS References within Korea
References within Korea (capacity in TB; total 14,418 TB):
Service companies
• KTH ('09): Internet portal/UCC service, '09~, 630 (subtotal)
  - UCC service (image/video), '09.04~, 190
  - 5 GB mail attachment service, '09.10, 80
  - Mail service, '09.12~, 300
  - N-screen service, '10.10, 60
• LG U+ ('10): LG U+ Internet portal/UCC, '10.11~, 8,700 (subtotal)
  - Multimedia N-screen service (U+ Box), '10.11~, 8,000
  - Dacom web-disk service, '10.12, 700
• SKT ('10): cloud storage service similar to Amazon S3 (Ez-storage), '12.4~, 4,000
• GS Neotech: storage for a Content Delivery Network service, '12.1~, 200
Storage companies
• PSPACE ('07, '08, '09, '10): InfiniStore (appliance), '07~, 570 (subtotal)
  - 3D render farm storage, '07~, 210
  - KBSn (IPTV, VOD), '10, 72
  - MBN (IPTV, VOD), '10, 48
  - KT (IPTV, VOD), '10, 80
  - Neowiz (game portal), '10, 60
  - BBMC (Internet broadcasting), '10, 100
• MacroImpact ('09): Sanique SFS, '09~, 318 (subtotal)
  - Storage for a Content Delivery Network service, '09.10, 318
• Gluesys ('11): cluster NAS (appliance), '11~, capacity not listed
Adoption trend:
• 2007: NAS for high capacity
• 2008~10: storage-intensive Internet services such as web-mail, web-disk, and image/video hosting
• 2011~: cloud storage service, cloud CDN service
18. V. Future Roadmap
Current projects related to GLORY-FS:
• Global Internet Service Solution (MKE) – Global File System (GLORY-FS): '07.3~'12.2 (5 years), closed
• Supercomputing System for Genome Analysis (MKE) – High Performance File System (MAHA-FS): '11.3~'15.2 (5 years), open
• Unified Storage Solution for Peta-scale (MKE): '11.12~'13.11 (2 years), open
• File System SW for Large-Scale Virtual Desktop Infrastructure (MKE): '12.5~'15.4 (3 years), open
[Timeline chart, 2007–2015] GLORY-FS, High Performance File System, Unified Storage, VDI File System
21. Appendix: Unified Storage Solution for Peta-Scale
Commercialization project for GLORY-FS (World Best SW program, MKE)
Objectives:
• Low cost / large scale → low cost / large scale / higher efficiency (higher storage utilization)
• iSCSI and NFS/CIFS support
• Amazon S3-like API support (RESTful API; a sketch follows below)
• Sophisticated management
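Since the slide only says "Amazon S3-like API (RESTful)", here is a minimal hedged sketch of what such object PUT/GET calls could look like. The endpoint, bucket layout, and lack of authentication are assumptions, not the project's actual API.

```python
import requests   # pip install requests

# Hypothetical endpoint: the real service address and auth scheme are not
# given in the slides.
ENDPOINT = "http://storage.example.com"
BUCKET = "videos"

def put_object(key: str, data: bytes) -> int:
    """S3-style upload: PUT /<bucket>/<key> with the object body."""
    r = requests.put(f"{ENDPOINT}/{BUCKET}/{key}", data=data)
    return r.status_code

def get_object(key: str) -> bytes:
    """S3-style download: GET /<bucket>/<key> returns the object body."""
    r = requests.get(f"{ENDPOINT}/{BUCKET}/{key}")
    r.raise_for_status()
    return r.content

if __name__ == "__main__":
    put_object("movies/big.avi", b"...")
    print(len(get_object("movies/big.avi")), "bytes")
```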
22. Appendix: File System SW for Large-Scale Virtual Desktop Infrastructure (VDI) Service
Commodity storage strategy for large-scale VDI service
Objectives:
• Cost wall: SAN storage → commodity storage (50% cost saving)
• Performance wall: low-latency storage for VDI service (less than 20 ms of VDI-experienced latency)
• Scalability wall: support for up to 10,000 VDI users
VDI IOPS requirements by source:
• NetApp, 2,500 users: boot 46,000 IOPS; login 29,000 IOPS; steady 10,000 IOPS
• EMC, 500 users: boot 63,500 IOPS; login 14,500 IOPS; steady 10,500 IOPS
• Nimble, 200 users: boot 12,400 IOPS; login -; steady 4,000 IOPS
• Estimated (worst case), 1 user: boot 127.0 IOPS; login 112.0 IOPS; steady 32.0 IOPS
• Estimated (worst case), 10,000 users: boot 1,270,000 IOPS; login 1,120,000 IOPS; steady 320,000 IOPS
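A small worked check of the scaling arithmetic behind the last two rows: the 10,000-user figures are simply the worst-case per-user IOPS multiplied by the user count.

```python
PER_USER_WORST_IOPS = {"boot": 127.0, "login": 112.0, "steady": 32.0}
USERS = 10_000

aggregate = {phase: iops * USERS for phase, iops in PER_USER_WORST_IOPS.items()}
print(aggregate)   # {'boot': 1270000.0, 'login': 1120000.0, 'steady': 320000.0}
```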
23. Appendix: High Performance File System for Genome Analysis
Overcome the performance limitations of GLORY-FS for petabyte-scale genome analysis
Objectives:
• Hybrid use of SSD + HDD (100 million IOPS)
• Complete redesign of the I/O subsystem for HPC
• Lower storage power consumption with MAIS and MAID technology (a sketch follows this list)
  - MAIS: Massive Array of Idle Servers (actively powers unaccessed storage servers on/off)
  - MAID: Massive Array of Idle Disks (actively powers unaccessed disks on/off)
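A minimal sketch of the MAIS/MAID idea: storage servers or disks that have not been accessed for some time are powered down and woken again on demand. The timeout and power-control hooks are illustrative assumptions.

```python
import time

IDLE_TIMEOUT = 600   # seconds without access before powering down (assumed)

class IdleUnit:
    """One storage server (MAIS) or disk (MAID) under idle power management."""
    def __init__(self, name):
        self.name = name
        self.powered_on = True
        self.last_access = time.time()

    def access(self, now=None):
        """Any read/write: power the unit up if needed and refresh the timer."""
        now = time.time() if now is None else now
        if not self.powered_on:
            self.powered_on = True           # spin up / wake on demand
        self.last_access = now

    def maybe_power_down(self, now=None):
        """Called periodically: power off units idle longer than the timeout."""
        now = time.time() if now is None else now
        if self.powered_on and now - self.last_access > IDLE_TIMEOUT:
            self.powered_on = False

t0 = time.time()
units = [IdleUnit(f"ds{i}") for i in range(4)]
units[0].access(now=t0 + 500)                # only ds0 was used recently
for u in units:
    u.maybe_power_down(now=t0 + IDLE_TIMEOUT + 1)
print([(u.name, u.powered_on) for u in units])   # only ds0 stays powered on
```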
[Radar chart] Lustre, MAHA-FS, and CloudFS compared on bandwidth, IOPS, built-in data availability (replica, distributed RAID), metadata OPS, sharing level (POSIX compliance, locking), and TCO.