- 1. GlusterFS Cluster File System
Z RESEARCH Inc.
Non-stop Storage
- 2. GlusterFS Cluster File System
GlusterFS is a cluster file system that aggregates multiple storage bricks over InfiniBand RDMA into one large parallel network file system.
[Diagram: several storage bricks combine into one volume, giving N x performance and capacity]
- 3. Key Design Considerations
Capacity scaling
  Scalable beyond petabytes
I/O throughput scaling
  Pluggable clustered I/O schedulers
  Takes advantage of RDMA transport
Reliability
  Non-stop storage
  No separate metadata server
Ease of manageability
  Self-heal
  NFS-like disk layout
Elegance in design
  Stackable modules
  Not tied to I/O profiles, hardware, or OS
- 4. GlusterFS Design
[Architecture diagram]
Client side: a cluster of clients (supercomputer, data center). Each GlusterFS client runs a clustered volume manager and a clustered I/O scheduler.
Interconnect: InfiniBand RDMA or TCP/IP between clients and storage bricks.
Server side: the GlusterFS clustered file system runs on the x86-64 platform; each of Storage Bricks 1-4 runs GLFSD and exports a GlusterFS volume.
Storage gateways re-export the volume over NFS/Samba on TCP/IP (GigE) for compatibility with MS Windows and other Unices.
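For concreteness, a minimal per-brick server spec in the 1.x volume-spec format might look as follows. This is a sketch, not taken from the slides: the export directory, volume names, and the auth option syntax are assumptions for illustration.

# Sketch of one storage brick's server spec (names and path are assumed)
volume brick
  type storage/posix             # store files on a local POSIX file system
  option directory /data/export  # assumed export directory
end-volume

volume server
  type protocol/server           # GLFSD serves 'brick' to clients
  option transport-type ib-verbs/server   # InfiniBand RDMA; tcp/server also possible
  subvolumes brick
  option auth.ip.brick.allow *   # assumed 1.x auth syntax: allow any client address
end-volume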
- 5. Stackable Design
[Translator stack diagram]
Client side (GlusterFS client), top to bottom: VFS, Read Ahead, I/O Cache, Unify, then one protocol client per brick.
Interconnect: TCP/IP (GigE, 10GigE) or InfiniBand RDMA.
Server side: Bricks 1..n each run a GlusterFS server stacked over a POSIX translator on an Ext3 file system.
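A client-side spec sketch showing this stacking, bottom to top, in the 1.x volume-spec format; the host addresses and volume names here are assumptions for illustration.

# Per-brick protocol clients (addresses assumed)
volume client1
  type protocol/client
  option transport-type ib-verbs/client   # or tcp/client over GigE/10GigE
  option remote-host 10.0.0.1
  option remote-subvolume brick
end-volume

volume client2
  type protocol/client
  option transport-type ib-verbs/client
  option remote-host 10.0.0.2
  option remote-subvolume brick
end-volume

volume unify0
  type cluster/unify             # Unify aggregates the per-brick clients
  subvolumes client1 client2
  option scheduler rr
end-volume

volume iocache
  type performance/io-cache      # I/O Cache above Unify
  subvolumes unify0
end-volume

volume readahead
  type performance/read-ahead    # Read Ahead at the top, just beneath VFS
  subvolumes iocache
end-volume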
- 6. Volume Layout
Unify volume: each file is stored whole on exactly one brick.
  BRICK1: work.ods, corporate.odp, test.m4a, driver.c
  BRICK2: benchmark.pdf, test.ogg, initcore.c
  BRICK3: mylogo.xcf, driver.c, ether.c
Mirror volume: every brick holds an identical copy of each file.
  BRICK1, BRICK2, BRICK3: accounts-2007.db, backup.db.zip, accounts-2006.db
Stripe volume: each file is split into blocks striped across all bricks.
  BRICK1, BRICK2, BRICK3: north-pole-map, dvd-1.iso, xen-image (as stripes)
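In the 1.x volume-spec format, the three layouts map to three cluster translators; mirroring was provided by the AFR (automatic file replication) translator. The following is a rough sketch: the brick names and the pattern:value option syntax are assumptions for illustration.

volume unify0
  type cluster/unify            # each file lives whole on exactly one brick
  subvolumes brick1 brick2 brick3
  option scheduler rr           # placement policy picks the brick
end-volume

volume mirror0
  type cluster/afr              # every file replicated to all subvolumes
  subvolumes brick1 brick2 brick3
  option replicate *:3          # assumed pattern:count syntax
end-volume

volume stripe0
  type cluster/stripe           # each file split into blocks across subvolumes
  subvolumes brick1 brick2 brick3
  option block-size *:1MB       # assumed pattern:size syntax
end-volume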
- 7. I/O Scheduling
Pluggable schedulers decide how new files are placed across bricks:
➢ Adaptive least usage (ALU)
➢ NUFA (non-uniform file access)
➢ Round robin
➢ Random
➢ Custom
Example unify volume using the ALU scheduler:
volume bricks
type cluster/unify
subvolumes ss1c ss2c ss3c ss4c
option scheduler alu
option alu.limits.min-free-disk 60GB
option alu.limits.max-open-files 10000
option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 2GB # Units in KB, MB and GB are allowed
option alu.disk-usage.exit-threshold 60MB # Units in KB, MB and GB are allowed
option alu.open-files-usage.entry-threshold 1024
option alu.open-files-usage.exit-threshold 32
option alu.stat-refresh.interval 10sec
end-volume
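The subvolumes ss1c..ss4c referenced above would each be a protocol/client volume pointing at one storage brick. A sketch for the first one, with an assumed address and remote volume name:

volume ss1c
  type protocol/client
  option transport-type ib-verbs/client
  option remote-host 10.0.0.1    # assumed address of storage server 1
  option remote-subvolume ss1    # assumed name of the exported volume
end-volume
# ss2c, ss3c and ss4c are defined the same way against their own bricks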
- 8. GlusterFS Benchmarks
Benchmark environment
Method: multiple 'dd' transfers with varying block sizes are read and written from multiple clients simultaneously.
GlusterFS Brick Configuration (16 bricks)
Processor - Dual Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
RAM - 8GB FB-DIMM
Linux Kernel - 2.6.18-5+em64t+ofed111 (Debian)
Disk - SATA-II 500GB
HCA - Mellanox MHGS18-XT/S InfiniBand HCA
Client Configuration (64 clients)
RAM - 4GB DDR2 (533 Mhz)
Processor - Single Intel(R) Pentium(R) D CPU 3.40GHz
Linux Kernel - 2.6.18-5+em64t+ofed111 (Debian)
Disk - SATA-II 500GB
HCA - Mellanox MHGS18-XT/S InfiniBand HCA
Interconnect switch: Voltaire InfiniBand switch (14U)
GlusterFS version: 1.3.pre0-BENKI
- 9. Aggregated Bandwidth
Aggregated I/O benchmark on 16 bricks (servers) and 64 clients over the IB Verbs transport.
[Chart: aggregated read/write throughput vs. load; peak read annotated at 13GBps]
Peak aggregated read throughput: 13GBps.
Past a particular threshold, write performance plateaus because disk I/O becomes the bottleneck.
System memory greater than the peak load ensures the best possible performance.
The ib-verbs transport driver is about 30% faster than the ib-sdp transport driver.
- 10. Scalability
Performance improves as the number of bricks increases.
Throughput rises correspondingly as the number of servers grows from 1 to 16.
- 11. Hardware Platform Example
Storage building block (McKay Creek):
  Intel SE5000PSL (Star Lake) baseboard from Intel EPSD
  2 (or 1) dual-core LV Intel® Xeon® (Woodcrest) processors
  1 Intel SRCSAS144e (Boiler Bay) RAID card
  Two 2.5" boot HDDs, or boot from DOM
  2U form factor, 12 hot-swap HDDs: 3.5" SATA 3.0Gbps (7.2k or 10k RPM) or SAS (10k or 15k RPM)
  SES-compliant enclosure management firmware
  External SAS JBOD expansion
Add-in card options:
  InfiniBand
  10 GbE
  Fibre Channel
- 12. Thank You!
http://www.gluster.org
http://www.zresearch.com
Hitesh Chellani
Tel: 510-354-6801
hitesh@zresearch.com
© 2007 Z RESEARCH