Agenda
• About me
• The new features
• Additional news
• Conclusion
About Me
About me
• Work as a ...
  • Programmer
  • Software Engineer (most recently)
• Also interested in ...
  • Cloud Computing
  • Big Data / Data Science
  • Emerging technologies
• Supporting GlusterFS/Red Hat Storage introductions together with Red Hat K.K.
About me
• Using GlusterFS since 2007 (v1.3.7)
  • At first for my internet crawler.
• Love Gluster because of its ...
  • Potential
  • Performance
  • Code
  • Community
• Have introduced it (or am introducing it) into ...
  • A printer and scanner solution (field trial)
  • Email services
  • File storage services (WebDAV, NFS)
  • Backup services
  • A shared storage platform
  • A medical service
• A board member of the Gluster Community
My system
[Diagram: seven nodes (eins, zwei, drei, vier, fuenf, sechs: the storage pool; sieben: (mainly) the client), each connected to Seg.1 (192.168.79.0/24, GigE; addresses .79.0.1 to .79.0.7) and Seg.2 (10.0.0.0/8, 100BaseT via USB Ethernet; addresses .1 to .7)]
• Seven nodes, connected to two separate physical network segments.
• Seg.1 is for GlusterFS and Seg.2 is for other purposes (e.g. SSH).
• Each node is set up with:
  • CentOS 6.5 x86_64
  • GlusterFS 3.5.0 (built from the source tarball)
My system
• Intel NUC DN2820FYKH
  • Celeron 2.4GHz dual-core, 1MB cache
  • 8GB RAM
  • 1TB solid-state hybrid drive (w/ 8GB SLC SSD)
  • 7.5W TDP
• Why?
  • Separates the main loads (mainly disk access and network traffic)
  • Cheap enough to build (38k JPY/node)
  • Saves money on electricity (2 JPY/day/node)
  • Keeps my room's temperature from rising
My system
% sudo yum install -y openssh-clients make rpm-build bison flex automake libtool ncurses-
devel readline-devel openssl-devel libxml2-devel libibverbs-devel libacl-devel libattr-devel
python-devel python-setuptools lvm2-devel systemtap-sdt-devel libaio-devel xfsprogs glib2-
devel
% tar xzf glusterfs-3.5.0.tar.gz && cd glusterfs-3.5.0
% ./configure --prefix=/usr/local/glusterfs-3.5.0 --enable-bd-xlator --enable-fusermount --
enable-systemtap --enable-debug --enable-crypt-xlator --enable-qemu-block --enable-glupy
% make && sudo make install
# ln -sfn /usr/local/glusterfs-3.5.0 /usr/local/glusterfs
# cp -p /etc/init.d/glusterd /etc/init.d/glusterd-3.5.0
# cat <<EOF >> ~/.zshrc
export PATH=$PATH:/usr/local/glusterfs/sbin
export MANPATH=$MANPATH:/usr/local/glusterfs/share/man
EOF
# source ~/.zshrc
# echo "/usr/local/glusterfs/lib" > /etc/ld.so.conf.d/glusterfs.conf
# ldconfig
# sed -i 's/SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
# chkconfig iptables off
# /etc/init.d/iptables stop
GlusterFS 3.5.0 was installed on each node
in following way:
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 9 
12 new features
Overview
Feature                                        | Categories
-----------------------------------------------+--------------------------
AFR_CLI_enhancements                           | Operation
Exposing Volume Capabilities                   | Management
File Snapshot                                  | OpenStack
GFID Access                                    | Dev
On-Wire Compression + Decompression            | Performance
Prevent NFS restart on Volume change (Part 1)  | Stability
Quota Scalability                              | Scalability, Performance
readdir_ahead                                  | Performance
zerofill                                       | OpenStack, Performance
Brick Failure Detection                        | Management
Disk encryption                                | Security
Geo-Replication Enhancement                    | Performance, Stability
OpenStack Integration Enhancements
File Snapshot
The following setfattr commands are applied to a file <file_name> under a mount point of a volume; they hook into the glusterfs client process via FUSE and are handled by the features/qemu-block xlator.
Format the file as qcow2:
# setfattr -n trusted.glusterfs.block-format -v qcow2:<file_size(in KB/MB/GB)> <file_name>
Take a snapshot:
# setfattr -n trusted.glusterfs.block-snapshot-create -v <snapshot_name1> <file_name>
Take another snapshot:
# setfattr -n trusted.glusterfs.block-snapshot-create -v <snapshot_name2> <file_name>
Restore from a snapshot:
# setfattr -n trusted.glusterfs.block-snapshot-goto -v <snapshot_name1> <file_name>
Delete a snapshot:
# setfattr -n trusted.glusterfs.block-snapshot-delete -v <snapshot_name2> <file_name>
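As a concrete sketch of the commands above (the volume, file name, size, and snapshot name are hypothetical, and the build must include --enable-qemu-block as shown earlier):
# touch /mnt/glusterfs/vol0/vm-disk.img
# setfattr -n trusted.glusterfs.block-format -v qcow2:10GB /mnt/glusterfs/vol0/vm-disk.img
# setfattr -n trusted.glusterfs.block-snapshot-create -v before-upgrade /mnt/glusterfs/vol0/vm-disk.img
# setfattr -n trusted.glusterfs.block-snapshot-goto -v before-upgrade /mnt/glusterfs/vol0/vm-disk.img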
File Snapshot
[Diagram: the same operations (block-format, block-snapshot-create, block-snapshot-goto, block-snapshot-delete) applied to a file <file_name> under a mount point of a volume that is used as block storage for OpenStack Cinder; the requests pass through the FUSE hook into the glusterfs client process and are handled by the features/qemu-block xlator together with the BD xlator]
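For context, a hedged sketch of how Cinder might be pointed at such a volume (an assumption based on the Icehouse-era GlusterFS Cinder driver; none of this appears in the deck):
# cat /etc/cinder/glusterfs_shares
sieben:/vol0
# grep glusterfs /etc/cinder/cinder.conf
volume_driver = cinder.volume.drivers.glusterfs.GlusterfsDriver
glusterfs_shares_config = /etc/cinder/glusterfs_shares
glusterfs_qcow2_volumes = True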
zerofill
[Diagram: a user app (e.g. Cinder) issues the ZEROFILL fop (the glfs_zerofill function) through libgfapi; the fop passes through AFR to the glusterfsd brick processes, where the posix_do_zerofill function writes out the zeroes, analogous to the SCSI WRITESAME command and the BLKZEROOUT ioctl on Linux]
zerofill
Server-offloaded zerofill vs repeated zeroing:
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
1.23 times faster!
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
1.25 times faster!
http://www.gluster.org/community/documentation/index.php/Features/zerofill
Operation Enhancements
AFR_CLI_enhancements
Before 3.5.0
# gluster volume heal vol1
Heal operation on volume vol1 has been successful
# gluster volume heal vol1 info
...
# gluster volume heal vol1 info healed
...
# gluster volume heal vol1 info heal-failed
...
# gluster volume heal vol1 info split-brain
...
Too many operations to grasp the whole situation...
What I want to know is not just the file names...
How long does the healing take?
I don't know when the split-brain was detected...
AFR_CLI_enhancements
After 3.5.0
# gluster volume heal vol1 statistics
Gathering crawl statistics on volume vol1 has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick eins
Starting time of crawl: Mon May 19 10:13:02 2014
Ending time of crawl: Mon May 19 10:13:02 2014
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
...
Wow! I can get the statistics and historical information at a glance!
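The AFR CLI enhancements feature page also describes a heal-count variant; a hedged sketch (I have not verified these subcommands on 3.5.0 myself, and the replica brick path is hypothetical):
# gluster volume heal vol1 statistics heal-count
# gluster volume heal vol1 statistics heal-count replica eins:/mnt/lv1/vol1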
Management Enhancements
Exposing Volume Capabilities
Before 3.5.0
# gluster volume info
Volume Name: bd0
Type: Distribute
Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: sieben:/tmp/bd0-meta
I can't tell what this volume actually supports, so I have to track it with other tools like Excel...
Exposing Volume Capabilities
After 3.5.0
The volume info now probes the type of the volume and provides the list of capabilities of an xlator/volume:
# gluster volume info
Volume Name: bd0
Type: Distribute
Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
Status: Started
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: sieben:/tmp/bd0-meta
Brick1 VG: bd0-vg
Yeah! Now I can see the volume type and its details at a glance!
Review: How to use BD xlator
Here we create a VG with a single 2GB PV; this VG becomes a volume of GlusterFS:
# dd if=/dev/zero of=/tmp/bd-loop6 bs=1M count=2048
# losetup /dev/loop6 /tmp/bd-loop6
# pvcreate /dev/loop6
# vgcreate bd0-vg /dev/loop6
Volume group "bd0-vg" successfully created
If you want the BDs to be thin-provisioned, create a thin pool with lvcreate (the generated LV names are fixed):
# lvcreate --thin bd0-vg -L 1000M
Logical volume "lvol0" created
Logical volume "lvol1" created
Review: How to use BD xlator
lvol1 is the logical volume pool for thin provisioning (not needed if you don't use thin provisioning):
# lvdisplay bd0-vg
--- Logical volume ---
LV Name lvol1
VG Name bd0-vg
LV UUID PSAFkr-Vyr8-fkGU-kDnA-rWUF-fFFT-111Snr
LV Write Access read/write
LV Creation host, time sieben, 2014-05-18 14:38:21 +0900
LV Pool transaction ID 0
LV Pool metadata lvol1_tmeta
LV Pool data lvol1_tdata
LV Pool chunk size 64.00 KiB
LV Zero new blocks yes
LV Status available
# open 0
LV Size 1000.00 MiB
Allocated pool data 0.00%
Allocated metadata 0.88%
Current LE 250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:5
Review: How to use BD xlator
/tmp/bd0-meta is the metadata store for the BD xlator; "?" (question mark) is the separator between the brick path and the VG name:
# mkdir /tmp/bd0-meta
# gluster volume create bd0 sieben:/tmp/bd0-meta?bd0-vg force
volume create: bd0: success: please start the volume to access data
# gluster volume start bd0
volume start: bd0: success
# gluster volume info bd0
Volume Name: bd0
Type: Distribute
Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
Status: Started
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: sieben:/tmp/bd0-meta
Brick1 VG: bd0-vg
# mkdir /mnt/glusterfs/bd0
# mount -t glusterfs sieben:/bd0 /mnt/glusterfs/bd0
Review: How to use BD xlator
Create a file that is backed by an LV (use -v "lv" instead if you don't need thin provisioning):
# touch /mnt/glusterfs/bd0/lv0
# setfattr -n "user.glusterfs.bd" -v "thin:1024MB" /mnt/glusterfs/bd0/lv0
# lvdisplay bd0-vg
--- Logical volume ---
LV Name lvol1
VG Name bd0-vg
LV UUID PSAFkr-Vyr8-fkGU-kDnA-rWUF-fFFT-111Snr
LV Write Access read/write
LV Creation host, time sieben.infinibridge.net, 2014-05-18 14:38:21 +0900
LV Pool transaction ID 1
LV Pool metadata lvol1_tmeta
LV Pool data lvol1_tdata
LV Pool chunk size 64.00 KiB
LV Zero new blocks yes
LV Status available
# open 0
LV Size 1000.00 MiB
Allocated pool data 0.00%
Allocated metadata 0.98%
Current LE 250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:5
Review: How to use BD xlator
The newly created thin LV that backs lv0:
--- Logical volume ---
LV Path /dev/bd0-vg/a9790eba-ffbf-4d9c-a674-e02c61ece935
LV Name a9790eba-ffbf-4d9c-a674-e02c61ece935
VG Name bd0-vg
LV UUID Z4HtWM-W0jk-YiK5-66ED-zOMw-YhFp-nrnRUU
LV Write Access read/write
LV Creation host, time sieben.infinibridge.net, 2014-05-18 14:47:31 +0900
LV Pool name lvol1
LV Status available
# open 0
LV Size 1.00 GiB
Mapped size 0.00%
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:9
Review: How to use BD xlator
Here we create nine more LVs in the same way:
# for i in `seq 1 9`; do touch /mnt/glusterfs/bd0/lv$i; setfattr -n "user.glusterfs.bd" -v "thin:1024MB" /mnt/glusterfs/bd0/lv$i; done
# lvdisplay -C bd0-vg
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
39b82644-f8ef-435d-b14e-d199a7e264fa bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
6002ddb2-28f1-463c-8666-f683fe2441ed bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
69993340-d691-4502-a9d5-375b8be0fb9e bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
82af50a2-0124-41d8-a887-d8c30427a663 bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
996969dd-3e32-491b-95d1-f279e6808d5b bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
a9790eba-ffbf-4d9c-a674-e02c61ece935 bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
d6fd964a-67f8-4d48-96d1-343bed4ee792 bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
ea58b011-3a41-4bf0-9fe6-3862e24b86f6 bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
f7df48e5-09b1-4314-b729-1f38e5ceec2e bd0-vg Vwi-a-tz-- 1.00g lvol1 0.00
lvol1 bd0-vg twi-a-tz-- 1000.00m 0.00
Review: How to use BD xlator
Create a mount point for each LV:
# mkdir /mnt/bd0-lv/{39b82644-f8ef-435d-b14e-d199a7e264fa,6002ddb2-28f1-463c-8666-f683fe2441ed,69993340-d691-4502-a9d5-375b8be0fb9e,82af50a2-0124-41d8-a887-d8c30427a663,996969dd-3e32-491b-95d1-f279e6808d5b,a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e,a9790eba-ffbf-4d9c-a674-e02c61ece935,d6fd964a-67f8-4d48-96d1-343bed4ee792,ea58b011-3a41-4bf0-9fe6-3862e24b86f6,f7df48e5-09b1-4314-b729-1f38e5ceec2e}
# ls /mnt/bd0-lv
39b82644-f8ef-435d-b14e-d199a7e264fa a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e
6002ddb2-28f1-463c-8666-f683fe2441ed a9790eba-ffbf-4d9c-a674-e02c61ece935
69993340-d691-4502-a9d5-375b8be0fb9e d6fd964a-67f8-4d48-96d1-343bed4ee792
82af50a2-0124-41d8-a887-d8c30427a663 ea58b011-3a41-4bf0-9fe6-3862e24b86f6
996969dd-3e32-491b-95d1-f279e6808d5b f7df48e5-09b1-4314-b729-1f38e5ceec2e
Format each LV as XFS and mount it:
# for x in 39b82644-f8ef-435d-b14e-d199a7e264fa 6002ddb2-28f1-463c-8666-f683fe2441ed 69993340-d691-4502-a9d5-375b8be0fb9e 82af50a2-0124-41d8-a887-d8c30427a663 996969dd-3e32-491b-95d1-f279e6808d5b a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e a9790eba-ffbf-4d9c-a674-e02c61ece935 d6fd964a-67f8-4d48-96d1-343bed4ee792 ea58b011-3a41-4bf0-9fe6-3862e24b86f6 f7df48e5-09b1-4314-b729-1f38e5ceec2e; do mkfs.xfs -i size=512 /dev/bd0-vg/$x && mount -t xfs /dev/bd0-vg/$x /mnt/bd0-lv/$x; done
Review: How to use BD xlator
# df -h | grep bd0-lv
/dev/dm-13 1014M 33M 982M 4% /mnt/bd0-lv/39b82644-f8ef-435d-b14e-d199a7e264fa
/dev/dm-16 1014M 33M 982M 4% /mnt/bd0-lv/6002ddb2-28f1-463c-8666-f683fe2441ed
/dev/dm-18 1014M 33M 982M 4% /mnt/bd0-lv/69993340-d691-4502-a9d5-375b8be0fb9e
/dev/dm-11 1014M 33M 982M 4% /mnt/bd0-lv/82af50a2-0124-41d8-a887-d8c30427a663
/dev/dm-12 1014M 33M 982M 4% /mnt/bd0-lv/996969dd-3e32-491b-95d1-f279e6808d5b
/dev/dm-17 1014M 33M 982M 4% /mnt/bd0-lv/a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e
/dev/dm-9 1014M 33M 982M 4% /mnt/bd0-lv/a9790eba-ffbf-4d9c-a674-e02c61ece935
/dev/dm-14 1014M 33M 982M 4% /mnt/bd0-lv/d6fd964a-67f8-4d48-96d1-343bed4ee792
/dev/dm-15 1014M 33M 982M 4% /mnt/bd0-lv/ea58b011-3a41-4bf0-9fe6-3862e24b86f6
/dev/dm-10 1014M 33M 982M 4% /mnt/bd0-lv/f7df48e5-09b1-4314-b729-1f38e5ceec2e
# mount | grep bd0-lv
/dev/mapper/bd0--vg-39b82644--f8ef--435d--b14e--d199a7e264fa on /mnt/bd0-lv/39b82644-f8ef-435d-b14e-d199a7e264fa type xfs (rw)
/dev/mapper/bd0--vg-6002ddb2--28f1--463c--8666--f683fe2441ed on /mnt/bd0-lv/6002ddb2-28f1-463c-8666-f683fe2441ed type xfs (rw)
/dev/mapper/bd0--vg-69993340--d691--4502--a9d5--375b8be0fb9e on /mnt/bd0-lv/69993340-d691-4502-a9d5-375b8be0fb9e type xfs (rw)
/dev/mapper/bd0--vg-82af50a2--0124--41d8--a887--d8c30427a663 on /mnt/bd0-lv/82af50a2-0124-41d8-a887-d8c30427a663 type xfs (rw)
/dev/mapper/bd0--vg-996969dd--3e32--491b--95d1--f279e6808d5b on /mnt/bd0-lv/996969dd-3e32-491b-95d1-f279e6808d5b type xfs (rw)
/dev/mapper/bd0--vg-a19ac2af--94df--4d01--b7c3--bbfcbfe5d09e on /mnt/bd0-lv/a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e type xfs (rw)
/dev/mapper/bd0--vg-a9790eba--ffbf--4d9c--a674--e02c61ece935 on /mnt/bd0-lv/a9790eba-ffbf-4d9c-a674-e02c61ece935 type xfs (rw)
/dev/mapper/bd0--vg-d6fd964a--67f8--4d48--96d1--343bed4ee792 on /mnt/bd0-lv/d6fd964a-67f8-4d48-96d1-343bed4ee792 type xfs (rw)
/dev/mapper/bd0--vg-ea58b011--3a41--4bf0--9fe6--3862e24b86f6 on /mnt/bd0-lv/ea58b011-3a41-4bf0-9fe6-3862e24b86f6 type xfs (rw)
/dev/mapper/bd0--vg-f7df48e5--09b1--4314--b729--1f38e5ceec2e on /mnt/bd0-lv/f7df48e5-09b1-4314-b729-1f38e5ceec2e type xfs (rw)
Thanks to thin provisioning, 10GB of block devices in total are created on the 2GB VG!
Review: How to use BD xlator
The block devices are shared through GlusterFS as files:
[sechs]# mount -t glusterfs localhost:/bd0 /mnt/glusterfs/bd0
[sechs]# mount -t xfs -o loop /mnt/glusterfs/bd0/lv0
[sechs]# df -h | grep bd0-lv
1014M 33M 982M 4% /mnt/bd0-lv/lv1
[Diagram: raw block device -> physical volume -> volume group -> LVs; the set of LVs is the BD volume, exposed as files (converted with the lvm2 development library and shared with GlusterFS); snapshot and clone are available as LV operations]
Brick Failure Detection
Before 3.5.0
[Diagram: AFR replicating over two glusterfsd brick processes]
1. One of the backend storage devices fails!
2. A client issues R/W ops.
3. glusterfsd returns "Input/output error" or "Read-only filesystem" directly.
Brick Failure Detection
After 3.5.0
[Diagram: AFR replicating over two glusterfsd brick processes]
1. One of the backend storage devices fails!
2. glusterfsd writes out logs and shuts itself down.
3. A client issues R/W ops.
4. The client gets no error and the operation completes (served via the surviving replica).
Brick Failure Detection
Setup for test:
# brick="/mnt/lv4/vol4"; gluster volume create vol4 eins:$brick zwei:$brick drei:$brick vier:$brick fuenf:$brick sechs:$brick
# gluster volume start vol4
# gluster volume set vol4 storage.health-check-interval 10
# gluster volume info vol4
Volume Name: vol4
Type: Distribute
Volume ID: 706122a9-44fc-4d1d-8c3b-97482d98b95c
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv4/vol4
Brick2: zwei:/mnt/lv4/vol4
Brick3: drei:/mnt/lv4/vol4
Brick4: vier:/mnt/lv4/vol4
Brick5: fuenf:/mnt/lv4/vol4
Brick6: sechs:/mnt/lv4/vol4
Options Reconfigured:
storage.health-check-interval: 10
Brick Failure Detection
Setup for test (contd.):
[sechs]# dmsetup table
vg0-swift: 0 209715200 linear 8:7 838862848
vg0-cinder: 0 209715200 linear 8:7 419432448
vg0-lv4: 0 209715200 linear 8:7 1468008448
vg0-lv3: 0 209715200 linear 8:7 1258293248
vg0-lv2: 0 209715200 linear 8:7 1048578048
vg0-lv1: 0 209715200 linear 8:7 209717248
vg0-lv0: 0 209715200 linear 8:7 2048
vg0-glance: 0 209715200 linear 8:7 629147648
Brick Failure Detection
Brick failure test:
[sechs]# echo 0 209715200 error > dmsetup-error-target
[sechs]# dmsetup load vg0-lv4 dmsetup-error-target
[sechs]# dmsetup resume vg0-lv4
[sechs]# dmsetup table
vg0-swift: 0 209715200 linear 8:7 838862848
vg0-cinder: 0 209715200 linear 8:7 419432448
vg0-lv4: 0 209715200 error
vg0-lv3: 0 209715200 linear 8:7 1258293248
vg0-lv2: 0 209715200 linear 8:7 1048578048
vg0-lv1: 0 209715200 linear 8:7 209717248
vg0-lv0: 0 209715200 linear 8:7 2048
vg0-glance: 0 209715200 linear 8:7 629147648
Brick Failure Detection
var/log/glusterfs/bricks/mnt-lv4-vol4.log on the failed node:
[2014-05-18 18:49:53.720594] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-05-18 18:50:04.238239] W [posix-helpers.c:1294:posix_health_check_thread_proc] 0-vol4-posix: stat() on /mnt/lv4/vol4 returned: Input/output error
[2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
Message from syslogd@sechs at May 19 03:50:04 ...
glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
[2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
Message from syslogd@sechs at May 19 03:50:34 ...
glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
[2014-05-18 18:50:34.238910] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x7f1144ebab7d] (-->/lib64/libpthread.so.0(+0x79d1) [0x7f114554d9d1] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(glusterfs_sigwaiter+0xf0) [0x4085af]))) 0-: received signum (15), shutting down
Brick Failure Detection
syslog on the failed node:
May 19 03:49:55 sechs kernel: XFS (dm-7): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 4096
May 19 03:49:57 sechs kernel: XFS (dm-7): metadata I/O error: block 0x6400108 ("xlog_iodone") error 5 buf count 4096
May 19 03:49:57 sechs kernel: XFS (dm-7): xfs_do_force_shutdown(0x2) called from line 1062 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa04dd131
May 19 03:49:57 sechs kernel: XFS (dm-7): Log I/O Error Detected. Shutting down filesystem
May 19 03:49:57 sechs kernel: XFS (dm-7): Please umount the filesystem and rectify the problem(s)
May 19 03:50:04 sechs glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
Message from syslogd@sechs at May 19 03:50:04 ...
glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
May 19 03:50:27 sechs kernel: XFS (dm-7): xfs_log_force: error 5 returned.
Message from syslogd@sechs at May 19 03:50:34 ...
glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
May 19 03:50:34 sechs glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
May 19 03:50:57 sechs kernel: XFS (dm-7): xfs_log_force: error 5 returned.
Brick Failure Detection
gluster volume status after the failure:
# gluster volume status vol4
Status of volume: vol4
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick eins:/mnt/lv4/vol4 49160 Y 2925
Brick zwei:/mnt/lv4/vol4 49159 Y 440
Brick drei:/mnt/lv4/vol4 49152 Y 32500
Brick vier:/mnt/lv4/vol4 49152 Y 32657
Brick fuenf:/mnt/lv4/vol4 49152 Y 24517
Brick sechs:/mnt/lv4/vol4 N/A N N/A
NFS Server on localhost 2049 Y 29535
NFS Server on zwei N/A N N/A
NFS Server on vier N/A N N/A
NFS Server on drei N/A N N/A
NFS Server on eins N/A N N/A
NFS Server on fuenf N/A N N/A
NFS Server on sechs N/A N N/A
Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks
Brick Failure Detection
Processes on the failed node:
# ps -ef | grep glusterfsd | grep -v grep | wc -l
0
Brick Failure Detection
Restart glusterd (and glusterfsd) on the failed node:
[sechs]# service glusterd restart
[2014-05-18 18:58:17.197872] I [glusterfsd.c:1959:main] 0-/usr/local/glusterfs-3.5.0/sbin/glusterfsd: Started running /usr/local/glusterfs-3.5.0/sbin/glusterfsd version 3.5git (/usr/local/glusterfs-3.5.0/sbin/glusterfsd -s sechs --volfile-id vol4.sechs.mnt-lv4-vol4 -p /var/lib/glusterd/vols/vol4/run/sechs-mnt-lv4-vol4.pid -S /var/run/23afc72b5ceddccd28b405b1cdf5b4df.socket --brick-name /mnt/lv4/vol4 -l /usr/local/glusterfs-3.5.0/var/log/glusterfs/bricks/mnt-lv4-vol4.log --xlator-option *-posix.glusterd-uuid=0765d288-a59b-4ccf-90ae-c3332c83dbf4 --brick-port 49152 --xlator-option vol4-server.listen-port=49152)
[2014-05-18 18:58:17.205310] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-05-18 18:58:17.205486] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-05-18 18:58:17.205880] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-05-18 18:58:17.205949] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-05-18 18:58:18.834910] I [graph.c:254:gf_add_cmdline_options] 0-vol4-server: adding option 'listen-port' for volume 'vol4-server' with value '49152'
Brick Failure Detection
Restart glusterd (and glusterfsd) on the failed node (contd.):
[2014-05-18 18:58:18.834976] I [graph.c:254:gf_add_cmdline_options] 0-vol4-posix: adding option 'glusterd-uuid' for volume 'vol4-posix' with value '0765d288-a59b-4ccf-90ae-c3332c83dbf4'
[2014-05-18 18:58:18.837332] I [rpcsvc.c:2064:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2014-05-18 18:58:18.837510] W [options.c:848:xl_opt_validate] 0-vol4-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2014-05-18 18:58:18.837572] I [socket.c:3561:socket_init] 0-tcp.vol4-server: SSL support is NOT enabled
[2014-05-18 18:58:18.837601] I [socket.c:3576:socket_init] 0-tcp.vol4-server: using system polling thread
[2014-05-18 18:58:18.838445] E [common-utils.c:93:mkdir_p] 0-: Failed due to reason Input/output error
[2014-05-18 18:58:18.838505] I [mem-pool.c:539:mem_pool_destroy] 0-vol4-changelog: size=108 max=0 total=0
[2014-05-18 18:58:18.838533] E [xlator.c:403:xlator_init] 0-vol4-changelog: Initialization of volume 'vol4-changelog' failed, review your volfile again
[2014-05-18 18:58:18.838561] E [graph.c:307:glusterfs_graph_init] 0-vol4-changelog: initializing translator failed
[2014-05-18 18:58:18.838610] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
Brick Failure Detection
Restart glusterd (and glusterfsd) on the failed node (contd.):
[2014-05-18 18:58:18.839480] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/local/glusterfs-3.5.0/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1b5) [0x7f2981c837d8] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(mgmt_getspec_cbk+0x36a) [0x40cf77] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(glusterfs_process_volfp+0x18a) [0x408bf2]))) 0-: received signum (0), shutting down
The brick process still exits, because the underlying filesystem keeps returning Input/output errors.
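To recover the test brick afterwards, one possible approach (a sketch, assuming the brick filesystem is mounted at /mnt/lv4 and reusing the linear mapping shown by `dmsetup table` earlier):
[sechs]# echo "0 209715200 linear 8:7 1468008448" > dmsetup-linear-target
[sechs]# dmsetup load vg0-lv4 dmsetup-linear-target
[sechs]# dmsetup resume vg0-lv4
[sechs]# umount /mnt/lv4 && xfs_repair /dev/mapper/vg0-lv4 && mount /dev/mapper/vg0-lv4 /mnt/lv4
[sechs]# service glusterd restart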
Scalability Enhancement
Quota Scalability
Before 3.5.0: directory quota limit = a few hundred per volume
After 3.5.0: directory quota limit = 65536 per volume
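The quota commands themselves are unchanged; a minimal sketch of setting per-directory limits (the volume name and paths are hypothetical):
# gluster volume quota vol0 enable
# gluster volume quota vol0 limit-usage /projects/alpha 10GB
# gluster volume quota vol0 limit-usage /projects/beta 5GB
# gluster volume quota vol0 list
With 3.5 you can define up to 65536 such limit-usage entries on a single volume.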
Performance Enhancements
On-Wire Compression + Decompression
Write ops (FUSE client -> storage pool):
1. open and write -> 2. compression -> 3. transport -> 4. decompression and write to disk
Read ops (storage pool -> FUSE client):
1. open and read -> 2. read and compression -> 3. transport -> 4. decompression
On-Wire Compression + Decompression
# gluster volume create vol-comp eins:/mnt/lv3/vol-comp
# gluster volume set vol-comp network.compression on
# gluster volume set vol-comp network.compression.compression-level 8
# gluster volume set vol-comp network.compression.min-size 50
# gluster volume set vol-comp performance.write-behind off
# gluster volume set vol-comp performance.strict-write-ordering on
# gluster volume set vol-comp performance.open-behind off
# gluster volume info vol-comp
Volume Name: vol-comp
Type: Distribute
Volume ID: 92b47734-2552-4168-b3c3-151093562e4f
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv3/vol-comp
Options Reconfigured:
network.compression.min-size: 50
network.compression.compression-level: 8
performance.open-behind: off
performance.write-behind: off
performance.strict-write-ordering: on
network.compression.mode: server
network.compression: on
Notes:
• compression-level: -1 = default compression (= 8), 0 = no compression, 1 = best speed, 9 = best compression
• min-size: data is compressed only when its size exceeds this value in bytes
• The performance translators are turned off to avoid Input/output errors
On-Wire Compression + Decompression
# gluster volume start vol-comp
# mount -t glusterfs localhost:/vol-comp /mnt/glusterfs/vol-comp
# dd if=/dev/zero of=/mnt/glusterfs/vol-comp/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 33.8606 s, 31.7 MB/s
# diff /mnt/glusterfs/vol-comp/1gb.dat /tmp/1gb.dat
#
• 31.7 MB/s with compression vs 117 MB/s without compression.
• The empty diff shows that compression and decompression executed correctly.
• CPU load on the client becomes higher than without network compression.
• tcpdump showed the 1GB of zeroes being transported as compressed, non-zero data.
• A high-end CPU might show better performance.
• There are still issues and limitations:
  • It cannot work with striped volumes.
  • For glusterfs versions <= 3.5, it cannot work with AFR.
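To reproduce the tcpdump observation above, a hedged sketch (assuming the brick listens on a port in the default 49152+ range, which `gluster volume status` will confirm):
# tcpdump -i eth0 -s 0 -X tcp portrange 49152-49160
With network.compression on, the payload of the written zeroes shows up as non-zero (compressed) bytes.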
readdir_ahead
Before 3.5.0: with only the read-ahead translator in the volume graph, sequential file access can be fast, but sequential directory access like "ls" cannot.
After 3.5.0: with the readdir-ahead translator alongside read-ahead, sequential reads of large directories can complete faster!
readdir_ahead
How-to (the option is disabled by default):
# gluster volume set vol0 readdir-ahead enable
volume set: success
# gluster volume info vol0
Volume Name: vol0
Type: Distribute
Volume ID: cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv0/vol0
Brick2: zwei:/mnt/lv0/vol0
Options Reconfigured:
performance.readdir-ahead: enable
readdir_ahead
Setup for evaluation:
# brick="/mnt/lv4/vol4"; gluster volume create vol4 eins:$brick zwei:$brick drei:$brick vier:$brick fuenf:$brick sechs:$brick
# gluster volume start vol4
# mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
# mkdir /mnt/glusterfs/vol4/manyfiles
# for a in `seq 0 9`; do for b in `seq 0 9`; do for c in `seq 0 9`; do for d in `seq 0 9`; do for e in `seq 0 9`; do for f in `seq 0 9`; do for g in `seq 0 9`; do for h in `seq 0 9`; do for i in `seq 0 9`; do file="/mnt/glusterfs/vol4/manyfiles/8kb${a}${b}${c}${d}${e}${f}${g}${h}${i}.dat"; echo ${file}; dd if=/dev/zero of=${file} bs=1K count=8; if [ $? -ne 0 ]; then break; fi; done; done; done; done; done; done; done; done; done
...
^C
# df -ki /mnt/glusterfs/vol4
Filesystem Inodes IUsed IFree IUse% Mounted on
localhost:vol4 314572800 3394646 311178154 2% /mnt/glusterfs/vol4
About 3 million 8K files were created.
# umount /mnt/glusterfs/vol4
readdir_ahead
Evaluation:
# mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
# for i in `seq 0 2`; do time ls /mnt/glusterfs/vol4/manyfiles > /dev/null; done
26.24s user 18.70s system 6% cpu 12:03.05 total
26.58s user 12.10s system 5% cpu 11:45.92 total
26.53s user 21.61s system 5% cpu 14:14.75 total
# umount /mnt/glusterfs/vol4
# gluster volume stop vol4 && gluster volume start vol4
# gluster volume set vol4 readdir-ahead enable
# mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
# for i in `seq 0 2`; do time ls /mnt/glusterfs/vol4/manyfiles > /dev/null; done
26.24s user 17.97s system 11% cpu 6:25.09 total
26.58s user 22.36s system 10% cpu 8:02.83 total
26.57s user 22.83s system 10% cpu 8:13.01 total
1.68 times faster on average!
# gluster volume reset vol4
# umount /mnt/glusterfs/vol4
Stability Enhancements
Prevent NFS restart on Volume change (Part 1)
Gluster NFS graph: a single nfs/server xlator sits on top of all the volumes:
nfs/server
option nfs3.vol4.volume-id 706122a9-44fc-4d1d-8c3b-97482d98b95c
option rpc-auth.addr.vol4.allow *
option nfs3.vol-gfid-access.volume-id 73abf812-4fff-42bd-822b-3036b72f060d
option rpc-auth.addr.vol-gfid-access.allow *
option nfs3.vol2.volume-id d0517697-5372-44a1-960f-6db0d988f3b2
option rpc-auth.addr.vol2.allow *
option nfs3.vol-comp.volume-id 92b47734-2552-4168-b3c3-151093562e4f
option rpc-auth.addr.vol-comp.allow *
option nfs3.vol1.volume-id ba03d1e6-a520-4e7f-ac4c-2440a205e80e
option rpc-auth.addr.vol1.allow *
option nfs3.vol0.volume-id cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
option rpc-auth.addr.vol0.allow *
option nfs.drc on
option nfs.nlm on
option nfs.dynamic-volumes on
[Diagram: beneath the nfs/server xlator, each volume (vol0, vol1, vol2, ...) has its own subgraph: debug/io-stats -> performance/write-behind -> cluster/distribute -> two protocol/client xlators]
My presentation last year (2013)
• NFS and multi-tenancy
  • 'nfs.rpc-auth-allow' for multi-tenancy
  • Some operations on one volume affect I/O to the other volumes
[Diagram: clients doing I/O to Vol0, Vol1 and Vol2; an operation such as "gluster volume set ..." on one volume interrupts the I/O on all three]
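For reference, the per-volume export restriction mentioned above looks like this (a minimal sketch; the client addresses are hypothetical):
# gluster volume set vol0 nfs.rpc-auth-allow 10.0.0.10
# gluster volume set vol1 nfs.rpc-auth-allow 10.0.0.11
Before 3.5.0, running such a set command restarted the single NFS server and interrupted I/O on every volume.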
Prevent NFS restart on Volume change (Part 1)
"Some operations" on a volume:
• gluster volume {set|reset} <volumeName> nfs.rpc-auth-allow
• gluster volume {start|stop} <volumeName>
• gluster volume add-brick
• gluster volume remove-brick <volumeName> <brick1> ... <brickn> commit
Prevent NFS restart on Volume change (Part 1)
These internal NFS options became unaffected by volume changes (they can be reconfigured without restarting the NFS server):
• nfs.readdir-size
• nfs.nlm
• nfs.acl
• nfs.mount-rmtab
• nfs.drc
• nfs.drc-size
• nfs.read-size
• nfs.write-size
• nfs.export-dir
• nfs.export-dirs
• nfs.enable-ino32
• nfs.export-volumes
• nfs.addr-namelookup
• nfs.outstanding-rpc-limit
• nfs.mount-mtab
• nfs.register-with-portmap
Geo-Replication Enhancement
Before 3.5.0
[Diagram: a single gsyncd process for the whole storage pool (a cluster), a SPOF!, identifying file changes with xattrs and crawling directories with rsync]
Geo-Replication Enhancement
After 3.5.0
[Diagram: one gsyncd process per peer across the storage pool (a cluster), identifying file changes with an in-memory changelog]
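The deck doesn't show the session commands; a hedged sketch of the 3.5 distributed geo-replication workflow (the slave host and volume names are hypothetical):
# gluster volume geo-replication vol0 slavehost::slavevol create push-pem
# gluster volume geo-replication vol0 slavehost::slavevol start
# gluster volume geo-replication vol0 slavehost::slavevol status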
Geo-Replication Enhancement
Changelog:
# cat /var/lib/glusterd/vols/vol0/vol0.eins.mnt-lv0-vol0.vol
volume vol0-posix
type storage/posix
option volume-id cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
option directory /mnt/lv0/vol0
end-volume
volume vol0-changelog
type features/changelog
option changelog-dir /mnt/lv0/vol0/.glusterfs/changelogs
option changelog-brick /mnt/lv0/vol0
subvolumes vol0-posix
end-volume
...
volume vol0-server
type protocol/server
option auth.addr./mnt/lv0/vol0.allow *
option auth.login.863ccc05-1ba2-47cc-8a15-240ad4e8c736.password c8d200d6-db0b-4f87-be0f-664e08f4ceee
option auth.login./mnt/lv0/vol0.allow 863ccc05-1ba2-47cc-8a15-240ad4e8c736
option transport-type tcp
subvolumes /mnt/lv0/vol0
end-volume
Geo-Replication Enhancement
Changelog (contd.):
# ls -a /mnt/lv0/vol0/.glusterfs/changelogs
. ..
Is the changelog unused unless gsyncd runs???
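One possible explanation (my assumption, not from the deck): the changelog xlator sits in the brick graph, but journaling is off by default and is switched on when a geo-replication session starts. It should also be possible to enable it manually:
# gluster volume set vol0 changelog.changelog on
# ls -a /mnt/lv0/vol0/.glusterfs/changelogs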
Security Enhancement
Disk encryption
Write ops (FUSE client -> storage pool):
1. open and write -> 2. encryption -> 3. transport -> 4. write the encrypted data to disk
Read ops (storage pool -> FUSE client):
1. open and read -> 2. read from the underlying disks -> 3. transport -> 4. decryption
Disk encryption
Setup:
# gluster volume info
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
# gluster volume set vol2 encryption on
volume set: success
# for x in quick-read write-behind open-behind; do gluster volume set vol2 performance.$x off; done
# gluster volume set vol2 encryption.master-key /var/lib/glusterd/vols/vol2/encryption.master-key
# openssl rand -hex 32 > /var/lib/glusterd/vols/vol2/encryption.master-key
# gluster volume set vol2 encryption.data-key-size 512
Disk encryption
Setup (contd.):
# gluster volume info
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
Options Reconfigured:
encryption.data-key-size: 512
encryption.master-key: /var/lib/glusterd/vols/vol2/encryption.master-key
performance.open-behind: off
performance.write-behind: off
performance.quick-read: off
features.encryption: on
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/var/lib/glusterd/vols/vol2/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol2
Disk encryption
Encryption test:
# echo "test" > /mnt/glusterfs/vol2/test.txt
# cat /mnt/glusterfs/vol2/test.txt
test
[eins]# cat /mnt/lv2/vol2/test.txt
Zd??]K!q??tuv
[zwei]# cat /mnt/lv2/vol2/test.txt
Zd??]K!q??tuv
ASCII files on the bricks are encrypted.
# dd if=/dev/zero of=/mnt/glusterfs/vol1/test.dat bs=1 count=32
# dd if=/dev/zero of=/mnt/glusterfs/vol2/test.dat bs=1 count=32
[eins]# dd if=/dev/zero of=/tmp/test.dat bs=1 count=32
[eins]# diff /tmp/test.dat /mnt/lv2/vol2/test.dat
Binary files /tmp/test.dat and /mnt/lv2/vol2/test.dat differ
[eins]# diff /tmp/test.dat /mnt/lv1/vol1/test.dat
#
Binary files on the bricks are also encrypted. Note that vol1, which has no encryption, stores the data unencrypted, so its raw data remains readable directly on the bricks.
# tcpdump -i eth0 -XX
The transported zeroed data can be seen fully encrypted on the wire.
Disk encryption
Decryption test:
# dd if=/dev/zero of=/tmp/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.61505 s, 297 MB/s
# diff3 /tmp/1gb.dat /mnt/glusterfs/vol1/1gb.dat /mnt/glusterfs/vol2/1gb.dat
#
Perfect!
Disk encryption
Performance test:
# dd if=/dev/zero of=/mnt/glusterfs/vol1/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 18.4542 s, 58.2 MB/s
# dd if=/dev/zero of=/mnt/glusterfs/vol2/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 263.633 s, 4.1 MB/s
Writing to the encrypted vol2 is about 14 times slower than to the plain vol1.
Disk encryption
Work with NFS? (No!)
# mount -t nfs -o vers=3,hard,intr,nosuid localhost:/vol2 /mnt/nfs/vol2
mount.nfs: Connection timed out
Disk encryption
Compromising with the same MK:
# cp /var/lib/glusterd/vols/vol2/encryption.master-key /tmp
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/tmp/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
#
Disk encryption
Compromising with a different MK while keeping the volume mounted:
# openssl rand -hex 32 > /tmp/encryption.master-key
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
#
Disk encryption
Compromising with an invalid MK:
# umount /mnt/glusterfs/vol-crypt
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/tmp/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
diff: /mnt/glusterfs/vol-crypt/test.txt: Invalid argument
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# cp /mnt/glusterfs/vol-crypt/test.txt ~/
cp: reading `/mnt/glusterfs/vol-crypt/test.txt': Invalid argument
# ls -l ~/test.txt
-rw-r--r-- 1 root root 0 May 19 00:38 /root/test.txt
Disk encryption
Compromising with an invalid MK (contd.):
# echo "test2" > /mnt/glusterfs/vol-crypt/test2.txt
# cat /mnt/glusterfs/vol-crypt/test2.txt
test2
# diff /mnt/glusterfs/vol-crypt/test2.txt /tmp/test2.txt
#
A file can be written with an invalid MK. (Is that okay?)
# rm /mnt/glusterfs/vol-crypt/test.txt
rm: cannot remove `/mnt/glusterfs/vol-crypt/test.txt': Invalid argument
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 6 May 19 00:39 test2.txt
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# rm /mnt/glusterfs/vol-crypt/test2.txt
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
Disk encryption
Compromising with an invalid MK (contd.):
# mv /mnt/glusterfs/vol-crypt/test.txt /mnt/glusterfs/vol-crypt/test2.txt
mv: cannot move `/mnt/glusterfs/vol-crypt/test.txt' to a subdirectory of itself, `/mnt/glusterfs/vol-crypt/test2.txt'
# umount /mnt/glusterfs/vol-crypt
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/var/lib/glusterd/vols/vol2/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 6 May 19 00:44 test2.txt
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# cat /mnt/glusterfs/vol-crypt/test2.txt
cat: /mnt/glusterfs/vol-crypt/test2.txt: Invalid argument
# rm /mnt/glusterfs/vol-crypt/test2.txt
rm: cannot remove `/mnt/glusterfs/vol-crypt/test2.txt': Invalid argument
The proper user cannot handle the file that was created with the invalid MK. (Is that okay?)
Disk encryption
Compromising with volume reset:
# gluster volume info vol-crypt
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
Options Reconfigured:
encryption.data-key-size: 512
encryption.master-key: /var/lib/glusterd/vols/vol2/encryption.master-key
performance.open-behind: off
performance.write-behind: off
performance.quick-read: off
features.encryption: on
# gluster volume reset vol-crypt
volume reset: success: reset volume successful
Disk encryption
Compromising with volume reset (contd.):
# gluster volume info vol-crypt
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
# cat /mnt/glusterfs/vol-crypt/test2.txt
U�%U?0��x^-�bO
# cat /mnt/glusterfs/vol-crypt/test.txt
Zd��]K!q�tuv
Maybe a way of cracking?
Enhancement for Developers
GFID Access
Under <A Volume>/.gfid you can deal with each file by its GFID: a single namespace just under the mount point.
62fe0d4f-dfe9-4a2d-b811-176c6d347a7c
52fd93ea-45ba-47f0-916a-bd3774239237
e5593949-79c3-463c-909b-8cc8ef014eb4
bf758c70-ff2b-4f0d-bfc9-860ece79c246
70aaacf9-1c09-44e2-97a2-9486adf10225
e5498fc4-7345-4f5f-af59-81acff1fd083
f6a608ed-0c68-4d1a-a4d7-fb375ba8fd63
d420cbb3-c1e8-47d3-b317-0c8afbc7a8c4
a76dd563-e878-45a0-ac48-59084d86bd0c
f9dbc760-c8f3-41e6-8d24-68b24c4c577b
5c0374a8-18fe-4dd6-89e7-f6551111d980
GFID Access
# brick="/mnt/lv3/vol-gfid-access"; gluster volume create vol-gfid-access eins:$brick zwei:$brick
# gluster volume start vol-gfid-access
# mkdir /mnt/glusterfs/vol-gfid-access
# mount.glusterfs -o aux-gfid-mount localhost:/vol-gfid-access /mnt/glusterfs/vol-gfid-access
# for i in `seq 0 9`; do dd if=/dev/zero of=/mnt/glusterfs/vol-gfid-access/$i.dat bs=1M count=1; done
# ls -a /mnt/glusterfs/vol-gfid-access/.gfid
ls: cannot open directory /mnt/glusterfs/vol-gfid-access/.gfid: Stale file handle
# ls -a '/mnt/glusterfs/vol-gfid-access/.gfid/0svu9Cc1wVRLOBiu5NqF3ncw=='
ls: cannot access /mnt/glusterfs/vol-gfid-access/.gfid/0svu9Cc1wVRLOBiu5NqF3ncw==: No such file or directory
# ls -ld /mnt/glusterfs/vol-gfid-access/.gfid/
drwxr-xr-x 3 root root 166 May 19 03:03 /mnt/glusterfs/vol-gfid-access/.gfid/
GFID Access
# stat /mnt/glusterfs/vol-gfid-access/.gfid/
File: `/mnt/glusterfs/vol-gfid-access/.gfid/'
Size: 166 Blocks: 0 IO Block: 131072 directory
Device: 16h/22d Inode: 13 Links: 3
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-05-19 03:03:14.146605880 +0900
Modify: 2014-05-19 03:03:04.968605874 +0900
Change: 2014-05-19 03:03:04.968605874 +0900
# strace ls -a /mnt/glusterfs/vol-gfid-access/.gfid
...
stat("/mnt/glusterfs/vol-gfid-access/.gfid", {st_mode=S_IFDIR|0755, st_size=166, ...}) = 0
open("/mnt/glusterfs/vol-gfid-access/.gfid", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ESTALE (Stale file handle)
...
• How can I make this work?
• If it worked, applications using GlusterFS could manage their data in a single namespace.
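For reference, the intended usage seems to be addressing a file directly by its GFID rather than listing .gfid; a hedged sketch (the glusterfs.gfid.string virtual xattr and the .gfid/<gfid> path are my understanding of the feature, not something shown in the deck):
# getfattr -n glusterfs.gfid.string /mnt/glusterfs/vol-gfid-access/0.dat
# cat /mnt/glusterfs/vol-gfid-access/.gfid/<gfid printed above>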
Additional news
Added files
• cli/src/cli-quotad-client
• contrib/qemu (the related qemu code)
• error-codes.json
• extras
  • geo-rep
  • glusterfs-georep-logrotate
  • gluster-rsyslog-*.conf
  • hook-scripts/add-brick
  • logger.conf.example
  • post-upgrade-script-for-quota.sh
  • pre-upgrade-script-for-quota.sh
• geo-replication
• gf-error-codes.h.template
• libgfchangelog.pc.in
• libglusterfs/src
  • client_t
  • glusterfs-acl
  • timespec
• rpc/rpc-lib/src/rpc-drc
• run-tests.sh
• tests (a lot of test code!)
• xlators
  • cluster/dht/src/dht-shared.c
  • encryption/crypt
  • features
    • changelog
    • compress
    • gfid-access
    • glupy (glupy has been merged!)
    • qemu-block
    • quota
      • quota-enforcer-client.c
      • quotad-aggregator
      • quotad-helpers
  • performance/readdir-ahead
  • playground (a template for xlator development)
  • storage/bd (a replacement of bd_map)
Conclusion
• Updates for "everyone"
  • 12 features, 8 categories.
• Contributions by HekaFS
  • Disk encryption has been one of my dreams since 2.0.2.
• The voice of users
  • Brick Failure Detection
  • Prevent NFS restart on Volume change
These all come from the great community's power!
Use the latest version, and join us!
To contact us, e-mail here -> storage-contact@nttpc.co.jp

Linux Capabilities - eng - v2.1.5, compact
 
Docker 활용법: dumpdocker
Docker 활용법: dumpdockerDocker 활용법: dumpdocker
Docker 활용법: dumpdocker
 
1 Million Writes per second on 60 nodes with Cassandra and EBS
1 Million Writes per second on 60 nodes with Cassandra and EBS1 Million Writes per second on 60 nodes with Cassandra and EBS
1 Million Writes per second on 60 nodes with Cassandra and EBS
 
Dev ops
Dev opsDev ops
Dev ops
 
Pentesting111111 Cheat Sheet_OSCP_2023.pdf
Pentesting111111 Cheat Sheet_OSCP_2023.pdfPentesting111111 Cheat Sheet_OSCP_2023.pdf
Pentesting111111 Cheat Sheet_OSCP_2023.pdf
 
Building a Gateway Server
Building a Gateway ServerBuilding a Gateway Server
Building a Gateway Server
 
Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!
 
Parrot Drones Hijacking
Parrot Drones HijackingParrot Drones Hijacking
Parrot Drones Hijacking
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Swift Install Workshop - OpenStack Conference Spring 2012
Swift Install Workshop - OpenStack Conference Spring 2012Swift Install Workshop - OpenStack Conference Spring 2012
Swift Install Workshop - OpenStack Conference Spring 2012
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
 
App container rkt
App container rktApp container rkt
App container rkt
 
Online spanish meetup #2
Online spanish meetup #2Online spanish meetup #2
Online spanish meetup #2
 

Mais de Keisuke Takahashi

Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)
Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)
Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)Keisuke Takahashi
 
パーフェクト"Elixir情報収集"
パーフェクト"Elixir情報収集"パーフェクト"Elixir情報収集"
パーフェクト"Elixir情報収集"Keisuke Takahashi
 
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会GlusterFS Updates (and more) in 第六回クラウドストレージ研究会
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会Keisuke Takahashi
 
Big Data入門に見せかけたFluentd入門
Big Data入門に見せかけたFluentd入門Big Data入門に見せかけたFluentd入門
Big Data入門に見せかけたFluentd入門Keisuke Takahashi
 
Creating a shared storage service with GlusterFS
Creating a shared storage service with GlusterFSCreating a shared storage service with GlusterFS
Creating a shared storage service with GlusterFSKeisuke Takahashi
 
GlusterFSとInfiniBandの小話
GlusterFSとInfiniBandの小話GlusterFSとInfiniBandの小話
GlusterFSとInfiniBandの小話Keisuke Takahashi
 
GlusterFS 技術と動向 2of2
GlusterFS 技術と動向 2of2GlusterFS 技術と動向 2of2
GlusterFS 技術と動向 2of2Keisuke Takahashi
 
GlusterFS 技術と動向 1of2
GlusterFS 技術と動向 1of2GlusterFS 技術と動向 1of2
GlusterFS 技術と動向 1of2Keisuke Takahashi
 
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)Keisuke Takahashi
 
GlusterFS モジュール超概論
GlusterFS モジュール超概論GlusterFS モジュール超概論
GlusterFS モジュール超概論Keisuke Takahashi
 
GlusterFS座談会テクニカルセッション
GlusterFS座談会テクニカルセッションGlusterFS座談会テクニカルセッション
GlusterFS座談会テクニカルセッションKeisuke Takahashi
 

Mais de Keisuke Takahashi (13)

Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)
Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)
Azure Database for PostgreSQL 入門 (PostgreSQL Conference Japan 2021)
 
パーフェクト"Elixir情報収集"
パーフェクト"Elixir情報収集"パーフェクト"Elixir情報収集"
パーフェクト"Elixir情報収集"
 
GlusterFS Masakari Talks
GlusterFS Masakari TalksGlusterFS Masakari Talks
GlusterFS Masakari Talks
 
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会GlusterFS Updates (and more) in 第六回クラウドストレージ研究会
GlusterFS Updates (and more) in 第六回クラウドストレージ研究会
 
Big Data入門に見せかけたFluentd入門
Big Data入門に見せかけたFluentd入門Big Data入門に見せかけたFluentd入門
Big Data入門に見せかけたFluentd入門
 
Gluster in Japan 2012-2013
Gluster in Japan 2012-2013Gluster in Japan 2012-2013
Gluster in Japan 2012-2013
 
Creating a shared storage service with GlusterFS
Creating a shared storage service with GlusterFSCreating a shared storage service with GlusterFS
Creating a shared storage service with GlusterFS
 
GlusterFSとInfiniBandの小話
GlusterFSとInfiniBandの小話GlusterFSとInfiniBandの小話
GlusterFSとInfiniBandの小話
 
GlusterFS 技術と動向 2of2
GlusterFS 技術と動向 2of2GlusterFS 技術と動向 2of2
GlusterFS 技術と動向 2of2
 
GlusterFS 技術と動向 1of2
GlusterFS 技術と動向 1of2GlusterFS 技術と動向 1of2
GlusterFS 技術と動向 1of2
 
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)
最新技術動向 GlusterFS (仮想化DAY, Internet Week 2011)
 
GlusterFS モジュール超概論
GlusterFS モジュール超概論GlusterFS モジュール超概論
GlusterFS モジュール超概論
 
GlusterFS座談会テクニカルセッション
GlusterFS座談会テクニカルセッションGlusterFS座談会テクニカルセッション
GlusterFS座談会テクニカルセッション
 

Último

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Trying and evaluating the new features of GlusterFS 3.5

  • 10. Overview — which of the eight areas (OpenStack, Operation, Management, Scalability, Performance, Stability, Security, Dev) each feature touches; the sections that follow group the features by these areas:
    AFR_CLI_enhancements ✔️
    Exposing Volume Capabilities ✔️
    File Snapshot ✔️
    GFID Access ✔️
    On-Wire Compression + Decompression ✔️
    Prevent NFS restart on Volume change (Part 1) ✔️
    Quota Scalability ✔️ ✔️
    readdir_ahead ✔️
    zerofill ✔️ ✔️
    Brick Failure Detection ✔️
    Disk encryption ✔️
    Geo-Replication Enhancement ✔️ ✔️
  • 11. OpenStack Integration Enhancements
  • 12. File Snapshot — snapshots of a single file, handled by the features/qemu-block xlator hooked into the glusterfs client process via FUSE. <file_name> is a file under a mount point of a volume:
    # setfattr -n trusted.glusterfs.block-format -v qcow2:<file_size(in KB/MB/GB)> <file_name>
    # setfattr -n trusted.glusterfs.block-snapshot-create -v <snapshot_name1> <file_name>   (take a snapshot)
    # setfattr -n trusted.glusterfs.block-snapshot-create -v <snapshot_name2> <file_name>   (take a snapshot)
    # setfattr -n trusted.glusterfs.block-snapshot-goto -v <snapshot_name1> <file_name>     (restore from a snapshot)
    # setfattr -n trusted.glusterfs.block-snapshot-delete -v <snapshot_name2> <file_name>   (delete a snapshot)
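    A minimal sketch of the same workflow with the placeholders filled in; the volume name (vol0), mount point, file name, size, and snapshot name below are illustrative assumptions, not from the slide:
    # mount -t glusterfs localhost:/vol0 /mnt/glusterfs/vol0
    # touch /mnt/glusterfs/vol0/disk0.img
    # setfattr -n trusted.glusterfs.block-format -v qcow2:10GB /mnt/glusterfs/vol0/disk0.img
    ... write some data into disk0.img ...
    # setfattr -n trusted.glusterfs.block-snapshot-create -v snap1 /mnt/glusterfs/vol0/disk0.img
    ... write more data ...
    # setfattr -n trusted.glusterfs.block-snapshot-goto -v snap1 /mnt/glusterfs/vol0/disk0.img
    (the file is now back at the state captured in snap1)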
  • 13. File Snapshot (contd.) — the same flow driven by OpenStack Cinder: the file <file_name> under the mount point of a volume serves as block storage for Cinder, which issues the block-format / block-snapshot-create / block-snapshot-goto / block-snapshot-delete operations through the BD xlator and the features/qemu-block xlator hooked into the glusterfs client process via FUSE.
  • 14. zerofill — a new ZEROFILL fop: a user app (e.g. Cinder) calls the glfs_zerofill function in libgfapi, the request travels through AFR to the glusterfsd processes, and the posix_do_zerofill function writes the zeroes on the bricks. Comparable offload mechanisms: the SCSI WRITESAME command and the BLKZEROOUT ioctl on Linux.
  • 15. zerofill — server-offloaded zerofill vs. repeated zeroing (http://www.gluster.org/community/documentation/index.php/Features/zerofill):
    [root@llmvm02 remote]# time ./offloaded aakash-test log 20
    real 3m34.155s  user 0m0.018s  sys 0m0.040s
    [root@llmvm02 remote]# time ./manually aakash-test log 20
    real 4m23.043s  user 0m2.197s  sys 0m14.457s     → offloaded is 1.23 times faster!
    [root@llmvm02 remote]# time ./offloaded aakash-test log 25
    real 4m28.363s  user 0m0.021s  sys 0m0.025s
    [root@llmvm02 remote]# time ./manually aakash-test log 25
    real 5m34.278s  user 0m2.957s  sys 0m18.808s     → offloaded is 1.25 times faster!
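    The offloaded and manually binaries above appear to be the test programs from the feature page linked on the slide. The ZEROFILL fop itself has no CLI; it is reached through the glfs_zerofill function in libgfapi. The "repeated zeroing" baseline, though, can be approximated from any client with plain dd — the path and size here are assumptions:
    # time dd if=/dev/zero of=/mnt/glusterfs/vol0/zero.dat bs=1M count=20480 conv=fsync
    (this ships every zero byte over the wire; the offloaded path sends one ZEROFILL request per range instead)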
  • 16. Operation Enhancements
  • 17. AFR_CLI_enhancements — before 3.5.0:
    # gluster volume heal vol1
    Heal operation on volume vol1 has been successful
    # gluster volume heal vol1 info
    ...
    # gluster volume heal vol1 info healed
    ...
    # gluster volume heal vol1 info heal-failed
    ...
    # gluster volume heal vol1 info split-brain
    ...
    Too many operations to get the whole picture... What I want to know is not the file names... How long does the healing take? And I don't know when the split-brain was detected...
  • 18. AFR_CLI_enhancements — after 3.5.0:
    # gluster volume heal vol1 statistics
    Gathering crawl statistics on volume vol1 has been successful
    ------------------------------------------------
    Crawl statistics for brick no 0
    Hostname of brick eins
    Starting time of crawl: Mon May 19 10:13:02 2014
    Ending time of crawl: Mon May 19 10:13:02 2014
    Type of crawl: INDEX
    No. of entries healed: 0
    No. of entries in split-brain: 0
    No. of heal failed entries: 0
    ...
    Wow! I can get the statistical and historical information at a glance!
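    For ongoing monitoring rather than a one-shot report, the statistics can simply be polled; a sketch, assuming the volume name vol1 from above — and note that the lighter-weight heal-count subcommand is part of the same 3.5 CLI work, so verify it exists on your build:
    # watch -n 60 'gluster volume heal vol1 statistics heal-count'
    (prints the number of entries pending heal, per brick, every minute)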
  • 19. Management Enhancements
  • 20. Exposing Volume Capabilities — before 3.5.0:
    # gluster volume info
    Volume Name: bd0
    Type: Distribute
    Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: sieben:/tmp/bd0-meta
    I can't tell which capabilities this volume supports, so I have to track that with other tools like Excel...
  • 21. Exposing Volume Capabilities — after 3.5.0: volume info probes the type of volume and provides the list of capabilities of an xlator/volume:
    # gluster volume info
    Volume Name: bd0
    Type: Distribute
    Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
    Status: Started
    Xlator 1: BD
    Capability 1: thin
    Capability 2: offload_copy
    Capability 3: offload_snapshot
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: sieben:/tmp/bd0-meta
    Brick1 VG: bd0-vg
    Yeah! Now I can see the volume type and the details at a glance!
  • 22. Review: How to use BD xlator — a VG is created here from a single 2GB PV; this VG becomes a volume of GlusterFS. If you want the BDs thin-provisioned, run lvcreate --thin (the pool LV names are fixed):
    # dd if=/dev/zero of=/tmp/bd-loop6 bs=1M count=2048
    # losetup /dev/loop6 /tmp/bd-loop6
    # pvcreate /dev/loop6
    # vgcreate bd0-vg /dev/loop6
    Volume group "bd0-vg" successfully created
    # lvcreate --thin bd0-vg -L 1000M
    Logical volume "lvol0" created
    Logical volume "lvol1" created
  • 23. Review: How to use BD xlator — lvol1 is the logical volume pool for thin-provisioning (not needed when thin-provisioning is not used):
    # lvdisplay bd0-vg
    --- Logical volume ---
    LV Name                lvol1
    VG Name                bd0-vg
    LV UUID                PSAFkr-Vyr8-fkGU-kDnA-rWUF-fFFT-111Snr
    LV Write Access        read/write
    LV Creation host, time sieben, 2014-05-18 14:38:21 +0900
    LV Pool transaction ID 0
    LV Pool metadata       lvol1_tmeta
    LV Pool data           lvol1_tdata
    LV Pool chunk size     64.00 KiB
    LV Zero new blocks     yes
    LV Status              available
    # open                 0
    LV Size                1000.00 MiB
    Allocated pool data    0.00%
    Allocated metadata     0.88%
    Current LE             250
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto - currently set to 256
    Block device           253:5
  • 24. Review: How to use BD xlator — /tmp/bd0-meta is the metadata store for the BD xlator, and "?" (question mark) is the separator between the brick path and the VG name:
    # mkdir /tmp/bd0-meta
    # gluster volume create bd0 sieben:/tmp/bd0-meta?bd0-vg force
    volume create: bd0: success: please start the volume to access data
    # gluster volume start bd0
    volume start: bd0: success
    # gluster volume info bd0
    Volume Name: bd0
    Type: Distribute
    Volume ID: 019d0f4b-d11a-480e-9be8-0c79902f0746
    Status: Started
    Xlator 1: BD
    Capability 1: thin
    Capability 2: offload_copy
    Capability 3: offload_snapshot
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: sieben:/tmp/bd0-meta
    Brick1 VG: bd0-vg
    # mkdir /mnt/glusterfs/bd0
    # mount -t glusterfs sieben:/bd0 /mnt/glusterfs/bd0
  • 25. Review: How to use BD xlator — create a file that is backed by an LV (or simply -v "lv" when thin-provisioning is not needed):
    # touch /mnt/glusterfs/bd0/lv0
    # setfattr -n "user.glusterfs.bd" -v "thin:1024MB" /mnt/glusterfs/bd0/lv0
    # lvdisplay bd0-vg
    --- Logical volume ---
    LV Name                lvol1
    VG Name                bd0-vg
    LV UUID                PSAFkr-Vyr8-fkGU-kDnA-rWUF-fFFT-111Snr
    LV Write Access        read/write
    LV Creation host, time sieben.infinibridge.net, 2014-05-18 14:38:21 +0900
    LV Pool transaction ID 1
    LV Pool metadata       lvol1_tmeta
    LV Pool data           lvol1_tdata
    LV Pool chunk size     64.00 KiB
    LV Zero new blocks     yes
    LV Status              available
    # open                 0
    LV Size                1000.00 MiB
    Allocated pool data    0.00%
    Allocated metadata     0.98%
    Current LE             250
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto - currently set to 256
    Block device           253:5
  • 26. Review: How to use BD xlator — the new LV created for the file:
    --- Logical volume ---
    LV Path                /dev/bd0-vg/a9790eba-ffbf-4d9c-a674-e02c61ece935
    LV Name                a9790eba-ffbf-4d9c-a674-e02c61ece935
    VG Name                bd0-vg
    LV UUID                Z4HtWM-W0jk-YiK5-66ED-zOMw-YhFp-nrnRUU
    LV Write Access        read/write
    LV Creation host, time sieben.infinibridge.net, 2014-05-18 14:47:31 +0900
    LV Pool name           lvol1
    LV Status              available
    # open                 0
    LV Size                1.00 GiB
    Mapped size            0.00%
    Current LE             256
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto - currently set to 256
    Block device           253:9
  • 27. Review: How to use BD xlator — nine more LVs are created here in the same way:
    # for i in `seq 1 9`; do touch /mnt/glusterfs/bd0/lv$i; setfattr -n "user.glusterfs.bd" -v "thin:1024MB" /mnt/glusterfs/bd0/lv$i; done
    # lvdisplay -C bd0-vg
    LV                                   VG     Attr       LSize    Pool  Origin Data%
    39b82644-f8ef-435d-b14e-d199a7e264fa bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    6002ddb2-28f1-463c-8666-f683fe2441ed bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    69993340-d691-4502-a9d5-375b8be0fb9e bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    82af50a2-0124-41d8-a887-d8c30427a663 bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    996969dd-3e32-491b-95d1-f279e6808d5b bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    a9790eba-ffbf-4d9c-a674-e02c61ece935 bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    d6fd964a-67f8-4d48-96d1-343bed4ee792 bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    ea58b011-3a41-4bf0-9fe6-3862e24b86f6 bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    f7df48e5-09b1-4314-b729-1f38e5ceec2e bd0-vg Vwi-a-tz-- 1.00g    lvol1        0.00
    lvol1                                bd0-vg twi-a-tz-- 1000.00m              0.00
  • 28. Review: How to use BD xlator — create a mount point for each LV, then format each LV with XFS and mount it:
    # mkdir /mnt/bd0-lv/{39b82644-f8ef-435d-b14e-d199a7e264fa,6002ddb2-28f1-463c-8666-f683fe2441ed,69993340-d691-4502-a9d5-375b8be0fb9e,82af50a2-0124-41d8-a887-d8c30427a663,996969dd-3e32-491b-95d1-f279e6808d5b,a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e,a9790eba-ffbf-4d9c-a674-e02c61ece935,d6fd964a-67f8-4d48-96d1-343bed4ee792,ea58b011-3a41-4bf0-9fe6-3862e24b86f6,f7df48e5-09b1-4314-b729-1f38e5ceec2e}
    # ls /mnt/bd0-lv
    39b82644-f8ef-435d-b14e-d199a7e264fa  a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e
    6002ddb2-28f1-463c-8666-f683fe2441ed  a9790eba-ffbf-4d9c-a674-e02c61ece935
    69993340-d691-4502-a9d5-375b8be0fb9e  d6fd964a-67f8-4d48-96d1-343bed4ee792
    82af50a2-0124-41d8-a887-d8c30427a663  ea58b011-3a41-4bf0-9fe6-3862e24b86f6
    996969dd-3e32-491b-95d1-f279e6808d5b  f7df48e5-09b1-4314-b729-1f38e5ceec2e
    # for x in 39b82644-f8ef-435d-b14e-d199a7e264fa 6002ddb2-28f1-463c-8666-f683fe2441ed 69993340-d691-4502-a9d5-375b8be0fb9e 82af50a2-0124-41d8-a887-d8c30427a663 996969dd-3e32-491b-95d1-f279e6808d5b a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e a9790eba-ffbf-4d9c-a674-e02c61ece935 d6fd964a-67f8-4d48-96d1-343bed4ee792 ea58b011-3a41-4bf0-9fe6-3862e24b86f6 f7df48e5-09b1-4314-b729-1f38e5ceec2e; do mkfs.xfs -i size=512 /dev/bd0-vg/$x && mount -t xfs /dev/bd0-vg/$x /mnt/bd0-lv/$x; done
  • 29. Review: How to use BD xlator — thanks to thin-provisioning, 10GB of block devices in total are created on the 2GB VG!
    # df -h | grep bd0-lv
    /dev/dm-13 1014M 33M 982M 4% /mnt/bd0-lv/39b82644-f8ef-435d-b14e-d199a7e264fa
    /dev/dm-16 1014M 33M 982M 4% /mnt/bd0-lv/6002ddb2-28f1-463c-8666-f683fe2441ed
    /dev/dm-18 1014M 33M 982M 4% /mnt/bd0-lv/69993340-d691-4502-a9d5-375b8be0fb9e
    /dev/dm-11 1014M 33M 982M 4% /mnt/bd0-lv/82af50a2-0124-41d8-a887-d8c30427a663
    /dev/dm-12 1014M 33M 982M 4% /mnt/bd0-lv/996969dd-3e32-491b-95d1-f279e6808d5b
    /dev/dm-17 1014M 33M 982M 4% /mnt/bd0-lv/a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e
    /dev/dm-9  1014M 33M 982M 4% /mnt/bd0-lv/a9790eba-ffbf-4d9c-a674-e02c61ece935
    /dev/dm-14 1014M 33M 982M 4% /mnt/bd0-lv/d6fd964a-67f8-4d48-96d1-343bed4ee792
    /dev/dm-15 1014M 33M 982M 4% /mnt/bd0-lv/ea58b011-3a41-4bf0-9fe6-3862e24b86f6
    /dev/dm-10 1014M 33M 982M 4% /mnt/bd0-lv/f7df48e5-09b1-4314-b729-1f38e5ceec2e
    # mount | grep bd0-lv
    /dev/mapper/bd0--vg-39b82644--f8ef--435d--b14e--d199a7e264fa on /mnt/bd0-lv/39b82644-f8ef-435d-b14e-d199a7e264fa type xfs (rw)
    /dev/mapper/bd0--vg-6002ddb2--28f1--463c--8666--f683fe2441ed on /mnt/bd0-lv/6002ddb2-28f1-463c-8666-f683fe2441ed type xfs (rw)
    /dev/mapper/bd0--vg-69993340--d691--4502--a9d5--375b8be0fb9e on /mnt/bd0-lv/69993340-d691-4502-a9d5-375b8be0fb9e type xfs (rw)
    /dev/mapper/bd0--vg-82af50a2--0124--41d8--a887--d8c30427a663 on /mnt/bd0-lv/82af50a2-0124-41d8-a887-d8c30427a663 type xfs (rw)
    /dev/mapper/bd0--vg-996969dd--3e32--491b--95d1--f279e6808d5b on /mnt/bd0-lv/996969dd-3e32-491b-95d1-f279e6808d5b type xfs (rw)
    /dev/mapper/bd0--vg-a19ac2af--94df--4d01--b7c3--bbfcbfe5d09e on /mnt/bd0-lv/a19ac2af-94df-4d01-b7c3-bbfcbfe5d09e type xfs (rw)
    /dev/mapper/bd0--vg-a9790eba--ffbf--4d9c--a674--e02c61ece935 on /mnt/bd0-lv/a9790eba-ffbf-4d9c-a674-e02c61ece935 type xfs (rw)
    /dev/mapper/bd0--vg-d6fd964a--67f8--4d48--96d1--343bed4ee792 on /mnt/bd0-lv/d6fd964a-67f8-4d48-96d1-343bed4ee792 type xfs (rw)
    /dev/mapper/bd0--vg-ea58b011--3a41--4bf0--9fe6--3862e24b86f6 on /mnt/bd0-lv/ea58b011-3a41-4bf0-9fe6-3862e24b86f6 type xfs (rw)
    /dev/mapper/bd0--vg-f7df48e5--09b1--4314--b729--1f38e5ceec2e on /mnt/bd0-lv/f7df48e5-09b1-4314-b729-1f38e5ceec2e type xfs (rw)
  • 30. Review: How to use BD xlator — the block devices are shared through GlusterFS as files (raw block device → physical volume → volume group → LVs; each LV becomes a file in the BD volume, converted with the lvm2 development library). Snapshot and clone are available at the LV level.
    [sechs]# mount -t glusterfs localhost:/bd0 /mnt/glusterfs/bd0
    [sechs]# mount -t xfs -o loop /mnt/glusterfs/bd0/lv0 /mnt/bd0-lv/lv1
    [sechs]# df -h | grep bd0-lv
    1014M 33M 982M 4% /mnt/bd0-lv/lv1
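    As a quick sanity check after creating an LV-backed file, the xattr can be read back and the LV count compared with the files created; a sketch assuming getfattr (from the attr package) is installed — whether the value is echoed back verbatim depends on the xlator:
    # getfattr -n user.glusterfs.bd /mnt/glusterfs/bd0/lv0
    # lvdisplay -C bd0-vg | grep -c Vwi
    (the count of thin LVs — attr prefix "Vwi", as on slide 27 — should match the number of BD-backed files)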
  • 31. Brick Failure Detection — before 3.5.0:
    1. One of the backend storage devices fails!
    2. R/W ops arrive from a client (via AFR to glusterfsd).
    3. glusterfsd returns "Input/output error" or "Read-only filesystem" directly.
  • 32. Brick Failure Detection — after 3.5.0:
    1. One of the backend storage devices fails!
    2. glusterfsd logs the failure and shuts itself down.
    3. R/W ops arrive from a client (via AFR).
    4. The client gets no error and completes the operation.
  • 33. Brick Failure Detection — setup for the test:
    # brick="/mnt/lv4/vol4"; gluster volume create vol4 eins:$brick zwei:$brick drei:$brick vier:$brick fuenf:$brick sechs:$brick
    # gluster volume start vol4
    # gluster volume set vol4 storage.health-check-interval 10
    # gluster volume info vol4
    Volume Name: vol4
    Type: Distribute
    Volume ID: 706122a9-44fc-4d1d-8c3b-97482d98b95c
    Status: Started
    Number of Bricks: 6
    Transport-type: tcp
    Bricks:
    Brick1: eins:/mnt/lv4/vol4
    Brick2: zwei:/mnt/lv4/vol4
    Brick3: drei:/mnt/lv4/vol4
    Brick4: vier:/mnt/lv4/vol4
    Brick5: fuenf:/mnt/lv4/vol4
    Brick6: sechs:/mnt/lv4/vol4
    Options Reconfigured:
    storage.health-check-interval: 10
  • 34. Brick Failure Detection — setup for the test (contd.):
    [sechs]# dmsetup table
    vg0-swift: 0 209715200 linear 8:7 838862848
    vg0-cinder: 0 209715200 linear 8:7 419432448
    vg0-lv4: 0 209715200 linear 8:7 1468008448
    vg0-lv3: 0 209715200 linear 8:7 1258293248
    vg0-lv2: 0 209715200 linear 8:7 1048578048
    vg0-lv1: 0 209715200 linear 8:7 209717248
    vg0-lv0: 0 209715200 linear 8:7 2048
    vg0-glance: 0 209715200 linear 8:7 629147648
  • 35. Brick Failure Detection — brick failure test: replace vg0-lv4's mapping with the device-mapper error target so every I/O to the brick fails:
    [sechs]# echo 0 209715200 error > dmsetup-error-target
    [sechs]# dmsetup load vg0-lv4 dmsetup-error-target
    [sechs]# dmsetup resume vg0-lv4
    [sechs]# dmsetup table
    vg0-swift: 0 209715200 linear 8:7 838862848
    vg0-cinder: 0 209715200 linear 8:7 419432448
    vg0-lv4: 0 209715200 error
    vg0-lv3: 0 209715200 linear 8:7 1258293248
    vg0-lv2: 0 209715200 linear 8:7 1048578048
    vg0-lv1: 0 209715200 linear 8:7 209717248
    vg0-lv0: 0 209715200 linear 8:7 2048
    vg0-glance: 0 209715200 linear 8:7 629147648
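    To undo the fault injection after the test, load the original linear mapping (shown on slide 34) back into the device; a sketch using dmsetup's --table option, with a suspend first in case the device is busy (an assumption about the test environment, not something the slide shows):
    [sechs]# dmsetup suspend vg0-lv4
    [sechs]# dmsetup load vg0-lv4 --table "0 209715200 linear 8:7 1468008448"
    [sechs]# dmsetup resume vg0-lv4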
  • 36. Brick Failure Detection — /var/log/glusterfs/bricks/mnt-lv4-vol4.log on the failed node:
    [2014-05-18 18:49:53.720594] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
    [2014-05-18 18:50:04.238239] W [posix-helpers.c:1294:posix_health_check_thread_proc] 0-vol4-posix: stat() on /mnt/lv4/vol4 returned: Input/output error
    [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
    Message from syslogd@sechs at May 19 03:50:04 ...
    glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
    [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
    Message from syslogd@sechs at May 19 03:50:34 ...
    glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
    [2014-05-18 18:50:34.238910] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x7f1144ebab7d] (-->/lib64/libpthread.so.0(+0x79d1) [0x7f114554d9d1] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(glusterfs_sigwaiter+0xf0) [0x4085af]))) 0-: received signum (15), shutting down
  • 37. Brick Failure Detection — syslog on the failed node:
    May 19 03:49:55 sechs kernel: XFS (dm-7): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 4096
    May 19 03:49:57 sechs kernel: XFS (dm-7): metadata I/O error: block 0x6400108 ("xlog_iodone") error 5 buf count 4096
    May 19 03:49:57 sechs kernel: XFS (dm-7): xfs_do_force_shutdown(0x2) called from line 1062 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa04dd131
    May 19 03:49:57 sechs kernel: XFS (dm-7): Log I/O Error Detected. Shutting down filesystem
    May 19 03:49:57 sechs kernel: XFS (dm-7): Please umount the filesystem and rectify the problem(s)
    May 19 03:50:04 sechs glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
    Message from syslogd@sechs at May 19 03:50:04 ...
    glusterfsd: [2014-05-18 18:50:04.238328] M [posix-helpers.c:1314:posix_health_check_thread_proc] 0-vol4-posix: health-check failed, going down
    May 19 03:50:27 sechs kernel: XFS (dm-7): xfs_log_force: error 5 returned.
    Message from syslogd@sechs at May 19 03:50:34 ...
    glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
    May 19 03:50:34 sechs glusterfsd: [2014-05-18 18:50:34.238551] M [posix-helpers.c:1319:posix_health_check_thread_proc] 0-vol4-posix: still alive! -> SIGTERM
    May 19 03:50:57 sechs kernel: XFS (dm-7): xfs_log_force: error 5 returned.
  • 38. Brick Failure Detection — gluster volume status:
    # gluster volume status vol4
    Status of volume: vol4
    Gluster process                Port   Online  Pid
    ------------------------------------------------------------------------------
    Brick eins:/mnt/lv4/vol4       49160  Y       2925
    Brick zwei:/mnt/lv4/vol4       49159  Y       440
    Brick drei:/mnt/lv4/vol4       49152  Y       32500
    Brick vier:/mnt/lv4/vol4       49152  Y       32657
    Brick fuenf:/mnt/lv4/vol4      49152  Y       24517
    Brick sechs:/mnt/lv4/vol4      N/A    N       N/A
    NFS Server on localhost        2049   Y       29535
    NFS Server on zwei             N/A    N       N/A
    NFS Server on vier             N/A    N       N/A
    NFS Server on drei             N/A    N       N/A
    NFS Server on eins             N/A    N       N/A
    NFS Server on fuenf            N/A    N       N/A
    NFS Server on sechs            N/A    N       N/A
    Task Status of Volume vol4
    ------------------------------------------------------------------------------
    There are no active volume tasks
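    For scripting, the Online column of this output can be filtered directly; a minimal sketch with the field positions assumed from the output above:
    # gluster volume status vol4 | awk '$1 == "Brick" && $(NF-1) == "N" {print $2, "is offline"}'
    sechs:/mnt/lv4/vol4 is offline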
  • 39. Brick Failure Detection — glusterfsd processes on the failed node:
    # ps -ef | grep glusterfsd | grep -v grep | wc -l
    0
  • 40. Brick Failure Detection — restart glusterd (and glusterfsd) on the failed node:
    [sechs]# service glusterd restart
    [2014-05-18 18:58:17.197872] I [glusterfsd.c:1959:main] 0-/usr/local/glusterfs-3.5.0/sbin/glusterfsd: Started running /usr/local/glusterfs-3.5.0/sbin/glusterfsd version 3.5git (/usr/local/glusterfs-3.5.0/sbin/glusterfsd -s sechs --volfile-id vol4.sechs.mnt-lv4-vol4 -p /var/lib/glusterd/vols/vol4/run/sechs-mnt-lv4-vol4.pid -S /var/run/23afc72b5ceddccd28b405b1cdf5b4df.socket --brick-name /mnt/lv4/vol4 -l /usr/local/glusterfs-3.5.0/var/log/glusterfs/bricks/mnt-lv4-vol4.log --xlator-option *-posix.glusterd-uuid=0765d288-a59b-4ccf-90ae-c3332c83dbf4 --brick-port 49152 --xlator-option vol4-server.listen-port=49152)
    [2014-05-18 18:58:17.205310] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
    [2014-05-18 18:58:17.205486] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
    [2014-05-18 18:58:17.205880] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
    [2014-05-18 18:58:17.205949] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
    [2014-05-18 18:58:18.834910] I [graph.c:254:gf_add_cmdline_options] 0-vol4-server: adding option 'listen-port' for volume 'vol4-server' with value '49152'
  • 41. Brick Failure Detection — restart on the failed node (contd.): the brick fails to initialize because the underlying filesystem still returns I/O errors:
    [2014-05-18 18:58:18.834976] I [graph.c:254:gf_add_cmdline_options] 0-vol4-posix: adding option 'glusterd-uuid' for volume 'vol4-posix' with value '0765d288-a59b-4ccf-90ae-c3332c83dbf4'
    [2014-05-18 18:58:18.837332] I [rpcsvc.c:2064:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
    [2014-05-18 18:58:18.837510] W [options.c:848:xl_opt_validate] 0-vol4-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
    [2014-05-18 18:58:18.837572] I [socket.c:3561:socket_init] 0-tcp.vol4-server: SSL support is NOT enabled
    [2014-05-18 18:58:18.837601] I [socket.c:3576:socket_init] 0-tcp.vol4-server: using system polling thread
    [2014-05-18 18:58:18.838445] E [common-utils.c:93:mkdir_p] 0-: Failed due to reason Input/output error
    [2014-05-18 18:58:18.838505] I [mem-pool.c:539:mem_pool_destroy] 0-vol4-changelog: size=108 max=0 total=0
    [2014-05-18 18:58:18.838533] E [xlator.c:403:xlator_init] 0-vol4-changelog: Initialization of volume 'vol4-changelog' failed, review your volfile again
    [2014-05-18 18:58:18.838561] E [graph.c:307:glusterfs_graph_init] 0-vol4-changelog: initializing translator failed
    [2014-05-18 18:58:18.838610] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
  • 42. Brick Failure Detection — restart on the failed node (contd.):
    [2014-05-18 18:58:18.839480] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/local/glusterfs-3.5.0/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1b5) [0x7f2981c837d8] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(mgmt_getspec_cbk+0x36a) [0x40cf77] (-->/usr/local/glusterfs-3.5.0/sbin/glusterfsd(glusterfs_process_volfp+0x18a) [0x408bf2]))) 0-: received signum (0), shutting down
  • 43. Scalability Enhancement
  • 44. Quota Scalability — before 3.5.0: the directory quota limit was a few hundred per volume.
  • 45. Quota Scalability — after 3.5.0: the directory quota limit is 65536 per volume.
  • 46. Performance Enhancements
  • 47. On-Wire Compression + Decompression — between FUSE client and storage pool:
    Write ops: 1. open and write → 2. compression (client) → 3. transport → 4. decompression and write to disk (server)
    Read ops: 1. open and read → 2. read and compression (server) → 3. transport → 4. decompression (client)
  • 48. On-Wire Compression + Decompression — setup:
    # gluster volume create vol-comp eins:/mnt/lv3/vol-comp
    # gluster volume set vol-comp network.compression on
    # gluster volume set vol-comp network.compression.compression-level 8
    # gluster volume set vol-comp network.compression.min-size 50
    # gluster volume set vol-comp performance.write-behind off
    # gluster volume set vol-comp performance.strict-write-ordering on
    # gluster volume set vol-comp performance.open-behind off
    # gluster volume info vol-comp
    Volume Name: vol-comp
    Type: Distribute
    Volume ID: 92b47734-2552-4168-b3c3-151093562e4f
    Status: Created
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: eins:/mnt/lv3/vol-comp
    Options Reconfigured:
    network.compression.min-size: 50
    network.compression.compression-level: 8
    performance.open-behind: off
    performance.write-behind: off
    performance.strict-write-ordering: on
    network.compression.mode: server
    network.compression: on
    Notes: data is compressed only when its size exceeds min-size (in bytes). compression-level: -1 = default compression, 0 = no compression, 1 = best speed, 9 = best compression. The performance translators are turned off to avoid Input/output errors.
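    To see the compression on the wire, capture the brick traffic while writing compressible data; a sketch — the interface and brick port are assumptions (look the port up with gluster volume status vol-comp):
    # tcpdump -i eth0 -X tcp port 49152
    (with compression on, a stream of zeroes written through the mount no longer shows up as runs of 0x00 in the payload)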
  • 49. On-Wire Compression + Decompression — evaluation:
    # gluster volume start vol-comp
    # mount -t glusterfs localhost:/vol-comp /mnt/glusterfs/vol-comp
    # dd if=/dev/zero of=/mnt/glusterfs/vol-comp/1gb.dat bs=1M count=1024
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 33.8606 s, 31.7 MB/s     (117 MB/s with no compression)
    # diff /mnt/glusterfs/vol-comp/1gb.dat /tmp/1gb.dat
    #                                                          (compression and decompression executed correctly)
    Observations:
    - CPU load on the client becomes higher than without network compression.
    - tcpdump showed the 1GB of zeroes compressed into non-zero data.
    - A higher-end CPU might show better performance.
    There are still issues and limitations:
    - It cannot work with striped volumes.
    - For glusterfs versions <= 3.5, it cannot work with AFR.
  • 50. readdir_ahead — before 3.5.0: with only the read-ahead translator in the volume graph, sequential file access can be fast, but sequential directory access like "ls" cannot.
  • 51. readdir_ahead — after 3.5.0: with the readdir-ahead translator added next to read-ahead, sequential reads of large directories can complete faster!
  • 52. readdir_ahead — how-to (disabled by default):
    # gluster volume set vol0 readdir-ahead enable
    volume set: success
    # gluster volume info vol0
    Volume Name: vol0
    Type: Distribute
    Volume ID: cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: eins:/mnt/lv0/vol0
    Brick2: zwei:/mnt/lv0/vol0
    Options Reconfigured:
    performance.readdir-ahead: enable
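    To confirm the translator actually landed in the client graph, inspect the generated FUSE volfile; a sketch assuming the usual volfile location under /var/lib/glusterd:
    # grep -B 1 -A 1 readdir-ahead /var/lib/glusterd/vols/vol0/vol0-fuse.vol
    (expect a volume block of type performance/readdir-ahead)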
  • 53. readdir_ahead — setup for the evaluation (3 million 8K files):
    # brick="/mnt/lv4/vol4"; gluster volume create vol4 eins:$brick zwei:$brick drei:$brick vier:$brick fuenf:$brick sechs:$brick
    # gluster volume start vol4
    # mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
    # mkdir /mnt/glusterfs/vol4/manyfiles
    # for a in `seq 0 9`; do for b in `seq 0 9`; do for c in `seq 0 9`; do for d in `seq 0 9`; do for e in `seq 0 9`; do for f in `seq 0 9`; do for g in `seq 0 9`; do for h in `seq 0 9`; do for i in `seq 0 9`; do file="/mnt/glusterfs/vol4/manyfiles/8kb${a}${b}${c}${d}${e}${f}${g}${h}${i}.dat"; echo ${file}; dd if=/dev/zero of=${file} bs=1K count=8; if [ $? -ne 0 ]; then break; fi; done; done; done; done; done; done; done; done; done
    ...
    ^C
    # df -ki /mnt/glusterfs/vol4
    Filesystem     Inodes     IUsed    IFree      IUse% Mounted on
    localhost:vol4 314572800  3394646  311178154  2%    /mnt/glusterfs/vol4
    # umount /mnt/glusterfs/vol4
  • 54. readdir_ahead — evaluation (1.68 times faster!):
    # mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
    # for i in `seq 0 2`; do time ls /mnt/glusterfs/vol4/manyfiles > /dev/null; done
    26.24s user 18.70s system 6% cpu 12:03.05 total
    26.58s user 12.10s system 5% cpu 11:45.92 total
    26.53s user 21.61s system 5% cpu 14:14.75 total
    # umount /mnt/glusterfs/vol4
    # gluster volume stop vol4 && gluster volume start vol4
    # gluster volume set vol4 readdir-ahead enable
    # mount -t glusterfs localhost:/vol4 /mnt/glusterfs/vol4
    # for i in `seq 0 2`; do time ls /mnt/glusterfs/vol4/manyfiles > /dev/null; done
    26.24s user 17.97s system 11% cpu 6:25.09 total
    26.58s user 22.36s system 10% cpu 8:02.83 total
    26.57s user 22.83s system 10% cpu 8:13.01 total
    # gluster volume reset vol4
    # umount /mnt/glusterfs/vol4
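    One measurement caveat the slide does not mention: the kernel's dentry and page caches on the client can skew repeated ls runs. A standard Linux knob (not GlusterFS-specific) to level the field between runs:
    # sync && echo 3 > /proc/sys/vm/drop_caches
    (drops the page cache plus dentries and inodes before the next timed ls)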
  • 55. Stability Enhancements
  • 56. Prevent NFS restart on Volume change (Part 1) — the Gluster NFS graph: a single nfs/server translator sits on top of all the volumes.
    nfs/server options:
      option nfs3.vol4.volume-id 706122a9-44fc-4d1d-8c3b-97482d98b95c
      option rpc-auth.addr.vol4.allow *
      option nfs3.vol-gfid-access.volume-id 73abf812-4fff-42bd-822b-3036b72f060d
      option rpc-auth.addr.vol-gfid-access.allow *
      option nfs3.vol2.volume-id d0517697-5372-44a1-960f-6db0d988f3b2
      option rpc-auth.addr.vol2.allow *
      option nfs3.vol-comp.volume-id 92b47734-2552-4168-b3c3-151093562e4f
      option rpc-auth.addr.vol-comp.allow *
      option nfs3.vol1.volume-id ba03d1e6-a520-4e7f-ac4c-2440a205e80e
      option rpc-auth.addr.vol1.allow *
      option nfs3.vol0.volume-id cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
      option rpc-auth.addr.vol0.allow *
      option nfs.drc on
      option nfs.nlm on
      option nfs.dynamic-volumes on
    Per-volume subgraphs hang underneath, e.g. for vol0: debug/io-stats (vol0) → performance/write-behind (vol0-write-behind) → cluster/distribute (vol0-dht) → protocol/client (vol0-client-0, vol0-client-1); likewise for vol1 and vol2.
  • 57. My presentation last year (2013) — NFS and multi-tenancy: 'nfs.rpc-auth-allow' enables multi-tenancy, but some operations on one volume (e.g. gluster volume set ...) affect I/O to the other volumes, since Vol0, Vol1 and Vol2 all share the single NFS server.
  • 58. Prevent NFS restart on Volume change (Part 1) — the "some operations" on a volume:
    - gluster volume {set|reset} <volumeName> nfs.rpc-auth-allow
    - gluster volume {start|stop} <volumeName>
    - gluster volume add-brick
    - gluster volume remove-brick <volumeName> <brick1> ... <brickn> commit
  • 59. Prevent NFS restart on Volume change (Part 1) — internal NFS options became unaffected by volume changes:
    nfs.readdir-size, nfs.nlm, nfs.acl, nfs.mount-rmtab, nfs.drc, nfs.drc-size, nfs.read-size, nfs.write-size, nfs.export-dir, nfs.export-dirs, nfs.enable-ino32, nfs.export-volumes, nfs.addr-namelookup, nfs.outstanding-rpc-limit, nfs.mount-mtab, nfs.register-with-portmap
  • 60. Geo-Replication Enhancement — before 3.5.0: a single gsyncd per storage pool (a cluster) — a SPOF! File changes are identified with xattrs and a directory crawl, and synced with rsync.
  • 61. Geo-Replication Enhancement — after 3.5.0: one gsyncd runs for each peer (no more SPOF), and file changes are identified with the changelog, in memory.
  • 62. Geo-Replication Enhancement — the changelog translator in the brick volfile:
    # cat /var/lib/glusterd/vols/vol0/vol0.eins.mnt-lv0-vol0.vol
    volume vol0-posix
      type storage/posix
      option volume-id cf9db2aa-5ee8-40c3-8ca9-8316ab31ba59
      option directory /mnt/lv0/vol0
    end-volume
    volume vol0-changelog
      type features/changelog
      option changelog-dir /mnt/lv0/vol0/.glusterfs/changelogs
      option changelog-brick /mnt/lv0/vol0
      subvolumes vol0-posix
    end-volume
    ...
    volume vol0-server
      type protocol/server
      option auth.addr./mnt/lv0/vol0.allow *
      option auth.login.863ccc05-1ba2-47cc-8a15-240ad4e8c736.password c8d200d6-db0b-4f87-be0f-664e08f4ceee
      option auth.login./mnt/lv0/vol0.allow 863ccc05-1ba2-47cc-8a15-240ad4e8c736
      option transport-type tcp
      subvolumes /mnt/lv0/vol0
    end-volume
  • 63. Geo-Replication Enhancement — changelog (contd.): the changelog directory stays empty. Is it unused without gsyncd???
    # ls -a /mnt/lv0/vol0/.glusterfs/changelogs
    . ..
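    The changelog is only recorded once the feature is switched on, which geo-replication does by itself when a session starts. A sketch for enabling it by hand — the option name follows the features/changelog translator, so verify it against your build:
    # gluster volume set vol0 changelog.changelog on
    # ls /mnt/lv0/vol0/.glusterfs/changelogs
    (CHANGELOG.<timestamp> files should start appearing as the volume is written to)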
  • 64. Security Enhancement
  • 65. Disk encryption — between FUSE client and storage pool:
    Write ops: 1. open and write → 2. encryption (client) → 3. transport → 4. write the encrypted data to disk
    Read ops: 1. open and read → 2. read from the underlying disks → 3. transport → 4. decryption (client)
  • 66. Disk encryption — setup:
    # gluster volume info
    Volume Name: vol2
    Type: Replicate
    Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: eins:/mnt/lv2/vol2
    Brick2: zwei:/mnt/lv2/vol2
    # gluster volume set vol2 encryption on
    volume set: success
    # for x in quick-read write-behind open-behind; do gluster volume set vol2 performance.$x off; done
    # gluster volume set vol2 encryption.master-key /var/lib/glusterd/vols/vol2/encryption.master-key
    # openssl rand -hex 32 > /var/lib/glusterd/vols/vol2/encryption.master-key
    # gluster volume set vol2 encryption.data-key-size 512
  • 67. Disk encryption — setup (contd.):
    # gluster volume info
    Volume Name: vol2
    Type: Replicate
    Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: eins:/mnt/lv2/vol2
    Brick2: zwei:/mnt/lv2/vol2
    Options Reconfigured:
    encryption.data-key-size: 512
    encryption.master-key: /var/lib/glusterd/vols/vol2/encryption.master-key
    performance.open-behind: off
    performance.write-behind: off
    performance.quick-read: off
    features.encryption: on
    # mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/var/lib/glusterd/vols/vol2/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol2
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 68 
Disk encryption
Encryption test
# echo "test" > /mnt/glusterfs/vol2/test.txt
# cat /mnt/glusterfs/vol2/test.txt
test
[eins]# cat /mnt/lv2/vol2/test.txt
Zd??]K!q??tuv
[zwei]# cat /mnt/lv2/vol2/test.txt
Zd??]K!q??tuv
ASCII files on the bricks are encrypted.
# dd if=/dev/zero of=/mnt/glusterfs/vol1/test.dat bs=1 count=32
# dd if=/dev/zero of=/mnt/glusterfs/vol2/test.dat bs=1 count=32
[eins]# dd if=/dev/zero of=/tmp/test.dat bs=1 count=32
[eins]# diff /tmp/test.dat /mnt/lv2/vol2/test.dat
Binary files /tmp/test.dat and /mnt/lv2/vol2/test.dat differ
[eins]# diff /tmp/test.dat /mnt/lv1/vol1/test.dat
#
Binary files on the bricks are also encrypted. The volume without encryption (vol1) stores the raw data, so it remains directly readable on its bricks.
# tcpdump -i eth0 -XX
The transported zeroed data can be seen fully encrypted on the wire.
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 69 
Disk encryption
Decryption test
# dd if=/dev/zero of=/tmp/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.61505 s, 297 MB/s
# diff3 /tmp/1gb.dat /mnt/glusterfs/vol1/1gb.dat /mnt/glusterfs/vol2/1gb.dat
#
Perfect!
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 70 
Disk encryption
Performance test
# dd if=/dev/zero of=/mnt/glusterfs/vol1/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 18.4542 s, 58.2 MB/s
# dd if=/dev/zero of=/mnt/glusterfs/vol2/1gb.dat bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 263.633 s, 4.1 MB/s
Writing to the encrypted volume (vol2) is roughly 14x slower than to the plain volume (vol1).
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 71 
Disk encryption
Work with NFS? (No!)
# mount -t nfs -o vers=3,hard,intr,nosuid localhost:/vol2 /mnt/nfs/vol2
mount.nfs: Connection timed out
(Presumably because the crypt xlator runs on the client side, and the gNFS server has no way to receive the master key.)
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 72 
Disk encryption
Compromising with the same MK
# cp /var/lib/glusterd/vols/vol2/encryption.master-key /tmp
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/tmp/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
#
Anyone holding a copy of the master key gains full access.
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 73 
Disk encryption
Compromising with a different MK while keeping the volume mounted
# openssl rand -hex 32 > /tmp/encryption.master-key
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
#
The already-mounted client keeps working: the key file seems to be read only at mount time.
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 74 
Disk encryption
Compromising with an invalid MK
# umount /mnt/glusterfs/vol-crypt
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/tmp/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# diff /mnt/glusterfs/vol-crypt/test.txt /tmp/test.txt
diff: /mnt/glusterfs/vol-crypt/test.txt: Invalid argument
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# cp /mnt/glusterfs/vol-crypt/test.txt ~/
cp: reading `/mnt/glusterfs/vol-crypt/test.txt': Invalid argument
# ls -l ~/test.txt
-rw-r--r-- 1 root root 0 May 19 00:38 /root/test.txt
With a wrong MK, file names and sizes are still visible, but contents cannot be read.
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 75 
Disk encryption
Compromising with an invalid MK (contd.)
# echo "test2" > /mnt/glusterfs/vol-crypt/test2.txt
# cat /mnt/glusterfs/vol-crypt/test2.txt
test2
# diff /mnt/glusterfs/vol-crypt/test2.txt /tmp/test2.txt
#
# rm /mnt/glusterfs/vol-crypt/test.txt
rm: cannot remove `/mnt/glusterfs/vol-crypt/test.txt': Invalid argument
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 6 May 19 00:39 test2.txt
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# rm /mnt/glusterfs/vol-crypt/test2.txt
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
New files can be written with an invalid MK, while existing files cannot be removed. (Is this okay?)
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 76 
Disk encryption
Compromising with an invalid MK (contd.)
# mv /mnt/glusterfs/vol-crypt/test.txt /mnt/glusterfs/vol-crypt/test2.txt
mv: cannot move `/mnt/glusterfs/vol-crypt/test.txt' to a subdirectory of itself, `/mnt/glusterfs/vol-crypt/test2.txt'
# umount /mnt/glusterfs/vol-crypt
# mount -t glusterfs -o xlator-option=vol2-crypt.master-key=/var/lib/glusterd/vols/vol2/encryption.master-key localhost:/vol2 /mnt/glusterfs/vol-crypt
# ls -lh /mnt/glusterfs/vol-crypt
total 1.1G
-rw-r--r-- 1 root root 1.0G May 18 23:31 1gb.dat
-rw-r--r-- 1 root root 6 May 19 00:44 test2.txt
-rw-r--r-- 1 root root 32 May 18 22:57 test.dat
-rw-r--r-- 1 root root 5 May 18 22:55 test.txt
# cat /mnt/glusterfs/vol-crypt/test2.txt
cat: /mnt/glusterfs/vol-crypt/test2.txt: Invalid argument
# rm /mnt/glusterfs/vol-crypt/test2.txt
rm: cannot remove `/mnt/glusterfs/vol-crypt/test2.txt': Invalid argument
Even the legitimate user cannot read or remove a file created with an invalid MK. (Is this okay?)
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 77 
Disk encryption
Compromising with volume reset
# gluster volume info vol2
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
Options Reconfigured:
encryption.data-key-size: 512
encryption.master-key: /var/lib/glusterd/vols/vol2/encryption.master-key
performance.open-behind: off
performance.write-behind: off
performance.quick-read: off
features.encryption: on
# gluster volume reset vol2
volume reset: success: reset volume successful
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 78 
Disk encryption
Compromising with volume reset (contd.)
# gluster volume info vol2
Volume Name: vol2
Type: Replicate
Volume ID: e0332771-a3c2-4fe5-980c-b3860cfe3baf
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: eins:/mnt/lv2/vol2
Brick2: zwei:/mnt/lv2/vol2
# cat /mnt/glusterfs/vol-crypt/test2.txt
U?%U?0??x^-?bO
# cat /mnt/glusterfs/vol-crypt/test.txt
Zd??]K!q?tuv
"volume reset" silently drops the encryption options, so clients now read raw ciphertext. Could this be an attack vector?
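If this happens by accident, re-applying the same options should restore access, since the per-file keys are derived from the master key (my assumption -- I have not tested this recovery path):
# gluster volume set vol2 encryption on
# gluster volume set vol2 encryption.master-key /var/lib/glusterd/vols/vol2/encryption.master-key
# gluster volume set vol2 encryption.data-key-size 512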
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 79 
Enhancement for Developers
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 80 
GFID Access
!  A volume's virtual /.gfid directory exposes every file by its GFID (e.g. 62fe0d4f-dfe9-4a2d-b811-176c6d347a7c)
!  You can deal with each file by GFID
!  Single namespace, just under the mount point
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 81 
GFID Access
# brick="/mnt/lv3/vol-gfid-access"; gluster volume create vol-gfid-access eins:$brick zwei:$brick
# gluster volume start vol-gfid-access
# mkdir /mnt/glusterfs/vol-gfid-access
# mount.glusterfs -o aux-gfid-mount localhost:/vol-gfid-access /mnt/glusterfs/vol-gfid-access
# for i in `seq 0 9`; do dd if=/dev/zero of=/mnt/glusterfs/vol-gfid-access/$i.dat bs=1M count=1; done
# ls -a /mnt/glusterfs/vol-gfid-access/.gfid
ls: cannot open directory /mnt/glusterfs/vol-gfid-access/.gfid: Stale file handle
# ls -a '/mnt/glusterfs/vol-gfid-access/.gfid/0svu9Cc1wVRLOBiu5NqF3ncw=='
ls: cannot access /mnt/glusterfs/vol-gfid-access/.gfid/0svu9Cc1wVRLOBiu5NqF3ncw==: No such file or directory
# ls -ld /mnt/glusterfs/vol-gfid-access/.gfid/
drwxr-xr-x 3 root root 166 May 19 03:03 /mnt/glusterfs/vol-gfid-access/.gfid/
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 82 
GFID Access
# stat /mnt/glusterfs/vol-gfid-access/.gfid/
File: `/mnt/glusterfs/vol-gfid-access/.gfid/'
Size: 166 Blocks: 0 IO Block: 131072 directory
Device: 16h/22d Inode: 13 Links: 3
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-05-19 03:03:14.146605880 +0900
Modify: 2014-05-19 03:03:04.968605874 +0900
Change: 2014-05-19 03:03:04.968605874 +0900
# strace ls -a /mnt/glusterfs/vol-gfid-access/.gfid
...
stat("/mnt/glusterfs/vol-gfid-access/.gfid", {st_mode=S_IFDIR|0755, st_size=166, ...}) = 0
open("/mnt/glusterfs/vol-gfid-access/.gfid", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ESTALE (Stale file handle)
...
!  How can I make this work properly?
!  If it worked, applications using GlusterFS could manage their data in a single namespace.
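One thing that may help (my guess, not verified on this cluster): listing .gfid is apparently not supported, but looking up a single file by its canonical GFID string should still resolve. The GFID can be obtained through the virtual glusterfs.gfid.string xattr:
# gfid=$(getfattr -n glusterfs.gfid.string --only-values /mnt/glusterfs/vol-gfid-access/0.dat)
# stat /mnt/glusterfs/vol-gfid-access/.gfid/$gfid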
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 83 
Additional news
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 84 
Added files
!  cli/src/cli-quotad-client
!  contrib/qemu (related qemu codes)
!  error-codes.json
!  extras
!  geo-rep
!  glusterfs-georep-logrotate
!  gluster-rsyslog-*.conf
!  hook-scripts/add-brick
!  logger.conf.example
!  post-upgrade-script-for-quota.sh
!  pre-upgrade-script-for-quota.sh
!  geo-replication
!  gf-error-codes.h.template
!  libgfchangelog.pc.in
!  libglusterfs/src
!  client_t
!  glusterfs-acl
!  timespec
!  rpc/rpc-lib/src/rpc-drc
!  run-tests.sh
!  tests (a lot of test codes!)
!  xlators
!  cluster
!  dht/src
!  dht-shared.c
!  encryption
!  crypt
!  features
!  changelog
!  compress
!  gfid-access
!  glupy (glupy has merged!)
!  qemu-block
!  quota
!  quota-enforcer-client.c
!  quotad-aggregator
!  quotad-helpers
!  performance
!  readdir-ahead
!  playground (template for xlator development)
!  storage
!  bd (replacement of bd_map)
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 85 
Conclusion
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 86 
Conclusion
!  An update for "everyone"
!  12 features across 8 categories
!  Contributions from HekaFS
!  Disk encryption has been one of my dreams since 2.0.2
!  The voice of users was heard
!  Brick Failure Detection
!  Prevent NFS restart on Volume change
This is the great community's power! Use the latest version, and join us!
Copyright (C)  2014, NTTPC Communications, Inc. All Rights Reserved. 87 
To contact us, e-mail here -> storage-contact@nttpc.co.jp