1. Solaris 10 Administration Topics Workshop
3 - File Systems
By Peter Baer Galvin
For Usenix
Last Revision April 2009
Copyright 2009 Peter Baer Galvin - All Rights Reserved
Saturday, May 2, 2009
2. About the Speaker
Peter Baer Galvin - 781 273 4100
pbg@cptech.com
www.cptech.com
peter@galvin.info
My Blog: www.galvin.info
Bio
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading
systems integrator and VAR, and was the Systems Manager for Brown University's
Computer Science Department. He has written articles for Byte and other magazines. He
was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's
Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the
systems administration column there. He is now Sun columnist for the Usenix ;login:
magazine. Peter is co-author of the Operating System Concepts and Applied Operating
System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials
in security and system administration and given talks at many conferences and
institutions.
3. Objectives
Cover a wide variety of topics in Solaris 10
Useful for experienced system administrators
Save time
Avoid (my) mistakes
Learn about new stuff
Answer your questions about old stuff
Won't read the man pages to you
Workshop for hands-on experience and to reinforce concepts
Note – Security covered in separate tutorial
4. More Objectives
What separates a novice from an advanced administrator?
Bytes as well as bits, tactics and strategy
Knows how to avoid trouble
How to get out of it once in it
How to not make it worse
Has reasoned philosophy
Has methodology
5. Prerequisites
Recommend at least a couple of years of
Solaris experience
Or at least a few years of other Unix
experience
Best is a few years of admin experience,
mostly on Solaris
6. About the Tutorial
Every SysAdmin has a different knowledge set
A lot to cover, but notes should make good
reference
So some covered quickly, some in detail
Setting base of knowledge
Please ask questions
But let’s take off-topic off-line
Solaris BOF
7. Fair Warning
Sites vary
Circumstances vary
Admin knowledge varies
My goals
Provide information useful for each of you at
your sites
Provide opportunity for you to learn from
each other
8. Why Listen to Me
20 Years of Sun experience
Seen much as a consultant
Hopefully, you've used:
My Usenix ;login: column
The Solaris Corner @ www.samag.com
The Solaris Security FAQ
SunWorld “Pete's Wicked World”
SunWorld “Pete's Super Systems”
Unix Secure Programming FAQ (out of date)
Operating System Concepts (The Dino Book), now 8th ed
Applied Operating System Concepts
9. Slide Ownership
As indicated per slide, some slides
copyright Sun Microsystems
Feel free to share all the slides - as long as
you don’t charge for them or teach from
them for a fee
10. Overview
Lay of the Land
11. Schedule
Times and Breaks
12. Coverage
Solaris 10+, with some Solaris 9 where
needed
Selected topics that are new, different,
confusing, underused, overused, etc
13. Outline
Overview
Objectives
Choosing the most appropriate file system(s)
UFS / SDS
Veritas FS / VM (not in detail)
ZFS
14. Polling Time
Solaris releases in use?
Plans to upgrade?
Other OSes in use?
Use of Solaris rising or falling?
SPARC and x86
OpenSolaris?
15. Your Objectives?
16. Lab Preparation
Have device capable of telnet on the
USENIX network
Or have a buddy
Learn your “magic number”
Telnet to 131.106.62.100+“magic number”
User “root”, password “lisa”
It’s all very secure
17. Lab Preparation
Or...
Use VirtualBox
Use your own system
Use a remote machine you have legit
access to
18. Choosing the Most Appropriate File Systems
19. Choosing the Most Appropriate File Systems
Many file systems, many not optional (tmpfs et al)
Where you have choice, how to choose?
Consider
Solaris version being used
< S10 means no ZFS
ISV support
For each ISV make sure desired FS is supported
Apps, backups, clustering
Priorities
Now weigh priorities of performance, reliability, experience,
features, risk / reward
20. Consider...
Pros and cons of mixing file systems
Root file system
Not much value in using vxfs / vxvm here
unless used elsewhere
Interoperability (need to detach from one type
of system and attach to another?)
Cost
Supportability & support model
Non-production vs. production use
21. Root Disk Mirroring
The Crux of Performance
22. Topics
•Root disk mirroring
•ZFS
23. Root Disk Mirroring
Complicated because
Must be bootable
Want it protected from disk failure
And want the protection to work
Can increase or decrease upgrade
complexity
Veritas
Live upgrade
24. Manual Mirroring
Vxvm encapsulation can cause lack of availability
Vxvm needs a rootdg disk
Any automatic mirroring can propagate errors
Consider
Use disksuite (Solaris Volume Manager) to mirror boot disk
Use 3rd disk as rootdg, 3rd disksuite metadb, manual mirror
copy
Or use 10MB rootdg on 2 boot disks in disksuite to do the
mirroring
Best of all worlds – details in column at
www.samag.com/solaris
25. Manual Mirroring
Sometimes want more than no mirroring, less than real mirroring
Thus "manual mirroring"
Nightly cron job to copy partitions elsewhere
Can be used to duplicate root disk, if installboot used
Combination of newfs, mount, ufsdump | ufsrestore
Quite effective, useful, and cheap
Easy recovery from corrupt root image, malicious error, sysadmin
error
Has saved at least one client
But disk failure can require manual intervention
Complete script can be found at www.samag.com/solaris
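The newfs, mount, ufsdump | ufsrestore and installboot combination listed above can be sketched as a dry-run script. This is an illustration only, not the script from the column: the target disk c0t2d0, the scratch mount point /mnt/clone and the function name are assumptions, and the installboot line is the SPARC UFS form. It prints the commands so they can be reviewed before a cron job runs them for real.

```shell
# Dry-run sketch of a nightly "manual mirror" (bash or ksh syntax).
# Hypothetical names: target disk c0t2d0, mount point /mnt/clone.
# Prints the commands instead of running them.
manual_mirror_plan() {
    target=$1      # target disk, e.g. c0t2d0
    mnt=$2         # scratch mount point
    shift 2
    # Remaining arguments are "slice:filesystem" pairs, e.g. 0:/ 4:/var
    for pair in "$@"; do
        slice=${pair%%:*}
        fs=${pair#*:}
        echo "echo y | newfs /dev/rdsk/${target}s${slice}"
        echo "mount /dev/dsk/${target}s${slice} ${mnt}"
        echo "ufsdump 0f - ${fs} | (cd ${mnt} && ufsrestore rf -)"
        echo "umount ${mnt}"
    done
    # SPARC boot block for the copied root slice (assumed to be slice 0).
    echo 'installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk' "/dev/rdsk/${target}s0"
}

manual_mirror_plan c0t2d0 /mnt/clone 0:/ 4:/var
```

If the copy is meant to boot on its own, the /etc/vfstab on the copied root still names the original disk and would need editing on the clone; that step is left out of the sketch.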
26. Best Practice – Root Disk
Have 4 disks for root!
1st is primary boot device
2nd is disksuite mirror of first
3rd is manual mirror of 1st
4th is manual mirror, kept on a shelf!
Put nothing but system files on these disks
(/, /var, /opt, /usr, swap)
27. Aside: Disk Performance
Which is faster?
73GB drive     300GB drive
10000 RPM      10000 RPM
3Gb/sec        3Gb/sec
28. UFS / SDS
29. UFS Overview
Standard Pre-Solaris 10 file system
Many years old, updated continuously
But still showing its age
No integrated volume manager, instead use SDS
(disk suite)
Very fast, but feature poor
For example snapshots exist but only useful for
backups
Painful to manage, change, repair
30. Features
64-bit pointers
16TB file systems (on 64-bit Solaris)
1TB maximum file size
metadata logging (by default) increases
performance and keeps file systems (usually)
consistent after a crash
Lots of ISV and internal command (dump) support
Only bootable Solaris file system (until S10 10/08)
Dynamic multipathing, but via separate “traffic
manager” facility
31. Issues
Sometimes there is still corruption
Need to run fsck
Sometimes it fails
Many limits
Many features lacking (compared to ZFS)
Lots of manual administration tasks
format to slice up a disk
newfs to format the file system, fsck to check it
mount and /etc/vfstab to mount a file system
share commands, plus svcadm commands, to NFS export
Plus separate volume management
32. Volume Management
Separate set of commands (meta*) to manage volumes (RAID et al)
For example, to mirror the root file system
Have 2 disks with identical partitioning
Have 2 small partitions per disk for meta-data (here
slices 5 and 6)
newfs the file systems
Create meta-data state databases (at least 3, for quorum); the first one needs -f
# metadb -a -f /dev/dsk/c0t0d0s5
# metadb -a /dev/dsk/c0t0d0s6
# metadb -a /dev/dsk/c0t1d0s5
# metadb -a /dev/dsk/c0t1d0s6
33. Volume Management (cont)
Initialize submirrors (components of mirrors) and mirror the partitions - here
we do /, swap, /var, and /export
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10
Make the new / bootable
# metaroot d0
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17
34. Volume Management (cont)
Update /etc/vfstab to reflect new meta devices
/dev/md/dsk/d1  -                -        swap  -  no   -
/dev/md/dsk/d4  /dev/md/rdsk/d4  /var     ufs   1  yes  -
/dev/md/dsk/d7  /dev/md/rdsk/d7  /export  ufs   1  yes  -
Finally, reboot onto the metadevices, then attach the second submirror to each mirror
# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27
Now the root disk is mirrored, and commands such as Solaris upgrade, live
upgrade, and boot understand that
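The mirroring sequence on these slides is regular enough to generate. The following is a minimal dry-run sketch, assuming the naming used above (submirrors d1N and d2N, mirror dN for slice N) and assuming slice 0 holds the root file system; the function name is hypothetical, and the script only prints the commands.

```shell
# Dry-run sketch (bash or ksh syntax): print the SVM commands for mirroring
# a list of slices, using the slides' naming (d1N and d2N submirrors, dN mirror).
# Assumes slice 0 is the root file system, as in the example above.
svm_mirror_plan() {
    primary=$1     # e.g. c0t0d0
    secondary=$2   # e.g. c0t1d0
    shift 2
    for n in "$@"; do
        echo "metainit -f d1${n} 1 1 ${primary}s${n}"
        echo "metainit -f d2${n} 1 1 ${secondary}s${n}"
        echo "metainit d${n} -m d1${n}"
    done
    echo "metaroot d0"
    echo "# edit /etc/vfstab for the other metadevices, then reboot"
    for n in "$@"; do
        echo "metattach d${n} d2${n}"
    done
}

svm_mirror_plan c0t0d0 c0t1d0 0 1 4 7
```

Printing rather than executing keeps the one-way mirrors, the vfstab edit and the attach step visible as separate phases, which is where mistakes on a boot disk tend to happen.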
35. Veritas VM / FS
36. Overview
A popular, commercial addition to Solaris
64-bit
Integrated volume management (vxfs + vxvm)
Mirrored root disk via “encapsulation”
Good ISV support
Good extended features such as snapshots, replication
Shrink and grow file systems
Extent based (for better and worse), journaled,
clusterable
Cross-platform
37. Features
Very large limits
Dynamic multipathing included
Hot spares to automatically replace failed
disks
Dirty region logging (DRL) volume
transaction logs for fast recovery from
crash
But still can require consistency check
38. Issues
$$$
Adds supportability complexities (who do
you call)
Complicates OS upgrades (unencapsulate
first)
Fairly complex to manage
Comparison of performance vs. ZFS at
http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf
39. ZFS
40. ZFS
Looks to be the “next great thing”
Shipped officially in S10U2 (the 06/06 release)
From scratch file system
Includes volume management, file system, reliability,
scalability, performance, snapshots, clones,
replication
128-bit file system, almost everything is “infinite”
Checksumming throughout
Simple, endian independent, export/importable…
Still using traffic manager for multipathing
(some following slides are from ZFS talk by Jeff Bonwick
and Bill Moore – ZFS team leads at Sun)
41. Trouble with Existing Filesystems
No defense against silent data corruption
Any defect in disk, controller, cable, driver, or firmware can
corrupt data silently; like running a server without ECC
memory
Brutal to manage
Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab...
Lots of limits: filesystem/volume size, file size, number of files,
files per directory, number of snapshots, ...
Not portable between platforms (e.g. x86 to/from SPARC)
Dog slow
Linear-time create, fat locks, fixed block size, naïve prefetch,
slow random writes, dirty region logging
42. Design Principles
Pooled storage
Completely eliminates the antique notion of volumes
Does for storage what VM did for memory
End-to-end data integrity
Historically considered “too expensive”
Turns out, no it isn't
And the alternative is unacceptable
Transactional operation
Keeps things always consistent on disk
Removes almost all constraints on I/O order
Allows us to get huge performance wins
43. Why “volumes” Exist
In the beginning, each filesystem managed a
single disk
Customers wanted more space, bandwidth,
reliability
Rewrite filesystems to handle many disks: hard
Insert a little shim (“volume”) to cobble disks together:
easy
An industry grew up around the FS/volume
model
Filesystems, volume managers sold as separate products
Inherent problems in FS/volume interface can't be fixed
44. Traditional Volumes
[Diagram: two file systems (FS), each on its own volume; one volume is a stripe, the other a mirror]
45. ZFS Pools
Abstraction: malloc/free
No partitions to manage
Grow/shrink automatically
All bandwidth always available
All storage in the pool is shared
46. ZFS Pooled Storage
[Diagram: file systems (FS) drawing on shared storage pools, one RAIDZ and one mirrored]
48. ZFS Data Integrity Model
Everything is copy-on-write
Never overwrite live data
On-disk state always valid – no “windows of
vulnerability”
No need for fsck(1M)
Everything is transactional
Related changes succeed or fail as a whole
No need for journaling
Everything is checksummed
No silent data corruption
No panics due to silently corrupted metadata
72. Terms
Pool - set of disks in one or more RAID
formats (e.g. mirrored stripe)
Pool name contains no “/”
File system - mountable container of files
Data set - file system, block device,
snapshot, volume or clone within a pool
Named via pool/path[@snapshot]
73. Terms (cont)
ZIL - ZFS intent log
On-disk duplicate of in-memory log of
changes to make to data sets
Write goes to memory, ZIL, is
acknowledged, then goes to disk
ARC - in-memory read cache
L2ARC - level 2 ARC - on flash memory
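The pool/path[@snapshot] naming form from the Terms list can be taken apart with ordinary shell parameter expansion, which is handy in scripts that loop over datasets. A minimal sketch follows; the function name is hypothetical and the dataset names are the ones that appear elsewhere in these slides.

```shell
# Sketch (bash or ksh syntax): split a dataset name of the form
# pool/path[@snapshot] into its parts using parameter expansion.
parse_dataset() {
    name=$1
    snap=""
    case $name in
        *@*) snap=${name#*@}; name=${name%%@*} ;;
    esac
    pool=${name%%/*}              # everything before the first "/"
    case $name in
        */*) path=${name#*/} ;;   # everything after the first "/"
        *)   path="" ;;           # the pool's own top-level dataset
    esac
    echo "pool=$pool path=$path snapshot=$snap"
}

parse_dataset bigp/big@5-nov
parse_dataset zpbg/home
```

Because a pool name never contains "/", the first "/" is a safe place to split pool from path.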
74. What ZFS doesn’t do
Can’t remove individual devices from pools
Rather, replace the device, or 3-way mirror
including the device and then remove the device
Can’t shrink a pool (yet)
Can add individual devices, but not optimum (yet)
If adding disk to RAIDZ or RAIDZ2, then end up
with RAIDZ(2)+ 1 concatenated device
Instead add full RAID elements to a pool
Add a mirror pair or RAIDZ(2) set
75. zpool
# zpool
missing command
usage: zpool command args ...
where 'command' is one of the following:
create [-fn] [-o property=value] ...
[-O file-system-property=value] ...
[-m mountpoint] [-R root] <pool> <vdev> ...
destroy [-f] <pool>
add [-fn] <pool> <vdev> ...
remove <pool> <device> ...
list [-H] [-o property[,...]] [pool] ...
iostat [-v] [pool] ... [interval [count]]
status [-vx] [pool] ...
online <pool> <device> ...
offline [-t] <pool> <device> ...
clear <pool> [device]
76. zpool (cont)
attach [-f] <pool> <device> <new-device>
detach <pool> <device>
replace [-f] <pool> <device> [new-device]
scrub [-s] <pool> ...
import [-d dir] [-D]
import [-o mntopts] [-o property=value] ...
[-d dir | -c cachefile] [-D] [-f] [-R root] -a
import [-o mntopts] [-o property=value] ...
[-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id>
[newpool]
export [-f] <pool> ...
upgrade
upgrade -v
upgrade [-V version] <-a | pool ...>
history [-il] [<pool>] ...
get <"all" | property[,...]> <pool> ...
set <property=value> <pool>
77. zpool (cont)
# zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0
# zpool status -v
  pool: ezfs
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        ezfs        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
errors: No known data errors
78. zpool (cont)
  pool: zfs
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0d0s7  ONLINE       0     0     0
            c0d1s7  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
errors: No known data errors
82. zpool (cont)
Note that for import and export, a pool is
the delineator
You can’t import or export a file system
because it’s an integral part of a pool
Might cause you to use smaller pools
than you otherwise would
83. zfs
# zfs
missing command
usage: zfs command args ...
where 'command' is one of the following:
create [-p] [-o property=value] ... <filesystem>
create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume>
destroy [-rRf] <filesystem|volume|snapshot>
snapshot [-r] [-o property=value] ... <filesystem@snapname|
volume@snapname>
rollback [-rRf] <snapshot>
clone [-p] [-o property=value] ... <snapshot> <filesystem|volume>
promote <clone-filesystem>
rename <filesystem|volume|snapshot> <filesystem|volume|snapshot>
rename -p <filesystem|volume> <filesystem|volume>
rename -r <snapshot> <snapshot>
84. zfs (cont)
list [-rH] [-o property[,...]] [-t type[,...]] [-s
property] ...
[-S property] ... [filesystem|volume|snapshot] ...
set <property=value> <filesystem|volume|snapshot> ...
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|
snapshot] ...
inherit [-r] <property> <filesystem|volume|snapshot> ...
upgrade [-v]
upgrade [-r] [-V version] <-a | filesystem ...>
mount
mount [-vO] [-o opts] <-a | filesystem>
unmount [-f] <-a | filesystem|mountpoint>
share <-a | filesystem>
unshare [-f] <-a | filesystem|mountpoint>
85. zfs (cont)
send [-R] [-[iI] snapshot] <snapshot>
receive [-vnF] <filesystem|volume|snapshot>
receive [-vnF] -d <filesystem>
allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...]
<filesystem|volume>
allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
allow -c <perm|@setname>[,...] <filesystem|volume>
allow -s @setname <perm|@setname>[,...] <filesystem|volume>
unallow [-rldug] <"everyone"|user|group>[,...]
[<perm|@setname>[,...]] <filesystem|volume>
unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|
volume>
Each dataset is of the form: pool/[dataset/]*dataset[@name]
For the property list, run: zfs set|get
For the delegated permission list, run: zfs allow|unallow
86. zfs (cont)
# zfs get
missing property argument
usage:
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|snapshot] ...
The following properties are supported:
PROPERTY        EDIT  INHERIT  VALUES
available       NO    NO       <size>
compressratio   NO    NO       <1.00x or higher if compressed>
creation        NO    NO       <date>
mounted         NO    NO       yes | no
origin          NO    NO       <snapshot>
referenced      NO    NO       <size>
type            NO    NO       filesystem | volume | snapshot
used            NO    NO       <size>
aclinherit      YES   YES      discard | noallow | restricted | passthrough
aclmode         YES   YES      discard | groupmask | passthrough
atime           YES   YES      on | off
87. zfs (cont)
canmount        YES   NO       on | off | noauto
casesensitivity NO    YES      sensitive | insensitive | mixed
checksum        YES   YES      on | off | fletcher2 | fletcher4 | sha256
compression     YES   YES      on | off | lzjb | gzip | gzip-[1-9]
copies          YES   YES      1 | 2 | 3
devices         YES   YES      on | off
exec            YES   YES      on | off
mountpoint      YES   YES      <path> | legacy | none
nbmand          YES   YES      on | off
normalization   NO    YES      none | formC | formD | formKC | formKD
primarycache    YES   YES      all | none | metadata
quota           YES   NO       <size> | none
readonly        YES   YES      on | off
recordsize      YES   YES      512 to 128k, power of 2
refquota        YES   NO       <size> | none
refreservation  YES   NO       <size> | none
reservation     YES   NO       <size> | none
88. zfs (cont)
secondarycache  YES   YES      all | none | metadata
setuid          YES   YES      on | off
shareiscsi      YES   YES      on | off | type=<type>
sharenfs        YES   YES      on | off | share(1M) options
sharesmb        YES   YES      on | off | sharemgr(1M) options
snapdir         YES   YES      hidden | visible
utf8only        NO    YES      on | off
version         YES   NO       1 | 2 | 3 | current
volblocksize    NO    YES      512 to 128k, power of 2
volsize         YES   NO       <size>
vscan           YES   YES      on | off
xattr           YES   YES      on | off
zoned           YES   YES      on | off
Sizes are specified in bytes with standard units such as K, M, G, etc.
User-defined properties can be specified by using a name containing a colon (:).
89. zfs (cont)
(/)# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
bigp            630G   384G      -  /zfs/bigp
bigp/big        630G   384G   630G  /zfs/bigp/big
(root@sparky)-(7/pts)-(06:35:11/05/05)-
(/)# zfs snapshot bigp/big@5-nov
(root@sparky)-(8/pts)-(06:35:11/05/05)-
(/)# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
bigp            630G   384G      -  /zfs/bigp
bigp/big        630G   384G   630G  /zfs/bigp/big
bigp/big@5-nov     0      -   630G  /zfs/bigp/big@5-nov
# zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov
# zfs send -i 5-nov bigp/big@6-nov | ssh host zfs receive poolB/received/big
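The snapshot, full send and incremental send steps above follow one pattern, sketched here as a dry-run helper. The function name, the remote host name and the receiving dataset are assumptions for illustration; the zfs snapshot, zfs send -i and zfs receive forms are the ones shown on the slide. It prints the commands rather than running them.

```shell
# Dry-run sketch (bash or ksh syntax): build the snapshot and send/receive
# pipeline for a dataset. Pass an empty previous-snapshot name for a full send.
send_plan() {
    dataset=$1   # e.g. bigp/big
    prev=$2      # previous snapshot name, or "" for a full send
    new=$3       # new snapshot name
    host=$4      # remote host (hypothetical)
    target=$5    # receiving dataset on the remote side
    echo "zfs snapshot ${dataset}@${new}"
    if [ -z "$prev" ]; then
        echo "zfs send ${dataset}@${new} | ssh ${host} zfs receive ${target}"
    else
        echo "zfs send -i ${prev} ${dataset}@${new} | ssh ${host} zfs receive ${target}"
    fi
}

send_plan bigp/big ""    5-nov host poolB/received/big
send_plan bigp/big 5-nov 6-nov host poolB/received/big
```

An incremental stream only applies if the receiving side still has the previous snapshot, so the two calls have to be run in this order.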
90. zfs (cont)
# zpool history
History for 'zpbg':
2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0
2006-04-03.18:19:48 zfs receive zpbg/imp
2006-04-03.18:41:39 zfs receive zpbg/home
2006-04-03.19:04:22 zfs receive zpbg/photos
2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home
2006-04-03.19:44:22 zfs receive zpbg/mail
2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail
2006-04-03.20:14:32 zfs receive zpbg/mqueue
2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue
# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
Connections: 0
91. zpool history -l
Shows user name, host name, and zone of
command
# zpool history -l users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
[user root on corona:global]
2008-07-10.09:43:13 zfs create users/marks
[user root on corona:global]
2008-07-10.09:43:44 zfs destroy users/marks
[user root on corona:global]
2008-07-10.09:43:48 zfs create users/home
[user root on corona:global]
2008-07-10.09:43:56 zfs create users/home/markm
[user root on corona:global]
2008-07-10.09:44:02 zfs create users/home/marks
[user root on corona:global]
92. zpool history -i
Shows zfs internal activities - useful for
debugging
# zpool history -i users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
2008-07-10.09:43:13 [internal create txg:6] dataset = 21
2008-07-10.09:43:13 zfs create users/marks
2008-07-10.09:43:48 [internal create txg:12] dataset = 27
2008-07-10.09:43:48 zfs create users/home
2008-07-10.09:43:55 [internal create txg:14] dataset = 33
93. ZFS Delegate Admin
Use zfs allow and zfs unallow to grant
and remove permissions
Use the “delegation” pool property to control whether
delegation is enabled
Then delegate
# zfs allow cindys create,destroy,mount,snapshot tank/cindys
# zfs allow tank/cindys
-------------------------------------------------------------
Local+Descendent permissions on (tank/cindys)
user cindys create,destroy,mount,snapshot
-------------------------------------------------------------
# zfs unallow cindys tank/cindys
# zfs allow tank/cindys
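Delegation is usually applied to many users at once, so the zfs allow step above can be generated per user. This is a minimal dry-run sketch under stated assumptions: the function name, the pool name tank and the second user name are hypothetical, the permission set is the one used on the slide, and the first line turns on the pool's delegation property in case it was disabled.

```shell
# Dry-run sketch (bash or ksh syntax): print the commands that give each
# user a file system under a pool and delegate routine operations on it.
# The pool and user names passed in are hypothetical.
delegate_plan() {
    pool=$1
    shift
    perms="create,destroy,mount,snapshot"   # same set as the slide
    echo "zpool set delegation=on ${pool}"
    for user in "$@"; do
        echo "zfs create ${pool}/${user}"
        echo "zfs allow ${user} ${perms} ${pool}/${user}"
    done
}

delegate_plan tank cindys marks
```

Running zfs allow on a dataset with no other arguments, as on the slide, is the way to check the result afterwards.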
94. ZFS - Odds and Ends
zfs get all will display all set attributes of all ZFS file
systems
Recursive snapshots (via -r) as of S10 8/07
zfs clone makes a RW copy of a snapshot
zfs promote sets the root of the file system to be the
specified clone
You can undo a zpool destroy with zpool import -D
As of S10 8/07 ZFS is integrated with FMA
As of S10 11/06 ZFS supports double-parity RAID (RAIDZ2)
95. ZFS “GUI”
Did you know that Solaris has an admin
GUI?
Webconsole enabled by default
Turn off via svcadm if not used
By default (on Nevada B64 at least) ZFS is the
only feature turned on
97. ZFS Automatic Snapshots
In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris
2008.11
SMF service and GNOME app
Can take automatic scheduled snapshots
By default all zfs file systems, at boot, then every 15
minutes, every hour, every day, etc
Auto delete of oldest snapshots if user-defined
amount of space is not available
Can perform incremental or full backups via those snapshots
Nautilus integration allows user to browse and restore files
graphically
98. ZFS Automatic Snapshots (cont)
One SMF service per time frequency:
frequent snapshots every 15 mins, keeping 4 snapshots
hourly snapshots every hour, keeping 24 snapshots
daily snapshots every day, keeping 31 snapshots
weekly snapshots every week, keeping 7 snapshots
monthly snapshots every month, keeping 12 snapshots
Details here: http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt
99. ZFS Automatic Snapshots (cont)
Service properties provide more details
zfs/fs-name - The name of the filesystem. If the special filesystem name "//" is used, then the
system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to
true, so to take frequent snapshots of tank/timf, run the following zfs command:
# zfs set com.sun:auto-snapshot:frequent=true tank/timf
The "snapshot-children" property is ignored when using this fs-name value. Instead, the system
automatically determines when it's able to take recursive vs. non-recursive snapshots of the system,
based on the values of the ZFS user properties.
zfs/interval - [ hours | days | months | none ]
When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to
manually fire the method script whenever they want - useful for snapshotting on system events.
zfs/keep - How many snapshots to retain, e.g. setting this to "4" would keep only the four
most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has
been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot.
Setting to "all" keeps all snapshots.
zfs/period - How often you want to take snapshots, in intervals set according to "zfs/interval"
(e.g. every 10 days)
100. ZFS Automatic Snapshots (cont)
zfs/snapshot-children - "true" if you would like to recursively take snapshots of all child
filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//'
zfs/backup - [ full | incremental | none ]
zfs/backup-save-cmd - The command string used to save the backup stream.
zfs/backup-lock - You shouldn't need to change this, but it should be set to "unlocked"
by default. We use it to indicate when a backup is running.
zfs/label - A label that can be used to differentiate this set of snapshots from
others; not required. If multiple schedules are running on the same machine, using
distinct labels for each schedule is needed, otherwise one schedule could remove
snapshots taken by another schedule according to its snapshot-retention policy (see
"zfs/keep").
zfs/verbose - Set to false by default; setting to true makes the service
produce more output about what it's doing.
zfs/avoidscrub - Set to false by default; this determines whether we should avoid
taking snapshots on any pools that have a scrub or resilver in progress. More info in
bugid 6343667, "need itinerary so interrupted scrub/resilver doesn't have to start over"
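The retention rule described for zfs/keep (keep the N most recent snapshots and destroy the oldest beyond that) can be sketched in a few lines. This is an illustration of the rule only, not the service's own method script: the function name is hypothetical, the snapshot names are made up, and the caller is assumed to pass snapshots oldest first. It prints the zfs destroy commands instead of running them.

```shell
# Dry-run sketch (bash or ksh syntax) of a "keep the newest N" retention rule.
# Usage: prune_plan KEEP SNAPSHOT...   (snapshots listed oldest first)
prune_plan() {
    keep=$1
    shift
    excess=$(( $# - keep ))        # how many snapshots are over the limit
    while [ "$excess" -gt 0 ]; do
        echo "zfs destroy $1"      # $1 is always the oldest one left
        shift
        excess=$(( excess - 1 ))
    done
}

prune_plan 4 tank/timf@s1 tank/timf@s2 tank/timf@s3 tank/timf@s4 tank/timf@s5 tank/timf@s6
```

With fewer snapshots than the keep count the loop never runs, so nothing is printed.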
101. ZFS Automatic Snapshot (cont)
http://blogs.sun.com/erwann/resource/menu-location.png
102. ZFS Automatic Snapshot (cont)
If life-preserver icon enabled in file browser,
then backup of directory is available
Press to bring up nav bar
103. ZFS Automatic Snapshot (cont)
Drag slider into past to show previous version
of files in the directory
Then right-click on a file and select “Restore to
Desktop” if you want it back
More features coming
Press to bring up nav bar
104. ZFS Status
Netbackup, Legato support ZFS for
backup / restore
VCS supports ZFS as file system of
clustered services
Most vendors don’t care which file system
app runs on
Performance as good as other file systems
Feature set better
105. ZFS Futures
Support by ISVs
Backup / restore
Some don’t get metadata (yet)
Use zfs send to emit file containing filesystem
Clustering (see Lustre)
Performance still a work in progress
Being ported to BSD, Mac OS X Leopard
Check out the ZFS FAQ at
http://www.opensolaris.org/os/community/zfs/faq/
106. ZFS Performance
From http://www.opensolaris.org/jive/thread.jspa?messageID=14997
billm
On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote:
> Does ZFS reorganize (ie. defrag) the files over time?
Not yet.
> If it doesn't, it might not perform well in "write-little read-much"
> scenarios (where read performance is much more important than write
> performance).
As always, the correct answer is "it depends". Let's take a look at
several cases:
- Random reads: No matter if the data was written randomly or
sequentially, random reads are random for any filesystem,
regardless of their layout policy. Not much you can do to
optimize these, except have the best I/O scheduler possible.
107. ZFS Performance (cont)
- Sequential writes, sequential reads: With ZFS, sequential writes
lead to sequential layout on disk. So sequential reads will
perform quite well in this case.
- Random writes, sequential reads: This is the most interesting
case. With random writes, ZFS turns them into sequential writes,
which go *really* fast. With sequential reads, you know which
order the reads are going to be coming in, so you can kick off
a bunch of prefetch reads. Again, with a good I/O scheduler
(which ZFS just happens to have), you can turn this into good read
performance, if not entirely as good as totally sequential.
Believe me, we've thought about this a lot. There is a lot we can do to
improve performance, and we're just getting started.
108. ZFS Performance (cont)
For DBs and other direct-disk-access-wanting applications
There is no direct I/O in ZFS
But can get very good performance by
matching I/O size of the app (e.g.
Oracle uses 8K) with recordsize of zfs
file system
Set recordsize at filesystem create time, before the data
files are written; a later change affects only newly written files
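A minimal sketch of the recordsize matching described above, assuming a hypothetical file system mypool/oradata and an 8K application block size. The script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: create a file system whose recordsize matches the app's I/O size.
# "mypool/oradata" and "8k" are illustrative assumptions.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

make_db_fs() {
    fs=$1      # dataset name, e.g. mypool/oradata
    bs=$2      # application block size, e.g. 8k
    run zfs create -o recordsize="$bs" "$fs"
    # Confirm the property took effect.
    run zfs get recordsize "$fs"
}

make_db_fs mypool/oradata 8k
```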
109. ZFS Performance (cont)
The ZIL can be a bottleneck on NFS servers
NFS does sync writes
Put the ZIL on another disk, or on SSD
ZFS aggressively uses memory for caching
Low priority user, but can cause temporary
conflicts with other users
Use arcstat to monitor memory use
http://www.solarisinternals.com/wiki/index.php/Arcstat
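A minimal sketch of both suggestions, assuming a hypothetical pool mypool and a spare device c2t0d0, and a ZFS release that supports separate log devices. The script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: put the ZIL on a dedicated log device and check the ARC size.
# "mypool" and "c2t0d0" are illustrative assumptions.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

add_slog() {
    pool=$1
    dev=$2
    # Add the device to the pool as a separate intent-log device.
    run zpool add "$pool" log "$dev"
    run zpool status "$pool"
}

arc_size() {
    # Current ARC size in bytes, read from the arcstats kstat.
    run kstat -p zfs:0:arcstats:size
}

add_slog mypool c2t0d0
arc_size
```

The kstat query is a quick spot check; arcstat reports the same counters over time.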
110. ZFS Backup Tool
Zetaback is a thin-agent based ZFS backup tool
Runs from a central host
Scans clients for new ZFS filesystems
Manages varying desired backup intervals (per host) for
full backups
incremental backups
Maintains varying retention policies (per host)
Summarizes existing backups
Restores any host:fs backup at any point in time to any target
host
https://labs.omniti.com/trac/zetaba
111. zfs upgrade
On-disk format of ZFS changes over time
Forward-upgradeable, but not backward
compatible
Watch out when attaching and detaching zpools
Also, “zfs send” streams are not readable by older zfs versions
# zfs upgrade
This system is currently running ZFS filesystem version 2.
The following filesystems are out of date, and can be upgraded. After being
upgraded, these filesystems (and any ’zfs send’ streams generated from
subsequent snapshots) will no longer be accessible by older software
versions.
VER FILESYSTEM
--- ------------
1 datab
1 datab/users
1 datab/users/area51
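A minimal sketch of checking versions before upgrading, using the datab file system from the output above as the example. The script only prints the commands unless DRY_RUN=0 is set; remember that the upgrade cannot be undone:

```shell
#!/bin/sh
# Sketch: inspect supported versions, then upgrade one file system tree.
# "datab" comes from the sample output; substitute your own dataset.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

show_versions() {
    run zpool upgrade -v    # pool versions this release supports
    run zfs upgrade -v      # file system versions this release supports
    run zfs upgrade         # list file systems that are out of date
}

upgrade_fs() {
    fs=$1
    run zfs get version "$fs"
    # -r also upgrades all descendent file systems.
    run zfs upgrade -r "$fs"
}

show_versions
upgrade_fs datab
```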
112. Automatic Snapshots and Backups
Unsupported services, may become
supported
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10
http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people
113. ZFS - Smashing!
http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18
115. Build an OpenSolaris Storage Server in 10 Minutes
http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html
Example 1: ZFS Filesystem
Objectives:
Understand the purpose of the ZFS filesystem.
Configure a ZFS pool and filesystem.
Requirements:
A server (SPARC or x64 based) running the OpenSolaris OS.
Configuration details from the running server.
Step 1: Identify your Disks.
Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here:
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t2d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0
1. c0t3d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0
Specify disk (enter its number): ^D
116. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Add your disks to your ZFS pool.
# zpool create -f mypool c0t3d0s0
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 10G 94K 10.0G 0% ONLINE -
Step 3: Create a filesystem in your pool.
# zfs create mypool/myfs
# df -h /mypool/myfs
Filesystem size used avail capacity Mounted on
mypool/myfs 9.8G 18K 9.8G 1% /mypool/myfs
117. Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 2: Network File System (NFS)
Objectives:
Understand the purpose of the NFS filesystem.
Create an NFS shared filesystem on a server and mount it on a client.
Requirements:
Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS.
Configuration details from the running systems.
Step 1: Create the NFS shared filesystem on the server.
Switch on the NFS service on the server:
# svcs nfs/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/nfs/server:default
# svcadm enable nfs/server
Share the ZFS filesystem over NFS:
# zfs set sharenfs=on mypool/myfs
# dfshares
RESOURCE SERVER ACCESS TRANSPORT
x4100:/mypool/myfs x4100 - -
118. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Switch on the NFS service on the client.
This is similar to the procedure for the server:
# svcs nfs/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/nfs/client:default
# svcadm enable nfs/client
Mount the shared filesystem on the client:
# mkdir /mountpoint
# mount -F nfs x4100:/mypool/myfs /mountpoint
# df -h /mountpoint
Filesystem size used avail capacity Mounted on
x4100:/mypool/myfs 9.8G 18K 9.8G 1% /mountpoint
119. Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 3: Common Internet File System (CIFS)
Objectives:
Understand the purpose of the CIFS filesystem.
Configure a CIFS share on one machine (from the previous example) and make it available on the other machine.
Requirements:
Two servers (SPARC or x64 based) running the OpenSolaris OS.
Configuration details provided here.
Step 1: Create a ZFS filesystem for CIFS.
# zfs create -o casesensitivity=mixed mypool/myfs2
# df -h /mypool/myfs2
Filesystem size used avail capacity Mounted on
mypool/myfs2 9.8G 18K 9.8G 1% /mypool/myfs2
Step 2: Switch on the SMB Server service on the server.
# svcs smb/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/smb/server:default
# svcadm enable smb/server
120. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Share the filesystem using CIFS.
# zfs set sharesmb=on mypool/myfs2
Verify using the following command:
# zfs get sharesmb mypool/myfs2
NAME PROPERTY VALUE SOURCE
mypool/myfs2 sharesmb on local
Step 4: Verify the CIFS naming.
Because we have not explicitly named the share, we can examine the default name assigned to it using the following command:
# sharemgr show -vp
default nfs=()
zfs
zfs/mypool/myfs nfs=()
/mypool/myfs
zfs/mypool/myfs2 smb=()
mypool_myfs2=/mypool/myfs2
Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown.
Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS.
Add the following line to the end of the file:
other password required pam_smb_passwd.so.1 nowarn
121. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 6: Change the password using the passwd command.
# passwd username
New Password:
Re-enter new Password:
passwd: password successfully changed for username
Now repeat Steps 5 and 6 on the Solaris client.
Step 7: Enable CIFS client services on the client node.
# svcs smb/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/smb/client:default
# svcadm enable smb/client
122. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 8: Make a mount point on the client and mount the CIFS resource
from the server.
Mount the resource across the network and check it using the following
command sequence:
# mkdir /mountpoint2
# mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2
Password: *******
# df -h /mountpoint2
Filesystem size used avail capacity Mounted on
//root@x4100/mypool_myfs2 9.8G 18K 9.8G 1% /mountpoint2
# df -n
/ : ufs
/mountpoint : nfs
/mountpoint2 : smbfs
123. Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 4: Comstar Fibre Channel Target
Objectives
Understand the purpose of the Comstar Fibre Channel target.
Configure an FC target and initiator on two servers.
Requirements:
Two servers (SPARC or x64 based) running the OpenSolaris OS.
Configuration details provided here.
Step 1: Start the SCSI Target Mode Framework and verify it.
Use the following commands to start up and check the service on the host that provides the target:
# svcs stmf
STATE STIME FMRI
disabled 19:15:25 svc:/system/device/stmf:default
# svcadm enable stmf
# stmfadm list-state
Operational Status: online
Config Status : initialized
124. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Ensure that the framework can see the ports.
Use the following command to ensure that the target mode framework can see the HBA ports:
# stmfadm list-target -v
Target: wwn.210000E08B909221
Operational Status: Online
Provider Name : qlt
Alias : qlt0,0
Sessions : 4
Initiator: wwn.210100E08B272AB5
Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B072AB5
Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60
Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008
125. Build an OpenSolaris Storage Server in 10 Minutes - cont
Target: wwn.210100E08BB09221
Operational Status: Online
Provider Name : qlt
Alias : qlt1,0
Sessions : 4
Initiator: wwn.210100E08B272AB5
Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B072AB5
Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60
Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008
126. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Create a device to use as storage for the target.
Use ZFS to create a volume (zvol) for use as the storage behind the
target:
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 68G 94K 68.0G 0% ONLINE -
# zfs create -V 5gb mypool/myvol
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 5.00G 61.9G 18K /mypool
mypool/myvol 5G 66.9G 16K -
127. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 4: Register the zvol with the framework.
The zvol becomes the SCSI logical unit (disk) behind the target:
# sbdadm create-lu /dev/zvol/rdsk/mypool/myvol
Created the following LU:
GUID DATA SIZE SOURCE
6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/myvol
Confirm its existence as follows:
# stmfadm list-lu -v
LU Name: 6000AE4093000000000047F3A1930007
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/mypool/myvol
View Entry Count : 0
128. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 5: Find the initiator HBA ports to which to map the LUs.
Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
Port Mode: Initiator
Port ID: 1
OS Device Name: /dev/cfg/c5
Manufacturer: QLogic Corp.
Model: 2200
Firmware Version: 2.1.145
FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
Type: L-port
State: online
Supported Speeds: 1Gb
Current Speed: 1Gb
Node WWN: 24000003ba0ad303
130. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA
ports to it.
Name the group mygroup:
# stmfadm create-hg mygroup
# stmfadm list-hg
Host Group: mygroup
Add the WWNs of the ports to the group:
# stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 \
    wwn.210100E08B296A60 \
    wwn.210100E08B272AB5 \
    wwn.210000E08B072AB5
Now check that everything is in order:
# stmfadm list-hg-member -v -g mygroup
With the host group created, you're now ready to export the logical unit. This is accomplished by
adding a view entry to the logical unit using this host group, as shown in the following command:
# stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007
131. Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 7: Check the visibility of the targets on the initiator host.
First, force the devices on the initiator host to be rescanned with a simple
script:
#!/bin/ksh
fcinfo hba-port |grep "^HBA" |awk '{print $4}'|while read ln
do
fcinfo remote-port -p $ln -s >/dev/null 2>&1
done
The disk exported over FC should then appear in the format list:
# format
Searching for disks...done
c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB
132. Build an OpenSolaris Storage Server in 10 Minutes - cont
...
partition> p
Current partition table (default):
Total disk cylinders available: 20477 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 511 128.00MB (512/0/0) 262144
1 swap wu 512 - 1023 128.00MB (512/0/0) 262144
2 backup wu 0 - 20476 5.00GB (20477/0/0) 10484224
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 usr wm 1024 - 20476 4.75GB (19453/0/0) 9959936
7 unassigned wm 0 0 (0/0/0) 0
partition>
133. ZFS Root
Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file
system (as does OpenSolaris)
Note that, as of U6, you can’t create a flash archive of a ZFS root system(!)
Can upgrade by using liveupgrade (LU) to mirror to second disk (ZFS pool) and
upgrading there, then booting there
lucreate to copy the primary BE to create an alternate BE
# zpool create mpool mirror c1t0d0s0 c1t1d0s0
# lucreate -c c1t2d0s0 -n zfsBE -p mpool
The default file systems are created in the specified pool and the non-shared file
systems are then copied into the root pool
Run luupgrade to upgrade the alternate BE (optional)
Run luactivate on the newly upgraded alternate BE so that when the system is
rebooted, it will be the new primary BE
# luactivate zfsBE
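The steps above can be strung together roughly as follows. The BE and pool names are the ones from this slide, lustatus is added as a sanity check between steps, and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: migrate the current BE to a ZFS root pool with Live Upgrade.
# Names (c1t2d0s0, zfsBE, mpool) are taken from the slide; the pool is
# assumed to exist already (see the zpool create line above).

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_root() {
    src_be=$1   # name given to the current boot environment
    new_be=$2   # name of the new ZFS boot environment
    pool=$3     # root pool that receives the new BE
    run lucreate -c "$src_be" -n "$new_be" -p "$pool"
    # Check that the new BE is listed as complete before activating it.
    run lustatus
    run luactivate "$new_be"
}

migrate_root c1t2d0s0 zfsBE mpool
```

After luactivate, reboot with init 6 or shutdown rather than reboot or halt, so that the BE switch is actually carried out.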
134. Life is good
Once on ZFS as root, life is good
Mirror the root disk with 1 command (if not mirrored):
# zpool attach rpool c1t0d0s0 c1t1d0s0
Note that you have to manually do an installboot on the
mirrored disk
Now consider all the ZFS features, used on the boot disk
Snapshot before patch, upgrade, any change
Undo change via 1 command
Replicate to another system for backup, DR
...
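A minimal sketch of the mirror-plus-boot-block step, using the device names from this slide. The boot block command differs between SPARC and x86, so the sketch picks one from uname -p; it only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: attach a second disk to the root pool and make it bootable.
# Pool and device names are the slide's examples; adjust for your system.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mirror_root() {
    pool=$1                    # root pool, e.g. rpool
    cur=$2                     # existing root slice
    new=$3                     # slice to attach as the mirror
    arch=${4:-$(uname -p)}     # sparc or i386; detected if not given
    run zpool attach "$pool" "$cur" "$new"
    case "$arch" in
    sparc)
        run installboot -F zfs \
            "/usr/platform/$(uname -i)/lib/fs/zfs/bootblk" "/dev/rdsk/$new"
        ;;
    *)
        run installgrub /boot/grub/stage1 /boot/grub/stage2 "/dev/rdsk/$new"
        ;;
    esac
    # Watch resilver progress; the mirror is not redundant until it finishes.
    run zpool status "$pool"
}

mirror_root rpool c1t0d0s0 c1t1d0s0
```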
135. ZFS Labs
What pools are available in your zone?
What are their states?
What is their performance like?
What ZFS file systems?
Create a new file system
Create a file there
Take a snapshot of that file system
Delete the file
Revert to the file system state as of the snapshot
How do you see the contents of a snapshot?
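One possible walk-through of the lab, as a sketch. It assumes a pool named mypool is visible in your zone, that you may create file systems in it, and that the new file system uses its default mountpoint; the labfs and testfile names are made up. The script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: the lab questions as a command sequence.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lab() {
    pool=$1                  # a pool visible in your zone (assumed name)
    fs=$pool/labfs           # hypothetical file system for the lab
    run zpool list                       # which pools are available
    run zpool status                     # their states
    run zpool iostat -v "$pool" 5 3      # performance: 3 samples, 5s apart
    run zfs list                         # which ZFS file systems exist
    run zfs create "$fs"
    run touch "/$fs/testfile"
    run zfs snapshot "$fs@before"
    run rm "/$fs/testfile"
    run zfs rollback "$fs@before"        # testfile is back after this
    run ls "/$fs/.zfs/snapshot/before"   # snapshot contents, read-only
}

lab mypool
```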
136. ZFS Final Thought
Eric Schrock's Weblog - Thursday Nov 17, 2005
UFS/SVM vs. ZFS: Code Complexity
A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People
tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment
to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that
UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being
bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging
effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is
considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a
filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true
measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate
yields:
UFS: kernel= 46806 user= 40147 total= 86953
SVM: kernel= 75917 user=161984 total=237901
TOTAL: kernel=122723 user=202131 total=324854
ZFS: kernel= 50239 user= 21073 total= 71312
The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to
be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code
(kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what
those ZFS numbers will look like in 20 years...
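The two ratios quoted above can be rechecked from the totals in the table. A small sketch, with the figures copied from the table:

```shell
#!/bin/sh
# Recompute the ratios from the line counts in the table above.
ufs_svm_total=324854   # UFS + SVM, kernel and user
zfs_total=71312        # ZFS, kernel and user
svm_user=161984        # SVM userland only

ratios() {
    awk -v z="$zfs_total" -v t="$ufs_svm_total" -v s="$svm_user" 'BEGIN {
        printf "ZFS / (UFS+SVM) = %.2f\n", z / t
        printf "SVM userland / ZFS total = %.2f\n", s / z
    }'
}

ratios
```

This prints 0.22 and 2.27, which matches "about 1/5th" and "more than twice".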
138. Where to Learn More
Community: http://www.opensolaris.org/os/community/zfs
Wikipedia: http://en.wikipedia.org/wiki/ZFS
ZFS blogs: http://blogs.sun.com/main/tags/zfs
ZFS ports
Apple Mac: http://developer.apple.com/adcnews
FreeBSD: http://wiki.freebsd.org/ZFS
Linux/FUSE: http://zfs-on-fuse.blogspot.com
As an appliance: http://www.nexenta.com
Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
139. Sun Storage 7x10
140. Speaking of Futures
The future of Sun storage?
Announced 11/10/2008