IBM Spectrum Scale Fundamentals Workshop for Americas Part 4: Replication, Stretched Cluster and Active File Management
1. Spectrum Scale 4.1 System Administration
Spectrum Scale
Active File Management (AFM)
Bringing data together across clusters
2. Unit objectives
After completing this unit, you should be able to:
• Describe the value of Active File Management (AFM)
• Describe the Home and Cache relationship and its features
• Understand some client-leveraged use cases
• List the various AFM modes and relationships
• Learn how to create and manage an AFM relationship
4. Evolution of the global namespace: Active File Management (AFM)
• 1993: GPFS introduced concurrent file system access from multiple nodes.
• 2005: Multi-cluster expands the global namespace by connecting multiple sites.
• 2011: AFM takes the global namespace truly global by automatically managing asynchronous replication of data.
5. IBM Active File Management (AFM): unique 21st century advanced global functionality
• The IBM Spectrum Scale central site can be a source where data is created, maintained, and updated/changed.
• The central site can push data to edge sites for WAN optimization.
• Remote sites can periodically pre-fetch data (via policy) or pull it on demand.
• Data is revalidated when accessed (staleness check).
• Remote sites can be primary (write) owners and send data back to the central site.
• The central data site has all the directories; backup/HSM is managed out of this site.
• Local or long-distance users share a dedicated home file system, each with an individual home directory.
• By establishing an automated relationship between clusters, AFM provides a single global namespace with global distribution: access to files from anywhere, as if they were local.
[Diagram: central office / branch office ingest and dissemination, collaboration in the cloud, and backup integration around a central data site; caches push or pull on demand, can read or write, and hold revalidated copies that will be refreshed.]
8. Synchronous operations (cache validate/miss)
• On a cache miss, attributes are pulled and the file is created locally "on demand" (lookup, open, ...)
• If the cache is set up against an empty home, there should not be any synchronous operations.
• On a later data read
• The whole file is fetched over NFS and written locally
• The data read is done in parallel across multiple nodes
• Applications can continue once the required data is in cache, while the remainder of the file is being fetched
• On a cache hit
• Attributes are revalidated based on the revalidation delay
• If the data hasn't changed, it is read locally
• On disconnected access
• Access to cached data fetches local data only
• Files not cached return a "does not exist" error (ENOENT)
• Files written locally do a lazy sync back to the home site when reconnected (see the state-check sketch below)
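A quick way to confirm whether a cache fileset is currently connected to its home, disconnected, or has updates queued is the getstate operation of mmafmctl. A minimal sketch, assuming a file system named fs1 and a cache fileset named cache1 (both placeholders):

# Show AFM state (cache state, gateway node, queue length) for all cache filesets in fs1
mmafmctl fs1 getstate

# Limit the report to one cache fileset
mmafmctl fs1 getstate -j cache1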
9. Asynchronous updates (write, create, remove)
• Updates at the cache site are pushed back lazily (asynchronously)
• This masks the latency of the WAN
• Data is written to Spectrum Scale at the cache site synchronously
• Writeback to home is asynchronous
• The asynchronous delay is configurable
• Writeback coalesces updates and accommodates out-of-order
and parallel writes
• I/O is filtered as needed (e.g. rewrites to the same blocks)
• The admin can force a sync if needed:
mmafmctl Device flushPending -j FilesetName
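A minimal sketch of tuning the write-back delay and forcing a flush, assuming file system fs1 and cache fileset cache1 (placeholders); the afmAsyncDelay attribute name should be confirmed against your release's documentation:

# Set the asynchronous write-back delay for the cache fileset to 60 seconds
mmchfileset fs1 cache1 -p afmAsyncDelay=60

# Push all queued updates for the fileset to home immediately
mmafmctl fs1 flushPending -j cache1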
10. Active File Management (AFM) Cache
A cache is a property of a fileset and is defined when you create the fileset. The other
side of the relationship is called the Home or Target (same thing, two names); this is
the Home site.
AFM Cache Facts
There is one Home relationship per cache fileset.
The relationship between a Cache and a Home is one to one – all a cache knows about is
its Home. A Home does not know a cache even exists.
The cache does all the work – the cache checks the Home for changes and sends
updates to the Home. How a cache behaves is determined by the cache mode.
There are four cache modes:
Read-Only (ro),
Local-Update (lu),
Single-Writer (sw),
Independent-Writer (iw).
Calling this a "cache" may be selling it a little short. Inode and file data in a cache
fileset are the same as inode and file data in any other Spectrum Scale file system. They are
"real" files stored on disk; the job of the cache is to keep the data in the file
consistent, at some level, with the data on the other side of the relationship.
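Creating a cache fileset is done at the cache cluster with mmcrfileset and mmlinkfileset. A minimal sketch for a single-writer cache whose home is an NFS export; the file system name (fs1), fileset name (cache1), home node, and export path are placeholders:

# Create a single-writer AFM cache fileset backed by an NFS export at the home site
mmcrfileset fs1 cache1 --inode-space new \
    -p afmTarget=homenode:/gpfs/homefs/home1 -p afmMode=sw

# Link the fileset into the cache file system's namespace
mmlinkfileset fs1 cache1 -J /gpfs/fs1/cache1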
15. Notes on AFM Modes
• Single Writer
– Only the cache can write data. Home can't change.
– Peer caches need to be set up as read-only.
• Read Only
– The cache site can only read data; no data change is allowed from the cache.
• Local Update
– Data is cached from the home and changes are allowed as in SW mode; however, changes are
not pushed back to the home.
– Once data is changed, the relationship for that data is broken, i.e. cache and home are no longer in
sync for that file.
• Independent Writer
– Data can change at the home and at any cache.
– Different caches can change different files.
• Changing Modes
– An SW, IW, or RO mode cache can be changed to any other mode (see the sketch below).
– An LU cache can't be changed (because it assumes data will be different).
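A minimal sketch of a mode change on an existing cache fileset, assuming file system fs1 and fileset cache1 (placeholders); whether the fileset must be unlinked first should be confirmed in the documentation for your release:

# Unlink the cache fileset before changing its mode (assumed prerequisite)
mmunlinkfileset fs1 cache1

# Change the AFM mode, e.g. from single-writer to read-only
mmchfileset fs1 cache1 -p afmMode=ro

# Relink the fileset at its junction path
mmlinkfileset fs1 cache1 -J /gpfs/fs1/cache1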
17. Network Infrastructure
• AFM uses the following network services on specific ports
– Check and make sure that the network infrastructure and firewalls have the following
ports open between the clusters:
– 1081 for HTTP, 22 for SSH, 2049 for NFS, 32767 for NFS mount
Tips:
– In any network, there can be man-in-the-middle firewalls, blocked ports, and/or port
mapping within the infrastructure
– Watch out for situations where port mapping could change if a port goes idle for a
period of time, or where a firewall may close an idle port
– Plan for and allow time to research, coordinate, find, and resolve these kinds of
networking issues
– Other than these requirements, AFM will run on a standard network infrastructure that
supports NFSv3
– Allow time for network admins to apply standard TCP/IP tuning expertise, such as setting
window sizes and tuning network buffers
• Confirm that ssh logon to remote sites is acceptable
– AFM requires ssh logon to remote sites. AFM cannot be used if ssh is not acceptable.
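A quick connectivity check from a gateway node toward the home site can catch firewall issues before AFM is configured; a minimal sketch using standard tools, with homenode as a placeholder host name:

# Check that the ports AFM depends on are reachable at the home site
for port in 22 2049 32767; do
    nc -z -w 5 homenode "$port" && echo "port $port open" || echo "port $port blocked"
done

# Confirm non-interactive ssh logon to the remote site works
ssh -o BatchMode=yes homenode true && echo "ssh OK"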
21. Pre-fetching (data is proactively populated)
• Prefetch files selectively from home to cache
• Runs asynchronously in the background
• Parallel multi-node prefetch (new in 4.1)
• Metadata-only prefetch, without fetching file data (new in 4.1)
• User exit invoked when completed
• You can choose the files to prefetch based on policy
For example:
Make a file list using a simple LIST rule via policy if the home is GPFS,
or using find, ls -lR, or any similar tool, and feed this file list to
mmafmctl --prefetch --ns.
This will populate the directory tree in the fileset.
The administrator can then migrate either some files selectively or all files
using mmafmctl --prefetch --filelist, as sketched below.
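A sketch of the policy-driven flow, assuming the home is GPFS. The file system, fileset, and path names are placeholders, and the exact spelling of the prefetch list option varies by release, so check the mmafmctl documentation before using it:

# At the home cluster: build a file list with the policy engine
cat > /tmp/prefetch.pol <<'EOF'
RULE EXTERNAL LIST 'prefetch' EXEC ''
RULE 'all' LIST 'prefetch'
EOF
mmapplypolicy /gpfs/homefs/home1 -P /tmp/prefetch.pol -f /tmp/pf -I defer
# The deferred list is written to /tmp/pf.list.prefetch

# At the cache cluster: feed the list to the prefetch operation
# (list-file option name is an assumption; some releases spell it differently)
mmafmctl fs1 prefetch -j cache1 --list-file /tmp/pf.list.prefetch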
23. Cache Eviction (data on cache is expired / removed)
• Use when
– The cache is smaller than home
– Data fills up in the cache faster than it can be pushed to home
– You need to create space for caching other files or for incoming writes
– Eviction is linked with fileset quotas
• For an RO fileset, cache eviction is triggered automatically
– When fileset usage goes above the fileset soft quota limit
– Files are chosen based on LRU
– Files with unsynced data are not evicted
• Eviction can be disabled
• It can be triggered manually (see the sketch below):
mmafmctl Device evict -j FilesetName
afmEnableAutoEviction
This AFM configuration attribute enables eviction on a given fileset. A yes value specifies that
eviction is allowed on the fileset. A no value specifies that eviction is not allowed on the fileset.
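A minimal sketch of configuring and triggering eviction, assuming file system fs1 and cache fileset cache1 (placeholders); whether afmEnableAutoEviction is set per fileset or cluster-wide, and the exact quota syntax, should be confirmed for your release:

# Allow automatic eviction on the cache fileset (assumed to be a fileset attribute)
mmchfileset fs1 cache1 -p afmEnableAutoEviction=yes

# Give the fileset a block quota so the soft limit can drive automatic eviction
# (80 GB soft / 100 GB hard are placeholder values)
mmsetquota fs1:cache1 --block 80G:100G

# Trigger eviction manually, e.g. before a large ingest
mmafmctl fs1 evict -j cache1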
27. Expiration of Data (preventing access to stale data)
• Staleness control
• Defined based on the time since disconnection
• Once a cache is expired, no access to the cache is allowed
• Manual expire/unexpire option for the admin
• mmafmctl expire / unexpire
• Allowed only for ro mode caches
This prevents access to stale data, where staleness is defined by the amount of time
that the WAN cache is out of synchronization with data at the home site.
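A minimal sketch of manual expiration, assuming a read-only cache fileset rocache1 in file system fs1 (placeholders); the afmExpirationTimeout attribute name is an assumption and should be checked against the documentation:

# Mark the read-only cache fileset as expired so applications cannot read stale data
mmafmctl fs1 expire -j rocache1

# Allow access again once the connection to home is healthy
mmafmctl fs1 unexpire -j rocache1

# Expire automatically after 300 seconds of disconnection (attribute name is an assumption)
mmchfileset fs1 rocache1 -p afmExpirationTimeout=300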
32. Independent Writer
• Multiple cache filesets can write to a single home, as long as
each cache writes to different files.
• The multiple cache sites re-validate periodically and pull
new data from home.
• If multiple cache filesets write to the same file, the sequence
of updates is non-deterministic.
• The writes are pushed to home as they arrive, independently,
because there is no locking between clusters.
• Use case: unique users at each site updating files in
their own home directories.
33. New in Spectrum Scale 4.1
• Spectrum Scale Backend using Spectrum Scale multi-cluster
• Parallel I/O
• Using multiple threads and multiple nodes per file
• Better handling of GW node failures
• Various usability improvements
34. Some Restrictions to Consider
• Hard links
• Hard links at home are not detected
• Hard links created in the cache are maintained
• The following are NOT supported/cached/replicated
• Clones
• Special files such as sockets or device files
• Fileset metadata such as quotas, replication parameters, snapshots, etc.
• Renames
• Renames at home result in a remove/create in the cache
• Locking is restricted to the cache cluster only
• Independent filesets only (NO file system level AFM setup)
• Dependent filesets can't be linked into AFM filesets
• Peer snapshots are supported only in SW mode
36. AFM based DR – 4.1++
[Diagram: a NAS client writes to AFM configured as primary, which pushes all updates
asynchronously to AFM configured as secondary; the client switches to the secondary on failure.]
Supported in TL2
• Replicate data from the primary to the secondary site.
• The relationship is active-passive (primary – RW, secondary – RO).
• Allows the primary to operate actively with no interruption when
the relationship with the secondary fails.
• Automatic failback when the primary comes back.
• Granularity at the fileset level.
• RPO and RTO support – from minutes to hours
(depends on data change rate, link bandwidth, etc.)
Not supported in TL2
• No cascading mode, i.e. no tertiary; only one
secondary allowed per relationship.
• POSIX ops only; no appendOnly support.
• No file system level support.
• Continues with the present limitation of not allowing a
dependent fileset to be linked inside an AFM (panache) fileset.
• No metadata replication (dependent filesets, user
snapshots, fileset quotas, user quotas, replication factor,
other fileset attributes, direct I/O setting).
37. Psnap Consistent Replication
[Diagram: AFM configured as home-master pushes all updates asynchronously to AFM
configured as home-replica, coordinated by a multi-site snapshot management tool;
the snapshots at cache and home correspond to the same point in time.]
• Take a fileset snapshot at the master and mark that point in time in the write-back queue.
• Push all updates up to the point-in-time marker.
• Take a snapshot of the fileset at the replica.
• Update the management tool's record of the last snapshot time and IDs.
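A peer snapshot can be requested from the cache side of a single-writer fileset; a minimal sketch, assuming file system fs1 and SW cache fileset cache1 (placeholders), with the exact mmpsnap syntax to be confirmed against your release's documentation:

# Create a peer snapshot: queued updates are flushed to home first, then
# corresponding snapshots exist at both cache and home for the same point in time
mmpsnap fs1 create -j cache1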
38. Basics of DR Configuration
• Establish the primary-secondary relationship (see the sketch below)
• Create the AFM fileset at the primary and associate it with the DR secondary fileset
• This provides the DRPrimaryID that should be used when setting up
the DR secondary
• Initialization phase
• Truck the data from the primary to the secondary if necessary
• Initial trucking can be done via AFM or out of band (a customer-chosen
method such as tape)
• Normal operation
• Async replication continuously pushes data to the secondary based on
asyncDelay
• Psnap support to get common consistency points between primary and
secondary
• Done periodically based on the RPO
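A minimal sketch of establishing the relationship, with heavily assumed details: the file system names (fs1, fs2), fileset names, target path, and the afmMode=primary/secondary and afmPrimaryID parameter spellings are placeholders or assumptions to be verified against the AFM DR documentation:

# At the primary cluster: create the primary fileset pointing at the secondary's export
mmcrfileset fs1 drprimary1 --inode-space new \
    -p afmMode=primary -p afmTarget=secondarynode:/gpfs/fs2/drsecondary1

# Record the primary ID reported for this fileset; it is needed at the secondary
mmafmctl fs1 getPrimaryId -j drprimary1

# At the secondary cluster: create the matching secondary fileset using that ID
mmcrfileset fs2 drsecondary1 --inode-space new \
    -p afmMode=secondary -p afmPrimaryID=<ID reported above>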
39. On a DR Event (here is what happens)
• Primary failure
– Promote the secondary to DR primary
– Restore data from the last consistency point (RPO snapshot)
• Secondary failure
– Establish a new secondary (see the sketch below)
– mmafmctl --setNewSecondary
– Takes an initial snapshot and pushes data to the new secondary in the background
– RPO snapshots start after the initial sync
• Failback to the old primary
– Restore to the last RPO snapshot (similar to what's done on the secondary during its promotion to
primary)
– Find changes made at the secondary and apply them back to the original primary
– Incremental or once
– Needs downtime in the last iteration to avoid any more changes
– Revert the primary and secondary roles
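For the secondary-failure case only, a minimal sketch that reuses the operation named on the slide; the device, fileset, and new-target names are placeholders and the exact argument syntax is an assumption to be verified against the mmafmctl documentation:

# Point the primary fileset at a replacement secondary; an initial snapshot is taken
# and data is pushed to the new secondary in the background (argument names assumed)
mmafmctl fs1 setNewSecondary -j drprimary1 \
    --new-target secondarynode2:/gpfs/fs2/drsecondary1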