3. 3
Problem description
The customer has set up a new two-node SLES 11 SP4
cluster and is running application tests on it.
The file system hangs periodically and
processes get stuck in the "D" (uninterruptible sleep) state.
All processes stuck in the "D" state were in the ocfs2_cluster_lock code, for example:
[<ffffffffa066f800>] __ocfs2_cluster_lock+0x3b0/0xa60 [ocfs2]
[<ffffffffa0677528>] ocfs2_inode_lock_full_nested+0x178/0x510 [ocfs2]
[<ffffffffa06ec791>] ocfs2_get_acl+0x61/0x120 [ocfs2]
[<ffffffffa06ec95a>] ocfs2_acl_chmod+0x6a/0xe0 [ocfs2]
[<ffffffffa0681121>] ocfs2_setattr+0x671/0xab0 [ocfs2]
[<ffffffff8117de8e>] notify_change+0x17e/0x2d0
[<ffffffff8116136c>] sys_fchmodat+0xdc/0x150
[<ffffffff8147c187>] sysenter_dispatch+0x7/0x32
[<ffffffffffffffff>] 0xffffffffffffffff
4. 4
Interact with the customer
• Mail communication
Make sure the ocfs2 cluster setup is correct.
Understand the customer application scenarios.
Provide tentative suggestions/patches.
• Remote session with the customer
Reproduce bug.
Find ocfs2 related hung processes.
Collect the related data.
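The "find hung processes" step can be sketched with standard tools. This is a minimal sketch, assuming the node exposes /proc/<pid>/stack (reading it normally requires root); the function names are hypothetical and not part of any SUSE tooling.

```shell
#!/bin/sh
# Filter "pid stat comm" lines (as printed by `ps -eo pid,stat,comm`)
# down to the PIDs of tasks in uninterruptible sleep (state starts with "D").
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Print the kernel stack of every D-state task whose stack mentions ocfs2.
show_ocfs2_hung_tasks() {
    ps -eo pid,stat,comm | filter_d_state | while read -r pid; do
        if grep -q 'ocfs2' "/proc/$pid/stack" 2>/dev/null; then
            echo "== PID $pid =="
            cat "/proc/$pid/stack"
        fi
    done
}
```

Running show_ocfs2_hung_tasks as root on each node should print traces in the same format as the one in the problem description.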
5. 5
Collect data from the customer site
• supportconfig/hb_report
SLES HA cluster related data.
• dlm_tool
DLM lock related dump.
• o2image
OCFS2 file system meta-data image.
• echo "c" > /proc/sysrq-trigger
Linux core dump file.
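The collection steps above could be wrapped in a small helper. This is only a sketch: the device, lockspace name and output directory are placeholders, the helper names are hypothetical, and the exact dlm_tool subcommands available depend on the installed version, so check its man page first. hb_report is left out because it needs a site-specific start time.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (useful for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: OCFS2 block device, $2: DLM lockspace name, $3: output directory
collect_ocfs2_data() {
    dev=$1; lockspace=$2; outdir=$3
    run supportconfig                            # general SLES support data
    run dlm_tool ls                              # list the DLM lockspaces
    run dlm_tool lockdebug "$lockspace"          # dump the locks of one lockspace
    run o2image "$dev" "$outdir/ocfs2-meta.img"  # metadata-only image
}
```

Calling it with DRY_RUN=1 first prints the commands without touching the cluster.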
6. 6
Generate core dump in HA cluster
• Why is no Linux core dump left after triggering a panic?
Because the fencing mechanism resets the machine while
Kdump is still writing the dump.
• Solutions
1) Use the stonith:fence_kdump resource agent.
Please refer to the SLE HA guide for more
details.
2) Disable the hardware watchdog and use the soft watchdog.
See the detailed steps on the next page.
7. 7
Use soft watchdog temporarily
• Disable hardware watchdog
Edit /etc/modprobe.conf and add two lines that prevent
the related kernel modules from loading. (Note: the module
names depend on your machine's hardware watchdog
configuration.)
blacklist iTCO_wdt
blacklist iTCO_vendor_support
• Enable soft watchdog
Edit /etc/init.d/boot.local and add one line that loads
the soft watchdog kernel module at boot.
modprobe softdog
• Reboot the machine for the changes to take effect
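The two edits can also be applied idempotently from a script. This is a minimal sketch: the function names are hypothetical, the default file paths are the ones named on this slide, and the iTCO module names only apply to machines with that watchdog hardware.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    file=$1; line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# $1: modprobe config file, $2: boot-time script (defaults follow the slide)
use_soft_watchdog() {
    modconf=${1:-/etc/modprobe.conf}
    bootlocal=${2:-/etc/init.d/boot.local}
    add_line_once "$modconf" "blacklist iTCO_wdt"
    add_line_once "$modconf" "blacklist iTCO_vendor_support"
    add_line_once "$bootlocal" "modprobe softdog"
}
```

After the reboot, lsmod should list softdog and no longer list iTCO_wdt.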
9. 9
Prepare crash analysis environment
• Crash-setup
This tool can help you set up a crash analysis environment quickly on the L3 server based
on the vmcore file, but access from the Beijing site is very slow, and the HA-related
KMP debuginfo/debugsource rpms are missing.
• By yourself
Install the related debuginfo/debugsource rpms
kernel-default-3.0.101-108.68.1
kernel-default-devel-3.0.101-108.68.1
kernel-default-base-3.0.101-108.68.1
kernel-default-debugsource-3.0.101-108.68.1
kernel-default-debuginfo-3.0.101-108.68.1
ocfs2-kmp-default-1.6_3.0.101_63-0.23.40
ocfs2-debugsource-1.6-3.0.101_63-0.23.40
ocfs2-debuginfo-1.6-3.0.101_63-0.23.40
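Once the rpms are installed, a first pass over the vmcore can be scripted. This is a minimal sketch, assuming the crash utility is installed; the helper name and all paths are hypothetical placeholders.

```shell
#!/bin/sh
# Write a crash command file that loads the module symbols and dumps the
# backtraces of all tasks in uninterruptible (UN) state, then exits.
write_crash_cmds() {
    cat > "$1" <<'EOF'
mod -S
foreach UN bt
q
EOF
}

# Example invocation (placeholder paths, adjust to the local setup):
# write_crash_cmds /tmp/hung.cmds
# crash -i /tmp/hung.cmds /path/to/vmlinux.debug /path/to/vmcore > hung-tasks.txt
```

The resulting text file can then be searched for ocfs2 frames.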
14. 14
Check DLM lock dump
From the DLM lock dumps of the two nodes, we can see that
node04 (the master of this DLM lock resource) has granted a
PR Meta lock on inode 14797221 (0xe1c9a5) to
one process.
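To map a lock resource name from the dump back to an inode number, the name can be decoded by position. This sketch assumes the usual OCFS2 lock name layout (one type character, six pad characters, 16 hex digits of block number, 8 hex digits of generation). The function names are hypothetical, and the generation in the example name is a made-up all-zero value.

```shell
#!/bin/sh
# First character of the lock name is the lock type (e.g. M for Meta).
ocfs2_lock_type() {
    printf '%s\n' "$1" | cut -c1
}

# Characters 8-23 hold the block (inode) number as 16 hex digits.
ocfs2_lock_inode() {
    hex=$(printf '%s\n' "$1" | cut -c8-23)
    printf '%d\n' "0x$hex"
}

# Example with a hypothetical generation of 00000000:
# ocfs2_lock_inode M0000000000000000e1c9a500000000   # prints 14797221
```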
19. 19
Root cause
The root cause is process 31017, which had acquired
the inode (14797222) DLM EX lock in ocfs2_setattr()
and then tried to acquire the inode DLM PR lock again in
ocfs2_get_acl(). This recursive locking led
to a deadlock, and the related processes across
the cluster were then blocked.
The fix patches are as follows:
commit 439a36b8ef38657f765b80b775e2885338d72451
Author: Eric Ren <zren@suse.com>
Date: Wed Feb 22 15:40:41 2017 -0800
ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
commit b891fa5024a95c77e0d6fd6655cb74af6fb77f46
Author: Eric Ren <zren@suse.com>
Date: Wed Feb 22 15:40:44 2017 -0800
ocfs2: fix deadlock issue when taking inode lock at vfs entry points
commit 8818efaaacb78c60a9d90c5705b6c99b75d7d442
Author: Eric Ren <zren@suse.com>
Date: Fri Jun 23 15:08:55 2017 -0700
ocfs2: fix deadlock caused by recursive locking in xattr
21. 21
The fix process
• Find kernel patches (from the upstream/yourself).
• Test the patches based on the customer version.
Pass the ocfs2 test suites.
• Create the fix branch.
e.g. origin/users/ghe/SLE12-SP4/bsc1128902
• L3 creates the corresponding PTF rpm.
• The customer verifies the PTF rpm.
• Submit the patches to the upstream if they are new.
• Add the patches to SUSE kernel-source.
• Close the bug in SUSE Bugzilla.
22. 22
SUSE kernel source maintenance
• Kernel-source
url: user@kerncvs.suse.de:/home/git/kernel-source.git
Linux tarball plus lots of patches
• Kernel
url: git://kerncvs.suse.de/kernel.git
SUSE Linux kernel source (patches applied)
• Code branches for various SLES versions.
origin/SLE12-SP4
origin/SLE15-SP1
origin/SLE15-SP1-UPDATE
...
• Patches are propagated automatically among branches.
http://kerncvs.suse.de/
24. 24
Add patch to SUSE kernel-source
• Format the patch from the Linus git tree
cd /torvalds
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git format-patch -1 <commit-id>
• Add three keywords to the patch, e.g.
Patch-mainline: v4.11-rc1
Git-commit: b891fa5024a95c77e0d6fd6655cb74af6fb77f46
References: bsc#1086695
Note: the patch must include at least one SUSE related e-mail address.
• Set LINUX_GIT environment variable
This variable points to your local Linus git directory, e.g. LINUX_GIT=/torvalds/linux
• Push the patch to SUSE kernel-source, e.g.
git checkout -b users/ghe/SLE12-SP2/for-next origin/SLE12-SP2
./scripts/git_sort/series_insert.py patches.fixes/ocfs2-try-to-reuse-extent-block-in-dealloc-without-m.patch
git add patches.fixes/ocfs2-try-to-reuse-extent-block-in-dealloc-without-m.patch
./scripts/log
git push -v ssh://ghe@kerncvs.suse.de/srv/git/kernel-source.git users/ghe/SLE12-SP2/for-next
• Reference
https://pes.suse.de/L3/Kernel_git_repositories/
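Before pushing, it can help to confirm that the patch carries the three tags listed above. This is a minimal sketch with a hypothetical helper name. It only checks that the tags are present, not that their values are correct, and it is not a replacement for the kernel-source scripts.

```shell
#!/bin/sh
# Return success only if the patch file carries all three header tags.
check_patch_tags() {
    patch=$1; rc=0
    for tag in Patch-mainline Git-commit References; do
        if ! grep -q "^$tag: " "$patch"; then
            echo "missing tag: $tag" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, check_patch_tags patches.fixes/ocfs2-try-to-reuse-extent-block-in-dealloc-without-m.patch prints any missing tag to stderr and returns non-zero if one is absent.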