RCA for Linux Reboot
Satish Navkar
Agenda
Importance
General Approach
Technical Approach
Advanced Investigation
Preparing for future
Common issues
Why RCA is important
Business Impact
Loss of money due to outages.
Disruption in availability of services.
Risk of recurrence of the issue.
Finding the culprit behind the scenes.
Security breach or human error.
General Approach (Non-Technical)
The RCA is a method of problem solving.
There can be more than one root cause behind the issue.
The purpose is to identify a solution or workaround that prevents
recurrence at the lowest cost and in the simplest way.
RCFA (Root Cause Failure Analysis) recognizes that complete
prevention of recurrence by one corrective action is not always
possible.
Well-known methods and tools: 5 Whys, Pareto analysis, the
cause-and-effect model, etc.
Technical Approach (Basic)
Is the unexpected reboot the effect of some planned activity?
Were there any recent configuration changes (software/hardware)?
What do the recent logs suggest?
Was any unusual behaviour (or log entry) spotted on the console?
Is there a relation between the occurrences of the events?
Is the power source reliable (UPS)?
Step Forward
Is it a virtual or a physical system?
Check the logs recorded by the hypervisor and/or hardware (mcelog,
IML logs, ASR events, hypervisor utilization, etc.).
Is the system part of a cluster? Was any fencing event recorded?
Try to find out whether it is a real OS issue or is related to the
application, network, or storage.
Is this the result of a malicious script running on the system?
Is any anti-virus software installed and running on the system?
Which panic parameters are set on the system?
Deep Diving
Is there a known bug in the running kernel? (Search Bugzilla.)
Is the issue reproducible on demand? Is there a possible
workaround?
Does a replica of the system exhibit similar behaviour? Compare
the initramfs of the replicas to find any differences.
Is there a known issue with the combination of the OS
version and a particular application running on the system?
Is there any sign of abnormal resource utilization near the event?
Was a complete or partial dump captured for the reboot?
Check the vmcore-dmesg.txt log and search for known issues on the
vendor portal.
Server Hung Scenario
Do not confuse it with an application hang. Do all the
checks. There is no standard definition of an OS hang.
Some facts regarding crash vs. hang situations:
• A crash often immediately follows a problem in kernel space, such as a programming
error, defective hardware, or an unsupported operation.
• During a crash, oops messages are displayed, which helps diagnosis.
• A crash or panic is easier to troubleshoot: it provides a stack trace and panic task
details.
• System hangs are more subtle. A hang can be as simple as a temporary
performance issue caused by inefficient algorithms or as complicated as a
deadlock.
• No oops messages are displayed on the console, so it is not known which thread caused the hang.
This makes a hang more complicated to analyze.
Take a snapshot of the virtual guest and extract a memory dump,
or trigger a panic using one of the available techniques; make sure the
panic initiates the memory dump mechanism.
User Initiated
➢ The "exiting on signal 15" message is the last message that the syslog
service emits during a normal shutdown.
➢ The presence of this message in the messages file indicates a directed
shutdown of the system, either by a user or by a program.
➢ Is any system health monitoring software running that may issue
the 'shutdown' command? For example:
• Automatic system recovery software.
• Hardware monitoring tools.
• UPS software with shutdown capability, etc.
➢ To find which user it was, set audit rules or use a script.
➢ Check the secure logs and the users' bash history for the shutdown event.
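
The first check above can be scripted. The following is a minimal sketch, assuming Python 3 and that the log lines have already been read from the messages file (commonly /var/log/messages, which is an assumption about the system). The function name find_directed_shutdowns is hypothetical.

```python
import re

# Marker the slide describes as the last syslog message of a normal shutdown.
SHUTDOWN_MARKER = re.compile(r"exiting on signal 15")


def find_directed_shutdowns(lines):
    """Return (line_number, text) pairs for lines that carry the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SHUTDOWN_MARKER.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

If the marker is found just before the reboot, the shutdown was requested by a user or a program; if it is absent, a crash, power loss or fencing event is more likely.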
Cluster Initiated
➢ A cluster reboots a system using its fencing mechanism. Common clustering
software includes Oracle Clusterware, VCS, and Red Hat Cluster.
➢ Contrary to common belief, high availability is not the highest priority
of an HA cluster but only the second one; protecting data integrity comes first.
➢ There are two classes of fencing methods: one disables the node
itself, the other disallows access to resources such as shared disks.
➢ A cluster can fall victim to conditions called split brain and amnesia.
Clusters use a process called "STONITH" (Shoot The Other Node In The Head) to correct the issue;
this simply means the healthy nodes kill the sick node.
➢ I/O fencing is one of the important features of VCS, whereas Oracle RAC
simply gives the message "Please Reboot" to the sick node. The node
bounces itself and rejoins the cluster. Red Hat Cluster uses the fence device
configuration to handle fencing events.
➢ One can also set a fence delay to allow the OS to capture a vmcore for fencing
events.
Hardware Faults
➢ The most common hardware errors captured on the system are:
• Memory errors or Error Correction Code (ECC) problems.
• Inadequate cooling / processor overheating.
• System bus errors; cache errors in the processor or hardware.
• Firmware bugs, EDAC events, and NMIs.
➢ The kernel takes the immediate actions (such as killing processes) and
mcelog decodes the errors.
➢ mcelog is the user-space backend for logging machine check errors
reported by the hardware to the kernel.
• Typical MCE message: "HARDWARE ERROR. This is *NOT* a software problem!"
Panic Parameters
These are used to deliberately panic the system when certain
conditions are met. This is necessary for debugging purposes.
• 1) kernel.hung_task_panic
• 2) kernel.softlockup_panic
• 3) vm.panic_on_oom: Panics the kernel on oom-killer
events and captures a vmcore if the kdump service is running as expected.
• 4) kernel.panic_on_io_nmi
• 5) kernel.unknown_nmi_panic: Uses the computer's NMI switch to force a
kernel panic on a hung system.
• 6) kernel.panic_on_oops
• 7) kernel.panic_on_unrecovered_nmi
• 8) kernel.nmi_watchdog: The NMI watchdog monitors system interrupts and
can trigger a panic (and thus a reboot) if the system appears to have hung.
• 9) kernel.panic_on_stackoverflow
• 10) kernel.panic [secs]: Number of seconds the kernel waits after a panic
before rebooting; 0 disables the automatic reboot.
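
To record which of these parameters are set on a system, the values can be read from /proc/sys. The following is a minimal sketch, assuming Python 3. The helper names sysctl_path and read_panic_sysctls are hypothetical; the only interface relied on is that a sysctl named a.b is exposed as the file /proc/sys/a/b. Not every kernel exposes every parameter, so missing files are reported as None.

```python
from pathlib import Path

# Sysctl names taken from the list above.
PANIC_SYSCTLS = [
    "kernel.hung_task_panic",
    "kernel.softlockup_panic",
    "vm.panic_on_oom",
    "kernel.panic_on_io_nmi",
    "kernel.unknown_nmi_panic",
    "kernel.panic_on_oops",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.nmi_watchdog",
    "kernel.panic_on_stackoverflow",
    "kernel.panic",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under the given root."""
    return Path(root).joinpath(*name.split("."))


def read_panic_sysctls(root="/proc/sys"):
    """Return {name: value}; the value is None when the file cannot be read."""
    values = {}
    for name in PANIC_SYSCTLS:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values
```

Calling read_panic_sysctls() on the affected host and on a healthy replica gives two dictionaries that can be compared directly.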
Panic Strings
➢ These panic strings explain the cause of the panic, but they are not always
sufficient to determine the actual cause.
➢ When a kernel panic occurs, the system usually displays a message on
the console and all system activity stops.
• kernel BUG at net/sunrpc/sched.c:695!
• BUG: unable to handle kernel paging request at xxxxx
• BUG: unable to handle kernel NULL pointer dereference at xxxxx / (null)
• divide error: 0000 [#1] SMP
• Kernel panic - not syncing: softlockup: hung tasks / hung_task: blocked tasks
• Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
• Kernel panic - not syncing: out of memory, panic_on_oom is selected
• Kernel panic - not syncing: Out of memory and no killable processes..
• Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for
details.
• Kernel panic - not syncing: NMI IOCK error: Not continuing / NMI: Not continuing / nmi watchdog
• Kernel panic - not syncing: Fatal Machine check
• Kernel panic - not syncing: Attempted to kill init!
• Kernel panic - not syncing: GAB: Port h halting system due to client process failure
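
A first triage of such strings can be automated. The following is a minimal sketch, assuming Python 3. The category labels and the grouping are this sketch's own, not a vendor taxonomy, and the name classify_panic is hypothetical. As noted above, the string alone is not always enough to determine the actual cause.

```python
import re

# Ordered (pattern, category) pairs; the first matching pattern wins.
PANIC_PATTERNS = [
    (r"out of memory", "memory exhaustion"),
    (r"soft ?lockup|hung_task|hard LOCKUP", "lockup or hung task"),
    (r"Machine check|\bNMI\b", "hardware or NMI"),
    (r"Attempted to kill init", "init killed"),
    (r"NULL pointer dereference|kernel paging request|kernel BUG at|divide error",
     "kernel bug or bad memory access"),
]


def classify_panic(line):
    """Return a coarse category for a panic or oops line."""
    for pattern, category in PANIC_PATTERNS:
        if re.search(pattern, line, re.IGNORECASE):
            return category
    return "unclassified"
```

Strings from third-party modules, such as the GAB message above, fall through to "unclassified" and need the vendor's documentation.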
Kernel logging
➢ Syslog is a standard logging facility. It collects messages from various
programs and services, including the kernel, and stores them, depending
on setup, in a set of log files, typically under /var/log.
➢ The "/var/log/messages" file stores valuable, non-debug, non-critical
messages. This log should be considered the "general system
activity" log.
➢ Administrators use the log rotation facility to maintain historical data. One
can also change the logging level based on the requirements of the setup.
➢ Common call traces seen in messages are:
• OOM-killer and memory stats.
• Soft lockup logs for various cores.
• Page allocation failures.
• Segfaults: signify an error in one particular process.
kernel: fmg[6335]: segfault at 0xffffd2dc rip 0xffffd2dc rsp 00000000ffffd1bc error X
• Trap divide error: application crash due to "divide by zero".
kernel: nmupm[2792] trap divide error rip:804a39a rsp:ffa4eb24 error:X
OOM call traces
➢ The out_of_memory() function is called when the system memory
(including swap) has been allocated to the point where regular system
activities cannot be performed until some of that memory is freed.
➢ The code in mm/oom_kill.c terminates one or more processes based on their badness()
score, a heuristic that tries to avoid killing innocent tasks.
<snip/>
Node 0 DMA: 3*4kB 2*8kB 2*16kB 3*32kB 2*64kB 2*128kB ... 3*4096kB = 15132kB
Node 0 DMA32: 452*4kB ..
Node 0 Normal: 13315*4kB .. <<<
[..]
Free swap = 0kB <<<
Total swap = 8388604kB
[..]
kernel: httpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0 <<<
kernel:
kernel: Call Trace:
[<ffffffff800c3a6a>] out_of_memory+0x8e/0x2f5
[<ffffffff8000f2eb>] __alloc_pages+0x245/0x2ce
[<ffffffff80012a62>] __do_page_cache_readahead+0x95/0x1d9
</snip>
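
The "invoked oom-killer" line in the excerpt above identifies the task whose allocation triggered the OOM killer. The following is a minimal sketch, assuming Python 3, that extracts its fields. The function name parse_oom_invocation is hypothetical, and the pattern is written against the line format shown here; some kernels append a symbolic flag list in parentheses after gfp_mask, which the pattern tolerates.

```python
import re

OOM_INVOKED = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+)(?:\([^)]*\))?, "
    r"order=(?P<order>\d+)"
)


def parse_oom_invocation(line):
    """Return the triggering task, gfp_mask and order, or None if no match."""
    match = OOM_INVOKED.search(line)
    if match is None:
        return None
    return {
        "comm": match.group("comm"),
        "gfp_mask": int(match.group("gfp_mask"), 16),
        "order": int(match.group("order")),
    }
```

The task that invoked the OOM killer is not necessarily the one consuming the memory, so the memory statistics around the message still need to be read.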
D-state call traces
➢ These messages serve as a warning that something may not be
operating optimally. They do not necessarily indicate a serious problem,
and any blocked processes should eventually proceed when the system
recovers.
➢ "khungtaskd" detects tasks stuck in D-state
(uninterruptible sleep) for longer than a specified time period and
logs the following type of message in the system log:
<snip/>
INFO: task syslogd:2643 blocked for more than 120 seconds. <<<
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <<<
syslogd D ffff81000237eaa0 0 2643 1 2646 2634
(NOTLB) <<<
ffff8101352c3d88 0000000000000086 ffff8101352c3d98 ffffffff80063ff8
0000000000001000 0000000000000009 ffff81013d2c57e0 ffff810102ac1820
0000340b30708992 0000000000000571 ffff81013d2c59c8 000000010000089f
Call Trace: <<<
[<ffffffff80063ff8>] thread_return+0x62/0xfe
[...]
[<ffffffff8005e28d>] tracesys+0xd5/0xe0
</snip>
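
To see which tasks were reported as blocked and how often, the "blocked for more than" lines can be counted per task. The following is a minimal sketch, assuming Python 3 and the message format shown in the excerpt above. The function names are hypothetical.

```python
import re
from collections import Counter

HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<seconds>\d+) seconds"
)


def parse_hung_task(line):
    """Return (comm, pid, seconds) for a hung-task report, or None."""
    match = HUNG_TASK.search(line)
    if match is None:
        return None
    return match.group("comm"), int(match.group("pid")), int(match.group("seconds"))


def count_hung_tasks(lines):
    """Count hung-task reports per task name."""
    counts = Counter()
    for line in lines:
        parsed = parse_hung_task(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

Many reports for tasks that all wait on the same storage or filesystem path usually point to the I/O layer rather than to the tasks themselves.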
Soft-lockup call traces
➢ Soft lockups are situations in which the kernel's scheduler subsystem has
not been given a chance to perform its job.
➢ They can be caused by defects in the kernel, by hardware issues, or by
extremely high workloads.
<snip/>
kernel: BUG: soft lockup - CPU#7 stuck for 206s! [sosreport:14372] <<<
kernel: Modules linked in: rpcsec_gss_krb5 nfsd..vsock(U) ipv6 .. vmware_balloon .. vmxnet3 ..
dm_mod [last unloaded: speedstep_lib] <<<
[..]
/440BX Desktop Reference Platform
kernel: RIP: 0010:[<ffffffff81162cbd>] [<ffffffff81162cbd>] s_show+0x1ad/0x330 <<<
kernel: RSP: 0018:ffff8801e482fd98 EFLAGS: 00000202
kernel: RAX: 0000000000000000 RBX: ffff8801e482fe18 RCX: ffff88043febfb80 <<<
kernel: RDX: 0000000000000000 RSI: 00000000000036a7 RDI: ffff88043febfb60
[...]
kernel: <d> 00000000000036a7 ffff880437830f00 ffff8801e482fe18 ffff88031e3f1640
kernel: Call Trace:
kernel: [<ffffffff8119db87>] ? seq_read+0x267/0x3f0 <<<
kernel: [<ffffffff81054c30>] ? __dequeue_entity+0x30/0x50 .....
</snip>
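
The first line of a soft-lockup report names the CPU, the duration and the task that was running. The following is a minimal sketch, assuming Python 3 and the message format shown in the excerpt above. The function name parse_soft_lockup is hypothetical.

```python
import re

SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the CPU, duration and task of a soft-lockup report, or None."""
    match = SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "seconds": int(match.group("seconds")),
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
    }
```

Collecting these fields over time shows whether the lockups follow one CPU, one task, or neither.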
Page allocation failures
➢ The kernel frequently needs to allocate chunks of memory for the
temporary storage of data and structures. Sometimes an allocation
demands many physically contiguous pages, which may not always be
available. In such cases the memory allocator may choose to fail the
allocation request.
➢ Common causes are memory shortage, memory fragmentation, an exhausted
memory zone, and drivers with different service routines.
• The usual workaround is to check the value of vm.min_free_kbytes and double it. Also,
setting vm.zone_reclaim_mode to 0 can help to avoid memory congestion issues.
<snip/>
kernel: swapper: page allocation failure. order:2, mode:0x20 <<<
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
kernel: [<ffffffff8115dc62>] ? kmem_getpages+0x62/0x170
kernel: [<ffffffff8115e87a>] ? fallback_alloc+0x1ba/0x270
kernel: [<ffffffff8115e2cf>] ? cache_grow+0x2cf/0x320
kernel: [<ffffffff8115e5f9>] ? ____cache_alloc_node+0x99/0x160 ...
</snip>
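
The "order" in the failure message is the base-2 logarithm of the number of physically contiguous pages requested, so order:2 means 4 pages. The following is a minimal sketch, assuming Python 3 and a 4 KiB page size (typical for x86_64, but an assumption; check getconf PAGESIZE on the host). The function name parse_page_alloc_failure is hypothetical.

```python
import re

PAGE_SIZE = 4096  # bytes; assumed, not read from the system

PAGE_ALLOC_FAILURE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_page_alloc_failure(line, page_size=PAGE_SIZE):
    """Return the task, order, mode and size of the failed request, or None."""
    match = PAGE_ALLOC_FAILURE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    pages = 1 << order  # an order-n request asks for 2**n contiguous pages
    return {
        "comm": match.group("comm"),
        "order": order,
        "mode": int(match.group("mode"), 16),
        "pages": pages,
        "bytes": pages * page_size,
    }
```

Failures at higher orders with plenty of free memory overall are a hint of fragmentation rather than of an outright shortage.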
SysRq
➢ It is a 'magical' key combination that you can hit and to which the kernel will
respond regardless of whatever else it is doing, even if the console is
unresponsive.
➢ The SysRq key is one of the best (and sometimes the only) ways to
determine what a machine is really doing. It is useful when a server
appears to be "hung" or for diagnosing elusive, transient, kernel-related
problems.
➢ For security reasons, the SysRq key is disabled or restricted by default on many distributions.
• Enabling SysRq gives someone with physical console access extra
abilities. It is recommended to disable it when not troubleshooting a problem, or
to ensure that physical console access is properly secured.
➢ There are several SysRq events (and ways to trigger them) available once the
SysRq facility is enabled.
• # echo h > /proc/sysrq-trigger
Commonly used options are:
• m - dump information about memory allocation
• t - dump thread state information
• c - intentionally crash the system
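
Whether the keyboard SysRq functions are enabled is controlled by the kernel.sysrq sysctl, which is 0 (disabled), 1 (all functions enabled) or a bitmask of allowed function groups. The following is a minimal sketch, assuming Python 3, that decodes such a value. The bit meanings follow the kernel's SysRq documentation; the function name decode_sysrq is hypothetical.

```python
# Bitmask values for kernel.sysrq as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the function groups allowed by a kernel.sysrq value."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

The current value can be read from /proc/sys/kernel/sysrq and passed to decode_sysrq as an integer.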
Kdump
➢ Kdump is a mechanism that uses kexec to capture the crash dump. A crash
dump is also known as a "vmcore"; it can be captured using
kdump/diskdump/netdump/xendump/LKCD/vmss2core etc.
➢ kexec is a fast-boot mechanism that allows booting a Linux kernel from
the context of an already running kernel without going through the BIOS.
➢ A crash dump captures the state of the kernel at the moment of panic. It is
a snapshot of the physical memory at the time of the crash.
• A vmcore can be collected in the following ways:
• Automatically: when the kernel panics (via the panic parameters) or oopses, for example due to a bug in
the kernel or in a third-party driver, memory corruption, or a hardware problem.
• Manually: when the admin uses SysRq, the NMI switch, or takes a snapshot.
• Limitations of a vmcore: it is not useful for analysing a healthy system; it cannot capture
historical logs; it is complex and requires expertise to analyse.
• Configuring kdump and starting the service is not sufficient; testing kdump is a must.
Also find out the supported and unsupported kdump targets for the particular OS vendor.
• Multiple factors affect vmcore generation, e.g. clustering, HP
systems, bonding, network cards/modules, virtualization, etc.
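
Kdump needs memory reserved for the crash kernel, which is requested with the crashkernel= boot parameter. The following is a minimal sketch, assuming Python 3, that checks a kernel command line for that parameter. The function name crashkernel_settings is hypothetical; on a live system the command line can be read from /proc/cmdline.

```python
def crashkernel_settings(cmdline):
    """Return the values of all crashkernel= parameters on a kernel command line."""
    prefix = "crashkernel="
    return [
        token[len(prefix):]
        for token in cmdline.split()
        if token.startswith(prefix)
    ]
```

An empty result means no reservation was requested at boot, so kdump cannot capture a vmcore. A non-empty result does not replace an actual test panic, because the reservation or the dump target can still fail.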
Bugs
➢ A software bug is a failure or flaw in a program that produces undesired
or incorrect results. It is an error that prevents the application from
functioning as it should.
➢ There are many reasons for software bugs. The most common reason is
human mistakes in software design and coding.
➢ The BUG_ON() macro acts similarly to panic, but is placed intentionally in
code to check for abnormal conditions.
➢ The vmcore and vmcore-dmesg.txt help to identify bugs. Bugs can be
in any software, but a bug in a device driver or in the kernel can cause outages.
➢ An example of a kernel bug is a divide by zero in the find_busiest_group() function
causing a kernel panic in RHEL6 kernels.
➢ A deadlock bug in "vmtoolsd" causing a system hang is an example of
an external software bug leading to a system panic condition.
Preparing for Future
Configure kdump on all systems. It has no side effects apart from the memory reserved for the crash kernel.
Configure audit rules based on business requirements.
Properly configure the cluster settings and test them.
Tune the system as per the application vendor's guidelines.
Be ready with a backup plan.
Patch regularly.
[Figures: Visual Summary]
Q&A
Feel free to contact me via chat or email.

Mais conteúdo relacionado

Mais procurados

Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemanujos25
 
Real Time Operating system (RTOS) - Embedded systems
Real Time Operating system (RTOS) - Embedded systemsReal Time Operating system (RTOS) - Embedded systems
Real Time Operating system (RTOS) - Embedded systemsHariharan Ganesan
 
REAL TIME OPERATING SYSTEM
REAL TIME OPERATING SYSTEMREAL TIME OPERATING SYSTEM
REAL TIME OPERATING SYSTEMprakrutijsh
 
Process Management in Android
Process Management in AndroidProcess Management in Android
Process Management in AndroidShrey Verma
 
LCA13: Who Disturbs My Slumber
LCA13: Who Disturbs My SlumberLCA13: Who Disturbs My Slumber
LCA13: Who Disturbs My SlumberLinaro
 
Operating system - Process and its concepts
Operating system - Process and its conceptsOperating system - Process and its concepts
Operating system - Process and its conceptsKaran Thakkar
 
Galvin-operating System(Ch1)
Galvin-operating System(Ch1)Galvin-operating System(Ch1)
Galvin-operating System(Ch1)dsuyal1
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating SystemsPawandeep Kaur
 
CNIT 126: 8: Debugging
CNIT 126: 8: DebuggingCNIT 126: 8: Debugging
CNIT 126: 8: DebuggingSam Bowne
 
Galvin-operating System(Ch3)
Galvin-operating System(Ch3)Galvin-operating System(Ch3)
Galvin-operating System(Ch3)dsuyal1
 
Fault tolerant real-time scheduling
Fault tolerant real-time schedulingFault tolerant real-time scheduling
Fault tolerant real-time schedulingReza Ramezani
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating SystemsRohit Joshi
 
Real time operating systems (rtos) concepts 9
Real time operating systems (rtos) concepts 9Real time operating systems (rtos) concepts 9
Real time operating systems (rtos) concepts 9Abu Bakr Ramadan
 

Mais procurados (20)

Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating system
 
OS_Ch2
OS_Ch2OS_Ch2
OS_Ch2
 
Real Time Operating system (RTOS) - Embedded systems
Real Time Operating system (RTOS) - Embedded systemsReal Time Operating system (RTOS) - Embedded systems
Real Time Operating system (RTOS) - Embedded systems
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Real-Time Operating Systems
Real-Time Operating SystemsReal-Time Operating Systems
Real-Time Operating Systems
 
REAL TIME OPERATING SYSTEM
REAL TIME OPERATING SYSTEMREAL TIME OPERATING SYSTEM
REAL TIME OPERATING SYSTEM
 
Process Management in Android
Process Management in AndroidProcess Management in Android
Process Management in Android
 
LCA13: Who Disturbs My Slumber
LCA13: Who Disturbs My SlumberLCA13: Who Disturbs My Slumber
LCA13: Who Disturbs My Slumber
 
Operating system - Process and its concepts
Operating system - Process and its conceptsOperating system - Process and its concepts
Operating system - Process and its concepts
 
Galvin-operating System(Ch1)
Galvin-operating System(Ch1)Galvin-operating System(Ch1)
Galvin-operating System(Ch1)
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating Systems
 
F33 book-depend-pres-pt6
F33 book-depend-pres-pt6F33 book-depend-pres-pt6
F33 book-depend-pres-pt6
 
CNIT 126: 8: Debugging
CNIT 126: 8: DebuggingCNIT 126: 8: Debugging
CNIT 126: 8: Debugging
 
Galvin-operating System(Ch3)
Galvin-operating System(Ch3)Galvin-operating System(Ch3)
Galvin-operating System(Ch3)
 
Fault tolerant real-time scheduling
Fault tolerant real-time schedulingFault tolerant real-time scheduling
Fault tolerant real-time scheduling
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating Systems
 
RTOS
RTOSRTOS
RTOS
 
Real time operating systems (rtos) concepts 9
Real time operating systems (rtos) concepts 9Real time operating systems (rtos) concepts 9
Real time operating systems (rtos) concepts 9
 

Semelhante a RCA for Linux Reboot

Basics_of_Kernel_Panic_Hang_and_ Kdump.pdf
Basics_of_Kernel_Panic_Hang_and_ Kdump.pdfBasics_of_Kernel_Panic_Hang_and_ Kdump.pdf
Basics_of_Kernel_Panic_Hang_and_ Kdump.pdfstroganovboris
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversSatpal Parmar
 
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Jagadisha Maiya
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharingJames Hsieh
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Nikolay Savvinov
 
04 threads-pbl-2-slots
04 threads-pbl-2-slots04 threads-pbl-2-slots
04 threads-pbl-2-slotsmha4
 
04 threads-pbl-2-slots
04 threads-pbl-2-slots04 threads-pbl-2-slots
04 threads-pbl-2-slotsmha4
 
lecture 1 (Part 2) kernal and its categories
lecture 1 (Part 2) kernal and its categorieslecture 1 (Part 2) kernal and its categories
lecture 1 (Part 2) kernal and its categoriesWajeehaBaig
 
4.9 apend troubleshooting tools v2
4.9 apend troubleshooting tools v24.9 apend troubleshooting tools v2
4.9 apend troubleshooting tools v2Acácio Oliveira
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 

Semelhante a RCA for Linux Reboot (20)

Basics_of_Kernel_Panic_Hang_and_ Kdump.pdf
Basics_of_Kernel_Panic_Hang_and_ Kdump.pdfBasics_of_Kernel_Panic_Hang_and_ Kdump.pdf
Basics_of_Kernel_Panic_Hang_and_ Kdump.pdf
 
Audit
AuditAudit
Audit
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device Drivers
 
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 
Operating system ppt
Operating system pptOperating system ppt
Operating system ppt
 
Operating system ppt
Operating system pptOperating system ppt
Operating system ppt
 
Operating system ppt
Operating system pptOperating system ppt
Operating system ppt
 
Operating system ppt
Operating system pptOperating system ppt
Operating system ppt
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharing
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
 
TSRT Crashes
TSRT CrashesTSRT Crashes
TSRT Crashes
 
04 threads-pbl-2-slots
04 threads-pbl-2-slots04 threads-pbl-2-slots
04 threads-pbl-2-slots
 
04 threads-pbl-2-slots
04 threads-pbl-2-slots04 threads-pbl-2-slots
04 threads-pbl-2-slots
 
CPU Architecture
CPU ArchitectureCPU Architecture
CPU Architecture
 
lecture 1 (Part 2) kernal and its categories
lecture 1 (Part 2) kernal and its categorieslecture 1 (Part 2) kernal and its categories
lecture 1 (Part 2) kernal and its categories
 
Ch04 system administration
Ch04 system administration Ch04 system administration
Ch04 system administration
 
Ch04
Ch04Ch04
Ch04
 
4.9 apend troubleshooting tools v2
4.9 apend troubleshooting tools v24.9 apend troubleshooting tools v2
4.9 apend troubleshooting tools v2
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 

Último

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy

RCA for Linux Reboot

  • 1. RCA for Linux Reboot Satish Navkar
  • 2. Agenda
      • Importance
      • General Approach
      • Technical Approach
      • Advanced Investigation
      • Preparing for future
      • Common issues
  • 3. Why RCA is important
      • Business impact.
      • Loss of money due to outages.
      • Disruption in availability of services.
      • Risk of recurrence of the issue.
      • Finding the culprit behind the scenes.
      • Security breach or human error.
  • 4. General Approach (Non-Technical)
      • RCA is a method of problem solving.
      • There can be more than one root cause behind an issue.
      • The purpose is to identify a solution or workaround that prevents recurrence at the lowest cost and in the simplest way.
      • RCFA (Root Cause Failure Analysis) recognizes that complete prevention of recurrence by a single corrective action is not always possible.
      • Well-known methods and tools: 5 Whys, Pareto analysis, cause-and-effect model, etc.
  • 5. Technical Approach (Basic)
      • Is the unexpected reboot the effect of some planned activity?
      • Were there any recent configuration changes (software or hardware)?
      • What do the recent logs suggest?
      • Was any unusual behaviour (or log output) spotted on the console?
      • Is there a relation between the occurrences of the events?
      • Is the power source reliable (UPS)?
  • 6. Step Forward
      • Is it a virtual or a physical system?
      • Check the logs recorded by the hypervisor and/or the hardware (mcelog, IML logs, ASR events, hypervisor utilization, etc.).
      • Is the system part of a cluster? Was any fencing event recorded?
      • Determine whether it is a real OS issue or is related to the application, network, or storage.
      • Is it the result of a malicious script running on the system?
      • Is any anti-virus software installed and running on the system?
      • Which panic parameters are set on the system?
  • 7. Deep Diving
      • Is there a known bug in the running kernel? (Search Bugzilla.)
      • Is the issue reproducible on demand? Is there a possible workaround?
      • Does a replica of the system exhibit similar behaviour? Compare the initramfs of the replicas to find any differences.
      • Is there a known issue with the combination of the OS version and a particular application running on the system?
      • Is there any sign of abnormal resource utilization near the event?
      • Was a complete or partial dump captured for the reboot?
      • Check the vmcore-dmesg.txt logs and search for known issues on the vendor portal.
  • 8. Server Hung Scenario
      • Do not confuse it with an application hang; do all the checks. There is no standard definition of an OS hang.
      • Some facts about crash versus hang situations:
          • A crash often immediately follows a problem in kernel space, such as a programming error, defective hardware, or an unsupported operation.
          • During a crash, oops messages are displayed, which helps diagnosis.
          • A crash or panic is easier to troubleshoot: it provides a stack trace and the details of the panicking task.
          • System hangs are more subtle. A hang can be as simple as a temporary performance issue caused by inefficient algorithms, or as complicated as a deadlock.
          • No oops messages are displayed on the console, so it is not known which thread caused the hang. This makes hangs harder to analyze.
      • Take a snapshot of the virtual guest and extract a memory dump, or trigger a panic using one of the available panic techniques, making sure the panic initiates the memory dump mechanism.
  • 9. User Initiated
      • The "exiting on signal 15" message is the last message that the syslog service emits during a normal shutdown.
      • The presence of this message in the messages file indicates a directed shutdown of the system, either by a user or by a program.
      • Is any system health monitoring software running that may issue the 'shutdown' command? For example:
          • Automatic system recovery software.
          • Hardware monitoring tools.
          • UPS software with shutdown capability.
      • To find which user it was, set audit rules or use a script.
      • Check the secure logs and the users' bash history for the shutdown event.
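The check for a directed shutdown can be sketched as a small scan over syslog lines. This is a minimal illustration, not a complete tool: the function name and the sample log lines are hypothetical, and only the marker string comes from the slide.

```python
# Marker named on the slide: the last message syslog emits on a normal shutdown.
SHUTDOWN_MARKER = "exiting on signal 15"


def find_directed_shutdowns(lines):
    """Return every log line that contains the syslog shutdown marker."""
    return [line.rstrip("\n") for line in lines if SHUTDOWN_MARKER in line]


# Hypothetical sample lines, for illustration only.
sample = [
    "Jan 10 02:14:01 host1 crond[311]: job started",
    "Jan 10 02:15:42 host1 syslogd: exiting on signal 15",
]

hits = find_directed_shutdowns(sample)
print(hits)
```

In practice the input would be an open handle on the messages file (for example `open("/var/log/messages")`), and the timestamps of the hits can then be correlated with the secure logs and audit records.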
  • 10. Cluster Initiated
      • A cluster reboots a system using its fencing mechanism. Common clustering software includes Oracle Clusterware, VCS, and Red Hat Cluster.
      • Contrary to common belief, high availability is only the second priority of an HA cluster; the first is data integrity.
      • There are two classes of fencing methods: one disables the node itself, the other disallows access to resources such as shared disks.
      • A cluster can fall victim to conditions called split brain and amnesia. Clusters use a process called "STONITH" (Shoot The Other Node In The Head) to correct the issue; this simply means the healthy nodes kill the sick node.
      • I/O fencing is one of the important features of VCS, whereas Oracle RAC simply gives the message "Please Reboot" to the sick node; the node bounces itself and rejoins the cluster. Red Hat Cluster uses the fence device configuration to handle fencing events.
      • A fence delay can also be set to allow the OS to capture a vmcore for fencing events.
  • 11. Hardware Faults
      • The most common hardware errors captured on the system are:
          • Memory errors or Error Correction Code (ECC) problems.
          • Inadequate cooling or processor overheating.
          • System bus errors; cache errors in the processor or hardware.
          • Firmware bugs, EDAC and NMIs.
      • The kernel takes the immediate actions (such as killing processes) and mcelog decodes the errors.
      • mcelog is the user-space backend for logging machine check errors reported by the hardware to the kernel.
          • Example of an MCE error seen in logs: "HARDWARE ERROR. This is *NOT* a software problem!"
  • 12. Panic Parameters
      • These are used to deliberately panic the system when certain conditions are met, which is necessary for debugging.
          • 1) kernel.hung_task_panic
          • 2) kernel.softlockup_panic
          • 3) vm.panic_on_oom: panics the kernel on oom-killer events and captures a vmcore if the kdump service is running as expected.
          • 4) kernel.panic_on_io_nmi
          • 5) kernel.unknown_nmi_panic: uses the computer's NMI switch to force a kernel panic on a hung system.
          • 6) kernel.panic_on_oops
          • 7) kernel.panic_on_unrecovered_nmi
          • 8) kernel.nmi_watchdog: the NMI watchdog monitors system interrupts and initiates a reboot if the system appears to have hung.
          • 9) kernel.panic_on_stackoverflow
          • 10) kernel.panic [secs]
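To answer "which panic parameters are set on the system", the listed sysctls can be read from under /proc/sys, where a name such as kernel.panic maps to the file /proc/sys/kernel/panic. The following is a minimal sketch; the function name and its `proc_sys` argument are assumptions made for illustration, and parameters that a given kernel does not provide are reported as None.

```python
from pathlib import Path

# The sysctl names listed on the slide.
PANIC_SYSCTLS = [
    "kernel.hung_task_panic",
    "kernel.softlockup_panic",
    "vm.panic_on_oom",
    "kernel.panic_on_io_nmi",
    "kernel.unknown_nmi_panic",
    "kernel.panic_on_oops",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.nmi_watchdog",
    "kernel.panic_on_stackoverflow",
    "kernel.panic",
]


def read_panic_sysctls(proc_sys="/proc/sys", names=PANIC_SYSCTLS):
    """Map each sysctl name to its current value, or None if it is absent."""
    values = {}
    for name in names:
        # "kernel.panic" -> <proc_sys>/kernel/panic
        path = Path(proc_sys, *name.split("."))
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = None  # not provided by this kernel, or unreadable
    return values


if __name__ == "__main__":
    for name, value in read_panic_sysctls().items():
        print(f"{name} = {value}")
```

Taking the base directory as a parameter keeps the sketch usable against a copy of /proc/sys collected from another machine, such as one extracted from a diagnostic report.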
  • 13. Panic Strings
      • These panic strings explain the cause of the panic, but they are not always sufficient to determine the actual cause.
      • When a kernel panic occurs, the system usually displays a message on the console and all system activity stops.
          • kernel BUG at net/sunrpc/sched.c:695!
          • BUG: unable to handle kernel paging request at xxxxx
          • BUG: unable to handle kernel NULL pointer dereference at xxxxx / (null)
          • divide error: 0000 [#1] SMP
          • Kernel panic - not syncing: softlockup: hung tasks / hung_task: blocked tasks
          • Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
          • Kernel panic - not syncing: out of memory, panic_on_oom is selected
          • Kernel panic - not syncing: Out of memory and no killable processes..
          • Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
          • Kernel panic - not syncing: NMI IOCK error: Not continuing / NMI: Not continuing / nmi watchdog
          • Kernel panic - not syncing: Fatal Machine check
          • Kernel panic - not syncing: Attempted to kill init!
          • Kernel panic - not syncing: GAB: Port h halting system due to client process failure
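A first-pass triage of panic strings like those listed above can be sketched with a table of regular expressions. The category labels and the function name are illustrative choices for this sketch, not an established taxonomy, and, as the slide notes, the matched string alone is not always enough to determine the actual cause.

```python
import re

# (label, pattern) pairs derived from the panic strings on the slide.
# Order matters: the first matching pattern wins.
PANIC_PATTERNS = [
    ("kernel BUG", re.compile(r"kernel BUG at \S+:\d+!", re.IGNORECASE)),
    ("NULL pointer dereference",
     re.compile(r"unable to handle kernel NULL pointer dereference")),
    ("bad paging request", re.compile(r"unable to handle kernel paging request")),
    ("divide error", re.compile(r"divide error:")),
    ("soft lockup / hung task", re.compile(r"not syncing: (softlockup|hung_task)")),
    ("hard lockup", re.compile(r"hard LOCKUP")),
    ("out of memory", re.compile(r"not syncing: out of memory", re.IGNORECASE)),
    ("NMI", re.compile(r"not syncing: .*NMI", re.IGNORECASE)),
    ("machine check", re.compile(r"Fatal Machine check")),
    ("init killed", re.compile(r"Attempted to kill init")),
]


def classify_panic(line):
    """Return the label of the first pattern that matches, else 'unclassified'."""
    for label, pattern in PANIC_PATTERNS:
        if pattern.search(line):
            return label
    return "unclassified"


print(classify_panic("Kernel panic - not syncing: Fatal Machine check"))
```

Strings that originate outside the core kernel, such as the GAB message from the cluster stack, fall through to "unclassified" in this sketch and would need their own patterns.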
  • 14. Kernel logging
      • Syslog is a standard logging facility. It collects messages from various programs and services, including the kernel, and stores them, depending on the setup, in a set of log files typically under /var/log.
      • The "/var/log/messages" file aims at storing valuable, non-debug and non-critical messages. This log should be considered the "general system activity" log.
      • Administrators use the log rotation facility to maintain historical data. The logging level can also be changed based on the requirements of the setup.
      • Common call traces seen in messages are:
          • OOM-killer and memory stats.
          • Soft lockup logs for various cores.
          • Page allocation failures.
          • Segfaults, which signify an error in one particular process:
            kernel: fmg[6335]: segfault at 0xffffd2dc rip 0xffffd2dc rsp 00000000ffffd1bc errorX
          • Trap divide error, an application crash due to "divide by zero":
            kernel: nmupm[2792] trap divide error rip:804a39a rsp:ffa4eb24 error:X
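The segfault and trap-divide-error lines identify the failing process by name and PID, which can be pulled out with a single regular expression. This is a sketch written against the two sample lines on the slide; the function name is hypothetical, and other kernel versions may format these messages differently.

```python
import re

# Matches "<comm>[<pid>]: segfault ..." and "<comm>[<pid>] trap divide error ...".
USER_FAULT = re.compile(
    r"kernel: (?P<comm>\S+)\[(?P<pid>\d+)\]:? (?P<kind>segfault|trap divide error)"
)


def parse_user_fault(line):
    """Return (process name, pid, fault kind), or None if the line does not match."""
    match = USER_FAULT.search(line)
    if match is None:
        return None
    return match.group("comm"), int(match.group("pid")), match.group("kind")


print(parse_user_fault(
    "kernel: nmupm[2792] trap divide error rip:804a39a rsp:ffa4eb24 error:X"
))
```

Both faults are confined to one user-space process, so the extracted name points at the application to investigate rather than at the kernel.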
  • 15. OOM call traces
      • The out_of_memory function is called when the system memory (including swap) has been allocated to the point where regular system activities cannot be performed until some of that memory is freed.
      • The code in mm/oom_kill.c terminates one or more processes based on the badness() score, which follows an algorithm designed to avoid killing innocent tasks.
        <snip>
        Node 0 DMA: 3*4kB 2*8kB 2*16kB 3*32kB 2*64kB 2*128kB ... 3*4096kB = 15132kB
        Node 0 DMA32: 452*4kB ..
        Node 0 Normal: 13315*4kB .. <<<
        [..]
        Free swap = 0kB <<<
        Total swap = 8388604kB
        [..]
        kernel: httpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0 <<<
        kernel:
        kernel: Call Trace:
        [<ffffffff800c3a6a>] out_of_memory+0x8e/0x2f5
        [<ffffffff8000f2eb>] __alloc_pages+0x245/0x2ce
        [<ffffffff80012a62>] __do_page_cache_readahead+0x95/0x1d9
        </snip>
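The key facts in an OOM report (which process triggered the oom-killer, the allocation order, and whether swap was exhausted) can be collected into a small summary. The sketch below assumes the line formats shown in the excerpt above; the function name and the dictionary keys are illustrative.

```python
import re

INVOKED = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-f]+), order=(?P<order>\d+)"
)
SWAP = re.compile(r"(?P<which>Free|Total) swap\s*=\s*(?P<kb>\d+)kB")


def summarize_oom(lines):
    """Collect the invoking process, allocation order, and swap totals."""
    summary = {}
    for line in lines:
        match = INVOKED.search(line)
        if match:
            summary["invoked_by"] = match.group("comm")
            summary["gfp_mask"] = int(match.group("gfp"), 16)
            summary["order"] = int(match.group("order"))
        match = SWAP.search(line)
        if match:
            key = match.group("which").lower() + "_swap_kb"
            summary[key] = int(match.group("kb"))
    return summary


report = summarize_oom([
    "Free swap = 0kB",
    "Total swap = 8388604kB",
    "kernel: httpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0",
])
print(report)
```

In the excerpt, free swap of 0kB against a total of 8388604kB shows that swap was fully consumed when httpd's allocation invoked the oom-killer.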
  • 16. D-state call traces
      • These messages serve as a warning that something may not be operating optimally. They do not necessarily indicate a serious problem, and any blocked processes should eventually proceed when the system recovers.
      • "khungtaskd" detects tasks stuck in D-state (uninterruptible sleep) for longer than a specified time period, which results in the following type of message in the system log:
        <snip>
        INFO: task syslogd:2643 blocked for more than 120 seconds. <<<
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <<<
        syslogd D ffff81000237eaa0 0 2643 1 2646 2634 (NOTLB) <<<
        ffff8101352c3d88 0000000000000086 ffff8101352c3d98 ffffffff80063ff8
        0000000000001000 0000000000000009 ffff81013d2c57e0 ffff810102ac1820
        0000340b30708992 0000000000000571 ffff81013d2c59c8 000000010000089f
        Call Trace: <<<
        [<ffffffff80063ff8>] thread_return+0x62/0xfe
        [...]
        [<ffffffff8005e28d>] tracesys+0xd5/0xe0
        </snip>
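The first line of a hung-task report names the blocked task, its PID, and the timeout that was exceeded. A minimal sketch for extracting those fields, assuming the message format shown above (the function name is hypothetical):

```python
import re

HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_task(line):
    """Return (task name, pid, timeout in seconds), or None if no match."""
    match = HUNG_TASK.search(line)
    if match is None:
        return None
    return match.group("comm"), int(match.group("pid")), int(match.group("secs"))


print(parse_hung_task("INFO: task syslogd:2643 blocked for more than 120 seconds."))
```

Counting how often each task name appears across a log gives a quick view of whether one process or many were blocked, which helps separate an application problem from a system-wide I/O stall.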
  • 17. Soft-lockup call traces
      • Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job.
      • They can be caused by defects in the kernel, by hardware issues, or by extremely high workloads.
        <snip>
        kernel: BUG: soft lockup - CPU#7 stuck for 206s! [sosreport:14372] <<<
        kernel: Modules linked in: rpcsec_gss_krb5 nfsd..vsock(U) ipv6 .. vmware_balloon .. vmxnet3 .. dm_mod [last unloaded: speedstep_lib] <<<
        [..] /440BX Desktop Reference Platform
        kernel: RIP: 0010:[<ffffffff81162cbd>] [<ffffffff81162cbd>] s_show+0x1ad/0x330 <<<
        kernel: RSP: 0018:ffff8801e482fd98 EFLAGS: 00000202
        kernel: RAX: 0000000000000000 RBX: ffff8801e482fe18 RCX: ffff88043febfb80 <<<
        kernel: RDX: 0000000000000000 RSI: 00000000000036a7 RDI: ffff88043febfb60
        [...]
        kernel: <d> 00000000000036a7 ffff880437830f00 ffff8801e482fe18 ffff88031e3f1640
        kernel: Call Trace:
        kernel: [<ffffffff8119db87>] ? seq_read+0x267/0x3f0 <<<
        kernel: [<ffffffff81054c30>] ? __dequeue_entity+0x30/0x50
        .....
        </snip>
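Soft-lockup messages usually repeat while a CPU remains stuck, so it helps to reduce them to the longest stall seen per CPU. This sketch assumes the "BUG: soft lockup" line format from the excerpt above; the function name and the shape of the result are illustrative choices.

```python
import re

SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def worst_lockup_per_cpu(lines):
    """Map each CPU number to (longest stall in seconds, task name)."""
    worst = {}
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match is None:
            continue
        cpu = int(match.group("cpu"))
        secs = int(match.group("secs"))
        # Keep only the longest stall reported for this CPU.
        if cpu not in worst or secs > worst[cpu][0]:
            worst[cpu] = (secs, match.group("comm"))
    return worst


result = worst_lockup_per_cpu([
    "kernel: BUG: soft lockup - CPU#7 stuck for 206s! [sosreport:14372]",
])
print(result)
```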
  • 18. Page allocation failures
      • The kernel frequently needs to allocate chunks of memory for the temporary storage of data and structures. Sometimes an allocation demands many physically contiguous pages, which may not always be available. In such cases the memory allocator may choose to fail the allocation request.
      • Common causes are a memory crunch, memory fragmentation, an exhausted memory zone, and drivers with different service routines.
          • The usual workaround is to check the value of vm.min_free_kbytes and double it. Setting vm.zone_reclaim_mode to 0 can also help to avoid memory congestion issues.
        <snip>
        kernel: swapper: page allocation failure. order:2, mode:0x20 <<<
        kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1
        kernel: Call Trace:
        kernel: <IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
        kernel: [<ffffffff8115dc62>] ? kmem_getpages+0x62/0x170
        kernel: [<ffffffff8115e87a>] ? fallback_alloc+0x1ba/0x270
        kernel: [<ffffffff8115e2cf>] ? cache_grow+0x2cf/0x320
        kernel: [<ffffffff8115e5f9>] ? ____cache_alloc_node+0x99/0x160
        ...
        </snip>
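The order field in a page allocation failure gives the size of the contiguous request: an order-n allocation asks for 2^n contiguous pages. The sketch below parses the failure line and converts the order to kilobytes, assuming a 4 kB base page size (an assumption that holds for typical x86_64 systems but not for every architecture); the function names are hypothetical.

```python
import re

PAGE_ALLOC = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-f]+)"
)


def contiguous_kb(order, page_kb=4):
    """Size in kB of an order-n request: 2**order pages of page_kb each."""
    return page_kb * (1 << order)


def parse_page_alloc_failure(line):
    """Return (task name, order, requested contiguous kB), or None."""
    match = PAGE_ALLOC.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    return match.group("comm"), order, contiguous_kb(order)


print(parse_page_alloc_failure(
    "kernel: swapper: page allocation failure. order:2, mode:0x20"
))
```

For the excerpt above, order:2 means four contiguous pages, or 16 kB under the 4 kB page assumption. Failures at higher orders point towards fragmentation rather than a plain shortage of free memory.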
  • 19. SysRq
      • It is a 'magical' key combination that the kernel responds to regardless of whatever else it is doing, even if the console is unresponsive.
      • The SysRq key is one of the best (and sometimes the only) ways to determine what a machine is really doing. It is useful when a server appears to be "hung" or for diagnosing elusive, transient, kernel-related problems.
      • For security reasons, the SysRq key is disabled by default.
          • Enabling SysRq gives someone with physical console access extra abilities. It is recommended to disable it when not troubleshooting a problem, or to ensure that physical console access is properly secured.
      • There are several SysRq events (and ways to trigger them) available once the SysRq facility is enabled.
          • # echo h > /proc/sysrq-trigger
      • Commonly used options are:
          • m - dump info about memory allocation
          • t - dump thread state information
          • c - intentionally crash the system
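The same trigger can be driven from a script. The sketch below checks the kernel.sysrq setting and writes a command key to the trigger file, which requires root on a real system. The helper names and the allow-list are assumptions made for this illustration; 'c' is deliberately left out of the allow-list because it crashes the machine.

```python
from pathlib import Path

# Keys named on the slide that only dump information to the kernel log.
SAFE_KEYS = {
    "h": "help",
    "m": "dump info about memory allocation",
    "t": "dump thread state information",
}


def sysrq_enabled(sysctl_path="/proc/sys/kernel/sysrq"):
    """True if kernel.sysrq is non-zero (some SysRq functions are enabled)."""
    return int(Path(sysctl_path).read_text().strip()) != 0


def send_sysrq(key, trigger_path="/proc/sysrq-trigger", allowed=SAFE_KEYS):
    """Write a single SysRq command key to the trigger file."""
    if key not in allowed:
        raise ValueError(f"SysRq key {key!r} is not in the allow-list")
    Path(trigger_path).write_text(key)
```

A call such as `send_sysrq("t")` is the scripted equivalent of `echo t > /proc/sysrq-trigger`; the resulting output appears in the kernel log rather than being returned to the caller.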
  • 20. Kdump
      • Kdump is a mechanism that uses kexec to capture the crash dump. The crash dump is also known as a "vmcore"; it can be captured using kdump, diskdump, netdump, xendump, LKCD, vmss2core, etc.
      • kexec is a fastboot mechanism that allows booting a Linux kernel from the context of an already running kernel without going through the BIOS.
      • The crash dump captures the state of the kernel at the moment of panic. It is a snapshot of the physical memory at the time of the crash.
      • A vmcore can be collected in the following ways:
          • Automatically: when the kernel panics (per the panic parameters) or oopses. This can be due to a bug in the kernel or in a third-party driver, memory corruption, or hardware problems.
          • Manually: when the admin uses SysRq, the NMI switch, or takes a snapshot.
      • Limitations of a vmcore: it is not useful for analysing a healthy system; it cannot capture historical logs; it is complex and requires expertise to analyse.
      • Configuring kdump and starting the service is not sufficient; testing kdump is a must. Also find out the supported and unsupported kdump targets for the particular OS vendor.
      • Multiple factors affect vmcore generation, for example clustering, HP systems, bonding, network cards/modules, and virtualization.
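Two quick preconditions for kdump can be checked from user space: whether a crashkernel= argument is present on the kernel command line, and whether a crash kernel is currently loaded (the kernel exposes this as 1 or 0 in /sys/kernel/kexec_crash_loaded). The sketch below is a readiness hint only; the function name and result keys are illustrative, and it does not replace the real kdump test that the slide calls for.

```python
from pathlib import Path


def kdump_readiness(cmdline_path="/proc/cmdline",
                    loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report two kdump preconditions as booleans."""
    try:
        args = Path(cmdline_path).read_text().split()
    except OSError:
        args = []
    try:
        loaded = Path(loaded_path).read_text().strip() == "1"
    except OSError:
        loaded = False  # file missing: kexec support absent or path differs
    return {
        # The argument being present does not prove the reservation succeeded.
        "crashkernel_on_cmdline": any(a.startswith("crashkernel=") for a in args),
        "crash_kernel_loaded": loaded,
    }


if __name__ == "__main__":
    print(kdump_readiness())
```

If both values are True, the next step is still a deliberate test panic on a non-production system to confirm that a vmcore is actually written to the configured target.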
  • 21. Bugs
      • A software bug is a failure or flaw in a program that produces undesired or incorrect results. It is an error that prevents the application from functioning as it should.
      • There are many reasons for software bugs. The most common is human error in software design and coding.
      • The BUG_ON() macro acts similarly to a panic, but is placed intentionally in code to check for abnormal conditions.
      • The vmcore and vmcore-dmesg.txt help to identify bugs. Bugs can exist in any software, but bugs in device drivers or in the kernel can cause outages.
      • An example of a kernel bug: a divide by zero in the find_busiest_group() function causing a kernel panic in RHEL6 kernels.
      • An example of an external software bug leading to a system panic condition: a deadlock in "vmtoolsd" causing a system hang.
  • 22. Preparing for Future
      • Configure kdump on all systems. It has no side effects.
      • Configure audit rules based on business requirements.
      • Configure the cluster settings properly and test them.
      • Tune the system as per the guidelines of the application vendor.
      • Be ready with a backup plan.
      • Patch regularly.
  • 25. Q&A Feel free to contact me via chat or email.