In this session, I will cover the under-the-hood features that power Oracle Real Application Clusters (Oracle RAC) 19c, specifically around Cache Fusion and service management. Improvements in Oracle RAC help it integrate with features such as Multitenant and Data Guard; in fact, these features benefit immensely when used with Oracle RAC. Finally, we will talk about changes to the broader Oracle RAC Family of Products stack, the algorithmic changes that quickly detect sick or dead nodes and instances, and the reconfiguration improvements that ensure Oracle RAC databases continue to function without disruption.
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour Oct 2019
1. Sandesh Rao
VP AIOps , Autonomous Database
Wilson Chan
Architect – Autonomous Database and Real Application Clusters
Oracle Real Application Clusters
19c: Best Practices and Internals
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
2. Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not
a commitment to deliver any material, code, or functionality, and should not be
relied upon in making purchasing decisions. The development, release,
timing, and pricing of any features or functionality described for Oracle’s products
may change and remains at the sole discretion of Oracle Corporation.
4. Grid Infrastructure - Overview
[Diagram: a three-host cluster (Host 1, Host 2, Host 3), each host running a DB instance and an ASM instance, all accessing shared ASM Disk Groups (Disk Group A and Disk Group B)]
5. Grid Infrastructure - Overview
Combination of:
Oracle Cluster Ready Services (CRS)
Oracle Automatic Storage Management (ASM)
CRS & ASM in the Grid Home
Must be installed in a different location from the RDBMS home
Installer locks the Grid Home path by setting root permissions
CRS can run by itself or in combination with other vendor clusterware
CRS can also be standalone for ASM and/or Oracle Restart
6. Grid Infrastructure - Requirements
Shared Oracle Cluster Registry (OCR) and Voting files
Must be in ASM or a cluster file system (CFS)
OCR is backed up automatically every 4 hours to GIHOME/cdata
Backups are kept for 4, 8, and 12 hours, 1 day, and 1 week
Restored with ocrconfig
Voting file is backed up into the OCR at each change
Voting file restored with crsctl
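The backup and restore flow above can be sketched with the standard clusterware tools (a sketch; run as root from the Grid home, and the backup path and disk group name are illustrative):

```shell
# List the automatic OCR backups (taken every 4 hours into GIHOME/cdata)
ocrconfig -showbackup

# Restore the OCR from a chosen backup (the stack must be down on all nodes)
ocrconfig -restore /u01/app/19.0/grid/cdata/mycluster/backup00.ocr

# Voting files: list the current ones; they are re-created with crsctl
crsctl query css votedisk
crsctl replace votedisk +DATA
```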
7. Grid Infrastructure – CRS Network
Requirements
One or more redundant private networks for inter-node communications
High speed with low latency
Separate physical network or managed converged network
VLANs are supported
Usage
The interconnect is a memory backplane for the cluster
Clusterware messaging
RDBMS messaging and block transfer
ASM messaging
HANFS for block traffic
8. Grid Infrastructure – How it works
The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
On Unix, ohasd runs out of inittab with respawn
A node can be evicted when deemed unhealthy
May require a reboot
IPMI integration, or diskmon in the case of Exadata
CRS provides Cluster Time Synchronization services
Always runs, but in observer mode if ntpd is configured
12. CSSD
Dual-channel node monitoring every second
Failures cause node eviction (fencing)
Network heartbeat (private interconnect ping)
Nodes must respond within the CSS misscount (30s default)
Disk heartbeat (voting disk ping)
Disk must respond within the disktimeout
[Diagram: each node's CSSD (1) pings the other nodes over the private network (network heartbeat) and (2) pings the voting disk (disk heartbeat)]
Example trace output:
[CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)
[CSSD][1111902528]clssnmPollingThread: node mynodename(5) at 75% heartbeat fatal, removal in 6.770 seconds
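The heartbeat thresholds referenced above can be inspected with crsctl (a sketch; run from the Grid home as a privileged user):

```shell
# Network heartbeat threshold: a node missing heartbeats for this
# many seconds is evicted
crsctl get css misscount

# Disk heartbeat threshold for the voting files
crsctl get css disktimeout

# Changing misscount is possible but rarely advisable; the defaults
# are tuned per platform (shown only as illustration)
# crsctl set css misscount 30
```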
13. CSSD - Eviction
Eviction (fencing) prevents shared data from being corrupted by independently operating nodes
In order to perform an eviction:
1. CSSD sends a kill request to the node via the remaining channel
2. If CSSD does not succeed in the kill, then OCSSDMONITOR takes over and attempts the kill
[Diagram: after a network heartbeat failure, the kill request is sent via the remaining channel (1); if that fails, Oracle Clusterware attempts the kill (2)]
14. Interconnect - Redundant Interconnect Usage & HAIPs
Redundant interconnect provides a bonding alternative
Private networks only
Enables HA & load balancing for up to 4 NICs per server
Uses HAIPs assigned to the private networks on each server
HAIPs are taken from the link-local IP range (169.254.0.0/16)
If a network interface fails, its HAIP is failed over to a remaining interface
[Diagram: HAIP 1 through HAIP 4 distributed across the private interfaces of Node 1 and Node 2]
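A hedged look at how the private networks and HAIPs show up on a live cluster (the interface name eth1 is illustrative):

```shell
# List the cluster network classifications
# (public vs. cluster_interconnect)
oifcfg getif

# HAIPs appear as 169.254.x.x link-local addresses
# plumbed on the private NICs
ip addr show eth1 | grep 169.254
```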
15. 12.2 Automatic Storage Management (ASM)
ASM Filter Driver – Full Integration
Configure during installation
Rejects non-Oracle I/O
Stops OS utilities from overwriting ASM disks
Protects database files
Reduces OS resource usage
Fewer open file descriptors
Faster node recovery
Further configuration and monitoring is conducted using the AFDTOOL utility:
Provision a disk: afdtool -add /dev/dsk1 disk1
Remove a disk: afdtool -delete disk1
List the managed disks: afdtool -getdevlist
16. Node Weighting - Everything equal, let the majority of work survive
[Diagram: a 2-node cluster (NodeA, NodeB), each node running Oracle GI and Oracle RAC, hosting connections cons_1 and cons_2]
• Node Weighting is a new feature that considers the workload run on a node during fencing
• The idea is to let the majority of work survive, if everything else is equal
– "Majority of work" is, for example, represented by the number of services
• Example: In a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
• DBAs can overrule this and rate a service as "critical" based on business needs
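The "critical" override mentioned above is expressed through the CSS_CRITICAL attribute (a sketch; the database, service, and value placement are illustrative):

```shell
# Mark a whole database as critical so the node running it
# is preferred to survive a split
srvctl modify database -db sales -css_critical YES

# Or mark an individual service as critical
srvctl modify service -db sales -service orders -css_critical YES

# The attribute can also be set at the server level
crsctl set server css_critical yes
```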
17. Pluggable Database & Service Isolation
Prevents instance failures from affecting others
[Diagram: a 2-node cluster (NodeA, NodeB), each node running Oracle GI and Oracle RAC, hosting connections cons_1 and cons_2]
Using Oracle Multitenant, PDBs can be opened:
As singletons (in one database instance only)
In a subset of instances
Or in all instances at once
If certain PDBs are only opened on some instances, Pluggable Database Isolation improves performance by:
Reducing DLM operations for PDBs not open in all instances
Optimizing block operations based on in-memory block separation
It also ensures that failures of instances hosting only singleton PDBs will not impact other instances of the same RAC-based CDB
18. Fleet Patching & Provisioning Support
Database & Grid Infrastructure versions: 11.2.0.3, 11.2.0.4, 12.1, 12.2, 18
Deployment types: Single Instance, Oracle Restart, Oracle RAC One, Oracle RAC
Targets: VMs and bare metal (BM), non-CDB and CDB/PDB, multi-OS
• Generic software provisioning
• Data Guard aware
• Customizable
19. Applications see no errors during outages
Standardize on Transparent Application Continuity (TAC)
Hides errors, timeouts, and maintenance
No application knowledge or changes needed to use it
Rebuilds session state & in-flight transactions
Adapts as applications change: protected for the future
[Diagram: a request flows through TAC, which hides errors and timeouts from the application]
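Enabling TAC on a RAC service comes down to a few srvctl attributes (a sketch; the database and service names are illustrative, and the retry/drain timings should be tuned per workload):

```shell
# FAILOVER_TYPE AUTO selects Transparent Application Continuity;
# COMMIT_OUTCOME TRUE lets clients learn the fate of in-flight commits
srvctl modify service -db sales -service orders \
  -failovertype AUTO -commit_outcome TRUE \
  -failoverretry 30 -failoverdelay 10 \
  -drain_timeout 300 -stopoption IMMEDIATE

# Verify the service configuration
srvctl config service -db sales -service orders
```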
20. Oracle RAC Performance Features
Over two decades of innovation (9i through 19c)
9i–12c:
• Oracle Real Application Clusters
• Cache Fusion
• Automatic Undo Management
• Session Affinity
• Automatic Storage Management
• Cluster Managed Services
• Connection Load Balancing
• Load Balancing Advisory
• Universal Connection Pool (UCP) Support for Oracle RAC
• Support for Distributed Transactions (XA) in Oracle RAC
• Parallel Execution Optimizations for Oracle RAC
• Affinity Locking and Read-Mostly Objects
• Reader Bypass
• Flash Cache
• PDB & Services Isolation
• Service-Oriented Buffer Cache
• Leaf Block Split Optimizations
• Self-Tuning LMS
• Multithreaded Cache Fusion
• ExaFusion Direct-to-Wire Protocol
• Smart Fusion Block Transfer
18c:
• Zero Downtime Patching (Clusterware)
• Fleet Provisioning and Patching
• Automated Transaction Draining
• Support for TLS Ciphers for Clusterware
• Automated PDB Relocation
19c:
• Scalable Sequences
• Undo RDMA-Read
• Commit Cache
• Database Reliability Framework
21. RAC Enhancements
Remastering slaves (1 slave per LMS)
Starting with Oracle RAC 12.1, LMS offloads heavy remastering work to its slave
This improves LMS's responsiveness for Cache Fusion requests during remastering
Support for 100 LMS processes – change in default value
Oracle RAC 12.2 supports up to 100 LMS processes (names: LMS0-LM99), as opposed to 35
On larger systems (lots of CPU, large SGA), more LMS processes start by default
More LMS processes mean better reconfiguration time, without any impact during runtime
More Dynamic Remastering (DRM)
Starting with Oracle RAC 19c, DRM is planned to more adaptively consider the overall system state
23. Reconfiguration Performance as of 18c
Timings with different numbers of LMS processes (100GB cache, 2-node RAC; total reconfiguration time for an instance leave & re-join):

# LMS    | Reconfiguration Time
8 LMS's  | 8.3 sec
16 LMS's | 5.0 sec
32 LMS's | 3.6 sec

Timings with different cache sizes (8 LMS processes, 2-node RAC; total reconfiguration time for an instance leave & re-join):

Buffer Cache Size | Reconfiguration Time
25GB              | 3.0 sec
50GB              | 4.9 sec
100GB             | 8.3 sec
24. Reconfiguration Diagnosability
A detailed timing breakdown is available in the LMON trace file:
**************** BEGIN DLM RCFG HA STATS ****************
Total dlm rcfg time (inc 6): 3.586 secs (394926177, 394929763)
Begin step .........: 0.005 secs (394926177, 394926182)
Freeze step ........: 0.019 secs (394926182, 394926201)
Sync 1 step ........: 0.002 secs (394926264, 394926266)
Sync 2 step ........: 0.024 secs (394926266, 394926290)
Enqueue cleanup step: 0.002 secs (394926290, 394926292)
Sync pcm1 step .....: 0.004 secs (394926293, 394926297)
……
Enqueue dubious step: 0.004 secs (394926432, 394926436)
Sync 5 step ........: 0.000 secs (394926436, 394926436)
Enqueue grant step .: 0.001 secs (394926436, 394926437)
Sync 6 step ........: 0.012 secs (394926437, 394926449)
Fixwrt replay step .: 0.885 secs (394928837, 394929722)
Sync 8 step ........: 0.040 secs (394929722, 394929762)
End step ...........: 0.001 secs (394929762, 394929763)
Number of replayed enqueues sent / received .......: 2246 / 893
Number of replayed fusion locks sent / received ...: 124027 / 0
Number of enqueues mastered before / after rcfg ...: 2058 / 1384
**************** END DLM RCFG HA STATS *****************
25. DRM Diagnosability
A detailed timing breakdown is available in the AWR report:
Dynamic Remastering Statistics DB/Inst: SALES/sales1 Snaps: 393-452
-> Affinity objects - Affinity objects mastered at the begin/end snapshot
-> Read-mostly objects - Read-mostly objects mastered at the begin/end snapshot
per Begin End
Name Total Remaster Op Snap Snap
-------------------------------- ------------ ------------- -------- --------
remaster ops 24 1.00
remastered objects 24 1.00
remaster time (s) 7.4 0.31
freeze time (s) 1.5 0.06
cleanup time (s) 2.4 0.10
replay time (s) 0.3 0.01
fixwrite time (s) 2.4 0.10
sync time (s) 0.1 0.01
affinity objects N/A 3 27
read-mostly objects N/A 0 0
read-mostly objects (persistent) N/A 0 0
27. Combined installation of TFA, ORAchk and EXAchk
TFA and ORAchk/EXAchk are installed via a single platform-specific installation, available from My Oracle Support Doc ID 2550798.1
Updated on a quarterly basis
Included within the Database & Grid Infrastructure install
Included within Release Updates (RUs)
28. Automatic notifications
TFA automatically monitors for significant events, collects and analyzes diagnostics, then notifies you
The most impactful compliance checks from ORAchk/EXAchk are run at 2am every day
A full ORAchk/EXAchk compliance run happens at 3am once a week
ORAchk/EXAchk results older than two weeks are automatically purged
tfactl <orachk|exachk> -autostart
29. ORAchk = ORAchk + cluvfy + Autoupgrade.jar*
• Download the latest orachk and benefit from the latest checks
• No need to individually download autoupgrade.jar or cluvfy
• Single report with results from autoupgrade.jar, orachk and cluvfy checks
• *orachk also includes other components like Application Continuity and Security related checks
ORAchk autoupgrade includes autoupgrade.jar checks and cluvfy pre-upgrade checks
31. Oracle 19c Upgrade requires Linux 7
• Execution of ./gridSetup.sh on old OS releases may fail
• The failure is reported as a perl error message
• perl has a hard dependency on glibc
• A similar message is reported by the DB installer
• Additional details at:
https://www.linkedin.com/pulse/high-level-steps-upgrade-oracle-19c-rac-anil-nair/
33. State of the GIMR
• The Grid Infrastructure Management Repository (GIMR), aka mgmtDB, is no longer mandatory starting with Oracle 19c
• Limited AHF functionality by utilizing the filesystem without the GIMR
• No support for the CHA GUI (chactl)
• Trace File Analyzer (TFA) will provide a limited graphical view
34. Patch faster with -SwitchHome
• Apply the patch to a new grid home while the stack continues to run from the current home
• Reduces downtime, as the stack is up and running during the copy process
• Reduces errors caused by common issues such as "out of space"
• Easy fallback in case of issues
/u01/app/19.0/grid → /u01/app/19.3/grid
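A hedged sketch of the out-of-place flow described above, using the paths from the slide. The -applyRU and -switchGridHome flags are the 19c gridSetup.sh mechanism for installing and then activating a patched home; exact steps vary by release update, so check the patch README:

```shell
# 1. Install the patched Grid home out of place (software-only),
#    while the stack keeps running from /u01/app/19.0/grid
/u01/app/19.3/grid/gridSetup.sh -silent -applyRU /stage/ru_patch

# 2. Switch the cluster to the new home, rolling across nodes
/u01/app/19.3/grid/gridSetup.sh -switchGridHome

# Fallback: re-run -switchGridHome from the old home if issues arise
```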
35. Resource Modeling Today: Services trigger PDB open
• Utilizes service(s) to drive workload placement
• Services implicitly open the PDB on their instance(s)
• Order of PDB open is based on the service definition
• Defined using the Preferred and Available attributes
• Default modeling after upgrades
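The service-driven placement above is configured with srvctl (a sketch; the database, PDB, service, and instance names are illustrative):

```shell
# A service tied to a PDB; starting the service on an instance
# implicitly opens the PDB there
srvctl add service -db sales -service orders -pdb pdb1 \
  -preferred sales1 -available sales2

# Starting the service opens pdb1 on the preferred instance
srvctl start service -db sales -service orders
```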
36. New Resource Modeling Scheme: PDB open triggers Service startup
• Define PDB Cardinality and Rank
• Higher-ranked PDBs are started before lower-ranked PDBs
• Cardinality defines the number of nodes where a PDB is started
• Considers:
• host runtime CPU load before PDB open
• availability of nodes before PDB open
• that all nodes may not be equal in terms of number of CPUs or speed
37. Clusterware runtime Diagnostics
• Oracle 19c Clusterware processes maintain histograms and statistics, such as trace file rotation frequency and the time taken for rotation
• Preserves critical information on very busy systems
• Severity tagging provides a human-readable criticality of messages:
2019-08-20 08:36:13.142 : CSSD:1871161088: [ ERROR] clssgmclienteventhndlr: (SENDCOMPLETE) No proc found for ClientID
2019-08-20 08:36:13.188 : CSSD:1871161088: [ INFO] clssgmDeadProc: Removing clientID 2:43454:0 (0x7fda802df820), with GIPC
• A new diagnostics monitor thread ensures in-memory logs (UTS) are periodically written, so diagnostics are available in case of a process crash
38. Zero Downtime for Planned outages
During planned shutdown:
• Distribute resource masters before instance shutdown
• Distribution before shutdown does not require any recovery on the surviving instances
• Effectively reduces the time spent on reconfiguration during planned outages to zero
During runtime:
• Rolling windows to reduce the impact of reconfiguration
• Smart DRM
[Diagram: the Global Resource Directory, with buffers (B) and resource masters (M) distributed across instances]
39. Oracle RAC Exadata optimizations
• Subnet Manager for fast node death detection
• Network (Subnet Manager)
• Disk (Diskmon)
• Utilize low-latency RDMA
• Read/write to remote memory without CPU
• Exafusion
• Smart Fusion Block Transfer
• More details available at https://www.slideshare.net/AnilNair27/oracle-rac-features-on-exadata
[Chart: fast node death detection takes 0.8 seconds on Exadata vs. about 30 seconds on generic systems]
40. Cache Fusion Optimizations on Exadata
Utilize RDMA for:
• Propagating BoC (Broadcast on Commit)
• Reduces messages (2 instead of 5)
• Reduces CPU usage on LMS
• Current read blocks
• A very common access pattern in various workloads
• Commit Cache messaging
• Reduces load on LMS from the remote node
• A direct read rather than sending an 8K block
• https://www.slideshare.net/AnilNair27/oracle-rac-features-on-exadata (Slide #43)
• Undo blocks
• https://www.slideshare.net/AnilNair27/oracle-rac-features-on-exadata (Slide #44)
41. Oracle RAC 19c - Hang Manager
Manages hung database processes
Detects & resolves cross-layer hangs
e.g. hangs caused by a blocked ASM resource
Resolves deadlocks
User-defined control via PL/SQL
Early warning exposed via a V$ view
[Diagram: sessions in the database member cluster use the ASM I/O service; Hang Manager detects, analyzes, and evaluates whether a session is hung, then performs hang resolution]
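The PL/SQL control mentioned above goes through the DBMS_HANG_MANAGER package (a sketch; sensitivity is one of several tunable parameters, and the local sysdba connection is illustrative):

```shell
sqlplus -s / as sysdba <<'EOF'
-- Raise Hang Manager's sensitivity so hangs are flagged sooner
EXEC DBMS_HANG_MANAGER.set(DBMS_HANG_MANAGER.sensitivity, DBMS_HANG_MANAGER.sensitivity_high);
EOF
```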
42. Introducing Database Reliability Framework
Monitors for problems before service disruption
e.g. heartbeats for critical processes
Detects the cause of the problem
Uses data collected across all nodes to identify the root cause
e.g. waits on the GRD (Global Resource Directory)
Resolves the problem with minimal disruption
e.g. resizing internal structures
43. Database Reliability Framework in Action
Monitor → Detect → Review → Resolve
Monitor/Detect: an increase in the number of resources in the Global Resource Directory (GRD), resulting in higher wait times for the GRD
Review: several solutions are possible
Is the wait time due to high CPU load?
Would an increase in the number of LMS processes help?
Would increasing the CR slaves help?
Would increasing internal thresholds help?
44. Examples and DRF Views
Example conditions DRF watches for:
Busy foreground process(es) using CPU
Potential upcoming memory starvation
LGWR constrained by CPU
Too many RT processes
Insufficient CR slaves
DLM resource cache incorrectly sized
Control file I/O (CFIO) stall

V$ view       | Description
v$gcr_metrics | All defined metrics
v$gcr_actions | All defined actions
v$gcr_log     | Metric/action history summary log
v$gcr_status  | Details on the latest metric/action status
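The views above can be inspected from any instance (a sketch; the view names come from the slide, while the exact column layout is not shown there, so SELECT * is used):

```shell
sqlplus -s / as sysdba <<'EOF'
-- What metrics and actions DRF knows about
SELECT * FROM v$gcr_metrics;
SELECT * FROM v$gcr_actions;
-- Recent metric/action history and the latest status
SELECT * FROM v$gcr_log;
SELECT * FROM v$gcr_status;
EOF
```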
45. Cache Fusion Optimizations
Increase in the maximum number of LMS processes
Based on system utilization (via DRF)
Each LMS spawns a dedicated CR slave
Threshold of rollback changes
Threaded CR slave in 18c
Optimized for multi-core/multi-thread architectures
Remastering slaves (RMV0..)
Offload heavy remastering work to slaves
46. Commit Cache
Reduces Cache Fusion traffic for remote undo header lookups
These often become a bottleneck with DML-heavy OLTP/mixed workloads
Remote undo header lookups are needed to:
Check if a transaction has committed
Perform delayed block cleanout
[Chart: block transfers (thousands) by type — data blocks, undo headers, undo blocks, others — split into CR (immediate/busy) and current (immediate/busy)]
47. Smart Fusion Block Transfer
Normally, transferring a hot block involves: (1) issue the log write, (2) wait for log write completion, (3) transfer the block
On Exadata, Oracle does not wait for the log write notification; the wait for log I/O during the transfer of hot blocks is eliminated
Exadata ensures the log write completes before changes to the block on another instance commit, guaranteeing durability
The storage software ensures the correct ordering of writes
Up to 40% throughput and 33% response time improvement in some heavily contended OLTP workloads
48. Continuous Feature Improvements
• Lock Domain per PDB
• Utilize Bloom filters to further reduce reconfiguration times
• Utilize the Database Reliability Framework
49. Patching Improvements
OJVM is Oracle RAC rolling-patch enabled with Oracle RAC 18c (18.4)
Non-Java services are available at all times
Java services are available all the time, except for a ~10 second brownout
No errors are reported during the brownout
Zero-Downtime Oracle Grid Infrastructure Patching (*18.3)
Patch Oracle Grid Infrastructure without interrupting database operations
Patches are applied out-of-place and in a rolling fashion, with one node being patched at a time while the database instance(s) on that node remain up and running
Supported for Oracle RAC and RAC One Node clusters with two or more nodes