Troubleshooting Databases Hang Due to Heavy Contention for ‘library cache: mutex X’ Waits (Oracle 11.2 and Later) (Doc ID 2051456.1)

APPLIES TO:
Oracle Database – Enterprise Edition – Version 11.2.0.3 and later
Information in this document applies to any platform.

PURPOSE
Provides basic information to help DBAs identify library cache mutex related problems, and raises awareness of known issues and the actions to be taken.

TROUBLESHOOTING STEPS
Identifying KGL mutex (library cache: mutex X) issues related to mutex recovery problems in 11.2.0.4

Of all the issues related to the KGL mutex (library cache: mutex X) in 11.2.0.4, a significant proportion relate to mutex recovery problems in highly concurrent environments.
Problem Synopsis
Normally, whenever the KGL mutex is accessed, the mutex data structure is updated so that the session accessing the mutex becomes the holder and the reference count is incremented. This is an atomic operation and as such is very fast, short and small. Once the session is done with the atomic operation, the holder information is cleared so that other sessions can repeat the same operation.
Problems arise if, for some unforeseen reason, the process or session that holds the KGL mutex is killed before it clears the mutex holder and PMON then skips the recovery of that mutex. There are also other cases of Bad or Broken Mutex Recovery in which recovery can be skipped. In all of these cases the result is heavy contention on the KGL mutex: any process requesting the mutex with the invalid holder information hangs, because the holder is never freed, and waits for ‘library cache: mutex X’ are seen. When this happens, the KGL mutex is assigned to the pseudo-session ID used for session-less processes. This can be used to identify that Bad or Broken Mutex Recovery has occurred.
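The holder/reference-count life cycle described above, and what a skipped recovery does to later requesters, can be illustrated with a deliberately simplified Python toy model. Everything in it (the `ToyMutex` class, `holder_killed`, the `recovery_runs` flag) is hypothetical and only mirrors the prose; it is not Oracle's mutex implementation, and in the real database a blocked requester waits on ‘library cache: mutex X’ rather than getting `False` back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMutex:
    """Hypothetical stand-in for a KGL mutex: a holder plus a reference count.

    This is NOT Oracle's real mutex layout; it only mirrors the note's description.
    """
    holder: Optional[int] = None  # SID of the current holder, None when free
    ref_count: int = 0            # incremented on every successful access

    def try_acquire(self, sid: int) -> bool:
        # A requester gets in only if no holder is recorded.
        if self.holder is not None:
            return False  # in the database this is where 'library cache: mutex X' waits show up
        self.holder = sid
        self.ref_count += 1
        return True

    def release(self) -> None:
        # Normal completion: clear the holder so other sessions can proceed.
        self.holder = None


def holder_killed(mutex: ToyMutex, recovery_runs: bool) -> None:
    """The holder dies before releasing; recovery may or may not clear it."""
    if recovery_runs:
        mutex.release()  # successful recovery clears the stale holder
    # Otherwise the stale holder stays: the "Bad or Broken Mutex Recovery" case.


m = ToyMutex()
assert m.try_acquire(101)
m.release()
assert m.try_acquire(202)            # normal reuse works
holder_killed(m, recovery_runs=False)
print(m.try_acquire(303))            # False: every later requester is blocked
```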
How to Identify the pseudo-SID
In V$ACTIVE_SESSION_HISTORY or DBA_HIST_ACTIVE_SESS_HISTORY output, you may find a large number of sessions waiting on the wait event ‘library cache: mutex X’. The value of the mutex is recorded in the P2 column, which is of type NUMBER. This number can be converted into hex as follows:
Column P2 value = 2.8146E+14
=> 281460000000000
=> 0xFFFC83518800
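A minimal Python sketch of the conversion, assuming the full integer value of P2 has already been fetched; the helper name `p2_to_hex` is hypothetical. The `2.8146E+14` form is only how the client displayed the number. In this example the rounded value happens to equal the real one, but in general the scientific-notation display drops low-order digits, which can corrupt the decoded values, so it is safer to decode the full-precision value.

```python
def p2_to_hex(p2: int, pad: bool = False) -> str:
    """Render an ASH P2 value as upper-case hex.

    pad=True left-pads to 16 hex digits (64 bits), matching the
    "leading zeroes added for clarity" form used in this note.
    """
    if p2 < 0:
        raise ValueError("P2 is expected to be a non-negative integer")
    return f"0x{p2:016X}" if pad else f"0x{p2:X}"


p2 = 281460000000000               # full-precision value of P2, not 2.8146E+14
print(p2_to_hex(p2))               # 0xFFFC83518800
print(p2_to_hex(p2, pad=True))     # 0x0000FFFC83518800
```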
Once in hex format, the value 0xFFFC83518800 is made up of two parts (leading zeros added for clarity):
upper 32 bits (first 8 hex digits) => 0x0000FFFC, which is the session ID (SID) of the mutex owner. It can be identified in trace files once it has been converted back to decimal:
SID = 0xFFFC => 65532
A SID of 65532 is the pseudo-session ID assigned to session-less processes that acquire a KGL mutex.
You can find this in the KGX AOL (Atomic Operation Log) section of systemstate dump trace files. For example:

KGX Atomic Operation Log 7000107e5380090
Mutex 700010814ddd4f8(65532, 0) idn b518e53f oper EXCL(6)
Library Cache uid 65532 efd 11 whr 91 slp 0
oper=0 pt1=700010814ddd3b8 pt2=7000107d92f1e30 pt3=0
pt4=0 pt5=0 ub4=58687

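This section shows that the mutex has been acquired by the pseudo-SID (“uid 65532”). It is this acquisition by the pseudo-SID that indicates that Bad or Broken Mutex Recovery has occurred.
The second part of the P2 value, the lower 32 bits (last 8 hex digits) 0x83518800, is the mutex reference count, which is used for concurrency.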
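The two-part split of P2 can be expressed with a shift and a mask. This sketch assumes the 64-bit layout described in this note (upper 32-bit half holding the holder SID, lower 32-bit half holding the reference count); `decode_p2`, `MutexValue` and `PSEUDO_SID` are hypothetical names, and 65532 is the pseudo-SID value given in this note.

```python
from typing import NamedTuple

PSEUDO_SID = 65532  # 0xFFFC, pseudo-session ID of session-less processes (value from this note)


class MutexValue(NamedTuple):
    holder_sid: int  # upper 32 bits
    ref_count: int   # lower 32 bits


def decode_p2(p2: int) -> MutexValue:
    """Split a 'library cache: mutex X' P2 value into holder SID and reference count."""
    return MutexValue(holder_sid=p2 >> 32, ref_count=p2 & 0xFFFFFFFF)


v = decode_p2(281460000000000)
print(v.holder_sid, hex(v.holder_sid))   # 65532 0xfffc
print(hex(v.ref_count))                  # 0x83518800
if v.holder_sid == PSEUDO_SID:
    print("holder is the pseudo-SID: suspect Bad or Broken Mutex Recovery")
```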
For more details of this expansion see:
Document 1298015.1 WAITEVENT: “cursor: pin S wait on X” Reference Note
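To check a systemstate dump for the same symptom, the KGX AOL lines can be scanned for mutexes whose holder is the pseudo-SID. This is a rough sketch: the regular expression is inferred from the single AOL excerpt in this note, it assumes the first number in parentheses after the mutex address is the holder SID (in the excerpt it matches the “uid 65532” on the following line), and the AOL format may differ between versions and platforms. The names `pseudo_sid_holders` and `MutexHolder` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

PSEUDO_SID = 65532  # pseudo-session ID for session-less processes, per this note

# Shape inferred from the excerpt: "Mutex 700010814ddd4f8(65532, 0) idn b518e53f oper EXCL(6)"
MUTEX_LINE = re.compile(
    r"Mutex\s+(?P<addr>[0-9a-fA-F]+)\((?P<sid>\d+),\s*\d+\)"
    r"\s+idn\s+(?P<idn>[0-9a-fA-F]+)\s+oper\s+(?P<oper>\S+)"
)


class MutexHolder(NamedTuple):
    addr: str
    sid: int
    idn: str
    oper: str


def pseudo_sid_holders(lines: Iterable[str]) -> Iterator[MutexHolder]:
    """Yield AOL mutex entries whose (assumed) holder SID is the pseudo-SID."""
    for line in lines:
        m = MUTEX_LINE.search(line)
        if m and int(m["sid"]) == PSEUDO_SID:
            yield MutexHolder(m["addr"], int(m["sid"]), m["idn"], m["oper"])


sample = """KGX Atomic Operation Log 7000107e5380090
 Mutex 700010814ddd4f8(65532, 0) idn b518e53f oper EXCL(6)
 Library Cache uid 65532 efd 11 whr 91 slp 0""".splitlines()

for holder in pseudo_sid_holders(sample):
    print(holder)
```

For a real trace file, an open file object can be passed instead of `sample`, for example inside `with open(trace_path) as f:`, where `trace_path` is a placeholder for the dump's location.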
Workarounds
Restart the database
or
Kill the process associated with the pseudo-sid marked as the mutex holder.
Note: If the process has already died, then as a last resort “kill -9” can be attempted at the operating-system level when the process cannot be killed from within Oracle.
Known Issues
The following are known issues and regressions in the KGL layer that cause the problem. These should be regarded as mandatory fixes to avoid KGL mutex issues on 11.2.0.4 (or 12.1) due to Bad or Broken Mutex Recovery:
Unpublished BUG 13542050 – USE OF KGL MUTEXES MIGHT BLOCK ON BOGUS MUTEX HOLDER – not fixed in 12.1.0.2
Unpublished BUG 14773743 – SIGSEGV IN KGLMUTEXCLEANUPALL, GENERIC SGA REFERENCE DURING SHUTDOWN – fixed in 12.1.0.1
BUG 18513891 – CORE CONCURRENT PROCESSING TARGET IS PENDING CAUSING OTHERS TO BE PENDING TOO – fixed in 12.1.0.2
Unpublished BUG 16065166 – PERFORMANCE REGRESSION IN KGL FUNCTIONS – fixed in 12.1.0.2
Unpublished BUG 21260397 – ASSERT BEFORE SETTING PSEUDO-SID AS AOL HOLDER – not fixed in 12.1.0.2
The following is a known issue in the KKS layer that causes heavy ‘cursor: mutex X’ waits:
Bug 26582460 – INSTANCE HANGS WITH SMON WAITING ON CURSOR: MUTEX X – not fixed in 12.1.0.2, 12.2.0.1
The fix for Bug 21260397 by and large introduces sanity checks rather than a code fix.
Hence the recommendation is to apply all the fixes in the above list except 21260397, because of its complexity.
Patches
If merge patches are required, please contact Oracle Support to obtain them. Currently, Patch 21287813 is available only for 11.2.0.4.4 on AIX 64-bit.
Should the KGL mutex problems occur in 12.1.0.2, the fix for Bug 13542050 has to be applied, as it is fixed only in 12.2.
Note that the fix for Bug 13542050 has been superseded by the fix for unpublished Bug 24739928. Please apply Patch 24739928 instead.