Alter User: 10.2.0.3 - Mounting a diskgroup fails with error ORA-600 [kfcema02]

Just yesterday I became familiar with a bug (6163771) that can rear its ugly head whenever you perform a SHUTDOWN ABORT. It is reported as fixed in 11.1.0.7 and has the following description:

During instance recovery, mounting a diskgroup can fail with ORA-600[KFCEMA02].

There is a mismatch between the FCN recorded in the block and the FCN recorded in the ACD: block FCN < ACD FCN.

The top functions in the call stack are:

kfgInitCache -> kfcMount ->kfrcrv -> kfrPass2 -> kfcema

The trace file contains the FCN for the current block being recovered.

e.g.:

kfbh_kfcbh.fcn_kfbh = 0.5538283

BH: (0x3807959c0) bnum=13 type=FILEDIR state=rcv chgSt=not modifying

flags=0x00000000 pinmode=excl lockmode=null bf=0x38040c000

kfbh_kfcbh.fcn_kfbh = 0.5538283 lowAba=0.0 highAba=0.0

last kfcbInitSlot return code=null cpkt lnk is null

The ACD fcn is the second argument on the ORA-600 [KFCEMA02]
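
To make the comparison concrete, here is a minimal shell sketch that pulls the block FCN out of trace text and compares it with an ACD FCN you supply by hand. The function names are my own, and the ordering rule is an assumption: I am treating an FCN as "wrap.base" and comparing the wrap first, then the base.

```shell
# Print the first block FCN found in ASM trace text read from stdin.
extract_block_fcn() {
    sed -n 's/.*kfbh_kfcbh\.fcn_kfbh = \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) when FCN $1 is strictly lower than FCN $2.
# Assumption: an FCN is "wrap.base" and is ordered by wrap first, then base.
fcn_less_than() {
    a_wrap=${1%%.*}; a_base=${1#*.}
    b_wrap=${2%%.*}; b_base=${2#*.}
    if [ "$a_wrap" -ne "$b_wrap" ]; then
        [ "$a_wrap" -lt "$b_wrap" ]
    else
        [ "$a_base" -lt "$b_base" ]
    fi
}
```

For example, feeding the trace file to extract_block_fcn and then calling fcn_less_than with that value and the ACD FCN taken from the ORA-600 arguments would tell you whether you are looking at the "block FCN < ACD FCN" case described above.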

This patch does not fix a diskgroup with the error already introduced.

It will prevent future occurrences.

Hdr: 6163771 10.2.0.3 RDBMS 10.2.0.3 ASM PRODID-5 PORTID-23

Abstract: CANNOT MOUNT DISKGROUP DUE TO ORA-600 [KFCEMA02]

PROBLEM:

--------

The customer had a maintenance window (for something else) this morning on

this development RAC. We could not shut down cleanly. Then after the

maintenance window, the FRA diskgroup would not mount.


WORKAROUND:

-----------

N/A

REPRODUCIBILITY:

----------------

At will

STACK TRACE:

------------

ksedmp kgerinv kgeasnmierr kfcema kfrPass2 kfrcrv

kfcMount kfgInitCache kfgFinalizeMount 3088 kfgscFinalize kfgForEachKfgsc

kfgsoFinalize kfgFinalize kfxdrvMount kfxdrvEntry opiexe opiosq0

kpooprx kpoal8 opiodr ttcpip opitsk opiino

opiodr opidrv sou2o opimai_real...

SUPPORTING INFORMATION:

-----------------------

Alert log and trace file uploaded

PROGRAMMING DETAILS:

-----------------------

Development has found a bug in the way checkpoints are maintained and this

bug is the probable cause of the kfcema02 assert these customers are seeing.

We have a high degree of confidence that the bug we found is the cause of

the customer issues because of what we saw in the AMDU dumps.

The problem is that buffers on the ping queue are not sorted in any

particular order. The fix is for kfrbCkpt to scan the entire ping queue to

find the oldest buffer when computing the new checkpoint. kfcbDriver is

also updated to scan the entire ping queue when computing the targetAba for

kfcbCkpt, but that code change is not critical because the only effect of

having the targetAba be higher than it should be was that DBWR would write

more dirty buffers than it really needed to.
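
As I read the note above, the fix boils down to "do not trust the head of an unsorted queue, scan all of it". The toy shell sketch below shows that difference only; it is not Oracle's code, the function names are made up, and the integers simply stand in for buffer positions where a lower number means an older buffer.

```shell
# Buggy assumption: the first buffer on the queue is the oldest one.
checkpoint_from_head() {
    printf '%s\n' "$1"
}

# Fixed approach: scan every entry and keep the lowest position seen.
checkpoint_from_full_scan() {
    oldest=$1
    for pos in "$@"; do
        if [ "$pos" -lt "$oldest" ]; then
            oldest=$pos
        fi
    done
    printf '%s\n' "$oldest"
}
```

With an unsorted queue such as 120 95 140, checkpoint_from_head prints 120 while checkpoint_from_full_scan prints 95, so the head-based version would move the checkpoint past a buffer that is still dirty.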

After reading this blog post from a while ago on ORACLE-L, I was not encouraged, to say the least.

Reaching out to Oracle Support solved the problem. They had us use an 11g tool (which also runs on 10g) called facp, together with AMDU. AMDU was released with 11g and is used to locate the ASM metadata across the disks. Like many other tools released with 11g, it can be used in 10g environments. Note 553639.1 is the placeholder for the different platforms, and it also includes the configuration instructions. For this fix AMDU only needs to be configured (not run), since facp calls AMDU itself.

Steps taken to resolve:

Transfer amdu and facp to a working directory and add that directory to LD_LIBRARY_PATH, PATH and any other relevant variables.

Download the facp script from the SR attachment.
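
For the environment part, something along these lines should do. It is only a sketch: add_tool_dir is a helper name I made up, and the directory you pass in is whatever working directory you copied the tools to.

```shell
# Prepend a directory to PATH and LD_LIBRARY_PATH for the current shell.
add_tool_dir() {
    PATH="$1:$PATH"
    # Only add the ":" separator when LD_LIBRARY_PATH already has a value,
    # so no empty entry (which means "current directory") sneaks in.
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

You would call it once in the session you run facp from, for example add_tool_dir /tmp/facp_work (a hypothetical path), and then confirm with "command -v amdu" that the binary is found.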

Then scan the ACD and generate the pertinent files:

$./facp '/dev/oracleasm/disks*' 'DG6' ALL

This generates files named like facp* in the same directory.

Then try to adjust all checkpoints by 10 blocks:

$./facp_adjust -10

After adjusting the checkpoints, run facp_check to verify that they are valid:

$./facp_check

If you adjusted too much, facp_check will not print "Valid Checkpoint". Try adjusting less, and repeat until you get "Valid Checkpoint" for both threads.
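
If you would rather not eyeball the output on every attempt, a small filter like the one below can count the "Valid Checkpoint" lines for you. This is a sketch under a loud assumption: I am assuming facp_check prints one line containing "Valid Checkpoint" per thread, and I have not verified its exact output format. The function name is my own. Read the real output yourself before patching anything.

```shell
# Succeed only when stdin has exactly $1 lines containing "Valid Checkpoint".
# Assumption: facp_check prints one such line per thread.
all_threads_valid() {
    # grep -c exits non-zero on zero matches but still prints 0.
    valid=$(grep -c 'Valid Checkpoint' || true)
    [ "$valid" -eq "$1" ]
}
```

With two threads, the call would look like "./facp_check | all_threads_valid 2", and a zero exit status means both threads reported a valid checkpoint.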

Once facp_check reports "Valid Checkpoint" for all threads, you can proceed with the real patching, which means updating the ACD records.

Write the ASM metadata with the new data:

$./facp_patch

Then try to mount this diskgroup manually:

SQL> alter diskgroup DG6 mount; --------->> ASM sqlplus

SQL> select name,state from v$asm_diskgroup; --------->> ASM sqlplus
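
If you have several diskgroups, a quick filter over the name/state pairs saves some squinting. The sketch below assumes you feed it plain "NAME STATE" lines (for instance from sqlplus with headings and feedback switched off); the function name is my own invention.

```shell
# Read "NAME STATE" pairs on stdin; print the names that are not MOUNTED.
# Exit status is 0 only when every listed diskgroup is MOUNTED.
not_mounted_diskgroups() {
    awk 'NF == 2 && $2 != "MOUNTED" { print $1; bad = 1 } END { exit bad }'
}
```

An empty result with a zero exit status means every diskgroup in the input is MOUNTED.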

Everything showed MOUNTED, and we were able to bring up our production DB.

If you experience this issue, log an SR with Oracle Support to get these tools if they are not already on your system.

Alter User

Thursday, September 27, 2012
