Thursday, September 27, 2012

10.2.0.3 - Mounting a diskgroup fails with error ORA-600 [kfcema02]

Just yesterday I became familiar with a Bug (6163771) that can rear its ugly head when ever you perform a SHUTDOWN ABORT. This was reported fixed in 11.1.0.7 and has the following description details:

During instance recovery, mounting a diskgroup can fail with ORA-600[KFCEMA02].
There is a mismatch between the FCN recorded in the block and the FCN recorded
in the ACD.  block FCN < ACD fcn.

The top functions in the call stack are:

    kfgInitCache -> kfcMount ->kfrcrv -> kfrPass2 -> kfcema 

The trace file contains the FCN for the current block been recovered.  
eg:

 kfbh_kfcbh.fcn_kfbh = 0.5538283

 BH: (0x3807959c0)  bnum=13 type=FILEDIR state=rcv chgSt=not modifying
      flags=0x00000000  pinmode=excl  lockmode=null  bf=0x38040c000
      kfbh_kfcbh.fcn_kfbh = 0.5538283  lowAba=0.0  highAba=0.0
      last kfcbInitSlot return code=null cpkt lnk is null

The ACD fcn is the second argument on the ORA-600 [KFCEMA02]

This patch does not fix a diskgroup with the error already introduced.  
It will prevent future occurrences.

Hdr: 6163771 10.2.0.3 RDBMS 10.2.0.3 ASM PRODID-5 PORTID-23
Abstract: CANNOT MOUNT DISKGROUP DUE TO ORA-600 [KFCEMA02]

PROBLEM:
--------
The cusotmer had a maintenance window (for something else) this morning on 
this development RAC. We could not shutdown cleanly. Then after the 
maintenance window, FRA diskgroup would not mounted.

Hdr: 6163771 10.2.0.3 RDBMS 10.2.0.3 ASM PRODID-5 PORTID-23
Abstract: CANNOT MOUNT DISKGROUP DUE TO ORA-600 [KFCEMA02]

WORKAROUND:
-----------
N/A

REPRODUCIBILITY:
----------------
At will


STACK TRACE:
------------
ksedmp kgerinv kgeasnmierr kfcema kfrPass2 kfrcrv
kfcMount kfgInitCache kfgFinalizeMount 3088 kfgscFinalize kfgForEachKfgsc
kfgsoFinalize kfgFinalize kfxdrvMount kfxdrvEntry opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino
opiodr opidrv sou2o opimai_real...

SUPPORTING INFORMATION:
-----------------------
Alert log and trace file uploaded

PROGRAMMING DETAILS:
-----------------------
Development has found a bug in the way checkpoints are maintained and this 
bug is the probable cause of the kfcema02 assert these customers are seeing.  
 We have a high degree of confidence that the bug we found is the cause of 
the customer issues because of what we saw in the AMDU dumps.

The problem is that buffers on the ping queue are not sorted in any 
particular order.  The fix is for kfrbCkpt to scan the entire ping queue to 
find the oldest buffer when computing the new checkpoint.   kfcbDriver is 
also updated to scan the entire ping queue when computing the targetAba for 
kfcbCkpt, but that code change is not critical because the only effect of 
having the targetAba be higher than it should be was that DBWR would write 
more dirty buffer than it really needed to.

After reading this BLOG from awhile ago on ORACLE-L - I was not encouraged to say the least.

Reaching out to Oracle Support helped solved the problem with employing a 11g Tool (can also run on 10g) called: facp (and AMDU). AMDU was released with 11g, and is a tool used to get the location of the ASM metadata across the disks. As many other tools released with 11g, it can be used on 10g environments.  Note 553639.1  is the placeholder for the different platforms. The note include also instructions for the configuration. It only needs to be configured (not run) for this fix since facp calls the AMDU.

Steps taken to resolve:

Transfer amdu and facp to a working directory and include it on LD_LIBRARY_PATH, PATH and other relevant variables.

Download the script facp from SR attachment.

Then, ACD Scanning and generation of pertinent files,

$./facp '/dev/oracleasm/disks*' 'DG6' ALL

And then it will generate files named like facp* in same directory.

Then try to adjust all checkpoints  by 10 blocks:

./facp_adjust  -10

Used after adjusting the checkpoints to verify they are valid.

$./facp_check

If you adjusted too much facp_check will not print "Valid Checkpoint". Try adjusting less. 
Till get "Valid Checkpoint" for both thread.

Once facp_check reports "Valid Checkpoint" for all threads, it's the indication
to proceed with the real patching, which means, updating the ACD records

Write ASM metadata with the new data:

$./facp_patch

Then try to mount this diskgroup manually:

SQL> alter diskgroup DG6 mount; --------->> ASM sqlplus

SQL> select name,state from v$asm_diskgroup; --------->> ASM sqlplus


Everything showed MOUNTED and was able to bring up our Production DB.

If you experience this issue - log a SR with Oracle Support for these tools if not already on your system.




Monday, September 24, 2012

Missing Password Files

I don't usually log in as SYSDBA for normal activities. I prefer to login as the main EBS application user most of the time. However, when trying to login from SQLPLUSW.exe on my local machine, once I specified "as sysdba" - I would receive this:

ERROR:
ORA-01031: insufficient privileges

At this point I would always ssh into a terminal session directly to the server and logon with:

./sqlplus / as sysdba

Later I would notice that:

SQL> show parameter password;

NAME     TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile     string EXCLUSIVE

But when I would query:

SQL> select * from v$pwfile_users;

no rows selected

So, maybe I just need to add a user to the pw file?


SQL> create user my_temp identified by my_temp;

User created.

SQL> grant sysdba to my_temp;
grant sysdba to my_temp
*
ERROR at line 1:
ORA-01994: GRANT failed: password file missing or disabled

When looking at: $ORACLE_HOME/dbs, I did not see any orapwd<SID> file so I decided to try and create one (in Test) using:

ORACLE_HOME/bin/orapwd file=orapworadev1 password=<password>  entries=5

Once I did this, I was able to:

SQL> select * from v$pwfile_users;

USERNAME       SYSDB SYSOP
------------------------------ ----- -----
SYS       TRUE  TRUE

and log into the database with the SYSDBA privilege remotely.







Wednesday, September 19, 2012

Growing MLOG$ Tables with EBS 12

There are several snapshot tables (materialized views) in EBS that are set to Fast Refresh on Demand that, if not accessed, will grow to very large sizes. It's good to take a look at the what schemas these tables might reside in using the following SQL:

  SELECT owner, object_name
    FROM dba_objects
   WHERE object_name LIKE 'MLOG$%' AND object_type = 'TABLE'
ORDER BY owner, object_name

Added Tip: Or even better, run this to see which MLOG tables you should address first:


  SELECT *
    FROM (  SELECT owner, table_name, TO_NUMBER (EXTRACTVALUE (xmltype (DBMS_XMLGEN.getxml ('select count(*) c from ' || owner || '.' || table_name)), '/ROWSET/ROW/C')) COUNT
              FROM all_tables
             WHERE table_name LIKE 'MLOG$%'
          ORDER BY owner, table_name)
ORDER BY COUNT DESC

If you stats are current, you can quickly get an idea of the impact of this on your system. In my case, I had several snapshot logs (MLOG$ tables) that were over 1m rows and as high as 17m rows.

Using Toad, it might be as simple as navigating to the underlying materialized view and right-click and doing a Refresh (Complete, Atomic = False).  To find out what MV this log is tied to, you can use this:

SELECT a.OWNER,
       a.MASTER,
       a.MVIEW_LAST_REFRESH_TIME,
       m.OWNER,
       m.NAME,
       m.MVIEW_SITE
  FROM all_base_table_mviews a, ALL_REGISTERED_MVIEWS m
 WHERE m.MVIEW_ID = a.MVIEW_ID AND a.MASTER = 'MTL_MATERIAL_TRANSACTIONS_TEMP'



To find out what a.MASTER is looking for, simply browse the DDL for the MLOG$ object. You will see a comment made:

COMMENT ON TABLE INV.MLOG$_MTL_MATERIAL_TRANSAC IS 'snapshot log for master table INV.MTL_MATERIAL_TRANSACTIONS_TEMP';



When refreshing, you may encounter the following error(s) however:

BEGIN
  DBMS_SNAPSHOT.REFRESH(
    LIST                 => 'INV.MTL_SYS_ITEMS_SN'
   ,METHOD               => 'C'
   ,PUSH_DEFERRED_RPC    => TRUE
   ,REFRESH_AFTER_ERRORS => FALSE
   ,PURGE_OPTION         => 1
   ,PARALLELISM          => 0
   ,ATOMIC_REFRESH       => FALSE
   ,NESTED               => FALSE);
END;
Error at line 2
ORA-12008: error in materialized view refresh path
ORA-08002: sequence MRP_AP_REFRESH_S.CURRVAL is not yet defined in this session
ORA-06512: at "APPS.MRP_SN_SYS_ITEMS_T1", line 7
ORA-04088: error during execution of trigger 'APPS.MRP_SN_SYS_ITEMS_T1'
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2254
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2460
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2429
ORA-06512: at line 2

At this point, it is just as easy to drop and recreate the MV. If you had very large MLOG$ tables, you will have to do the following to return space back to the tablespace:

11:24:00 GOLD1> alter materialized view log on INV.MTL_MATERIAL_TRANSACTIONS_TEMP enable row movement;

Materialized view log altered.

11:30:33 GOLD1> alter materialized view log on INV.MTL_MATERIAL_TRANSACTIONS_TEMP shrink space compact;

Materialized view log altered.

11:30:53 GOLD1> alter materialized view log on INV.MTL_MATERIAL_TRANSACTIONS_TEMP shrink space;

Materialized view log altered.

11:31:53 GOLD1> alter materialized view log on INV.MTL_MATERIAL_TRANSACTIONS_TEMP disable row movement;

Materialized view log altered.

11:32:28 GOLD1> alter materialized view log on INV.MTL_MATERIAL_TRANSACTIONS_TEMP deallocate unused;

Materialized view log altered.


That should take care of this Maintenance task. Moving forward, you may want to schedule a regular job to address these by using combinations of :

DBMS_MVIEW.REFRESH('INV.MTL_MTRX_TMP_SN', 'C', atomic_refresh=>FALSE);

and dropping/creating the MV's that cannot be refreshed.

Monday, September 17, 2012

Bulk Processing with BULK COLLECT and FORALL

Here's a great article discussing using BULK COLLECT and FORALL when doing mass updates as opposed to looping through each row and incurring a context switch between PL/SQL and SQL Engines:

http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html

Thanks Steven!


11/21/2012:  Also see this video: http://www.youtube.com/watch?v=Zdpw9z05LWQ