I generally don’t get time to play with single instance ASM since I in live the RAC world so much. But I needed to quickly create a 12c PDB configuration over ASM.
If you recall from the 11gR2 tho is straightforward install and configuration of Grid Infrustructure. However, there are cases where the install/config doesn’t go smoothly. This exactly what happened in my case. Not sure if was an user error or just a 12c bug (the code is barely in the field) or combination of the two.
In any case, what this blog is going to touch on is how you recover, or rather escape from a mal-configured/messed up 12c Grid Infrastructure for Standalone Cluster install.
Since everybody loves logs and traces, I’ll walk through some of the issues. First off the, installation of the software went w/o out a hitch, its the scarey-tentative “root.sh” that went wacko
Here’s the error message from GI alert log:
OHASD starting
Timed out waiting for init.ohasd script to start; posting an alert
OHASD exiting; Could not init OLR
OHASD stderr redirected to ohasdOUT.log
Here’s the trace info from ohasd.log:
2013-08-16 13:10:47.347: [ default][357553728] OHASD Daemon Starting. Command string :reboot 2013-08-16 13:10:47.347: [ default][357553728] OHASD params [] 2013-08-16 13:10:47.662: [ default][357553728] 2013-08-16 13:10:47.662: [ default][357553728] Initializing OLR 2013-08-16 13:10:47.662: [ default][357553728]proa_init: OLR Abstraction layer initialization. Bootlevel:[1] 2013-08-16 13:10:47.670: [ OCRAPI][357553728]a_init: Successfully initialized the patch management context. 2013-08-16 13:10:47.670: [ OCRAPI][357553728]a_init: Successfully initialized the OLR specific states. 2013-08-16 13:10:47.670: [ OCRAPI][357553728]a_init:13: Clusterware init successful 2013-08-16 13:10:47.670: [ OCRAPI][357553728]a_init:15: Successfully initialized the Cache layer. 2013-08-16 13:10:47.670: [ OCRRAW][357553728]proprioo: opening OCR device(s) 2013-08-16 13:10:47.670: [ OCRRAW][357553728]proprioo: Successfully opened the non-ASM locations if configured. 2013-08-16 13:10:47.670: [ OCRRAW][357553728]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1) 2013-08-16 13:10:47.670: [ OCRRAW][357553728]proprioo: my id set: (799232119, 1028247821, 0, 0, 0) 2013-08-16 13:10:47.671: [ OCRRAW][357553728]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0) 2013-08-16 13:10:47.671: [ OCRRAW][357553728]proprioo: 2nd set: (0, 0, 0, 0, 0) 2013-08-16 13:10:47.671: [ OCRRAW][357553728]proprinit: Successfully initialized the I/O module (proprioini). 2013-08-16 13:10:47.671: [ OCRRAW][357553728]proprinit: Successfully initialized the backend handle (propribctx). 2013-08-16 13:10:47.671: [ OCRAPI][357553728]proa_init: Successfully initialized the Storage Layer. 2013-08-16 13:10:47.674: [ OCRAPI][357553728]proa_init: Successfully initlaized the Messaging Layer.
<---- everything okay this point
2013-08-16 13:10:47.698: [ OCRAPI][357553728]a_init:18!: Thread init unsuccessful : [24] 2013-08-16 13:10:47.742: [ CRSOCR][357553728] OCR context init failure. Error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1] 2013-08-16 13:10:47.743: [ default][357553728] Created alert : (:OHAS00106:) : OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1] 2013-08-16 13:10:47.743: [ default][357553728][PANIC] OHASD exiting; Could not init OLR 2013-08-16 13:10:47.743: [ default][357553728] Done. 2013-08-16 13:27:35.715: [ default][2626647616] Created alert : (:OHAS00117:) : TIMED OUT WAITING FOR OHASD MONITOR 2013-08-16 13:27:35.716: [ default][2626647616] OHASD Daemon Starting. Command string :reboot 2013-08-16 13:27:35.716: [ default][2626647616] OHASD params [] 2013-08-16 13:27:35.717: [ default][2626647616] 2013-08-16 13:27:35.717: [ default][2626647616] Initializing OLR 2013-08-16 13:27:35.717: [ default][2626647616]proa_init: OLR Abstraction layer initialization. Bootlevel:[1] 2013-08-16 13:27:35.724: [ OCRAPI][2626647616]a_init: Successfully initialized the patch management context. 2013-08-16 13:27:35.724: [ OCRAPI][2626647616]a_init: Successfully initialized the OLR specific states. 2013-08-16 13:27:35.724: [ OCRAPI][2626647616]a_init:13: Clusterware init successful 2013-08-16 13:27:35.724: [ OCRAPI][2626647616]a_init:15: Successfully initialized the Cache layer. 2013-08-16 13:27:35.724: [ OCRRAW][2626647616]proprioo: opening OCR device(s) 2013-08-16 13:27:35.724: [ OCRRAW][2626647616]proprioo: Successfully opened the non-ASM locations if configured. 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1) 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprioo: my id set: (799232119, 1028247821, 0, 0, 0) 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0) 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprioo: 2nd set: (0, 0, 0, 0, 0) 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprinit: Successfully initialized the I/O module (proprioini). 2013-08-16 13:27:35.725: [ OCRRAW][2626647616]proprinit: Successfully initialized the backend handle (propribctx). 2013-08-16 13:27:35.725: [ OCRAPI][2626647616]proa_init: Successfully initialized the Storage Layer. 2013-08-16 13:27:35.726: [ OCRAPI][2626647616]proa_init: Successfully initlaized the Messaging Layer. 2013-08-16 13:27:35.731: [ OCRMSG][2608776960]prom_listen: Failed to listen at endpoint [1] 2013-08-16 13:27:35.732: [ OCRMSG][2608776960]GIPC error [1] msg [gipcretFail] 2013-08-16 13:27:35.732: [ OCRSRV][2608776960]th_listen: prom_listen failed retval= 24, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))] 2013-08-16 13:27:35.732: [ OCRSRV][2626647616]th_init: Local listener did not reach valid state <---- This can mean some issue with network socket file location or permission. 2013-08-16 13:27:35.732: [ OCRAPI][2626647616]a_init:18!: Thread init unsuccessful : [24] 2013-08-16 13:27:35.776: [ CRSOCR][2626647616] OCR context init failure. Error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1] 2013-08-16 13:27:35.776: [ default][2626647616] Created alert : (:OHAS00106:) : OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1] 2013-08-16 13:27:35.776: [ default][2626647616][PANIC] OHASD exiting; Could not init OLR 2013-08-16 13:27:35.777: [ default][2626647616] Done.
After triaging a bit I think we had some wrong permissions on the directory strcutures, plus some hostname stuff wasn’t accurate. Hopefully that was it. Now let’s re-do the following to recover and move forward:
I first tried to stop HAS to ensure its not active
[root@pdb12c grid]# crsctl stop has -f CRS-4639: Could not contact Oracle High Availability Services CRS-4000: Command Stop failed, or completed with errors. [root@pdb12c grid] cd /u01/app/oracle/product/12.1.0/grid/crs/install Let's try to execute deconfig to fix the broken configuration: [root@pdb12c install]# ./roothas.pl -deconfig -force Using configuration parameter file: ./crsconfig_params CRS-4047: No Oracle Clusterware components configured. CRS-4000: Command Stop failed, or completed with errors. CRS-4047: No Oracle Clusterware components configured. CRS-4000: Command Delete failed, or completed with errors. CRS-4047: No Oracle Clusterware components configured. CRS-4000: Command Stop failed, or completed with errors. 2013/08/17 15:35:04 CLSRSC-357: Failed to stop current Oracle Clusterware stack during upgrade 2013/08/17 15:35:05 CLSRSC-180: An error occurred while executing the command '/etc/init.d/ohasd deinstall' (error code -1) Failure in execution (rc=-1, 0, Inappropriate ioctl for device) for command /etc/init.d/ohasd deinstall 2013/08/17 15:35:05 CLSRSC-337: Successfully deconfigured Oracle Restart stack
Hopefully this deconfig and stop stack worked:
[
root@pdb12c install]# ps -ef|grep has root 5679 2968 0 15:35 pts/0 00:00:00 grep has Stack is down so let's re-run root.sh [root@pdb12c grid]# ./root.sh Performing root user operation for Oracle 12c The following environment variables are set as: ORACLE_OWNER= oracle ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid Copying dbhome to /usr/local/bin ... Copying oraenv to /usr/local/bin ... Copying coraenv to /usr/local/bin ... Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created Finished running generic part of root script. Now product-specific root actions will be performed. Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid/crs/install/crsconfig_params LOCAL ADD MODE Creating OCR keys for user 'oracle', privgrp 'oinstall'.. Operation successful. LOCAL ONLY MODE Successfully accumulated necessary OCR keys. Creating OCR keys for user 'root', privgrp 'root'.. Operation successful. CRS-4664: Node pdb12c successfully pinned. 2013/08/17 15:35:41 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf' pdb12c 2013/08/17 15:35:58 /u01/app/oracle/product/12.1.0/grid/cdata/pdb12c/backup_20130817_153558.olr 2013/08/17 15:37:50 CLSRSC-327: Successfully configured Oracle Grid Infrastructure for a Standalone Server
Let’s verify this successful OLR initialization by looking at ohasd.log:
2013-08-17 15:35:46.993: [ default][2945685056] OHASD Daemon Starting. Command string :reboot 2013-08-17 15:35:46.993: [ default][2945685056] OHASD params [] 2013-08-17 15:35:46.994: [ default][2945685056] 2013-08-17 15:35:46.994: [ default][2945685056] Initializing OLR 2013-08-17 15:35:46.994: [ default][2945685056]proa_init: OLR Abstraction layer initialization. Bootlevel:[1] 2013-08-17 15:35:46.998: [ OCRAPI][2945685056]a_init: Successfully initialized the patch management context. 2013-08-17 15:35:46.998: [ OCRAPI][2945685056]a_init: Successfully initialized the OLR specific states. 2013-08-17 15:35:46.998: [ OCRAPI][2945685056]a_init:13: Clusterware init successful 2013-08-17 15:35:46.998: [ OCRAPI][2945685056]a_init:15: Successfully initialized the Cache layer. 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: opening OCR device(s) 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: Successfully opened the non-ASM locations if configured. 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1) 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: my id set: (799232119, 1028247821, 0, 0, 0) 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0) 2013-08-17 15:35:46.998: [ OCRRAW][2945685056]proprioo: 2nd set: (0, 0, 0, 0, 0) 2013-08-17 15:35:46.999: [ OCRRAW][2945685056]proprinit: Successfully initialized the I/O module (proprioini). 2013-08-17 15:35:46.999: [ OCRRAW][2945685056]proprinit: Successfully initialized the backend handle (propribctx). 2013-08-17 15:35:46.999: [ OCRAPI][2945685056]proa_init: Successfully initialized the Storage Layer. 2013-08-17 15:35:47.000: [ OCRAPI][2945685056]proa_init: Successfully initlaized the Messaging Layer. 2013-08-17 15:35:47.003: [ OCRAPI][2945685056]a_init:18: Thread init successful 2013-08-17 15:35:47.003: [ OCRAPI][2945685056]a_init:19: Client init successful 2013-08-17 15:35:47.003: [ OCRAPI][2945685056]a_init:21: OLR init successful. Init Level [1] <--- this is a good sign, but we still need to ensure OHASD starts up and initializes the CRS Policy Engine: 2013-08-17 15:35:47.003: [ default][2945685056] Checking version compatibility... 2013-08-17 15:35:47.003: [ default][2945685056]clsvactversion:4: Retrieving Active Version from local storage. 2013-08-17 15:35:47.004: [ default][2945685056] Version compatibility check passed: Software Version: 12.1.0.1.0 Release Version: 12.1.0.1.0 Active Version: 12.1.0.1.0 2013-08-17 15:35:47.004: [ default][2945685056] Running mode check... 2013-08-17 15:35:47.004: [ default][2945685056] OHASD running as the Non-Privileged user <--- this is a also good sign, getting there... 2013-08-17 15:35:47.190: [ CRSPE][2889111296] {0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running] <--- PE is running, getting there some more... 2013-08-17 15:35:47.190: [ CRSPE][2889111296] {0:0:2} Processing pending join requests: 1 2013-08-17 15:35:47.190: [UiServer][2413815552] UI comms listening for GIPC events. 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} Special Value map for : pdb12c 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} CRS_CSS_NODENAME=pdb12c 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} CRS_CSS_NODENUMBER=0 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} CRS_CSS_NODENUMBER_PLUS1=1 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} CRS_HOME=/u01/app/oracle/product/12.1.0/grid 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} Server Attributes for : pdb12c 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} ACTIVE_CSS_ROLE=UNAVAILABLE 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} CONFIGURED_CSS_ROLE= 2013-08-17 15:35:47.191: [ CRSPE][2889111296] {0:0:2} Server [pdb12c] has been registered with the PE data model 2013-08-17 15:35:47.191: [ AGFW][2899617536] {0:0:2} Agfw Proxy Server received the message: PE_HANDHSAKE[Proxy] ID 20487:33 2013-08-17 15:35:47.191: [ AGFW][2899617536] {0:0:2} Received handshake message from PE. <--- PE is initialized, getting even closer... 2013-08-17 15:35:47.192: [ AGFW][2899617536] {0:0:2} Added resource type: application 2013-08-17 15:35:47.192: [ AGFW][2899617536] {0:0:2} Added resource type: cluster_resource 2013-08-17 15:35:47.192: [ AGFW][2899617536] {0:0:2} Added resource type: generic_application 2013-08-17 15:35:47.192: [ AGFW][2899617536] {0:0:2} Added resource type: local_resource I think the basic GI stack is there, let's verify: [root@pdb12c ohasd]# ps -ef|grep oracle root 3241 3232 0 14:23 pts/2 00:00:00 su - oracle oracle 3242 3241 0 14:23 pts/2 00:00:00 -bash oracle 5959 1 0 15:35 ? 00:00:02 /u01/app/oracle/product/12.1.0/grid/bin/ohasd.bin reboot oracle 6067 1 0 15:35 ? 00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/oraagent.bin oracle 6080 1 0 15:35 ? 00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/evmd.bin oracle 6154 6080 0 15:35 ? 00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/evmlogger.bin -o /u01/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.log Check crsctl stat res [root@pdb12c ohasd]# crsctl stat res -init -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.ons OFFLINE OFFLINE pdb12c STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.cssd 1 OFFLINE OFFLINE STABLE ora.diskmon 1 OFFLINE OFFLINE STABLE ora.evmd 1 ONLINE ONLINE pdb12c STABLE --------------------------------------------------------------------------------
Odd that CSS hasn't started yet, and even odder that ASM is not instantiated. UGH!! Now we have to do some stitch work. Again, all this is unnecessary, its because installer didn't finish its work that we have do this. So, let's add Local Listener and ASM. In this order !! [oracle@pdb12c grid]$ srvctl add listener [oracle@pdb12c grid]$ srvctl config ons ONS exists: Local port 6100, remote port 6200, EM port 2016 [oracle@pdb12c grid]$ srvctl config listener Name: LISTENER Home: /u01/app/oracle/product/12.1.0/grid [oracle@pdb12c grid]$ srvctl add asm [oracle@pdb12c grid]$ srvctl config asm ASM home: /u01/app/oracle/product/12.1.0/grid Password file: ASM listener: LISTENER Spfile: ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++ <--- Notice that ASM has no SPFile associated with yet, but we still start it with default parameters [oracle@pdb12c grid]$ srvctl start asm [oracle@pdb12c asmca]$ ps -ef|grep asm root 51260 2 0 15:37 ? 00:00:00 [asmWorkerThread] root 51261 2 0 15:37 ? 00:00:00 [asmWorkerThread] root 51262 2 0 15:37 ? 00:00:00 [asmWorkerThread] root 51263 2 0 15:37 ? 00:00:00 [asmWorkerThread] root 51264 2 0 15:37 ? 00:00:00 [asmWorkerThread] oracle 53092 1 0 16:26 ? 00:00:00 asm_pmon_+ASM oracle 53094 1 0 16:26 ? 00:00:00 asm_psp0_+ASM oracle 53096 1 3 16:26 ? 00:00:01 asm_vktm_+ASM oracle 53100 1 0 16:26 ? 00:00:00 asm_gen0_+ASM oracle 53102 1 0 16:26 ? 00:00:00 asm_mman_+ASM oracle 53106 1 0 16:26 ? 00:00:00 asm_diag_+ASM oracle 53108 1 0 16:26 ? 00:00:00 asm_dia0_+ASM oracle 53110 1 0 16:26 ? 00:00:00 asm_dbw0_+ASM oracle 53112 1 0 16:26 ? 00:00:00 asm_lgwr_+ASM oracle 53115 1 0 16:26 ? 00:00:00 asm_ckpt_+ASM oracle 53117 1 0 16:26 ? 00:00:00 asm_smon_+ASM oracle 53119 1 0 16:26 ? 00:00:00 asm_lreg_+ASM oracle 53121 1 0 16:26 ? 00:00:00 asm_rbal_+ASM oracle 53123 1 0 16:26 ? 00:00:00 asm_gmon_+ASM oracle 53125 1 0 16:26 ? 00:00:00 asm_mmon_+ASM oracle 53127 1 0 16:26 ? 00:00:00 asm_mmnl_+ASM Now run asmca to create the disk group using ASMCALet's check crsctl stat res again [oracle@pdb12c asmca]$ crsctl stat res -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.LISTENER.lsnr ONLINE ONLINE pdb12c STABLE ora.PDBDATA.dg ONLINE ONLINE pdb12c STABLE ora.asm ONLINE ONLINE pdb12c Started,STABLE ora.ons OFFLINE OFFLINE pdb12c STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.cssd 1 ONLINE ONLINE pdb12c STABLE ora.diskmon 1 OFFLINE OFFLINE STABLE ora.evmd 1 ONLINE ONLINE pdb12c STABLE -------------------------------------------------------------------------------- Cool, we now have majority of the stack GI stack started!! Also notice that the PDBDATA disk group resource got created auto magically when the disk group was created and mounted (this was the same case in 11gR2) But I wonder how cssd got placed in the ONLINE state, we didn't change that state directly: The answer to that has to do with the startup dependencies for ASM resource. In this case its "hard pull-up" dependency, also ASM has a "weak" dependency on Listener, so that got Onlined too. We can see that from crsctl stat res ora.asm -p [oracle@pdb12c asmca]$ crsctl stat res ora.asm -p NAME=ora.asm TYPE=ora.asm.type …. …. START_DEPENDENCIES=hard(ora.cssd) weak(ora.LISTENER.lsnr) …. …. STOP_DEPENDENCIES=hard(ora.cssd) Note ASM is using the basic/generic init.ora file. So let's create a real usable one: cat $HOME/init+ASM.ora sga_target=1536M asm_diskgroups='PDBDATA' asm_diskstring='/dev/sd*' instance_type='asm' remote_login_passwordfile='EXCLUSIVE' SQL> create spfile='+PDBDATA' from pfile='$HOME/init+ASM.ora' 2 ; File created.
To really validate that GI stack is up and running and that ASM is cool, just for fun let's create an ACFS filesystem. This validates the communication between layers of HAS/CRS stack, ASM as well the Policy Engine:...
[root@pdb12c ohasd]# mkdir -p /u01/app/oracle/acfsmounts/pdbdata_pdbvol1 [root@pdb12c ohasd]# [root@pdb12c ohasd]# /bin/mount -t acfs /dev/asm/pdbvol1-339 /u01/app/oracle/acfsmounts/pdbdata_pdbvol1 [root@pdb12c ohasd]# df -ha Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg_12crac1-lv_root 19G 12G 5.8G 67% / proc 0 0 0 - /proc sysfs 0 0 0 - /sys devpts 0 0 0 - /dev/pts tmpfs 1.4G 637M 792M 45% /dev/shm /dev/sda1 485M 55M 405M 12% /boot /dev/asm/pdbvol1-339 1.0G 41M 984M 4% /u01/app/oracle/acfsmounts/pdbdata_pdbvol1 ANd there you have it! A re-sticthed back GI stack. Again, I hope nobody has to go through that, but now at least you know!!
Now next step is to create the PDB database over ASM!!