FlexASM Deep Dive – Show Me the Output!!!

If you saw the first FlexASM blog, you know we installed and configured FlexASM plus a CDB and a couple of PDBs. This setup was also policy managed with a cardinality of 2. Now let's see what the configuration looks like, and break it down using the wonderful crsctl and srvctl tools.

First let’s ensure we are really running in FlexASM mode:

[oracle@rac02 ~]$ asmcmd showclustermode
ASM cluster : Flex mode enabled

[oracle@rac02 ~]$ srvctl status serverpool -serverpool naboo
Server pool name: naboo
Active servers count: 2

[oracle@rac01 trace]$ crsctl get node role status -all
Node 'rac01' active role is 'hub'
Node 'rac03' active role is 'hub'
Node 'rac02' active role is 'hub'
Node 'rac04' active role is 'hub'

[oracle@rac01 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
ONLINE ONLINE rac04 STABLE

Notice that we have four ASM listeners, one on each node in the cluster. You'll see a process like the following on each node:

[oracle@rac01 ~]$ ps -ef |grep -i asmnet

oracle 6646 1 0 12:19 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
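Just as an aside (these are commands I'd use to double-check, not part of the original output above): you can point lsnrctl at the ASM listener on that node, or ask crsctl for just that one resource:

[oracle@rac01 ~]$ lsnrctl status ASMNET1LSNR_ASM
[oracle@rac01 ~]$ crsctl stat res ora.ASMNET1LSNR_ASM.lsnr -t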

ora.CRSDATA.DATAVOL1.advm
ONLINE ONLINE rac01 Volume device /dev/a
sm/datavol1-194 is o
nline,STABLE
ONLINE ONLINE rac02 Volume device /dev/a
sm/datavol1-194 is o
nline,STABLE
ONLINE OFFLINE rac03 Unable to connect to
ASM,STABLE
ONLINE ONLINE rac04 Volume device /dev/a
sm/datavol1-194 is o
nline,STABLE
The datavol1 ADVM volume resource runs on all the nodes where it is targeted to run. In this case we can see that rac03 is having some issues; let's look into that a little later. But I like the fact that crsctl tells us something is amiss on node 3.

ora.CRSDATA.dg
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
OFFLINE OFFLINE rac04 STABLE

ora.FRA.dg
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
OFFLINE OFFLINE rac04 STABLE

The CRSDATA and FRA disk group resources are started on all nodes except node 4.

ora.LISTENER.lsnr
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
ONLINE ONLINE rac04 STABLE

As we all know from 11gR2, this is the node listener.

ora.PDBDATA.dg
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
OFFLINE OFFLINE rac04 STABLE

The PDBDATA disk group resource is started on all nodes except node 4.

ora.crsdata.datavol1.acfs
ONLINE ONLINE rac01 mounted on /u02/app/
oracle/acfsmounts,ST
ABLE
ONLINE ONLINE rac02 mounted on /u02/app/
oracle/acfsmounts,ST
ABLE
ONLINE OFFLINE rac03 (2) volume /u02/app/
oracle/acfsmounts of
fline,STABLE
ONLINE ONLINE rac04 mounted on /u02/app/
oracle/acfsmounts,ST
ABLE

The ACFS filesystem resource for datavol1 is started on all nodes except node 3. But I think the following has something to do with it :-) . Need to debug this a bit later. I even tried:
[oracle@rac03 ~]$ asmcmd volenable --all
ASMCMD-9470: ASM proxy instance unavailable
ASMCMD-9471: cannot enable or disable volumes
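Since asmcmd is complaining about the ASM proxy, a reasonable first step would be to check and restart the ora.proxy_advm resource on rac03 and then the filesystem. I haven't run these against this exact failure yet, so treat it as a sketch, but something along these lines:

[oracle@rac03 ~]$ crsctl stat res ora.proxy_advm -t
[oracle@rac03 ~]$ crsctl start res ora.proxy_advm -n rac03
[oracle@rac03 ~]$ srvctl start filesystem -device /dev/asm/datavol1-194 -node rac03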

ora.net1.network
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
ONLINE ONLINE rac04 STABLE
ora.ons
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE ONLINE rac03 STABLE
ONLINE ONLINE rac04 STABLE

The network resource (in my case I only have net1) and ONS are the same as in previous versions.

ora.proxy_advm
ONLINE ONLINE rac01 STABLE
ONLINE ONLINE rac02 STABLE
ONLINE OFFLINE rac03 STABLE
ONLINE ONLINE rac04 STABLE

Yep, since proxy_advm is not started on node 3, the ACFS filesystems won't come online... but again, I'll look at that later.
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac02 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE rac03 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE rac04 STABLE
ora.MGMTLSNR
1 ONLINE ONLINE rac01 169.254.90.36 172.16
.11.10,STABLE
ora.asm
1 ONLINE ONLINE rac03 STABLE
2 ONLINE ONLINE rac01 STABLE
3 ONLINE ONLINE rac02 STABLE

Since we have an ASM cardinality of 3, we have three ASM instances active.

ora.cvu
1 ONLINE ONLINE rac01 STABLE
ora.mgmtdb
1 ONLINE ONLINE rac01 Open,STABLE
ora.oc4j
1 ONLINE ONLINE rac01 STABLE
ora.rac01.vip
1 ONLINE ONLINE rac01 STABLE
ora.rac02.vip
1 ONLINE ONLINE rac02 STABLE
ora.rac03.vip
1 ONLINE ONLINE rac03 STABLE
ora.rac04.vip
1 ONLINE ONLINE rac04 STABLE
ora.scan1.vip
1 ONLINE ONLINE rac02 STABLE
ora.scan2.vip
1 ONLINE ONLINE rac03 STABLE
ora.scan3.vip
1 ONLINE ONLINE rac04 STABLE
ora.tatooine.db
1 ONLINE ONLINE rac01 Open,STABLE
2 ONLINE ONLINE rac02 Open,STABLE

As stated above, I specified a policy-managed database with a cardinality of 2, so I have two database instances running.
--------------------------------------------------------------------------------
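By the way, if I ever want to grow that footprint, the server pool cardinality can be changed on the fly and the policy-managed database follows. A hedged sketch (the pool name is from this setup; the new max is just an example):

[oracle@rac02 ~]$ srvctl config srvpool -serverpool naboo
[oracle@rac02 ~]$ srvctl modify srvpool -serverpool naboo -min 2 -max 3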

Here's some other important supporting info on FlexASM:

[oracle@rac02 ~]$ srvctl config asm -detail
ASM home: /u01/app/12.1.0/grid
Password file: +CRSDATA/orapwASM
ASM listener: LISTENER
ASM is enabled.
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
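That "ASM instance count: 3" is the Flex ASM cardinality. If you wanted an ASM instance on every hub node you could bump the count; this is shown only as a sketch (I left mine at 3):

[oracle@rac02 ~]$ srvctl modify asm -count ALL
[oracle@rac02 ~]$ srvctl config asm -detail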

[oracle@rac02 ~]$ srvctl status filesystem
ACFS file system /u02/app/oracle/acfsmounts is mounted on nodes rac01,rac02,rac04

And here's what the database (its alert log) has to say about Flex ASM:

NOTE: ASMB registering with ASM instance as client 0x10001 (reg:1377584805)
NOTE: ASMB connected to ASM instance +ASM1 (Flex mode; client id 0x10001)
NOTE: ASMB rebuilding ASM server state
NOTE: ASMB rebuilt 2 (of 2) groups
SUCCESS: ASMB reconnected & completed ASM server state
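You can see the same relationship from the ASM side. Connected to one of the ASM instances (say +ASM1), v$asm_client lists which database instances, including ones on other nodes, that ASM instance is serving:

SQL> select instance_name, db_name, status from v$asm_client;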

Now for the interesting part. Notice that ASM is not running on node 4:
[oracle@rac02 ~]$ srvctl status asm -v

ASM is running on rac01,rac02,rac03
[oracle@rac02 ~]$ srvctl status asm -detail
ASM is running on rac01,rac02,rac03

So, how does a client (ocrdump, RMAN, asmcmd, etc.) connect to ASM if there is no ASM instance on that node? Well, let's test this using asmcmd on node 4. Notice that a pipe is created, and a connect string is generated and passed to ASMCMD to connect remotely to +ASM2 on node 2!

22-Sep-13 12:54 ASMCMD Foreground (PID = 14106): Pipe /tmp/pipe_14106 has been found.
22-Sep-13 12:54 ASMCMD Background (PID = 14117): Successfully opened the pipe /tmp/pipe_14106
22-Sep-13 12:54 ASMCMD Foreground (PID = 14106): Successfully opened the pipe /tmp/pipe_14106 in read mode
NOTE: Executing kfod /u01/app/12.1.0/grid/bin/kfod op=getclstype..
22-Sep-13 12:54 Printing the connection string
contype =
driver =
instanceName = <>
usr =
ServiceName = <+ASM>
23-Sep-13 16:23 Successfully connected to ASM instance +ASM2
23-Sep-13 16:23 NOTE: Querying ASM instance to get list of disks
22-Sep-13 12:54 Registered Daemon process.
22-Sep-13 12:54 ASMCMD Foreground (PID = 14106): Closed pipe /tmp/pipe_14106.

Creating PDBs part 3

Many people have been asking me about methods besides SQL*Plus for provisioning PDBs, such as OEM, DBCA, etc. In this blog entry I'll use DBCA, just because it's simple to show. As I mentioned in my last PDB blog,
the initial DBCA invocation (from the installer) looks different than subsequent invocations (after the initial database creation).

The main DBCA screen shows the following pages. We will choose Manage Pluggable Databases.

PDB12c 2013 08 20 17 46 20

Choose the CDB. Note that you could have many CDBs on the same node or RAC cluster.

PDB12c 2013 08 30 17 51 59

We choose our PDB that we created in Part 1 of the blog

PDB12c 2013 08 30 17 52 39

Ahh..we gotta open the PDB first. As before:

CDB$ROOT@YODA> alter session set container=pdbobi;
Session altered.

CDB$ROOT@YODA> alter pluggable database pdbobi open;

Pluggable database altered.

or CDB$ROOT@YODA> alter pluggable database all open;

PDB12c 2013 08 30 17 54 22

Now we can add support for and configure Database Vault. Additionally, Label Security can be configured.
It would have been nice to enable and modify Resource Manager as well as other PDB tasks,
but I get that this DBCA flow is really geared toward the core PDB operations (plug, unplug, create, and destroy a PDB).
The bulk of the PDB admin tasks are provided in EM.

PDB12c 2013 08 30 18 14 54

Let’s do a new PDB creation for grins 🙂

PDB12c 2013 08 30 18 21 01

Specify the PDB name, storage location, and a default tablespace. Again, it would have been nice to specify a TEMP tablespace too, but that was left out

PDB12c 2013 08 30 18 22 26

Progress ….

PDB12c 2013 08 30 18 23 18

And Completion….Pretty Straightforward

PDB12c 2013 08 30 18 22 53

Creating PDBs part 2

Once we have installed the 12.1 database software, we can create the container database and the pluggable databases. In my case I did a software-only install and then manually executed DBCA.

In this blog entry I'll show the screens that walk through the configuration of the "first" database. I noticed that once DBCA is used to create the initial database, the capabilities and options (screens) in DBCA are different; i.e., it is much more aligned toward creating/managing additional databases. I'll show those screens in Part 3 of the PDB series.

So let’s get started by executing
$ $ORACLE_HOME/bin/dbca

Rac01 2013 09 15 22 39 12

Choose Advanced mode for a policy-managed database, or use "Default Configuration". Being a big promoter of policy-managed databases, and since I have 4 RAC nodes (my best-practice threshold for choosing policy managed), I'll choose that.

Rac01 2013 09 15 22 39 44

I'll pick a global database name and choose the PDB option, which also lets me choose how many PDBs to create (with a name prefix).

Rac01 2013 09 15 22 40 30

Pick a server pool name; I chose a cardinality of 2.

Rac01 2013 09 15 22 40 58

Define the Management Options

Rac01 2013 09 15 22 41 20

Choose the Storage locations

Rac01 2013 09 15 22 43 31

Define Database Vault Owner and also the Separate Account Manager. Note the user name definitions

Rac01 2013 09 15 22 45 31

And now the finish

Rac01 2013 09 15 22 56 41

12c PDB Multitenancy, Schema Consolidation, and whatever

Many of you have probably heard me speak over the years (at OOW, local user groups, and at the local bars) about the virtues of simplification, rationalization, and consolidation. I mentioned the different database consolidation and multi-tenancy models: virtualization-based, database instance, and schema consolidation.

The following papers I wrote [when I was at Oracle] touch in detail on this topic –
http://www.oracle.com/technetwork/database/database-cloud/database-cons-best-practices-1561461.pdf

And here's a more current version of that paper, updated for 12c and PDB.
http://www.oracle.com/us/products/database/database-private-cloud-wp-360048.pdf

Those who have done consolidation via virtualization platforms such as VMware or OVM know it's fairly straightforward, a simple "drag and drop," as I say. Similarly, consolidating many databases as separate database instances on a platform is also fairly straightforward. It's the consolidation of many disparate schemas into a common database that makes things interesting. A couple of key points on "why schema consolidation" from the paper:

  • The schema consolidation model has consistently provided the most opportunities for reducing operating expenses, since you only have a single big database to maintain, monitor, and manage.
  • Though schema consolidation allows the best ROI (w.r.t. CapEx/OpEx), you are sacrificing flexibility for compaction. As I've stated in my presentations and papers, "…consolidation and isolation move in opposite directions." The more you consolidate, the less capability you'll have for isolation; in contrast, the more you try to isolate, the more you sacrifice the benefits of consolidation.
  • Custom (home-grown) apps have been the best-fit use cases for schema consolidation, since application owners and developers have more control over how the application and schema are built.

Well, with the 12c Oracle Database Pluggable Database (PDB) feature, you now have more incentive to lean toward schema consolidation. PDB "begins" to eliminate the typical issues that come with schema consolidation, such as namespace collisions, security, and granularity of recovery.

In this first part of the three-part series on PDB, I'll illustrate the installation of the 12c database with the Pluggable Database feature. The upcoming parts of the series will cover management and user isolation (security) with PDB.

But first a very, very high-level primer on terminology:

  • Root Container Database – The root CDB (cdb$root) is the real database (if you will), and the name you give it will be the name of the instance. The CDB owns the SGA and the running processes. I can have many CDBs on the same database server (each with its own PDBs). The cool thing is that having more than one CDB allows DBAs to combine a database instance consolidation model with schema consolidation. For best scalability, mix in RAC and leverage all the benefits of RAC services, QoS, and workload distribution. The seed PDB (PDB$SEED) is an Oracle-supplied system template that the CDB can use to create new PDBs; one cannot add or modify objects in PDB$SEED.
  • Pluggable Database – The PDBs are sub-containers that are serviced by CDB resources. The true beauty of the PDB is its mobility; i.e., I can unplug and plug 12c databases into and out of CDBs (see the sketch after this list). I can also "create like" new PDBs from an existing PDB, like full snapshots.
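To make that mobility point concrete, here is a minimal unplug/plug sketch; the names follow my Star Wars theme and the XML path is just an example:

CDB$ROOT@YODA> alter pluggable database pdbobi close immediate;
CDB$ROOT@YODA> alter pluggable database pdbobi unplug into '/tmp/pdbobi.xml';
CDB$ROOT@YODA> drop pluggable database pdbobi keep datafiles;

-- and then on the target CDB:
CDB$ROOT@DAGOBAH> create pluggable database pdbobi using '/tmp/pdbobi.xml' nocopy;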

So, now I’ll illustrate the important/interesting and new screens of 12c Database Installer:

PDB12c 2013 08 19 17 22 42

We chose Server Class

PDB12c 2013 08 19 17 23 09

It will be single instance... for now 🙂

PDB12c 2013 08 19 17 23 37

Choose Advanced Install

PDB12c 2013 08 19 17 24 07

And now for the fun step. We choose Enterprise Edition, as the Pluggable Database feature is only available in EE.

PDB12c 2013 08 19 17 24 47

The next couple of screens ask about the Oracle Home and Oracle Base locations, nothing new, but look at the screen for Step 11. This is where the fun is. We specify the database name, but also specify whether we want to create a Container Database. If we check it, it allows us to create our first PDB in the Container Database (CDB). In my example I specified Yoda as my CDB name and (in keeping with the Star Wars theme) named the PDB PDBOBI.

PDB12c 2013 08 19 17 27 19

We obviously choose ASM as the storage location

PDB12c 2013 08 19 17 28 18

And we have the opportunity to register this new target database with EM Cloud Control.

PDB12c 2013 08 20 17 46 20

The rest of the steps/screens are standard stuff, so I won't bore you with them. But here's an excerpt from the database alert log that shows the magic underneath:

create pluggable database PDB$SEED as clone  using '/u02/app/oracle/product/12.1.0/dbhome_1/assistants/dbca/templates//pdbseed.xml'  source_file_name_convert = ('/ade/b/3593327372/oracle/oradata/seeddata/pdbseed/temp01.dbf','+PDBDATA/YODA/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/pdbseed_temp01.dbf',
'/ade/b/3593327372/oracle/oradata/seeddata/pdbseed/system01.dbf','+PDBDATA/YODA/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/system.271.823892297',
'/ade/b/3593327372/oracle/oradata/seeddata/pdbseed/sysaux01.dbf','+PDBDATA/YODA/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/sysaux.270.823892297') file_name_convert=NONE  NOCOPY
Mon Aug 19 18:58:59 2013
….
…. 
Post plug operations are now complete.
Pluggable database PDB$SEED with pdb id - 2 is now marked as NEW.


create pluggable database pdbobi as clone  using '/u02/app/oracle/product/12.1.0/dbhome_1/assistants/dbca/templates//sampleschema.xml'  source_file_name_convert = ('/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/temp01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/pdbobi_temp01.dbf',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/example01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/example.275.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/system01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/system.276.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/SAMPLE_SCHEMA_users01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/users.277.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/sysaux01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/sysaux.274.823892813') file_name_convert=NONE  NOCOPY
Mon Aug 19 19:07:42 2013
….
….
****************************************************************
Post plug operations are now complete.
Pluggable database PDBOBI with pdb id - 3 is now marked as NEW.
****************************************************************
Completed: create pluggable database pdbobi as clone  using '/u02/app/oracle/product/12.1.0/dbhome_1/assistants/dbca/templates//sampleschema.xml'  source_file_name_convert = ('/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/temp01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/pdbobi_temp01.dbf',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/example01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/example.275.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/system01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/system.276.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/SAMPLE_SCHEMA_users01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/users.277.823892813',
'/ade/b/3593327372/oracle/oradata/seeddata/SAMPLE_SCHEMA/sysaux01.dbf','+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/sysaux.274.823892813') file_name_convert=NONE  NOCOPY
alter pluggable database pdbobi open restricted
Pluggable database PDBOBI dictionary check beginning
Pluggable Database PDBOBI Dictionary check complete
Database Characterset is US7ASCII
….
….

XDB installed.

XDB initialized.
Mon Aug 19 19:08:01 2013
Pluggable database PDBOBI opened read write
Completed: alter pluggable database pdbobi open restricted

I will cover more of PDB creation and management in the next blog. But I’ll leave you with this teaser of DBCA screen:

PDB12c 2013 08 20 17 46 20

How to recover from an aborted GI Standalone Clusterware config …and live to tell about it

I generally don't get time to play with single-instance ASM since I live in the RAC world so much, but I needed to quickly create a 12c PDB configuration over ASM.
If you recall from 11gR2, the install and configuration of Grid Infrastructure is straightforward. However, there are cases where the install/config doesn't go smoothly, and that's exactly what happened in my case. Not sure if it was user error, a 12c bug (the code is barely in the field), or a combination of the two.

In any case, what this blog is going to touch on is how you recover, or rather escape from a mal-configured/messed up 12c Grid Infrastructure for Standalone Cluster install.

Since everybody loves logs and traces, I'll walk through some of the issues. First off, the installation of the software went without a hitch; it's the scary, tentative "root.sh" that went wacko.

Here’s the error message from GI alert log:

OHASD starting
Timed out waiting for init.ohasd script to start; posting an alert
OHASD exiting; Could not init OLR
OHASD stderr redirected to ohasdOUT.log

Here’s the trace info from ohasd.log:

2013-08-16 13:10:47.347: [ default][357553728] OHASD Daemon Starting. Command string :reboot
2013-08-16 13:10:47.347: [ default][357553728] OHASD params []
2013-08-16 13:10:47.662: [ default][357553728]
2013-08-16 13:10:47.662: [ default][357553728] Initializing OLR
2013-08-16 13:10:47.662: [ default][357553728]proa_init: OLR Abstraction layer initialization. Bootlevel:[1]
2013-08-16 13:10:47.670: [  OCRAPI][357553728]a_init: Successfully initialized the patch management context.
2013-08-16 13:10:47.670: [  OCRAPI][357553728]a_init: Successfully initialized the OLR specific states.
2013-08-16 13:10:47.670: [  OCRAPI][357553728]a_init:13: Clusterware init successful
2013-08-16 13:10:47.670: [  OCRAPI][357553728]a_init:15: Successfully initialized the Cache layer.
2013-08-16 13:10:47.670: [  OCRRAW][357553728]proprioo: opening OCR device(s)
2013-08-16 13:10:47.670: [  OCRRAW][357553728]proprioo: Successfully opened the non-ASM locations if configured.
2013-08-16 13:10:47.670: [  OCRRAW][357553728]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1)
2013-08-16 13:10:47.670: [  OCRRAW][357553728]proprioo: my id set: (799232119, 1028247821, 0, 0, 0)
2013-08-16 13:10:47.671: [  OCRRAW][357553728]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0)
2013-08-16 13:10:47.671: [  OCRRAW][357553728]proprioo: 2nd set: (0, 0, 0, 0, 0)
2013-08-16 13:10:47.671: [  OCRRAW][357553728]proprinit: Successfully initialized the I/O module (proprioini).
2013-08-16 13:10:47.671: [  OCRRAW][357553728]proprinit: Successfully initialized the backend handle (propribctx).
2013-08-16 13:10:47.671: [  OCRAPI][357553728]proa_init: Successfully initialized the Storage Layer.
2013-08-16 13:10:47.674: [  OCRAPI][357553728]proa_init: Successfully initlaized the Messaging Layer.

<---- everything okay up to this point

2013-08-16 13:10:47.698: [  OCRAPI][357553728]a_init:18!: Thread init unsuccessful : [24]
2013-08-16 13:10:47.742: [  CRSOCR][357553728] OCR context init failure.  Error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1]
2013-08-16 13:10:47.743: [ default][357553728] Created alert : (:OHAS00106:) :  OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1]
2013-08-16 13:10:47.743: [ default][357553728][PANIC] OHASD exiting; Could not init OLR
2013-08-16 13:10:47.743: [ default][357553728] Done.

2013-08-16 13:27:35.715: [ default][2626647616] Created alert : (:OHAS00117:) :  TIMED OUT WAITING FOR OHASD MONITOR
2013-08-16 13:27:35.716: [ default][2626647616] OHASD Daemon Starting. Command string :reboot
2013-08-16 13:27:35.716: [ default][2626647616] OHASD params []
2013-08-16 13:27:35.717: [ default][2626647616]
2013-08-16 13:27:35.717: [ default][2626647616] Initializing OLR
2013-08-16 13:27:35.717: [ default][2626647616]proa_init: OLR Abstraction layer initialization. Bootlevel:[1]
2013-08-16 13:27:35.724: [  OCRAPI][2626647616]a_init: Successfully initialized the patch management context.

2013-08-16 13:27:35.724: [  OCRAPI][2626647616]a_init: Successfully initialized the OLR specific states.
2013-08-16 13:27:35.724: [  OCRAPI][2626647616]a_init:13: Clusterware init successful
2013-08-16 13:27:35.724: [  OCRAPI][2626647616]a_init:15: Successfully initialized the Cache layer.
2013-08-16 13:27:35.724: [  OCRRAW][2626647616]proprioo: opening OCR device(s)
2013-08-16 13:27:35.724: [  OCRRAW][2626647616]proprioo: Successfully opened the non-ASM locations if configured.
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1)
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprioo: my id set: (799232119, 1028247821, 0, 0, 0)
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0)
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprioo: 2nd set: (0, 0, 0, 0, 0)
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprinit: Successfully initialized the I/O module (proprioini).
2013-08-16 13:27:35.725: [  OCRRAW][2626647616]proprinit: Successfully initialized the backend handle (propribctx).
2013-08-16 13:27:35.725: [  OCRAPI][2626647616]proa_init: Successfully initialized the Storage Layer.
2013-08-16 13:27:35.726: [  OCRAPI][2626647616]proa_init: Successfully initlaized the Messaging Layer.
2013-08-16 13:27:35.731: [  OCRMSG][2608776960]prom_listen: Failed to listen at endpoint [1]
2013-08-16 13:27:35.732: [  OCRMSG][2608776960]GIPC error [1] msg [gipcretFail]
2013-08-16 13:27:35.732: [  OCRSRV][2608776960]th_listen: prom_listen failed retval= 24, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]
2013-08-16 13:27:35.732: [  OCRSRV][2626647616]th_init: Local listener did not reach valid state

			<---- This can mean some issue with network socket file location or permission.  

2013-08-16 13:27:35.732: [  OCRAPI][2626647616]a_init:18!: Thread init unsuccessful : [24]
2013-08-16 13:27:35.776: [  CRSOCR][2626647616] OCR context init failure.  Error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1]
2013-08-16 13:27:35.776: [ default][2626647616] Created alert : (:OHAS00106:) :  OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretFail] [1]
2013-08-16 13:27:35.776: [ default][2626647616][PANIC] OHASD exiting; Could not init OLR
2013-08-16 13:27:35.777: [ default][2626647616] Done.

After triaging a bit, I think we had some wrong permissions on the directory structures, plus some hostname stuff wasn't accurate. Hopefully that was it. Now let's do the following to recover and move forward:

I first tried to stop HAS to ensure it's not active:

[root@pdb12c grid]# crsctl stop  has -f 
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.

[root@pdb12c grid]  cd /u01/app/oracle/product/12.1.0/grid/crs/install

Let's try to execute deconfig to fix the broken configuration:

[root@pdb12c install]# ./roothas.pl -deconfig -force
Using configuration parameter file: ./crsconfig_params
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Delete failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
2013/08/17 15:35:04 CLSRSC-357: Failed to stop current Oracle Clusterware stack during upgrade
2013/08/17 15:35:05 CLSRSC-180: An error occurred while executing the command '/etc/init.d/ohasd deinstall' (error code -1)
Failure in execution (rc=-1, 0, Inappropriate ioctl for device) for command /etc/init.d/ohasd deinstall
2013/08/17 15:35:05 CLSRSC-337: Successfully deconfigured Oracle Restart stack

Hopefully this deconfig and stop stack worked:

[root@pdb12c install]# ps -ef|grep has
root      5679  2968  0 15:35 pts/0    00:00:00 grep has

Stack is down so let's re-run root.sh

[root@pdb12c grid]# ./root.sh
Performing root user operation for Oracle 12c 

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/12.1.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE 
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE 
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node pdb12c successfully pinned.
2013/08/17 15:35:41 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'
pdb12c     2013/08/17 15:35:58     /u01/app/oracle/product/12.1.0/grid/cdata/pdb12c/backup_20130817_153558.olr
2013/08/17 15:37:50 CLSRSC-327: Successfully configured Oracle Grid Infrastructure for a Standalone Server

Let’s verify this successful OLR initialization by looking at ohasd.log:

2013-08-17 15:35:46.993: [ default][2945685056] OHASD Daemon Starting. Command string :reboot
2013-08-17 15:35:46.993: [ default][2945685056] OHASD params []
2013-08-17 15:35:46.994: [ default][2945685056]
2013-08-17 15:35:46.994: [ default][2945685056] Initializing OLR
2013-08-17 15:35:46.994: [ default][2945685056]proa_init: OLR Abstraction layer initialization. Bootlevel:[1]
2013-08-17 15:35:46.998: [  OCRAPI][2945685056]a_init: Successfully initialized the patch management context.
2013-08-17 15:35:46.998: [  OCRAPI][2945685056]a_init: Successfully initialized the OLR specific states.
2013-08-17 15:35:46.998: [  OCRAPI][2945685056]a_init:13: Clusterware init successful
2013-08-17 15:35:46.998: [  OCRAPI][2945685056]a_init:15: Successfully initialized the Cache layer.
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: opening OCR device(s)
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: Successfully opened the non-ASM locations if configured.
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: for disk 0 (/u01/app/oracle/product/12.1.0/grid/cdata/localhost/pdb12c.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (1), lsn (1)
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: my id set: (799232119, 1028247821, 0, 0, 0)
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: 1st set: (799232119, 1028247821, 0, 0, 0)
2013-08-17 15:35:46.998: [  OCRRAW][2945685056]proprioo: 2nd set: (0, 0, 0, 0, 0)
2013-08-17 15:35:46.999: [  OCRRAW][2945685056]proprinit: Successfully initialized the I/O module (proprioini).
2013-08-17 15:35:46.999: [  OCRRAW][2945685056]proprinit: Successfully initialized the backend handle (propribctx).
2013-08-17 15:35:46.999: [  OCRAPI][2945685056]proa_init: Successfully initialized the Storage Layer.
2013-08-17 15:35:47.000: [  OCRAPI][2945685056]proa_init: Successfully initlaized the Messaging Layer.
2013-08-17 15:35:47.003: [  OCRAPI][2945685056]a_init:18: Thread init successful
2013-08-17 15:35:47.003: [  OCRAPI][2945685056]a_init:19: Client init successful
2013-08-17 15:35:47.003: [  OCRAPI][2945685056]a_init:21: OLR init successful. Init Level [1]

			<--- this is a good sign, but we still need to ensure OHASD starts up and initializes the CRS Policy Engine:
 
2013-08-17 15:35:47.003: [ default][2945685056] Checking version compatibility...
2013-08-17 15:35:47.003: [ default][2945685056]clsvactversion:4: Retrieving Active Version from local storage.
2013-08-17 15:35:47.004: [ default][2945685056] Version compatibility check passed:  Software Version: 12.1.0.1.0 Release Version: 12.1.0.1.0 Active Version: 12.1.0.1.0
2013-08-17 15:35:47.004: [ default][2945685056] Running mode check...
2013-08-17 15:35:47.004: [ default][2945685056] OHASD running as the Non-Privileged user

<--- this is also a good sign, getting there...

2013-08-17 15:35:47.190: [   CRSPE][2889111296] {0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running]
			<--- PE is running, getting there some more...

2013-08-17 15:35:47.190: [   CRSPE][2889111296] {0:0:2} Processing pending join requests: 1
2013-08-17 15:35:47.190: [UiServer][2413815552] UI comms listening for GIPC events.
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} Special Value map for : pdb12c
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} CRS_CSS_NODENAME=pdb12c
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} CRS_CSS_NODENUMBER=0
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} CRS_CSS_NODENUMBER_PLUS1=1
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} CRS_HOME=/u01/app/oracle/product/12.1.0/grid
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} Server Attributes for : pdb12c
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} ACTIVE_CSS_ROLE=UNAVAILABLE
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} CONFIGURED_CSS_ROLE=
2013-08-17 15:35:47.191: [   CRSPE][2889111296] {0:0:2} Server [pdb12c] has been registered with the PE data model
2013-08-17 15:35:47.191: [    AGFW][2899617536] {0:0:2} Agfw Proxy Server received the message: PE_HANDHSAKE[Proxy] ID 20487:33
2013-08-17 15:35:47.191: [    AGFW][2899617536] {0:0:2} Received handshake message from PE.

		<--- PE is initialized, getting even closer...

2013-08-17 15:35:47.192: [    AGFW][2899617536] {0:0:2} Added resource type: application
2013-08-17 15:35:47.192: [    AGFW][2899617536] {0:0:2} Added resource type: cluster_resource
2013-08-17 15:35:47.192: [    AGFW][2899617536] {0:0:2} Added resource type: generic_application
2013-08-17 15:35:47.192: [    AGFW][2899617536] {0:0:2} Added resource type: local_resource

I think the basic GI stack is there, let's verify:

[root@pdb12c ohasd]# ps -ef|grep oracle
root      3241  3232  0 14:23 pts/2    00:00:00 su - oracle
oracle    3242  3241  0 14:23 pts/2    00:00:00 -bash
oracle    5959     1  0 15:35 ?        00:00:02 /u01/app/oracle/product/12.1.0/grid/bin/ohasd.bin reboot
oracle    6067     1  0 15:35 ?        00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/oraagent.bin
oracle    6080     1  0 15:35 ?        00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/evmd.bin
oracle    6154  6080  0 15:35 ?        00:00:00 /u01/app/oracle/product/12.1.0/grid/bin/evmlogger.bin -o /u01/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.log


Check crsctl stat res

[root@pdb12c ohasd]# crsctl stat res -init -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
               OFFLINE OFFLINE      pdb12c                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       pdb12c                   STABLE
--------------------------------------------------------------------------------
Odd that CSS hasn't started yet, and even odder that ASM is not instantiated.  UGH!!
Now we have to do some stitch work.  Again, all this is unnecessary; it's only because the installer didn't finish its work that we have to do this.
So, let's add the local listener and ASM, in this order!

[oracle@pdb12c grid]$ srvctl add listener

[oracle@pdb12c grid]$ srvctl config ons
ONS exists: Local port 6100, remote port 6200, EM port 2016
[oracle@pdb12c grid]$ srvctl config listener
Name: LISTENER
Home: /u01/app/oracle/product/12.1.0/grid

[oracle@pdb12c grid]$ srvctl add asm


[oracle@pdb12c grid]$ srvctl config asm
ASM home: /u01/app/oracle/product/12.1.0/grid
Password file: 
ASM listener: LISTENER
Spfile: 
ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++

<--- Notice that ASM has no SPFile associated with it yet, but we can still start it with default parameters

[oracle@pdb12c grid]$ srvctl start asm

[oracle@pdb12c asmca]$ ps -ef|grep asm
root     51260     2  0 15:37 ?        00:00:00 [asmWorkerThread]
root     51261     2  0 15:37 ?        00:00:00 [asmWorkerThread]
root     51262     2  0 15:37 ?        00:00:00 [asmWorkerThread]
root     51263     2  0 15:37 ?        00:00:00 [asmWorkerThread]
root     51264     2  0 15:37 ?        00:00:00 [asmWorkerThread]
oracle   53092     1  0 16:26 ?        00:00:00 asm_pmon_+ASM
oracle   53094     1  0 16:26 ?        00:00:00 asm_psp0_+ASM
oracle   53096     1  3 16:26 ?        00:00:01 asm_vktm_+ASM
oracle   53100     1  0 16:26 ?        00:00:00 asm_gen0_+ASM
oracle   53102     1  0 16:26 ?        00:00:00 asm_mman_+ASM
oracle   53106     1  0 16:26 ?        00:00:00 asm_diag_+ASM
oracle   53108     1  0 16:26 ?        00:00:00 asm_dia0_+ASM
oracle   53110     1  0 16:26 ?        00:00:00 asm_dbw0_+ASM
oracle   53112     1  0 16:26 ?        00:00:00 asm_lgwr_+ASM
oracle   53115     1  0 16:26 ?        00:00:00 asm_ckpt_+ASM
oracle   53117     1  0 16:26 ?        00:00:00 asm_smon_+ASM
oracle   53119     1  0 16:26 ?        00:00:00 asm_lreg_+ASM
oracle   53121     1  0 16:26 ?        00:00:00 asm_rbal_+ASM
oracle   53123     1  0 16:26 ?        00:00:00 asm_gmon_+ASM
oracle   53125     1  0 16:26 ?        00:00:00 asm_mmon_+ASM
oracle   53127     1  0 16:26 ?        00:00:00 asm_mmnl_+ASM


Now run ASMCA to create the disk group

PDB12c 2013 08 17 16 34 15


Let's check crsctl stat res again

[oracle@pdb12c asmca]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       pdb12c                   STABLE
ora.PDBDATA.dg
               ONLINE  ONLINE       pdb12c                   STABLE
ora.asm
               ONLINE  ONLINE       pdb12c                   Started,STABLE
ora.ons
               OFFLINE OFFLINE      pdb12c                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       pdb12c                   STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       pdb12c                   STABLE
--------------------------------------------------------------------------------

Cool, we now have the majority of the GI stack started!!  Also notice that the PDBDATA disk group resource got created automagically when the disk group was created and mounted (this was the same in 11gR2)

But I wonder how cssd got placed in the ONLINE state, since we didn't change that state directly.
The answer has to do with the startup dependencies of the ASM resource.  In this case it's a "hard, pull-up" dependency on cssd; ASM also has a "weak" dependency on the listener, so that got onlined too.  We can see that from crsctl stat res ora.asm -p

[oracle@pdb12c asmca]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
….
….
START_DEPENDENCIES=hard(ora.cssd) weak(ora.LISTENER.lsnr)

….
….

STOP_DEPENDENCIES=hard(ora.cssd)



Note ASM is using the basic/generic init.ora file.  So let's create a real usable one:

cat $HOME/init+ASM.ora
sga_target=1536M
asm_diskgroups='PDBDATA'
asm_diskstring='/dev/sd*'
instance_type='asm'
remote_login_passwordfile='EXCLUSIVE'

SQL> create spfile='+PDBDATA' from pfile='$HOME/init+ASM.ora'
  2  ;

File created.
To really validate that the GI stack is up and running and that ASM is cool, just for fun let's create an ACFS filesystem.  This validates the communication between the layers of the HAS/CRS stack, ASM, as well as the Policy Engine:

PDB12c 2013 08 17 16 37 48

...

PDB12c 2013 08 17 16 39 49

[root@pdb12c ohasd]# mkdir -p /u01/app/oracle/acfsmounts/pdbdata_pdbvol1
[root@pdb12c ohasd]# 
[root@pdb12c ohasd]# /bin/mount -t acfs /dev/asm/pdbvol1-339 /u01/app/oracle/acfsmounts/pdbdata_pdbvol1

[root@pdb12c ohasd]# df -ha
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_12crac1-lv_root
                       19G   12G  5.8G  67% /
proc                     0     0     0   -  /proc
sysfs                    0     0     0   -  /sys
devpts                   0     0     0   -  /dev/pts
tmpfs                 1.4G  637M  792M  45% /dev/shm
/dev/sda1             485M   55M  405M  12% /boot
/dev/asm/pdbvol1-339  1.0G   41M  984M   4% /u01/app/oracle/acfsmounts/pdbdata_pdbvol1

And there you have it!  A re-stitched GI stack.  Again, I hope nobody has to go through that, but now at least you know!!

Now the next step is to create the PDB database over ASM!!

A 12c Flex Cluster Installation Walk-thru

Let’s start the 12c Flex install, with the execution of the traditional runInstaller script

Rac12c1 2013 07 30 22 55 32

Let’s choose Install 12c Flex Cluster

Rac12c1 2013 07 30 22 56 04

And yes, we're picking English… since we live in English-land

Rac12c1 2013 07 30 22 56 29

Now let's specify the SCAN information. And yes, we'll need to define GNS, and since we're using GNS, we'll need to get DNS domain delegation set up. In our case we have us.viscosityna-test.com as the sub-domain.

Rac12c1 2013 07 30 22 57 07

This is the new stuff!! We define which nodes in the cluster will be Hub nodes and which will be Leaf nodes. Note, you'll occasionally hear the terms Leaf and Rim used interchangeably. It's just historical!
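Once the cluster is up, the configured and active node roles can be checked, and even changed, with crsctl. A hedged example (changing a role only takes effect after the stack restarts on that node):

[oracle@rac01 ~]$ crsctl get node role config -all
[root@rac02 ~]# crsctl set node role leaf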

Rac12c1 2013 07 30 22 57 42

Let's specify the interfaces. You have all seen this screen before, but it now has a small twist: you can specify a separate "ASM & Private" network.

Rac12c1 2013 07 30 22 58 32

Now the validation!

Rac12c1 2013 07 30 23 01 02

This step is new too. You have the option to configure the Grid Infrastructure Repository, which is used for storing Cluster Health Monitor (CHM) data. In 11gR2 this was stored in a Berkeley DB database and was created by default. Now this option allows users to specify an Oracle database to store the CHM data. This database is a single-instance database named MGMTDB by default. It is an internal CRS resource with HA-failover capabilities. I'll cover this topic in more detail later, but I should mention that this is the only opportunity to create this repository; i.e., you have to uninstall/reinstall to get this GI repository option.
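Once the install completes, the repository shows up as the ora.mgmtdb resource. A couple of commands I'd use to check on it (listed from memory, so verify the flags on your version):

[oracle@rac01 ~]$ srvctl status mgmtdb
[oracle@rac01 ~]$ srvctl config mgmtdb
[oracle@rac01 ~]$ crsctl stat res ora.mgmtdb -t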

Rac12c1 2013 07 30 23 00 38

Now the fun stuff! Let’s create the ASM disk group. Note, that if you are configuring a GI repo, then you’ll need a minimum of 5GB disk (for testers and laptop folks).

Rac12c1 2013 07 31 17 47 57

Now define passwords

Rac12c1 2013 07 31 17 48 32

Verify…and yes we’re cool w/ the passwords

Rac12c1 2013 07 31 17 49 00

No IPMI

Rac12c1 2013 07 31 17 49 32

Now define the group definitions

Rac12c1 2013 07 31 17 50 06

Where we gonna put the Oracle Home and Oracle Base

Rac12c1 2013 07 31 17 52 41

Now this is really cool. I can specify the root password/credentials for downstream root-required actions.

Rac12c1 2013 07 31 18 02 05

Gotta fix some things; execute the fixup.sh script

Rac12c1 2013 07 31 21 43 21

Now off to the races !!!

Rac12c1 2013 07 31 23 33 54

Secret Agentman – Clusterware Processes and Agents

Some of you may be old enough to recall the song "Secret Agent Man" by Johnny Rivers:
There’s a man who leads a life of danger.
To everyone he meets he stays a stranger.
With every move he makes another chance he takes.
Odds are he won’t live to see tomorrow.

Well that’s how I felt when I was at a customer site recently (well maybe not exactly).

They recently had an issue with a node eviction. That in itself deserves a blog post later.
But anyway, the DBA was asking, "What are all these Clusterware processes, and how do you even traverse through all the log files?"
After 15 minutes of discussion, I realized I had thoroughly confused him.
So I suggested we start from the beginning and first try to understand Oracle Clusterware processes, agents, and their relationships, then draw up some pictures. Maybe then we'd have a better feel for the hierarchy.

Let's start with the grand master himself, HAS (or OHASD).

OHASD manages the Clusterware daemons, including CRSD. We'll discuss CRSD resources and startup in another blog. For now, just keep in mind that OHASD starts up CRSD (at some point later in the stack); once CRSD is started, it manages the remaining startup of the stack.

The -init flag is needed for crsctl to operate on OHASD resources, e.g. crsctl stat res ora.crsd -init.
To list resources started by CRSD, you would issue just "crsctl stat res".

OHASD resource startup order
ora.gipcd
ora.gpnpd -> Starts ora.mdnsd because of dependency
ora.cssd -> Starts ora.diskmon and ora.cssdmonitor because of dependency
ora.ctssd
ora.evmd
ora.crsd

OHASD has agents that work for him. These agents are oraagent, orarootagent, cssdagent, and cssdmonitoragent. Each agent manages and handles very specific OHASD resources, and each agent runs as a specific user (root or the clusterware user).
For example, the ora.cssd resource is started and monitored by the cssdagent (running as root), whereas ora.asm is handled by the oraagent (running as the clusterware user).

All agent log files, as well as the other OHASD resource log files, are in the CRS $ORACLE_HOME/log/hostname/agent/{ohasd|crsd}/agentname_owner/agentname_owner.log or CRS $ORACLE_HOME/log/hostname/resource_name/resource_name.log, respectively.

To find out which agent is associated with a resource issue the following:

[root@rhel59a log]# crsctl stat res ora.cssd -init -p |grep "AGENT_FILENAME"
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%

For example, for CRSD we find:

[root@rhel59a bin]# crsctl stat res ora.crsd -init -p |grep "AGENT_FILENAME"
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%

Note, an agent log file can have log messages for more than one resource, since those resources are managed by the same agent.
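If you want a quick map of every OHASD-managed resource to its agent, a small shell loop over the crsctl output does the trick (just a sketch; run it as the grid owner):

[root@rhel59a ~]# for r in $(crsctl stat res -init | awk -F= '/^NAME=/ {print $2}')
> do
>   printf "%-22s " "$r"
>   crsctl stat res "$r" -init -p | grep '^AGENT_FILENAME='
> done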

When I debug a resource, I start by going down the following Clusterware log file tree:
1. Start with Clusterware alert.log

2. Depending on the resource (managed by OHASD or CRSD), I look at $ORACLE_HOME/log/hostname/ohasd/ohasd.log or $ORACLE_HOME/log/hostname/crsd/crsd.log

3. Then agent log file, as I mentioned above

4. Then finally the resource's log file itself (that'll be listed in the agent log)

Item #2 requires a little more discussion and will be the topic of the next post

New [rarely covered and rarely discussed] 12c features

With busy weeks of IOUG and other conferences coming up, we have little time to blog…. So, in the coming weeks, I’m just going to do some “baby” blogs; i.e., some quick tips and new features

Here's a new 12c feature that simplifies snapshotting databases

Snapshot Optimized Recovery

Many of you take snapshot copies of databases, either via server-side snapshot tools or storage-level snapshots. Usually this required a cold database or putting the database in hot-backup mode; however, there are downsides to both options.

In Oracle 12c, third-party snapshots that meet the following requirements can be taken without placing the database in backup mode:

  • The database is crash consistent at the point of the snapshot.
  • Write ordering is preserved for each file within the snapshot.
  • The snapshot stores the time at which it was completed.

The new RECOVER ... SNAPSHOT TIME syntax is introduced to recover a snapshot to a consistent point, without any additional manual procedures for point-in-time recovery needs.
The recovery is performed in a single step, and it can be either to the current time or to a point in time after the snapshot was taken.

There is a bit of upfront overhead, though; e.g., additional redo logging and a complete database checkpoint.
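For reference, the recovery syntax looks roughly like the following. This is a sketch from memory (the timestamps are made up, assuming a storage snapshot completed at 10:00), so check the RMAN reference for the exact form:

RMAN> recover database snapshot time "to_date('22-09-2013 10:00:00','dd-mm-yyyy hh24:mi:ss')";

RMAN> recover database until time "to_date('22-09-2013 12:00:00','dd-mm-yyyy hh24:mi:ss')"
      snapshot time "to_date('22-09-2013 10:00:00','dd-mm-yyyy hh24:mi:ss')";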

My new Favorite RAC-Clusterware command

My new favorite 12c Oracle Clusterware command is the 'crsctl stat res "resource name" -dependency'

What this command does is provide a dependency tree for the resource in question. It displays startup (the default) or shutdown dependencies.

From this we can understand the pull-up, pushdown, weak, and hard dependencies between clusterware resources 


[oracle@rac02 ~]$ crsctl stat res ora.dagobah.db -dependency
================================================================================
Resource Start Dependencies
================================================================================
---------------------------------ora.dagobah.db---------------------------------
ora.dagobah.db(ora.database.type)->
| type:ora.listener.type[weak:type]
| | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | ora.net1.network(ora.network.type)[hard,pullup]
| | | ora.gns<Resource not found>[weak:global]
| type:ora.scan_listener.type[weak:type:global]
| | ora.scan1.vip(ora.scan_vip.type)[hard,pullup]
| | | ora.net1.network(ora.network.type)[hard,pullup:global]
| | | ora.gns<Resource not found>[weak:global]
| | | type:ora.scan_vip.type[dispersion:type:active]
| | type:ora.scan_listener.type[dispersion:type:active]
| ora.ons(ora.ons.type)[weak:uniform]
| | ora.net1.network(ora.network.type)[hard,pullup]
| ora.gns<Resource not found>[weak:global]
| ora.PDBDATA.dg(ora.diskgroup.type)[weak:global:uniform]
| | ora.asm(ora.asm.type)[hard,pullup:always]
| | | ora.LISTENER.lsnr(ora.listener.type)[weak]
| | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | | | ora.net1.network(ora.network.type)[hard,pullup]
| | | | | ora.gns<Resource not found>[weak:global]
| | | ora.ASMNET1LSNR_ASM.lsnr(ora.asm_listener.type)[hard,pullup]
| | | | ora.gns<Resource not found>[weak:global]
| ora.FRA.dg(ora.diskgroup.type)[hard:global:uniform,pullup:global]
| | ora.asm(ora.asm.type)[hard,pullup:always]
| | | ora.LISTENER.lsnr(ora.listener.type)[weak]
| | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | | | ora.net1.network(ora.network.type)[hard,pullup]
| | | | | ora.gns<Resource not found>[weak:global]
| | | ora.ASMNET1LSNR_ASM.lsnr(ora.asm_listener.type)[hard,pullup]
| | | | ora.gns<Resource not found>[weak:global]
--------------------------------------------------------------------------------

Now the same for shutdown (pushdown) dependencies

[oracle@rac02 ~]$ crsctl stat res ora.dagobah.db -dependency -stop
================================================================================
Resource Stop Dependencies
================================================================================
---------------------------------ora.dagobah.db---------------------------------
ora.dagobah.db(ora.database.type)->
| ora.dagobah.hoth.svc(ora.service.type)[hard:intermediate]
| ora.dagobah.r2d2.svc(ora.service.type)[hard:intermediate]
--------------------------------------------------------------------------------

Why are this command and its output important?  Well, in cases where a particular resource doesn't come up, you may want to understand its relationships with its dependencies.
It's also handy if you are creating your own resource dependencies using the CRS API (known as the CLSCRS API).

CLSCRS is a set of C-based APIs for Oracle Clusterware. The CLSCRS APIs enable you to manage the operation of entities that are managed by Oracle Clusterware. These entities include resources, resource types, servers, and server pools. You can use the APIs to register user applications with Oracle Clusterware so that the clusterware can manage them and maintain high availability. Once an application is registered, you can manage, monitor and query the application's status.  The APIs allow you to use the callbacks for diagnostic logging.
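If you do roll your own resources, the same dependency attributes can be set straight from crsctl (the CLSCRS API exposes the same model programmatically). This is only a sketch; myapp and its action script path are made-up names:

[oracle@rac02 ~]$ crsctl add resource myapp -type cluster_resource \
  -attr "ACTION_SCRIPT=/u01/app/grid/scripts/myapp.scr,START_DEPENDENCIES='hard(ora.PDBDATA.dg) pullup(ora.PDBDATA.dg)',STOP_DEPENDENCIES='hard(ora.PDBDATA.dg)'"

[oracle@rac02 ~]$ crsctl stat res myapp -dependency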

Oh..GUID of PDB World


We have done a lot of talks, sessions, and blogs on Oracle 12c Pluggable Databases (PDB).  The question that seems to come up a lot is: what is this long string of alphanumeric characters embedded in the database file name?

Before we answer that, let's take a trip down memory lane and understand how OMF works and its relationship with this GUID

Oracle Managed Files (OMF) is a feature introduced in 9i to minimize the overhead of managing database files.  Part of this feature is the automatic naming of database files (on successful file creation).
Files are named using system-generated names and placed in the location defined by the DB_CREATE_FILE_DEST init.ora parameter.  In Data Guard configurations, the *_FILE_NAME_CONVERT and STANDBY_FILE_MANAGEMENT parameters assist with converting the names of existing and newly created datafiles when OMF is in use.


OMF really came into widespread use because of ASM.  When ASM is used, OMF is inherently used for file management.  The ASM-OMF directory structure for datafiles traditionally consists of +diskgroup/db_name/DATAFILE/.  A traditional file name in ASM consists of 3 parts: file_name.file_number.incarnation.  For example:


+PDBDATA/YODA/DATAFILE/system.258.8238921091

Note: users are not allowed to directly create files with this naming structure; if you try, you'll get a "not in single-file creation form" error, ORA-15046!


So what does this GUID thingy mean for 12c PDB configurations with ASM-OMF?  In addition to the OMF file naming and directory structure (discussed above), there is an embedded globally unique identifier (GUID).  The GUID is a globally unique, immutable ID assigned to the 12c database at creation time.  Each 12c database, whether it's a non-CDB, CDB, or PDB, has a GUID associated with it.  Thus, with PDB, the directory structure changes for each pluggable database (PDB) in a container database (CDB).


For pre-12c non-CDB databases, the GUID will be created when the database is upgraded to 12c.  


There are so many identifiers for a 12c database; let's make sure we get this straight. There's DBID, CON_ID, CON_UID, and GUID. The DBID is the database ID embedded in the datafile, control file, and redo log headers. The CON_ID is simply a container number within that specific CDB; it starts at 0 (the CDB as a whole), and 1 is the root container (CDB$ROOT). The CON_UID is a unique identifier local to that CDB. The GUID is universal across all CDBs/PDBs.  Keep in mind that we can unplug a PDB from one CDB and plug it into another CDB, so the GUID provides this uniqueness and streamlines portability. More on this later!


The following query shows a couple of these identifiers for the PDBs:

CDB$ROOT@YODA> select CON_ID,DBID,NAME,TOTAL_SIZE from v$pdbs;    
CON_ID      DBID     NAME                     TOTAL_SIZE
---------- ---------- -------------          -------------      
2    4066465523 PDB$SEED                      283115520      
3     483260478 PDBOBI                        917504000      
4     994649056 PDBVADER                              0

Note that the GUID does not change throughout the life of the PDB/non-CDB. The GUID for a particular container/non-CDB can be found by querying V$CONTAINERS or V$PDBS. To assist with identifying which files belong to which PDB, an ASM directory structure of +diskgroup/CDB_name/GUID/DATAFILE/ is used for PDBs. This is one of the main reasons a PDB should be cloned (cloning generates a new GUID) rather than copying the same PDB to multiple locations and plugging it into multiple CDBs.
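A minimal cloning sketch, assuming PDBOBI is open read-only and OMF/ASM is in use so no file_name_convert is needed (the clone name is made up):

CDB$ROOT@YODA> alter pluggable database pdbobi open read only force;
CDB$ROOT@YODA> create pluggable database pdbclone from pdbobi;
CDB$ROOT@YODA> select name, guid from v$pdbs where name in ('PDBOBI','PDBCLONE');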

See the example below, for GUID:

 
CDB$ROOT@YODA> select name, con_id from v$datafile order by con_id
NAME                                                                                    CON_ID
----------------------------------------------------------------------------------- ----------
+PDBDATA/YODA/DATAFILE/undotbs1.260.823892155                                                1
+PDBDATA/YODA/DATAFILE/sysaux.257.823892063                                                  1
+PDBDATA/YODA/DATAFILE/system.258.823892109                                                  1
+PDBDATA/YODA/DATAFILE/users.259.823892155                                                   1
+PDBDATA/YODA/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/system.271.823892297                 2
+PDBDATA/YODA/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/sysaux.270.823892297                 2
+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/example.275.823892813                3
+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/users.277.823892813                  3
+PDBDATA/YODA/E456D87DF75E6553E043EDFE10AC71EA/DATAFILE/obiwan.284.824683339                 3
+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/system.276.823892813                 3
+PDBDATA/YODA/DD7D8C1D4C234B38E04325AAE80AF577/DATAFILE/sysaux.274.823892813                 3
+PDBDATA/YODA/E46B24386A131109E043EDFE10AC6E89/DATAFILE/sysaux.279.823980769                 4
+PDBDATA/YODA/E46B24386A131109E043EDFE10AC6E89/DATAFILE/users.281.823980769                  4
+PDBDATA/YODA/E46B24386A131109E043EDFE10AC6E89/DATAFILE/example.282.823980769                4
+PDBDATA/YODA/E46B24386A131109E043EDFE10AC6E89/DATAFILE/system.280.823980769                 4

That long identifier, "E46B24386A131109E043EDFE10AC6E89", in the OMF name, is the GUID.

Now a similar example from ASM (asmcmd) perspective

ASMCMD [+PDBDATA] > ls -l dagobah
Type           Redund  Striped  Time             Sys  Name
                                                 Y    CONTROLFILE/
                                                 Y    DATAFILE/
                                                 Y    DD7C48AA5A4404A2E04325AAE80A403C/
                                                 Y    F2F952556B226FA5E0430B2910AC1FE5/
                                                 Y    ONLINELOG/
                                                 Y    PARAMETERFILE/
                                                 Y    PASSWORD/
                                                 Y    TEMPFILE/
PASSWORD       UNPROT  COARSE   FEB 21 23:00:00  N    orapwdagobah => +PDBDATA/DAGOBAH/PASSWORD/pwddagobah.293.840152893
PARAMETERFILE  UNPROT  COARSE   APR 09 15:00:00  N    spfiledagobah.ora => +PDBDATA/DAGOBAH/PARAMETERFILE/spfile.310.840153477

ASMCMD [+PDBDATA/TATOOINE/F2F7CA2C1F1F0593E0430A2910AC246A/datafile] > ls
SYSAUX.258.840146475
SYSTEM.286.840146475

Let's look at two examples of PDB creation and the GUID.
Example 1. This example illustrates PDB creation and its GUID.


SQL> CREATE PLUGGABLE DATABASE pdbhansolo admin user hansolo identified by hansolo roles=(dba);

Pluggable database created.

SQL> select * from v$pdbs ;


    CON_ID       DBID    CON_UID GUID                             NAME      OPEN_MODE  RES     OPEN_TIME             CREATE_SCN TOTAL_SIZE
---------- ---------- ---------- -------------------------------- ------------------- ------   -----------           ----------- ------------
         2 4080865680 4080865680 F13EFFD958E24857E0430B2910ACF6FD PDB$SEED   READ ONLY  NO  17-FEB-14 01.01.13.909 PM   1720768  283115520
         3 3403102439 3403102439 F2A023F791663F8DE0430B2910AC37F7 PDBHANSOLO MOUNTED        17-FEB-14 01.27.08.942 PM   1846849          0

Example 2. Here, we are going to plug in (convert) a non-CDB as a PDB. Note that we can see the GUID in the manifest file. In the XML output below (from the manifest XML file), you see the GUID listed for this non-CDB:

<PDB>
  <pdbname>wookie</pdbname>
  <cid>0</cid>
  <byteorder>1</byteorder>
  <vsn>202375168</vsn>
  <dbid>2940614436</dbid>
  <cdbid>2940614436</cdbid>
  <guid>F2BBDF340FFE3E90E0430B2910AC097F</guid>

Now connect to the CDB and create the Wookie PDB from the manifest file

CDB_SQL>CREATE PLUGGABLE DATABASE wookie USING '/home/oracle/wookie_pdb.xml'
  NOCOPY;


Pluggable database created.

SQL> select name, open_mode from v$pdbs;


NAME                           OPEN_MODE
------------------------------ ----------
PDB$SEED                       READ ONLY
WOOKIE                         MOUNTED

SQL> select name, guid from v$pdbs;


NAME                           GUID
------------------------------ --------------------------------
PDB$SEED                       F13EFFD958E24857E0430B2910ACF6FD
WOOKIE                         F2BBDF340FFE3E90E0430B2910AC097F

Here's where the big issue comes in. Many DBAs have mentioned to me that there is no real way to identify the PDB solely by looking at the path name. We do, however, know the name of the CDB it's in, but that's as far as we can go. In order to determine the PDB associated with the file, you would need to log in directly to the PDB (the CDB alone isn't enough) and get the name.

Initially there were some issues with GUID/OMF/ASM when files are copied and a physical standby database is in place. There have been improvements made to the multitenant plugin operation in both the primary and standby environments; e.g., you need at least PSU 2, and then Data Guard will do the right thing when you plug in a new PDB at the primary, after making sure the files are at the standby first. RMAN has also been enhanced so that, when copying files between databases, it recognizes the GUID and acts accordingly when writing the files.

Here are some additional RMAN considerations for GUID management

* If the clone/auxiliary instance being connected to for clone operations is a CDB root, the GUID of the RMAN target database is used to determine the directory structure to write the datafiles. Connect to the CDB root as the RMAN clone/auxiliary instance when the source database should be a 12c non-CDB or PDB that is going to be migrated and plugged into a remote CDB as a brand new PDB. This will ensure that the files copied by RMAN will be written to the GUID directory of source database for the migration.
* If the clone/auxiliary instance being connected to for clone operations is a PDB, the GUID of the auxiliary PDB will be used to determine the directory structure to write the datafiles. Connect to the destination PDB as the RMAN clone auxiliary instance when the source database is a 12c non-CDB or PDB that requires a cross platform full transportable database import and the data and files will be imported into an existing PDB. This will ensure the files copied by RMAN will be written to the GUID directory of the PDB target database for the migration.

* The enhancements for multitenant plugin operations with OMF simplify the process extensively. The manifest generated on the source non-CDB/PDB contains all of the filenames and characteristics about each file. Normally, the plugin operation would use the filenames in the manifest and look for those exact filenames or partially converted (using the SOURCE_FILE_NAME_CONVERT clause on the CREATE PLUGGABLE DATABASE....USING...statement). Since all filenames will be different when copied to a new location when OMF is used, you would need to specify full directory and filename convert pairs for EACH file being plugged in. By using the SOURCE_FILE_DIRECTORY clause on the CREATE PLUGGABLE DATABASE....USING... statement in the plugin operation, the filename in the manifest is ignored and the plugin looks for a file to match additional characteristics about the file stored in the manifest, looking for the file in the SOURCE_FILE_DIRECTORY location.
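Tying that back to the earlier wookie example, a plugin using SOURCE_FILE_DIRECTORY might look something like this (a sketch; the ASM path is illustrative):

CDB_SQL> CREATE PLUGGABLE DATABASE wookie USING '/home/oracle/wookie_pdb.xml'
  SOURCE_FILE_DIRECTORY = '+PDBDATA/WOOKIE/DATAFILE'
  NOCOPY;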

12c New Feature – Multiple Flash Device Support for Database Smart Flash Cache

Not sure anyone has seen or knows about this 12c feature; it allows the grouping/aggregation of multiple flash devices for Database Smart Flash Cache. I mentioned it during an SSD discussion at IOUG.

Essentially, this feature allows a database instance to access and combine multiple flash devices for Database Smart Flash Cache without the need for a volume manager. This way you no longer need to incur the expense or management overhead of a logical volume manager in order to use multiple flash devices for Database Smart Flash Cache.

For example, with flash devices /dev/sdj, /dev/sdk, and /dev/sdl, you can set the following init.ora parameters for aggregation:

DB_FLASH_CACHE_FILE = /dev/sdj, /dev/sdk, /dev/sdl

DB_FLASH_CACHE_SIZE = 32G, 32G, 64G

The V$FLASHFILESTAT view can be used to determine the cumulative latency and read counts of each file and compute the average latency.

You can use ALTER SYSTEM to set DB_FLASH_CACHE_SIZE to zero for each flash device you wish to disable. You can also use ALTER SYSTEM to set the size for any disabled flash device back to its original size to reenable it. However, dynamically changing the size of Database Smart Flash Cache is not supported.
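A couple of examples to go with that: computing the average single-block read latency per device from V$FLASHFILESTAT, and disabling the middle device by zeroing its size slot. The column names and the multi-valued ALTER SYSTEM form are from memory, so treat this as a sketch:

SQL> select name, singleblkrds,
  2         round(singleblkrdtim_micro/nullif(singleblkrds,0)) avg_rd_latency_us
  3  from v$flashfilestat;

SQL> alter system set db_flash_cache_size = 32G, 0, 64G;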