Secret Agentman – Clusterware Processes and Agents

Some of you may old enough to recall the song “Secret Agent Man” from Johnny Rivers:
There’s a man who leads a life of danger.
To everyone he meets he stays a stranger.
With every move he makes another chance he takes.
Odds are he won’t live to see tomorrow.

Well that’s how I felt when I was at a customer site recently (well maybe not exactly).

They recently had a issue with a node eviction. That in itself deserves a blog post later.
But anyways, he was asking “what are all these Clusterware processes and how do you even traverse through all the log files”.
After 15 mins of discussion, I realized I had thoroughly confused him.
So I suggested we start from the beginning and firstly try to understand Oracle Clusterware processes, agents, and relationships, then draw up some pictures. Maybe then we’ll have a better feel for hierarchy.

Let’s start with the grand master himself HAS (or OHASD)

OHASD manages clusterware daemons, including CRSD. We’ll discuss CRSD resources and startp in another blog. For now just keep in mind that OHASD starts up CRSD (at some point later in the stack), once CRSD is started, it manages the remaining startup of the stack

The “-init flag” is needed for crsctl to operate on OHASD resources,e.g. crsctl stat res ora.crsd -init
To list resources started by CRSD you would issue just “crsctl stat res”

OHASD resource startup order
ora.gpnpd -> Starts ora.mdnsd because of dependency
ora.cssd -> Starts ora.diskmon and ora.cssdmonitor because of dependency

OHASD has agents that work for him. These agents are oraagent, orarootagent, cssdagent and cssdmonitoragent. Each agent manages and handles very specific OHASD resources, and each agent runs as a specific user (root or, clusterware user).
For example, the ora.cssd resource (as root user) is started and monitored by the ora.cssdagent, whereas ora.asm is handled by the oraagent (running as cluster ware user).

All agent as well as other OHASD resource log files are in the CRS $ORACLE_HOME/log/hostname/agent/{ohasd|crsd}/agentname_owner/agentname_owner.log or in CRS $ORACLE_HOME/log/hostname/resource_name/resource_name.log; respectively.

To find out which agent is associated with a resource issue the following:

[root@rhel59a log]# crsctl stat res ora.cssd -init -p |grep “AGENT_FILENAME”

For example, for CRSD we find:

[root@rhel59a bin]# crsctl stat res ora.crsd -init -p |grep “AGENT_FILENAME”

Note, an agent log file can have log messages for more than one resources, since those resources are managed by the same agent.

When I debug a resource, I start by going down the following Clusterware log file tree:
1. Start with Clusterware alert.log

2. Depending on the resource (managed by OHASD or CRSD) I look $ORACLE_HOME/logs//ohasd/ohasd.log or $ORACLE_HOME/logs//crsd/crsd.log

3. Then agent log file, as I mentioned above

4. Then finally to the resources log file itself (that’ll be listed in the agent log)

Item #2 requires a little more discussion, and will be the topic of our next discussion