What’s with MGTDB anyways

For those who have either upgraded or fresh-installed 12.1 (12c) Grid Infrastructure stack, will notice a new database instance (-MGMTDB) that was provisioned automagically. So what is this MGMTDB and why do I need this overhead.

Si let’s recap what the DB is and what it does…
Management Database is the central repository to store Cluster Health Monitor, the Grid Infrastructure Management Repository.

MGMT database is a container database (CDB) with one pluggable database (PDB) running. However, this database runs out of the Grid Infrastructure home.
The MGMTDB is a Rac One Node database; i.e., it runs on one node at a time, but because this is Clustered Resource, it can be started or failed over on any node in the cluster. MGMTDB is as a non-critical component of the GI stack (with no “real” hard dependencies). This means that if MGMTDB fails or becomes unavailable, Grid Infrastructure continues running

MGMTDB is configured (subject to change) with 750 MB SGA/325 MB PGA, and 5GB database size. But note that, due to the footprint MGMT’s SGA is not configured for hugepages . Since, this database is dynamically created on install, the OUI installer does not have pre-knowledge of the database that are configured or will be migrated to this cluster, thus in order to avoid any database names conflict the name “-MGMTDB” was chosen (notice the “-“). Note, bypassing MGMTDB installation is only allowed for upgrades to New installations or upgrades to future releases will require MGMTDB to be installed. if MGMTDB is not selected during upgrade, all features (Cluster Health Monitor (CHM/OS) etc) that depend on it will be disabled.

So if you are wondering where the datafiles and other structures are stored for this database. Well they would will be stored in the same diskgroup as OCR and VOTE However, these dtabase files can be migrated into ASM diskgroup post install.

MGMTDB will store a subset of Operating System (OS) performance data for longer term to provide diagnostic information and support intelligent workload management. Performance data (OS metrics similar to, but a subset of Exawatcher) collected by the ‘Cluster Health Monitor’ (CHM) is stored also on local disk, so when not using MGMTDB, CHM data can still be obtained from local disk but intelligent workload management (QoS) will be disabled. onger term MGMTDB will become a key component of the Grid Infrastructure and provide services for important components, because of this MGMTDB will eventually become a mandatory component in future upgrades to releases on Exadata.

See document 1568402.1 for more details.

Secret Agentman – Clusterware Processes and Agents

Some of you may old enough to recall the song “Secret Agent Man” from Johnny Rivers:
There’s a man who leads a life of danger.
To everyone he meets he stays a stranger.
With every move he makes another chance he takes.
Odds are he won’t live to see tomorrow.

Well that’s how I felt when I was at a customer site recently (well maybe not exactly).

They recently had a issue with a node eviction. That in itself deserves a blog post later.
But anyways, he was asking “what are all these Clusterware processes and how do you even traverse through all the log files”.
After 15 mins of discussion, I realized I had thoroughly confused him.
So I suggested we start from the beginning and firstly try to understand Oracle Clusterware processes, agents, and relationships, then draw up some pictures. Maybe then we’ll have a better feel for hierarchy.

Let’s start with the grand master himself HAS (or OHASD)

OHASD manages clusterware daemons, including CRSD. We’ll discuss CRSD resources and startp in another blog. For now just keep in mind that OHASD starts up CRSD (at some point later in the stack), once CRSD is started, it manages the remaining startup of the stack

The “-init flag” is needed for crsctl to operate on OHASD resources,e.g. crsctl stat res ora.crsd -init
To list resources started by CRSD you would issue just “crsctl stat res”

OHASD resource startup order
ora.gpnpd -> Starts ora.mdnsd because of dependency
ora.cssd -> Starts ora.diskmon and ora.cssdmonitor because of dependency

OHASD has agents that work for him. These agents are oraagent, orarootagent, cssdagent and cssdmonitoragent. Each agent manages and handles very specific OHASD resources, and each agent runs as a specific user (root or, clusterware user).
For example, the ora.cssd resource (as root user) is started and monitored by the ora.cssdagent, whereas ora.asm is handled by the oraagent (running as cluster ware user).

All agent as well as other OHASD resource log files are in the CRS $ORACLE_HOME/log/hostname/agent/{ohasd|crsd}/agentname_owner/agentname_owner.log or in CRS $ORACLE_HOME/log/hostname/resource_name/resource_name.log; respectively.

To find out which agent is associated with a resource issue the following:

[root@rhel59a log]# crsctl stat res ora.cssd -init -p |grep “AGENT_FILENAME”

For example, for CRSD we find:

[root@rhel59a bin]# crsctl stat res ora.crsd -init -p |grep “AGENT_FILENAME”

Note, an agent log file can have log messages for more than one resources, since those resources are managed by the same agent.

When I debug a resource, I start by going down the following Clusterware log file tree:
1. Start with Clusterware alert.log

2. Depending on the resource (managed by OHASD or CRSD) I look $ORACLE_HOME/logs//ohasd/ohasd.log or $ORACLE_HOME/logs//crsd/crsd.log

3. Then agent log file, as I mentioned above

4. Then finally to the resources log file itself (that’ll be listed in the agent log)

Item #2 requires a little more discussion, and will be the topic of our next discussion