Setting Jumbo Frames – Portrait of a Large MTU size

There are cases where we need to ensure that large-packet “address-ability” exists end to end, for example when deploying a NAS or backup server across the network. This means verifying the configuration for non-standard packet sizes, e.g., an MTU of 9000.

Setting the MTU can be done by editing the configuration script for the relevant interface in /etc/sysconfig/network-scripts/. In our example, we will use the eth1 interface, thus the file to edit would be ifcfg-eth1.

Add a line to specify the MTU, for example:
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.20.2
NETMASK=255.255.255.0
MTU=9000

Once the MTU is set in the file, run ifdown eth1 followed by ifup eth1.
An ifconfig eth1 will then show whether it is set correctly:

eth1 Link encap:Ethernet HWaddr 00:0F:EA:94:xx:xx
inet addr:192.168.20.2 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20f:eaff:fe91:407/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:141567 errors:0 dropped:0 overruns:0 frame:0
TX packets:141306 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:101087512 (96.4 MiB) TX bytes:32695783 (31.1 MiB)
Interrupt:18 Base address:0xc000
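
The same check can be scripted. The sketch below is only an illustration: it assumes a Linux system that exposes the interface MTU at /sys/class/net/<interface>/mtu, and the function name check_mtu and the SYSFS_NET override are hypothetical names introduced here, not part of any tool.

```shell
# Compare an interface's MTU (read from sysfs) with the expected value.
# SYSFS_NET is an assumed override so the function can be tried without real hardware.
check_mtu() {
    iface=$1
    expected=$2
    mtu_file="${SYSFS_NET:-/sys/class/net}/$iface/mtu"
    [ -r "$mtu_file" ] || { echo "cannot read $mtu_file" >&2; return 2; }
    actual=$(cat "$mtu_file")
    if [ "$actual" -eq "$expected" ]; then
        echo "$iface: MTU $actual (ok)"
    else
        echo "$iface: MTU $actual (expected $expected)"
        return 1
    fi
}
```

For the interface in this example you would call it as: check_mtu eth1 9000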

To validate end-to-end handling of MTU 9000 packets

Execute the following on Linux systems:

ping -M do -s 8972 [destinationIP]
For example: ping -M do -s 8972 datadomain.viscosityna.com

The reason for 8972 is that on Linux/Unix systems the -s value is the ICMP payload size only. It does not include the 28 bytes of headers: the ICMP header (8 bytes) plus the IP header (20 bytes). Ping does not use TCP. Therefore, take 9000 and subtract 28 to get 8972.
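
The arithmetic can be written down as a tiny helper. This is a sketch that assumes IPv4 with a 20-byte IP header (no options) and the 8-byte ICMP echo header; ping_payload is a name made up for illustration.

```shell
# Largest ping payload (-s value) that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header without options plus the 8-byte ICMP header.
ping_payload() {
    mtu=$1
    echo $(( mtu - 20 - 8 ))
}

ping_payload 9000   # prints 8972
ping_payload 1500   # prints 1472
```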

[root@racnode01]# ping -s 8972 -M do datadomain.viscosityna.com
PING datadomain.viscosityna.com. (192.168.20.32) 8972(9000) bytes of data.
8980 bytes from racnode1.viscosityna.com. (192.168.20.2): icmp_seq=0 ttl=64 time=0.914 ms
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

To illustrate what happens when the packet does not fit, I can set a larger packet size in the ping (8993). The packet would need to be fragmented, but the Don’t Fragment bit forbids it, so you will see “Frag needed and DF set”. In this example, the ping command uses “-s” to set the packet size, and “-M do” sets the Don’t Fragment (DF) bit.

[root@racnode01]# ping -s 8993 -M do datadomain.viscosityna.com
PING datadomain.viscosityna.com. (192.168.20.32) 8993(9001) bytes of data.
From racnode1.viscosityna.com. (192.168.20.2) icmp_seq=0 Frag needed and DF set (mtu = 9000)

By adjusting the packet size, you can figure out what the MTU for the link is. This will be the lowest MTU allowed by any device in the path, e.g., the switch, the source or target node, or anything else in between.
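
Rather than adjusting the size by hand, the search can be automated with a binary search. The following is a minimal sketch, not a hardened tool: it assumes the iputils ping options -M do, -c, -W and -s, and the names probe, find_path_mtu and the PROBE override are hypothetical helpers introduced here.

```shell
# Default probe: send one ping with the Don't Fragment bit set.
# Succeeds when a payload of the given size ($2) reaches the host ($1).
probe() {
    ping -M do -c 1 -W 2 -s "$2" "$1" >/dev/null 2>&1
}

# Binary search for the largest payload between lo and hi that gets through.
# Prints the payload and the implied MTU (payload + 28 bytes of headers).
# Returns 1 if even the lower bound fails (host down, or lo already too large).
find_path_mtu() {
    host=$1 lo=$2 hi=$3
    "${PROBE:-probe}" "$host" "$lo" || return 1
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "${PROBE:-probe}" "$host" "$mid"; then
            lo=$mid            # mid fits: the answer is mid or larger
        else
            hi=$(( mid - 1 ))  # mid is too big: the answer is smaller
        fi
    done
    echo "payload=$lo mtu=$(( lo + 28 ))"
}
```

A typical call would be: find_path_mtu datadomain.viscosityna.com 1000 9000. Each probe waits up to two seconds, so a search over this range takes at most a dozen or so pings.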

Finally, another way to verify the MTU size in use is the command ‘netstat -a -i -n’ (the MTU column should show 9000 when you are testing Jumbo Frames).

Secret Agentman – Clusterware Processes and Agents

Some of you may be old enough to recall the song “Secret Agent Man” by Johnny Rivers:
There’s a man who leads a life of danger.
To everyone he meets he stays a stranger.
With every move he makes another chance he takes.
Odds are he won’t live to see tomorrow.

Well that’s how I felt when I was at a customer site recently (well maybe not exactly).

The customer recently had an issue with a node eviction. That in itself deserves a blog post later.
But anyway, he was asking, “What are all these Clusterware processes, and how do you even traverse through all the log files?”
After 15 minutes of discussion, I realized I had thoroughly confused him.
So I suggested we start from the beginning and first try to understand the Oracle Clusterware processes, agents, and relationships, then draw up some pictures. Maybe then we’ll have a better feel for the hierarchy.

Let’s start with the grand master himself, HAS (or OHASD).

OHASD manages clusterware daemons, including CRSD. We’ll discuss CRSD resources and startup in another blog post. For now, just keep in mind that OHASD starts CRSD (at some point later in the stack); once CRSD is started, it manages the remaining startup of the stack.

The “-init” flag is needed for crsctl to operate on OHASD resources, e.g. crsctl stat res ora.crsd -init
To list resources started by CRSD, you would issue just “crsctl stat res”.

OHASD resource startup order
ora.gipcd
ora.gpnpd -> Starts ora.mdnsd because of dependency
ora.cssd -> Starts ora.diskmon and ora.cssdmonitor because of dependency
ora.ctssd
ora.evmd
ora.crsd
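
To walk this list in order, a small loop over the resource names is enough. This is a sketch that assumes crsctl is on the PATH and is run on a cluster node; check_init_resources and the CRSCTL override are hypothetical names added for illustration.

```shell
# Query each OHASD-managed resource in the startup order listed above.
# CRSCTL is an assumed override, e.g. CRSCTL=echo for a dry run.
check_init_resources() {
    for res in ora.gipcd ora.gpnpd ora.cssd ora.ctssd ora.evmd ora.crsd; do
        "${CRSCTL:-crsctl}" stat res "$res" -init
    done
}
```

Setting CRSCTL=echo before calling check_init_resources prints the six commands instead of running them, which is handy for checking the order first.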

OHASD has agents that work for him. These agents are oraagent, orarootagent, cssdagent and cssdmonitor. Each agent manages and handles very specific OHASD resources, and each agent runs as a specific user (root or the clusterware user).
For example, the ora.cssd resource is started and monitored by the cssdagent (running as root), whereas ora.asm is handled by the oraagent (running as the clusterware user).

Agent log files are in the CRS $ORACLE_HOME/log/hostname/agent/{ohasd|crsd}/agentname_owner/agentname_owner.log, and the log files of other OHASD resources are in the CRS $ORACLE_HOME/log/hostname/resource_name/resource_name.log.
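
As a quick illustration of the agent log layout, the helper below assembles the path from its parts. It is a sketch only: agent_log_path is a made-up name, and the home directory and host name in the usage line are example values, not taken from a real system.

```shell
# Build the agent log path from the layout described above.
# parent is "ohasd" or "crsd"; agent and owner form the agentname_owner part.
agent_log_path() {
    crs_home=$1 host=$2 parent=$3 agent=$4 owner=$5
    echo "$crs_home/log/$host/agent/$parent/${agent}_${owner}/${agent}_${owner}.log"
}
```

For example, agent_log_path /u01/grid rac02 ohasd orarootagent root prints /u01/grid/log/rac02/agent/ohasd/orarootagent_root/orarootagent_root.log.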

To find out which agent is associated with a resource, issue the following:

[root@rhel59a log]# crsctl stat res ora.cssd -init -p | grep "AGENT_FILENAME"
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%

For example, for CRSD we find:

[root@rhel59a bin]# crsctl stat res ora.crsd -init -p | grep "AGENT_FILENAME"
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%

Note that an agent log file can have log messages for more than one resource, since those resources are managed by the same agent.
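
If you only want the bare agent name, it can be cut out of the AGENT_FILENAME value. The sketch below assumes the value always has the %CRS_HOME%/bin/name%CRS_EXE_SUFFIX% shape seen in the two examples above; agent_name is a hypothetical helper name.

```shell
# Reads "crsctl stat res <name> -init -p" output on stdin and prints the agent name.
# Keeps only the part between the last "/" and the %CRS_EXE_SUFFIX% marker.
agent_name() {
    sed -n 's|^AGENT_FILENAME=.*/\([^/%]*\)%CRS_EXE_SUFFIX%$|\1|p'
}
```

Usage: crsctl stat res ora.cssd -init -p | agent_name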

When I debug a resource, I start by going down the following Clusterware log file tree:
1. Start with Clusterware alert.log

2. Depending on the resource (managed by OHASD or CRSD), I look at $ORACLE_HOME/logs//ohasd/ohasd.log or $ORACLE_HOME/logs//crsd/crsd.log

3. Then the agent log file, as mentioned above

4. Then finally the resource’s log file itself (that’ll be listed in the agent log)

Item #2 requires a little more discussion, and will be the topic of our next post.

My new Favorite RAC-Clusterware command

My new favorite 12c Oracle Clusterware command is 'crsctl stat res "resource name" -dependency'.

This command provides a dependency tree for the resource in question. It displays startup dependencies by default, and shutdown dependencies when -stop is added.

From this we can understand the pull-up, pushdown, weak, and hard dependencies between clusterware resources.


[oracle@rac02 ~]$ crsctl stat res ora.dagobah.db -dependency
================================================================================
Resource Start Dependencies
================================================================================
---------------------------------ora.dagobah.db---------------------------------
ora.dagobah.db(ora.database.type)->
| type:ora.listener.type[weak:type]
| | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | ora.net1.network(ora.network.type)[hard,pullup]
| | | ora.gns<Resource not found>[weak:global]
| type:ora.scan_listener.type[weak:type:global]
| | ora.scan1.vip(ora.scan_vip.type)[hard,pullup]
| | | ora.net1.network(ora.network.type)[hard,pullup:global]
| | | ora.gns<Resource not found>[weak:global]
| | | type:ora.scan_vip.type[dispersion:type:active]
| | type:ora.scan_listener.type[dispersion:type:active]
| ora.ons(ora.ons.type)[weak:uniform]
| | ora.net1.network(ora.network.type)[hard,pullup]
| ora.gns<Resource not found>[weak:global]
| ora.PDBDATA.dg(ora.diskgroup.type)[weak:global:uniform]
| | ora.asm(ora.asm.type)[hard,pullup:always]
| | | ora.LISTENER.lsnr(ora.listener.type)[weak]
| | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | | | ora.net1.network(ora.network.type)[hard,pullup]
| | | | | ora.gns<Resource not found>[weak:global]
| | | ora.ASMNET1LSNR_ASM.lsnr(ora.asm_listener.type)[hard,pullup]
| | | | ora.gns<Resource not found>[weak:global]
| ora.FRA.dg(ora.diskgroup.type)[hard:global:uniform,pullup:global]
| | ora.asm(ora.asm.type)[hard,pullup:always]
| | | ora.LISTENER.lsnr(ora.listener.type)[weak]
| | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
| | | | | ora.net1.network(ora.network.type)[hard,pullup]
| | | | | ora.gns<Resource not found>[weak:global]
| | | ora.ASMNET1LSNR_ASM.lsnr(ora.asm_listener.type)[hard,pullup]
| | | | ora.gns<Resource not found>[weak:global]
--------------------------------------------------------------------------------

Now the same for shutdown (pushdown) dependencies:

[oracle@rac02 ~]$ crsctl stat res ora.dagobah.db -dependency -stop
================================================================================
Resource Stop Dependencies
================================================================================
---------------------------------ora.dagobah.db---------------------------------
ora.dagobah.db(ora.database.type)->
| ora.dagobah.hoth.svc(ora.service.type)[hard:intermediate]
| ora.dagobah.r2d2.svc(ora.service.type)[hard:intermediate]
--------------------------------------------------------------------------------

Why is this command and its output important? Well, in cases where a particular resource doesn't come up, you may want to understand its relationships with the resources it depends on.
It also matters if you are creating your own resource dependencies using the CRS API (formally known as the CLSCRS API).
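
When a resource won't start, the hard dependencies are usually the first ones worth checking. The filter below is a rough sketch that assumes the tree layout shown in the output above (one "| " per nesting level, attributes in square brackets); direct_hard_deps is a name invented for this example.

```shell
# Reads "crsctl stat res <name> -dependency" output on stdin and prints the
# direct (first-level) dependencies whose attribute list contains "hard".
direct_hard_deps() {
    awk '/^\| [^| ]/ && /\[[^]]*hard/ {
        sub(/^\| /, "")       # drop the tree prefix
        sub(/[[(<].*$/, "")   # drop the type, not-found marker and attributes
        print
    }'
}
```

Usage: crsctl stat res ora.dagobah.db -dependency | direct_hard_deps. Adding -stop to the crsctl command lists the direct hard stop dependencies in the same way.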

CLSCRS is a set of C-based APIs for Oracle Clusterware. The CLSCRS APIs enable you to manage the operation of entities that are managed by Oracle Clusterware. These entities include resources, resource types, servers, and server pools. You can use the APIs to register user applications with Oracle Clusterware so that the clusterware can manage them and maintain high availability. Once an application is registered, you can manage, monitor and query the application's status. The APIs allow you to use the callbacks for diagnostic logging.