Expanding /u01 filesystem on Oracle Database Appliance

Recently I had to expand /u01 on our ODA because we were consolidating several new Oracle database systems, each with its own Oracle Home (don’t ask… it’s what the lines of business wanted).

Although a lot of this is just simple Linux LVM stuff, I feel it warrants a blog entry, since folks view an ODA as a different beast 🙂

[root@vna-oda1-0 ~]# pvdisplay

  --- Physical volume ---

  PV Name               /dev/md1

  VG Name               VolGroupSys

  PV Size               446.03 GiB / not usable 29.00 MiB

  Allocatable           yes 

  PE Size               32.00 MiB

  Total PE              14272

  Free PE               7424

  Allocated PE          6848

  PV UUID               Kw1O64-n9j0-4OW7-yUCZ-8FHc-HKug-mOPyP4

[root@vna-oda1-0 ~]# df -m /u01

Filesystem           1M-blocks  Used Available Use% Mounted on


                        100666 64564     30982  68% /u01

[root@vna-oda1-0 ~]# lvdisplay /dev/mapper/VolGroupSys-LogVolU01

  --- Logical volume ---

  LV Path                /dev/VolGroupSys/LogVolU01

  LV Name                LogVolU01

  VG Name                VolGroupSys

  LV UUID                UGVuZY-Xia1-u0Th-TaZ2-JF9Q-FH01-jDVfOn

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2018-05-23 13:14:22 -0400

  LV Status              available

  # open                 1

  LV Size                100.00 GiB

  Current LE             3200

  Segments               1

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           249:40

[root@vna-oda1-0 ~]# df -kh 

Filesystem            Size  Used Avail Use% Mounted on


                       30G  5.8G   23G  21% /

tmpfs                 378G  1.3G  376G   1% /dev/shm

/dev/md0              477M  115M  338M  26% /boot

/dev/sda1             500M  304K  500M   1% /boot/efi


                       59G   36G   21G  64% /opt


                       99G   64G   31G  68% /u01

lvextend --size +50G /dev/VolGroupSys/LogVolU01

  Size of logical volume VolGroupSys/LogVolU01 changed from 100.00 GiB (3200 extents) to 150.00 GiB (4800 extents).

  Logical volume LogVolU01 successfully resized.
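
The extent counts reported by lvextend can be sanity-checked against the pvdisplay numbers before running the command. Here is a minimal Python sketch using the values from the output above; the function names are mine and purely illustrative, they are not part of any LVM tooling.

```python
# Back-of-the-envelope check of the LVM numbers shown above.
PE_SIZE_MIB = 32     # "PE Size" from pvdisplay
FREE_PE = 7424       # "Free PE" from pvdisplay
CURRENT_LE = 3200    # "Current LE" from lvdisplay before the extend


def gib_to_extents(size_gib, pe_size_mib=PE_SIZE_MIB):
    """Number of extents needed to hold size_gib GiB (rounded up)."""
    size_mib = size_gib * 1024
    return -(-size_mib // pe_size_mib)  # ceiling division


def extents_to_gib(extents, pe_size_mib=PE_SIZE_MIB):
    """Size in GiB of a given number of extents."""
    return extents * pe_size_mib / 1024


needed = gib_to_extents(50)  # extents consumed by "+50G"
print("free space in VG (GiB):", extents_to_gib(FREE_PE))
print("extents needed for +50G:", needed)
print("LV extents after extend:", CURRENT_LE + needed)
print("free extents left in VG:", FREE_PE - needed)
```

The result lines up with the lvextend message: 3200 extents plus 1600 new ones gives 4800 extents, or 150 GiB.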

[root@zsc-oda0-0 oak]# 

[root@zsc-oda0-0 oak]# 

[root@zsc-oda0-0 oak]# lvdisplay /dev/mapper/VolGroupSys-LogVolU01

  --- Logical volume ---

  LV Path                /dev/VolGroupSys/LogVolU01

  LV Name                LogVolU01

  VG Name                VolGroupSys

  LV UUID                g2CMK8-kERY-43uu-p0ZD-V5In-otaL-S2zSdX

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2018-05-21 11:04:34 -0400

  LV Status              available

  # open                 1

  LV Size                150.00 GiB

  Current LE             4800

  Segments               2

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           249:25

[root@zsc-oda0-0 oak]# resize2fs /dev/VolGroupSys/LogVolU01

resize2fs 1.43-WIP (20-Jun-2013)

Filesystem at /dev/VolGroupSys/LogVolU01 is mounted on /u01; on-line resizing required

old_desc_blocks = 7, new_desc_blocks = 10

Performing an on-line resize of /dev/VolGroupSys/LogVolU01 to 39321600 (4k) blocks.

The filesystem on /dev/VolGroupSys/LogVolU01 is now 39321600 blocks long.
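
As a quick cross-check, the block count that resize2fs reports should match the new LV size divided by the 4k filesystem block size. A small Python sketch, with a helper name that is my own:

```python
# Cross-check the resize2fs block count against the new LV size.
LV_SIZE_GIB = 150      # "LV Size" after lvextend
FS_BLOCK_SIZE = 4096   # resize2fs reports "(4k) blocks"


def expected_blocks(lv_size_gib, block_size=FS_BLOCK_SIZE):
    """Filesystem blocks needed to fill an LV of lv_size_gib GiB."""
    return lv_size_gib * 1024**3 // block_size


print("expected blocks:", expected_blocks(LV_SIZE_GIB))
```

If the number printed here differs from what resize2fs reports, the filesystem did not grow to fill the whole logical volume.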


ACFS Snapshot – A Walk Through

This blog explores some of the new 12.2 ACFS features.  We will walk through the ACFS snapshot process flow:


[oracle@oracle122 log]$ acfsutil snap info /acfsmounts/acfsdata/

snapshot name:               just_before_load

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_before_load

RO snapshot or RW snapshot:  RO

parent name:                 /acfsmounts/acfsdata/

snapshot creation time:      Wed Mar 22 20:36:09 2017

storage added to snapshot:   8650752   (   8.25 MB )

number of snapshots:  1

snapshot space usage: 8704000  (   8.30 MB )

[oracle@oracle122 log]$ du -sk .

18292  .

[oracle@oracle122 log]$ acfsutil snap create -w -p just_before_load just_about_batch_upload /acfsmounts/acfsdata/

acfsutil snap create: Snapshot operation is complete.

[oracle@oracle122 log]$ acfsutil snap info /acfsmounts/acfsdata

snapshot name:               just_before_load

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_before_load

RO snapshot or RW snapshot:  RO

parent name:                 /acfsmounts/acfsdata

snapshot creation time:      Wed Mar 22 20:36:09 2017

storage added to snapshot:   8650752   (   8.25 MB )

snapshot name:               just_about_batch_upload

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_about_batch_upload

RO snapshot or RW snapshot:  RW

parent name:                 just_before_load

snapshot creation time:      Wed Mar 22 20:42:56 2017

storage added to snapshot:   8650752   (   8.25 MB )
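
When there are more than a couple of snapshots, it can help to turn the snap info text into something scriptable. The following is a rough Python sketch that parses output shaped like the listing above into a dictionary and prints each snapshot with its parent. The parser is my own and assumes the "key: value" layout shown here; it is not an Oracle-provided interface.

```python
SAMPLE = """\
snapshot name:               just_before_load
snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_before_load
RO snapshot or RW snapshot:  RO
parent name:                 /acfsmounts/acfsdata
snapshot creation time:      Wed Mar 22 20:36:09 2017
storage added to snapshot:   8650752   (   8.25 MB )
snapshot name:               just_about_batch_upload
snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_about_batch_upload
RO snapshot or RW snapshot:  RW
parent name:                 just_before_load
snapshot creation time:      Wed Mar 22 20:42:56 2017
storage added to snapshot:   8650752   (   8.25 MB )
"""

# Summary lines that describe the whole file system, not one snapshot.
SUMMARY_KEYS = {"number of snapshots", "snapshot space usage"}


def parse_snap_info(text):
    """Return {snapshot_name: {field: value}} from snap info style text."""
    snaps = {}
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank or unrecognized line
        key, value = (part.strip() for part in line.split(":", 1))
        if key in SUMMARY_KEYS:
            current = None
        elif key == "snapshot name":
            current = snaps.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return snaps


snaps = parse_snap_info(SAMPLE)
for name, fields in snaps.items():
    print(name, fields["RO snapshot or RW snapshot"], "parent:", fields["parent name"])
```

The read-write snapshot shows up with just_before_load as its parent, which is what the -p option asked for.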

[root@oracle122 ~]# acfsutil compress on /acfsmounts/acfsdata/log/wtf

acfsutil compress on: ACFS-05518: /acfsmounts/acfsdata/log/wtf is not an ACFS mount point

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/wtf

The file /acfsmounts/acfsdata/log/wtf is not compressed.

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/nitin

nitin             nitin_compressed 

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/nitin_compressed

Compression Unit size: 32768

Disk storage used:   (  60.00 KB )

Disk storage saved:  (   7.75 MB )

Storage used is 1% of what the uncompressed file would use.

File is not scheduled for asynchronous compression.

[oracle@oracle122 log]$ ls -l lastlog*

-rw-r--r--. 1 oracle oracle 145708 Mar 22 12:07 lastlog

-rw-r--r--. 1 oracle oracle 145708 Mar 23 05:49 lastlog_compressed

[oracle@oracle122 log]$

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/lastlog_compressed

Compression Unit size: 32768

Disk storage used:   (  32.00 KB )

Disk storage saved:  ( 110.29 KB )

Storage used is 22% of what the uncompressed file would use.

File is not scheduled for asynchronous compression.
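
The percentage that compress info prints can be reproduced from the "used" and "saved" figures. A small Python sketch follows; the rounding to the nearest whole percent is my assumption (it happens to match both outputs above), not documented acfsutil behavior.

```python
def percent_of_uncompressed(used_kb, saved_kb):
    """Disk storage used as a whole-number percentage of the uncompressed size."""
    return round(100 * used_kb / (used_kb + saved_kb))


# lastlog_compressed: 32.00 KB used, 110.29 KB saved
print("lastlog_compressed:", percent_of_uncompressed(32.00, 110.29), "%")

# nitin_compressed: 60.00 KB used, 7.75 MB saved (7.75 * 1024 KB)
print("nitin_compressed:", percent_of_uncompressed(60.00, 7.75 * 1024), "%")

# used + saved should come back to the file size shown by ls -l (145708 bytes)
print("lastlog size in KB:", round(145708 / 1024, 2))
```

Note that used plus saved for lastlog_compressed adds up to the 145708 bytes that ls -l reports, since ls shows the logical file size rather than the space consumed on disk.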

If you are curious about the other snapshot options, look below!

[oracle@oracle122 log]$ acfsutil snap -h

 Command Subcmd    Arguments

--------------- --------- ------------------------------------------

snap create    [-w|-r|-c] [-p parent_snap_name] <snap_name> <mountpoint>

snap create    [-w]                      - create a writeable snapshot

snap create    [-r]                      - create a read-only snapshot

snap create                                This is the default behavior

snap create    [-c]                      - create a writable snapshot of a

snap create                                snap duplicate target

snap create    [-p parent_snap_name]     - create a snapshot from a snapshot

snap delete    <snap_name> <mountpoint> - delete a file system snapshot

snap rename    <old_snap_name> <new_snap_name> <mountpoint>

snap rename                             - rename a file system snapshot

snap convert   -w|-r <snap_name> <mountpoint>

snap convert   -w                       - convert to a writeable snapshot

snap convert   -r                       - convert to a read-only snapshot

snap info      [-t] [<snap_name>] <mountpoint>

snap info                    - get information about snapshots

snap info      [-t]          - display family tree starting at next name given

snap info      [<snap_name>] - snapshot name

snap info      <mountpoint>  - mount point

snap remaster  {<snap_name> | -c} <volume_path>

snap remaster                           - make the specified snapshot

snap remaster                             the master file system.  The

snap remaster                             current master and all other

snap remaster                             snapshots will be deleted.

snap remaster                             WARNING: This operation cannot

snap remaster                             be reversed.  Admin privileges

snap remaster                             are required.  The file system

snap remaster                             must be unmounted on all nodes.

snap remaster                             The file system must not have

snap remaster                             Replication running.

snap remaster  [-c]                     - Continue an interrupted snapshot

snap remaster                             remastering.  Use the -c option,

snap remaster                             instead of the <snap_name>, to

 snap remaster                             complete an interrupted

snap remaster                             snapshot remastering.

snap remaster  [-f]                     - Force the snapshot remastering.

 snap duplicate apply     [-b] [-d {0..6}] [<snap_name>] <mountpoint>

 snap duplicate apply     -b                       - maintain backup snapshot

 snap duplicate apply     [-d {0..6}]              - set trace level for debugging

 snap duplicate apply     [<snap_name>]            - target snapshot

 snap duplicate apply     <mountpoint>             - mount point for target site

 snap duplicate create    [-r] [-i oldsnapname] [-d {0..6}] <newsnapname> <mountpoint>

 snap duplicate create    [-r]              - restart of data stream

 snap duplicate create    [-p parentsnap]   - parent snap for base site

 snap duplicate create    [-i oldsnapname]  - old snapshot name

 snap duplicate create    [-d {0..6}]       - set trace level for debugging

 snap duplicate create    <newsnapname>     - new snapshot name

 snap duplicate create    <mountpoint>      - mount point for base site

 snap quota     [[-|+]nnn[K|M|G|T|P]]<snap_name> <mountpoint>

 snap quota                              - set quota for snapshot


Grid Infrastructure and RAC 12.2 New Features – a Recap

The following list illustrates the new 12.2 Oracle RAC and Grid Infrastructure features. This is a personal list of what “I believe to be the most interesting.” I apologize to the RAC Dev team if I left out any features.

Streamlined Grid Infrastructure Installation

12.2 Grid Infrastructure software is available as an image file for download and installation. The key objective of this feature was to enable a simpler and quicker installation of Grid Infrastructure. Administrators simply prep the system by creating a new Grid home directory, appropriate users, permissions and kernel settings. Once completed, Admins extract the image file into the newly-created Grid home, and execute the gridSetup.sh script to invoke the setup wizard, which registers the Oracle Grid Infrastructure stack with the Oracle inventory. This installation approach can be used for Oracle Grid Infrastructure for Cluster and Standalone Servers configurations. This new software installation will improve large scale deployment automation as well as deployment of customized images, Patch Set Updates (PSUs) and patches.

Real Application Clusters Reader Nodes

In 12.2, Oracle extended the capability of Flex Clusters by introducing Reader nodes. Reader nodes are Leaf nodes (in a Flex Cluster) that run read-only RAC database instances. The Reader nodes are not affected by RAC reconfigurations caused by node evictions or other cluster node membership changes, as long as the Hub Node to which they are connected is part of the cluster. Reader Nodes allow users to create huge reader farms (up to 64 reader nodes per Hub Node), thus enabling massive parallel processing. In this architecture, updates to the read/write instances (running on Hub nodes) are immediately propagated to the read-only instances on the Leaf Nodes, where they can be used for online reporting or instantaneous queries. Users can create services to direct queries to read-only instances running on reader nodes.

Service-Oriented Buffer Cache Access

RAC Services, which are used to allocate and distribute workloads across RAC instances, are the cornerstone of RAC workload management. There is a strong relationship between a RAC Service, a specific workload, and the database objects it accesses. With 12.2 RAC, a Service-oriented buffer cache feature was introduced to improve scale and performance by optimizing instance and node buffer cache affinity. This is done by caching or pre-warming instances with data blocks for the objects accessed, on the instances where a service is expected to run.

Twelve Days of 12.2

Server Weight-Based Node Eviction

When there is a split-brain, or when a node eviction decision must be made, traditionally the decision was based on age, or duration of the nodes, in the cluster; i.e., nodes with a large uptime in the cluster will survive. In 12.2 RAC, Server weight-based node eviction uses a more intelligent tie-breaker mechanism to evict a particular node or a group of nodes from a cluster. The Server Weight-based node eviction feature introspects the current load on those servers as part of the decision. Two principal mechanisms, a system-inherent automatic mechanism and a user input-based mechanism, are used to provide guidance.

Load-Aware Resource Placement

Load-aware resource placement prevents overloading a server with more database instances than the server is capable of running. The metrics used to determine whether an application can be started on a given server are based on the expected resource consumption of the application, as well as the capacity of the server in terms of CPU and memory. Administrators can define database resources such as CPU (cpu_count) and memory (memory_target) to Clusterware. Clusterware uses this information to place the database instances only on servers that have a sufficient number of CPUs, amount of memory, or both.

srvctl modify database -db testdb -cpucount 8 -memorytarget 64g
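
To make the idea concrete, here is a toy Python model of the placement check: a server qualifies only if it has at least the CPU count and memory that the database declares. This is a sketch of the concept only. The server names and capacities are hypothetical, and Clusterware's actual placement logic is certainly more involved than a simple filter.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str        # hypothetical server name
    cpus: int        # CPU count available on the server
    memory_gb: int   # memory available on the server, in GB


def eligible_servers(servers, cpu_count, memory_target_gb):
    """Names of servers with enough CPU and memory for the database."""
    return [
        s.name
        for s in servers
        if s.cpus >= cpu_count and s.memory_gb >= memory_target_gb
    ]


# Hypothetical cluster; the requirements mirror -cpucount 8 -memorytarget 64g
cluster = [
    Server("node1", cpus=4, memory_gb=128),   # too few CPUs
    Server("node2", cpus=16, memory_gb=32),   # too little memory
    Server("node3", cpus=16, memory_gb=256),  # meets both
]
print(eligible_servers(cluster, cpu_count=8, memory_target_gb=64))
```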

Hang Manager

The Hang Manager feature first became available in 11gR1. In that initial version, Hang Manager evaluated and identified system hangs, then dumped the relevant information, the “wait-for graph,” into a trace file. In 12.2, Hang Manager takes action and attempts to resolve the system hang. An ORA-32701 error message is logged in the alert log to reflect the hang resolution. Hang Manager also runs in both single-instance and Oracle RAC database instances. Hang Manager is constantly aware of processes running in reader node instances, and checks whether any of these processes are blocking progress on Hub Nodes so that it can take action, if possible.

Separation of Duty for Administering RAC Clusters

12.2 RAC introduces a new administrative privilege called SYSRAC. This privilege is used by the Clusterware agent, and removes the need to use the SYSDBA privilege for RAC administrative tasks, thus reducing the reliance on SYSDBA on production systems. Note that SYSRAC is the default mode for connecting to the database by the Clusterware agent; e.g., when executing RAC utilities such as SRVCTL.

Rapid Home Provisioning of Oracle Software

Rapid Home Provisioning enables you to create clusters, provision, patch, and upgrade Oracle Grid Infrastructure and Oracle Database homes. It also provisions 11.2 Clusters, applications, and middleware using Rapid Home Provisioning.

Extended Clusters

In 12.2 GI Administrators can create an extended RAC cluster across two, or more, geographically separate sites. Note, each site will include a set of servers with its own storage. If a site fails, the other site acts as an active standby. 12.2 Extended Clusters can be built on initial installation or be converted from an existing (non-Flex ASM) cluster, using the ConvertToExtended script.

De-support of OCR and Voting Files on Shared Filesystem

In Grid Infrastructure 12.2, the placement of Oracle Clusterware files: the Oracle Cluster Registry (OCR), and the Voting Files, directly on a shared file system is desupported. Only ASM or NFS is supported. If you need to use a supported shared file system, either a Network File System, or a shared cluster file system instead of native disk devices, then you must create Oracle ASM disks on supported network file systems that you plan to use for hosting Oracle Clusterware files before installing Oracle Grid Infrastructure. You can then use the Oracle ASM disks in an Oracle ASM disk group to manage Oracle Clusterware files. If your Oracle Database files are stored on a shared file system, then you can continue to use shared file system storage for database files, instead of moving them to Oracle ASM storage.

ACFS 12.2 New Features – a Recap

Oracle Automatic Storage Management Cluster File System (ACFS) made its debut with Oracle 11.2. Many DBAs are not aware of the vast features that are available with ACFS. With each release and update to Oracle, significant enhancements have been made. With Oracle Database 12c Release 2, new features and functionality were added to ACFS.

Snapshot Enhancements

In Oracle 12.2, Oracle extends ACFS snapshot functionality and further simplifies file system snapshot operations. The following are a few of the key new features with snapshots:

Admins can now, if needed, impose quotas on snapshots to limit the amount of write operations that can be done on a snapshot. Quotas are set at the snapshot level. Oracle also provides the capability to rename an existing ACFS snapshot, to allow more user-friendly names.

When we delete a snapshot with the “acfsutil snap delete snapshot mount_point” command, we can force a delete, even if there are open files.

There are several new capabilities with snapshot re-mastering and duplication. The new ACFS snapshot remaster capability allows a snapshot in the snapshot registry to become the primary file system. ACFS snapshot duplication features are also introduced. The “acfsutil snap duplicate create” command can be used to duplicate a snapshot from an existing snapshot to a standby target file system.

The “apply” option to the “acfsutil snap duplicate” command, allows us to apply deltas to the target ACFS file system or snapshot. If this is the initial apply, the target file system must be empty. If the target had been applied before, then the apply process becomes an incremental update. Before the incremental update occurs, the contents of the target file system must match the content of the older snapshot, since the last incremental update. Also, the contents of the target snapshot cannot be modified while the apply is happening.

Additionally, ACFS snapshot-based replication now uses SSH protocols to transmit data streams.

4k Sectors and Metadata

When Admins create an ACFS file system, they have the option to create the file system with the 4096-byte metadata structure. When issuing the mkfs command, you can specify the metadata block size with the -i option; the two valid options are 512 bytes or 4096 bytes. The 4096-byte metadata structure is made up of multiple 512-byte logical sectors.

If the COMPATIBLE.ADVM ASM Diskgroup attribute is set to 12.2 or greater, then the metadata block is 4096 bytes by default. If COMPATIBLE.ADVM attribute is set to less than 12.2, then the block size is set to 512 bytes. When the ADVM volume of the ACFS file system is set with 4K logical disk sector size, Direct I/O requests should be aligned on the 4K offset and be a multiple of 4k size for optimal performance.
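
The alignment rule for Direct I/O is easy to express in code. The following Python sketch checks whether a request is aligned to the 4K sector size and, if not, computes the enclosing aligned range. The helper names are mine; this only illustrates the arithmetic and does not perform any I/O.

```python
SECTOR = 4096  # 4K logical sector size


def is_aligned(offset, length, sector=SECTOR):
    """True if the request starts on a sector boundary and spans whole sectors."""
    return length > 0 and offset % sector == 0 and length % sector == 0


def align_request(offset, length, sector=SECTOR):
    """Expand (offset, length) to the smallest enclosing aligned range."""
    start = offset - offset % sector  # round the start down
    end = offset + length
    end += -end % sector              # round the end up
    return start, end - start


print(is_aligned(8192, 4096))     # aligned request
print(is_aligned(512, 4096))      # misaligned offset
print(align_request(512, 4096))   # enclosing aligned range
```

A request at offset 512 for 4096 bytes straddles two 4K sectors, so the aligned equivalent reads 8192 bytes starting at offset 0.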


Very rarely would you need the defragmentation tool, because the ACFS algorithms handle allocation and coalescing of free space. However, for those rare situations when a file system becomes fragmented, under heavy workloads or with compressed files, Oracle provides the defrag option to the acfsutil command. Now, we can issue “acfsutil defrag dir” or “acfsutil defrag file” commands for on-demand defragmentation.

ACFS will perform all defrag operations in the background. With the -r option of the “acfsutil defrag dir” command, you can recursively defrag subdirectories.

Compression Enhancements

ACFS compression can significantly reduce disk storage requirements for customers running databases on ACFS. Databases running on ACFS, must be of versions or higher. ACFS compression can be enabled for specific ACFS file systems for database files, RMAN backup files, archivelogs, data pump extract files, and general purpose files. Oracle does not support redo log/flashback logs/control file compression.

When enabling ACFS compression for a file system, only new incoming files will be compressed. All existing files on the file system will remain un-compressed. Likewise, if you decide to uncompress a file system, Oracle will not de-compress files. Oracle will simply disable compression for newly created files.

To compress and uncompress ACFS file systems, execute the acfsutil compress on or acfsutil compress off commands. To view compression state and space consumption information, you can execute the “acfsutil compress info” command. The commands “acfsutil info fs” and “acfsutil info file” now support ACFS compression status.

At this time, databases with 2K or 4K block sizes are not supported for ACFS compression. ACFS compression is supported on Linux and AIX. ACFS compression is also supported with ACFS snapshot-based replication.

Loopback Devices

ACFS now supports loopback devices on the Linux operating system. With ACFS loopback device support, we can now take OVM images, templates, and virtual disks and present them as a block device. Files can be sparse or non-sparse. ACFS also supports Direct I/O on sparse images.

Metadata Collector

The metadata collector copies metadata structures from an Oracle ACFS file system to a separate output file that can be ingested for analysis and diagnostics. The metadata collector reads the contents of the file system and all metadata is written out to a specified output file. The metadata collector can read the ACFS file system online without requiring an outage. Note, this tool is not a replacement for the file system checker command (fsck), but a supplement for additional diagnosis and support. Even though the metadata collector can read the file system while it is online, for best results, unmount the file system prior to metadata collection. The size of the output file is directly correlated to the size of the file system being collected. To collect metadata for a file system, invoke the “acfsutil meta” command.

The auto-resize feature allows us to “autoextend” a file system when it is about to run out of space. Just like an Oracle datafile that has the autoextend option enabled, we can now “autoextend” the ACFS file system by a specified increment. With the -a option to the “acfsutil size” command, we can specify the increment size.

We can also specify the maximum size or quota for the ACFS file system to “autoextend” to, to guard against runaway space consumption. To set the maximum size for an ACFS file system, execute the “acfsutil size” command with the -x option.
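
The interplay between the increment and the maximum size can be pictured with a toy Python model. This is only a sketch of the idea: the low-water threshold that triggers growth is a parameter I made up for the model, not an acfsutil option, and the real trigger condition inside ACFS may differ.

```python
def autoextend(size_gb, used_gb, increment_gb, max_gb, low_water_gb):
    """Toy model of auto-resize.

    Grow the file system in increment_gb steps while free space is below
    low_water_gb (an assumption of this model), never exceeding max_gb.
    """
    while size_gb - used_gb < low_water_gb and size_gb + increment_gb <= max_gb:
        size_gb += increment_gb
    return size_gb


# 100 GB file system with 98 GB used: one 10 GB increment restores headroom
print(autoextend(100, 98, increment_gb=10, max_gb=150, low_water_gb=5))

# Same file system, but the maximum (think -x) blocks any further growth
print(autoextend(100, 99, increment_gb=10, max_gb=105, low_water_gb=5))
```

The second call shows why the maximum matters: without it, a runaway process could keep triggering increments until the disk group is exhausted.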

Setting Round-Robin Multipathing Policy in VMware ESXi 6.0

Storage Array Type Plugins (SATP) and Path Selection Plugins (PSP) are part of the VMware APIs for Pluggable Storage Architecture (PSA). The SATP has all the knowledge of the storage array to aggregate I/Os across multiple channels and has the intelligence to send failover commands when a path has failed. The Path Selection Policy can be either “Fixed”, “Most Recently Used” or “Round Robin”.

If a VMware VM is using RDM with All Flash Arrays, then the Round Robin policy should be used. Furthermore, inside the Linux kernel (VM), the noop IO scheduler should be used. Both need to be in place for proper throughput.

As a best practice, the preferred method to set the Round Robin policy is to create a rule so that any newly added FlashArray device automatically gets the Round Robin PSP and an IO Operation Limit value of 1. In this blog I’ll refer to the PureStorage array for setting the Round Robin policy as well as the IO limit.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"

This must be repeated for each ESXi host.
This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will iterate through all of the hosts in that particular vCenter and create a default rule to set Round Robin for all Pure Storage FlashArray devices with an I/O Operation Limit set to 1.

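$hosts = Get-VMHost
foreach ($esx in $hosts) {
    # Get an esxcli handle for this host
    $esxcli = Get-EsxCli -VMHost $esx
    # Add a SATP rule: Round Robin PSP with an I/O Operation Limit of 1 for Pure Storage FlashArray devices
    $esxcli.storage.nmp.satp.rule.add($null, $null, "PURE FlashArray RR IO Operation Limit Rule",
        $null, $null, $null, "FlashArray", $null, "VMW_PSP_RR", "iops=1", "VMW_SATP_ALUA",
        $null, $null, "PURE")
}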

It is important to note that existing, previously presented devices will need to be either manually set to Round Robin and an I/O Operation Limit of 1 or unclaimed and reclaimed through either a reboot of the host or through a manual device reclaim process so that it can inherit the configuration set forth by the new rule. For setting a new I/O Operation Limit on an existing device, use the following procedure:

The first step is to change the particular device to use the Round Robin PSP. This must be done on every ESXi host and can be done through the vSphere Web Client, the Pure Storage Plugin for the vSphere Web Client, or via command line utilities.

Via esxcli:
esxcli storage nmp device set -d naa. --psp=VMW_PSP_RR

Note that changing the PSP using the Web Client Plugin is the preferred option as it will automatically configure Round Robin across all of the hosts. Note that this does not set the IO Operation Limit to 1. That is a command line option only, and must be done separately.

Round Robin can also be set on a per-device, per-host basis using the standard vSphere Web Client actions to set up the Round Robin policy for a Pure Storage volume. Note that this does not set the IO Operation Limit to 1, which is a command line option only; this must be done separately.

The IO Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or altered via command line utilities. The following command can check a particular device for the PSP and IO Operations Limit:

esxcli storage nmp device list -d naa.

To set a device that is pre-existing to have an IO Operation limit of one, run the following command:

esxcli storage nmp psp roundrobin deviceconfig set -d naa. -I 1 -t iops
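
With many pre-existing devices, typing these two commands per device gets tedious. One option is to generate the command lines from a list of device IDs and review them before running anything. A minimal Python sketch follows; the device IDs in it are placeholders I made up, and the script only builds strings from the esxcli commands shown above, it does not execute them.

```python
def round_robin_commands(device_ids, iops_limit=1):
    """Build the two esxcli command lines for each device ID."""
    commands = []
    for dev in device_ids:
        # Switch the device to the Round Robin path selection policy
        commands.append(f"esxcli storage nmp device set -d {dev} --psp=VMW_PSP_RR")
        # Set the IO Operation Limit for the device
        commands.append(
            "esxcli storage nmp psp roundrobin deviceconfig set "
            f"-d {dev} -I {iops_limit} -t iops"
        )
    return commands


# Placeholder device IDs, for illustration only
for cmd in round_robin_commands(["naa.EXAMPLE0001", "naa.EXAMPLE0002"]):
    print(cmd)
```

The output can be pasted into an SSH session on each ESXi host once the device list has been verified.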

Setting Jumbo Frames – Portrait of a Large MTU size

There are cases where we need to ensure that large packet “address-ability” exists. This is needed to verify the configuration for non-standard packet sizes, i.e., an MTU of 9000, for example when deploying a NAS or backup server across the network.

Setting the MTU can be done by editing the configuration script for the relevant interface in /etc/sysconfig/network-scripts/. In our example, we will use the eth1 interface, thus the file to edit would be ifcfg-eth1.

Add a line to specify the MTU, for example MTU=9000.

Assuming that the MTU is set on the system, just do an ifdown eth1 followed by ifup eth1.
An ifconfig eth1 will tell you if it’s set correctly:

eth1 Link encap:Ethernet HWaddr 00:0F:EA:94:xx:xx
inet addr: Bcast: Mask:
inet6 addr: fe80::20f:eaff:fe91:407/64 Scope:Link
RX packets:141567 errors:0 dropped:0 overruns:0 frame:0
TX packets:141306 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:101087512 (96.4 MiB) TX bytes:32695783 (31.1 MiB)
Interrupt:18 Base address:0xc000

To validate end-2-end MTU 9000 packet management

Execute the following on Linux systems:

ping -M do -s 8972 [destinationIP]
For example: ping -M do -s 8972 datadomain.viscosityna.com

The reason for 8972 is that, on Linux/Unix systems, the size passed to ping with -s covers only the ICMP payload; it does not include the 28 bytes of headers, i.e., the ICMP header (8 bytes) plus the IP header (20 bytes). Therefore, take the MTU of 9000 and subtract 28 to get 8972.
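
The same arithmetic works for any MTU. Here is a tiny Python sketch, assuming IPv4 with no IP options (a 20-byte IP header) and the standard 8-byte ICMP echo header:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print("MTU 9000 ->", max_ping_payload(9000))
print("MTU 1500 ->", max_ping_payload(1500))
```

For a standard 1500-byte MTU the equivalent test size is 1472.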

[root@racnode01]# ping -s 8972 -M do datadomain.viscosityna.com
PING datadomain.viscosityna.com. ( 8972(9000) bytes of data.
8980 bytes from racnode1.viscosityna.com. ( icmp_seq=0 ttl=64 time=0.914 ms

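To illustrate what happens when proper MTU packet address-ability is not in place, I can set a larger packet size in the ping (8993). The packet would need to be fragmented, but because the Do Not Fragment bit is set you will see an error such as “Frag needed and DF set” (some ping implementations word it “Packet needs to be fragmented but DF set”).
In this example, the ping command uses “-s” to set the packet size, and “-M do” sets the Do Not Fragment (DF) bit.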

[root@racnode01]# ping -s 8993 -M do datadomain.viscosityna.com
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2
PING datadomain.viscosityna.com. ( 8993(9001) bytes of data.
From racnode1.viscosityna.com. ( icmp_seq=0 Frag needed and DF set (mtu = 9000)

By adjusting the packet size, you can figure out what the MTU for the link is. This will represent the lowest MTU allowed by any device in the path, e.g., the switch, the source or target node, or anything else in between.
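
Adjusting the packet size by hand is really a binary search, which can be sketched in Python. In this sketch the probe is a simulated function standing in for a real "ping -M do -s size" call; wiring it to an actual ping is left out on purpose, and the function names are my own.

```python
HEADERS = 28  # IPv4 header (20) + ICMP header (8)


def find_max_payload(probe, low=0, high=65507):
    """Largest payload size for which probe(size) succeeds.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    Returns None if even the smallest payload fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits, search upward
        else:
            high = mid - 1  # mid is too big, search downward
    return low


def simulated_probe(path_mtu):
    """Stand-in for a DF-set ping: succeeds if payload plus headers fits."""
    return lambda size: size + HEADERS <= path_mtu


payload = find_max_payload(simulated_probe(9000))
print("max payload:", payload, "-> path MTU:", payload + HEADERS)
```

Because each probe halves the search range, finding the limit takes about 16 probes at most, rather than dozens of manual guesses.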

Finally, another way to verify the correct usage of the MTU size is the command ‘netstat -a -i -n’ (the column MTU size should be 9000 when you are performing tests on Jumbo Frames)

High Level Overview of 11204 ASM Rebalance in Async ARB0

High Level look at 11204 Rebalance with Plan Optimization and Async ARB0


Drop disk


SQL> alter diskgroup reco drop disk 'ASM_NORM_DATA4' rebalance power 12

here we issue the rebalance

NOTE: requesting all-instance membership refresh for group=2

GMON querying group 2 at 120 for pid 19, osid 19030

GMON updating for reconfiguration, group 2 at 121 for pid 19, osid 19030

NOTE: group 2 PST updated.

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

GMON querying group 2 at 122 for pid 13, osid 4000

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

NOTE: starting rebalance of group 2/0x89b87754 (RECO) at power 12   rebalance internally started

Starting background process ARB0    ARB0 gets started for this rebalance

SUCCESS: alter diskgroup reco drop disk 'ASM_NORM_DATA4' rebalance power 12

Wed Sep 19 23:54:10 2012

ARB0 started with pid=21, OS id=19526

NOTE: assigning ARB0 to group 2/0x89b87754 (RECO) with 12 parallel I/Os   ARB0 assigned to this

diskgroup rebalance. Note that it states 12 parallel I/Os

NOTE: Attempting voting file refresh on diskgroup RECO

Wed Sep 19 23:54:38 2012

NOTE: requesting all-instance membership refresh for group=2   first indications that rebalance is completing

GMON updating for reconfiguration, group 2 at 123 for pid 22, osid 19609

NOTE: group 2 PST updated.

SUCCESS: grp 2 disk ASM_NORM_DATA4 emptied    Once rebalanced relocation phase is complete, the disk is emptied

NOTE: erasing header on grp 2 disk ASM_NORM_DATA4   The emptied disk’s header is erased and set to FORMER

NOTE: process _x000_+asm (19609) initiating offline of disk 3.3915941808 (ASM_NORM_DATA4) with mask 0x7e in group 2

The dropped disk is offlined

NOTE: initiating PST update: grp = 2, dsk = 3/0xe96887b0, mask = 0x6a, op = clear

GMON updating disk modes for group 2 at 124 for pid 22, osid 19609

NOTE: PST update grp = 2 completed successfully

NOTE: initiating PST update: grp = 2, dsk = 3/0xe96887b0, mask = 0x7e, op = clear

GMON updating disk modes for group 2 at 125 for pid 22, osid 19609

NOTE: cache closing disk 3 of grp 2: ASM_NORM_DATA4

NOTE: PST update grp = 2 completed successfully

GMON updating for reconfiguration, group 2 at 126 for pid 22, osid 19609

NOTE: cache closing disk 3 of grp 2: (not open) ASM_NORM_DATA4

NOTE: group 2 PST updated.

Wed Sep 19 23:54:42 2012

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

GMON querying group 2 at 127 for pid 13, osid 4000

GMON querying group 2 at 128 for pid 13, osid 4000

NOTE: Disk in mode 0x8 marked for de-assignment

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

NOTE: Attempting voting file refresh on diskgroup RECO

Wed Sep 19 23:56:45 2012

NOTE: stopping process ARB0   <-- All phases of the rebalance are complete and ARB0 is shut down

SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)   <-- Rebalance marked as complete



Add disk

Starting background process ARB0

SUCCESS: alter diskgroup reco add disk 'ORCL:ASM_NORM_DATA4' rebalance power 16

Thu Sep 20 23:08:22 2012

ARB0 started with pid=22, OS id=19415

NOTE: assigning ARB0 to group 2/0x89b87754 (RECO) with 16 parallel I/Os

Thu Sep 20 23:08:31 2012

NOTE: Attempting voting file refresh on diskgroup RECO

Thu Sep 20 23:08:46 2012

NOTE: requesting all-instance membership refresh for group=2

Thu Sep 20 23:08:49 2012

NOTE: F1X0 copy 1 relocating from 0:2 to 0:459 for diskgroup 2 (RECO)

Thu Sep 20 23:08:50 2012

GMON updating for reconfiguration, group 2 at 134 for pid 27, osid 19492

NOTE: group 2 PST updated.

Thu Sep 20 23:08:50 2012

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

NOTE: F1X0 copy 2 relocating from 1:2 to 1:500 for diskgroup 2 (RECO)

NOTE: F1X0 copy 3 relocating from 2:2 to 2:548 for diskgroup 2 (RECO)

GMON querying group 2 at 135 for pid 13, osid 4000

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

Thu Sep 20 23:09:06 2012

NOTE: Attempting voting file refresh on diskgroup RECO

Thu Sep 20 23:09:57 2012

NOTE: stopping process ARB0

SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)
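To compare how long the two rebalances above took, the alert log timestamps can be diffed. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name `rebalance_seconds` is made up for illustration, and it assumes each log entry is stamped by the most recent timestamp line above it, and that timestamps use the English day and month names shown in the excerpts.

```python
from datetime import datetime

# Timestamp layout seen in the excerpts, e.g. "Wed Sep 19 23:54:10 2012"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def rebalance_seconds(log_lines):
    """Seconds from 'ARB0 started' to 'SUCCESS: rebalance completed'.

    Hypothetical helper: each entry is assumed to carry the time of the
    closest preceding timestamp line. Returns None if either marker is missing.
    """
    current_ts = None
    started = None
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # a timestamp line carries no event of its own
        except ValueError:
            pass  # not a timestamp, so treat it as a log entry
        if line.startswith("ARB0 started"):
            started = current_ts
        elif line.startswith("SUCCESS: rebalance completed"):
            if started is not None and current_ts is not None:
                return (current_ts - started).total_seconds()
    return None


# Lines copied from the drop-disk and add-disk excerpts above
drop_log = [
    "Wed Sep 19 23:54:10 2012",
    "ARB0 started with pid=21, OS id=19526",
    "Wed Sep 19 23:56:45 2012",
    "NOTE: stopping process ARB0",
    "SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)",
]
add_log = [
    "Thu Sep 20 23:08:22 2012",
    "ARB0 started with pid=22, OS id=19415",
    "Thu Sep 20 23:09:57 2012",
    "NOTE: stopping process ARB0",
    "SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)",
]

print(rebalance_seconds(drop_log))  # 155.0
print(rebalance_seconds(add_log))   # 95.0
```

By this measure the drop at power 12 took about 2.5 minutes and the add at power 16 took about 1.5 minutes. The two operations move different amounts of data, so this is not a like-for-like comparison of power settings.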

------------ ---------- ------------ -------------
           2          1            0             2
           2         32            0             2

------------ ---------- ------------ -------------
           2          1           16             1

------------ ---------- ------------ -------------
           2          1           16             2
           2         32           16             2

------------ ---------- ------------ -------------
           2          1           16             2


What should you ask your All Flash Array Vendor

I was just traveling back from a client that had bought into the concept of an all flash array (AFA) for their database and VDI workloads. They had asked me to help out where I could, so I started pondering the things this customer (or any customer/buyer) should think through.

At first I was going to do a comparative analysis (table) of the All Flash Arrays out on the market. However, since the AFA market is constantly changing anyway, why bother with a comparison?

Instead, I changed my approach to helping the buyer/architect pose the appropriate questions to the vendor. The approach became more of a “what to consider when considering” the purchase of an AFA.

Now note, I’m not stating some earth-shattering thought leadership here or a new dimension of looking at this issue. I’m merely sharing what I was going to present to the customer.

Anyway, as with most storage decisions, it’s very hard to bucketize considerations into performance, cost and manageability categories, because they are so intertwined. Also, I specifically did not address cost separately, since cost traverses every layer and topic, whether as cost-performance, cost-supportability or feature-cost usage.

1. Performance is king! – We know AFA performance is awesome, but think through and ask the following:

a. How does the AFA fare with differing workloads, i.e., varying degrees of sequential versus random I/O and read/write ratios of 80/20, 70/30, and 50/50? Ask especially how it behaves when the array is near capacity (70% or 80% full).

b. How is garbage collection handled? Is it ASIC/SSD-based or controller-based? Regardless, the buyer shouldn’t have to understand the bowels of garbage collection, so the question to the vendor should simply be: what is the consistency of performance, specifically during steady-state/peak workloads and during flash maintenance operations (garbage collection, flash overwrites, wear leveling, etc.)?

c. I wasn’t sure if I should even add this entry, but for completeness I will. AFAs on the market today use one type or a combination of SSD types: SLC, MLC (cMLC), eMLC, etc. As with the above, buyers should not concern themselves with this level of detail, but they should ascertain the performance they can expect. This item really belongs in the cost category: cost per IOPS, cost per GB, etc.

2. Manageability

a. How does the array handle non-disruptive upgrades (NDU)? AFAs occasionally need patches, updates and even field-replaceable component changes, so you need to determine the impact of making these changes; i.e., is it an online transparent change, an online change with a reboot (outage), or a destructive change? For example, how is an AFA OS patch handled, or how are SSD firmware changes handled?

b. Scalability – What I mean here is really AFA expansion without disruption. Ask whether you can add another array, another set of controllers, etc., without having to export the array’s data, add in the new array, and load the data back. It should be mainframe-class scale.

c. Storage array simplicity – How usable are the GUI tools for managing operational array tasks, e.g., creating volumes, measuring performance, gauging the effectiveness of data services, and sending alert notifications on failing components?

3. Features (Data Services) – By now most AFAs incorporate snapshots, replication, compression, and of course de-duplication. But the real questions are: what is the impact of using these services concurrently, can features be enabled selectively (by LUN/volume), and what is the overall performance impact of these services?
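The cost questions raised in item 1c (cost per IOPS, cost per GB) are easy to put side by side once a vendor supplies numbers. Here is a minimal Python sketch of that arithmetic. The function name `cost_metrics`, the price, the capacity, the IOPS figure and the data reduction ratio are all made-up placeholders for illustration, not figures from any real array.

```python
def cost_metrics(price_usd, usable_tb, sustained_iops, data_reduction_ratio=1.0):
    """Rough cost-per-GB and cost-per-IOPS figures for one quote.

    All inputs are assumptions the buyer must get from the vendor:
    - usable_tb: usable capacity before dedupe/compression (decimal TB)
    - sustained_iops: IOPS at steady state for *your* workload mix
    - data_reduction_ratio: e.g. 4.0 for a claimed 4:1 reduction
    """
    effective_gb = usable_tb * 1000 * data_reduction_ratio
    return {
        "usd_per_effective_gb": price_usd / effective_gb,
        "usd_per_iops": price_usd / sustained_iops,
    }


# Hypothetical quote: $500,000 for 50 TB usable, 250,000 sustained IOPS
raw = cost_metrics(500_000, 50, 250_000)
reduced = cost_metrics(500_000, 50, 250_000, data_reduction_ratio=4.0)

print(raw)      # {'usd_per_effective_gb': 10.0, 'usd_per_iops': 2.0}
print(reduced)  # {'usd_per_effective_gb': 2.5, 'usd_per_iops': 2.0}
```

The point of keeping the reduction ratio as a separate input is that cost per GB swings heavily on the vendor's claimed dedupe/compression ratio, so it is worth asking what ratio was measured on workloads like yours.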

This should just get you started on the things to think through!