Part Deux – Diggin’ into ODA ASM Add Disk to DiskGroup operation

This is part 2 of the ODA storage expansion series. If you remember from Part 1, we added a whole disk shelf and walked through the ASM alert.log to show how the storage gets added to the diskgroup. In this blog post, I’m just going to illustrate the process flow of the disk addition as seen through the eyes of the oakd.log. But rather than show all 20 disks being added (which would be voluminous), I’m going to describe the addition of a specific disk, Slot 14.

When the disk is physically inserted into the slot (after taking out the filler), the insertion invokes a series of automated backend scripts and tooling, starting with the OS creating a disk device entry via the event handler and ending with the disk being added to the ASM diskgroup.

The key point of this blog and discussion is to describe the entire end-2-end automation process of simply adding a new disk to an ODA engineered system (“dare I say push button approach”).

BTW, Still LOVE the references to COMET in the code 🙂

We start with a look inside oakd.log, where the majority of the action is.

In this first section from the oak log, once the disk insertion is recognized, oakd describes the disk characteristics, including capacity, hba port and state. Note that since multipathing is enabled, we see two paths, and thus two disk names for the same [root] slot device name:

2019-02-05 10:55:35.510: [   STMHW][710730400] Sha::Inserting OSDevName /dev/sdr for slot 14.  <— SDR

2019-02-05 10:55:35.510: [   STMHW][710730400] Sha::Inserting OSDevName /dev/sdao for slot 14. <— SDAO

2019-02-05 10:55:35.510: [   STMHW][710730400] Physical Disk [14] Info:  <--Next set includes physical disk info

2019-02-05 10:55:35.510: [   STMHW][710730400] Slot Num    = 14

2019-02-05 10:55:35.510: [   STMHW][710730400] Col  Num    = 2

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevNames  = |/dev/sdao||/dev/sdr|

2019-02-05 10:55:35.510: [   STMHW][710730400] Serial Num  = 1839J5XJ9X

2019-02-05 10:55:35.510: [   STMHW][710730400] Disk Type   = SSD  <--Well we know its an SSD

2019-02-05 10:55:35.510: [   STMHW][710730400] Expander    = 0 : 508002000231a17e

2019-02-05 10:55:35.510: [   STMHW][710730400] scsi-id     = 5000cca0a101ac54

2019-02-05 10:55:35.510: [   STMHW][710730400] sectors     = 781404246

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDisk[14] Info:  <--This next listing details the device info

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevName: /dev/sdr, Id = 14, Slot = 14, Capacity = 3200631791616: 3200gb, Type = SSD, hba port = 14 State = State: GOOD, expWwn = 5080020002311fbe, scsiId = 5000cca0a101ac54, Ctrlr = 0

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDisk[38] Info:   <--Since we have multipathing, we will get same info for /dev/sdao

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevName: /dev/sdao, Id = 38, Slot = 14, Capacity = 3200631791616: 3200gb, Type = SSD, hba port = 14 State = State: GOOD, expWwn = 508002000231a17e, scsiId = 5000cca0a101ac54, Ctrlr = 1
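If you want to confirm the multipath pairing from the log yourself, here is a minimal sketch in Python. It assumes the `OsDevName:` line layout shown above (I only have this one log to go by), and the helper names `group_paths` and `bytes_per_sector` are mine, not anything from OAK. It groups device names by scsiId and divides the reported capacity by the reported sector count.

```python
import re
from collections import defaultdict

# Pattern for the "OsDevName: ..." lines above (layout assumed from this one log).
OSDISK_RE = re.compile(
    r"OsDevName: (?P<dev>\S+), Id = (?P<id>\d+), Slot = (?P<slot>\d+), "
    r"Capacity = (?P<cap>\d+).*?scsiId = (?P<scsi>\w+), Ctrlr = (?P<ctrlr>\d+)"
)

def group_paths(lines):
    """Group (device, controller) pairs by scsiId; two entries = two paths to one disk."""
    paths = defaultdict(list)
    for line in lines:
        m = OSDISK_RE.search(line)
        if m:
            paths[m.group("scsi")].append((m.group("dev"), int(m.group("ctrlr"))))
    return dict(paths)

def bytes_per_sector(capacity_bytes, sectors):
    """Return capacity / sectors when it divides exactly, else None."""
    size, remainder = divmod(capacity_bytes, sectors)
    return size if remainder == 0 else None

SAMPLE = [
    "OsDevName: /dev/sdr, Id = 14, Slot = 14, Capacity = 3200631791616: 3200gb, "
    "Type = SSD, hba port = 14 State = State: GOOD, expWwn = 5080020002311fbe, "
    "scsiId = 5000cca0a101ac54, Ctrlr = 0",
    "OsDevName: /dev/sdao, Id = 38, Slot = 14, Capacity = 3200631791616: 3200gb, "
    "Type = SSD, hba port = 14 State = State: GOOD, expWwn = 508002000231a17e, "
    "scsiId = 5000cca0a101ac54, Ctrlr = 1",
]

if __name__ == "__main__":
    print(group_paths(SAMPLE))
    print(bytes_per_sector(3200631791616, 781404246))  # 4096
```

Both device names land under the same scsiId, one per controller, and the capacity works out to exactly 4096 bytes per reported sector.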

This section from the oak log describes the disk details from the PDiskAdapter.scr action script and FishWrap. Note the AutoDiscoveryHint, as the disk is partitioned for the different diskgroups:

2019-02-05 10:55:35.946: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14
2019-02-05 10:55:35.946: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14
2019-02-05 10:55:35.946: [ ADAPTER][150968064]{1:11302:2} Running predictive failure check for: /dev/sdao
2019-02-05 10:55:35.946: [    SCSI][150968064]{1:11302:2} SCSI Inquiry Command response for /dev/sdao
2019-02-05 10:55:35.946: [   OAKFW][167753472]{1:11302:2} [ActionScript] = /opt/oracle/oak/adapters/PDiskAdapter.scr
2019-02-05 10:55:35.946: [    SCSI][150968064]{1:11302:2} Vendor = HGST     Product = HBCAC2DH2SUN3.2T Revision = A170
2019-02-05 10:55:35.946: [   OAKFW][167753472]{1:11302:2} [ActionTimeout] = 1500
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ActivePath] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AgentFile] = %COMET_MS_HOME%/bin/%TYPE_NAME%
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AsmDiskList] = |0|
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AutoDiscovery] = 1
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AutoDiscoveryHint] = |data:80:SSD||reco:20:SSD||redo:100:SSD|
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [CheckInterval] = 600
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ColNum] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DiskId] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DiskType] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Enabled] = 1
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ExpNum] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [MultiPathList] = |0|
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Name] = PDType
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [NewPartAddr] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [OSUserType] = |userType:Multiuser|
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PlatformName] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PrevUsrDevName] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SectorSize] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SerialNum] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Size] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SlotNum] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [TotalSectors] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [UsrDevName] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [gid] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [mode] = 660
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [uid] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DependListOpr] = add
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Dependency] = |0|
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [IState] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Initialized] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [IsConfigDependency] = false
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [MonitorFlag] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Name] = ResourceDef
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PrevState] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [State] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [StateChangeTs] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [StateDetails] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [TypeName] = 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Added new resource : e0_pd_11 to the agfw
2019-02-05 10:55:35.947: [   OAKFW][167753472][F-ALGO]{1:11302:2} Resource name : e0_pd_11, state : 0
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} PE invalidating the data model
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Evaluating Add Resource for e0_pd_11
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Executing plan size: 1
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} PE: Sending message to agent : RESOURCE_VALIDATE[e0_pd_11] ID 4361:96
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Engine received the message : RESOURCE_VALIDATE[e0_pd_09] ID 4361:90
2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Preparing VALIDATE command for : e0_pd_09
2019-02-05 10:55:35.948: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14
2019-02-05 10:55:35.948: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14
2019-02-05 10:55:35.948: [ ADAPTER][150968064]{1:11302:2} Creating resource for PD: SSD_E0_S14_2701241428
2019-02-05 10:55:35.948: [ ADAPTER][150968064]{1:11302:2} partName datapctStr  80 diskType =SSD

This section from the oak log describes the disk validation:

2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] print_args called with argument : validate
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Arguments passed to PDiskAdapter:
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ResName = e0_pd_14
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskId = 35000cca0a101ac54
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DevName = SSD_E0_S14_2701241428
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] MultiPaths = /dev/sdao /dev/sdr
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ActivePath = /dev/sdao
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskType = SSD
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Expander = 0
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Size = 3200631791616
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Sectors = 781404246
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ExpColNum = 2
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] NewPartAddr = 0
2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskSerial# = 1839J5XJ9X
2019-02-05 10:55:36.023: [        ][4085245696]{1:11302:2} [validate] [Tue Feb 5 10:55:35 EST 2019] Action script '/opt/oracle/oak/adapters/PDiskAdapter.scr' for resource [e0_pd_15] called for action validate
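To make the AutoDiscoveryHint a bit more tangible, here is a small Python sketch that splits the hint string into its entries. Reading each entry as name:percent:disktype is my assumption, based on the 80/20 DATA/RECO split and the `datapctStr  80` line in the log; the function names are hypothetical. The real partition boundaries will also depend on alignment and metadata, so treat the byte counts as ballpark only.

```python
def parse_hint(hint):
    """Split '|data:80:SSD||reco:20:SSD|' into (name, percent, disk_type) tuples."""
    entries = [e for e in hint.split("|") if e]
    parsed = []
    for entry in entries:
        name, pct, disk_type = entry.split(":")
        parsed.append((name, int(pct), disk_type))
    return parsed

def approx_share(capacity_bytes, pct):
    """Rough size of a partition holding pct percent of the disk (ignores alignment)."""
    return capacity_bytes * pct // 100

HINT = "|data:80:SSD||reco:20:SSD||redo:100:SSD|"
CAPACITY = 3200631791616  # "Size" reported for slot 14 in the validate arguments

if __name__ == "__main__":
    for name, pct, disk_type in parse_hint(HINT):
        print(name, pct, disk_type, approx_share(CAPACITY, pct))
```

The redo:100 entry presumably applies to the disks reserved for REDO rather than to this DATA/RECO disk, but that is a guess on my part.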

In this section from the oak log, we see the Linux kernel settings that are applied once the device entry is created, e.g. IO scheduler, queue depth and timeout values:

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo deadline > /sys/block/sdao/queue/scheduler;echo 4096 > /sys/block/sdao/queue/nr_requests;echo 128 > /sys/block/sdao/queue/read_ahead_kb;

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo deadline > /sys/block/sdr/queue/scheduler;echo 4096 > /sys/block/sdr/queue/nr_requests;echo 128 > /sys/block/sdr/queue/read_ahead_kb;

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 64 > /sys/block/sdao/device/queue_depth

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 64 > /sys/block/sdr/device/queue_depth

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 30 > /sys/block/sdao/device/timeout

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 30 > /sys/block/sdr/device/timeout

2019-02-05 10:55:36.166: [   OAKFW][4177499904]{1:11302:2} Command : validate for: e0_pd_14 completed with status: SUCCESS

2019-02-05 10:55:36.166: [   OAKFW][167753472][F-ALGO]{1:11302:2} Engine received reply for command : validate for: e0_pd_14

2019-02-05 10:55:36.166: [   OAKFW][167753472]{1:11302:2} PE: Received last reply for : RESOURCE_VALIDATE[e0_pd_14] ID 4361:107
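If you ever want to double-check that those values stuck, a quick Python sketch like the one below reads the same sysfs files back. The expected values are copied from the `echo` commands in the log; the helper names and the `sysfs_root` parameter are mine, added so the check can be pointed at a test directory. On kernels where the scheduler is called something else (for example `mq-deadline`), the TUNABLES table would need adjusting.

```python
from pathlib import Path

# Expected values, taken from the echo commands in the oakd.log excerpt above.
TUNABLES = {
    "queue/scheduler": "deadline",
    "queue/nr_requests": "4096",
    "queue/read_ahead_kb": "128",
    "device/queue_depth": "64",
    "device/timeout": "30",
}

def active_value(raw):
    """The scheduler file lists every option and brackets the active one."""
    raw = raw.strip()
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw

def check_device(dev, sysfs_root="/sys/block"):
    """Return {relative_path: (expected, actual)} for every setting that differs."""
    mismatches = {}
    for rel, expected in TUNABLES.items():
        path = Path(sysfs_root) / dev / rel
        try:
            actual = active_value(path.read_text())
        except OSError:
            actual = None  # file missing or unreadable
        if actual != expected:
            mismatches[rel] = (expected, actual)
    return mismatches

if __name__ == "__main__":
    for dev in ("sdao", "sdr"):  # the two paths for slot 14 in this post
        print(dev, check_device(dev))
```

An empty dict means every setting matches what the validate action wrote.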

This section from the oak log validates the disk state and completes the insertion:

2019-02-05 10:55:51.997: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:51.997: [        ][4177499904]{1:11302:2} [check] Validating disk header for : SSD_E0_S14_2701241428

2019-02-05 10:55:51.997: [ ADAPTER][4177499904]{1:11302:2} Succefully opened the device: /dev/sdao

2019-02-05 10:55:51.997: [ ADAPTER][4177499904]{1:11302:2} Diskheader.Read: devName = /dev/sdao, master_inc = 0, m_slave_inc = 0, disk_status = 0 disk_inc = 0, slot_num= 0, serial num =  chassis snum =   part_loaded_cnt=0

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdr Old State: GOOD, New State: INSERTED

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdao Old State: GOOD, New State: INSERTED

2019-02-05 10:55:52.608: [        ][4177499904]{1:11302:2} [check] Found the disk in uninitialized state.

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:52.608: [        ][4177499904]{1:11302:2} [check] Running ssd wear level check for: /dev/sdao

2019-02-05 10:55:52.609: [    SCSI][4177499904]{1:11302:2} SSD Media used endurance indicator: 0%

2019-02-05 10:55:52.609: [   STMHW][4177499904]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14

2019-02-05 10:55:52.609: [   STMHW][4177499904]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14

2019-02-05 10:55:52.609: [        ][4177499904]{1:11302:2} [check] Disk State: 1,  Label: NewDiskInserted

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} getState : 1

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdr Old State: INSERTED, New State: GOOD

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdao Old State: INSERTED, New State: GOOD

2019-02-05 10:55:53.856: [        ][4085245696]{1:11302:2} [check] Validating disk header for : SSD_E0_S14_2701241428

2019-02-05 10:55:53.856: [ ADAPTER][4085245696]{1:11302:2} Succefully opened the device: /dev/sdao

2019-02-05 10:55:53.856: [ ADAPTER][4085245696]{1:11302:2} Diskheader.Read: devName = /dev/sdao, master_inc = 0, m_slave_inc = 0, disk_status = 0 disk_inc = 0, slot_num= 0, serial num =  chassis snum =   part_loaded_cnt=0
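The GOOD → INSERTED → GOOD flip is easy to miss in a busy log, so here is a minimal Python sketch that pulls just the state changes out per device. The regex assumes the exact "State has been changed for" wording shown above, and `transitions` is a name I made up.

```python
import re
from collections import defaultdict

STATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+): .*"
    r"State has been changed for: (?P<dev>\S+) "
    r"Old State: (?P<old>\w+), New State: (?P<new>\w+)"
)

def transitions(lines):
    """Return {device: [(timestamp, old_state, new_state), ...]} in log order."""
    result = defaultdict(list)
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            result[m.group("dev")].append((m.group("ts"), m.group("old"), m.group("new")))
    return dict(result)

SAMPLE = [
    "2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdr Old State: GOOD, New State: INSERTED",
    "2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdao Old State: GOOD, New State: INSERTED",
    "2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdr Old State: INSERTED, New State: GOOD",
    "2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdao Old State: INSERTED, New State: GOOD",
]

if __name__ == "__main__":
    for dev, steps in transitions(SAMPLE).items():
        print(dev, steps)
```

In this excerpt both paths go to INSERTED at 10:55:52.608 and return to GOOD at 10:55:53.856.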

Finally, if you jump over to the ASM alert.log, you’ll see that the disks get added to the respective ASM diskgroup:

SQL> alter diskgroup /*+ _OAK_AsmCookie */ data add disk
'AFD:SSD_E0_S08_2701228196P1' name SSD_E0_S08_2701228196p1,
'AFD:SSD_E0_S07_2701240428P1' name SSD_E0_S07_2701240428p1,
'AFD:SSD_E0_S14_2701241428P1' name SSD_E0_S14_2701241428p1,  <-- here's our dude !!
'AFD:SSD_E0_S17_2701244644P1' name SSD_E0_S17_2701244644p1,
'AFD:SSD_E0_S19_2701246564P1' name SSD_E0_S19_2701246564p1,
'AFD:SSD_E0_S09_2701252584P1' name SSD_E0_S09_2701252584p1,
'AFD:SSD_E0_S13_2701254148P1' name SSD_E0_S13_2701254148p1,
'AFD:SSD_E0_S16_2701255896P1' name SSD_E0_S16_2701255896p1,
'AFD:SSD_E0_S05_2701256380P1' name SSD_E0_S05_2701256380p1,
'AFD:SSD_E0_S11_2701257468P1' name SSD_E0_S11_2701257468p1,
'AFD:SSD_E0_S15_2701258144P1' name SSD_E0_S15_2701258144p1,
'AFD:SSD_E0_S06_2701258544P1' name SSD_E0_S06_2701258544p1,
'AFD:SSD_E0_S12_2701258588P1' name SSD_E0_S12_2701258588p1,
'AFD:SSD_E0_S10_2701259504P1' name SSD_E0_S10_2701259504p1,
'AFD:SSD_E0_S18_2701260436P1' name SSD_E0_S18_2701260436p1
kfdp_query: callcnt 338 grp 1 (DATA)
kfdp_query: callcnt 339 grp 1 (DATA)
NOTE: Assigning number (1,5) to disk (AFD:SSD_E0_S08_2701228196P1)
Disk 0x777d6080 (1:5:AFD:SSD_E0_S08_2701228196P1) is being named (SSD_E0_S08_2701228196P1)
NOTE: Assigning number (1,6) to disk (AFD:SSD_E0_S07_2701240428P1)
Disk 0x777d5708 (1:6:AFD:SSD_E0_S07_2701240428P1) is being named (SSD_E0_S07_2701240428P1)
NOTE: Assigning number (1,7) to disk (AFD:SSD_E0_S14_2701241428P1)
Disk 0x777d9950 (1:7:AFD:SSD_E0_S14_2701241428P1) is being named (SSD_E0_S14_2701241428P1)
NOTE: Assigning number (1,8) to disk (AFD:SSD_E0_S17_2701244644P1)
Disk 0x777db5b8 (1:8:AFD:SSD_E0_S17_2701244644P1) is being named (SSD_E0_S17_2701244644P1)
NOTE: Assigning number (1,9) to disk (AFD:SSD_E0_S19_2701246564P1)
Disk 0x777dc8a8 (1:9:AFD:SSD_E0_S19_2701246564P1) is being named (SSD_E0_S19_2701246564P1)
NOTE: Assigning number (1,10) to disk (AFD:SSD_E0_S09_2701252584P1)
Disk 0x777d69f8 (1:10:AFD:SSD_E0_S09_2701252584P1) is being named (SSD_E0_S09_2701252584P1)
NOTE: Assigning number (1,11) to disk (AFD:SSD_E0_S13_2701254148P1)
Disk 0x777d8fd8 (1:11:AFD:SSD_E0_S13_2701254148P1) is being named (SSD_E0_S13_2701254148P1)
NOTE: Assigning number (1,12) to disk (AFD:SSD_E0_S16_2701255896P1)
Disk 0x777dac40 (1:12:AFD:SSD_E0_S16_2701255896P1) is being named (SSD_E0_S16_2701255896P1)
NOTE: Assigning number (1,13) to disk (AFD:SSD_E0_S05_2701256380P1)
Disk 0x777d4418 (1:13:AFD:SSD_E0_S05_2701256380P1) is being named (SSD_E0_S05_2701256380P1)
NOTE: Assigning number (1,14) to disk (AFD:SSD_E0_S11_2701257468P1)
Disk 0x777d7ce8 (1:14:AFD:SSD_E0_S11_2701257468P1) is being named (SSD_E0_S11_2701257468P1)
NOTE: Assigning number (1,15) to disk (AFD:SSD_E0_S15_2701258144P1)
Disk 0x777da2c8 (1:15:AFD:SSD_E0_S15_2701258144P1) is being named (SSD_E0_S15_2701258144P1)
NOTE: Assigning number (1,16) to disk (AFD:SSD_E0_S06_2701258544P1)
Disk 0x777d4d90 (1:16:AFD:SSD_E0_S06_2701258544P1) is being named (SSD_E0_S06_2701258544P1)
NOTE: Assigning number (1,17) to disk (AFD:SSD_E0_S12_2701258588P1)
Disk 0x777d8660 (1:17:AFD:SSD_E0_S12_2701258588P1) is being named (SSD_E0_S12_2701258588P1)
NOTE: Assigning number (1,18) to disk (AFD:SSD_E0_S10_2701259504P1)
Disk 0x777d7370 (1:18:AFD:SSD_E0_S10_2701259504P1) is being named (SSD_E0_S10_2701259504P1)
NOTE: Assigning number (1,19) to disk (AFD:SSD_E0_S18_2701260436P1)
Disk 0x777dbf30 (1:19:AFD:SSD_E0_S18_2701260436P1) is being named (SSD_E0_S18_2701260436P1)
2019-02-15 16:56:32.926*:kgfm.c@547: kgfmInitialize

Diggin’ into ASM Add Disk to DiskGroup operation

Recently we had to add storage to our X7-HA ODA. This storage add is a multi-step process, which is generally handled by the ODA OAK automation. We simply added the disks in the slots, and the oakd daemon and workflow took care of the device management. The key things the oakd automation does are:

  • Instantiates the disk device in the OS
  • Builds partition tables
  • Creates devmapper device names
  • Updates the asmappl.config (***DO NOT TOUCH or EDIT THIS FILE..or apocalyptic things will HAPPEN **)
  • Generates the ASM disk add commands to add the disks to the DATA and RECO diskgroups in the pre-defined 80%/20% configuration.

This blog will cover the disk part and walk you through the trace file. I’ll blog about the automation stuff later.

If you peek at the oakd.log or look in the ASM alert.log, you’ll see the actual command that gets executed. I have shown only the DATA dg disk add operation. The RECO operation is the same but uses disk partition P2.

My comments are inline:

SQL> ALTER DISKGROUP /*+ _OAK_AsmCookie */ DATA ADD DISK   <- This OAK ASM Cookie invokes ODA specific backend operations
'AFD:SSD_E0_S05_2701246684P1' NAME SSD_E0_S05_2701246684P1,   <– This is the list of disks that will be added to ODA
'AFD:SSD_E0_S06_2701246400P1' NAME SSD_E0_S06_2701246400P1,
'AFD:SSD_E0_S07_2701243880P1' NAME SSD_E0_S07_2701243880P1,
'AFD:SSD_E0_S08_2701246408P1' NAME SSD_E0_S08_2701246408P1,
'AFD:SSD_E0_S09_2701257952P1' NAME SSD_E0_S09_2701257952P1,
'AFD:SSD_E0_S10_2701255368P1' NAME SSD_E0_S10_2701255368P1,
'AFD:SSD_E0_S11_2701247132P1' NAME SSD_E0_S11_2701247132P1,
'AFD:SSD_E0_S12_2701246568P1' NAME SSD_E0_S12_2701246568P1,
'AFD:SSD_E0_S13_2701251260P1' NAME SSD_E0_S13_2701251260P1,
'AFD:SSD_E0_S14_2701259824P1' NAME SSD_E0_S14_2701259824P1,
'AFD:SSD_E0_S15_2701255760P1' NAME SSD_E0_S15_2701255760P1,
'AFD:SSD_E0_S16_2701229772P1' NAME SSD_E0_S16_2701229772P1,
'AFD:SSD_E0_S17_2701232460P1' NAME SSD_E0_S17_2701232460P1,
'AFD:SSD_E0_S18_2701257420P1' NAME SSD_E0_S18_2701257420P1,
'AFD:SSD_E0_S19_2701253140P1' NAME SSD_E0_S19_2701253140P1
kfdp_query: callcnt 78 grp 1 (DATA)                                  <– It’s being added to DATA
kfdp_query: callcnt 79 grp 1 (DATA)
NOTE: Assigning number (1,5) to disk (AFD:SSD_E0_S05_2701246684P1)   <– Each disk is assigned a disk#
Disk 0x766f50e0 (1:5:AFD:SSD_E0_S05_2701246684P1) is being named (SSD_E0_S05_2701246684P1)
NOTE: Assigning number (1,6) to disk (AFD:SSD_E0_S06_2701246400P1)
2019-02-21 14:37:14.762*:kgfm.c@547: kgfmInitialize          <– Here the disks get initialized, using an array 
2019-02-21 14:37:14.763*:kgf.c@926: kgfArray_construct 0x7f2e648af208 len=0 nsegs=0
2019-02-21 14:37:14.763*:kgf.c@926: kgfArray_construct 0x7f2e648b75a0 len=0 nsegs=0
2019-02-21 14:37:14.763*:kgf.c@926: kgfArray_construct 0x7f2e69a676e8 len=0 nsegs=0
…<deleted repeated lines>

<– kgfmReadOak reads in OAK configuration information into memory structures

2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: max_disk_count is 100
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [max_disk_count] = [100]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=0->1 nsegs=0->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: appliance_name is ODA
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [appliance_name] = [ODA]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=1->2 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: diskstring is AFD:*
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [diskstring] = [AFD:*]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=2->3 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: file_version is 2
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [file_version] = [2]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=3->4 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: oda_version is 3
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [oda_version] = [3]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=4->5 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: jbod_count is 1
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [jbod_count] = [1]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=5->6 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: jbod_slot_count is 24 <– all 24 slots in the ODA are filled
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [jbod_slot_count] = [24]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=6->7 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: data_slot_count is 20   <– 20 disks for DATA DG
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [data_slot_count] = [20]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=7->8 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: reco_slot_count is 20
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [reco_slot_count] = [20]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=8->9 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: redo_slot_count is 4       <– 4 disks for REDO DG
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [redo_slot_count] = [4]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=9->10 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: max_missing is 0
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [max_missing] = [0]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=10->11 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: min_partners is 2
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [min_partners] = [2]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=11->12 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: agent_sql_identifier is /*+ _OAK_AsmCookie
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [agent_sql_identifier] = [/*+ _OAK_AsmCookie ]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=12->13 nsegs=1->1
2019-02-21 14:37:14.763*:kgfm.c@1773: kgfmReadOak: rdbms_compatibility is 12.1.0.2
2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [rdbms_compatibility] = [12.1.0.2]
2019-02-21 14:37:14.763*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=13->14 nsegs=1->1
2019-02-21 14:37:14.764*:kgfm.c@1773: kgfmReadOak: asm_compatibility is 12.2.0.1
2019-02-21 14:37:14.764*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [asm_compatibility] = [12.2.0.1]
2019-02-21 14:37:14.764*:kgf.c@1025: kgfArray_grow 0x7f2e69a77e90 len=14->15 nsegs=1->1
2019-02-21 14:37:14.764*:kgfm.c@1773: kgfmReadOak: _asm_hbeatiowait is 100
2019-02-21 14:37:14.764*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [_asm_hbeatiowait] = [100]
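If you would rather see those settings as one compact table than scroll the trace, a Python sketch along these lines collects the kgfmAddAttribute lines into a dict. The regex is based only on the trace lines above, and `read_attributes` is a hypothetical helper name. If an attribute shows up more than once for the same template, the last value seen wins.

```python
import re

ATTR_RE = re.compile(r"kgfmAddAttribute: tmpl (\d+) \[([^\]]+)\] = \[(.*)\]\s*$")

def read_attributes(lines, tmpl=0):
    """Collect [name] = [value] pairs for one template number; last value wins."""
    attrs = {}
    for line in lines:
        m = ATTR_RE.search(line)
        if m and int(m.group(1)) == tmpl:
            attrs[m.group(2)] = m.group(3)
    return attrs

SAMPLE = [
    "2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [jbod_slot_count] = [24]",
    "2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [data_slot_count] = [20]",
    "2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [reco_slot_count] = [20]",
    "2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [redo_slot_count] = [4]",
    "2019-02-21 14:37:14.763*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [diskstring] = [AFD:*]",
]

if __name__ == "__main__":
    attrs = read_attributes(SAMPLE)
    print(attrs)
    # DATA/RECO slots plus REDO slots account for every slot in the shelf here.
    print(int(attrs["data_slot_count"]) + int(attrs["redo_slot_count"]))  # 24
```

In this trace, data_slot_count plus redo_slot_count equals jbod_slot_count (20 + 4 = 24), which fits with DATA and RECO sharing the same 20 disks as separate partitions.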

This next section starts the disk association and partnership mapping

[

2019-02-21 14:37:14.769*:kgfm.c@2857: kgfmAddDisk: enc=0 slot=23 part=1 path=AFD:SSD_E0_S23_2181135920P1
2019-02-21 14:37:14.769*:kgf.c@1025: kgfArray_grow 0x7f2e648af208 len=0->1 nsegs=0->1
2019-02-21 14:37:14.769*:kgf.c@1025: kgfArray_grow 0x7f2e69a63530 len=0->24 nsegs=0->1
2019-02-21 14:37:14.769*:kgfm.c@1968: kgfmReadOak: disk 23 partners [ 2019-02-21 14:37:14.769*:kgfm.c@1970: 22 2019-02-21 14:37:14.769*:kgfm.c@1970: 21 2019-02-21 14:37:14.769*:kgfm.c@1970: 20 2019-02-21 14:37:14.769*:kgfm.c@1971: ]

[

2019-02-21 14:37:14.770*:kgfm.c@2857: kgfmAddDisk: enc=0 slot=22 part=1 path=AFD:SSD_E0_S22_2181131148P1
2019-02-21 14:37:14.770*:kgf.c@1025: kgfArray_grow 0x7f2e648af208 len=1->2 nsegs=1->1
2019-02-21 14:37:14.770*:kgfm.c@1968: kgfmReadOak: disk 22 partners [ 2019-02-21 14:37:14.770*:kgfm.c@1970: 23 2019-02-21 14:37:14.770*:kgfm.c@1970: 21 2019-02-21 14:37:14.770*:kgfm.c@1970: 20 2019-02-21 14:37:14.770*:kgfm.c@1971: ]

This is repeated for every disk in the ODA.
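The kgfmAddDisk lines also show how the AFD label maps back to the hardware: enc=0 slot=23 part=1 goes with AFD:SSD_E0_S23_2181135920P1. Here is a minimal Python sketch that decodes a label the same way. The field names are mine, and I don't know what the long number in the middle represents, so the sketch just carries it along as `suffix`.

```python
import re

LABEL_RE = re.compile(
    r"^(?:AFD:)?(?P<disk_type>[A-Z]+)_E(?P<enc>\d+)_S(?P<slot>\d+)"
    r"_(?P<suffix>\d+)P(?P<part>\d+)$",
    re.IGNORECASE,
)

def decode_label(label):
    """Split an ODA AFD disk label into type, enclosure, slot, suffix and partition."""
    m = LABEL_RE.match(label.strip())
    if not m:
        return None
    return {
        "disk_type": m.group("disk_type").upper(),
        "enc": int(m.group("enc")),
        "slot": int(m.group("slot")),
        "suffix": m.group("suffix"),   # meaning not documented in this post
        "part": int(m.group("part")),
    }

if __name__ == "__main__":
    print(decode_label("AFD:SSD_E0_S23_2181135920P1"))
    print(decode_label("SSD_E0_S14_2701241428p1"))
```

Partition 1 is what goes into DATA in the commands shown here, and partition 2 is the RECO side.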

….

Define DiskGroup Attributes –

2019-02-21 14:37:14.774*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [agent_sql_identifier] = [/*+ _OAK_AsmCookie ]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [oda_version] = [1]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [jbod_count] = [1]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [jbod_slot_count] = [24]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [data_slot_count] = [20]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [reco_slot_count] = [20]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [redo_slot_count] = [4]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [diskstring] = [(null)]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [appliance_name] = [ODA]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [agent_sql_identifier] = [/* ASM Appliance Agent */]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [asm_compatibility] = [11.2.0.3]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [rdbms_compatibility] = [11.2.0.2]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [max_disk_count] = [100]
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 0 [_asm_hbeatiowait] = [0]
2019-02-21 14:37:14.775*:kgfm.c@2643: kgfmStaticCardinatliy: odaver 3 njbods 1 tmpl 1 card0 3 card1 3
2019-02-21 14:37:14.775*:kgfm.c@2643: kgfmStaticCardinatliy: odaver 3 njbods 1 tmpl 1 card0 5 card1 5
2019-02-21 14:37:14.775*:kgfm.c@2954: kgfmAddAttribute: tmpl 1 [max_missing] = [0]

NOTE: running client discovery for group 1 (reqid:7106731798842955164)

<– At this point the disks are added and the client (database) re-discovers the diskgroup

*** 2019-02-21T14:37:20.381245-05:00
NOTE: running client discovery for group 1 (reqid:7106731798842926578)

*** 2019-02-21T14:37:21.421653-05:00
kfdp_updateReconf(): callcnt 80 grp 1 scope 0x204   <– Once you see kfdp_updateReconf… it’s DONE!
NOTE: group 1 PST updated.
PST verChk [reconf]: id=46860409, grp=1, requested=9 at 02/21/2019 14:37:21
PST verChk [reconf]: id=46860409 grp=1 completed at 02/21/2019 14:37:21

Expanding /u01 filesystem on Oracle Database Appliance

vna

Recently I had to expand /u01 on our ODA because we were in the process of consolidating several new Oracle database systems, each with its own Oracle Home (don’t ask… it’s what the lines of business wanted).

Although a lot of this is just simple Linux LVM stuff… I feel it warrants a blog entry… since folks view an ODA as a different beast 🙂

[root@vna-oda1-0 ~]# pvdisplay

  --- Physical volume ---

  PV Name               /dev/md1

  VG Name               VolGroupSys

  PV Size               446.03 GiB / not usable 29.00 MiB

  Allocatable           yes 

  PE Size               32.00 MiB

  Total PE              14272

  Free PE               7424

  Allocated PE          6848

  PV UUID               Kw1O64-n9j0-4OW7-yUCZ-8FHc-HKug-mOPyP4

[root@vna-oda1-0 ~]# df -m /u01

Filesystem           1M-blocks  Used Available Use% Mounted on

/dev/mapper/VolGroupSys-LogVolU01

                        100666 64564     30982  68% /u01

[root@vna-oda1-0 ~]# lvdisplay /dev/mapper/VolGroupSys-LogVolU01

  --- Logical volume ---

  LV Path                /dev/VolGroupSys/LogVolU01

  LV Name                LogVolU01

  VG Name                VolGroupSys

  LV UUID                UGVuZY-Xia1-u0Th-TaZ2-JF9Q-FH01-jDVfOn

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2018-05-23 13:14:22 -0400

  LV Status              available

  # open                 1

  LV Size                100.00 GiB

  Current LE             3200

  Segments               1

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           249:40

[root@vna-oda1-0 ~]# df -kh 

Filesystem            Size  Used Avail Use% Mounted on

/dev/mapper/VolGroupSys-LogVolRoot

                       30G  5.8G   23G  21% /

tmpfs                 378G  1.3G  376G   1% /dev/shm

/dev/md0              477M  115M  338M  26% /boot

/dev/sda1             500M  304K  500M   1% /boot/efi

/dev/mapper/VolGroupSys-LogVolOpt

                       59G   36G   21G  64% /opt

/dev/mapper/VolGroupSys-LogVolU01

                       99G   64G   31G  68% /u01

lvextend --size +50G /dev/VolGroupSys/LogVolU01

  Size of logical volume VolGroupSys/LogVolU01 changed from 100.00 GiB (3200 extents) to 150.00 GiB (4800 extents).

  Logical volume LogVolU01 successfully resized.

[root@vna-oda0-0 oak]# 

[root@vna-oda0-0 oak]# 

[root@vna-oda0-0 oak]# lvdisplay /dev/mapper/VolGroupSys-LogVolU01

  --- Logical volume ---

  LV Path                /dev/VolGroupSys/LogVolU01

  LV Name                LogVolU01

  VG Name                VolGroupSys

  LV UUID                g2CMK8-kERY-43uu-p0ZD-V5In-otaL-S2zSdX

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2018-05-21 11:04:34 -0400

  LV Status              available

  # open                 1

  LV Size                150.00 GiB

  Current LE             4800

  Segments               2

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           249:25

[root@vna-oda0-0 oak]# resize2fs /dev/VolGroupSys/LogVolU01

resize2fs 1.43-WIP (20-Jun-2013)

Filesystem at /dev/VolGroupSys/LogVolU01 is mounted on /u01; on-line resizing required

old_desc_blocks = 7, new_desc_blocks = 10

Performing an on-line resize of /dev/VolGroupSys/LogVolU01 to 39321600 (4k) blocks.

The filesystem on /dev/VolGroupSys/LogVolU01 is now 39321600 blocks long.
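The numbers in the output above all tie together, and it is worth doing the arithmetic before running lvextend. A small Python sketch, using only values printed by pvdisplay, lvdisplay and resize2fs in this post (the variable and function names are mine). Note the pvdisplay output came from one node and the lvextend output from the other, so I'm assuming both nodes have the same layout.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

PE_SIZE = 32 * MIB     # "PE Size" from pvdisplay
FREE_PE = 7424         # "Free PE" from pvdisplay
CURRENT_LE = 3200      # "Current LE" before the extend
FS_BLOCK = 4096        # resize2fs reports "(4k) blocks"

def extents_for(size_bytes, pe_size=PE_SIZE):
    """Number of physical extents needed to hold size_bytes (rounded up)."""
    return -(-size_bytes // pe_size)

grow = extents_for(50 * GIB)                 # extents added by "+50G"
new_le = CURRENT_LE + grow                   # extents after the extend
new_blocks = new_le * PE_SIZE // FS_BLOCK    # filesystem blocks after resize2fs
free_gib = FREE_PE * PE_SIZE / GIB           # free space left in the volume group

if __name__ == "__main__":
    print(grow, new_le, new_blocks, free_gib)  # 1600 4800 39321600 232.0
    print("fits in VG:", grow <= FREE_PE)
```

So +50G needs 1600 extents out of 7424 free (232 GiB), the LV ends up at 4800 extents, and 4800 extents of 32 MiB is exactly the 39321600 4k blocks that resize2fs reports.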

 

ACFS Snapshot – A Walk Through

This blog explores some of the new 12.2 ACFS features.  We will walk through the ACFS snapshot process flow:

 

[oracle@oracle122 log]$ acfsutil snap info /acfsmounts/acfsdata/

snapshot name:               just_before_load

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_before_load

RO snapshot or RW snapshot:  RO

parent name:                 /acfsmounts/acfsdata/

snapshot creation time:      Wed Mar 22 20:36:09 2017

storage added to snapshot:   8650752   (   8.25 MB )

number of snapshots:  1

snapshot space usage: 8704000  (   8.30 MB )

[oracle@oracle122 log]$ du -sk .

18292  .


[oracle@oracle122 log]$ acfsutil snap create -w -p just_before_load just_about_batch_upload /acfsmounts/acfsdata/

acfsutil snap create: Snapshot operation is complete.

[oracle@oracle122 log]$ acfsutil snap info /acfsmounts/acfsdata

snapshot name:               just_before_load

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_before_load

RO snapshot or RW snapshot:  RO

parent name:                 /acfsmounts/acfsdata

snapshot creation time:      Wed Mar 22 20:36:09 2017

storage added to snapshot:   8650752   (   8.25 MB )

snapshot name:               just_about_batch_upload

snapshot location:           /acfsmounts/acfsdata/.ACFS/snaps/just_about_batch_upload

RO snapshot or RW snapshot:  RW

parent name:                 just_before_load

snapshot creation time:      Wed Mar 22 20:42:56 2017

storage added to snapshot:   8650752   (   8.25 MB )

[root@oracle122 ~]# acfsutil compress on /acfsmounts/acfsdata/log/wtf

acfsutil compress on: ACFS-05518: /acfsmounts/acfsdata/log/wtf is not an ACFS mount point

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/wtf

The file /acfsmounts/acfsdata/log/wtf is not compressed.

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/nitin

nitin             nitin_compressed 

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/nitin_compressed

Compression Unit size: 32768

Disk storage used:   (  60.00 KB )

Disk storage saved:  (   7.75 MB )

Storage used is 1% of what the uncompressed file would use.

File is not scheduled for asynchronous compression.

[oracle@oracle122 log]$ ls -l lastlog*

-rw-r--r--. 1 oracle oracle 145708 Mar 22 12:07 lastlog

-rw-r--r--. 1 oracle oracle 145708 Mar 23 05:49 lastlog_compressed

[oracle@oracle122 log]$

[root@oracle122 ~]# acfsutil compress info /acfsmounts/acfsdata/log/lastlog_compressed

Compression Unit size: 32768

Disk storage used:   (  32.00 KB )

Disk storage saved:  ( 110.29 KB )

Storage used is 22% of what the uncompressed file would use.

File is not scheduled for asynchronous compression.
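The lastlog_compressed numbers above can be reproduced by hand. The sketch below is my own arithmetic, not anything acfsutil exposes: it takes the file size from ls -l (145708 bytes) and the 32.00 KB of disk storage used, and derives the "saved" and "percent used" figures. I am assuming the percentage is rounded to the nearest whole number; the exact rounding acfsutil uses is not documented here.

```python
KB = 1024

def compression_summary(logical_bytes, stored_bytes):
    """Return (KB saved, percent of the uncompressed size actually stored)."""
    saved_kb = (logical_bytes - stored_bytes) / KB
    pct_used = round(100 * stored_bytes / logical_bytes)  # rounding is an assumption
    return saved_kb, pct_used

# lastlog_compressed: 145708 bytes per ls -l, 32.00 KB on disk per acfsutil
saved_kb, pct_used = compression_summary(145708, 32 * KB)
print(f"Disk storage saved: {saved_kb:.2f} KB")
print(f"Storage used is {pct_used}% of what the uncompressed file would use.")
```

This lines up with the 110.29 KB and 22% that acfsutil compress info reported.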

If you are curious about the other snapshot options... then look below !!

[oracle@oracle122 log]$ acfsutil snap -h

 Command Subcmd    Arguments

--------------- --------- ------------------------------------------

snap create    [-w|-r|-c] [-p parent_snap_name] <snap_name> <mountpoint>

snap create    [-w]                      - create a writeable snapshot

snap create    [-r]                      - create a read-only snapshot

snap create                                This is the default behavior

snap create    [-c]                      - create a writable snapshot of a

snap create                                snap duplicate target

snap create    [-p parent_snap_name]     - create a snapshot from a snapshot

snap delete    <snap_name> <mountpoint> - delete a file system snapshot

snap rename    <old_snap_name> <new_snap_name> <mountpoint>

snap rename                             - rename a file system snapshot

snap convert   -w|-r <snap_name> <mountpoint>

snap convert   -w                       - convert to a writeable snapshot

snap convert   -r                       - convert to a read-only snapshot

snap info      [-t] [<snap_name>] <mountpoint>

snap info                    - get information about snapshots

snap info      [-t]          - display family tree starting at next name given

snap info      [<snap_name>] - snapshot name

snap info      <mountpoint>  - mount point

snap remaster  {<snap_name> | -c} <volume_path>

snap remaster                           - make the specified snapshot

snap remaster                             the master file system.  The

snap remaster                             current master and all other

snap remaster                             snapshots will be deleted.

snap remaster                             WARNING: This operation cannot

snap remaster                             be reversed.  Admin privileges

snap remaster                             are required.  The file system

snap remaster                             must be unmounted on all nodes.

snap remaster                             The file system must not have

snap remaster                             Replication running.

snap remaster  [-c]                     - Continue an interrupted snapshot

snap remaster                             remastering.  Use the -c option,

snap remaster                             instead of the <snap_name>, to

 snap remaster                             complete an interrupted

snap remaster                             snapshot remastering.

snap remaster  [-f]                     - Force the snapshot remastering.

 snap duplicate apply     [-b] [-d {0..6}] [<snap_name>] <mountpoint>

 snap duplicate apply     -b                       - maintain backup snapshot

 snap duplicate apply     [-d {0..6}]              - set trace level for debugging

 snap duplicate apply     [<snap_name>]            - target snapshot

 snap duplicate apply     <mountpoint>             - mount point for target site

 snap duplicate create    [-r] [-i oldsnapname] [-d {0..6}] <newsnapname> <mountpoint>

 snap duplicate create    [-r]              - restart of data stream

 snap duplicate create    [-p parentsnap]   - parent snap for base site

 snap duplicate create    [-i oldsnapname]  - old snapshot name

 snap duplicate create    [-d {0..6}]       - set trace level for debugging

 snap duplicate create    <newsnapname>     - new snapshot name

 snap duplicate create    <mountpoint>      - mount point for base site

 snap quota     [[-|+]nnn[K|M|G|T|P]]<snap_name> <mountpoint>

 snap quota                              - set quota for snapshot

 

Grid Infrastructure and RAC 12.2 New Features – a Recap

The following list illustrates the new 12.2 Oracle RAC and Grid Infrastructure features. This is a personal list of what “I believe to be the most interesting.” I apologize to the RAC Dev team if I left out any features.

Streamlined Grid Infrastructure Installation

12.2 Grid Infrastructure software is available as an image file for download and installation. The key objective of this feature is to enable a simpler and quicker installation of Grid Infrastructure. Administrators simply prep the system by creating a new Grid home directory, appropriate users, permissions and kernel settings. Once completed, Admins extract the image file into the newly-created Grid home, and execute the gridsetup.sh script to invoke the setup wizard, which registers the Oracle Grid Infrastructure stack with the Oracle inventory. This installation approach can be used for Oracle Grid Infrastructure for Cluster and Standalone Server configurations. This new software installation method improves large-scale deployment automation as well as deployment of customized images, Patch Set Updates (PSUs) and patches.

Real Application Clusters Reader Nodes

In 12.2, Oracle extended the capability of Flex Clusters by introducing Reader nodes. Reader nodes are Leaf nodes (in a Flex Cluster) that run read-only RAC database instances. The Reader nodes are not affected by RAC reconfigurations caused by node evictions or other cluster node membership changes, as long as the Hub Node to which they are connected is part of the cluster. Reader Nodes allow users to create huge reader farms (up to 64 reader nodes per Hub Node), thus enabling massive parallel processing. In this architecture, updates to the read/write instances (running on Hub nodes) are immediately propagated to the read-only instances on the Leaf Nodes, where they can be used for online reporting or instantaneous queries. Users can create services to direct queries to read-only instances running on reader nodes.

Service-Oriented Buffer Cache Access

RAC Services, which are used to allocate and distribute workloads across RAC instances, are the cornerstone of RAC workload management. There is a strong relationship between a RAC Service, a specific workload, and the database objects it accesses. With 12.2 RAC, a Service-oriented buffer cache feature was introduced to improve scale and performance by optimizing instance and node buffer cache affinity. This is done by caching, or pre-warming, data blocks for the accessed objects on the instances where a service is expected to run.


Server Weight-Based Node Eviction

When there is a split-brain, or when a node eviction decision must be made, traditionally the decision was based on the age, or duration, of the nodes in the cluster; i.e., nodes with a longer uptime in the cluster will survive. In 12.2 RAC, Server Weight-based node eviction uses a more intelligent tie-breaker mechanism to evict a particular node or a group of nodes from a cluster. The Server Weight-based node eviction feature introspects the current load on those servers as part of the decision. Two principal mechanisms, a system-inherent automatic mechanism and a user input-based mechanism, are used to provide guidance.

Load-Aware Resource Placement

Load-aware resource placement prevents overloading a server with more database instances than the server is capable of running. The metrics used to determine whether an application can be started on a given server are based on the expected resource consumption of the application, as well as the capacity of the server in terms of CPU and memory. Administrators can define database resources such as CPU (cpu_count) and memory (memory_target) to Clusterware. Clusterware uses this information to place the database instances only on servers that have a sufficient number of CPUs, amount of memory, or both.

srvctl modify database -db testdb -cpucount 8 -memorytarget 64g
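To make the placement idea concrete, here is a toy Python sketch of the filtering step: given a database's declared CPU count and memory target, keep only the servers with enough free capacity. This is only an illustration of the concept, not Clusterware's actual algorithm; the Server class, the node names and the capacity numbers are all made up.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str            # hypothetical node name
    free_cpus: int       # CPUs not yet claimed by other instances
    free_memory_gb: int  # memory not yet claimed by other instances

def eligible_servers(servers, cpu_count, memory_target_gb):
    """Return the names of servers that can host an instance with these needs."""
    return [
        s.name
        for s in servers
        if s.free_cpus >= cpu_count and s.free_memory_gb >= memory_target_gb
    ]

# Made-up cluster; the requirements mirror -cpucount 8 -memorytarget 64g above
cluster = [
    Server("node1", free_cpus=16, free_memory_gb=128),
    Server("node2", free_cpus=4, free_memory_gb=256),
    Server("node3", free_cpus=8, free_memory_gb=32),
]
print(eligible_servers(cluster, cpu_count=8, memory_target_gb=64))
```

In this made-up cluster only node1 qualifies: node2 is short on CPUs and node3 is short on memory.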

Hang Manager

The Hang Manager feature first became available in 11gR1. In this initial version, Hang Manager evaluated and identified system hangs, then dumped the relevant information, the “wait-for graph,” into a trace file. In 12.2, Hang Manager takes action and attempts to resolve the system hang. An ORA-32701 error message is logged in the alert log to reflect the hang resolution. Hang Manager also runs in both single-instance and Oracle RAC database instances. In addition, Hang Manager is constantly aware of processes running in reader node instances, and checks whether any of these processes are blocking progress on Hub Nodes so that it can take action, if possible.

Separation of Duty for Administering RAC Clusters

12.2 RAC introduces a new administrative privilege called SYSRAC. This privilege is used by the Clusterware agent, and removes the need to use the SYSDBA privilege for RAC administrative tasks, thus reducing the reliance on SYSDBA on production systems. Note, the SYSRAC privilege is the default mode for connecting to the database by the Clusterware agent; e.g., when executing RAC utilities such as SRVCTL.

Rapid Home Provisioning of Oracle Software

Rapid Home Provisioning enables you to create clusters, and to provision, patch, and upgrade Oracle Grid Infrastructure and Oracle Database homes. You can also use Rapid Home Provisioning to provision 11.2 clusters, applications, and middleware.

Extended Clusters

In 12.2, GI Administrators can create an extended RAC cluster across two or more geographically separate sites. Note, each site will include a set of servers with its own storage. If a site fails, the other site acts as an active standby. 12.2 Extended Clusters can be built at initial installation or be converted from an existing (non-Flex ASM) cluster, using the ConvertToExtended script.

De-support of OCR and Voting Files on Shared Filesystem

In Grid Infrastructure 12.2, placing the Oracle Clusterware files, the Oracle Cluster Registry (OCR) and the Voting Files, directly on a shared file system is desupported. Only ASM or NFS is supported. If you need to use a supported shared file system, either a Network File System or a shared cluster file system, instead of native disk devices, then you must create Oracle ASM disks on the supported network file systems that you plan to use for hosting Oracle Clusterware files before installing Oracle Grid Infrastructure. You can then use the Oracle ASM disks in an Oracle ASM disk group to manage Oracle Clusterware files. If your Oracle Database files are stored on a shared file system, then you can continue to use shared file system storage for database files, instead of moving them to Oracle ASM storage.

ACFS 12.2 New Features – a Recap

Oracle Automatic Storage Management Cluster File System (ACFS) made its debut with Oracle 11.2. Many DBAs are not aware of the vast set of features available with ACFS. With each release and update to Oracle, significant enhancements have been made. With Oracle Database 12c Release 2, new features and functionality were added to ACFS.

Snapshot Enhancements

In Oracle 12.2, Oracle extends ACFS snapshot functionality and further simplifies file system snapshot operations. The following are a few of the key new features with snapshots:

Admins can now, if needed, impose quotas on snapshots to limit the amount of write operations that can be done on a snapshot. Quotas can be set at the snapshot level. Oracle also provides the capability to rename an existing ACFS snapshot, to allow more user-friendly names.

When we delete a snapshot with the “acfsutil snap delete snapshot mount_point” command, we can force a delete, even if there are open files.

There are several new capabilities with snapshot re-mastering and duplication. The new ACFS snapshot remaster capability allows a snapshot in the snapshot registry to become the primary file system. ACFS snapshot duplication features are also introduced. The “acfsutil snap duplicate create” command can be used to duplicate a snapshot from an existing snapshot to a standby target file system.

The “apply” option to the “acfsutil snap duplicate” command, allows us to apply deltas to the target ACFS file system or snapshot. If this is the initial apply, the target file system must be empty. If the target had been applied before, then the apply process becomes an incremental update. Before the incremental update occurs, the contents of the target file system must match the content of the older snapshot, since the last incremental update. Also, the contents of the target snapshot cannot be modified while the apply is happening.

Additionally, ACFS snapshot-based replication now uses SSH protocols to transmit data streams.

4k Sectors and Metadata

When Admins create an ACFS file system, they have the option to create the file system with the 4096-byte metadata structure. When issuing the mkfs command, you can specify the metadata block size with the -i option; the two valid values are 512 bytes and 4096 bytes. The 4096-byte metadata structure is made up of multiple 512-byte logical sectors.

If the COMPATIBLE.ADVM ASM Diskgroup attribute is set to 12.2 or greater, then the metadata block size is 4096 bytes by default. If the COMPATIBLE.ADVM attribute is set to less than 12.2, then the block size is set to 512 bytes. When the ADVM volume of the ACFS file system is set with a 4K logical disk sector size, Direct I/O requests should be aligned on a 4K offset and be a multiple of 4K in size for optimal performance.
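The alignment rule for Direct I/O is easy to state in code. The sketch below is my own illustration (nothing here is an ACFS API): it checks whether a request's offset and length are both multiples of 4096 bytes, and shows how an unaligned request could be widened to the enclosing aligned range.

```python
SECTOR = 4096  # 4K logical sector size

def is_4k_aligned(offset, length, sector=SECTOR):
    """True when the request starts on a 4K boundary and spans whole 4K units."""
    return offset % sector == 0 and length > 0 and length % sector == 0

def align_request(offset, length, sector=SECTOR):
    """Widen (offset, length) to the smallest enclosing 4K-aligned range."""
    start = (offset // sector) * sector
    end = -(-(offset + length) // sector) * sector  # ceiling to the next boundary
    return start, end - start

print(is_4k_aligned(8192, 4096))   # aligned request
print(is_4k_aligned(512, 4096))    # offset is not on a 4K boundary
print(align_request(512, 4096))    # widened to cover two full 4K sectors
```

An unaligned request like the second one ends up touching two 4K sectors instead of one, which is where the extra work comes from.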

Defragger

Very rarely would you need the defragmentation tool, because ACFS has algorithms for allocating and coalescing free space. However, for those rare situations where we can get fragmented, such as under heavy workloads or with compressed files, Oracle provides the defrag option to the acfsutil command. Now we can issue the “acfsutil defrag dir” or “acfsutil defrag file” commands for on-demand defragmentation.

ACFS will perform all defrag operations in the background. With the -r option of the “acfsutil defrag dir” command, you can recursively defrag subdirectories.

Compression Enhancements

ACFS compression can significantly reduce disk storage requirements for customers running databases on ACFS. Databases running on ACFS must be version 11.2.0.4 or higher. ACFS compression can be enabled for specific ACFS file systems for database files, RMAN backup files, archivelogs, data pump extract files, and general purpose files. Oracle does not support compression of redo logs, flashback logs or control files.

When enabling ACFS compression for a file system, only new incoming files will be compressed. All existing files on the file system will remain un-compressed. Likewise, if you decide to uncompress a file system, Oracle will not de-compress files. Oracle will simply disable compression for newly created files.

To compress and uncompress ACFS file systems, execute the acfsutil compress on or acfsutil compress off commands. To view compression state and space consumption information, you can execute the “acfsutil compress info” command. The commands “acfsutil info fs” and “acfsutil info file” now support ACFS compression status.

At this time, databases with 2K or 4K block sizes are not supported for ACFS compression. ACFS compression is supported on Linux and AIX. ACFS compression is also supported with ACFS snapshot-based replication.

Loopback Devices

ACFS now supports loopback devices on the Linux operating system. With ACFS loopback device support, we can now take OVM images, templates, and virtual disks and present them as a block device. Files can be sparse or non-sparse. ACFS also supports Direct I/O on sparse images.

Metadata Collector

The metadata collector copies metadata structures from an Oracle ACFS file system to a separate output file that can be ingested for analysis and diagnostics. The metadata collector reads the contents of the file system, and all metadata is written out to a specified output file. The metadata collector can read the ACFS file system online without requiring an outage. Note, this tool is not a replacement for the file system checker command (fsck), but a supplement for additional diagnosis and support. Even though the metadata collector can read the file system while it is online, for best results, unmount the file system prior to metadata collection. The size of the output file is directly correlated to the size of the file system that the collection is run against. To collect metadata for a file system, invoke the “acfsutil meta” command.

The auto-resize feature allows us to “autoextend” a file system if the file system is about to run out of space. Just like an Oracle datafile that has the autoextend option enabled, we can now “autoextend” the ACFS file system by a specified increment. With the -a option to the “acfsutil size” command, we can specify the increment size.

We can also specify the maximum size, or quota, that the ACFS file system can “autoextend” to, to guard against runaway space consumption. To set the maximum size for an ACFS file system, execute the “acfsutil size” command with the -x option.

Setting Round-Robin Multipathing Policy in VMware ESXi 6.0

Storage Array Type Plugins (SATP) and Path Selection Plugins (PSP) are part of the VMware APIs for Pluggable Storage Architecture (PSA). The SATP has all the knowledge of the storage array to aggregate I/Os across multiple channels and has the intelligence to send failover commands when a path has failed. The Path Selection Policy can be either “Fixed”, “Most Recently Used” or “Round Robin”.

If a VMware VM is using RDM with All Flash Arrays, then the Round Robin policy should be used. Furthermore, inside the Linux kernel (VM), the noop IO scheduler should be used. Both need to be in place for proper throughput.

As a best practice, the preferred method to set the Round Robin policy is to create a rule that makes any newly added FlashArray device automatically use the Round Robin PSP with an IO Operation Limit value of 1. In this blog I’ll refer to the PureStorage array for setting the Round Robin policy as well as setting the IO limit.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"

This must be repeated for each ESXi host.
This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will iterate through all of the hosts in that particular vCenter and create a default rule to set Round Robin for all Pure Storage FlashArray devices with an I/O Operation Limit set to 1.

$hosts = Get-VMHost
foreach ($esx in $hosts)
{
    $esxcli = Get-EsxCli -VMHost $esx
    $esxcli.storage.nmp.satp.rule.add($null, $null, "PURE FlashArray RR IO Operation Limit Rule", $null, $null, $null, "FlashArray", $null, "VMW_PSP_RR", "iops=1", "VMW_SATP_ALUA", $null, $null, "PURE")
}

It is important to note that existing, previously presented devices will need to be either manually set to Round Robin and an I/O Operation Limit of 1 or unclaimed and reclaimed through either a reboot of the host or through a manual device reclaim process so that it can inherit the configuration set forth by the new rule. For setting a new I/O Operation Limit on an existing device, use the following procedure:

The first step is to change the particular device to use the Round Robin PSP. This must be done on every ESXi host and can be done through the vSphere Web Client, the Pure Storage Plugin for the vSphere Web Client, or via command line utilities.

Via esxcli:
esxcli storage nmp device set -d naa. --psp=VMW_PSP_RR

Note that changing the PSP using the Web Client Plugin is the preferred option as it will automatically configure Round Robin across all of the hosts. Note that this does not set the IO Operation Limit to 1. That is a command line option only, and must be done separately.

Round Robin can also be set on a per-device, per-host basis for a Pure Storage volume using the standard vSphere Web Client actions. Note that this does not set the IO Operation Limit to 1, which is a command line option only and must be done separately.

The IO Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or altered via command line utilities. The following command can check a particular device for the PSP and IO Operations Limit:

esxcli storage nmp device list -d naa.

To set a device that is pre-existing to have an IO Operation limit of one, run the following command:

esxcli storage nmp psp roundrobin deviceconfig set -d naa. -I 1 -t iops
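Since the esxcli steps have to be repeated per device (and per host), it can help to generate the command lines rather than type them. The Python sketch below only builds the command strings shown above for a list of device IDs; it does not run anything. The device IDs in the example are hypothetical placeholders, not real NAA identifiers, so substitute your own.

```python
def round_robin_commands(device_id, iops_limit=1):
    """Build the esxcli command lines (as strings) for one existing device."""
    return [
        # 1. switch the device to the Round Robin PSP
        f"esxcli storage nmp device set -d {device_id} --psp=VMW_PSP_RR",
        # 2. set the IO Operation Limit
        f"esxcli storage nmp psp roundrobin deviceconfig set -d {device_id} -I {iops_limit} -t iops",
        # 3. verify the PSP and IO Operation Limit
        f"esxcli storage nmp device list -d {device_id}",
    ]

# Hypothetical placeholder IDs, for illustration only
devices = ["naa.EXAMPLE_DEVICE_1", "naa.EXAMPLE_DEVICE_2"]
for dev in devices:
    for cmd in round_robin_commands(dev):
        print(cmd)
```

The printed lines can then be reviewed and pasted into an SSH session on each ESXi host.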

Setting Jumbo Frames – Portrait of a Large MTU size

There are cases where we need to ensure that large packet “address-ability” exists end to end. This is needed to verify the configuration for non-standard packet sizes, i.e., an MTU of 9000; for example, if we are deploying a NAS or backup server across the network.

Setting the MTU can be done by editing the configuration script for the relevant interface in /etc/sysconfig/network-scripts/. In our example, we will use the eth1 interface, thus the file to edit would be ifcfg-eth1.

Add a line to specify the MTU, for example:
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.20.2
NETMASK=255.255.255.0
MTU=9000

Assuming the MTU is now set in the configuration file, just do an ifdown eth1 followed by an ifup eth1.
An ifconfig eth1 will tell you if it’s set correctly:

eth1 Link encap:Ethernet HWaddr 00:0F:EA:94:xx:xx
inet addr:192.168.20.2 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20f:eaff:fe91:407/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:141567 errors:0 dropped:0 overruns:0 frame:0
TX packets:141306 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:101087512 (96.4 MiB) TX bytes:32695783 (31.1 MiB)
Interrupt:18 Base address:0xc000

To validate end-2-end MTU 9000 packet management

Execute the following on Linux systems:

ping -M do -s 8972 [destinationIP]
For example: ping -M do -s 8972 datadomain.viscosityna.com

The reason for the 8972: on Linux/Unix systems, the size given to ping with -s is the ICMP payload only, and does not include the 28 bytes of headers, i.e., the ICMP header (8 bytes) plus the IP header (20 bytes). Therefore, take 9000 and subtract 28 = 8972.

[root@racnode01]# ping -s 8972 -M do datadomain.viscosityna.com
PING datadomain.viscosityna.com. (192.168.20.32) 8972(9000) bytes of data.
8980 bytes from racnode1.viscosityna.com. (192.168.20.2): icmp_seq=0 ttl=64 time=0.914 ms
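The payload arithmetic generalizes to any MTU. A minimal Python sketch, assuming an IPv4 header without options (20 bytes) and the 8-byte ICMP header; the function name is mine:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header

def ping_payload_for_mtu(mtu):
    """Largest ping -s value that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping -M do -s {ping_payload_for_mtu(mtu)}")
```

For the standard 1500 MTU this gives 1472, and for Jumbo Frames it gives the 8972 used above.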

To illustrate what happens if proper MTU packet address-ability is not in place, I can set a larger packet size in the ping (8993). The packet would need to be fragmented, but since the Do Not Fragment bit is set, you will see “Frag needed and DF set”. In this example, the ping command uses “-s” to set the packet size, and “-M do” to set the Do Not Fragment bit.

[root@racnode01]# ping -s 8993 -M do datadomain.viscosityna.com
PING datadomain.viscosityna.com. (192.168.20.32) 8993(9001) bytes of data.
From racnode1.viscosityna.com. (192.168.20.2) icmp_seq=0 Frag needed and DF set (mtu = 9000)
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

By adjusting the packet size, you can figure out what the MTU for the link is. This will represent the lowest MTU allowed by any device in the path, e.g., the switch, the source or target node, or anything else in between.
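Adjusting the packet size by hand is really a search problem, so here is a hedged Python sketch that binary-searches for the largest payload that goes through with the Do Not Fragment bit set. The search takes any probe function; ping_probe shows one possible probe built on the Linux ping flags used above, and the demo at the bottom uses a simulated 9000-MTU path so the block runs anywhere. The function names and the simulated path are my own assumptions.

```python
import subprocess

def find_max_payload(probe, low=0, high=9000):
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes probe is monotonic: if a size fails, every larger size fails too.
    Returns None if nothing in [low, high] succeeds.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid        # mid fits; try something larger
            low = mid + 1
        else:
            high = mid - 1    # mid is too big; try something smaller
    return best

def ping_probe(host):
    """Return a probe that sends one DF-set ping of the given size (Linux ping)."""
    def probe(size):
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe

# Simulated path with MTU 9000 (28 bytes of IP + ICMP headers), so no network is needed
def simulated_probe(size):
    return size + 28 <= 9000

payload = find_max_payload(simulated_probe)
print(f"largest payload: {payload}, path MTU: {payload + 28}")
```

To try it against a real target, pass ping_probe("your-host") instead of simulated_probe; a lost packet for any other reason would also read as "too big", so treat the result as a hint rather than proof.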

Finally, another way to verify the correct usage of the MTU size is the command ‘netstat -a -i -n’ (the column MTU size should be 9000 when you are performing tests on Jumbo Frames)

High Level Overview of 11204 ASM Rebalance in Async ARB0

High Level look at 11204 Rebalance with Plan Optimization and Async ARB0

 

Drop disk

 

SQL> alter diskgroup reco drop disk 'ASM_NORM_DATA4' rebalance power 12   <-- here we issue the rebalance

NOTE: requesting all-instance membership refresh for group=2

GMON querying group 2 at 120 for pid 19, osid 19030

GMON updating for reconfiguration, group 2 at 121 for pid 19, osid 19030

NOTE: group 2 PST updated.

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

GMON querying group 2 at 122 for pid 13, osid 4000

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

NOTE: starting rebalance of group 2/0x89b87754 (RECO) at power 12   <-- rebalance internally started

Starting background process ARB0   <-- ARB0 gets started for this rebalance

SUCCESS: alter diskgroup reco drop disk 'ASM_NORM_DATA4' rebalance power 12

Wed Sep 19 23:54:10 2012

ARB0 started with pid=21, OS id=19526

NOTE: assigning ARB0 to group 2/0x89b87754 (RECO) with 12 parallel I/Os   <-- ARB0 assigned to this diskgroup rebalance. Note that it states 12 parallel I/Os

NOTE: Attempting voting file refresh on diskgroup RECO

Wed Sep 19 23:54:38 2012

NOTE: requesting all-instance membership refresh for group=2   <-- first indications that rebalance is completing

GMON updating for reconfiguration, group 2 at 123 for pid 22, osid 19609

NOTE: group 2 PST updated.

SUCCESS: grp 2 disk ASM_NORM_DATA4 emptied   <-- Once the rebalance relocation phase is complete, the disk is emptied

NOTE: erasing header on grp 2 disk ASM_NORM_DATA4   <-- The emptied disk's header is erased and set to FORMER

NOTE: process _x000_+asm (19609) initiating offline of disk 3.3915941808 (ASM_NORM_DATA4) with mask 0x7e in group 2   <-- The dropped disk is offlined

NOTE: initiating PST update: grp = 2, dsk = 3/0xe96887b0, mask = 0x6a, op = clear

GMON updating disk modes for group 2 at 124 for pid 22, osid 19609

NOTE: PST update grp = 2 completed successfully

NOTE: initiating PST update: grp = 2, dsk = 3/0xe96887b0, mask = 0x7e, op = clear

GMON updating disk modes for group 2 at 125 for pid 22, osid 19609

NOTE: cache closing disk 3 of grp 2: ASM_NORM_DATA4

NOTE: PST update grp = 2 completed successfully

GMON updating for reconfiguration, group 2 at 126 for pid 22, osid 19609

NOTE: cache closing disk 3 of grp 2: (not open) ASM_NORM_DATA4

NOTE: group 2 PST updated.

Wed Sep 19 23:54:42 2012

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

GMON querying group 2 at 127 for pid 13, osid 4000

GMON querying group 2 at 128 for pid 13, osid 4000

NOTE: Disk in mode 0x8 marked for de-assignment

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

NOTE: Attempting voting file refresh on diskgroup RECO

Wed Sep 19 23:56:45 2012

NOTE: stopping process ARB0   <-- All phases of rebalance are completed and ARB0 is shut down

SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)   <-- Rebalance marked as complete

 

 

Add disk

Starting background process ARB0

SUCCESS: alter diskgroup reco add disk 'ORCL:ASM_NORM_DATA4' rebalance power 16

Thu Sep 20 23:08:22 2012

ARB0 started with pid=22, OS id=19415

NOTE: assigning ARB0 to group 2/0x89b87754 (RECO) with 16 parallel I/Os

Thu Sep 20 23:08:31 2012

NOTE: Attempting voting file refresh on diskgroup RECO

Thu Sep 20 23:08:46 2012

NOTE: requesting all-instance membership refresh for group=2

Thu Sep 20 23:08:49 2012

NOTE: F1X0 copy 1 relocating from 0:2 to 0:459 for diskgroup 2 (RECO)

Thu Sep 20 23:08:50 2012

GMON updating for reconfiguration, group 2 at 134 for pid 27, osid 19492

NOTE: group 2 PST updated.

Thu Sep 20 23:08:50 2012

NOTE: membership refresh pending for group 2/0x89b87754 (RECO)

NOTE: F1X0 copy 2 relocating from 1:2 to 1:500 for diskgroup 2 (RECO)

NOTE: F1X0 copy 3 relocating from 2:2 to 2:548 for diskgroup 2 (RECO)

GMON querying group 2 at 135 for pid 13, osid 4000

SUCCESS: refreshed membership for 2/0x89b87754 (RECO)

Thu Sep 20 23:09:06 2012

NOTE: Attempting voting file refresh on diskgroup RECO

Thu Sep 20 23:09:57 2012

NOTE: stopping process ARB0

SUCCESS: rebalance completed for group 2/0x89b87754 (RECO)
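A handy thing to pull out of these alert log excerpts is how long ARB0 actually ran. The Python sketch below parses the timestamp lines that precede “ARB0 started” and “stopping process ARB0” in the two excerpts above and prints the elapsed time. The helper name is mine, and the parsing assumes the English day and month abbreviations shown in the log.

```python
from datetime import datetime

ALERT_LOG_TS = "%a %b %d %H:%M:%S %Y"   # e.g. "Thu Sep 20 23:08:22 2012"

def elapsed_seconds(start, end):
    """Seconds between two alert-log timestamp lines."""
    t0 = datetime.strptime(start, ALERT_LOG_TS)
    t1 = datetime.strptime(end, ALERT_LOG_TS)
    return (t1 - t0).total_seconds()

# Timestamps taken from the drop disk and add disk excerpts above
drop_secs = elapsed_seconds("Wed Sep 19 23:54:10 2012", "Wed Sep 19 23:56:45 2012")
add_secs = elapsed_seconds("Thu Sep 20 23:08:22 2012", "Thu Sep 20 23:09:57 2012")
print(f"drop disk rebalance (power 12): {drop_secs:.0f} s")
print(f"add disk rebalance  (power 16): {add_secs:.0f} s")
```

That works out to 155 seconds for the drop and 95 seconds for the add, measured from ARB0 start to ARB0 stop.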

SQL> select NUMBER_KFGMG, OP_KFGMG, ACTUAL_KFGMG, REBALST_KFGMG from X$KFGMG;

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           2          1            0             2
           2         32            0             2

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           2          1           16             1

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           2          1           16             2
           2         32           16             2

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           2          1           16             2

			

What should you ask your All Flash Array Vendor

I was just traveling back from a client, where the customer just bought into the concept of an all flash array for their database and VDI workloads.  They had asked me to help out where I could.  So I started pondering the things this customer (or any customers/buyers) should think through…..

At first I was going to do a comparative analysis (table) of the All Flash Arrays out on the market.  However, since the AFA market is constantly changing anyways… why bother with a comparison.

Thus, I changed my approach to helping the buyer/architect pose the appropriate questions to the vendor.  The approach became more of a “What to consider” when purchasing an AFA.

Now note, I’m not stating some earth shattering thought leadership here or a new dimension of looking at this issue,   I’m merely sharing what I was going to present to the customer

Anyways……As with most storage decisions, it’s very hard to bucketize considerations into the performance, cost and manageability categories, because they are so intertwined.  Also, I specifically did not address Cost separately, since Cost traverses every layer and topic, whether for cost-performance, cost-supportability or feature-cost usage.

1. Performance is king! – We know AFA performance is awesome, but think thru and ask the following:

a. How does the AFA fare with differing workloads; i.e., the degree of sequential to random, and read/write ratios of 80/20, 70/30, and 50/50?  And especially when the array is near capacity -> 70% or 80%?

b. How is garbage collection handled?  Is it ASIC/SSD-based or controller-based garbage collection?  Regardless, the buyer shouldn’t have to understand the bowels of garbage collection, so the question to the vendor should simply be: what is the performance consistency, or better stated, the “consistency of performance”, specifically during steady state/peak workloads or during flash maintenance operations (garbage collection, flash overwrites, wear leveling, etc.)?

c.  I wasn’t sure if I should even add this entry, but for completeness I will.  AFAs  on the market today use a type or combination of  SSD drives: SLC, MLC, (cMLC), eMLC, etc.  As with the above, buyers should not concern themselves with this level of detail, but one should ascertain the performance they should expect.  This category really needs to go in the costing Category – cost per IOP, cost per GB, etc.

2. Manageability

a. How does the array handle non-disruptive upgrades (NDU)?  AFAs occasionally need patches, updates and even field-replaceable component changes, so you need to determine the impact of making these changes; i.e., is it an online transparent change, an online change with a reboot (outage), or a destructive change?  For example, how is an AFA OS patch handled, or how are SSD firmware changes handled?

b. Scalability –  What I mean here is really AFA expansion without disruption.  Ask whether you can add another array, another set of controllers, etc, without having to export the array data contents, add in new array, and load back data. It should be mainframe class scale.

c. Storage Array simplicity – How usable are the GUI tools for managing the operational array tasks; e.g., creating volumes, measuring performance, gauging the effectiveness of Data Services, and alert notification on failing components?

3. Features (Data Services) – By now most AFAs will incorporate snapshots, replication, compression, and of course de-duplication. But the real question is: what is the impact of using these services concurrently, can features be used selectively (by LUN/volume), and what is the overall performance impact of these services?

This should just get you started on the things to think through !!!