Part Deux – Diggin’ into ODA ASM Add Disk to DiskGroup operation

This is part 2 of the storage expansion of ODA.   If you remember from Part1, we added a whole disk shelf and we walked thru the ASM alert.log to show how the storage gets added to the diskgroup.  In this blog post, I’m just going to illustrate the process flow of the disk addition as it looks thru the eyes of the oakd.log.  But rather than show the entire 20 disks being added (which would be voluminous), I’m going to describe the  addition of a specific disk, Slot 14.

When the disk is physically added into the slot (after taking out the filler), this disk insertion will invoke a series of backend automated scripts/tooling that will start with the OS creating a disk device entry via the event handler and end with the disk being added to ASM diskgroup.

The key point of this blog and discussion is to describe the entire end-2-end automation process of simply adding a new disk to an ODA engineered system (“dare I say push button approach”).

BTW, Still LOVE the references to COMET in the code 🙂

This section from the oak log, describes the disk characteristics; including capacity , hba port and state.  Note, since multipathing is enabled we will see 2paths, and thus 2 disk names for the same [root] slot device name :

2019-02-05 10:55:35.510: [   STMHW][710730400] Sha::Inserting OSDevName /dev/sdr for slot 14.  <— SDR

2019-02-05 10:55:35.510: [   STMHW][710730400] Sha::Inserting OSDevName /dev/sdao for slot 14. <— SDAO

2019-02-05 10:55:35.510: [   STMHW][710730400] Physical Disk [14] Info:  <--Next set includes physical disk info

2019-02-05 10:55:35.510: [   STMHW][710730400] Slot Num    = 14

2019-02-05 10:55:35.510: [   STMHW][710730400] Col  Num    = 2

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevNames  = |/dev/sdao||/dev/sdr|

2019-02-05 10:55:35.510: [   STMHW][710730400] Serial Num  = 1839J5XJ9X

2019-02-05 10:55:35.510: [   STMHW][710730400] Disk Type   = SSD  <--Well we know its an SSD

2019-02-05 10:55:35.510: [   STMHW][710730400] Expander    = 0 : 508002000231a17e

2019-02-05 10:55:35.510: [   STMHW][710730400] scsi-id     = 5000cca0a101ac54

2019-02-05 10:55:35.510: [   STMHW][710730400] sectors     = 781404246

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDisk[14] Info:  <--This next listing details the device info

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevName: /dev/sdr, Id = 14, Slot = 14, Capacity = 3200631791616: 3200gb, Type = SSD, hba port = 14 State = State: GOOD, expWwn = 5080020002311fbe, scsiId = 5000cca0a101ac54, Ctrlr = 0

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDisk[38] Info:   <--Since we have multipathing, we will get same info for /dev/sdao

2019-02-05 10:55:35.510: [   STMHW][710730400] OsDevName: /dev/sdao, Id = 38, Slot = 14, Capacity = 3200631791616: 3200gb, Type = SSD, hba port = 14 State = State: GOOD, expWwn = 508002000231a17e, scsiId = 5000cca0a101ac54, Ctrlr = 1

This section from the oak log, describes the disk details from  PDiskAdapter.scr action script and FishWrap . Note the Autodiscovery hint, as the disk is partitioned for the different diskgroups:

2019-02-05 10:55:35.946: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14 2019-02-05 10:55:35.946: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14 2019-02-05 10:55:35.946: [ ADAPTER][150968064]{1:11302:2} Running predictive failure check for: /dev/sdao 2019-02-05 10:55:35.946: [    SCSI][150968064]{1:11302:2} SCSI Inquiry Command response for /dev/sdao 2019-02-05 10:55:35.946: [   OAKFW][167753472]{1:11302:2} [ActionScript] = /opt/oracle/oak/adapters/PDiskAdapter.scr 2019-02-05 10:55:35.946: [    SCSI][150968064]{1:11302:2} Vendor = HGST     Product = HBCAC2DH2SUN3.2T Revision = A170 2019-02-05 10:55:35.946: [   OAKFW][167753472]{1:11302:2} [ActionTimeout] = 1500 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ActivePath] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AgentFile] = %COMET_MS_HOME%/bin/%TYPE_NAME% 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AsmDiskList] = |0| 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AutoDiscovery] = 1 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [AutoDiscoveryHint] = |data:80:SSD||reco:20:SSD||redo:100:SSD| 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [CheckInterval] = 600 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ColNum] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DiskId] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DiskType] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Enabled] = 1 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [ExpNum] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [MultiPathList] = |0| 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Name] = PDType 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [NewPartAddr] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [OSUserType] = |userType:Multiuser| 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PlatformName] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PrevUsrDevName] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SectorSize] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SerialNum] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Size] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [SlotNum] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [TotalSectors] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [UsrDevName] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [gid] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [mode] = 660 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [uid] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [DependListOpr] = add 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Dependency] = |0| 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [IState] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Initialized] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [IsConfigDependency] = false 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [MonitorFlag] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [Name] = ResourceDef 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [PrevState] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [State] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [StateChangeTs] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [StateDetails] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} [TypeName] = 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Added new resource : e0_pd_11 to the agfw 2019-02-05 10:55:35.947: [   OAKFW][167753472][F-ALGO]{1:11302:2} Resource name : e0_pd_11, state : 0 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} PE invalidating the data model 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Evaluating Add Resource for e0_pd_11 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Executing plan size: 1 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} PE: Sending message to agent : RESOURCE_VALIDATE[e0_pd_11] ID 4361:96 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Engine received the message : RESOURCE_VALIDATE[e0_pd_09] ID 4361:90 2019-02-05 10:55:35.947: [   OAKFW][167753472]{1:11302:2} Preparing VALIDATE command for : e0_pd_09 2019-02-05 10:55:35.948: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14 2019-02-05 10:55:35.948: [   STMHW][150968064]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14 2019-02-05 10:55:35.948: [ ADAPTER][150968064]{1:11302:2} Creating resource for PD: SSD_E0_S14_2701241428 2019-02-05 10:55:35.948: [ ADAPTER][150968064]{1:11302:2} partName datapctStr  80 diskType =SSD This section from the oak log, describes the disk validation 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] print_args called with argument : validate 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Arguments passed to PDiskAdapter: 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ResName = e0_pd_14 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskId = 35000cca0a101ac54 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DevName = SSD_E0_S14_2701241428 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] MultiPaths = /dev/sdao /dev/sdr 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ActivePath = /dev/sdao 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskType = SSD 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Expander = 0 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Size = 3200631791616 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] Sectors = 781404246 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] ExpColNum = 2 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] NewPartAddr = 0 2019-02-05 10:55:36.015: [        ][4177499904]{1:11302:2} [validate] DiskSerial# = 1839J5XJ9X 2019-02-05 10:55:36.023: [        ][4085245696]{1:11302:2} [validate] [Tue Feb 5 10:55:35 EST 2019] Action script '/opt/oracle/oak/adapters/PDiskAdapter.scr' for resource [e0_pd_15] called for action validate

This section from the oak log, we see the Linux kernel changes once the device entry is created; eg, IO scheduler, queue  depth, property values

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo deadline > /sys/block/sdao/queue/scheduler;echo 4096 > /sys/block/sdao/queue/nr_requests;echo 128 > /sys/block/sdao/queue/read_ahead_kb;

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo deadline > /sys/block/sdr/queue/scheduler;echo 4096 > /sys/block/sdr/queue/nr_requests;echo 128 > /sys/block/sdr/queue/read_ahead_kb;

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 64 > /sys/block/sdao/device/queue_depth

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 64 > /sys/block/sdr/device/queue_depth

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 30 > /sys/block/sdao/device/timeout

2019-02-05 10:55:36.166: [        ][4177499904]{1:11302:2} [validate] Running echo 30 > /sys/block/sdr/device/timeout

2019-02-05 10:55:36.166: [   OAKFW][4177499904]{1:11302:2} Command : validate for: e0_pd_14 completed with status: SUCCESS

2019-02-05 10:55:36.166: [   OAKFW][167753472][F-ALGO]{1:11302:2} Engine received reply for command : validate for: e0_pd_14

2019-02-05 10:55:36.166: [   OAKFW][167753472]{1:11302:2} PE: Received last reply for : RESOURCE_VALIDATE[e0_pd_14] ID 4361:107

This section from the oak log, validates the state and complete insertion

2019-02-05 10:55:51.997: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:51.997: [        ][4177499904]{1:11302:2} [check] Validating disk header for : SSD_E0_S14_2701241428

2019-02-05 10:55:51.997: [ ADAPTER][4177499904]{1:11302:2} Succefully opened the device: /dev/sdao

2019-02-05 10:55:51.997: [ ADAPTER][4177499904]{1:11302:2} Diskheader.Read: devName = /dev/sdao, master_inc = 0, m_slave_inc = 0, disk_status = 0 disk_inc = 0, slot_num= 0, serial num =  chassis snum =   part_loaded_cnt=0

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdr Old State: GOOD, New State: INSERTED

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} State has been changed for: /dev/sdao Old State: GOOD, New State: INSERTED

2019-02-05 10:55:52.608: [        ][4177499904]{1:11302:2} [check] Found the disk in uninitialized state.

2019-02-05 10:55:52.608: [   STMHW][4177499904]{1:11302:2} getState : 1

2019-02-05 10:55:52.608: [        ][4177499904]{1:11302:2} [check] Running ssd wear level check for: /dev/sdao

2019-02-05 10:55:52.609: [    SCSI][4177499904]{1:11302:2} SSD Media used endurance indicator: 0%

2019-02-05 10:55:52.609: [   STMHW][4177499904]{1:11302:2} Sha::Inserting OSDevName /dev/sdr for slot 14

2019-02-05 10:55:52.609: [   STMHW][4177499904]{1:11302:2} Sha::Inserting OSDevName /dev/sdao for slot 14

2019-02-05 10:55:52.609: [        ][4177499904]{1:11302:2} [check] Disk State: 1,  Label: NewDiskInserted

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} getState : 1

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdr Old State: INSERTED, New State: GOOD

2019-02-05 10:55:53.856: [   STMHW][4085245696]{1:11302:2} State has been changed for: /dev/sdao Old State: INSERTED, New State: GOOD

2019-02-05 10:55:53.856: [        ][4085245696]{1:11302:2} [check] Validating disk header for : SSD_E0_S14_2701241428

2019-02-05 10:55:53.856: [ ADAPTER][4085245696]{1:11302:2} Succefully opened the device: /dev/sdao

2019-02-05 10:55:53.856: [ ADAPTER][4085245696]{1:11302:2} Diskheader.Read: devName = /dev/sdao, master_inc = 0, m_slave_inc = 0, disk_status = 0 disk_inc = 0, slot_num= 0, serial num =  chassis snum =   part_loaded_cnt=0

Finally if you jump over to the ASM alert.log you’ll see that the disks get added to the respective ASM diskgroup:

SQL> alter diskgroup /*+ _OAK_AsmCookie */ data add disk
'AFD:SSD_E0_S08_2701228196P1' name SSD_E0_S08_2701228196p1,
'AFD:SSD_E0_S07_2701240428P1' name SSD_E0_S07_2701240428p1,
'AFD:SSD_E0_S14_2701241428P1' name SSD_E0_S14_2701241428p1,  <-- here's our dude !!
'AFD:SSD_E0_S17_2701244644P1' name SSD_E0_S17_2701244644p1,
'AFD:SSD_E0_S19_2701246564P1' name SSD_E0_S19_2701246564p1,
'AFD:SSD_E0_S09_2701252584P1' name SSD_E0_S09_2701252584p1,
'AFD:SSD_E0_S13_2701254148P1' name SSD_E0_S13_2701254148p1,
'AFD:SSD_E0_S16_2701255896P1' name SSD_E0_S16_2701255896p1,
'AFD:SSD_E0_S05_2701256380P1' name SSD_E0_S05_2701256380p1,
'AFD:SSD_E0_S11_2701257468P1' name SSD_E0_S11_2701257468p1,
'AFD:SSD_E0_S15_2701258144P1' name SSD_E0_S15_2701258144p1,
'AFD:SSD_E0_S06_2701258544P1' name SSD_E0_S06_2701258544p1,
'AFD:SSD_E0_S12_2701258588P1' name SSD_E0_S12_2701258588p1,
'AFD:SSD_E0_S10_2701259504P1' name SSD_E0_S10_2701259504p1,
'AFD:SSD_E0_S18_2701260436P1' name SSD_E0_S18_2701260436p1
kfdp_query: callcnt 338 grp 1 (DATA)
kfdp_query: callcnt 339 grp 1 (DATA)
NOTE: Assigning number (1,5) to disk (AFD:SSD_E0_S08_2701228196P1)
Disk 0x777d6080 (1:5:AFD:SSD_E0_S08_2701228196P1) is being named (SSD_E0_S08_2701228196P1)
NOTE: Assigning number (1,6) to disk (AFD:SSD_E0_S07_2701240428P1)
Disk 0x777d5708 (1:6:AFD:SSD_E0_S07_2701240428P1) is being named (SSD_E0_S07_2701240428P1)
NOTE: Assigning number (1,7) to disk (AFD:SSD_E0_S14_2701241428P1)
Disk 0x777d9950 (1:7:AFD:SSD_E0_S14_2701241428P1) is being named (SSD_E0_S14_2701241428P1)
NOTE: Assigning number (1,8) to disk (AFD:SSD_E0_S17_2701244644P1)
Disk 0x777db5b8 (1:8:AFD:SSD_E0_S17_2701244644P1) is being named (SSD_E0_S17_2701244644P1)
NOTE: Assigning number (1,9) to disk (AFD:SSD_E0_S19_2701246564P1)
Disk 0x777dc8a8 (1:9:AFD:SSD_E0_S19_2701246564P1) is being named (SSD_E0_S19_2701246564P1)
NOTE: Assigning number (1,10) to disk (AFD:SSD_E0_S09_2701252584P1)
Disk 0x777d69f8 (1:10:AFD:SSD_E0_S09_2701252584P1) is being named (SSD_E0_S09_2701252584P1)
NOTE: Assigning number (1,11) to disk (AFD:SSD_E0_S13_2701254148P1)
Disk 0x777d8fd8 (1:11:AFD:SSD_E0_S13_2701254148P1) is being named (SSD_E0_S13_2701254148P1)
NOTE: Assigning number (1,12) to disk (AFD:SSD_E0_S16_2701255896P1)
Disk 0x777dac40 (1:12:AFD:SSD_E0_S16_2701255896P1) is being named (SSD_E0_S16_2701255896P1)
NOTE: Assigning number (1,13) to disk (AFD:SSD_E0_S05_2701256380P1)
Disk 0x777d4418 (1:13:AFD:SSD_E0_S05_2701256380P1) is being named (SSD_E0_S05_2701256380P1)
NOTE: Assigning number (1,14) to disk (AFD:SSD_E0_S11_2701257468P1)
Disk 0x777d7ce8 (1:14:AFD:SSD_E0_S11_2701257468P1) is being named (SSD_E0_S11_2701257468P1)
NOTE: Assigning number (1,15) to disk (AFD:SSD_E0_S15_2701258144P1)
Disk 0x777da2c8 (1:15:AFD:SSD_E0_S15_2701258144P1) is being named (SSD_E0_S15_2701258144P1)
NOTE: Assigning number (1,16) to disk (AFD:SSD_E0_S06_2701258544P1)
Disk 0x777d4d90 (1:16:AFD:SSD_E0_S06_2701258544P1) is being named (SSD_E0_S06_2701258544P1)
NOTE: Assigning number (1,17) to disk (AFD:SSD_E0_S12_2701258588P1)
Disk 0x777d8660 (1:17:AFD:SSD_E0_S12_2701258588P1) is being named (SSD_E0_S12_2701258588P1)
NOTE: Assigning number (1,18) to disk (AFD:SSD_E0_S10_2701259504P1)
Disk 0x777d7370 (1:18:AFD:SSD_E0_S10_2701259504P1) is being named (SSD_E0_S10_2701259504P1)
NOTE: Assigning number (1,19) to disk (AFD:SSD_E0_S18_2701260436P1)
Disk 0x777dbf30 (1:19:AFD:SSD_E0_S18_2701260436P1) is being named (SSD_E0_S18_2701260436P1)
2019-02-15 16:56:32.926*:kgfm.c@547: kgfmInitialize