To those who attended our Exadata Monitoring and Agents session: here are some answers and follow-up from the chat room.
The primary goal of the Exadata plug-in is to digest the schematic file and validate the database.xml and catalog.xml files. If the pre-check runs without failure, Discovery can be executed.
The Agent runs only on compute nodes and monitors all components remotely; i.e., no additional scripts or code are installed on the peripheral components. Agents pull component metrics and vitals using ssh commands (based on user equivalence) or subscribe to SNMP traps.
Note that two agents are always deployed: a master, which does the majority of the work, and a slave agent, which kicks in if the master fails. Agents should be installed on all compute nodes.
Initially, the guided discovery wizard runs ASM kfod to get disk names and reads cellip.ora.
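As a rough illustration of what reading cellip.ora involves, here is a minimal Python sketch. It assumes the commonly seen layout of one cell="..." entry per line, with multiple addresses for the same cell separated by semicolons; the function name and sample addresses are made up for the example, and this is not the plug-in's actual code.

```python
import re

def parse_cellip(text):
    """Return one list of IP addresses per storage cell entry.

    Assumes lines of the form cell="ip" or cell="ip1;ip2".
    Blank lines and lines starting with '#' are skipped.
    """
    cells = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r'^cell\s*=\s*"([^"]+)"$', line)
        if match:
            ips = [ip.strip() for ip in match.group(1).split(";") if ip.strip()]
            cells.append(ips)
    return cells

# Hypothetical file content, for illustration only.
sample = 'cell="192.168.10.3"\ncell="192.168.10.4;192.168.10.5"\n'
for ips in parse_cellip(sample):
    print(ips)
```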
The components monitored via the Exadata-EM plugin include the following:
• Storage Cells
• Infiniband Switches (IB switches)
The EM agent runs remote ssh calls to collect switch metrics, and the IB switch sends SNMP traps (PUSH) for all alerts. This collection requires ssh equivalence for nm2user. It includes various sensor data (fan, voltage, temperature) as well as port metrics.
The plug-in does the following:
Reads the names of the components connected to the IB switch and matches the compute node hostnames to the hostnames used to install the agent.
• Cisco Switch
The EM agent runs remote SNMP get calls to gather metric data, including port status and switch vitals; e.g., CPU, memory, power, and temperature. In addition, performance metrics are also collected; e.g., ingress and egress throughput rates.
• PDU and KVM
For the PDU, both active and passive PDUs are monitored. The Agent runs SNMP get calls against each PDU; metric collection includes power, temperature, and fan status. The same steps and metrics apply to the KVM.
• ILOM targets
The EM Agent executes remote ipmitool calls to each compute node’s ILOM target. This requires oemuser credentials to run ipmitool. The Agent collects sensor data as well as configuration data (firmware version and serial number).
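To give a feel for the kind of sensor data the ipmitool calls return, the sketch below parses text in the pipe-separated "name | reading | status" layout that ipmitool sdr prints. The sensor names and readings are invented for the example, and the helper names are hypothetical; how the Agent itself processes this output is not covered here.

```python
def parse_sdr(output):
    """Parse 'name | reading | status' lines into a dict keyed by sensor name."""
    readings = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip anything that is not a three-column sensor line
        name, value, status = parts
        readings[name] = {"value": value, "status": status}
    return readings

def failing_sensors(readings):
    """Return sensor names whose status is neither 'ok' nor 'ns' (no reading)."""
    return sorted(n for n, r in readings.items() if r["status"] not in ("ok", "ns"))

# Invented sample output, for illustration only.
sample = """\
T_AMB            | 22 degrees C      | ok
FAN1             | 0 RPM             | cr
PS1              | no reading        | ns
"""
print(failing_sensors(parse_sdr(sample)))
```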
In EM 220.127.116.11, the key enhancements include gathering IB performance metrics, on-demand schematic refresh, cell performance monitoring, guided resolution for cell alerts, and automated SNMP notification setup for Exadata Storage Server and InfiniBand switches.
The Agent discovers IB switches and compute nodes from the output of ibnetdiscover. The KVM, PDU, Cisco, and ILOM discovery is performed via the schematic file on the compute node. Finally, the Agent subscribes to SNMP for cells and IB switches; note that SNMP has to be manually set up and enabled on the peripheral components for SNMP push of cell alerts. The EM agent runs cellcli via ssh to obtain storage metrics, which requires ssh equivalence with the Agent user.
In the latest version (as of this writing, 18.104.22.168), there are a number of key visualization and metrics enhancements. For example:
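The cellcli-over-ssh collection can be pictured with a small Python sketch. It only builds the ssh argument list and parses sample text; it does not connect anywhere. The user name, host name, and the three-column "name, object, value" layout of the metric listing are assumptions for illustration, and BatchMode=yes is used so that ssh fails rather than prompting when user equivalence is missing.

```python
import shlex

def cellcli_ssh_command(user, cell_host, statement):
    """Build an ssh argv that runs one cellcli statement on a storage cell."""
    remote = "cellcli -e " + shlex.quote(statement)
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{cell_host}", remote]

def parse_metric_lines(output):
    """Split each line into (metric name, object name, value); skip other lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3:
            rows.append(tuple(f.strip() for f in fields))
    return rows

# Hypothetical user and host; pass the list to subprocess.run to execute it.
print(cellcli_ssh_command("agentuser", "cell01.example.com", "list metriccurrent"))

# Invented sample output, for illustration only.
sample = "  CL_CPUT    cell01    1.5 %\n  CL_MEMUT   cell01    42 %\n"
print(parse_metric_lines(sample))
```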
• CDB-level I/O Workload Summary with PDB-level details breakdown.
• I/O Resource Management for Oracle Database 12c.
• Exadata Database Machine-level physical visualization of I/O Utilization for CDB and PDB on each Exadata Storage Server. There is also a critical integration link to Database Resource Management UI.
• Additional InfiniBand Switch Sensor fault detection, including power supply unit sensors and fan presence sensors.
• Automatically push Exadata plug-in to agent during discovery.
Use fully qualified names with the Agent; using shortened names will cause issues. If there are any issues with metrics gathering or the agent, the EMDiag Kit should be used to triage them. The EMDiag Kit includes scripts that can be used to diagnose EM issues. Specifically, the kit includes repvfy, agtvfy, and omsvfy. These tools can be used to diagnose issues with the OEM Repository, EM Agents, and management services.
To obtain the EMDiag Kit, download the zip file for the version that you need, per Oracle Support Note: MOS ID# 421053.1
$EMDIAG_HOME/bin/repvfy verify Exadata -level 9 -details
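On the earlier point about fully qualified names, a quick pre-check along the following lines can flag short names before agents are installed. This is only a sketch: the function name and sample host names are hypothetical, and it tests the shape of the name, not whether it resolves in DNS.

```python
import ipaddress

def is_fully_qualified(name):
    """Return True if name looks like host.domain rather than a short name or an IP."""
    name = name.rstrip(".")
    try:
        ipaddress.ip_address(name)
        return False  # a bare IP address is not a host name
    except ValueError:
        pass
    labels = name.split(".")
    return len(labels) >= 2 and all(labels)

# Hypothetical host names, for illustration only.
for host in ("exadb01.example.com", "exadb01", "192.168.10.3"):
    print(host, is_fully_qualified(host))
```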