Resolving a problem with the SAN Volume Controller boot drives

Complete the following steps to resolve most problems with SAN Volume Controller boot drives.

Before you begin

The node serial number (also known as the product or machine serial number) is on the MT-M S/N label (Machine Type - Model and Serial Number label) on the front (left side) of the node. The node serial number is written to the system board and to each of the two boot drives during the manufacturing process.

When the SAN Volume Controller software starts, it reads the node serial number from the system board (by using the node serial number for the panel name) and compares it with the node serial numbers that are stored on the two boot drives.

Specific node errors are produced under the following conditions:
  • Unrecoverable node error 543: This error indicates that none of the node serial numbers that are stored in the three locations match. The node serial number from the system board must match the node serial number on at least one of the two boot drives for the SAN Volume Controller software to assume that the node serial number is good.
  • Unrecoverable node error 545: This error indicates that the node serial numbers on the two boot drives match each other but are not the same as the node serial number from the system board. In this case, either the node serial number on the system board or the node serial number on the boot drives might be wrong. For example, the system board was changed or the boot drives came from another node.
  • Node error 743: This error indicates that the node serial number cannot be read from one of the two boot drives because that drive failed, is missing, or is out of sync with the other boot drive.
  • Node error 744: This error indicates that the node serial number on one of the boot drives identifies that drive as belonging to a different node. Node error 744 is also produced if boot drives were swapped between drive slots 1 and 2.
  • Node error 745: This error indicates that a boot drive is found in an unsupported slot. This error occurs when at least one of the first two drives is online and at least one invalid slot (3-8) is occupied.
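The serial number comparisons behind node errors 543, 545, 743, and 744 can be illustrated with a short Python sketch. This is not the SAN Volume Controller implementation. The function name classify_serials is hypothetical, a value of None stands for a boot drive whose serial number cannot be read, and checking for an unreadable drive before comparing serial numbers is an assumption. Node error 745 depends on slot occupancy rather than serial numbers, so it is not modeled.

```python
from typing import Optional


def classify_serials(board: str,
                     drive1: Optional[str],
                     drive2: Optional[str]) -> Optional[int]:
    """Return the node error suggested by the three serial numbers, or None.

    A drive serial of None stands for a drive that failed, is missing,
    or is out of sync, so its serial number cannot be read.
    """
    # Assumption: an unreadable drive is reported before any comparison.
    if drive1 is None or drive2 is None:
        return 743
    if drive1 == drive2 == board:
        return None   # all three copies agree, no error
    if drive1 == drive2:
        return 545    # drives agree with each other but not with the board
    if board in (drive1, drive2):
        return 744    # one drive carries another node's serial number
    return 543        # none of the three copies match


# Example: the board and one drive agree, the other drive differs.
print(classify_serials("78A0001", "78A0001", "78B0002"))
```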

About this task

An event is displayed in the Monitoring > Events panel of the management GUI if the problem produces node error 743, 744, or 745. Run the fix procedure for that event. Otherwise, connect to the technician port and use the service assistant GUI to view the boot drive slot information and determine the problem.

Attention: If a drive slot has Yes in the Active column, the operating system depends on that drive. Do not remove that drive without first shutting down the node.
  • Do not swap boot drives between slots.
  • Each boot drive holds a copy of the vital product data (VPD) that is stored on the system board.
  • Software upgrades are applied to one boot drive at a time to prevent failures during a concurrent code upgrade (CCU).

Procedure

To resolve a problem with a boot drive, complete the following steps in order:

  1. Remove any drive that is in an unsupported slot. Move the drive to the correct slot if you can.
  2. If possible, put back the original drive in any slot that is shown as missing a drive. Otherwise, reseat the drive or replace it with a drive from FRU stock.
  3. Move any drive that is in the wrong node back to the correct node.
    Note: A drive slot has a status of wrong_node if the node serial number on the drive does not match the node serial number on the system board. If the serial number on the MT-M S/N label matches the node serial number on the drive, you can ignore this status.
  4. Move any drive that is in the wrong slot back to the correct slot.
  5. Reseat the drive in any slot that has a status of failed. If the status remains failed, replace the drive with one from FRU stock.
  6. If the drive slot has a status of out of sync and Yes in the can_sync column, synchronize the boot drives in one of the following ways:
    • Use the service assistant GUI to synchronize boot drives.
    • Use the command-line interface (CLI) command satask chbootdrive -sync.
    Note: If No is displayed in the can_sync column, you must resolve another boot drive problem first.
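Because the steps must be completed in order, the procedure amounts to a priority list: the first step whose condition applies is the one to act on. The following Python sketch shows that ordering only. It does not query a node, the name next_action is hypothetical, and every status label except wrong_node is an assumed name chosen for this illustration.

```python
# Steps in the order given by the procedure. Status labels other than
# "wrong_node" are hypothetical names used only in this sketch.
PROCEDURE = [
    (1, "unsupported_slot", "Remove the drive; move it to the correct slot if you can."),
    (2, "missing", "Put back the drive, reseat it, or replace it from FRU stock."),
    (3, "wrong_node", "Move the drive back to the correct node."),
    (4, "wrong_slot", "Move the drive back to the correct slot."),
    (5, "failed", "Reseat the drive; replace it from FRU stock if it stays failed."),
    (6, "out_of_sync", "Synchronize the boot drives."),
]


def next_action(slot_statuses, can_sync=False):
    """Return (step, action) for the first applicable step, or None."""
    for step, status, action in PROCEDURE:
        if status in slot_statuses:
            # Step 6 applies only when the can_sync column shows Yes.
            if status == "out_of_sync" and not can_sync:
                return (step, "Resolve another boot drive problem first.")
            return (step, action)
    return None


# Example: a failed drive and a missing drive; the missing drive comes first.
print(next_action({"failed", "missing"}))
```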

Replacing the system board:

  1. Replace the SAN Volume Controller 2145-DH8 or SAN Volume Controller 2145-SV1 system board.

When neither of the boot drives has usable SAN Volume Controller software:

For example, if you replace both of the boot drives from FRU stock at the same time, neither boot drive has usable SAN Volume Controller software. If the SAN Volume Controller software is not running, the node status, node fault, battery status, and battery fault LEDs remain off.

  1. If you cannot replace at least one of the original boot drives with a drive that contains usable SAN Volume Controller software and has a node serial number that matches the MT-M S/N label on the front of the node, contact IBM® Remote Technical support.
    IBM Remote Technical support can help you install the SAN Volume Controller software with a bootable USB flash drive.
    • Field-based USB installation also repairs the node serial number and WWNN that are stored on each boot drive by using the values that were stored on the system board during manufacturing.
    • If the WWNN of this node was changed in the past, you must change the WWNN again after you complete the SAN Volume Controller software installation. For example, if the node replaced an earlier SAN Volume Controller node, you must change the WWNN to that of the earlier node. You can repeat the change to the WWNN after the SAN Volume Controller software installation by using the service assistant GUI or the command line.

When every copy of the node serial number is lost:

For example, if you replace the system board and both of the boot drives with FRU stock at the same time, every copy of the node serial number is lost.

  1. If you cannot replace one of the original boot drives or the original system board so that at least one copy of the original node serial number is present, you cannot repair the node in the field. Return the node to IBM for repair.
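The last two cases can be summarized as a decision on how many copies of the original node serial number survive. The following Python sketch is a simplification under stated assumptions: the function name repair_path and its parameters are hypothetical, and a boot drive is counted only if it holds both usable SAN Volume Controller software and the original node serial number.

```python
def repair_path(original_board: bool, usable_original_drives: int) -> str:
    """Suggest a repair path from the surviving serial number copies.

    original_board: True if the system board still holds the serial
        number that was written during manufacturing.
    usable_original_drives: number of boot drives (0 to 2) that hold
        usable software and the original node serial number
        (simplifying assumption for this sketch).
    """
    if usable_original_drives > 0:
        # At least one good boot drive: follow the boot drive procedure.
        return "field procedure"
    if original_board:
        # Serial number survives on the system board only.
        return "contact IBM Remote Technical support for USB installation"
    # Every copy of the node serial number is lost.
    return "return the node to IBM"


# Example: the system board and both boot drives were replaced from FRU stock.
print(repair_path(original_board=False, usable_original_drives=0))
```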

Results

The status of a drive slot is uninitialized only if the SAN Volume Controller software cannot automatically initialize the FRU drive. This status can occur if the node serial number on the other boot drive does not match the node serial number on the system board. If the node serial number on the other boot drive matches the MT-M S/N label on the front (left side) of the node, you can safely rescue the uninitialized boot drive from the other boot drive. Use the service assistant GUI or the satask rescuenode command to rescue the drive.