Completing the node rescue when the node boots

On SAN Volume Controller 2145-CG8 or 2145-CF8, if you must replace the hard disk drive or if the software on the hard disk drive is corrupted, you can use the node rescue procedure to reinstall the software across the Fibre Channel fabric from another node on the fabric.

Before you begin

If you have replaced the service controller, you can also use the node rescue procedure to ensure that the service controller has the correct software.

About this task

Attention: If you recently replaced both the service controller and the disk drive as part of the same repair operation, node rescue fails.

Node rescue works by booting the operating system from the service controller and running a program that copies all the SAN Volume Controller software from any other node that can be found on the Fibre Channel fabric.

Attention: Run only one node rescue operation at a time on the same SAN. Wait for one node rescue operation to complete before starting another.
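The conditions in the two Attention notes, together with the need for another node to copy the software from, can be collected into a simple pre-check before starting. The following is only an illustrative sketch: the `RescueContext` fields and the `rescue_blockers` function are hypothetical names, and every value is supplied by the operator rather than read from the system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RescueContext:
    """Operator-supplied facts about the planned node rescue (hypothetical model)."""
    service_controller_replaced: bool   # replaced during the current repair operation
    disk_drive_replaced: bool           # replaced during the current repair operation
    other_rescue_running_on_san: bool   # another node rescue is in progress on this SAN
    other_nodes_on_fabric: int          # nodes reachable on the Fibre Channel fabric


def rescue_blockers(ctx: RescueContext) -> List[str]:
    """Return the reasons, if any, why node rescue should not be started yet."""
    blockers: List[str] = []
    if ctx.service_controller_replaced and ctx.disk_drive_replaced:
        blockers.append(
            "service controller and disk drive were both replaced in the "
            "same repair operation; node rescue fails"
        )
    if ctx.other_rescue_running_on_san:
        blockers.append(
            "another node rescue is running on this SAN; wait for it to complete"
        )
    if ctx.other_nodes_on_fabric < 1:
        blockers.append(
            "no other node is available on the fabric to copy the software from"
        )
    return blockers


if __name__ == "__main__":
    ctx = RescueContext(
        service_controller_replaced=True,
        disk_drive_replaced=False,
        other_rescue_running_on_san=False,
        other_nodes_on_fabric=1,
    )
    for reason in rescue_blockers(ctx):
        print("Blocked:", reason)
```

An empty list means that none of these three conditions blocks the rescue. It does not replace the cabling and zoning checks in the procedure.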

Perform the following steps to complete the node rescue:

Procedure

  1. Ensure that the Fibre Channel cables are connected.
  2. Ensure that at least one other node is connected to the Fibre Channel fabric.
  3. Ensure that the SAN zoning allows a connection between at least one port of this node and one port of another node. It is better if multiple ports can connect. This is particularly important if the zoning is by worldwide port name (WWPN) and you are using a new service controller. In this case, you might need to use SAN monitoring tools to determine the WWPNs of the node. If you need to change the zoning, remember to set it back when the service procedure is complete.
  4. Turn off the node.
  5. Press and hold the left and right buttons on the front panel.
  6. Press the power button.
  7. Continue to hold the left and right buttons until the node rescue request symbol is displayed on the front panel (Figure 1).
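The zoning requirement in step 3 amounts to checking that at least one zone contains both a port of this node and a port of another node. The sketch below illustrates that check on data you have already gathered, for example with SAN monitoring tools. The function name, the zone names, and the port identifiers are all hypothetical placeholders, not real WWPNs or switch output.

```python
from typing import Dict, List, Set, Tuple


def connectable_port_pairs(
    zones: Dict[str, Set[str]],
    this_node_wwpns: Set[str],
    other_node_wwpns: Set[str],
) -> List[Tuple[str, str]]:
    """Return sorted (this_port, other_port) pairs that share at least one zone."""
    pairs: Set[Tuple[str, str]] = set()
    for members in zones.values():
        local_ports = members & this_node_wwpns
        remote_ports = members & other_node_wwpns
        for local in local_ports:
            for remote in remote_ports:
                pairs.add((local, remote))
    return sorted(pairs)


if __name__ == "__main__":
    # Placeholder identifiers standing in for WWPNs (assumed example data).
    zones = {
        "zone_a": {"this-port-1", "other-port-1"},
        "zone_b": {"this-port-2", "host-port-1"},
    }
    this_node = {"this-port-1", "this-port-2"}
    other_node = {"other-port-1", "other-port-2"}

    pairs = connectable_port_pairs(zones, this_node, other_node)
    if pairs:
        print("Zoning allows", len(pairs), "port pair(s):", pairs)
    else:
        print("No zone connects this node to another node; review the zoning.")
```

One pair is the minimum that step 3 requires. More pairs are better, because the step notes that multiple connecting ports are preferable.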

Results

Figure 1. Node rescue display
This figure shows how the node rescue request is displayed on the front panel.

The node rescue request symbol is displayed on the front panel until the node starts to boot from the service controller. If the symbol is displayed for more than two minutes, go to the hardware boot MAP to resolve the problem. When the node rescue starts, the service display shows the progress or failure of the node rescue operation.

Note: If the rescued node was part of a clustered system, the node is now offline. Delete the offline node from the system and then add the node back into the system. If node rescue was used to recover a node that failed during a software update process, you cannot add the node back into the system until the software update process has completed. This can take up to four hours for an eight-node clustered system.