Fix hardware errors
Before you run a system recovery procedure, identify and fix the root cause of any hardware issues.
If hardware faults are what is causing the system to fail, fixing them can help recover the system.
The following are common issues that can be easily resolved:
- The node is powered off or the power cords were unplugged.
- Check the node status of every node that is a
member of the system. Resolve all errors.
- All nodes must be reporting either node error 578 or showing no cluster name on the Cluster: display. Either condition indicates that the system lost its configuration data. If any node reports anything else, do not perform a recovery. You can encounter situations where non-configuration nodes report other node errors, such as node error 550. The 550 error can also indicate that a node is not able to join a system.
Note: If any of the buttons on the front panel are pressed after these two error codes are reported, the report for the node returns to the 578 node error. The change in the report happens after approximately 60 seconds. Also, if the node was rebooted or if hardware service actions were taken, the node might show no cluster name on the Cluster: display.
- If any nodes show Node Error: 550, record the
data from the second line of the display. If the last character on the second line of the
display is >, use the right button to scroll the display to the
right.
- In addition to the Node Error: 550, the second line of the display can show a list of node front panel IDs (7 digits) that are separated by spaces. The list can also show the WWPN/LUN ID (16 hexadecimal digits followed by a forward slash and a decimal number).
- If the error data contains any front panel IDs, ensure that the node referred to by that front panel ID is showing Node Error: 578. If it is not reporting node error 578, ensure that the two nodes can communicate with each other. Verify the SAN connectivity and restart one of the two nodes by pressing the front panel power button twice.
- If the error data contains a WWPN/LUN ID, verify the SAN connectivity between this node and that WWPN. Check the storage system to ensure that the LUN referred to is online. After verifying, restart the node by pressing the front panel power button twice.
Note: If, after you resolve all these scenarios, half or more of the nodes are reporting Node Error: 578, it is appropriate to run the recovery procedure.
- For any nodes that are reporting a node error 550, ensure that all the missing hardware that is identified by these errors is powered on and connected without faults.
- If you are not able to restart the system, and if any node other than the current node is reporting node error 550 or 578, you must remove system data from those nodes. This action acknowledges the data loss and puts the nodes into the required candidate state.
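The data recorded from the second line of a Node Error: 550 display can be sorted by hand, but a small script might help when many nodes are involved. The following is a minimal sketch, not part of any product tooling. It assumes that you typed the recorded line into a string yourself, and that tokens have exactly the shapes described above: 7 digits for a front panel ID, and 16 hexadecimal digits, a forward slash, and a decimal number for a WWPN/LUN ID. The function names and the report labels "578" and "no cluster name" are hypothetical and chosen only for illustration.

```python
import re

# Token shapes taken from the description above; all names are hypothetical.
FRONT_PANEL_ID = re.compile(r"[0-9]{7}")
WWPN_LUN = re.compile(r"([0-9A-Fa-f]{16})/([0-9]+)")


def parse_550_data(second_line):
    """Classify the space-separated tokens recorded from a 550 display."""
    result = {"front_panel_ids": [], "wwpn_luns": [], "unrecognized": []}
    for token in second_line.split():
        wwpn_match = WWPN_LUN.fullmatch(token)
        if FRONT_PANEL_ID.fullmatch(token):
            # Node to check for Node Error: 578 and for connectivity.
            result["front_panel_ids"].append(token)
        elif wwpn_match:
            # (WWPN, LUN) pair to check for SAN connectivity and LUN status.
            wwpn = wwpn_match.group(1).upper()
            lun = int(wwpn_match.group(2))
            result["wwpn_luns"].append((wwpn, lun))
        else:
            # Anything else, such as a trailing ">", which means the
            # display must be scrolled right to record the full line.
            result["unrecognized"].append(token)
    return result


def all_nodes_lost_configuration(node_reports):
    """Return True only if every node reports 578 or shows no cluster name."""
    allowed = {"578", "no cluster name"}
    return len(node_reports) > 0 and all(r in allowed for r in node_reports)


if __name__ == "__main__":
    print(parse_550_data("0123456 500507680110ABCD/3"))
    print(all_nodes_lost_configuration(["578", "no cluster name", "550"]))
```

An unrecognized token is kept rather than discarded so that a truncated or mistyped line is noticed instead of silently ignored. The second helper covers only the "all nodes" condition; it does not model the note about half or more of the nodes reporting Node Error: 578.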