Hot-spare node
When you add nodes, you can specify up to four of them as hot-spare nodes. A hot-spare node can come online (handling I/O operations) automatically if needed. For example, if a node fails, an available hot-spare node that matches the failed node is activated automatically and moves to the Online Spare state. The hot-spare node handles I/O operations for the failed node until the failed node comes back online. After the node returns to the system, the hot-spare node returns to the Spare state, which indicates that it can be automatically swapped for other failed nodes on the system.
The loss of a node, either for unplanned reasons, such as hardware failure, or for planned reasons, such as upgrades, can result in loss of redundancy or degraded system performance. To reduce this possibility, a hot-spare node is kept powered on and visible on the system. A hot-spare node has active system ports, but no host I/O ports, and is not part of any I/O group. If a node fails or is upgraded, this spare node joins the system and assumes the place of the failed node, restoring redundancy. Only host connections on Fibre Channel ports that support N_Port ID Virtualization (NPIV) can be used for hot-spare nodes. The hot-spare node uses the same NPIV worldwide port names (WWPNs) for its Fibre Channel ports as the failed node, so host operations are not disrupted. The hot-spare node retains the node identifier that it had as a spare. During an upgrade, the spare node is added to the system as soon as a node is removed. As each node in a system shuts down for the upgrade, it is replaced by the hot-spare node.
You can assign node pairs to specific I/O groups and then assign the extra nodes as hot-spare nodes. When a hot-spare node is added to the system, it is in the Spare state, which indicates that it is not part of an I/O group. If a node in an I/O group fails, a hot-spare node automatically replaces that node and becomes a part of the I/O group. While the hot-spare node is in the I/O group, it is in the Online Spare state; it returns to the Spare state when the original node rejoins the I/O group. A system can contain up to four spares at any time, which includes any hot-spare nodes that are currently online as spare nodes. Check that all cabling is correct so that the nodes are detected by the system. If a node is not detected, review the installation information that was included with the system.
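The state changes described above can be sketched as a small model. This is only an illustration of the Spare and Online Spare transitions and of the four-spare limit; the class and method names (`SparePool`, `node_failed`, `node_returned`) are hypothetical and do not correspond to any system command or API, and the sketch activates the first idle spare without any matching check.

```python
from enum import Enum

MAX_SPARES = 4  # from the text: up to four spares, including online spares


class SpareState(Enum):
    SPARE = "Spare"
    ONLINE_SPARE = "Online Spare"


class SparePool:
    """Toy model of the spare pool; not the system's real implementation."""

    def __init__(self):
        self.spares = {}    # spare name -> SpareState
        self.covering = {}  # spare name -> name of the node it replaces

    def add_spare(self, name):
        if name in self.spares:
            raise ValueError(f"{name} is already a spare")
        if len(self.spares) >= MAX_SPARES:
            raise ValueError("a system can contain at most four spares")
        self.spares[name] = SpareState.SPARE

    def node_failed(self, failed_node):
        """Activate the first idle spare; return its name, or None."""
        for name, state in self.spares.items():
            if state is SpareState.SPARE:
                self.spares[name] = SpareState.ONLINE_SPARE
                self.covering[name] = failed_node
                return name
        return None

    def node_returned(self, node):
        """Release the spare that was covering `node`, if any."""
        for name, covered in list(self.covering.items()):
            if covered == node:
                del self.covering[name]
                self.spares[name] = SpareState.SPARE
                return name
        return None
```

Note that an online spare still counts toward the limit in this model: activating a spare changes its state but does not free a slot for a fifth spare.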
When a hot-spare node is used to replace an existing node, the system attempts to find a spare node that matches the configuration of the replaced node perfectly. However, if a perfect match does not exist, the system continues the configuration check until a matching set of criteria is found. The following criteria are used by the system to determine suitable hot-spare nodes:
If the criteria are not the same for both nodes, the system applies progressively lower criteria until the minimal configuration is found. For example, if the Fibre Channel ports do not match exactly but all the other required criteria match, then the hot-spare node can still be used. The minimal configuration that the system can use as a hot-spare node includes identical memory, site, Fibre Channel port ID, and, if applicable, compression settings.
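As a rough sketch of this selection logic, the following example requires the four minimal attributes to be identical and then prefers a spare whose optional attributes also match. The field names are hypothetical, and two modelling choices are assumptions rather than documented behaviour: the Fibre Channel port ID is treated as an opaque value, and "ports do not match exactly" is represented as a difference in port count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeConfig:
    name: str
    memory_gb: int
    site: Optional[str]
    fc_port_id: str      # opaque value; must be identical (assumed form)
    compression: bool
    fc_port_count: int   # optional attribute; assumed interpretation


def meets_minimum(failed, spare):
    """True if the spare satisfies the minimal configuration."""
    return (
        spare.memory_gb == failed.memory_gb
        and spare.site == failed.site
        and spare.fc_port_id == failed.fc_port_id
        and spare.compression == failed.compression
    )


def pick_spare(failed, spares):
    """Return the best-matching spare, or None if none is suitable."""
    candidates = [s for s in spares if meets_minimum(failed, s)]
    if not candidates:
        return None
    # Prefer a spare whose optional attributes match as well.
    return max(candidates, key=lambda s: s.fc_port_count == failed.fc_port_count)
```

With this model, a spare that differs from the failed node only in port count is still selected when no perfect match is available, while a spare with different memory is never selected.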
If the nodes on the system support and are licensed to use encryption, the hot-spare node must also support and be licensed to use encryption. For enhanced stretched or HyperSwap® configurations, hot-spare nodes must be assigned to a specific site. If a node fails on a particular site, the hot-spare node that is assigned to that site is used if it is a suitable replacement. If you are using a standard configuration for a stretched system, you must update to an enhanced stretched system to use hot-spare nodes. In a standard stretched configuration, a hot-spare node can be selected from the wrong site, which overloads inter-system links and causes performance issues.
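The site and encryption rules can be expressed as a simple eligibility filter. This is a minimal sketch under stated assumptions: the `Spare` type and its fields are hypothetical, and `encryption_capable` stands for "supports and is licensed to use encryption".

```python
from dataclasses import dataclass


@dataclass
class Spare:
    name: str
    site: str
    encryption_capable: bool  # supports and is licensed for encryption


def eligible_spares(spares, failed_site, system_encrypted):
    """Keep spares assigned to the failed node's site that also satisfy
    the encryption requirement when the system uses encryption."""
    return [
        s for s in spares
        if s.site == failed_site
        and (s.encryption_capable or not system_encrypted)
    ]
```

Filtering on the site first is what keeps a replacement from being drawn from the other site.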