Re-adding a repaired node to a clustered system by using the CLI
You can use the command-line interface (CLI) to re-add a failed node back into a clustered system after it was repaired.
Before you begin
Before you add a node to a clustered system, you must make sure that the switch zoning is configured such that the node that is being added is in the same zone as all other nodes in the clustered system. If you are replacing a node and the switch is zoned by worldwide port name (WWPN) rather than by switch port, make sure that the switch is configured such that the node that is being added is in the same VSAN or zone.
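For example, on a Brocade switch (an assumption for illustration; the syntax differs by switch vendor), you can display a zone and confirm that the WWPNs of the node that is being added appear alongside those of the existing nodes. The zone name and WWPNs that follow are placeholders:

zoneshow "SVC_node_zone"

zone:  SVC_node_zone
       50:05:07:68:01:40:aa:bb; 50:05:07:68:01:40:cc:dd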
- If you are re-adding a node to the SAN, ensure that you are adding the node to the same I/O group from which it was removed. Failure to select the correct I/O group can result in data corruption. You must use the information that was recorded when the node was originally added to the clustered system. If you do not have access to this information, call the IBM® Support Center to add the node back into the clustered system without corrupting the data.
- The LUNs that are presented to the ports on the new node must be the same as the LUNs that are presented to the nodes that currently exist in the clustered system. You must ensure that the LUNs are the same before you add the new node to the clustered system.
- LUN masking for each LUN must be identical on all nodes in a clustered system. You must ensure that the LUN masking for each LUN is identical before you add the new node to the clustered system.
- You must ensure that the model type of the new node is supported by the SAN Volume Controller software level that is installed on the clustered system. If the model type is not supported by the SAN Volume Controller software level, update the clustered system to a software level that supports the model type of the new node. See the SAN Volume Controller support website for the latest supported software levels. An example of checking the installed level follows this list.
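You can display the installed software level from the CLI before you add the node. This is a sketch; the level and build that are shown are placeholders, and on levels earlier than 6.1 the equivalent information is available from svcinfo lscluster:

lssystem

Look for the code_level line in the output, for example:

code_level 6.1.0.0 (build 123.45.6789)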
About this task
Special procedures when you add a node to a clustered system
Applications on the host systems direct I/O operations to file systems or logical volumes that are mapped by the operating system to virtual paths (vpaths), which are pseudo disk objects that are supported by the Subsystem Device Driver (SDD). SDD maintains an association between a vpath and a SAN Volume Controller volume. This association uses an identifier (UID) which is unique to the volume and is never reused. The UID permits SDD to directly associate vpaths with volumes.
SDD operates within a protocol stack that contains disk and Fibre Channel device drivers that are used to communicate with the SAN Volume Controller using the SCSI protocol over Fibre Channel as defined by the ANSI FCS standard. The addressing scheme that is provided by these SCSI and Fibre Channel device drivers uses a combination of a SCSI logical unit number (LUN) and the worldwide node name (WWNN) for the Fibre Channel node and ports.
If an error occurs, the error recovery procedures (ERPs) operate at various tiers in the protocol stack. Some of these ERPs cause I/O to be redriven by using the same WWNN and LUN numbers that were previously used.
SDD does not check the association of the volume with the vpath on every I/O operation that it performs. As a result, if a WWNN and LUN combination is reused for a different volume, I/O that an ERP redrives can be directed to the wrong volume and corrupt data.
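Because the UID is what ties a vpath to its volume, you can check the association from a host before and after node maintenance. The following sketch assumes a host that runs SDD and its datapath utility; the device name and serial number are placeholders and the output is abbreviated:

datapath query device

DEV#: 0  DEVICE NAME: vpath0  TYPE: 2145  POLICY: Optimized
SERIAL: 60050768018101bf2800000000000012

The SERIAL value is the volume UID, which you can compare with the vdisk_UID value that the clustered system reports for the volume.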
The special procedures apply in the following situations:
- The clustered system has more than one I/O group.
- The node that is being added to the clustered system uses physical node hardware or a slot that has previously been used for a node in the clustered system.
- The node that is being added to the clustered system uses physical node hardware or a slot that has previously been used for a node in another clustered system and both clustered systems have visibility to the same hosts and back-end storage.
- The node must be added to the same I/O group that it was previously in. You can use the command-line interface (CLI) command lsnode or the management GUI to determine the WWNN of the clustered system nodes (see the example that follows this list).
- Before you add the node back into the clustered system, you must shut down all of the hosts that use the clustered system. The node must then be added before the hosts are rebooted. If the I/O group information is unavailable, or if it is inconvenient to shut down and reboot all of the hosts that use the clustered system, then do the following:
- On all of the hosts that are connected to the clustered system, unconfigure the Fibre Channel adapter device driver, the disk device driver, and the multipathing driver before you add the node to the clustered system.
- Add the node to the clustered system, and then reconfigure the Fibre Channel adapter device driver, the disk device driver, and the multipathing driver.
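The following sketch shows how you might confirm the node information and re-add the node from the CLI. The WWNN, I/O group, and node name are placeholders, and on some software levels the commands carry the svcinfo and svctask prefixes:

lsnode
lsnodecandidate
addnode -wwnodename 5005076801000123 -iogrp io_grp1 -name node5

The lsnode command lists the nodes that are in the clustered system with their WWNNs and I/O groups, lsnodecandidate lists repaired nodes that are visible on the fabric but not yet in the clustered system, and addnode adds the candidate back into its original I/O group.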
Scenarios where the special procedures can apply
- Four nodes of an eight-node clustered system were lost because of the failure of a pair of 2145 UPS units or four 2145 UPS-1U units. In this case, the four nodes must be added back into the clustered system by using the CLI command addnode or the management GUI. Note: You do not need to run the addnode command on a node with a partner that is already in the clustered system; the clustered system automatically detects an online candidate.
- A user decides to delete four nodes from the clustered system and add them back into the clustered system by using the CLI command addnode or the management GUI.
For 5.1.0 nodes, the SAN Volume Controller automatically re-adds failed nodes to the clustered system. If the clustered system reports an error for a missing node (error code 1195) and that node has been repaired and restarted, the clustered system automatically re-adds the node. Because this process can take up to 20 minutes, you might prefer to re-add the node manually by completing the following steps: