Replacing nodes nondisruptively: 2145-DH8

The following procedures describe how to replace most nodes nondisruptively with SAN Volume Controller 2145-DH8 nodes.

Before you begin

The procedures are nondisruptive because changes to your SAN environment are not required. The replacement (new) node uses the same worldwide node name (WWNN) as the node that you are replacing. An alternative to this procedure is to replace nodes disruptively either by moving volumes to a new I/O group or by rezoning the SAN. However, the disruptive procedures require more work on the hosts.

There might be some loss of system performance when the nodes are being replaced. Volumes that are managed by the I/O group that contains the node to be replaced become degraded when one of the nodes is shut down at the start of this procedure. The volumes remain degraded until both 2145-DH8 nodes are running.

If the array that is provided by the flash drives in the nodes that are being replaced is not in a storage pool that has other array types that are managed by IBM® Easy Tier®, it is best to move data off the flash drives before you replace the nodes.

This task assumes that the following conditions are met:

  • The existing system software on the nodes that are being replaced must be 7.3.0 or later.
    Important: If a 4-port 16 Gbps Fibre Channel adapter is installed on the 2145-DH8 node that you are adding, software level 7.6.0 or later must be installed on all nodes in the system. Otherwise, the nodes do not recognize this adapter.
  • If the node that is being replaced contains flash drives and the drives are in use, the arrays that are provided by the flash drives must be in storage pools that contain other array types that are managed by IBM Easy Tier before you replace the node.
  • If the node that is being replaced contains flash drives, transfer all flash drives and SAS adapters to the new node if it supports the drives. To prevent losing access to the data, if the new node does not support the existing flash drives, transfer the data from the flash drives before you replace the node.
    Note: Flash drives from 2145-CG8 and older nodes cannot be transferred to the 2145-DH8 or to the 2145-24F expansion enclosure.
  • All nodes that are configured in the system are present and online.
  • All errors in the system event log are addressed and marked as fixed.
  • No volumes, managed disks (MDisks), or external storage systems have a status of degraded or offline.
  • You backed up the system configuration and saved the svc.config.backup.xml file.
  • The replacement node must be able to operate at the Fibre Channel or Ethernet connection speed of the node that it is replacing.
  • If the node that is being replaced has a second I/O adapter in addition to the required Fibre Channel adapter, the replacement node must have the same type of adapter in slot two.
  • If the node that is being replaced is a SAN Volume Controller 2145-DH8, the replacement node must have the same configuration of I/O adapters in the same slots as the old node.
  • The Fibre Channel device driver on each Fibre Channel attached host should be set to time out a missing fibre path in 3 seconds or less. If it is not practical to check the parameters of the Fibre Channel driver on each host, reboot the new 2145-DH8 node shortly after it is added to the system so that the fibre paths to it stop long enough to ensure that they are recovered properly when the 2145-DH8 is active again.
    Tip: The timeout setting for the Emulex Fibre Channel device driver might default to 30 seconds, so it needs to be changed.
Important:
  • Do not continue this task if any of the conditions that are listed are not met unless you are instructed to do so by IBM Support.
  • Review all of the steps that follow before you proceed with this task.
  • Do not continue this task if you are not familiar with the system environments or the procedures that are described in this task.
  • If you plan to reuse the node that you are replacing, ensure that the WWNN of the node is set to a unique number on your SAN. If you do not ensure that the WWNN is unique, the WWNN and WWPN are duplicated in the SAN environment and can cause problems.
  • The node ID and possibly the node name change during this task. After the system assigns the node ID, the ID cannot be changed. However, you can change the node name after this task is complete.
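
The software-level conditions in the list above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the software level is available as a dotted string such as 7.6.0. It encodes only the two minimum levels that are stated in the conditions (7.3.0 in general, 7.6.0 when a 4-port 16 Gbps Fibre Channel adapter is installed).

```python
def parse_level(level):
    """Convert a dotted level such as '7.6.0' to a tuple of integers."""
    parts = [int(part) for part in level.split(".")]
    # Pad short levels such as '7.3' so that they compare as '7.3.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_software_prerequisite(level, has_16g_fc_adapter=False):
    """Return True if the level satisfies the minimum stated in the conditions."""
    required = "7.6.0" if has_16g_fc_adapter else "7.3.0"
    return parse_level(level) >= parse_level(required)


# Hypothetical levels, used only to show the call.
print(meets_software_prerequisite("7.4.0"))
print(meets_software_prerequisite("7.4.0", has_16g_fc_adapter=True))
```

Comparing integer tuples rather than strings avoids ranking a level such as 7.10.0 below 7.6.0.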

About this task

To replace active nodes in a system, complete the following steps.

Procedure

  1. Optional: If the current software level on the 2145-DH8 nodes is not the same as the software level on the active system, you might want to install the current system software level on the 2145-DH8 node. By performing this step, you can save up to 20 minutes when a node is added to the system at step 16. For information about accessing the service assistant GUI through the technician port so that you can view the software level and optionally install a different software level, see Technician port for node access.

    Optionally, using the service assistant, you can also change the WWNN now to the value used by the node that you are replacing with this node.

  2. Complete the following steps.
    1. Confirm that no hosts have dependencies on the node.
      You can use either the management GUI or a command-line interface (CLI) command:
      1. In the management GUI, select Monitoring > System.
      2. On the System -- Overview page, use the directional arrow near the node to open the Node Details page.
      3. Select Node Actions > Dependent Volumes.
      • If you use the CLI command, use the node parameter with the lsdependentvdisks command to view dependent volumes.
        lsdependentvdisks -node node_id | node_name
    2. If dependent volumes exist, determine whether the volumes are being used.
      If the volumes are being used, either restore the redundant configuration or suspend the host application.
    3. If a dependent quorum disk is reported, repair the access to the quorum disk or modify the quorum disk configuration.
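
As an illustration of the dependency check in step 2, the following Python sketch lists the volumes that are reported in captured lsdependentvdisks output. It is a sketch under assumptions: the function name is hypothetical, the sample output is invented, and the column names vdisk_id and vdisk_name are assumed rather than taken from this topic. Check them against the output on your system.

```python
def dependent_volumes(output):
    """Return the volume names found in whitespace-separated command output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    if not rows:
        return []  # no output at all: nothing depends on the node
    header, data = rows[0], rows[1:]
    name_column = header.index("vdisk_name")  # assumed column name
    return [row[name_column] for row in data]


# Hypothetical captured output, for illustration only.
SAMPLE_OUTPUT = """\
vdisk_id vdisk_name
0        vdisk0
1        vdisk1
"""

volumes = dependent_volumes(SAMPLE_OUTPUT)
if volumes:
    print("Resolve dependencies before continuing:", ", ".join(volumes))
```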
  3. Follow these steps to determine the system configuration node, and the ID, name, I/O group ID, and I/O group name for the node that you want to replace. If you already know the physical location of the node that you want to replace, you can skip this step and proceed to the next step.
    Tip: If one of the nodes that you want to replace is the system configuration node, replace it last.
    1. Issue this command from the command-line interface (CLI).
      lsnode -delim : 
    2. In the config_node column, find the value yes and record the values in the id and name columns.
    3. Record the values in the id and the name columns for each node in the system.
    4. Record the values in the IO_group_id and the IO_group_name columns for each node in the system.
    5. Issue this command from the CLI for each node in the system to determine the front panel ID, where node_name or node_id is the name or ID of the node for which you want to determine the front panel ID.
      lsnodevpd node_name | node_id
    6. Record the value in the front_panel_id column.
      The front panel ID is displayed on the front of each node. You can use this ID to determine the physical location of the node that matches the node ID or node name that you want to replace.
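
The values that step 3 asks you to record can be pulled from captured lsnode -delim : output. The following Python sketch is a minimal example, not a supported tool: the function names and the sample data are hypothetical, and the sample shows only the columns that this step names (real output has more columns). A plain split on the delimiter fails if a value itself contains a colon, so in that case capture the output with a different -delim character.

```python
def parse_delimited(output, delim=":"):
    """Turn header-plus-rows command output into a list of dictionaries."""
    lines = output.strip().splitlines()
    header = lines[0].split(delim)
    return [dict(zip(header, line.split(delim))) for line in lines[1:]]


def config_node(nodes):
    """Return the row whose config_node column is 'yes'."""
    return next(node for node in nodes if node["config_node"] == "yes")


def replacement_order(nodes):
    """Order the nodes so that the configuration node is replaced last."""
    return sorted(nodes, key=lambda node: node["config_node"] == "yes")


# Hypothetical, trimmed sample output.
SAMPLE_LSNODE = """\
id:name:IO_group_id:IO_group_name:config_node
1:node1:0:io_grp0:yes
2:node2:0:io_grp0:no
"""

nodes = parse_delimited(SAMPLE_LSNODE)
for node in replacement_order(nodes):
    print(node["id"], node["name"], node["IO_group_id"], node["IO_group_name"])
```

The sort key is False for ordinary nodes and True for the configuration node, so the configuration node moves to the end while the other nodes keep their listed order.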
  4. Record the WWNN and iSCSI name of the node that you want to replace:
    1. Issue this command from the CLI, where node_name or node_id is the name or ID of the node for which you want to determine the WWNN and iSCSI name.
      lsnode -delim : node_name | node_id
      
    2. Record the WWNN and iSCSI name of the node that you want to replace.
    3. Record the order of the Fibre Channel and Ethernet ports.
    4. If the system has Ethernet port IPs configured, store the current settings so that they can be applied to the replacement nodes. To do so, enter the following command.
      lsportip -delim :
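
One way to store the Ethernet port settings from step 4 is to convert the captured output to JSON. The following Python sketch makes several assumptions that you should verify: the function name is hypothetical, the sample data is invented, the column names are assumed, and the output is assumed to have been captured with a comma delimiter (-delim ,) instead of the colon shown above, because IPv6 and MAC address values contain colons.

```python
import json


def saved_port_settings(output, delim=","):
    """Return the rows that have an IPv4 address configured."""
    lines = output.strip().splitlines()
    header = lines[0].split(delim)
    records = [dict(zip(header, line.split(delim))) for line in lines[1:]]
    # Ports without an address have an empty IP_address field.
    return [record for record in records if record.get("IP_address")]


# Hypothetical, trimmed sample output.
SAMPLE_LSPORTIP = """\
id,node_id,node_name,IP_address,mask,gateway
1,1,node1,192.0.2.10,255.255.255.0,192.0.2.1
2,1,node1,,,
1,2,node2,192.0.2.11,255.255.255.0,192.0.2.1
"""

settings = saved_port_settings(SAMPLE_LSPORTIP)
print(json.dumps(settings, indent=2))  # keep this text with your other records
```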
  5. Required: Complete these steps:
    1. Record and mark the order of the Fibre Channel or Ethernet cables with the node port number (port 1 - 4 for Fibre Channel, or port 1 - 2 for Ethernet) before you remove the cables from the back of the node.
      The Fibre Channel ports on the back of the node are numbered 1 - 4 from left to right. You must reconnect the cables in the exact order on the replacement node to avoid issues when the replacement node is added to the system. If the cables are not connected in the same order, the port IDs can change, which impacts the ability of the host to access volumes. See the hardware documentation specific to your model to determine how the ports are numbered.
    2. Do not connect the replacement node to different ports on the switch or to a different switch.
      If the Fibre Channel switches are going to be changed so that the 8 Gbps speed can be reached, then this task must be a separate task that is done before or after this node replacement procedure.
  6. If the node has 10 Gbps Ethernet IPs configured, delete these settings by using the following command, ensuring that you note the current settings:
    rmportip -node [node ID or name] [port ID]
  7. Issue this CLI command to delete this node from the system and I/O group, where node_name or node_id is the name or ID of the node that you want to delete. You can use the CLI to verify that the deletion process was completed.
    rmnode node_name | node_id
  8. Optional: If you want to use the removed node as a spare node, enter this CLI command to ensure that the node is no longer a member of the system:
    lsnode 
    A list of nodes is displayed. Wait until the removed node is not listed in the command output.
  9. Change the WWNN and iSCSI name of the node that you deleted from the system to 1FFFF:
    • For a SAN Volume Controller 2145-DH8 node:
      1. Power on the node.
      2. Issue this CLI command:
        satask chvpd -wwnn FFFFFFFFFFFFFFFF
  10. Install the replacement node and any expansion enclosures, if present, in the rack.
    Important: Do not connect the Fibre Channel or Ethernet cables during this step.
  11. Power on the replacement node.
  12. Record the WWNN of the replacement node. This name can be reused by another SAN Volume Controller 2145-DH8 node.
  13. Change the WWNN of the replacement node to match the WWNN that you recorded in step 4.
    Use the service assistant interface to change the WWNN or run the following CLI command, where WWNN is the value you recorded from the original node.
    satask chvpd -wwnn WWNN 
  14. Enter the following CLI command to verify that the last 5 characters of the WWNN are correct.
    lsnodecandidate
    Important: If the WWNN is not what you recorded in step 4, you must repeat step 13.
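
The comparison in step 14 can be expressed as a small check. The following Python sketch is illustrative only: the function name and the WWNN values are hypothetical, and it assumes that you copy the WWNN strings by hand from your records and from the command output.

```python
def wwnn_suffix_matches(recorded, candidate, digits=5):
    """Compare the last characters of two WWNN strings, ignoring case."""
    def clean(value):
        # Ignore surrounding spaces, colon separators, and letter case.
        return value.strip().replace(":", "").upper()

    return clean(recorded)[-digits:] == clean(candidate)[-digits:]


# Hypothetical values, for illustration only.
recorded_wwnn = "500507680100A1B2"
candidate_wwnn = "500507680100a1b2"

if not wwnn_suffix_matches(recorded_wwnn, candidate_wwnn):
    print("WWNN does not match the recorded value; repeat step 13.")
```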
  15. Connect the Fibre Channel or Ethernet cables to the same port numbers that you recorded for the original node in step 5.
  16. Use the service assistant interface or enter the following CLI command to add the node to the system, where WWNN and iogroupname_id are the values that you recorded for the original node.

    When you add the node, this step ensures that it has the same name as the original node and is in the same I/O group as the original node. For more information, see the addnode command documentation.

    addnode -wwnodename WWNN -iogrp iogroupname_id  

    The system reassigns the node with the name that was used originally. If the original name of the node was automatically assigned by the system, it is not possible to reuse the same name. It was automatically assigned if its name starts with node. In this case, either specify a different name that does not start with node or do not use the name parameter so that the system automatically assigns a new name to the node.
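
The naming rule can be sketched as a helper that assembles the addnode command text. This Python example is a sketch, not the documented procedure: the function name and the sample values are hypothetical, and it assumes that the name parameter is passed as -name. It omits the name when the original name starts with node, which is one of the two options that are described above.

```python
def build_addnode_command(wwnn, iogrp, original_name=None):
    """Build the addnode command text; reuse the name only when allowed."""
    parts = ["addnode", "-wwnodename", wwnn, "-iogrp", iogrp]
    # A name that starts with 'node' was assigned by the system and
    # cannot be reused, so leave the name parameter out in that case.
    if original_name and not original_name.startswith("node"):
        parts += ["-name", original_name]
    return " ".join(parts)


# Hypothetical values, for illustration only.
print(build_addnode_command("500507680100A1B2", "io_grp0", "svc_rack1_a"))
print(build_addnode_command("500507680100A1B2", "io_grp0", "node1"))
```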

    If necessary, the new node is updated to the same system software version as the system. This update can take up to 20 minutes.

    If Ethernet IPs were previously configured, configure the Ethernet ports to reuse the settings from the replaced node. Ethernet port IPs can be configured by using the management GUI or the CLI command. (The following command examples are presented on multiple lines for clarity.)
    • For IPv4 IPs
      cfgportip -node [node name or ID] -ip [IPv4]
      -mask [subnet mask] -gw [gateway] [port ID]
    • For IPv6 IPs
      cfgportip -node [node name or ID] -ip_6 [IPv6]
      -prefix_6 [prefix] -gw_6 [gateway] [port ID]
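
If the settings from the replaced node were saved as plain values, the two command forms above can be generated rather than typed. The following Python sketch is one possible approach: the function name and the addresses are hypothetical, and it uses only the parameters that are shown in the examples above. It emits command text and does not run anything on the system.

```python
import ipaddress


def build_cfgportip(node, port_id, ip, mask_or_prefix, gateway):
    """Return the cfgportip command text for an IPv4 or IPv6 address."""
    version = ipaddress.ip_address(ip).version  # raises ValueError if invalid
    if version == 4:
        return (f"cfgportip -node {node} -ip {ip} "
                f"-mask {mask_or_prefix} -gw {gateway} {port_id}")
    return (f"cfgportip -node {node} -ip_6 {ip} "
            f"-prefix_6 {mask_or_prefix} -gw_6 {gateway} {port_id}")


# Hypothetical settings, for illustration only.
print(build_cfgportip("node1", 1, "192.0.2.10", "255.255.255.0", "192.0.2.1"))
print(build_cfgportip("node1", 1, "2001:db8::10", 64, "2001:db8::1"))
```

Letting the ipaddress module decide the address family also rejects a mistyped address before a command is produced.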

    If you have 10 Gbps iSCSI hosts, check that the iSCSI hosts are now using the 10 Gbps Ethernet port 4 and port 5 on the 2145-DH8 node.

    Important:
    1. Both nodes in the I/O group cache data; however, the cache sizes are asymmetric. The replacement node is limited by the cache size of the partner node in the I/O group. Therefore, it is possible that the replacement node does not use the full cache size until you replace the other node in the I/O group.
    2. You do not need to reconfigure the host multipathing device drivers because the replacement node uses the same WWNN and WWPN as the previous node. The multipathing device drivers detect the recovery of paths that are available to the replacement node.
    3. The host multipathing device drivers take approximately 30 minutes to recover the paths. Do not update the other node in the I/O group for at least 30 minutes after you successfully update the first node in the I/O group. If you have other nodes in different I/O groups to update, you can do those updates while you wait.
    4. If you are not able to check that the Fibre Channel device driver of every host is set to time out a Fibre Channel path in 3 seconds or less, then it is best to reboot the new SAN Volume Controller 2145-DH8 node now to guarantee that the fibre path becomes active when the node becomes active again.
  17. Important: Ask the host administrator to query the paths on each host to ensure that all paths to the replacement node are active before you proceed to the next step. If you are using the IBM Multipath Subsystem Device Driver (SDD), the command to query paths is datapath query device. Documentation that is provided with your multipathing device driver shows how to query paths. Force the multipath driver to rescan for paths if the expected paths are not active.
  18. Optional: If you want to use the replaced node as a spare node, follow these steps:
    For SAN Volume Controller 2145-DH8:
    1. Connect to the service assistant interface on the node by using the technician port.
    2. Ensure that you are connected to the correct node and then select Configure Node.
    3. Select Update WWNN.
    4. Under Specify WWNN, enter 00000.
    5. Click Modify to confirm.

    This node can now be used as a spare node.

  19. Use the CLI to create appropriate RAID storage arrays (MDisks) with the flash drives in the expansion enclosure. Create the arrays in the same storage pools (MDisk groups) that contain the MDisks for the internal storage in this I/O group, so that they can hold the data for all the volumes that are being migrated from the internal disks of this I/O group.
  20. To remove the MDisks for the internal drives, enter the following CLI command:
    rmmdisk -mdisk mdisk_list -force mdisk_group_id | mdisk_group_name
    This command completes asynchronously, so it can appear to be complete before the actual data migration is finished.
  21. Check the progress of the active migrations by entering this command:
    svcinfo lsmigrate
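
Because the removal command returns before the data is moved, the lsmigrate output is what shows whether work is still in progress. The following Python sketch reads captured output and extracts the progress values. It is a sketch under assumptions: the function names and the sample output are hypothetical, the output is assumed to consist of name and value pairs with one progress line per active migration, and an empty listing is assumed to mean that no migrations are active.

```python
def migration_progress(output):
    """Return the percentage from every 'progress' line in the output."""
    values = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "progress":
            values.append(int(fields[1]))
    return values


def migrations_finished(output):
    """Treat an output with no progress lines as 'nothing is migrating'."""
    return not migration_progress(output)


# Hypothetical captured output, for illustration only.
SAMPLE_LSMIGRATE = """\
migrate_type MDisk_Group_Migration
progress 96
migrate_type MDisk_Group_Migration
progress 12
"""

print(migration_progress(SAMPLE_LSMIGRATE))
print(migrations_finished(SAMPLE_LSMIGRATE))
```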
  22. Check that the flash drives in the old node are not in member state by entering the lsdrive CLI command.
  23. Remove the original drives from the system configuration by changing their use to unused.
  24. Repeat steps 4 to 23 for each node that you want to replace.