Updating the system

The system update process updates your entire system environment and might involve both memory and software changes.

Start here to update to version 8.1.0 or later from version 7.7.0 or later.

If you are updating from a release before version 7.7.0, follow the instructions in that previous release.

Adding more memory to a node or correcting a DIMM failure

Important: Before you upgrade a node by adding more memory, you must first remove that node from the system configuration. To do so, complete the following procedure. Similarly, if you encounter a memory DIMM failure on any node during the update process, stop immediately. Then, follow this procedure to ensure a successful update.
  1. If you are adding memory to a node, you must remove the node from the system configuration. To do so, you can use the management GUI or the CLI.
    • To use the management GUI, right-click on the node and select Remove.
    • To use the CLI, enter the following command, where node_id | node_name identifies the node.
      svctask rmnode node_id | node_name
    Note: If you are replacing a faulty DIMM, you do not have to remove the node from the system. Go to step 2.
  2. If you are correcting a DIMM failure in the node, remove the DIMM, as described in Removing the memory modules (DIMM). Then, continue to step 3.
  3. To upgrade a node with more memory or replace a DIMM on a failing node, follow the steps that are described in Replacing the memory modules (DIMM). Then, continue to step 4.
  4. Check the status of the remaining nodes in the system and the update status:
    svcinfo lssoftwareupgradestatus
  5. If the partner node is up and the system update status is updating, update the node in service mode and add it back into the system:
    svctask addnode
    Refer to the addnode command information for possible flags. The update continues.
  6. If the partner node is up and the system update status is stalled, decide whether to complete the update (roll forward) or cancel it (roll back). Your decision depends in part on how far through the update you were when the failure occurred. You can roll forward with either a service update strategy or node removal (the rmnode command).
    • Roll forward (service update): To manually complete the update, use a service mode update process to update the remaining down-level nodes. After all the nodes are running the same level, the update is committed.
    • Roll forward (rmnode command): Use the rmnode command procedure only if the update is more than or equal to 50% complete.
    • Roll back (cancel the update): The -force parameter is required if one or more nodes are offline.
       svctask applysoftware -abort -force
      Important: Using the -force parameter might result in a loss of access. Choose this option only if the partner node (of your offline node) is at the original code level.
      Updated nodes are rolled back to the original software level, one node at a time.
  7. Verify that all nodes are back online and running the same firmware level.
  8. Enter the following command:
    svcconfig backup
  9. Verify the health of the system.
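The complete sequence for a memory upgrade can be summarized as the following CLI session. The node name node1 and the addnode flag values are placeholder examples; refer to the addnode command information for the flags that apply to your configuration.
  svctask rmnode node1
  (add the memory, then update the node in service mode if the update status requires it)
  svcinfo lssoftwareupgradestatus
  svctask addnode -panelname 000000 -iogrp io_grp0 -name node1
  svcconfig backup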

Planning considerations

For the most recent information about restrictions before you update, see the following site:

http://www.ibm.com/support/docview.wss?uid=ssg1S1001707

Allow up to a week to plan your tasks, go through your preparatory update tasks, and complete the update of the system environment. The update procedures can be divided into the general processes that are shown in Table 1.
Table 1. Updating tasks
Sequence Update task
1 Before you update, become familiar with the prerequisites and the tasks involved. Decide whether you want to update automatically or manually. During an automatic update procedure, the clustered system updates each of the nodes systematically; this is the preferred method for updating software on nodes. However, you can also update each node manually.
2 Ensure that CIM object manager (CIMOM) clients are working correctly. When necessary, update these clients so that they can support the new version of system code.
3 Ensure that multipathing drivers in the environment are fully redundant.
4 Update your system. The system update includes component firmware updates. The drive firmware update is a separate process.
5 Update other devices in the system environment. Examples might include updating hosts and switches to the correct levels.
Note: The amount of time that is required can vary depending on the amount of preparation work and the size of the environment. An automatic update takes about 20 minutes for each node plus 30 minutes for each system. The 30-minute interval provides time for the multipathing software to recover.
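As a quick worked example of the figures in the note above, with a hypothetical eight-node system, the rough duration can be computed as:

```shell
# Rough automatic-update duration: ~20 minutes per node plus a ~30-minute
# multipathing-recovery interval per system (figures from the note above).
nodes=8                          # hypothetical eight-node system
minutes=$(( nodes * 20 + 30 ))
echo "$minutes minutes"          # prints "190 minutes"
```

Preparation work and the size of the environment can add substantially to this baseline.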
Attention: If you experience failover issues with multipathing driver support, resolve these issues before you start normal operations.

Firmware and software for the system and its attached adapters are tested and released as a single package. The package number increases each time that a new release is made.

Some code levels support updates only from specific previous levels, or the code can be installed only on certain hardware types. If you update to more than one level above your current level, you might be required to install an intermediate level. For example, if you are updating from level 1 to level 3, you might need to install level 2 before you can install level 3. For more information about the prerequisites for each code level, see this website:

www.ibm.com/support
Attention: Ensure that you have no unfixed errors in the log and that the system date and time are correctly set. Start the fix procedures, and fix any outstanding errors before you attempt a concurrent code update.
Note: After the system software update completes, the Fibre Channel over Ethernet (FCoE) function can be enabled on each node by following the fix procedures for these events by using the management GUI. The FCoE activation procedure involves a node reboot. Allow time for host multipathing to recover between activation of different nodes in the same I/O group.

The update process

During the automatic update process, each node in a system is updated one at a time, and the new code is staged on the nodes. While each node restarts, there might be some degradation in the maximum I/O rate that can be sustained by the system. After all the nodes in the system are successfully restarted with the new code level, the new level is automatically committed.

During an automatic code update, each node of a working pair is updated sequentially. The node that is being updated is temporarily unavailable and all I/O operations to that node fail. As a result, the I/O error counts increase and the failed I/O operations are directed to the partner node of the working pair. Applications do not see any I/O failures. When new nodes are added to the system, the update package is automatically downloaded to the new nodes from the system.

The update can normally be done concurrently with normal user I/O operations. However, performance might be impacted. If any restrictions apply to the operations that can be done during the update, these restrictions are documented on the product website that you use to download the update packages. During the update procedure, most of the configuration commands are not available. Only the following commands are operational from the time the update process starts to the time that the new code level is committed, or until the process is backed out:

  • All information commands
  • The rmnode command

When the update process completes, you are notified through the management GUI. If you are using the command-line interface, issue the lsupdate command to display the status of the update.
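As a sketch, the status field can be pulled from saved command output with standard tools. The output file below is a hypothetical example; the exact fields that lsupdate reports depend on your code level.

```shell
# Hypothetical example of saved 'lsupdate' output; the field names shown
# are illustrative and can differ between code levels.
cat > /tmp/lsupdate.out <<'EOF'
status updating
progress 40
EOF

# Print the value of the status field.
awk '$1 == "status" { print $2 }' /tmp/lsupdate.out   # prints "updating"
```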

Because of the operational limitations that occur during the update process, the code update is a user task. However, if you have problems with an update, contact your support center. Do not try to troubleshoot update problems without technical assistance. For further directions, see the topic about how to get information, help, and technical assistance.

Multipathing driver

Before you update, ensure that the multipathing driver is fully redundant with every path available and online. During the update, you might see errors that are related to paths going away (failing over), and the error count might increase. When the paths to a node come back, the system returns to full redundancy. After the 30-minute delay, the update proceeds to the other node of the pair, and its paths go down in turn.

If you are using IBM® Subsystem Device Driver (SDD) or IBM Subsystem Device Driver Device Specific Module (SDDDSM) as the multipathing software on the host, use the datapath query device or datapath query adapter commands to monitor the state of the multipathing software; these commands display the increased I/O error counts. For more information about the datapath query commands, see the IBM Multipath Subsystem Device Driver User's Guide.

If you are using IBM Subsystem Device Driver Path Control Module (SDDPCM) as the multipathing software on the host, use the pcmpath query device or pcmpath query adapter commands to monitor the state of the multipathing software; these commands display the increased I/O error counts.
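For example, on an SDDPCM host you might capture a baseline before the update and compare it while each node restarts; the error counts in the output are what increase during failover (verify the exact column names for your driver level):
  pcmpath query device
  pcmpath query adapter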

Metro Mirror and Global Mirror relationships

When you update software on a system that has secondary volumes of running Metro Mirror or Global Mirror relationships, write performance might be degraded on the primary volumes, and Global Mirror relationships can be automatically stopped with one or more 1920 errors. To avoid the write performance degradation, consider stopping such relationships before you update the software and restarting them after the update completes.

With system version 6.4.0 or later, support for four Fibre Channel and two Fibre Channel over Ethernet (FCoE) ports was enabled. If a system contains these software versions, it is not possible to establish a remote copy partnership with another system that is running a software version earlier than 6.4.0. If a system that runs 6.4.0 or later has an existing remote copy partnership with another system that is running an earlier software version, you cannot add a node with a combined total of more than four Fibre Channel and FCoE ports. You also cannot activate more ports (either by enabling FCoE or installing new hardware) on existing nodes in the system. To resolve these problems, you have two options:
  • Update the software on the remote system to 6.4.0 or later, or
  • Use the chnodehw -legacy CLI command to disable the additional hardware on nodes in the system with a 6.4.0 or later software version installed.
The -legacy parameter of the chnodehw CLI command controls activating and deactivating the FCoE ports.
To activate the additional hardware, run the following CLI command:
chnodehw node_name | node_id
Where node_name | node_id (required) specifies the node to be modified. The variable that follows the parameter is either:
  • The node name that you assigned when you added the node to the system.
  • The node ID that is assigned to the node (not the worldwide node name).
To disable the additional hardware, run the following command:
chnodehw -legacy software_level node_name | node_id
Where software_level indicates the level of software the node must interoperate with. If the value is less than 6.4.0, then the node configures its hardware to support only a maximum of four Fibre Channel or FCoE ports. And node_name | node_id (required) specifies the node to be modified. The variable that follows the parameter is either:
  • The node name that you assigned when you added the node to the system
  • The node ID that is assigned to the node (not the worldwide node name)
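For example, to disable the additional hardware on a hypothetical node named node1 so that it can interoperate with a partner system that runs 6.3.0, and later to re-enable the hardware, you might run:
  chnodehw -legacy 6.3.0 node1
  chnodehw node1
The node name node1 and the level 6.3.0 are placeholder values; substitute your own node name or ID and the software level of your partner system.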
With support for six ports (four Fibre Channel and two FCoE ports) on each node with 6.4.0 code, rules govern how to set up a partnership with a pre-6.4.0 system.
  • A 6.4.0 system cannot form a partnership with a pre-6.4.0 system with more than 4 FC/FCoE I/O ports enabled.
    For example, consider a multi-system partnership configuration between three systems, A, B, and C:
    A <-> B <-> C
    System A has pre-6.4.0 software installed, and systems B and C have 6.4.0 installed.
    Remote copy services are possible in this configuration only if system B does not have FCoE ports enabled.
    The partnership between systems A and B is not affected by activated FCoE ports on nodes in system C.
  • If a 6.4.0 system has an already established partnership with a pre-6.4.0 system and if more hardware (four Fibre Channel and two FCoE ports) is enabled while the partnership is stopped, then the partnership cannot be started again until the remote system is updated or the extra hardware is disabled by using the chnodehw -legacy command.
  • A node with an older hardware configuration (including a system that was updated from 6.3.0 to 6.4.0 that has 10 Gb Ethernet adapters) might generate event logs indicating that new hardware (the FCoE function) is available and should be enabled with the chnodehw command. If you want to continue to operate remote copy partnerships with systems that are running older levels of software, leave this event log unfixed.

If the additional hardware is activated and a partnership must be established with a system that is running pre-6.4.0 software, then the additional hardware must first be disabled by using the chnodehw -legacy software_level node_name | node_id command, where software_level is earlier than 6.4.0.

When a node is added to a system, the system checks for started partnerships and determines the lowest software level of the partnered systems. This software level is passed to the node that is being added to the system. The node processes the equivalent of a chnodehw -legacy software_level command as it joins the system.

After the system update

The audit log content that was on your system before the update is sent to a file in the /dumps/audit directory on the configuration node. The audit log then contains only entries for commands that are run after the successful update of the system.