Updating the system

The system update process involves updating your entire SAN Volume Controller environment.

Start here to update to version 7.5 or later from version 7.4 or later.

If you are updating from a release before version 7.4.0, follow the update instructions for that earlier release. However, you must also confirm the update, a step that is not included in the instructions for that earlier release. After you complete the instructions for your release, return to the corresponding final instructions in this version and follow the step that refers to receiving a status message and confirming the update.

Restriction: When you update a system from version 7.1.0 or earlier to version 7.2.0 or later, you must stop any Global Mirror relationships that have their secondary volume on the system that is being updated. You can restart these relationships after the update process completes.
Attention: If you encounter a memory DIMM failure on any node during the update process, stop immediately. Follow this procedure to ensure a successful update:
  1. Replace the DIMM on the failing node.
  2. Remove the node that has the DIMM failure from the system:
    svctask rmnode object_id | object_name
  3. Check the status of the remaining nodes in the system and the update status:
    svcinfo lssoftwareupgradestatus
  4. If the partner node is up and the system update status is updating, update the node in service mode and add it back into the system:
    svctask addnode
    Refer to the addnode command information for the required parameters; an example sequence is shown after this procedure. The update then continues.
  5. If the partner node is up and the system update status is stalled, decide whether to complete the update (roll forward) or cancel (roll back). Your decision is partly based on how far through the update you were when the failure occurred. You can roll forward with either a service update strategy or node removal (rmnode command).
    • Roll forward (service update): To manually complete the update, use a service mode update process to update the remaining down-level nodes. After all the nodes are running the same level, the update is committed.
    • Roll forward (rmnode command): Use the rmnode command procedure only if the update is 50% or more complete.
    • Roll back (cancel the update):
       svctask applysoftware -abort -force
      The -force parameter is required if one or more nodes are offline.  
      Important: Using the -force parameter might result in a loss of access. Choose this option only if the partner node (of your offline node) is at the original code level.
      Updated nodes are rolled back to the original software level, one node at a time.
  6. Verify that all nodes are back and running the same firmware.
  7. Enter the following command:
    svcconfig backup
  8. Verify the health of the system.
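
The following sequence is a sketch of steps 2 - 4, assuming that the failing node has the object name node5 and panel name 104 and belongs to I/O group io_grp0 (all placeholder values). Verify the exact addnode parameters against the command reference for your code level before you run the commands:
    svctask rmnode node5
    svcinfo lssoftwareupgradestatus
    svctask addnode -panelname 104 -iogrp io_grp0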

For the most recent information about restrictions before you update, search for flashes, alerts, and bulletins at this support site:

www.ibm.com/support

Allow up to a week to plan your tasks, go through your preparatory update tasks, and complete the update of the SAN Volume Controller environment. The update procedures can be divided into the general processes that are shown in Table 1.
Table 1. Updating tasks
Sequence Update task
1 Before you update, become familiar with the prerequisites and tasks involved. Decide whether you want to update automatically or update manually. During an automatic update procedure, the clustered system updates each of the nodes systematically. The automatic method is the preferred procedure for updating software on nodes. However, you can also update each node manually.
2 Ensure that CIM object manager (CIMOM) clients are working correctly. When necessary, update these clients so that they can support the new version of SAN Volume Controller code.
3 Ensure that multipathing drivers in the environment are fully redundant.
4 Update your system. The system update includes component firmware updates. The drive firmware update is a separate process.
5 Update other devices in the SAN Volume Controller environment. Examples might include updating hosts and switches to the correct levels.
Note: The amount of time can vary depending on the amount of preparation work that is required and the size of the environment. For automatic update, it takes about 20 minutes for each node plus 30 minutes for each system. The 30-minute interval provides time for the multipathing software to recover.
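For example, an automatic update of a single system that contains eight nodes takes approximately (8 x 20) + 30 = 190 minutes, which is a little over three hours, in addition to any preparation work.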
Attention: If you experience failover issues with multipathing driver support, resolve these issues before you start normal operations.

Firmware and software for the system and its attached adapters are tested and released as a single package. The package number increases each time that a new release is made.

Some code levels support updates only from specific previous levels, or the code can be installed only on certain hardware types. If you update to more than one level above your current level, you might be required to install an intermediate level. For example, if you are updating from level 1 to level 3, you might need to install level 2 before you can install level 3. For information about the prerequisites for each code level, see this website:

www.ibm.com/support
Attention: Ensure that you have no unfixed errors in the log and that the system date and time are correctly set. Start the fix procedures, and ensure that you fix any outstanding errors before you attempt to concurrently update the code.
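For example, you can list any unfixed events from the command line with a command similar to the following one; the exact parameters can vary by code level, so treat this as a sketch:
    svcinfo lseventlog -fixed no
Resolve any events that are returned by using the fix procedures before you start the update.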
Note: After the system software update completes, the Fibre Channel over Ethernet (FCoE) function can be enabled on each node by following the fix procedures for these events by using the management GUI. The FCoE activation procedure involves a node reboot. Allow time for host multipathing to recover between activation of different nodes in the same I/O group.

The update process

During the automatic update process, each node in a system is updated one at a time, and the new code is staged on the nodes. While each node restarts, there might be some degradation in the maximum I/O rate that can be sustained by the system. After all the nodes in the system are successfully restarted with the new code level, the new level is automatically committed.

During an automatic code update, each node of a working pair is updated sequentially. The node that is being updated is temporarily unavailable and all I/O operations to that node fail. As a result, the I/O error counts increase and the failed I/O operations are directed to the partner node of the working pair. Applications do not see any I/O failures. When new nodes are added to the system, the update package is automatically downloaded to the new nodes from the SAN Volume Controller system.

The update can normally be done concurrently with normal user I/O operations. However, performance might be impacted. If any restrictions apply to the operations that can be done during the update, these restrictions are documented on the product website that you use to download the update packages. During the update procedure, most of the configuration commands are not available. Only the following commands are operational from the time the update process starts to the time that the new code level is committed, or until the process is backed out:

  • All information commands
  • The rmnode command

When the update process completes, you are notified through the management GUI. If you are using the command-line interface, issue the lsupdate command to display the status of the update.
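For example, the following command displays the current update status; the exact fields that are returned depend on your code level:
    lsupdate
The output includes a status field that indicates the overall state of the update.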

Because of the operational limitations that occur during the update process, the code update is a user task. However, if you have problems with an update, contact your support center. Do not try to troubleshoot update problems without technical assistance. For further directions, see the topic about how to get information, help, and technical assistance.

Multipathing driver

Before you update, ensure that the multipathing driver is fully redundant with every path available and online. During the update, you might see errors that are related to paths failing over and the error count increasing. When the paths to the updated node come back, the system returns to a fully redundant state. After the 30-minute delay, the paths to the other node in the I/O group go down while that node is updated.

If you are using IBM® Subsystem Device Driver (SDD) or IBM Subsystem Device Driver Device Specific Module (SDDDSM) as the multipathing software on the host, you can monitor the state of the multipathing software with the datapath query device or datapath query adapter commands, which display the increased I/O error counts. For more information about the datapath query commands, see the IBM System Storage Multipath Subsystem Device Driver User's Guide.

If you are using IBM Subsystem Device Driver Path Control Module (SDDPCM) as the multipathing software on the host, you can monitor the state of the multipathing software with the pcmpath query device or pcmpath query adapter commands, which display the increased I/O error counts.
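For example, during the update you can periodically run one of the following commands on the host, depending on which driver is installed, and confirm that the paths return to an available state after each node restarts:
    datapath query device
    pcmpath query device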

Updating SAN Volume Controller 2145-CG8 or 2145-CF8 systems with internal flash drives

The SAN Volume Controller update process reboots each node in the system in turn. Before the update commences and before each node is updated, the update process checks for dependent volumes. You can check for dependent volumes by using the lsdependentvdisks command-line interface (CLI) command with the node parameter.
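For example, to check whether any volumes depend on a particular node before that node is taken offline, you can run a command similar to the following one, where node1 is a placeholder node name or ID:
    svcinfo lsdependentvdisks -node node1
If no volumes are returned, taking the node offline does not take any volumes offline.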

Updating systems with internal flash drives that use RAID 0
The update process takes each node offline temporarily to process the update. While the node that contains an internal flash drive is offline, any data that is written to volumes with a mirrored copy on the offline node is written only to the other online copy. After the updated node rejoins the system, data is resynchronized from the copy that remained online. The update process delays approximately 30 minutes before the update on the partner node is started. The synchronization must complete within this time or the update stalls and requires manual intervention. For any mirrored volume that uses disk extents on a flash drive that is on a SAN Volume Controller node for one or both of its volume copies, set its synchronization rate to 80 or higher to ensure that the resynchronization completes in time.
Note: To increase the amount of time between when the two nodes that contain the volume copies go offline during the update process, consider updating the code manually.
Table 2 defines the synchronization rates.
Table 2. Resynchronization rates of volume copies
Synchronization rate Data copied/sec
1-10 128 KB
11-20 256 KB
21-30 512 KB
31-40 1 MB
41-50 2 MB
51-60 4 MB
61-70 8 MB
71-80 16 MB
81-90 32 MB
91-100 64 MB
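For example, to set the synchronization rate of a mirrored volume to 80, which corresponds to approximately 16 MB per second in Table 2, you can use a command similar to the following one, where vdisk0 is a placeholder volume name:
    svctask chvdisk -syncrate 80 vdisk0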
Updating systems with internal flash drives that use RAID 1 or 10
The update process takes each node offline temporarily to process the update. During this time, write operations to a mirrored array on an offline node are written only to the drive that is in the online node. When the node comes back online, the drive that was offline is resynchronized from the online mirrored array. However, if this synchronization does not complete before the partner node must be updated, the dependent volume check fails and the update stalls.
Attention: To increase the amount of time between the two nodes going offline during the update process, consider manually updating the code.

Metro Mirror and Global Mirror relationships

When you update software on a system that has secondary volumes of running Metro Mirror or Global Mirror relationships, write performance might be degraded on the primary volumes, and Global Mirror relationships can be automatically stopped with one or more errors with error code 1920. You might want to proactively stop such relationships before you update the software to avoid the write performance degradation, and restart the relationships after the update completes.
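For example, to proactively stop a single relationship before the update and restart it afterward, you can use commands similar to the following ones, where rcrel0 is a placeholder relationship name. For relationships in a consistency group, use the equivalent consistency group commands instead:
    svctask stoprcrelationship rcrel0
    svctask startrcrelationship rcrel0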

With SAN Volume Controller version 6.4.0 or later, support for four Fibre Channel and two Fibre Channel over Ethernet (FCoE) ports was enabled. If a system runs one of these software versions and has more than four Fibre Channel and FCoE ports enabled, it is not possible to establish a remote copy partnership with another system that is running a software version earlier than 6.4.0. If a system that runs 6.4.0 or later has an existing remote copy partnership with another system that is running an earlier software version, you cannot add a node with a combined total of more than four Fibre Channel and FCoE ports. You also cannot activate more ports (either by enabling FCoE or installing new hardware) on existing nodes in the system. To resolve these problems, you have two options:
  • Update the software on the remote system to 6.4.0 or later, or
  • Use the chnodehw -legacy CLI command to disable the additional hardware on nodes in the system with 6.4.0 or later software version installed
The -legacy parameter of the chnodehw CLI command controls activating and deactivating the FCoE ports.
To activate the additional hardware, run the following CLI command:
chnodehw node_name | node_id
Where node_name | node_id (required) specifies the node to be modified. The variable that follows the parameter is either:
  • The node name that you assigned when you added the node to the system
  • The node ID that is assigned to the node (not the worldwide node name)
To disable the additional hardware, run the following command:
chnodehw -legacy software_level node_name | node_id
Where software_level indicates the level of software that the node must interoperate with. If the value is less than 6.4.0, the node configures its hardware to support a maximum of four Fibre Channel and FCoE ports. The node_name | node_id value (required) specifies the node to be modified. The variable that follows the parameter is either:
  • The node name that you assigned when you added the node to the system
  • The node ID that is assigned to the node (not the worldwide node name)
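For example, to disable the additional ports on a node named node2 (a placeholder) so that it can interoperate with a system that is running version 6.3.0, you might run a command similar to the following one:
    svctask chnodehw -legacy 6.3.0 node2
To activate the full hardware again later, run chnodehw against the same node without the -legacy parameter:
    svctask chnodehw node2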
With support for six ports (four Fibre Channel and two FCoE ports) on each node with 6.4.0 code, rules govern how to set up a partnership with a pre-6.4.0 system.
  • A 6.4.0 system cannot form a partnership with a pre-6.4.0 system with more than 4 FC/FCoE I/O ports enabled.
    For example, consider a multi-system partnership configuration between three systems: A, B, and C.
    A <-> B <-> C
    System A has pre-6.4.0 installed, and systems B and C have 6.4.0 installed.
    The remote copy services are possible in this configuration only if system B does not have FCoE ports enabled.
    The partnership between systems A and B is not affected by FCoE ports that are activated on nodes in system C.
  • If a 6.4.0 system has an already established partnership with a pre-6.4.0 system and if more hardware (four Fibre Channel and two FCoE ports) is enabled while the partnership is stopped, then the partnership cannot be started again until the remote system is updated or the extra hardware is disabled by using the chnodehw -legacy command.
  • A node with an older hardware configuration (including a system that was updated from 6.3.0 to 6.4.0 that has 10 Gb Ethernet adapters) might generate event logs indicating that new hardware (the FCoE function) is available and should be enabled with the chnodehw command. If you want to continue to operate remote copy partnerships with systems that are running older levels of software, leave this event log unfixed.

If the additional hardware is activated and a partnership must be established with a system that is running pre-6.4.0 software, the additional hardware must first be disabled by using the chnodehw -legacy software_level node_name | node_id command, where software_level is a pre-6.4.0 level.

When a node is added to a system, the system checks for (started) partnerships and determines the lowest software level of the partnered systems. This software level is passed to the node that is being added to the system. The node processes the equivalent of a chnodehw -legacy software_level command as it joins the system.

After the system update

The audit log content that was on your system before the update is sent to a file in the /dumps/audit directory on the configuration node. The audit log now contains only entries for commands that are run after the successful update of the system.
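For example, you can list the saved audit log files on the configuration node with a command similar to the following one; the output format depends on your code level:
    svcinfo lsdumps -prefix /dumps/audit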