Completing recovery procedure for clustered systems by using the front panel

Start recovery when all nodes that were members of the system are online and are in candidate status. If there are any nodes that display error code 550 or error code 578, remove their system data to place them into candidate status. Do not run the recovery procedure on different nodes in the same system; this restriction includes remote clustered systems.

About this task

Attention: This service action has serious implications if not completed properly. If at any time you encounter an error that is not covered by this procedure, stop and call IBM® Support.
Any one of the following categories of messages may be displayed:
  • T3 successful
    The volumes are online. Use the final checks to make the environment operational; see What to check after running the system recovery.
  • T3 incomplete
    One or more of the volumes is offline because there was fast write data in the cache. Further actions are required to bring the volumes online; see Recovering from offline volumes using the CLI for details.
  • T3 failed
    Call IBM Support. Do not attempt any further action.

Start the recovery procedure from any node in the system; the node must not have participated in any other system. To best preserve the I/O group ordering, run the recovery from a node that was in I/O group 0.

Note: Each stage of the recovery procedure might take significant time to complete, depending on the specific configuration.

Procedure

  1. Click the up or down button until the Actions menu option is displayed; then, click Select.
  2. Click the up or down button until the Recover Cluster? option is displayed, and then click Select; the node displays Confirm Recover?.
  3. Click Select; the node displays Retrieving.

    After a short delay, the second line displays a sequence of progress messages that indicate the actions that are taking place; for example, Finding qdisks. The backup files are scanned to find the most recent configuration backup data.

    After the file and quorum data retrieval is complete, the node displays T3 data: on the top line.

  4. Verify the date and time on the second line of the display. The time stamp that is shown is the date and time of the last quorum update and must be less than 30 minutes before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
    Attention: If the time stamp is not less than 30 minutes before the failure, call IBM Support.
  5. After you verify that the time stamp is correct, press and hold the UP ARROW and click Select.

    The node displays Backup file on the top line.

  6. Verify the date and time on the second line of the display. The time stamp that is shown is the date and time of the last configuration backup and must be less than 24 hours before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
    Attention: If the time stamp is not less than 24 hours before the failure, call IBM Support.
    Note: Changes that are made after the time of this configuration backup might not be restored.
  7. After you verify that the time stamp is correct, press and hold the UP ARROW and click Select.

    The node displays Restoring. After a short delay, the second line displays a sequence of progress messages that indicate the actions that are taking place; then, the software on the node restarts.

    The node displays Cluster on the top line and a management IP address on the second line. After a few moments, the node displays T3 Completing.

    Note: Any system errors that are logged might temporarily overwrite the display; ignore the message: Cluster Error: 3025. After a short delay, the second line displays a sequence of progress messages that indicate the actions that are taking place.

    When each node is added to the system, the display shows Cluster: on the top line, and the cluster (system) name on the second line.

    Attention: After the last node is added to the system, there is a short delay to allow the system to stabilize. Do not attempt to use the system. The recovery is still in progress. After recovery is complete, the node displays T3 Succeeded on the top line.
  8. Click Select to return the node to normal display.
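
Example: checking the time stamps in steps 4 and 6

The time-stamp checks in steps 4 and 6 are simple date arithmetic, and the following Python sketch shows one way to do that arithmetic away from the front panel. It is an illustration only and is not part of the product. The function names (parse_panel_time, within_limit) are hypothetical, and the sketch assumes that you know the failure time and that it is expressed in the same time zone as the displayed time stamp. It also assumes that a time stamp later than the failure should be treated as a failed check.

```python
from datetime import datetime, timedelta

# Format shown on the front panel: YYYYMMDD hh:mm
PANEL_FORMAT = "%Y%m%d %H:%M"

# Limits taken from steps 4 and 6 of the procedure.
QUORUM_LIMIT = timedelta(minutes=30)   # T3 data: last quorum update
BACKUP_LIMIT = timedelta(hours=24)     # Backup file: last configuration backup


def parse_panel_time(text):
    """Parse a front-panel time stamp such as '20240315 14:05'."""
    return datetime.strptime(text.strip(), PANEL_FORMAT)


def within_limit(panel_text, failure_time, limit):
    """Return True if the time stamp precedes the failure by less than limit.

    Assumption: a time stamp later than the failure time is rejected.
    """
    age = failure_time - parse_panel_time(panel_text)
    return timedelta(0) <= age < limit
```

For example, with a failure at 14:30 on 15 March 2024, within_limit("20240315 14:05", datetime(2024, 3, 15, 14, 30), QUORUM_LIMIT) returns True because the quorum update is 25 minutes old. If the function returns False for either time stamp, follow the Attention notices in steps 4 and 6 and call IBM Support.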

Results

Recovery is complete when the node displays T3 Succeeded. Verify that the environment is operational by completing the checks that are provided in What to check after running the system recovery.