Running system recovery using the service assistant

You can use the service assistant to start recovery when all nodes that were members of the system are online and are in candidate status. If any nodes display error code 550 or 578, remove system information to place them into candidate status. Do not run the recovery procedure on different nodes in the same system; this restriction includes remote systems.
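The preconditions above can be expressed as a simple check. The following Python sketch is illustrative only: the `NodeState` record and its fields are hypothetical and are not part of any product interface. It assumes that you have already collected each node's state by hand, for example from the service assistant.

```python
from dataclasses import dataclass, field

# Error codes that, per the text above, require removing system information.
BLOCKING_ERRORS = {550, 578}


@dataclass
class NodeState:
    """Hypothetical, hand-collected record of one node's state."""
    name: str
    online: bool
    status: str                      # for example "candidate"
    error_codes: set = field(default_factory=set)


def recovery_blockers(nodes):
    """Return a list of reasons why recovery should not be started yet."""
    blockers = []
    for node in nodes:
        if not node.online:
            blockers.append(f"{node.name}: node is not online")
        stuck = sorted(node.error_codes & BLOCKING_ERRORS)
        if stuck:
            codes = ", ".join(str(code) for code in stuck)
            blockers.append(
                f"{node.name}: error {codes}; remove system information"
            )
        elif node.status != "candidate":
            blockers.append(f"{node.name}: status is {node.status}, not candidate")
    return blockers


nodes = [
    NodeState("node1", online=True, status="candidate"),
    NodeState("node2", online=True, status="service", error_codes={578}),
]
for reason in recovery_blockers(nodes):
    print(reason)
```

An empty list means that every recorded node is online and in candidate status; it does not replace the checks in the procedure itself.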

Before you begin

Note: Ensure that the web browser is not blocking pop-up windows. If pop-up windows are blocked, progress windows cannot open.

Before you begin this procedure, read the recover system procedure introductory information; see Recover system procedure.

About this task

Attention: This service action has serious implications if it is not completed properly. If at any time you encounter an error that is not covered by this procedure, stop and call the support center.

Run the recovery from any node in the system; the nodes must not have participated in any other system.

If the system has USB encryption, run the recovery from any node in the system in which a USB flash drive that contains the encryption key is inserted.

If a cluster contains an encrypted cloud account that uses USB encryption, a USB flash drive with the cluster master key must be present in the configuration node before the cloud account can move to the online state. This requirement applies when the cluster is powered down and then restarted.

If the system has key server encryption, note the following items before you proceed with the T3 recovery.
  • Run the recovery on a node that is attached to the key server. The keys are fetched remotely from the key server.
  • Run the recovery procedure on a node that has not had its hardware replaced and has not been rescued. All of the information that a node requires to fetch the key from the key server resides on the node's file system. If the contents of the node's original file system are damaged or no longer exist (for example, after a node rescue, a hardware replacement, or file system corruption), the recovery fails from this node.
Note: Each individual stage of the recovery procedure can take significant time to complete, depending on the specific configuration.

Procedure

  1. Point your browser to the service IP address of one of the nodes.
    If you do not know the IP address or if it has not been configured, configure the service address in one of the following ways:
    • On SAN Volume Controller models 2145-CG8 and 2145-CF8 nodes, use the front panel menu to configure a service address on the node.
    • On SAN Volume Controller 2145-DH8 nodes, use the technician port to connect to the service assistant and configure a service address on the node.
  2. Log on to the service assistant.
  3. Select Recover System from the navigation.
  4. Follow the online instructions to complete the recovery procedure.
    1. Verify the date and time that are shown for the last quorum time. The time stamp must be less than 30 minutes before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
      Attention: If the time stamp is not less than 30 minutes before the failure, call the support center.
    2. Verify the date and time that are shown for the last backup date. The time stamp must be less than 24 hours before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
      Attention: If the time stamp is not less than 24 hours before the failure, call the support center.

      Changes that are made after the time of this backup date might not be restored.
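The two time stamp checks in the steps above are simple date arithmetic. The following Python sketch shows one way to do that arithmetic by hand. It is not part of the service assistant: the function names are illustrative, the failure time is assumed to be known and written in the same YYYYMMDD hh:mm format, and treating a time stamp that is later than the failure as a failed check is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# YYYYMMDD hh:mm, as described in the procedure.
TIMESTAMP_FORMAT = "%Y%m%d %H:%M"

QUORUM_LIMIT = timedelta(minutes=30)   # last quorum time
BACKUP_LIMIT = timedelta(hours=24)     # last backup date


def parse_timestamp(text):
    """Parse a 'YYYYMMDD hh:mm' string into a datetime."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


def is_recent_enough(stamp, failure, limit):
    """Return True if stamp is less than `limit` before the failure time.

    Assumption: a stamp later than the failure time is treated as
    unexpected and fails the check.
    """
    age = parse_timestamp(failure) - parse_timestamp(stamp)
    return timedelta(0) <= age < limit


failure_time = "20240105 10:40"        # example value, not real data
if not is_recent_enough("20240105 10:15", failure_time, QUORUM_LIMIT):
    print("Quorum time is too old: call the support center.")
if not is_recent_enough("20240104 11:00", failure_time, BACKUP_LIMIT):
    print("Backup date is too old: call the support center.")
```

Both limits are strict: a quorum time stamp that is exactly 30 minutes before the failure, or a backup time stamp that is exactly 24 hours before it, does not pass.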

Results

Any one of the following categories of messages might be displayed:
  • T3 successful
    The volumes are back online. Use the final checks to get your environment operational again.
  • T3 recovery completed with errors
    One or more of the volumes are offline because there was fast write data in the cache. To bring the volumes online, see Recovering from offline volumes using the CLI.
  • T3 failed
    Call the support center. Do not attempt any further action.
Verify that the environment is operational by completing the checks that are provided in What to check after running the system recovery.

If any errors are logged in the error log after the system recovery procedure completes, use the fix procedures to resolve these errors, especially the errors that are related to offline arrays.