The recover system procedure
recovers the entire storage system if the system state is lost from all control enclosure node
canisters. The procedure re-creates the storage system by using saved configuration data and
is also known as Tier 3 (T3) recovery. The saved configuration data is in the active quorum disk
and the latest XML configuration backup file. The recovery might not be able to restore all
volume data.
CAUTION:
If the system is in a state where:
- No nodes are active
- One or more nodes have node errors that require a node rescue, node canister replacement, or
node firmware re-installation
Do not attempt system recovery. Contact IBM® Support. If you start the recover system procedure
while the system is in this specific state, the XML backup of the block volume storage
configuration can be lost.
Attention:
- Run service actions only when directed by the fix procedures. If used inappropriately,
service actions can cause loss of access to data or even data loss. Before you attempt to recover a
system, investigate the cause of the failure and attempt to resolve those issues by using other fix
procedures. Read and understand all of the instructions before you complete any action.
- The recovery procedure can take several hours if the system uses large-capacity
devices as quorum devices.
- If there are offline arrays after running the recovery procedure, contact IBM
Support.
Do not attempt the recover system procedure unless the following conditions are met:
- All of the conditions are met in When to run the recover system procedure.
- All hardware errors are fixed. See Fix hardware errors.
- All node canisters have candidate status. Otherwise, see step 1.
- All node canisters are at the same level of code that the storage system had before the system
failure. If any node canisters were modified or replaced, use the service assistant to verify the
levels of code and, where necessary, to reinstall the level of code so that it matches the level
that is running on the other node canisters in the system.
- If the system was using IP quorum for T3 metadata, verify that all
the IP quorum applications are running.
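Two of the conditions above (candidate status and matching code level) can be checked mechanically once the node details are collected. The following is a minimal sketch only, not an IBM tool: the dictionary keys name, status, and code_level, and the function name precondition_problems, are hypothetical and assume that you gathered the values yourself, for example from the service assistant. The hardware error and IP quorum conditions still need to be verified manually.

```python
from typing import Dict, List


def precondition_problems(nodes: List[Dict[str, str]], expected_level: str) -> List[str]:
    """Return reasons why the recover system procedure should not be started yet.

    Each node is a dict with the hypothetical keys 'name', 'status', and
    'code_level'. An empty result means only that these two checks passed.
    """
    problems = []
    if not nodes:
        problems.append("no node canisters reported")
    for node in nodes:
        # Every node canister must have candidate status.
        if node["status"].strip().lower() != "candidate":
            problems.append(f"{node['name']}: status is {node['status']}, not candidate")
        # Every node canister must match the pre-failure level of code.
        if node["code_level"].strip() != expected_level:
            problems.append(
                f"{node['name']}: code level {node['code_level']} differs from {expected_level}"
            )
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration; they are not real system output.
    nodes = [
        {"name": "node1", "status": "Candidate", "code_level": "LEVEL-A"},
        {"name": "node2", "status": "Service", "code_level": "LEVEL-B"},
    ]
    for problem in precondition_problems(nodes, expected_level="LEVEL-A"):
        print(problem)
```

If the function returns any entries, resolve them before you continue, and treat an empty list as a necessary condition rather than a sufficient one.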
The system recovery procedure is one of several tasks that must be completed.
The following list is an overview of the tasks and the order in which they must be
completed:
- Preparing for system recovery:
- Review the information about when to run the recover system procedure.
- Fix your hardware errors and make sure that all nodes in the system are shown in the service
assistant or in the output from the sainfo lsservicenodes command.
- Remove the system information for node canisters with error code 550 or error code 578 by using
the service assistant, but only if the recommended user response for these node errors has been
followed. See Removing system information for node canisters with error code 550 or error code 578 using the service assistant.
- For Virtual Volumes (VVols), shut down the services for any instances of Spectrum
Control Base that are connecting to the system. Use the Spectrum Control Base
command service ibm_spectrum_control stop.
- Running the system recovery. After you have prepared the system for recovery and met all
the preconditions, run the system recovery.
Note: Run the procedure on one system in
a fabric at a time. Do not run the procedure on different node canisters in the same system. This
restriction also applies to remote systems.
- Completing actions to get your environment operational.
- Recovering from offline volumes by using the CLI.
- Checking your system, for example, to ensure that hosts can access all of their mapped
volumes.
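The preparation task of confirming that all nodes are shown in the output from sainfo lsservicenodes can be scripted after the output is saved as text. The following is a sketch under stated assumptions: it assumes a header row followed by whitespace-delimited rows, and the column names and values in the sample are simplified placeholders, not actual command output. The function names parse_rows and missing_nodes are hypothetical.

```python
from typing import Dict, List


def parse_rows(text: str) -> List[Dict[str, str]]:
    """Parse header-plus-rows text into dicts keyed by the header names.

    Assumes whitespace-delimited columns. Missing trailing columns become "",
    and extra tokens are folded into the last column. An empty column in the
    middle of a row would shift later values, so review the result by eye.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.strip().split(None, len(header) - 1)
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows


def missing_nodes(rows: List[Dict[str, str]], expected: List[str],
                  key: str = "node_name") -> List[str]:
    """Return the expected node names that do not appear in the parsed rows."""
    seen = {row.get(key, "") for row in rows}
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    # Simplified placeholder text for illustration; not real command output.
    saved_output = """panel_name node_name node_status
01-1 node1 Candidate
01-2 node2 Candidate
"""
    rows = parse_rows(saved_output)
    print(missing_nodes(rows, expected=["node1", "node2", "node3"]))
```

Any name that the sketch reports as missing corresponds to a node that is not shown, which means that the preparation tasks are not complete.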