Removing and replacing a faulty node canister
You can use this procedure to remove a faulty node canister and replace it with a new node canister. You can remove the parts from the faulty node canister and reinstall them into the new node canister.
About this task
Notes:
- Ensure the FRU part number (P/N) of the replacement part matches that of the failed node canister, or is an approved substitute. The FRU P/N is identified on the label of the canister and on the FRU packaging.
- Do not operate the control enclosure with a node canister removed for longer than 16 minutes. Operating for longer than this period might cause the enclosure to shut down due to overheating.
- No tools are required to complete this task. Do not remove or loosen any screws.
- Use care when you remove a node canister from the control enclosure. The node canister is long and its center of gravity is far forward. It can be helpful to have a lift or other sturdy, flat surface ready to receive the node canister during removal.
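The part-number check and the 16-minute limit from the notes above can be sketched as a small pre-flight helper. This is an illustrative sketch only, not an IBM tool: the function names, the sample part numbers, and the approved-substitute set are hypothetical placeholders. Only the 16-minute limit comes from this procedure.

```python
from datetime import datetime, timedelta

# From the notes above: do not operate the enclosure with a node canister
# removed for longer than 16 minutes.
MAX_REMOVAL_TIME = timedelta(minutes=16)


def is_acceptable_replacement(failed_pn, replacement_pn, approved_substitutes=frozenset()):
    """Return True if the replacement FRU P/N matches the failed one,
    or appears in the caller-supplied set of approved substitutes.

    approved_substitutes is hypothetical input; this sketch does not know
    which substitutes are actually approved.
    """
    failed = failed_pn.strip().upper()
    replacement = replacement_pn.strip().upper()
    substitutes = {pn.strip().upper() for pn in approved_substitutes}
    return replacement == failed or replacement in substitutes


def reinstall_deadline(removed_at):
    """Latest time by which a canister should be back in the enclosure."""
    return removed_at + MAX_REMOVAL_TIME


def time_remaining(removed_at, now):
    """Time left before the limit is reached; never negative."""
    return max(reinstall_deadline(removed_at) - now, timedelta(0))


if __name__ == "__main__":
    # Placeholder part numbers, for illustration only.
    print(is_acceptable_replacement("01AB123", "01AB123"))
    removed = datetime(2024, 1, 1, 10, 0)
    print(time_remaining(removed, datetime(2024, 1, 1, 10, 10)))
```

Normalizing case and whitespace before comparing is a design choice that avoids false mismatches when a part number is typed in from a label.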
Procedure
- Review the Event Log to identify the faulty node canister.
- Review Procedure: Understanding system volume dependencies to identify any volume dependencies on the node canister.
- Follow Procedure: Powering off a node canister to verify that the hosts will not lose access to data in volumes.
- From the rear of the control enclosure, label each cable and remove it from the node canister.
Removing the faulty node canister
- Remove the node canister, as described in Reseating a node canister in the control enclosure, and place it on a flat, level surface.
- Remove the new node canister from its packaging. Ensure that the FRU P/N of the replacement node canister matches that of the failed node canister or that the new P/N is an approved substitute.
- Remove the covers from the faulty and replacement node canisters and set them aside, as described in Removing and replacing the cover of a node canister.
- Complete the following procedures to remove parts from the faulty node canister and install them in the replacement canister.
- Removing and replacing a memory module
- Removing and replacing the Trusted Platform Module
- Removing and replacing a fan module
- Removing and replacing the node canister battery
- Removing and replacing a PCIe riser
Notes:
- You do not have to remove the SFP transceiver or host interface adapter before you remove each PCIe riser from the faulty node canister. Similarly, you can reinstall a PCIe riser into the new node canister while the SFP and host interface adapter are installed.
- When you install the PCIe risers in the new node canister, ensure that you use the same numbered slots that were used in the faulty node canister.
- Removing and replacing a boot drive
Note: Transfer the boot drive in boot drive slot 1 (closest to the CPU) into the same slot in the replacement node canister.
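The slot rules above (PCIe risers go back into the same numbered slots, and the boot drive goes back into the same boot drive slot) amount to checking that the layout of the replacement canister matches the layout recorded from the faulty one. A minimal sketch of that check follows. The function name, the slot labels, and the part identifiers are hypothetical and exist only for illustration.

```python
def find_slot_mismatches(faulty_layout, replacement_layout):
    """Compare two {slot_label: part_id} mappings.

    Returns a sorted list of slot labels whose contents differ, including
    slots that are populated in only one of the two canisters.
    """
    slots = set(faulty_layout) | set(replacement_layout)
    return sorted(
        slot
        for slot in slots
        if faulty_layout.get(slot) != replacement_layout.get(slot)
    )


if __name__ == "__main__":
    # Hypothetical labels recorded before the parts were moved.
    faulty = {"riser-1": "adapter-A", "riser-2": "adapter-B", "boot-1": "drive-X"}
    # Layout after the transfer, with the two risers accidentally swapped.
    replacement = {"riser-1": "adapter-B", "riser-2": "adapter-A", "boot-1": "drive-X"}
    print(find_slot_mismatches(faulty, replacement))
```

An empty result means that every part sits in the same slot that it occupied in the faulty node canister.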
Installing the replacement node canister
- Replace the cover of the new node canister, as described in Removing and replacing the cover of a node canister.
- Install the new node canister into the control enclosure, as described in Reseating a node canister in the control enclosure.
- Reconnect the cables that were removed in step 4 to the appropriate ports in the replacement node canister.
- If the node canister was communicating with other node canisters using RDMA over Ethernet, use the Service Assistant Tool or the sainfo lsnodeip command to check whether the node IP configuration was lost. If needed, use the Service Assistant Tool or the satask chnodeip command to set the node IP.
- Use the management GUI or service assistant GUI to check that the node canister is online (or is Active) in the system.
- Enter the service assistant command satask chbootdrive -replacecanister to update the drives to match the serial number of the new node canister.
Note: Node error code 545 is expected. For more information, see 545.
To help identify the node canister, the insides of the release levers are labeled with the serial number.
- Review the management GUI to determine that all errors are resolved.
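The command-line part of the steps above can be summarized as an ordered checklist. The following is a minimal sketch, not an IBM utility: the function name and its two parameters are hypothetical, and it only builds command strings without running them. It uses only the command names that appear in this procedure. Arguments for satask chnodeip are intentionally omitted and must be taken from the CLI reference for your system.

```python
def post_install_commands(uses_rdma, node_ip_lost=False):
    """Return the service CLI commands for the steps above, in order.

    uses_rdma:    the node canister communicated with other node canisters
                  using RDMA over Ethernet.
    node_ip_lost: the check with sainfo lsnodeip showed that the node IP
                  configuration was lost.
    """
    commands = []
    if uses_rdma:
        # Check whether the node IP configuration survived the replacement.
        commands.append("sainfo lsnodeip")
        if node_ip_lost:
            # Arguments deliberately omitted; see the CLI reference.
            commands.append("satask chnodeip")
    # Before this command, confirm in the GUI that the node canister is
    # online (this check has no CLI command in the procedure).
    commands.append("satask chbootdrive -replacecanister")
    return commands


if __name__ == "__main__":
    for command in post_install_commands(uses_rdma=True, node_ip_lost=True):
        print(command)
```

Keeping the checklist as plain strings, rather than executing anything, leaves the decision to run each command with the person who is servicing the enclosure.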