MAP 6001: Replace offline SSD in a RAID 0 array

This procedure replaces a flash drive that has failed while it is still a member of a storage pool.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first read Using the maintenance analysis procedures.

This map applies to models with internal flash drives. Be sure that you know which model you are using before you start this procedure. To determine which model you are working on, look for the label that identifies the model type on the front of the node.

Attention:
  1. Back up your SAN Volume Controller configuration before you begin these steps.
  2. If the drive use property is member and the drive must be replaced, contact IBM support before taking any actions.

About this task

Perform the following steps only if a drive in a RAID 0 (striped) array has failed:

Procedure

  1. Record the properties of all volume copies, MDisks, and storage pools that are dependent on the failed drive.
    1. Use the lsdrive CLI command to identify the drive ID and the error sequence number of each drive with a status of offline and a use of failed.
    2. Review the offline reason using the lsevent <seq_no> CLI command.
    3. Obtain detailed information about the offline drive or drives using the lsdrive <drive_id> CLI command.
    4. Record the mdisk_id, mdisk_name, node_id, node_name, and slot_id for each offline drive.
    5. Obtain the storage pool of each failed drive by running the lsmdisk <mdisk_id> CLI command for each MDisk that was identified in substep 1c.

      The following steps replace all of the failed drives in one of the storage pools. Make a note of the node, slot, and ID of each selected drive.

    6. Determine all the MDisks in the storage pool using the lsmdisk -filtervalue mdisk_grp_id=<grp id> CLI command.
    7. Identify which MDisks are internal (ctrl_type equals 4) and which MDisks contain SSDs (ctrl_type equals 6).
    8. Find the volumes with extents in the storage pool using the lsmdiskmember <mdisk_id> CLI command for each MDisk found in substep 1f.

      It is likely that the same volumes will be returned for each MDisk.

    9. Record all the properties of each volume listed in substep 1h by using the lsvdisk <vdisk_id> CLI command. For each volume, check whether it has online volume copies, which indicate that the volume is mirrored. Use this information in step 9.
    10. Obtain a list of all the drives in each internal MDisk in the storage pool using the lsdrive -filtervalue mdisk_id=<mdisk_id> CLI command. Use this information in step 8.
    11. Record all the properties of all the MDisks in the storage pool using the lsmdisk <mdisk_id> CLI command. Use this information in step 8.
    12. Record all the properties of the storage pool using the lsmdiskgrp <mdiskgrp_id> CLI command. Use this information in step 7.
    Note: If a listed volume has a mirrored, online, and in-sync copy, you can recover the volume data from that copy. All data on unmirrored volumes is lost and must be restored from a backup.
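    Assembled end to end, the discovery commands in substeps 1a through 1e might look like the following example. The drive ID (5), event sequence number (402), and MDisk ID (2) are hypothetical values; substitute the values reported by your system.
      lsdrive
      lsevent 402
      lsdrive 5
      lsmdisk 2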
  2. Delete the storage pool.
    rmmdiskgrp -force <mdiskgrp id>

    All MDisks and volume copies in the storage pool are also deleted. If any of the volume copies were the last in-sync copy of a volume, all the copies that are not in sync are also deleted, even if they are not in the storage pool.

  3. Using the drive ID that you recorded in substep 1e, set the use property of the drive to unused using the chdrive command.
    chdrive -use unused <id of offline drive>

    The drive is removed from the drive listing.

  4. Replace or remove the drive. For the physical procedures, see "Replacing a SAN Volume Controller 2145-CG8 flash drive" or "Removing a SAN Volume Controller 2145-CG8 flash drive".
  5. A new drive object is created with the use attribute set to unused. This action might take several minutes.

    Obtain the ID of the new drive using the lsdrive CLI command.

  6. Change the use property for the new drive to candidate.
    chdrive -use candidate <drive id of new drive>
  7. Create a new storage pool with the same properties as the deleted storage pool. Use the properties that you recorded in substep 1l.
    mkmdiskgrp -name <mdiskgrp name as before> -ext <extent size as before>
  8. Re-create all MDisks that were previously in the storage pool, using the information from substeps 1j and 1k.
    • For internal RAID 0 MDisks, use this command:
      mkarray -level raid0 -drive <list of drive IDs> -name <mdisk_name> <mdiskgrp id or name>

      The -name <mdisk_name> parameter is optional; use it to give the new array the same MDisk name as the old array.

    • For external MDisks, use the addmdisk CLI command.
    • For internal non-RAID 0 MDisks, use the mkarray CLI command with the appropriate -level setting.
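    For example, an external MDisk can be returned to the new storage pool with a command of the following form, where the MDisk name and storage pool name are placeholders for the values that you recorded earlier:
      addmdisk -mdisk <mdisk name> <mdiskgrp id or name>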
  9. For all the volumes that had online, in sync, mirrored volume copies before the MDisk group was deleted, add a new volume copy in the new storage pool to restore redundancy using the following command:
    addvdiskcopy -mdiskgrp <mdiskgrp id> -vtype striped -easytier <on or off as before> <vdisk_id>
  10. For any volumes that did not have an online, in sync, mirrored copy, create the volume again and restore the data from a backup or use other methods.
  11. Mark the drive error as fixed using the error sequence number from substep 1b.
    cherrstate -sequencenumber <error_sequence_number>
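
  As an optional check, you can rerun the listing commands from earlier in this procedure to confirm the recovery. The expected states that follow assume that the replacement and re-creation steps completed successfully:
    lsdrive
    lsmdisk
    lsvdisk

  The replacement drive should now show a use of member, the re-created array MDisks should be online, and any added volume copies should synchronize over time.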