Miscellaneous troubleshooting

Use this information to help pinpoint potential causes of stateful application pod failures.

General troubleshooting
Use the following command for general troubleshooting:
$> kubectl get -n <namespace> csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
For example:
$> kubectl get -n csi-ns csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
csidriver.storage.k8s.io/ibm-block-csi-driver 7d

serviceaccount/ibm-block-csi-controller-sa 1 2h
serviceaccount/ibm-block-csi-node-sa 1 2h
serviceaccount/ibm-block-csi-operator 1 2h

clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-operator 2h

clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-operator 2h


statefulset.apps/ibm-block-csi-controller 1 1 2h
pod/ibm-block-csi-controller-0 4/4 Running 0 2h
pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
pod/ibm-block-csi-node-wd5tm 3/3 Running 0 2h
pod/ibm-block-csi-operator-7684549698-hzmfh 1/1 Running 0 2h

daemonset.extensions/ibm-block-csi-node 2 2 2 2 2 <none> 2h
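As a quick health check, the pod lines from the output above can be filtered for anything that is not in the Running state. The sketch below runs against invented sample lines (the CrashLoopBackOff entry is hypothetical, added only to show a failure being caught); on a live cluster, pipe the kubectl command shown above into the same awk filter instead.

```shell
# Hypothetical pod lines mimicking the check above; the CrashLoopBackOff entry
# is invented for illustration. Real input comes from the kubectl command shown earlier.
sample='pod/ibm-block-csi-controller-0 4/4 Running 0 2h
pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
pod/ibm-block-csi-node-wd5tm 2/3 CrashLoopBackOff 5 2h'

# Print any CSI pod that is not Running; empty output means all pods are healthy.
unhealthy=$(printf '%s\n' "$sample" | awk '$1 ~ /^pod\// && $3 != "Running" {print $1, $3}')
echo "$unhealthy"
```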
Error during pod creation
If the following error occurs during stateful application pod creation (the pod status is ContainerCreating):
    -8e73-005056a49b44" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/dm-26 but could not correct them: fsck from util-linux 2.23.2
    /dev/mapper/mpathym: One or more block group descriptor checksums are invalid. FIXED.
    /dev/mapper/mpathym: Group descriptor 0 checksum is 0x0000, should be 0x3baa.

    /dev/mapper/mpathym: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    (i.e., without -a or -p options)
  1. Log in to the relevant worker node and run the fsck command to repair the filesystem manually.
    fsck /dev/dm-<X>

    The pod should come up immediately. If the pod is still in a ContainerCreating state, continue to the next step.

  2. Run the multipath -ll command to check whether there are faulty multipath devices.
    If there are faulty multipath devices:
    1. Restart the multipath daemon, using the systemctl restart multipathd command.
    2. Rescan any iSCSI devices, using the rescan-scsi-bus.sh command.
    3. Restart the multipath daemon again, using the systemctl restart multipathd command.
    The multipath devices should be running properly and the pod should come up immediately.
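The faulty-path check in step 2 can be scripted. The sketch below runs against an invented multipath -ll excerpt (the WWID, device, and path names are made up for illustration); on a real worker node, the output of the live multipath -ll command would supply the input instead.

```shell
# Illustrative `multipath -ll` excerpt; the WWID, dm device, and sd devices are
# made up. A real run prints one such block per multipath device on the node.
sample='mpathym (36001738cfc9035eb0000000000012abc) dm-26 IBM,2145
size=10G features="1 queue_if_no_path" hwhandler="0" wp=rw
  |- 3:0:0:1 sdb 8:16 active ready  running
  `- 4:0:0:1 sdc 8:32 failed faulty running'

# Count path lines reported as faulty; a nonzero count means the restart/rescan
# sequence in the steps above is needed before the pod can start.
faulty=$(printf '%s\n' "$sample" | grep -c 'faulty')
echo "$faulty"
```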
Error during automatic iSCSI login
If an error during automatic iSCSI login occurs, perform the following steps for manual login:
  1. Verify that the node.startup parameter in the /etc/iscsi/iscsid.conf file is set to automatic. If it is not, set it as required and then restart the iscsid service ($> service iscsid restart).
  2. Discover and log into at least two iSCSI targets on the relevant storage systems.
    $> iscsiadm -m discoverydb -t st -p ${STORAGE-SYSTEM-iSCSI-PORT-IP1}:3260 --discover
    $> iscsiadm -m node -p ${STORAGE-SYSTEM-iSCSI-PORT-IP1} --login

    $> iscsiadm -m discoverydb -t st -p ${STORAGE-SYSTEM-iSCSI-PORT-IP2}:3260 --discover
    $> iscsiadm -m node -p ${STORAGE-SYSTEM-iSCSI-PORT-IP2} --login
  3. Verify that the login was successful and list all targets that you logged in to. The portal value must be the iSCSI target IP address.
    $> iscsiadm -m session --rescan
    Rescanning session [sid: 1, target: {storage system IQN},
    portal: {storage system iSCSI port IP},{port number}]
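To confirm that both logins succeeded, the active sessions reported by iscsiadm -m session can be counted. The sketch below runs against invented session output (the IQNs and portal IPs are illustrative); on a real node, substitute the output of the live iscsiadm -m session command.

```shell
# Illustrative `iscsiadm -m session` output; the IQNs and portal IPs are made up.
sample='tcp: [1] 192.0.2.10:3260,1 iqn.1986-03.com.ibm:2145.example.node1 (non-flash)
tcp: [2] 192.0.2.11:3260,1 iqn.1986-03.com.ibm:2145.example.node1 (non-flash)'

# Count active iSCSI sessions; after logging in to both targets, at least two are expected.
sessions=$(printf '%s\n' "$sample" | grep -c '^tcp:')
echo "$sessions"
```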