Troubleshooting

This section can help you detect and solve problems that you might encounter when using the IBM block storage CSI driver.

Checking logs

You can use the CSI (Container Storage Interface) driver logs for problem identification. To collect and display logs related to the different components of the IBM block storage CSI driver, use the following Kubernetes commands:
Log collection for CSI pods, DaemonSet, and StatefulSet
$> kubectl get all -n kube-system -l csi
For example:
$> kubectl get all -n kube-system -l csi
NAME READY STATUS RESTARTS AGE
pod/ibm-block-csi-controller-0 4/4 Running 0 2h
pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
pod/ibm-block-csi-node-wd5tm 3/3 Running 0 2h
pod/ibm-block-csi-operator-7684549698-hzmfh 1/1 Running 0 2h

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/ibm-block-csi-node 2 2 2 2 2 <none> 2h

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/ibm-block-csi-operator 1 1 1 1 2h

NAME DESIRED CURRENT READY AGE
replicaset.apps/ibm-block-csi-operator-7684549698 1 1 1 2h

NAME DESIRED CURRENT AGE
statefulset.apps/ibm-block-csi-controller 1 1 2h
Log collection for IBM block storage CSI driver controller
$> kubectl logs -f -n kube-system ibm-block-csi-controller-0 -c ibm-block-csi-controller
Log collection for IBM block storage CSI driver node (per worker node, where <PODID> identifies the specific node pod)
$> kubectl logs -f -n kube-system ibm-block-csi-node-<PODID> -c ibm-block-csi-node
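Because the node command must be repeated once per worker-node pod, the pod names can be pulled out of the `kubectl get all` output shown earlier. The following is a small sketch; the `extract_node_pods` helper name is ours, not part of the driver:

```shell
# extract_node_pods: print the names of the ibm-block-csi-node pods
# found in "kubectl get all" output (hypothetical helper).
extract_node_pods() {
  awk '/^pod\/ibm-block-csi-node-/ { sub(/^pod\//, "", $1); print $1 }'
}

# On a live cluster you could then collect every node pod's log with:
#   kubectl get all -n kube-system -l csi | extract_node_pods |
#     while read -r pod; do
#       kubectl logs -n kube-system "$pod" -c ibm-block-csi-node
#     done
```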

Detecting errors

Use this procedure to pinpoint the potential cause of a stateful pod failure. The table at the end of the procedure describes the problems and provides possible corrective actions.
  1. Verify that the CSI driver is running. (Make sure the csi-controller pod status is Running).
    $> kubectl get all -n kube-system -l csi
    NAME READY STATUS RESTARTS AGE
    pod/ibm-block-csi-controller-0 4/4 Running 0 2h
    pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
    pod/ibm-block-csi-node-wd5tm 3/3 Running 0 2h
    pod/ibm-block-csi-operator-7684549698-hzmfh 1/1 Running 0 2h
    
    NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
    daemonset.apps/ibm-block-csi-node 2 2 2 2 2 <none> 2h
    
    NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
    deployment.apps/ibm-block-csi-operator 1 1 1 1 2h
    
    NAME DESIRED CURRENT READY AGE
    replicaset.apps/ibm-block-csi-operator-7684549698 1 1 1 2h
    
    NAME DESIRED CURRENT AGE
    statefulset.apps/ibm-block-csi-controller 1 1 2h
  2. If pod/ibm-block-csi-controller-0 is not in a Running state, run the following command:
    $> kubectl describe -n kube-system pod/ibm-block-csi-controller-0

    View the logs.
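The check in step 1 can also be scripted. As a sketch (the `find_unhealthy_pods` helper name is ours), the following filters the `kubectl get` output down to pods that are not Running or whose READY count is not full:

```shell
# find_unhealthy_pods: print CSI pods that are not fully Ready/Running
# (hypothetical helper; reads "kubectl get all" output on stdin).
find_unhealthy_pods() {
  awk '/^pod\// {
    split($2, r, "/")                      # READY column, e.g. "3/4"
    if ($3 != "Running" || r[1] != r[2])   # STATUS column, ready count
      print $1
  }'
}

# Live usage (requires kubectl):
#   kubectl get all -n kube-system -l csi | find_unhealthy_pods
```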

Miscellaneous troubleshooting

General troubleshooting
Use the following command for general troubleshooting:
$> kubectl get -n kube-system csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
For example:
$> kubectl get -n kube-system csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
csidriver.storage.k8s.io/ibm-block-csi-driver 7d

serviceaccount/ibm-block-csi-controller-sa 1 2h
serviceaccount/ibm-block-csi-node-sa 1 2h
serviceaccount/ibm-block-csi-operator 1 2h

clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-operator 2h

clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-operator 2h


statefulset.apps/ibm-block-csi-controller 1 1 2h
pod/ibm-block-csi-controller-0 4/4 Running 0 2h
pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
pod/ibm-block-csi-node-wd5tm 3/3 Running 0 2h
pod/ibm-block-csi-operator-7684549698-hzmfh 1/1 Running 0 2h

daemonset.extensions/ibm-block-csi-node 2 2 2 2 2 <none> 2h
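A quick sanity check on the output above is to confirm that every expected object kind appears. The following sketch does that with plain substring matching; the `check_csi_objects` helper name and the expected-kind list are ours:

```shell
# check_csi_objects: confirm that each expected CSI object kind appears
# in "kubectl get ... | grep ibm-block-csi" output (hypothetical helper).
check_csi_objects() {
  input="$(cat)"
  for kind in 'csidriver.' 'serviceaccount/' 'clusterrole.rbac' \
              'clusterrolebinding.rbac' 'statefulset.' 'pod/' 'daemonset.'; do
    case "$input" in
      *"$kind"*) ;;                        # kind found, keep checking
      *) echo "missing: $kind"; return 1 ;;
    esac
  done
  echo "all expected object kinds present"
}

# Live usage (requires kubectl):
#   kubectl get -n kube-system csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset \
#     | grep ibm-block-csi | check_csi_objects
```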
Error during pod creation
If the following error occurs during stateful application pod creation (the pod status is ContainerCreating):
    -8e73-005056a49b44" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/dm-26 but could not correct them: fsck from util-linux 2.23.2
    /dev/mapper/mpathym: One or more block group descriptor checksums are invalid. FIXED.
    /dev/mapper/mpathym: Group descriptor 0 checksum is 0x0000, should be 0x3baa.

    /dev/mapper/mpathym: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    (i.e., without -a or -p options)
  1. Log in to the relevant worker node and run the fsck command to repair the file system manually.
    fsck /dev/dm-<X>

    The pod should come up immediately. If the pod is still in a ContainerCreating state, continue to the next step.

  2. Run the multipath -ll command to check for faulty multipath devices.
    If there are faulty multipath devices:
    1. Restart the multipath daemon, using the systemctl restart multipathd command.
    2. Rescan any iSCSI devices, using the rescan-scsi-bus.sh command.
    3. Restart the multipath daemon again, using the systemctl restart multipathd command.
    The multipath devices should be running properly and the pod should come up immediately.
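The recovery sequence above can be wrapped in a small script. This is a sketch; the `has_faulty_paths` helper name is ours, and it assumes the path state string "faulty" that multipath -ll reports for failed paths:

```shell
# has_faulty_paths: succeed if `multipath -ll` output on stdin reports
# any path in the "faulty" state (hypothetical helper).
has_faulty_paths() {
  grep -q 'faulty'
}

# Recovery sequence from the steps above (run as root on the worker node):
#   if multipath -ll | has_faulty_paths; then
#     systemctl restart multipathd       # step 1: restart the daemon
#     rescan-scsi-bus.sh                 # step 2: rescan iSCSI devices
#     systemctl restart multipathd       # step 3: restart the daemon again
#   fi
```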