Troubleshooting
This section can help you detect and solve problems that you might encounter when using the IBM block storage CSI driver.
Checking logs
You can use the CSI (Container Storage Interface) driver logs for problem
identification. To collect and display logs related to the different components of the IBM block storage CSI driver, use the following Kubernetes commands:
- Log collection for CSI pods, DaemonSet, and StatefulSet

  For example:

  $> kubectl get all -n kube-system -l csi
  NAME                                          READY   STATUS    RESTARTS   AGE
  pod/ibm-block-csi-controller-0                4/4     Running   0          2h
  pod/ibm-block-csi-node-nbtsg                  3/3     Running   0          2h
  pod/ibm-block-csi-node-wd5tm                  3/3     Running   0          2h
  pod/ibm-block-csi-operator-7684549698-hzmfh   1/1     Running   0          2h

  NAME                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
  daemonset.apps/ibm-block-csi-node   2         2         2       2            2           <none>          2h

  NAME                                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/ibm-block-csi-operator   1         1         1            1           2h

  NAME                                                DESIRED   CURRENT   READY   AGE
  replicaset.apps/ibm-block-csi-operator-7684549698   1         1         1       2h

  NAME                                        DESIRED   CURRENT   AGE
  statefulset.apps/ibm-block-csi-controller   1         1         2h

- Log collection for the IBM block storage CSI driver controller

  $> kubectl logs -f -n kube-system ibm-block-csi-controller-0 -c ibm-block-csi-controller

- Log collection for the IBM block storage CSI driver node (per worker node or PODID)

  $> kubectl logs -f -n kube-system ibm-block-csi-node-<PODID> -c ibm-block-csi-node
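Because the per-node log command must be repeated for every node pod, the pod names can first be extracted from the kubectl get all listing. The helper below is a sketch; the function name node_pods is our own convention, not part of the driver:

```shell
# Sketch: extract the per-node CSI driver pod names from "kubectl get all"
# output, stripping the "pod/" resource prefix so the names can be passed
# to "kubectl logs". Reads the listing on stdin.
node_pods() {
  awk '$1 ~ /^pod\/ibm-block-csi-node-/ {sub("^pod/", "", $1); print $1}'
}
```

A possible loop over all node pods, assuming kubectl access: for p in $(kubectl get all -n kube-system -l csi | node_pods); do kubectl logs -n kube-system "$p" -c ibm-block-csi-node > "$p.log"; done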
Detecting errors
This is an overview of actions that you can take to pinpoint a potential cause for a stateful pod failure. The table at the end of the procedure describes the problems and provides possible corrective actions.
- Verify that the CSI driver is running. (Make sure that the csi-controller pod status is Running.)

  $> kubectl get all -n kube-system -l csi
  NAME                                          READY   STATUS    RESTARTS   AGE
  pod/ibm-block-csi-controller-0                4/4     Running   0          2h
  pod/ibm-block-csi-node-nbtsg                  3/3     Running   0          2h
  pod/ibm-block-csi-node-wd5tm                  3/3     Running   0          2h
  pod/ibm-block-csi-operator-7684549698-hzmfh   1/1     Running   0          2h

  NAME                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
  daemonset.apps/ibm-block-csi-node   2         2         2       2            2           <none>          2h

  NAME                                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/ibm-block-csi-operator   1         1         1            1           2h

  NAME                                                DESIRED   CURRENT   READY   AGE
  replicaset.apps/ibm-block-csi-operator-7684549698   1         1         1       2h

  NAME                                        DESIRED   CURRENT   AGE
  statefulset.apps/ibm-block-csi-controller   1         1         2h

- If pod/ibm-block-csi-controller-0 is not in a Running state, run the following command:

  $> kubectl describe -n kube-system pod/ibm-block-csi-controller-0

- View the logs.
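The check above can be scripted by filtering the pod listing for any CSI pod that is not Running. This is a sketch; the function name find_failed_pods is our own convention, not part of the driver:

```shell
# Sketch: print the NAME of any ibm-block-csi pod whose STATUS is not
# "Running". Column 3 of "kubectl get pods" output is STATUS; reads the
# listing on stdin.
find_failed_pods() {
  awk '/ibm-block-csi/ && $3 != "Running" {print $1}'
}
```

A possible invocation, assuming kubectl access: kubectl get pods -n kube-system -l csi | find_failed_pods. Any name it prints is a candidate for the kubectl describe command above.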
Miscellaneous troubleshooting
- General troubleshooting
- Use the following command for general troubleshooting. For example:

  $> kubectl get -n kube-system csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
  csidriver.storage.k8s.io/ibm-block-csi-driver                                                         7d
  serviceaccount/ibm-block-csi-controller-sa                                                        1   2h
  serviceaccount/ibm-block-csi-node-sa                                                              1   2h
  serviceaccount/ibm-block-csi-operator                                                             1   2h
  clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrole                     2h
  clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrole                  2h
  clusterrole.rbac.authorization.k8s.io/ibm-block-csi-operator                                          2h
  clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrolebinding       2h
  clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrolebinding    2h
  clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-operator                                   2h
  statefulset.apps/ibm-block-csi-controller   1     1         2h
  pod/ibm-block-csi-controller-0                4/4   Running   0   2h
  pod/ibm-block-csi-node-nbtsg                  3/3   Running   0   2h
  pod/ibm-block-csi-node-wd5tm                  3/3   Running   0   2h
  pod/ibm-block-csi-operator-7684549698-hzmfh   1/1   Running   0   2h
  daemonset.extensions/ibm-block-csi-node   2   2   2   2   2   <none>   2h

- Error during pod creation
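A quick way to spot a missing component in that listing is to check that every expected object kind appears at least once. The helper below is a sketch; the function name check_csi_objects is our own convention, not part of the driver:

```shell
# Sketch: read the "grep ibm-block-csi" listing on stdin and print any
# expected object kind that does not appear. Each kind starts its line,
# followed by "." or "/" (e.g. "pod/ibm-block-csi-controller-0").
check_csi_objects() {
  out=$(cat)
  for kind in csidriver serviceaccount clusterrole clusterrolebinding statefulset pod daemonset; do
    echo "$out" | grep -q "^$kind[./]" || echo "missing: $kind"
  done
}
```

A possible invocation, assuming kubectl access: kubectl get -n kube-system csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi | check_csi_objects. No output means all expected kinds are present.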
- If the following error occurs during stateful application pod creation (the pod status is ContainerCreating):

  -8e73-005056a49b44" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/dm-26 but could not correct them: fsck from util-linux 2.23.2
  /dev/mapper/mpathym: One or more block group descriptor checksums are invalid.  FIXED.
  /dev/mapper/mpathym: Group descriptor 0 checksum is 0x0000, should be 0x3baa.
  /dev/mapper/mpathym: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
  (i.e., without -a or -p options)

- Log in to the relevant worker node and run the fsck command to repair the filesystem manually.

  fsck /dev/dm-<X>

  The pod should come up immediately. If the pod is still in a ContainerCreating state, continue to the next step.
- Run the multipath -ll command to see if there are faulty multipath devices. If there are faulty multipath devices:
  - Restart the multipath daemon, using the systemctl restart multipathd command.
  - Rescan any iSCSI devices, using the rescan-scsi-bus.sh command.
  - Restart the multipath daemon again, using the systemctl restart multipathd command.
- Log in to the relevant worker node and run the fsck command to repair the
filesystem manually.
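The manual recovery steps above can be collected into a single sequence. The sketch below defines the sequence as a function only; the name repair_csi_volume and its argument are our own convention, it assumes root access on the affected worker node, and the device path must come from the actual fsck error (for example, /dev/dm-26 in the message above):

```shell
# Sketch of the recovery sequence described above. Define only; run
# repair_csi_volume <device> manually on the worker node as root.
repair_csi_volume() {
  dev="$1"
  fsck "$dev"                      # step 1: repair the filesystem manually
  if multipath -ll | grep -q -i faulty; then
    systemctl restart multipathd   # step 2a: restart the multipath daemon
    rescan-scsi-bus.sh             # step 2b: rescan any iSCSI devices
    systemctl restart multipathd   # step 2c: restart multipathd again
    fsck "$dev"                    # step 3: repair the filesystem again
  fi
}
```

If the pod is still in a ContainerCreating state after this sequence, collect the CSI node logs as described in the Checking logs section before opening a support case.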