Miscellaneous troubleshooting
Use this information to help pinpoint potential causes for stateful pod failure.
Note: These procedures apply to both Kubernetes and Red Hat OpenShift. For Red Hat OpenShift, replace kubectl with oc in all relevant commands.
- General troubleshooting
- Error during pod creation (for volumes using StatefulSet only)
- Error during automatic iSCSI login
- General troubleshooting
- Use the following command for general troubleshooting:
$> kubectl get -n <namespace> csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
For example:
$> kubectl get -n csi-ns csidriver,sa,clusterrole,clusterrolebinding,statefulset,pod,daemonset | grep ibm-block-csi
csidriver.storage.k8s.io/ibm-block-csi-driver 7d
serviceaccount/ibm-block-csi-controller-sa 1 2h
serviceaccount/ibm-block-csi-node-sa 1 2h
serviceaccount/ibm-block-csi-operator 1 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrole 2h
clusterrole.rbac.authorization.k8s.io/ibm-block-csi-operator 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-attacher-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-external-provisioner-clusterrolebinding 2h
clusterrolebinding.rbac.authorization.k8s.io/ibm-block-csi-operator 2h
statefulset.apps/ibm-block-csi-controller 1 1 2h
pod/ibm-block-csi-controller-0 4/4 Running 0 2h
pod/ibm-block-csi-node-nbtsg 3/3 Running 0 2h
pod/ibm-block-csi-node-wd5tm 3/3 Running 0 2h
pod/ibm-block-csi-operator-7684549698-hzmfh 1/1 Running 0 2h
daemonset.extensions/ibm-block-csi-node 2 2 2 2 2 <none> 2h
- Error during pod creation
- If the following error occurs during stateful application pod creation (the pod status is ContainerCreating):
Note: This troubleshooting procedure is relevant for volumes using file system types only (not for volumes using raw block volume types).
-8e73-005056a49b44" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/dm-26 but could not correct them: fsck from util-linux 2.23.2
/dev/mapper/mpathym: One or more block group descriptor checksums are invalid. FIXED.
/dev/mapper/mpathym: Group descriptor 0 checksum is 0x0000, should be 0x3baa.
/dev/mapper/mpathym: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
- Log in to the relevant worker node and run the fsck command to repair the filesystem manually:
fsck /dev/dm-<X>
The pod should come up immediately. If the pod is still in a ContainerCreating state, continue to the next step.
- Run the multipath -ll command to see if there are faulty multipath devices.
If there are faulty multipath devices:
  - Restart the multipath daemon, using the systemctl restart multipathd command.
  - Rescan any iSCSI devices, using the rescan-scsi-bus.sh command.
  - Restart the multipath daemon again, using the systemctl restart multipathd command.
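The multipath recovery steps above can be sketched as a small script. This is an illustration only, not part of the product: the function names count_faulty_paths and recover_multipath are hypothetical, and matching the words "failed" or "faulty" assumes the usual path-state columns in multipath -ll output. Run it as root on the affected worker node.

```shell
#!/bin/sh
# Count the lines of `multipath -ll` style output (read from stdin) that
# report a failed or faulty path. Assumption: path states appear as the
# words "failed" or "faulty" on the path lines.
count_faulty_paths() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so force a zero exit status.
    grep -c -E 'failed|faulty' || true
}

# Recovery sequence from the steps above.
recover_multipath() {
    systemctl restart multipathd
    rescan-scsi-bus.sh
    systemctl restart multipathd
}

# Act only when called with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    faulty=$(multipath -ll | count_faulty_paths)
    if [ "$faulty" -gt 0 ]; then
        echo "Found $faulty faulty path line(s); restarting multipathd"
        recover_multipath
    else
        echo "No faulty multipath paths found"
    fi
fi
```

Saved under a name of your choice, the script does nothing unless it is started with the --run argument; in that case it restarts the daemon only when at least one faulty path line is found.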
- Error during automatic iSCSI login
- If an error during automatic iSCSI login occurs, perform the following steps
for manual login:
Note: This procedure is applicable for both RHEL and RHCOS users. When using RHCOS, use the following:
- Log in to the RHCOS node with the core user (for example, ssh core@worker1.apps.openshift.mycluster.net).
- iscsiadm commands must start with sudo.
- Verify that node.startup in the /etc/iscsi/iscsid.conf file is set to automatic. If it is not, set it to automatic and then restart the iscsid service ($ service iscsid restart).
- Discover and log in to at least two iSCSI targets on the relevant storage systems.
Note: A multipath device can't be created without at least two ports.
$> iscsiadm -m discoverydb -t st -p ${STORAGE-SYSTEM-iSCSI-PORT-IP1}:3260 --discover
$> iscsiadm -m node -p ${STORAGE-SYSTEM-iSCSI-PORT-IP1} --login
$> iscsiadm -m discoverydb -t st -p ${STORAGE-SYSTEM-iSCSI-PORT-IP2}:3260 --discover
$> iscsiadm -m node -p ${STORAGE-SYSTEM-iSCSI-PORT-IP2} --login
- Verify that the login was successful and display all targets that you logged in to. The portal value must be the iSCSI target IP address.
$> iscsiadm -m session --rescan
Rescanning session [sid: 1, target: {storage system IQN}, portal: {storage system iSCSI port IP},{port number}]
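The manual login steps above can be sketched in shell as follows. This is a hedged illustration: the helper names get_node_startup and login_targets are hypothetical, the port 3260 is taken from the commands above, and on RHCOS the iscsiadm calls would need a sudo prefix.

```shell
#!/bin/sh
# Print the value of the last uncommented node.startup assignment in an
# iscsid.conf-style file. Prints nothing if the setting is absent.
get_node_startup() {
    conf_file=$1
    sed -n 's/^[[:space:]]*node\.startup[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf_file" | tail -n 1
}

# Discover and log in to each iSCSI portal IP passed as an argument.
# Pass at least two IPs, because a multipath device needs two ports.
login_targets() {
    for ip in "$@"; do
        iscsiadm -m discoverydb -t st -p "${ip}:3260" --discover
        iscsiadm -m node -p "$ip" --login
    done
}
```

For example, after sourcing the file, get_node_startup /etc/iscsi/iscsid.conf should print automatic on a correctly configured node, and login_targets can then be called with the two storage system iSCSI port IP addresses.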