Troubleshooting the IBM Storage Enabler for Containers
This section can help you detect and solve problems that you might encounter when using the IBM Storage Enabler for Containers.
Checking logs
You can use the IBM Storage Enabler for Containers logs for problem identification. To collect
and display logs related to the different components of IBM Storage Enabler for Containers, use the
following commands:
- Log collection – ./ubiquity_cli.sh -a collect_logs. The logs are kept in a folder named ./ubiquity_collect_logs_MM-DD-YYYY-h:m:s, which is placed in the directory from which the log collection command was run.
- IBM Storage Enabler for Containers – $> kubectl logs -n ubiquity deploy/ubiquity.
- IBM Storage Enabler for Containers database – $> kubectl logs -n ubiquity deploy/ubiquity-db.
- IBM Storage Kubernetes Dynamic Provisioner – $> kubectl logs -n ubiquity deploy/ubiquity-k8s-provisioner.
- IBM Storage Kubernetes FlexVolume for a pod – $> kubectl logs -n ubiquity ubiquity-k8s-flex-<pod_ID>. In addition, events for all pods on a specific Kubernetes node are recorded in the ubiquity-k8s-flex.log file. You can view this file in its default directory, /var/log. You can change this directory by configuring the FLEX-LOG-DIR parameter in the ubiquity-configmap.yml file, as detailed in Updating the Enabler for Containers configuration files.
- Controller-manager:
- Static pod – kubectl get pods -n kube-system to display the master pod name. Then, kubectl logs -n kube-system pod_name to check the logs.
- Non-static pod – journalctl to display the system journal. Then, search for the lines that have controller-manager entries.
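The per-component log commands above can be wrapped in a small helper script. The following is a minimal sketch, assuming the ubiquity namespace and the deployment names listed above; the output directory name is this sketch's own convention, not the format used by ubiquity_cli.sh:

```shell
#!/bin/sh
# Sketch: dump logs for each IBM Storage Enabler for Containers deployment
# into a timestamped directory (directory name format is this sketch's own).
OUT_DIR="ubiquity_logs_$(date +%m-%d-%Y-%H:%M:%S)"
mkdir -p "$OUT_DIR"

for d in ubiquity ubiquity-db ubiquity-k8s-provisioner; do
    # One file per deployment; error output is captured in the same file.
    kubectl logs -n ubiquity "deploy/$d" > "$OUT_DIR/$d.log" 2>&1
done

echo "$OUT_DIR"
```

FlexVolume pods are per-node DaemonSet pods rather than a deployment, so they still need to be collected individually with kubectl logs -n ubiquity ubiquity-k8s-flex-<pod_ID>.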
Detecting errors
This is an overview of actions that you can take to pinpoint a potential cause for a stateful pod failure. The table at the end of the procedure describes the problems and provides possible corrective actions.
- Run the ubiquity_cli.sh -a status_wide command to check that:
- All Kubernetes pods are in Running state.
- All PVCs are in Bound state.
- ubiquity-k8s-flex pod exists on each node in the cluster. For example, if you have three master nodes and five worker nodes, you must see eight ubiquity-k8s-flex pods.
Note: The output of ubiquity_cli.sh -a status_wide is similar to the ./ubiquity_cli.sh -a status output, illustrated in the What to do next section of Performing installation of IBM Storage Enabler for Containers.
- If you find no errors, but are still unable to create or delete pods with PVCs, continue to the next step.
- Display the malfunctioning stateful pod ($> kubectl describe pod pod_ID). Usually, the pod description contains information about the possible cause of the failure. Then, proceed with reviewing the IBM Storage Enabler for Containers logs.
- Display the IBM Storage Kubernetes FlexVolume log for the active master node (the node that the
controller-manager is running on). Use the $> kubectl logs -n ubiquity
ubiquity-k8s-flex-<pod_ID_running_on_master_node> command. As the controller-manager
triggers the storage system volume mapping, the log displays details of the FlexVolume attach or
detach operations.
Additional information can be obtained from the controller-manager log as well.
- Review the IBM Storage Kubernetes FlexVolume log for the worker node, on which the container pod
is scheduled. Use the $> kubectl logs -n ubiquity
ubiquity-k8s-flex-<pod_ID_running_on_worker_node> command. As the kubelet
service on the worker node triggers the FlexVolume mount and umount operations, the log is expected
to display the complete volume mounting flow.
Additional information can be obtained from the kubelet service as well, using the $> journalctl -u kubelet command.
- Display the IBM Storage Enabler for Containers server log ($> kubectl logs -n ubiquity deploy/ubiquity) or its database log ($> kubectl logs -n ubiquity deploy/ubiquity-db) to check for possible failures.
- Display the IBM Storage Dynamic Provisioner log ($> kubectl logs -n ubiquity
deploy/ubiquity-k8s-provisioner) to identify any problem related to volume provisioning.
Note: In general, you can use a request ID of a log entry to identify a particular event in the IBM Storage Enabler for Containers log. This will help you understand if the event is related to FlexVolume or Dynamic Provisioner.
For example, if this event is stored in the Provisioner log: INFO provision.go:141 volume::Provision [9a48c08c-6e3e-11e8-b510-a2547d4dae22-Provision] PVC with capacity 1073741824, the string 9a48c08c-6e3e-11e8-b510-a2547d4dae22 serves as a request ID. It identifies the event as related to volume provisioning. Then, in the Enabler for Containers log, you can easily detect all entries with the same request ID, identifying them as relevant to volume provisioning.
The same identification method can be applied to events originating from the FlexVolume log.
- View the Spectrum Connect log (hsgsrv.log) for a list of additional events related to the storage system and volume operations.
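The request-ID correlation described above can be scripted. The following sketch extracts the ID from a Provisioner log line (the example entry from the text) and shows, as a comment, how it could then be searched for in the server log; the sed pattern is this sketch's own assumption about the bracketed [<request_ID>-Provision] format:

```shell
#!/bin/sh
# Sketch: pull the request ID out of a Provisioner log entry, then use it
# to find matching entries in the Enabler for Containers server log.
LINE='INFO provision.go:141 volume::Provision [9a48c08c-6e3e-11e8-b510-a2547d4dae22-Provision] PVC with capacity 1073741824'

# Capture the hex-and-dash ID that precedes the "-Provision" suffix.
REQ_ID=$(echo "$LINE" | sed -n 's/.*\[\([0-9a-f-]*\)-Provision].*/\1/p')
echo "$REQ_ID"

# Then correlate, for example:
#   kubectl logs -n ubiquity deploy/ubiquity | grep "$REQ_ID"
```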
| Description | Corrective action |
|---|---|
| IBM Storage Kubernetes FlexVolume log for the active master node has no attach operations | Verify that: |
| IBM Storage Kubernetes FlexVolume log for the worker node that runs the pod has no new entries, except for ubiquitytest (Kubernetes 1.6 or 1.7 only) | Restart the kubelet on Kubernetes worker and master nodes. See Performing installation of IBM Storage Enabler for Containers. |
| IBM Storage Kubernetes FlexVolume log for the worker node that runs the pod contains errors related to WWN identification in the multipath -ll output | Check that: |
| No connectivity between the FlexVolume pod and the IBM Storage Enabler for Containers server | Log in to the node and run the FlexVolume in test mode ($> /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ibm~ubiquity-k8s-flex/ubiquity-k8s-flex testubiquity). If there is an error, make sure that the IP of the ubiquity service is the same as configured in the ubiquity-configmap.yml file. If not, configure the IP properly, then delete the FlexVolume DaemonSet and re-create it to apply the new address value. |
| Failure to mount a storage volume to a Kubernetes node | If the FlexVolume fails to locate a WWPN within multipath devices, verify your multipathing configuration and connectivity to a storage system. See Compatibility and requirements for IBM Storage Enabler for Containers. |
| IBM Storage Enabler for Containers database fails to achieve the Running status after the configured timeout expires | |
| IBM Storage Enabler for Containers database persists in the Creating status. In addition, the "Volume has not been added to the list of VolumesInUse in the node's volume status" message is stored in the /var/log/message file on the node where the database is deployed. | To resolve this, recreate the controller-manager pod by moving kube-controller-manager.yaml out of /etc/kubernetes/manifests/ and then back into it: |
| Persistent volume remains in the Delete state, failing to release | Review the Provisioner log ($> kubectl logs -n ubiquity deploy/ubiquity-k8s-provisioner) to identify the reason for the deletion failure. Use the $> kubectl delete command to delete the volume. Then, contact the storage administrator to remove the persistent volume on the storage system itself. |
| Communication link between IBM Storage Dynamic Provisioner and other solution elements fails due to Provisioner token expiration | IBM Storage Dynamic Provisioner uses a token that in some environments has an expiration time, for example, twelve hours. To keep the link alive for an unlimited time, you can use a service-account token without an expiration time. Replace the current token with the service-account token, as follows: |
| A pod creation fails and the following error is stored in the FlexVolume log of the node intended for the pod: DEBUG 4908 executor.go:63 utils::Execute Command executed with args and error and output. [[{command=iscsiadm} {args=[-m session --rescan]} {error=iscsiadm: No session found.} {output=}]]" | Verify that the node has iSCSI connectivity to the storage system. If the node has none, see the Compatibility and requirements for IBM Storage Enabler for Containers section for instructions on how to discover and log into iSCSI targets on the storage system. |
| Status of a stateful pod on a malfunctioned (crashed) node is Unknown | Manually recover the crashed node, as described in the Recovering a crashed Kubernetes node section. |
| A pod becomes unresponsive, persisting in the ContainerCreating status. The "error=command [mount] execution failure [exit status 32]" error is stored in the FlexVolume log of the node where the pod was scheduled. The failure occurs because the mountPoint already exists on that node. This might happen due to an earlier invalid pod deletion. | Manually recover the pod, using the following procedure: |
| A pod becomes unresponsive, persisting in the ContainerCreating status. An error indicating a failure to discover a new volume WWN, while running the multipath -ll command, is stored in the FlexVolume log. This log belongs to the node, where the pod was scheduled. | Restart the multipathd service by running the service multipathd restart command on the worker node, where the pod was scheduled. |
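Several of the checks above start from the expectation that the ubiquity-k8s-flex DaemonSet places one pod per cluster node. The following is a minimal sketch of that arithmetic, using the example node counts from the procedure; the commented grep pattern for the live comparison is an assumption:

```shell
#!/bin/sh
# Sketch: the ubiquity-k8s-flex DaemonSet should place one pod per node,
# so the expected pod count is masters + workers (example values from the text).
MASTERS=3
WORKERS=5
EXPECTED_FLEX_PODS=$((MASTERS + WORKERS))
echo "expected flex pods: $EXPECTED_FLEX_PODS"

# Compare against the live count, for example:
#   kubectl get pods -n ubiquity | grep -c ubiquity-k8s-flex
```

If the live count is lower than the node count, check for nodes where the DaemonSet pod failed to schedule before digging into per-pod logs.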