Quorum disk configuration

A quorum disk is an MDisk or a managed drive that contains a reserved area that is used exclusively for system management. A system automatically assigns quorum disk candidates. When you add new storage to a system or remove existing storage, however, it is a good practice to review the quorum disk assignments.

It is possible for a system to split into two groups where each group contains half the original number of nodes in the system. A quorum device determines which group of nodes stops operating and processing I/O requests. In this tie-break situation, the first group of nodes that accesses the quorum device is marked as the owner of the quorum device and as a result continues to operate as the system, handling all I/O requests. If the other group of nodes cannot access the quorum device or finds that the quorum device is owned by another group of nodes, it stops operating as the system and does not handle I/O requests.

A system can have only one active quorum device that is used for a tie-break situation. However, the system uses up to three quorum devices to record a backup of system configuration data to be used in the event of a disaster. The system automatically selects one quorum device to be the active quorum device. The active quorum device can be specified by using the chquorum command-line interface (CLI) command with the active parameter. To view the current quorum device status, use the lsquorum command.
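As a sketch of how these commands are used (the quorum indexes, object IDs, and output columns below are illustrative; verify the exact syntax and output for your code level with the CLI help):

```shell
# List the current quorum devices; the "active" column shows which
# device is used for tie-break (IDs and names are illustrative).
lsquorum

# Make the quorum device at quorum index 1 the active quorum device.
chquorum -active 1
```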

The other quorum devices provide redundancy if the active quorum device fails before a system is partitioned. To avoid the possibility of losing all the quorum devices with a single failure, assign quorum disk candidates on multiple storage systems or run IP quorum applications on multiple servers.

Note: Mirrored volumes can be taken offline if no quorum disk is available. The synchronization status for mirrored volumes is recorded on the quorum disk.

In a system with a single control enclosure or without any external managed disks, quorum is automatically assigned to drives. In this scenario, manual configuration of the quorum disks is not required.

In a system with two or more I/O groups, the drives are physically connected to only some of the node canisters. In such a configuration, drives cannot act as tie-break quorum disks; however, they can still be used to back up metadata.

If suitable external MDisks are available, they are automatically used as quorum disks that support tie-break situations.

If no suitable external MDisks or IP quorum devices exist, the entire system might become unavailable if exactly half the node canisters in the system become inaccessible (such as due to hardware failure or becoming disconnected from the fabric).

In systems with exactly two control enclosures, an uncontrolled shutdown of a control enclosure might lead to the entire system becoming unavailable because two node canisters become inaccessible simultaneously. It is therefore vital that node canisters are shut down in a controlled way when maintenance is required.

If your system contains NVMe drives, these drives are preferred over other drive types for quorum configuration, while nearline drives and SAS-attached flash drives are avoided as quorum disks.

These preferences are not requirements. In configurations where they cannot all be met, quorum is still configured automatically.

It is possible to assign quorum disks to alternative drives by using the chquorum command. However, you cannot move quorum to a drive that would create a less optimal configuration. You can override the dynamic quorum selection by using the override yes option of the chquorum command; this option is not advised, however, unless you are working with your support center.

When you change the managed disks that are assigned as quorum candidate disks, follow these general guidelines:
  • When you use quorum drives, use drives from the control enclosure when possible for the best available performance and connectivity. If no external MDisks are available, at least one drive, and ideally three drives, in the control enclosure must be present with a use other than unused or failed.
  • When you use SAN-attached quorum MDisks, aim to distribute the quorum candidate disks so that each MDisk is provided by a different storage system. For information about which storage systems are supported for quorum disk use, refer to the supported hardware list.
  • Before you issue the chquorum command, ensure the status of the MDisk or drive that is being assigned as a quorum candidate disk is reported as online.
  • Use smaller-capacity MDisks, or use drives, as the quorum devices to significantly reduce the amount of time that might be needed to run a recover system procedure (also known as Tier 3 or T3 recovery), if one becomes necessary.
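The guidelines above can be followed with a short check-then-assign sequence like the following (the MDisk and drive IDs are illustrative; verify the command syntax for your code level with the CLI help):

```shell
# Confirm the candidate is online before assigning it as a quorum
# candidate disk (MDisk ID 3 and drive ID 7 are illustrative).
lsmdisk 3      # check that the status field reports "online"
lsdrive 7      # likewise for a drive candidate

# Then assign the MDisk as the quorum candidate at quorum index 0.
chquorum -mdisk 3 0
```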

Quorum MDisks or drives in HyperSwap system configurations

To provide protection against failures that affect an entire location (for example, a power failure), you can use active-active relationships with a configuration that splits a single system between two physical locations. For more information, see HyperSwap® configuration details. For detailed guidance about HyperSwap system configuration for high-availability purposes, contact your IBM® regional advanced technical specialist.

If you configure a HyperSwap system, the system automatically selects quorum disks that are placed in each of the three sites.

The following scenarios describe examples that result in changes to the active quorum disk:
  • Scenario 1:
    1. Site 3 is either powered off or connectivity to the site is broken.
    2. If the topology is standard, the system selects a quorum disk candidate at site 2 to become the active quorum disk. If the topology is HyperSwap, the system operates without an active quorum disk.
    3. Site 3 is either powered on or connectivity to the site is restored.
    4. Assuming that the system was correctly configured initially, the system automatically recovers the configuration when the power is restored.
  • Scenario 2:
    1. The storage system that is hosting the preferred quorum disk at site 3 is removed from the configuration.
    2. If possible, the system automatically configures a new quorum disk candidate.
    3. In a HyperSwap topology, the system selects a new quorum disk only from site 3. In a standard topology, the system selects a quorum disk candidate at site 1 or site 2 to become the active quorum disk.
    4. A new storage system is added to site 3.
    5. In a standard topology, the administrator must reassign all three quorum disks to ensure that the active quorum disk is now at site 3 again. In HyperSwap topology, the system automatically assigns the new active quorum disk when the storage system is installed and the site setting is configured.