System high availability

The system provides several features that can be used to deploy a high-availability storage configuration with no single point of failure.

Volume mirroring provides protection against a storage system failure by mirroring data across storage systems. For example, for disaster recovery, the Metro Mirror and Global Mirror features can be used to mirror data between systems at different physical locations. In addition, in the HyperSwap® system topology, a volume can be active on two I/O groups at different sites. If one site becomes unavailable, the volume remains immediately accessible through the other site.
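The HyperSwap idea above can be illustrated with a minimal Python sketch. This is a conceptual model only, not the product API: the `Site` and `MirroredVolume` classes are hypothetical, and the sketch shows only the core behavior that writes go to both copies so a surviving site can serve reads after an outage.

```python
# Conceptual sketch (not the product API): a HyperSwap-style volume kept
# active at two sites, with writes applied to both copies.
class Site:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}

class MirroredVolume:
    def __init__(self, site_a, site_b):
        self.sites = [site_a, site_b]

    def write(self, key, value):
        # A write is applied to every available copy.
        for site in self.sites:
            if site.available:
                site.data[key] = value

    def read(self, key):
        # If one site is unavailable, the volume is still accessible
        # through the surviving site.
        for site in self.sites:
            if site.available:
                return site.data[key]
        raise IOError("no site available")

a, b = Site("A"), Site("B")
vol = MirroredVolume(a, b)
vol.write("block0", "payload")
a.available = False          # site A outage
print(vol.read("block0"))    # still served from site B: "payload"
```

In the real topology the two copies are kept synchronized by the system; the sketch simply shows why access continues when one site fails.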

Each I/O group within a high availability system consists of a pair of nodes. If a node fails within an I/O group, the other node in the I/O group assumes the I/O responsibilities of the failed node.
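The node-pair behavior can be sketched as follows. The class names are hypothetical and stand in for the system's internal failover logic: an I/O group holds two nodes, and I/O is served by whichever node in the pair remains online.

```python
# Conceptual sketch (not the product CLI): an I/O group is a pair of
# nodes; if one node fails, its partner assumes the I/O workload.
class Node:
    def __init__(self, name):
        self.name = name
        self.online = True

class IOGroup:
    def __init__(self, node1, node2):
        self.nodes = (node1, node2)

    def serving_node(self):
        # Return an online node in the pair, or None if both failed.
        for node in self.nodes:
            if node.online:
                return node
        return None

n1, n2 = Node("node1"), Node("node2")
iogrp = IOGroup(n1, n2)
n1.online = False                 # node1 fails
print(iogrp.serving_node().name)  # partner node2 handles I/O: "node2"
```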

If the node contains flash drives, the connection from a node to its flash drives can be a single point of failure if the node has an outage. Use RAID 10 or RAID 1 to remove this single point of failure. If the nodes are connected to an expansion enclosure that contains flash drives, that expansion enclosure can be a single point of failure unless the volumes on those flash drives have volume copies on other expansion enclosures or other external storage systems.

If a system of nodes is split into two partitions (for example, because of a SAN fabric fault), the partition with the most nodes continues to process I/O operations. If the system is split into two equal-sized partitions, a quorum disk is accessed to determine which half of the system continues to read and write data.
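The tie-break rule can be summarized in a short sketch. The function is hypothetical; it models only the decision described above: the majority partition survives, and on an exact tie the partition that wins access to the quorum disk continues.

```python
# Conceptual sketch: after a SAN partition, the side with the majority
# of nodes continues; an exact tie is broken by the quorum disk.
def surviving_partition(part_a, part_b, quorum_holder):
    """part_a and part_b are lists of node names; quorum_holder is the
    partition that wins the race to the quorum disk on a tie."""
    if len(part_a) > len(part_b):
        return part_a
    if len(part_b) > len(part_a):
        return part_b
    return quorum_holder  # equal split: quorum disk decides

# Uneven split: the majority side continues.
print(surviving_partition(["n1", "n2", "n3"], ["n4"], None))
# Even split: the partition that reaches the quorum disk first continues.
print(surviving_partition(["n1", "n2"], ["n3", "n4"], ["n1", "n2"]))
```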

For example, when you use Fibre Channel SAN-attached hosts, attach the node canisters to at least two SAN fabrics, and attach each host system to both fabrics.

Each node has four Fibre Channel ports, which can be used to attach the node to multiple SAN fabrics. For high availability, attach each node in a system to at least two fabrics. The system software incorporates multipathing software to communicate with the nodes. This software is also used for I/O operations between the nodes and storage systems. If a SAN fabric fault disrupts communication or I/O operations, the multipathing software recovers and retries the operation through an alternative communication path. For high availability, configure your Fibre Channel host systems to use multipathing software so that, if a SAN fabric fault or node failure occurs, I/O operations between the host systems and the nodes are retried. Subsystem Device Driver (SDD) multipathing software is available from IBM® at no additional charge for use with the system. For more information about Subsystem Device Driver (SDD), go to the Support for IBM Systems website and enter the product name in the Product Finder field.
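The retry behavior can be sketched conceptually. This is not SDD itself; the `Path` class and `issue_io` function are hypothetical illustrations of the general multipathing pattern: try a path, and on a fault fall back to an alternative path through the second fabric.

```python
# Conceptual sketch (not SDD): multipathing retries a failed I/O
# operation on an alternative path through a second fabric.
class Path:
    def __init__(self, fabric, healthy=True):
        self.fabric = fabric
        self.healthy = healthy

    def send(self, op):
        if not self.healthy:
            raise IOError(f"fabric {self.fabric} fault")
        return f"{op} completed via fabric {self.fabric}"

def issue_io(paths, op):
    # Try each path in turn; fail only if every path is down.
    last_error = None
    for path in paths:
        try:
            return path.send(op)
        except IOError as err:
            last_error = err  # recover, then retry on the next path
    raise last_error

paths = [Path("A", healthy=False), Path("B")]  # fabric A has a fault
print(issue_io(paths, "write"))                # retried via fabric B
```

This is why attaching each node and each host to two fabrics matters: with only one fabric, there is no alternative path to retry through.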

ibm.co/U06Fbb

iSCSI-attached hosts connect to the system through node Ethernet ports. If a node fails, the system fails over the IP addresses to the partner node in the I/O group to maintain access to the volumes.
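The IP failover step can be sketched as follows. The class is hypothetical and models only the behavior described above: when a node fails, its Ethernet IP addresses move to the partner node in the I/O group, so hosts continue to reach the same target addresses.

```python
# Conceptual sketch: on node failure, the failed node's iSCSI IP
# addresses are failed over to the partner node in the I/O group.
class IOGroupPorts:
    def __init__(self, node_ips):
        # node_ips maps a node name to its list of Ethernet IPs.
        self.node_ips = dict(node_ips)

    def fail_node(self, failed, partner):
        # The partner node takes over the failed node's IP addresses.
        self.node_ips[partner].extend(self.node_ips.pop(failed))

ports = IOGroupPorts({"node1": ["192.0.2.10"], "node2": ["192.0.2.11"]})
ports.fail_node("node1", "node2")
print(ports.node_ips["node2"])  # both IPs now answered by node2
```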