You can create an enhanced stretched system configuration where each node in the system is physically located at a different site. When combined with mirroring technologies, such as volume mirroring or Copy Services, these configurations can maintain access to data on the system in the event of power failures or site-wide outages.
Note: If the objective of your solution design is high availability, it is better to use an IBM® HyperSwap® topology instead of an enhanced stretched system configuration. However, if your objectives include disaster recovery, complex Copy Services, or the highest scalability, consider the restrictions of the current version of HyperSwap. For more information, see Planning for high availability.
This topic describes the enhanced stretched system configuration, in which the topology attribute of the system is set to stretched. Older methods of configuring a stretched system are still supported and are described in previous versions of IBM Knowledge Center. You can move non-disruptively to the current enhanced stretched system configuration, and gain better availability and disaster recovery, by following the configuration steps that are presented here. You can also move non-disruptively from a stretched system configuration to a HyperSwap system configuration for even better availability, performance, and disaster recovery. Contact the IBM Remote Technical Support center for guidance on changing the topology of an existing system.
In a stretched system configuration, each site is defined as an independent failure domain. If one site experiences a failure, the other site can continue to operate without disruption. You must also configure a third site to host a quorum device that provides an automatic tie-break in the event of a link failure between the two main sites. The main sites can be in the same room, across rooms in the data center, in buildings on the same campus, or in buildings in different cities. Different kinds of sites protect against different types of failures.
- Sites within a single location
- If each site is on a different power phase within a single location or data center, the system can survive the failure of any single power domain. For example, one node can be placed in one rack installation and the other node in another rack. Each rack is considered a separate site with its own power phase. In this case, if power is lost to one of the racks, the partner node in the other rack can be configured to process requests and effectively provide access to data even when the other node is offline due to the power disruption.
- Sites at separate locations
- If each site is at a different physical location, the system can survive the failure of any single location. These sites can span shorter distances, for example two sites in the same city, or they can be spread farther apart geographically, such as two sites in separate cities. If one site experiences a site-wide disaster, the remaining site stays available to process requests.
If configured properly, the system continues to operate after the loss of one site. The key prerequisite is that each site contains only one node from each pair of nodes. Simply placing a pair of nodes from the same system in different sites does not provide high availability. You must also configure the appropriate mirroring technology and ensure that all configuration requirements for that technology are met.
Notes: - In SAN Volume Controller 2145-DH8 models, nodes with internal flash drives are not recommended for stretched systems.
- Stretched systems can be used with N_Port ID Virtualization (NPIV). In a site loss, the Fibre Channel failover ports on the remote-site nodes open and present to the fabric the worldwide port names (WWPNs) of the Fibre Channel host ports from the local nodes. NPIV enables hosts to log back in to these ports without requiring rerouting from the multipath driver. In this case, more latency might be introduced by the round-trip data transit time to the ports that are physically at the remote site. A CLI sketch for enabling NPIV follows these notes.
- Stretched system Fibre Channel configurations with active/passive controllers, such as IBM DS5000™, IBM DS4000®, and IBM DS3000 systems, must be configured with sufficient connections such that all sites have direct access to both external storage systems. For iSCSI configurations with two or more active/passive controllers, such as Storwize® family systems, the systems must be configured with sufficient connections such that all sites have direct access to both external storage systems. Quorum access for a stretched system is possible only through the current owner of the MDisk that is being used as the active quorum disk.
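As a reference for the NPIV note above, the following minimal CLI sketch shows one way to check and enable NPIV on an I/O group. The transitional step and the I/O group ID (0) are illustrative; verify the exact procedure for your code level.
  lsiogrp 0                                  # check the fctargetportmode field
  chiogrp -fctargetportmode transitional 0   # bring up NPIV target ports alongside the physical ports
  chiogrp -fctargetportmode enabled 0        # complete the move to NPIV target ports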
You must configure a stretched system to meet the following requirements:
- For Fibre Channel connections, directly connect each node to two or more SAN fabrics at the primary and secondary sites (2 - 8 fabrics are supported). For iSCSI connections, connect each node to two or more Ethernet fabrics at the primary and secondary sites. Sites are defined as independent failure domains. A failure domain is a part of the system within a boundary such that any failure within that boundary (such as a power failure, fire, or flood) is contained within the boundary and does not propagate or affect parts outside of that boundary. Failure domains can be in the same room, across rooms in the data center, in buildings on the same campus, or in buildings in different towns. Different kinds of failure domains protect against different types of faults.
- Use a third site to house a quorum disk or IP quorum
application.
Quorum disks cannot be located on iSCSI-attached storage systems; therefore, iSCSI storage cannot be
configured on a third site.
- If a storage
system is used at the third site, it must support extended quorum disks. More information is
available in the interoperability matrixes that are available at the following website:
www.ibm.com/support
- Place independent storage systems at the primary and secondary sites, and use volume mirroring to mirror the host data between the storage systems at the two sites. Where possible, set the preferred node of each volume to a node in the same site as the host that the volume is mapped to. A CLI sketch for creating such a mirrored volume follows this list.
- Connections can vary based on fibre type and small form-factor pluggable (SFP) transceiver (longwave and shortwave).
- Nodes that are in the same I/O group and separated by more than 100 meters (109 yards) must use longwave Fibre Channel or iSCSI connections. A longwave SFP transceiver can be purchased as an optional component, and must be one of the longwave SFP transceivers listed at the following website:
www.ibm.com/support
- Avoid using inter-switch links (ISLs) in paths between nodes and external storage systems. If ISLs are unavoidable, do not oversubscribe them, because substantial Fibre Channel traffic crosses the ISLs. For most configurations, trunking is required. Because ISL problems are difficult to diagnose, collect and regularly monitor switch-port error statistics to detect failures.
- Using a single switch at the third site can lead to the creation of a single fabric rather than
two independent and redundant fabrics. A single fabric is an unsupported configuration.
- Ethernet port 1 on every node must be connected to the same subnet or subnets. Ethernet port 2
(if used) of every node must be connected to the same subnet (this might be a different subnet from
port 1). The same principle applies to other Ethernet ports.
- Some service actions require physical access to all nodes in a system. If nodes in a stretched
system are separated by more than 100 meters, service actions might require multiple service
personnel. Contact your service representative to inquire about multiple site support.
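The following minimal CLI sketch, referenced from the volume mirroring item in the list above, creates a volume with one copy in a pool at each site and sets the preferred node. The pool, node, and volume names are hypothetical examples.
  mkvdisk -mdiskgrp pool_site1:pool_site2 -size 100 -unit gb -copies 2 -iogrp io_grp0 -node node1 -name vol01
  lsvdisk vol01          # verify that both copies exist and check the preferred node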
A stretched system locates the active quorum disk or an IP quorum application
at a third site. If communication is lost between the primary and secondary sites, the site with
access to the active quorum disk continues to process transactions. If communication is lost to the
active quorum disk, an alternative quorum disk at another site can become the active quorum
disk.
Although a system of nodes can be configured to use up to three quorum disks,
only one quorum disk can be elected to resolve a situation where the system is partitioned into two
sets of nodes of equal size. The purpose of the other quorum disks is to provide redundancy if a
quorum disk fails before the system is partitioned.
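The following sketch shows how quorum assignments might be inspected and adjusted from the CLI; the MDisk ID and quorum index are examples, so confirm the syntax for your release.
  lsquorum               # list the quorum candidates and identify the active one
  chquorum -mdisk 5 2    # assign MDisk 5 (at the third site) as quorum candidate 2
  chquorum -active 2     # make candidate 2 the active quorum disk
If you use an IP quorum application instead of a quorum disk, the mkquorumapp command generates a Java application (ip_quorum.jar) that you copy to a server at the third site and run there, assuming that server has a supported Java runtime.
  mkquorumapp            # generate ip_quorum.jar in /dumps on the configuration node
  java -jar ip_quorum.jar    # run on the third-site server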
Figure 1 illustrates an example stretched system configuration. When used with volume mirroring, this configuration provides a high availability solution that is tolerant of a failure at a single site. If either the primary or secondary site fails, the remaining site can continue doing I/O operations. In this configuration, the nodes in the system are more than 100 meters apart, so the connections between them must be longwave Fibre Channel connections.
Figure 1. A stretched system with a quorum disk at a third site
In
Figure 1, the storage
system that hosts the third-site quorum disk is attached directly
to a switch at both the primary and secondary sites by using longwave
Fibre Channel connections. If either the primary
site or the secondary site fails, you must ensure that the remaining
site retains direct access to the storage system that hosts the quorum
disks.
Restriction: Do not connect a storage system in
one site directly to a switch fabric in the other site.
An alternative configuration can use an additional Fibre Channel switch at the third site with
connections from that switch to the primary site and to the secondary
site.
A stretched system configuration is supported only when the storage system that hosts the quorum
disks supports extended quorum. Although
other types of
storage systems can be used to provide quorum disks, access to these quorum disks is always through
a single path.
For quorum disk configuration requirements, see the
technote Guidance for Identifying and Changing Managed Disks
Assigned as Quorum Disk Candidates.
When
you set up mirrored volumes in a stretched system configuration, consider
whether you want to set the mirror write priority to redundancy to
maintain synchronization of the copies through temporary delays in
write completions. For more details, see the information about mirrored
volumes.
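As an illustration, this sketch sets the mirror write priority of an existing mirrored volume (the volume name is hypothetical) to redundancy:
  chvdisk -mirrorwritepriority redundancy vol01
With the redundancy setting, the system prioritizes keeping both copies synchronized even when one copy is slow to complete writes, at the cost of higher write latency; the latency setting prioritizes host response time instead.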
Stretched system and Metro Mirror or Global Mirror
A stretched system is designed to continue operation after the loss of one failure domain.
The stretched system cannot guarantee that it can operate after the failure of two failure domains. If the enhanced stretched system function is configured, you can enable a manual override for this situation. You can also use Metro Mirror or Global Mirror on a second system for extended disaster recovery with either an enhanced stretched system or a conventional stretched system. You configure and manage Metro Mirror or Global Mirror partnerships that include a stretched system in the same way as other remote copy relationships.
The system supports SAN routing technology, which includes FCIP links, for intersystem connections that use Metro Mirror or Global Mirror.
The two partner systems cannot be in the same production site. However, they can be collocated with the storage system that provides the active quorum disk for the stretched system.
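As an illustration, the following sketch creates a Fibre Channel partnership and a Metro Mirror relationship to a second system. The system name, volume names, and bandwidth values are hypothetical, and the sketch assumes a release that provides the mkfcpartnership command (older releases use mkpartnership).
  mkfcpartnership -linkbandwidthmbits 2000 -backgroundcopyrate 50 remote_system    # run the equivalent on both systems
  mkrcrelationship -master vol01 -aux vol01_dr -cluster remote_system -name rel01
  startrcrelationship rel01
Add the -global parameter to mkrcrelationship to create a Global Mirror relationship instead.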
Configuration steps
These additional configuration steps can be done by using the command-line interface (CLI) or the management GUI. A combined CLI sketch follows this list.
- Each node in the system
must be assigned to a site. Use the chnode CLI command. If additional nodes are cabled to the system, you can specify these nodes as
hot-spare nodes. Hot-spare nodes can nondisruptively take over host I/O operations if any node on
the site becomes unavailable. For more information, see the topic about adding hot-spare
nodes.
- Each back-end storage
system must
be assigned to a site. Use the chcontroller CLI command.
- Each host must be assigned to a site. Use the chhost CLI command.
- After all nodes, hosts, and storage
systems are assigned to a site, the
enhanced mode must be enabled by changing the system topology to
stretched.
- For best results, configure an enhanced stretched system to include at least two I/O groups (four nodes). A system with just one I/O group cannot be guaranteed to maintain mirroring of data or uninterrupted host access in the presence of node failures or system updates.
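The following minimal CLI sketch combines these steps. The object names and site assignments are examples; site1 and site2 are the default site names, with site3 reserved for the quorum site.
  chnode -site site1 node1
  chnode -site site2 node2
  chcontroller -site site1 controller0
  chcontroller -site site2 controller1
  chhost -site site1 host0
  chsystem -topology stretched    # enable the enhanced stretched topology after all objects have a site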