Failover Cluster
Step-by-Step Guide: Configuring the Quorum in a Failover Cluster
A failover
cluster is a group of independent computers that work together to increase the
availability of applications and services. The clustered servers (called nodes)
are connected by physical cables and by software. If one of the cluster nodes
fails, another node begins to provide service (a process known as failover).
Users experience a minimum of disruptions in service.
This guide
describes the new quorum options in failover clusters in
Windows Server® 2008 and provides steps for configuring the quorum in
a failover cluster. By following the configuration steps in this guide, you can
learn about failover clusters and familiarize yourself with quorum modes in failover
clustering.
In Windows
Server 2008, the improvements to failover clusters (formerly known as
server clusters) are aimed at simplifying clusters, making them more secure,
and enhancing cluster stability. Cluster setup and management are easier.
Security and networking in clusters have been improved, as has the way a
failover cluster communicates with storage. For more information about
improvements to failover clusters, see http://go.microsoft.com/fwlink/?LinkId=62368.
For additional
background information, see Appendix A: Details of How
Quorum Works in a Failover Cluster and Appendix B: Additional
Information About Quorum Modes.
In simple terms,
the quorum for a cluster is the number of elements that must be online for that
cluster to continue running. In effect, each element can cast one “vote” to
determine whether the cluster continues running. The voting elements are nodes
or, in some cases, a disk witness or file share witness. Each voting element
(with the exception of a file share witness) contains a copy of the cluster
configuration, and the Cluster service works to keep all copies synchronized at
all times.
It is essential
that the cluster stops running if too many failures occur or if there is a
problem with communication between the cluster nodes. For a more detailed
explanation, see the next section, Why quorum is necessary.
Note that the
full function of a cluster depends not just on quorum, but on the capacity of
each node to support the services and applications that fail over to that node.
For example, a cluster that has five nodes could still have quorum after two
nodes fail, but each remaining cluster node would continue serving clients only
if it had enough capacity to support the services and applications that failed
over to it.
Why quorum is necessary
When network
problems occur, they can interfere with communication between cluster nodes. A
small set of nodes might be able to communicate together across a functioning
part of a network, but might not be able to communicate with a different set of
nodes in another part of the network. This can cause serious issues. In this
“split” situation, at least one of the sets of nodes must stop running as a
cluster.
To prevent the
issues that are caused by a split in the cluster, the cluster software requires
that any set of nodes running as a cluster must use a voting algorithm to
determine whether, at a given time, that set has quorum. Because a given
cluster has a specific set of nodes and a specific quorum configuration, the
cluster will know how many “votes” constitutes a majority (that is, a quorum).
If the number drops below the majority, the cluster stops running. Nodes will
still listen for the presence of other nodes, in case another node appears
again on the network, but the nodes will not begin to function as a cluster
until the quorum exists again.
For example, in a
five node cluster that is using a node majority, consider what happens if nodes
1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes
1, 2, and 3 constitute a majority, and they continue running as a cluster.
Nodes 4 and 5 are a minority and stop running as a cluster, which prevents the
problems of a “split” situation. If node 3 loses communication with other
nodes, all nodes stop running as a cluster. However, all functioning nodes will
continue to listen for communication, so that when the network begins working
again, the cluster can form and begin to run.
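The five-node example above can be checked with a few lines of Python. This is only an illustrative sketch of the majority rule described in this section, not the Cluster service's actual algorithm; the function name `has_quorum` and the node numbering are assumptions made for the example.

```python
def has_quorum(votes_online, total_votes):
    """Return True when the online votes are more than half of all configured votes."""
    return votes_online * 2 > total_votes


TOTAL_VOTES = 5            # five nodes, Node Majority, one vote each
partition_a = {1, 2, 3}    # nodes that can still reach each other
partition_b = {4, 5}       # nodes cut off from partition_a

print(has_quorum(len(partition_a), TOTAL_VOTES))        # True: 3 of 5 votes
print(has_quorum(len(partition_b), TOTAL_VOTES))        # False: 2 of 5 votes

# Node 3 then loses communication with nodes 1 and 2.
print(has_quorum(len(partition_a - {3}), TOTAL_VOTES))  # False: 2 of 5 votes
```

Because the comparison is strict, a set holding exactly half of the votes does not have quorum, which is why two sets of nodes can never both continue running as the cluster.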
For more
information about how quorum works, see Appendix A: Details of How
Quorum Works in a Failover Cluster.
There have been
significant improvements to the quorum model in Windows Server 2008. In
Windows Server 2003, almost all server clusters used a disk in cluster
storage (the “quorum resource”) as the quorum. If a node could communicate with
the specified disk, the node could function as a part of a cluster, and
otherwise it could not. This made the quorum resource a potential single point
of failure. In Windows Server 2008, a majority of “votes” is what
determines whether a cluster achieves quorum. Nodes can vote, and where
appropriate, either a disk in cluster storage (called a “disk witness”) or a
file share (called a “file share witness”) can vote. There is also a quorum
mode called No Majority: Disk Only which functions like the disk-based
quorum in Windows Server 2003. Aside from that mode, there is no single
point of failure with the quorum modes, since what matters is the number of
votes, not whether a particular element is available to vote.
This new quorum
model is flexible and you can choose the mode best suited to your cluster.
Important: In most situations, it is best to use the quorum mode selected by
the cluster software. If you run the quorum configuration wizard, the quorum
mode that the wizard lists as “recommended” is the quorum mode chosen by the
cluster software. We recommend changing the quorum configuration only if you
have determined that the change is appropriate for your cluster.
There are four
quorum modes:
- Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes, that is, more than half.
- Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
- Node and File Share Majority: Each node plus a designated file share created by the administrator (the “file share witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
- No Majority: Disk Only: The cluster has quorum if one node is available and in communication with a specific disk in the cluster storage.
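The four modes above can be compared side by side with a small model. The following Python sketch is a simplification based only on the descriptions in this list; the names `QuorumMode` and `cluster_has_quorum` are invented for illustration, and `witness_online` is assumed to mean that the witness (or quorum disk) is available and in communication with the online nodes.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NO_MAJORITY_DISK_ONLY = "No Majority: Disk Only"


def cluster_has_quorum(mode, total_nodes, nodes_online, witness_online=False):
    """Return True if the modeled cluster has quorum under the given mode."""
    if mode is QuorumMode.NO_MAJORITY_DISK_ONLY:
        # One node in communication with the quorum disk is enough.
        return nodes_online >= 1 and witness_online
    total_votes = total_nodes
    votes_online = nodes_online
    if mode is not QuorumMode.NODE_MAJORITY:
        # The disk witness or file share witness adds one vote.
        total_votes += 1
        if witness_online:
            votes_online += 1
    # A majority means more than half of all configured votes.
    return votes_online * 2 > total_votes


# Four nodes plus a disk witness: five votes in total.
print(cluster_has_quorum(QuorumMode.NODE_AND_DISK_MAJORITY, 4, 2, witness_online=True))   # True
print(cluster_has_quorum(QuorumMode.NODE_AND_DISK_MAJORITY, 4, 2, witness_online=False))  # False
# Disk Only: a single node with the quorum disk is sufficient.
print(cluster_has_quorum(QuorumMode.NO_MAJORITY_DISK_ONLY, 3, 1, witness_online=True))    # True
```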
The following
table describes clusters based on the number of nodes and other cluster
characteristics, and lists the quorum mode that is recommended in most cases.
A “multi-site”
cluster is a cluster in which an investment has been made to place sets of
nodes and storage in physically separate locations, providing a disaster
recovery solution.
Description of cluster | Quorum recommendation
Odd number of nodes | Node Majority
Even number of nodes (but not a multi-site cluster) | Node and Disk Majority
Even number of nodes, multi-site cluster | Node and File Share Majority
Even number of nodes, no shared storage | Node and File Share Majority
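The recommendations in the table can be expressed as a short decision function. This is a sketch of the table only, not of the logic the cluster software actually uses when it selects a mode; the function name and parameters are hypothetical.

```python
def recommended_quorum_mode(node_count, multi_site=False, shared_storage=True):
    """Return the quorum mode that the table above recommends in most cases."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        # Odd number of nodes: the nodes alone can always form a strict majority.
        return "Node Majority"
    if multi_site or not shared_storage:
        # Even number of nodes, but a disk witness is not a good fit.
        return "Node and File Share Majority"
    return "Node and Disk Majority"


print(recommended_quorum_mode(3))                        # Node Majority
print(recommended_quorum_mode(4))                        # Node and Disk Majority
print(recommended_quorum_mode(4, multi_site=True))       # Node and File Share Majority
print(recommended_quorum_mode(2, shared_storage=False))  # Node and File Share Majority
```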
The following
diagrams show how each of the quorum modes affects whether a cluster can or
cannot achieve quorum.
The following
diagram shows Node Majority used (as recommended) for a cluster with an odd
number of nodes.
Node
Majority quorum configuration, three nodes
In this mode,
each node gets one vote. In certain circumstances, you might want to install a
hotfix that lets you select which nodes will have votes. This can be useful
with certain multi-site clusters, for example, where you want one site to have
more votes than other sites in a disaster recovery situation. For more
information, see Changing the quorum
configuration in a failover cluster for unequal node weight.
The following
diagram shows Node and Disk Majority used (as recommended) for a cluster with
an even number of nodes. Each node can vote, as can the disk witness.
Node
and Disk Majority quorum configuration, four nodes (plus disk)
The following
diagram shows how the disk witness also contains a replica of the cluster
configuration database in a cluster that uses Node and Disk Majority.
Replicas
of cluster configuration in cluster that uses Node and Disk Majority
The following
diagram shows Node and File Share Majority used (as recommended) for a cluster
with an even number of nodes and a situation where having a file share witness
works better than having a disk witness. Each node can vote, as can the file
share witness.
Node
and File Share Majority quorum configuration, four nodes (plus file share)
The following
diagram shows how the file share witness can vote, but does not contain a
replica of the cluster configuration database. Note that the file share witness
does contain information about which version of the cluster configuration
database is the most recent.
Replicas
of cluster configuration in cluster that uses Node and File Share Majority
The following
illustration shows how a cluster that uses the disk as the only determiner of
quorum can run even if only one node is available and in communication with the
quorum disk. It also shows how the cluster cannot run if the quorum disk is not
available (single point of failure). For this cluster, which has an odd number
of nodes, Node Majority is the recommended quorum mode.
No
Majority: Disk Only quorum configuration, three nodes
Before
configuring the quorum for a failover cluster, you must meet the requirements
for the cluster itself. For information about cluster requirements, see http://go.microsoft.com/fwlink/?LinkId=114536. For information about cluster
validation, see http://go.microsoft.com/fwlink/?LinkId=114537 and http://go.microsoft.com/fwlink/?LinkId=114538.
For a cluster
using the Node Majority quorum mode (which includes almost all clusters with an
odd number of nodes), there are no additional requirements for the quorum. The
following sections provide guidelines for clusters using the Node and Disk
Majority quorum mode and the Node and File Share Majority quorum mode. (The
requirements and recommendations for the Node and Disk Majority mode also apply
to the No Majority: Disk Only mode.)
Requirements and recommendations for clusters using Node and Disk Majority
When using the
Node and Disk Majority mode, review the following requirements and
recommendations for the disk witness.
Note: These requirements and recommendations also apply to the quorum disk
for the No Majority: Disk Only mode.
- Use a small Logical Unit Number (LUN) that is at least 512 MB in size.
- Choose a basic disk with a single volume.
- Make sure that the LUN is dedicated to the disk witness. It must not contain any other user or application data.
- Choose whether to assign a drive letter to the LUN based on the needs of your cluster. The LUN does not have to have a drive letter (to conserve drive letters for applications).
- As with other LUNs that are to be used by the cluster, you must add the LUN to the set of disks that the cluster can use. For more information, see http://go.microsoft.com/fwlink/?LinkId=114539.
- Make sure that the LUN has been verified with the Validate a Configuration Wizard.
- We recommend that you configure the LUN with hardware RAID for fault tolerance.
- In most situations, do not back up the disk witness or the data on it. Backing up the disk witness can add to the input/output (I/O) activity on the disk and decrease its performance, which could potentially cause it to fail.
- We recommend that you avoid all antivirus scanning on the disk witness.
- Format the LUN with the NTFS file system.
If there is a
disk witness configured, but bringing that disk online will not achieve quorum,
then it remains offline. If bringing that disk online will achieve quorum, then
it is brought online by the cluster software.
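One way to read the rule above is as a check on whether the witness vote, added to the node votes already online, yields a majority. The Python sketch below models that reading for a cluster that uses Node and Disk Majority; it is an interpretation for illustration, and the function name is hypothetical.

```python
def witness_would_achieve_quorum(nodes_online, total_nodes):
    """Return True if the online nodes plus the disk witness form a majority."""
    total_votes = total_nodes + 1      # every node plus the disk witness
    votes_with_witness = nodes_online + 1
    return votes_with_witness * 2 > total_votes


# Four-node cluster with a disk witness: five votes in total.
print(witness_would_achieve_quorum(2, 4))  # True: 3 of 5 votes, the disk is brought online
print(witness_would_achieve_quorum(1, 4))  # False: 2 of 5 votes, the disk remains offline
```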
In certain
circumstances, you might use an asymmetric storage configuration, where only a
subset of cluster nodes have access to the storage array that contains the disk
witness. For information about how to work within an asymmetric storage
configuration and specify either Node and Disk Majority mode or No Majority:
Disk Only mode, see Changing the quorum
configuration in a failover cluster with asymmetric storage.
Requirements and recommendations for clusters using Node and File Share Majority
When using the
Node and File Share Majority mode, review the following recommendations for the
file share witness.
- Use a Server Message Block (SMB) share on a Windows Server 2003 or Windows Server 2008 file server.
- Make sure that the file share has a minimum of 5 MB of free space.
- Make sure that the file share is dedicated to the cluster and is not used in other ways (including storage of user or application data).
- Do not place the share on a node that is a member of this cluster or will become a member of this cluster in the future.
- You can place the share on a file server that has multiple file shares servicing different purposes. This may include multiple file share witnesses, each one a dedicated share. You can even place the share on a clustered file server (in a different cluster), which would typically be a clustered file server containing multiple file shares servicing different purposes.
- For a multi-site cluster, you can co-locate the external file share at one of the sites where a node or nodes are located. However, we recommend that you configure the external share in a separate third site.
- Place the file share on a server that is a member of a domain, in the same forest as the cluster nodes.
- For the folder that the file share uses, make sure that the administrator has Full Control share and NTFS permissions.
- Do not use a file share that is part of a Distributed File System (DFS) Namespace.
Note: After the Quorum Configuration Wizard has been run, the computer object
for the Cluster Name will automatically be granted read and write permissions
to the file share.
If there is a
file share witness configured, but bringing that file share online will not
achieve quorum, then it remains offline. If bringing that file share online
will achieve quorum, then it is brought online by cluster software.
For more
information about file share witness recommendations, see:
When you install
a failover cluster, the cluster software automatically chooses an appropriate
quorum configuration for that cluster, based mainly on the number of nodes
(even or odd). You can easily view the quorum configuration of an existing
cluster using either the failover cluster snap-in or the command line.
- To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management (in Windows Server 2008) or Failover Cluster Manager (in Windows Server 2008 R2). If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.
- In the console tree, if the cluster that you want to view is not displayed, right-click Failover Cluster Management or Failover Cluster Manager, click Manage a Cluster, and then select the cluster you want to view.
- In the center pane, find Quorum Configuration, and view the description.
In
the following example, the quorum mode is Node and Disk Majority and the
disk witness is Cluster Disk 2.
- To open a Command Prompt window, on a cluster node, click Start, right-click Command Prompt, and then either click Run as administrator or click Open.
- If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.
- Review the configuration of the quorum by typing:
cluster /quorum
The procedure in
this section describes how to change the quorum configuration in a failover
cluster by using the failover cluster snap-in. Additional subsections provide
information about quorum configurations for use in certain circumstances where
you want to use unequal node weight or asymmetric storage.
Important: Unless you have changed the number of nodes in your cluster, it is
usually best to use the quorum configuration recommended by the quorum
configuration wizard. We recommend changing the quorum configuration only if
you have determined that the change is appropriate for your cluster.
The following procedure
describes how to change the quorum configuration in a failover cluster by
using the failover cluster snap-in.
Membership in the
local Administrators group on each clustered server, or equivalent, is
the minimum required to complete this procedure. Also, the account
you use must be a domain user account. Review details about using the
appropriate accounts and group memberships at http://go.microsoft.com/fwlink/?LinkId=83477.
1. To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management (in Windows Server 2008) or Failover Cluster Manager (in Windows Server 2008 R2). If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.
2. In the console tree, if the cluster you want to configure is not displayed, right-click Failover Cluster Management or Failover Cluster Manager, click Manage a Cluster, and select or specify the cluster you want.
3. With the cluster selected, under Actions, click More Actions, and then click Configure Cluster Quorum Settings.
4. Click Next. The following illustration shows the wizard page that displays for a cluster with an even number of nodes. Note that the text on this page varies, depending on whether the cluster has an even number or odd number of nodes. To view more information about the selections on this page, at the bottom of the page, click More about quorum configurations.
5. Select a quorum mode from the list. For more information, see Choosing the quorum mode for a particular cluster, earlier in this guide.
6. Click Next and then go to the appropriate step in this procedure:
   - If you chose Node Majority, go to the last step in this procedure.
   - If you chose Node and Disk Majority or No Majority: Disk Only, go to the next step in this procedure.
   - If you chose Node and File Share Majority, skip to step 8 in this procedure.
7. If you chose Node and Disk Majority or No Majority: Disk Only, a wizard page similar to the following appears. (For No Majority: Disk Only, the title of the page is Select Storage Resource.) Select the storage volume that you want to use for the disk witness (or if you chose No Majority: Disk Only, for the quorum resource), and then skip to step 9. For information about the requirements for the disk witness, see Requirements and recommendations for clusters using Node and Disk Majority.
   If you change disk assignments on this page, the former storage volume is no longer assigned to the core Cluster Group and instead goes back to Available Storage.
8. If you chose Node and File Share Majority, the following wizard page appears. Specify the file share you want to use, or click the Browse button and use the standard browsing interface to select the file share. For information about the requirements for the file share, see Requirements and recommendations for clusters using Node and File Share Majority.
9. Click Next. Use the confirmation page to confirm your selections, and then click Next.
10. After the wizard runs and the Summary page appears, if you want to view a report of the tasks that the wizard performed, click View Report.
Note: The most recent report will remain in the systemroot\Cluster\Reports
folder with the name QuorumConfiguration.mht.
Changing the quorum configuration in a failover cluster for unequal node weight
In most failover
clusters, each node gets one vote. In certain circumstances, you might want to
install a hotfix that lets you select which nodes will have votes. This can be
useful with certain multi-site clusters, for example, where you want one site
to have more votes than other sites in a disaster recovery situation. Install
the hotfix on all nodes (not just the node that will not have a vote). To
download and install the hotfix, see http://support.microsoft.com/kb/2494036. To configure a node so that it does
not have a vote, at the command prompt, type:
cluster . node <NodeName> /prop NodeWeight=0
This sets the
NodeWeight property to 0. Similarly, to return the node to having a vote, set
the NodeWeight property to 1.
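The effect of removing a node's vote can be illustrated with a weighted version of the majority rule. The following Python sketch is a simplified model, not the Cluster service's implementation; the node names (A1, A2, B1, B2) and the function name are hypothetical.

```python
def has_quorum_weighted(node_weights, nodes_online):
    """Return True if the online nodes hold more than half of all votes.

    node_weights maps each node name to its NodeWeight value (0 or 1).
    """
    total_votes = sum(node_weights.values())
    votes_online = sum(
        weight for node, weight in node_weights.items() if node in nodes_online
    )
    return votes_online * 2 > total_votes


site_a_online = {"A1", "A2"}  # Site B is unreachable

# Every node has a vote: Site A holds 2 of 4 votes, which is not a majority.
equal = {"A1": 1, "A2": 1, "B1": 1, "B2": 1}
print(has_quorum_weighted(equal, site_a_online))    # False

# NodeWeight=0 on B2: Site A now holds 2 of 3 votes.
unequal = {"A1": 1, "A2": 1, "B1": 1, "B2": 0}
print(has_quorum_weighted(unequal, site_a_online))  # True
```

In this model, removing one vote at Site B lets Site A keep quorum on its own when the link between the sites fails.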
After you have
applied the hotfix described in this section, you might want to start a node
but prevent it from achieving quorum and forming the cluster (to prevent a
"split" situation with two competing instances of the cluster
running). To do this, start the Cluster service with the /preventquorum
option, which can be abbreviated as /pq, as shown in the following
command:
net start clussvc /pq
Changing the quorum configuration in a failover cluster with asymmetric storage
In most failover
clusters, the disk witness or quorum disk can be accessed by all nodes. In
certain circumstances, you might want to configure a disk witness or quorum
disk that can be accessed by a subset of nodes only (asymmetric storage).
Before configuring this, if your cluster is running Windows Server 2008 R2,
make sure that it has Service Pack 1. Similarly, if your cluster is running
Windows Server 2008, make sure that it has Service Pack 2 and hotfix 976097,
which is described at http://support.microsoft.com/kb/976097.
To configure the
quorum in a failover cluster with asymmetric storage, first ensure that the
disk that will be the disk witness or quorum disk is online and is in the
Cluster Group (not in Available Storage). Then, at the command prompt, for a Node
and Disk Majority (disk witness) configuration, type:
cluster <ClusterName> /quorum:"<DiskResourceName>"
For a No
Majority: Disk Only configuration, type the same command, but add /diskonly
to the end of the command. With the No Majority: Disk Only
configuration, the cluster may be unable to start unless a node that can access
the quorum disk is available. If none of these nodes is available, it might be
necessary to start the Cluster service with the net start clussvc
/forcequorum command.
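The startup constraint for a disk-only quorum with asymmetric storage can be sketched as a set intersection. This Python fragment is only a conceptual model of the paragraph above; the node names and the function name are hypothetical.

```python
def disk_only_cluster_can_start(nodes_online, nodes_with_disk_access):
    """Return True if at least one online node can access the quorum disk."""
    return bool(set(nodes_online) & set(nodes_with_disk_access))


# Only Node1 and Node2 are connected to the storage array holding the quorum disk.
disk_access = {"Node1", "Node2"}

print(disk_only_cluster_can_start({"Node2", "Node3"}, disk_access))  # True
print(disk_only_cluster_can_start({"Node3", "Node4"}, disk_access))  # False
```

The second case corresponds to the situation in which forcing quorum might be necessary.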
When
troubleshooting, you might be in a situation where the cluster is offline
because it does not have quorum, but you want to bring it online. The first
thing to understand is your quorum mode and why you no longer have quorum. This
may provide some insight into how the cluster can achieve quorum and come
online automatically. If you need to force the Cluster service to start, you
can make all nodes which can communicate with each other begin working together
as a cluster by running the net start clussvc command with an option for
forcing quorum. The cluster will use the copy of the cluster configuration that
is on the node on which you run the command, and replicate it to all other
nodes. To force the cluster to start, on a node that contains a copy of the
cluster configuration that you want to use, type the following command:
net start clussvc /fq
The command can
also be typed as net start clussvc /forcequorum.
Forcing a cluster
to start that does not have quorum may be especially useful in an unbalanced
multi-site cluster. If you have a five-node multi-site cluster and three nodes
at Site A fail, then the two nodes at Site B will go offline since they no
longer have quorum. If there is a genuine disaster at Site A, then it may take
a significant amount of time for the site to come online, and so you would
likely want to force Site B to come online, even though it does not have
quorum.
When a cluster is
forced to start without quorum, it continually looks to add nodes to the cluster
and is in a special “forced” state. Once it has majority, the cluster moves out
of the forced state and behaves normally, which means it is not necessary to restart
the Cluster service without the startup switch. If the cluster then loses a node
and drops below quorum, it will go offline again because it is no longer in the
forced state. At that point, to bring it online again while it does not have
quorum would require running net start clussvc /fq again.
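The forced state described above behaves like a small state machine. The following Python sketch models only the transitions named in this section (forced start, leaving the forced state on majority, going offline when quorum is lost afterwards); the class and method names are hypothetical, and real cluster behavior involves more than this.

```python
class ForcedQuorumModel:
    """Toy model of a cluster that can be started with forced quorum."""

    def __init__(self, total_votes):
        self.total_votes = total_votes
        self.running = False
        self.forced = False

    def _has_majority(self, votes_online):
        return votes_online * 2 > self.total_votes

    def force_start(self, votes_online):
        """Model running `net start clussvc /fq` on a node."""
        self.running = True
        self.forced = not self._has_majority(votes_online)

    def update(self, votes_online):
        """Model a change in the number of votes that are online."""
        if self._has_majority(votes_online):
            # With a majority, the cluster runs normally and leaves the forced state.
            self.running = True
            self.forced = False
        elif not self.forced:
            # Outside the forced state, losing the majority takes the cluster offline.
            self.running = False


cluster = ForcedQuorumModel(total_votes=5)
cluster.force_start(votes_online=2)     # Site B: two of five nodes
print(cluster.running, cluster.forced)  # True True
cluster.update(votes_online=3)          # a third node joins
print(cluster.running, cluster.forced)  # True False
cluster.update(votes_online=2)          # a node is lost again
print(cluster.running, cluster.forced)  # False False
```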
In some
situations, you might want to start a node but prevent it from achieving quorum
and forming the cluster. For more information, see Changing the quorum
configuration in a failover cluster for unequal node weight, earlier in this topic.
For more
information about disk witness recommendations, see:
http://go.microsoft.com/fwlink/?LinkId=115004 (Note that this is a Windows Server
2003 article, but the disk witness recommendations remain unchanged.)
For more
information about file share witness recommendations, see:
For a list of
technical documentation for failover clusters on the TechNet Web site, see:
For information
on Cluster Continuous Replication (CCR) in Microsoft Exchange Server 2007:
For information
about choosing and validating hardware for a failover cluster, see the TechNet
Web site at: