Configure SLES shared disk cluster for SQL Server
Applies to: SQL Server - Linux
This guide provides instructions to create a two-nodes shared disk cluster for SQL Server on SUSE Linux Enterprise Server (SLES). The clustering layer is based on SUSE High Availability Extension (HAE) built on top of Pacemaker.
For more information on cluster configuration, resource agent options, management, best practices, and recommendations, see SUSE Linux Enterprise High Availability Extension 12 SP5.
Prerequisites
To complete the following end-to-end scenario, you need two machines to deploy the two nodes cluster and another server to configure the NFS share. Below steps outline how these servers will be configured.
Setup and configure the operating system on each cluster node
The first step is to configure the operating system on the cluster nodes. For this walk through, use SLES with a valid subscription for the HA add-on.
Install and configure SQL Server on each cluster node
Install and setup SQL Server on both nodes. For detailed instructions, see Installation guidance for SQL Server on Linux.
Designate one node as primary and the other as secondary, for purposes of configuration. Use these terms for the following this guide.
On the secondary node, stop and disable SQL Server. The following example stops and disables SQL Server:
sudo systemctl stop mssql-server sudo systemctl disable mssql-server
Note
At setup time, a Server Master Key is generated for the SQL Server instance and placed at
/var/opt/mssql/secrets/machine-key
. On Linux, SQL Server always runs as a local account called mssql. Because it's a local account, its identity isn't shared across nodes. Therefore, you need to copy the encryption key from primary node to each secondary node so each local mssql account can access it to decrypt the Server Master Key.On the primary node, create a SQL Server login for Pacemaker and grant the login permission to run
sp_server_diagnostics
. Pacemaker uses this account to verify which node is running SQL Server.sudo systemctl start mssql-server
Connect to the SQL Server
master
database with thesa
account and run the following:USE [master] GO CREATE LOGIN [<loginName>] with PASSWORD= N'<password>' GRANT VIEW SERVER STATE TO <loginName>
Caution
Your password should follow the SQL Server default password policy. By default, the password must be at least eight characters long and contain characters from three of the following four sets: uppercase letters, lowercase letters, base-10 digits, and symbols. Passwords can be up to 128 characters long. Use passwords that are as long and complex as possible.
On the primary node, stop and disable SQL Server.
Follow the directions in the SUSE documentation to configure and update the hosts file for each cluster node. The
hosts
file must include the IP address and name of every cluster node.To check the IP address of the current node, run:
sudo ip addr show
Set the computer name on each node. Give each node a unique name that is 15 characters or less. Set the computer name by adding it to
/etc/hostname
using YAST or manually.The following example shows
/etc/hosts
with additions for two nodes namedSLES1
andSLES2
.127.0.0.1 localhost 10.128.18.128 SLES1 10.128.16.77 SLES2
All cluster nodes must be able to access each other via SSH. Tools like
hb_report
orcrm_report
(for troubleshooting) and Hawk's History Explorer require passwordless SSH access between the nodes, otherwise they can only collect data from the current node. In case you use a non-standard SSH port, use the-X
option (see Other Requirements and Recommendations). For example, if your SSH port is 3479, invokecrm_report
with:crm_report -X "-p 3479" [...]
For more information, see the Administration Guide.
In the next section you'll configure shared storage and move your database files to that storage.
Configure shared storage and move database files
There are various solutions for providing shared storage. This walk-through demonstrates configuring shared storage with NFS. We recommend following best practices and use Kerberos to secure NFS:
If you don't follow this guidance, anyone who can access your network and spoof the IP address of a SQL node will be able to access your data files. As always, make sure that you threat model your system before using it in production.
Another storage option is to use SMB file share:
Configure an NFS server
To configure an NFS server, see the following steps in the SUSE documentation: Configuring NFS Server.
Configure all cluster nodes to connect to the NFS shared storage
Before configuring the client NFS to mount the SQL Server database files path to point to the shared storage location, make sure you save the database files to a temporary location to be able to copy them later on the share:
On the primary node only, save the database files to a temporary location. The following script, creates a new temporary directory, copies the database files to the new directory, and removes the old database files. As SQL Server runs as local user mssql, you need to make sure that after data transfer to the mounted share, local user has read-write access to the share.
su mssql mkdir /var/opt/mssql/tmp cp /var/opt/mssql/data/* /var/opt/mssql/tmp rm /var/opt/mssql/data/* exit
Configure the NFS client on all cluster nodes:
Note
You should follow SUSE's best practices and recommendations regarding Highly Available NFS storage: Highly Available NFS Storage with DRBD and Pacemaker.
Validate that SQL Server starts successfully with the new file path. Do this on each node. At this point only one node should run SQL Server at a time. They can't both run at the same time because they will both try to access the data files simultaneously (to avoid accidentally starting SQL Server on both nodes, use a File System cluster resource, to make sure the share isn't mounted twice by the different nodes). The following commands start SQL Server, check the status, and then stop SQL Server.
sudo systemctl start mssql-server sudo systemctl status mssql-server sudo systemctl stop mssql-server
At this point, both instances of SQL Server are configured to run with the database files on the shared storage. The next step is to configure SQL Server for Pacemaker.
Install and configure Pacemaker on each cluster node
On both cluster nodes, create a file to store the SQL Server username and password for the Pacemaker login. The following command creates and populates this file:
sudo touch /var/opt/mssql/secrets/passwd sudo echo '<loginName>' >> /var/opt/mssql/secrets/passwd sudo echo '<password>' >> /var/opt/mssql/secrets/passwd sudo chown root:root /var/opt/mssql/secrets/passwd sudo chmod 600 /var/opt/mssql/secrets/passwd
Caution
Your password should follow the SQL Server default password policy. By default, the password must be at least eight characters long and contain characters from three of the following four sets: uppercase letters, lowercase letters, base-10 digits, and symbols. Passwords can be up to 128 characters long. Use passwords that are as long and complex as possible.
All cluster nodes must be able to access each other via SSH. Tools like hb_report or crm_report (for troubleshooting) and Hawk's History Explorer require passwordless SSH access between the nodes, otherwise they can only collect data from the current node. In case you use a non-standard SSH port, use the -X option (see man page). For example, if your SSH port is 3479, invoke an hb_report with:
crm_report -X "-p 3479" [...]
For more information, see System Requirements and Recommendations in the SUSE documentation.
Install the High Availability extension. To install the extension, follow the steps in the following SUSE article:
Install the FCI resource agent for SQL Server. Run the following commands on both nodes:
sudo zypper addrepo -fc https://packages.microsoft.com/config/sles/12/mssql-server-2017.repo sudo zypper --gpg-auto-import-keys refresh sudo zypper install mssql-server-ha
Automatically set up the first node. The next step is to set up a running one-node cluster by configuring the first node, SLES1. Follow the instructions in the SUSE article, Setting Up the First Node.
When finished, check the cluster status with
crm status
:crm status
It should show that one node, SLES1, is configured.
Add nodes to an existing cluster. Next join the SLES2 node to the cluster. Follow the instructions in the SUSE article, Adding the Second Node.
When finished, check the cluster status with crm status. If you have successfully added a second node, the output is similar to the following:
2 nodes configured 1 resource configured Online: [ SLES1 SLES2 ] Full list of resources: admin_addr (ocf::heartbeat:IPaddr2): Started SLES1
Note
admin_addr is the virtual IP cluster resource which is configured during initial one-node cluster setup.
Removal procedures. If you need to remove a node from the cluster, use the ha-cluster-remove bootstrap script. For more information, see Overview of the Bootstrap Scripts.
Configure the cluster resources for SQL Server
The following steps explain how to configure the cluster resource for SQL Server. There are two settings that you need to customize.
- SQL Server Resource Name: A name for the clustered SQL Server resource.
- Timeout Value: The timeout value is the amount of time that the cluster waits while a resource is brought online. For SQL Server, this is the time that you expect SQL Server to take to bring the
master
database online.
Update the values from the following script for your environment. Run on one node to configure and start the clustered service.
sudo crm configure
primitive <sqlServerResourceName> ocf:mssql:fci op start timeout=<timeout_in_seconds>
colocation <constraintName> inf: <virtualIPResourceName> <sqlServerResourceName>
show
commit
exit
For example, the following script creates a SQL Server clustered resource named mssqlha
.
sudo crm configure
primitive mssqlha ocf:mssql:fci op start timeout=60s
colocation admin_addr_mssqlha inf: admin_addr mssqlha
show
commit
exit
After the configuration is committed, SQL Server will start on the same node as the virtual IP resource.
For more information, see Configuring and Managing Cluster Resources (Command Line).
Verify that SQL Server is started
To verify that SQL Server is started, run the crm status command:
crm status
The following example shows the results when Pacemaker has successfully started as clustered resource.
2 nodes configured
2 resources configured
Online: [ SLES1 SLES2 ]
Full list of resources:
admin_addr (ocf::heartbeat:IPaddr2): Started SLES1
mssqlha (ocf::mssql:fci): Started SLES1
Manage cluster resources
To manage your cluster resources, see the following SUSE article: Managing Cluster Resources
Manual failover
Although resources are configured to automatically fail over (or migrate) to other nodes of the cluster in the event of a hardware or software failure, you can also manually move a resource to another node in the cluster using either the Pacemaker GUI or the command line.
Use the migrate command for this task. For example, to migrate the SQL resource to a cluster node names SLES2 execute:
crm resource
migrate mssqlha SLES2