High availability for NFS on Azure VMs on SUSE Linux Enterprise Server

Note

We recommend deploying one of the Azure first-party NFS services: NFS on Azure Files or NFS ANF volumes for storing shared data in a highly available SAP system. Be aware that we are de-emphasizing SAP reference architectures that utilize NFS clusters.

This article describes how to deploy and configure the virtual machines, install the cluster framework, and install a highly available NFS server that can be used to store the shared data of a highly available SAP system. This guide describes how to set up a highly available NFS server that is used by two SAP systems, NW1 and NW2. The names of the resources (for example, virtual machines, virtual networks) in the example assume that you have used the SAP file server template with the resource prefix prod.

Note

This article contains references to terms that Microsoft no longer uses. When the terms are removed from the software, we'll remove them from this article.

Read the following SAP Notes and papers first

Overview

To achieve high availability, SAP NetWeaver requires an NFS server. The NFS server is configured in a separate cluster and can be used by multiple SAP systems.

SAP NetWeaver High Availability overview

The NFS server uses a dedicated virtual hostname and virtual IP addresses for every SAP system that uses this NFS server. On Azure, a load balancer is required to use a virtual IP address. The presented configuration shows a load balancer with:

  • Frontend IP address 10.0.0.4 for NW1
  • Frontend IP address 10.0.0.5 for NW2
  • Probe port 61000 for NW1
  • Probe port 61001 for NW2

Set up a highly available NFS server

Deploy Linux manually via Azure portal

This document assumes that you've already deployed a resource group, Azure Virtual Network, and subnet.

Deploy two virtual machines for the NFS servers. Choose a suitable SLES image that is supported with your SAP system. You can deploy the VMs in any of the availability options: scale set, availability zone, or availability set.

Configure Azure load balancer

Follow the create load balancer guide to configure a standard load balancer for NFS server high availability. During the configuration of the load balancer, consider the following points.

  1. Frontend IP Configuration: Create two frontend IP addresses. Select the same virtual network and subnet as your NFS server.
  2. Backend Pool: Create a backend pool and add the NFS server VMs.
  3. Inbound rules: Create two load balancing rules, one for NW1 and one for NW2. Follow the same steps for both load balancing rules.
    • Frontend IP address: Select the frontend IP address
    • Backend pool: Select the backend pool
    • Check "High availability ports"
    • Protocol: TCP
    • Health Probe: Create a health probe with the following details (applies to both NW1 and NW2)
      • Protocol: TCP
      • Port: [for example: 61000 for NW1, 61001 for NW2]
      • Interval: 5
      • Probe Threshold: 2
    • Idle timeout (minutes): 30
    • Check "Enable Floating IP"

Note

The health probe configuration property numberOfProbes, otherwise known as "Unhealthy threshold" in the portal, isn't respected. To control the number of successful or failed consecutive probes, set the property "probeThreshold" to 2. It's currently not possible to set this property by using the Azure portal, so use either the Azure CLI or PowerShell.
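
For example, with a recent Azure CLI version that supports the --probe-threshold parameter, the probe can be created as follows. This is a minimal sketch only; the resource group, load balancer, and probe names (MyResourceGroup, MyLoadBalancer, nw1-nfs-hp) are placeholders that you need to replace with your own values.

    # Sketch: create the NW1 health probe with probeThreshold set to 2 (names are placeholders)
    az network lb probe create \
      --resource-group MyResourceGroup \
      --lb-name MyLoadBalancer \
      --name nw1-nfs-hp \
      --protocol tcp \
      --port 61000 \
      --interval 5 \
      --probe-threshold 2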

Note

When VMs without public IP addresses are placed in the backend pool of an internal (no public IP address) Standard Azure load balancer, there's no outbound internet connectivity unless additional configuration is performed to allow routing to public endpoints. For details on how to achieve outbound connectivity, see Public endpoint connectivity for Virtual Machines using Azure Standard Load Balancer in SAP high-availability scenarios.

Important

  • Don't enable TCP timestamps on Azure VMs placed behind Azure Load Balancer. Enabling TCP timestamps causes the health probes to fail. Set the net.ipv4.tcp_timestamps parameter to 0 (see the sketch after this list). For details, see Load Balancer health probes.
  • To prevent saptune from changing the manually set net.ipv4.tcp_timestamps value from 0 back to 1, update saptune to version 3.1.1 or higher. For more details, see saptune 3.1.1 – Do I Need to Update?.
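
A minimal sketch for applying the setting persistently, assuming a sysctl drop-in file (the file name 98-sap-lb.conf is only an example):

    # Disable TCP timestamps and make the setting persistent across reboots
    echo "net.ipv4.tcp_timestamps = 0" | sudo tee /etc/sysctl.d/98-sap-lb.conf
    sudo sysctl --system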

Create Pacemaker cluster

Follow the steps in Setting up Pacemaker on SUSE Linux Enterprise Server in Azure to create a basic Pacemaker cluster for this NFS server.

Configure NFS server

The following items are prefixed with either [A] - applicable to all nodes, [1] - only applicable to node 1, or [2] - only applicable to node 2.

  1. [A] Set up host name resolution

    You can either use a DNS server or modify /etc/hosts on all nodes. This example shows how to use the /etc/hosts file. Replace the IP addresses and the hostnames in the following commands.

    sudo vi /etc/hosts
    

    Insert the following lines in /etc/hosts. Change the IP addresses and hostnames to match your environment.

    # IP address of the load balancer frontend configuration for NFS
    
    10.0.0.4 nw1-nfs
    10.0.0.5 nw2-nfs
    
  2. [A] Enable NFS server

    Create the root NFS export entry

    sudo sh -c 'echo /srv/nfs/ *\(rw,no_root_squash,fsid=0\)>/etc/exports'
    
    sudo mkdir /srv/nfs/
    
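    After running the command, /etc/exports should contain a single NFSv4 root export line similar to the following:

    /srv/nfs/ *(rw,no_root_squash,fsid=0)
    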
  3. [A] Install drbd components

    sudo zypper install drbd drbd-kmp-default drbd-utils
    
  4. [A] Create a partition for the drbd devices

    List all available data disks

    sudo ls /dev/disk/azure/scsi1/
    
    # Example output
    # lun0  lun1
    

    Create partitions for every data disk

    sudo sh -c 'echo -e "n\n\n\n\n\nw\n" | fdisk /dev/disk/azure/scsi1/lun0'
    sudo sh -c 'echo -e "n\n\n\n\n\nw\n" | fdisk /dev/disk/azure/scsi1/lun1'
    
  5. [A] Create LVM configurations

    List all available partitions

    ls /dev/disk/azure/scsi1/lun*-part*
    
    # Example output
    # /dev/disk/azure/scsi1/lun0-part1  /dev/disk/azure/scsi1/lun1-part1
    

    Create LVM volumes for every partition

    sudo pvcreate /dev/disk/azure/scsi1/lun0-part1
    sudo vgcreate vg-NW1-NFS /dev/disk/azure/scsi1/lun0-part1
    sudo lvcreate -l 100%FREE -n NW1 vg-NW1-NFS
    
    sudo pvcreate /dev/disk/azure/scsi1/lun1-part1
    sudo vgcreate vg-NW2-NFS /dev/disk/azure/scsi1/lun1-part1
    sudo lvcreate -l 100%FREE -n NW2 vg-NW2-NFS
    
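    Optionally, verify the result. The new volume groups and logical volumes should be listed; the exact output depends on your disk sizes.

    # Optional check: vg-NW1-NFS/NW1 and vg-NW2-NFS/NW2 should be listed
    sudo vgs
    sudo lvs
    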
  6. [A] Configure drbd

    sudo vi /etc/drbd.conf
    

    Make sure that the drbd.conf file contains the following two lines

    include "drbd.d/global_common.conf";
    include "drbd.d/*.res";
    

    Change the global drbd configuration

    sudo vi /etc/drbd.d/global_common.conf
    

    Add the following entries to the handler and net section.

    global {
         usage-count no;
    }
    common {
         handlers {
              fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
              after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
              split-brain "/usr/lib/drbd/notify-split-brain.sh root";
              pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
         }
         startup {
              wfc-timeout 0;
         }
         options {
         }
         disk {
              md-flushes yes;
              disk-flushes yes;
              c-plan-ahead 1;
              c-min-rate 100M;
              c-fill-target 20M;
              c-max-rate 4G;
         }
         net {
              after-sb-0pri discard-younger-primary;
              after-sb-1pri discard-secondary;
              after-sb-2pri call-pri-lost-after-sb;
              protocol     C;
              tcp-cork yes;
              max-buffers 20000;
              max-epoch-size 20000;
              sndbuf-size 0;
              rcvbuf-size 0;
         }
    }
    
  7. [A] Create the NFS drbd devices

    sudo vi /etc/drbd.d/NW1-nfs.res
    

    Insert the configuration for the new drbd device and exit

    resource NW1-nfs {
         protocol     C;
         disk {
              on-io-error       detach;
         }
         net {
             fencing  resource-and-stonith;  
         }
         on prod-nfs-0 {
              address   10.0.0.6:7790;
              device    /dev/drbd0;
              disk      /dev/vg-NW1-NFS/NW1;
              meta-disk internal;
         }
         on prod-nfs-1 {
              address   10.0.0.7:7790;
              device    /dev/drbd0;
              disk      /dev/vg-NW1-NFS/NW1;
              meta-disk internal;
         }
    }
    
    sudo vi /etc/drbd.d/NW2-nfs.res
    

    Insert the configuration for the new drbd device and exit

    resource NW2-nfs {
         protocol     C;
         disk {
              on-io-error       detach;
         }
         net {
             fencing  resource-and-stonith;  
         }
         on prod-nfs-0 {
              address   10.0.0.6:7791;
              device    /dev/drbd1;
              disk      /dev/vg-NW2-NFS/NW2;
              meta-disk internal;
         }
         on prod-nfs-1 {
              address   10.0.0.7:7791;
              device    /dev/drbd1;
              disk      /dev/vg-NW2-NFS/NW2;
              meta-disk internal;
         }
    }
    

    Create the drbd device and start it

    sudo drbdadm create-md NW1-nfs
    sudo drbdadm create-md NW2-nfs
    sudo drbdadm up NW1-nfs
    sudo drbdadm up NW2-nfs
    
  8. [1] Skip initial synchronization

    sudo drbdadm new-current-uuid --clear-bitmap NW1-nfs
    sudo drbdadm new-current-uuid --clear-bitmap NW2-nfs
    
  9. [1] Set the primary node

    sudo drbdadm primary --force NW1-nfs
    sudo drbdadm primary --force NW2-nfs
    
  10. [1] Wait until the new drbd devices are synchronized

    sudo drbdsetup wait-sync-resource NW1-nfs
    sudo drbdsetup wait-sync-resource NW2-nfs
    
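    Optionally, check the resource state. After the initial synchronization completes, both nodes should report the backing disks as UpToDate.

    # Optional check of the drbd resource state
    sudo drbdadm status NW1-nfs
    sudo drbdadm status NW2-nfs
    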
  11. [1] Create file systems on the drbd devices

    sudo mkfs.xfs /dev/drbd0
    sudo mkdir /srv/nfs/NW1
    sudo chattr +i /srv/nfs/NW1
    sudo mount -t xfs /dev/drbd0 /srv/nfs/NW1
    sudo mkdir /srv/nfs/NW1/sidsys
    sudo mkdir /srv/nfs/NW1/sapmntsid
    sudo mkdir /srv/nfs/NW1/trans
    sudo mkdir /srv/nfs/NW1/ASCS
    sudo mkdir /srv/nfs/NW1/ASCSERS
    sudo mkdir /srv/nfs/NW1/SCS
    sudo mkdir /srv/nfs/NW1/SCSERS
    sudo umount /srv/nfs/NW1
    
    sudo mkfs.xfs /dev/drbd1
    sudo mkdir /srv/nfs/NW2
    sudo chattr +i /srv/nfs/NW2
    sudo mount -t xfs /dev/drbd1 /srv/nfs/NW2
    sudo mkdir /srv/nfs/NW2/sidsys
    sudo mkdir /srv/nfs/NW2/sapmntsid
    sudo mkdir /srv/nfs/NW2/trans
    sudo mkdir /srv/nfs/NW2/ASCS
    sudo mkdir /srv/nfs/NW2/ASCSERS
    sudo mkdir /srv/nfs/NW2/SCS
    sudo mkdir /srv/nfs/NW2/SCSERS
    sudo umount /srv/nfs/NW2
    
  12. [A] Set up drbd split-brain detection

    When you use drbd to synchronize data from one host to another, a so-called split brain can occur. A split brain is a scenario where both cluster nodes promoted the drbd device to primary and went out of sync. It might be a rare situation, but you still want to handle and resolve a split brain as fast as possible. It's therefore important to be notified when a split brain happens.

    Read the official drbd documentation on how to set up a split brain notification.

    It's also possible to automatically recover from a split brain scenario. For more information, read Automatic split brain recovery policies.

Configure Cluster Framework

  1. [1] Add the NFS drbd devices for SAP system NW1 to the cluster configuration

    Important

    Recent testing revealed situations where netcat stops responding to requests due to a backlog and its limitation of handling only one connection. The netcat resource stops listening to the Azure Load Balancer requests and the floating IP becomes unavailable.
    For existing Pacemaker clusters, we recommended in the past replacing netcat with socat. Currently, we recommend using the azure-lb resource agent, which is part of the package resource-agents, with the following package version requirements:

    • For SLES 12 SP4/SP5, the version must be at least resource-agents-4.3.018.a7fb5035-3.30.1.
    • For SLES 15/15 SP1, the version must be at least resource-agents-4.3.0184.6ee15eb2-4.13.1.

    Note that the change will require brief downtime.
    For existing Pacemaker clusters, if the configuration was already changed to use socat as described in Azure Load-Balancer Detection Hardening, there's no requirement to switch immediately to the azure-lb resource agent.
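
    To check which resource-agents package version is installed on your nodes, you can, for example, query the package directly:

    # Example check of the installed resource-agents package version
    sudo zypper info resource-agents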

    sudo crm configure rsc_defaults resource-stickiness="200"
    
    # Enable maintenance mode
    sudo crm configure property maintenance-mode=true
    
    sudo crm configure primitive drbd_NW1_nfs \
      ocf:linbit:drbd \
      params drbd_resource="NW1-nfs" \
      op monitor interval="15" role="Master" \
      op monitor interval="30" role="Slave"
    
    sudo crm configure ms ms-drbd_NW1_nfs drbd_NW1_nfs \
      meta master-max="1" master-node-max="1" clone-max="2" \
      clone-node-max="1" notify="true" interleave="true"
    
    sudo crm configure primitive fs_NW1_sapmnt \
      ocf:heartbeat:Filesystem \
      params device=/dev/drbd0 \
      directory=/srv/nfs/NW1  \
      fstype=xfs \
      op monitor interval="10s"
    
    sudo crm configure primitive nfsserver systemd:nfs-server \
      op monitor interval="30s"
    sudo crm configure clone cl-nfsserver nfsserver
    
    sudo crm configure primitive exportfs_NW1 \
      ocf:heartbeat:exportfs \
      params directory="/srv/nfs/NW1" \
      options="rw,no_root_squash,crossmnt" clientspec="*" fsid=1 wait_for_leasetime_on_stop=true op monitor interval="30s"
    
    sudo crm configure primitive vip_NW1_nfs IPaddr2 \
      params ip=10.0.0.4 op monitor interval=10 timeout=20
    
    sudo crm configure primitive nc_NW1_nfs azure-lb port=61000 \
      op monitor timeout=20s interval=10
    
    sudo crm configure group g-NW1_nfs \
      fs_NW1_sapmnt exportfs_NW1 nc_NW1_nfs vip_NW1_nfs
    
    sudo crm configure order o-NW1_drbd_before_nfs inf: \
      ms-drbd_NW1_nfs:promote g-NW1_nfs:start
    
    sudo crm configure colocation col-NW1_nfs_on_drbd inf: \
      g-NW1_nfs ms-drbd_NW1_nfs:Master
    
  2. [1] Add the NFS drbd devices for SAP system NW2 to the cluster configuration

    # Enable maintenance mode
    sudo crm configure property maintenance-mode=true
    
    sudo crm configure primitive drbd_NW2_nfs \
      ocf:linbit:drbd \
      params drbd_resource="NW2-nfs" \
      op monitor interval="15" role="Master" \
      op monitor interval="30" role="Slave"
    
    sudo crm configure ms ms-drbd_NW2_nfs drbd_NW2_nfs \
      meta master-max="1" master-node-max="1" clone-max="2" \
      clone-node-max="1" notify="true" interleave="true"
    
    sudo crm configure primitive fs_NW2_sapmnt \
      ocf:heartbeat:Filesystem \
      params device=/dev/drbd1 \
      directory=/srv/nfs/NW2  \
      fstype=xfs \
      op monitor interval="10s"
    
    sudo crm configure primitive exportfs_NW2 \
      ocf:heartbeat:exportfs \
      params directory="/srv/nfs/NW2" \
      options="rw,no_root_squash,crossmnt" clientspec="*" fsid=2 wait_for_leasetime_on_stop=true op monitor interval="30s"
    
    sudo crm configure primitive vip_NW2_nfs IPaddr2 \
      params ip=10.0.0.5 op monitor interval=10 timeout=20
    
    sudo crm configure primitive nc_NW2_nfs azure-lb port=61001 \
      op monitor timeout=20s interval=10
    
    sudo crm configure group g-NW2_nfs \
      fs_NW2_sapmnt exportfs_NW2 nc_NW2_nfs vip_NW2_nfs
    
    sudo crm configure order o-NW2_drbd_before_nfs inf: \
      ms-drbd_NW2_nfs:promote g-NW2_nfs:start
    
    sudo crm configure colocation col-NW2_nfs_on_drbd inf: \
      g-NW2_nfs ms-drbd_NW2_nfs:Master
    

    The crossmnt option in the exportfs cluster resources is present in our documentation for backward compatibility with older SLES versions.

  3. [1] Disable maintenance mode

    sudo crm configure property maintenance-mode=false
    
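    Once maintenance mode is disabled, you can verify that the resources started correctly. The groups g-NW1_nfs and g-NW2_nfs should be running on the nodes where the corresponding drbd devices are promoted.

    # Check the overall cluster and resource state
    sudo crm status
    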

Next steps