

5 Set Up the Hybrid Operating System Cluster Prototype

Applies To: Windows HPC Server 2008

This chapter describes the general setup of the Hybrid Operating System Cluster (HOSC) defined in the previous chapter. The initial idea was to install Windows HPC Server 2008 on an existing Linux cluster without affecting the existing Linux installation. In our case, however, it turned out that the installation procedure requires reinstalling the management node so that it can host the 2 virtual machines. The installation procedure is therefore given for an HOSC installation done from scratch.

5.1 Installation of the management nodes

Installation of the RHEL5.1 host OS with Xen

If you have an already configured XBAS cluster, do not forget to save the clusterdb (XBAS cluster data base) and all your data stored on the current management node before reinstalling it with RHEL5.1 and its virtualization software Xen.

Check that Virtualization Technology (VT) is enabled in the BIOS settings of the server. Install Linux RHEL5.1 from the DVD on the management server and select “virtualization” when optional packages are proposed. SELinux must be disabled.
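
If SELinux was not disabled at installation time, a minimal way to disable it permanently is to edit /etc/selinux/config (the change takes effect after a reboot):

[xbas0:root] sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config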

Erase all existing partitions and design your partition table so that enough free space remains in a volume group for creating logical volumes (LV). LVs are virtual partitions used for the installation of virtual machines (VM), each VM being installed on one LV. Volume groups and logical volumes are managed by the Logical Volume Manager (LVM). The advised size of an LV is 30-50GB: leave at least 100GB of free space on the management server for the creation of the 2 LVs.
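
For example, the 2 LVs could later be created as follows; the volume group name (vg00) and the LV names are only illustrative:

[xbas0:root] lvcreate -L 50G -n lv_xbas0 vg00
[xbas0:root] lvcreate -L 50G -n lv_hpcs0 vg00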

It is advisable to install an up-to-date gigabit driver. One is included on the XBAS 5v1.1 XHPC DVD.

[xbas0:root] rpm -i XHPC/RPMS/e1000-7.6.15.4-2.x86_64.rpm

Creation of 2 virtual machines

A good candidate for the easy management of Xen virtual machines is the Bull Hypernova tool. Hypernova is an internal-to-Bull software environment based on RHEL5.1/Xen3 for the management of virtual machines (VM) on Xeon® and Itanium2® systems. HN-Master is the web graphical interface (see Figure 15) that manages VMs in the Hypernova environment. It can be used to create, delete, install and clone VMs; modify VM properties (network interfaces, number of virtual CPUs, memory, etc.); and start, pause, stop and monitor VMs.

Two virtual machines are needed to install the XBAS management node and the HPCS head node. Create these 2 Xen virtual machines on the management server. The use of HN-Master is optional: all operations done in the Hypernova environment could also be done with Xen commands in a basic Xen environment. To use HN-Master, the httpd service must be started (type “chkconfig --level 35 httpd on” to make it start automatically at boot time).

Figure 15   HN-Master user interface

The following values are used to create each VM:

  • Virtualization mode: Full

  • Startup allocated memory: 2048

  • Virtual CPUs number: 4

    Note

    In case of problems when installing the OSs (e.g., #IRQ disabled while files are copied), select only 1 virtual CPU for the VM during the OS installation step.

  • Virtual CPUs affinity type: mapping

  • Logical Volume size: 50GB

Create 2 network interface bridges, xenbr0 and xenbr1, so that each VM can have 2 virtual network interfaces (one on the private network and one on the public network). Detailed instructions for configuring 2 network interface bridges are shown in Appendix D.2.4.
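
For reference, in a basic Xen environment (without HN-Master) a fully virtualized guest matching the values above could be described by a domain configuration file along the following lines. This is only a sketch: the file name, volume group and LV names are examples, the installation media entry is omitted, and paths such as the device model may differ on your system.

# /etc/xen/xbas0 - illustrative HVM guest definition for the XBAS MN virtual machine
name = "xbas0"
builder = "hvm"
kernel = "/usr/lib/xen/boot/hvmloader"
device_model = "/usr/lib64/xen/bin/qemu-dm"
memory = 2048
vcpus = 4
disk = [ 'phy:/dev/vg00/lv_xbas0,hda,w' ]                       # LV created for this VM
vif = [ 'type=ioemu, bridge=xenbr0', 'type=ioemu, bridge=xenbr1' ]  # private and public bridges
boot = "dc"                                                     # try CD-ROM first, then hard disk

The VM can then be started with “xm create xbas0”.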

Installation of XBAS management node on a VM

Install XBAS on the first virtual machine. If applicable, reuse the clusterdb and the network configuration of the initial management node. Update the clusterdb with the new management node MAC addresses: the xenbr0 and xenbr1 MAC addresses of the VM. Follow the instructions given in the BAS5 for Xeon installation and configuration guide [22], and choose the following options for the MN setup:

[xbas0:root] cd /release/XBAS5V1.1
[xbas0:root] ./install -func MNGT IO LOGIN -prod RHEL XHPC XIB

Update the clusterdb with the new management node MAC addresses (see [22] and [23] for details).

Installation of InfiniBand driver on domain 0

The InfiniBand (IB) drivers and libraries should be installed on domain 0. They are available on the XIB DVD included in the XBAS distribution. Assuming that the content of the XIB DVD has been copied on the XBAS management node (with the IP address 192.168.0.1) in the /release directory, as requested in the installation guide [22], the following commands should be executed:

[xbas0:root] mkdir /release
[xbas0:root] mount 192.168.0.1:/release /release
[xbas0:root] scp root@192.168.0.1:/etc/yum.repos.d/*.repo /etc/yum.repos.d
[xbas0:root] yum install perl-DBI perl-XML-Parser perl-XML-Simple
[xbas0:root] yum install dapl infiniband-diags libibcm libibcommon libibmad libibumad libibverbs libibverbs-utils libmlx4 libmthca librdmacm librdmacm-utils mstflint mthca_fw_update ofed-docs ofed-scripts opensm-libs perftest qperf --disablerepo=local-rhel

Installation of HPCS head node on a VM

Install Windows Server 2008 on the second virtual machine as you would do on any physical server. Then the following instructions should be executed in this order:

  1. set the Head Node (HN) hostname

  2. configure the HN network interfaces

  3. enable remote desktop (this is recommended for a remote administration of the cluster)

  4. set “Internet Time Synchronization” so that the time is the same on the HN and the MN

  5. install the Active Directory (AD) Domain Services and create a new domain for your cluster with the wizard (dcpromo.exe), or configure the access to your existing AD on your local network

  6. install the Microsoft HPC Pack

Preparation for XBAS deployment on compute nodes

Check that there is enough space on the first device of the compute nodes (e.g., /dev/sda) for creating an additional primary partition. If not, make some space by reducing the existing partitions or by redeploying the XBAS compute nodes with the right partitioning (using the preparenfs command and a dedicated kickstart file). Edit the kickstart file according to an HOSC-compatible disk partitioning scheme: for example, /boot on /dev/sda1, / on /dev/sda2 and SWAP on /dev/sda3. An example is given in Appendix D.2.1.
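
The existing partition layout and the disk size of a compute node can be checked, for instance, with a read-only fdisk listing:

[xbas1:root] fdisk -l /dev/sda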

Create a /opt/hosc directory and export it with NFS. Then mount it on every node of the cluster and install in it the HOSC files listed in Appendix D.2.3 (example commands follow the list below):

  • switch_dhcp_host

  • activate_partition_HPCS.sh

  • fdisk_commands.txt

  • from_XBAS_to_HPCS.sh and from_HPCS_to_XBAS.sh
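
A minimal sketch of the NFS export (first three commands, on the XBAS MN) and of the mount on an XBAS CN is given below; the export options are only an example and should follow your site policy:

[xbas0:root] mkdir -p /opt/hosc
[xbas0:root] echo "/opt/hosc *(rw,sync,no_root_squash)" >> /etc/exports
[xbas0:root] exportfs -ra
[xbas1:root] mkdir -p /opt/hosc
[xbas1:root] mount -t nfs xbas0:/opt/hosc /opt/hosc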

Preparation for HPCS deployment on compute nodes

First configure the cluster by following the instructions given in the HPC Cluster Manager MMC to-do list:

  1. Configure your network:

    1. Topology: compute nodes isolated on private and application networks (topology 3)

    2. DHCP server and NAT: not activated on the private interface

    3. Firewall is “off” on private network (this is for the compute nodes only because the firewall needs to be “on” for the head node)

  2. Provide installation credentials

  3. Configure the naming of the nodes (this step is mandatory even though it is not useful in our case: the new node names will be imported from an XML file that we will create later). You can specify: HPCS%1%

  4. Create a deployment template with an operating system image (“Windows Server 2008”)

Bring the HN online in the management console: click on “Bring Online” in the “Node Management” window of the “HPC Cluster Manager” MMC.

Add a recent Gigabit network adapter driver to the OS image that will be deployed: click on “Manage drivers” and add the drivers for Intel PRO/1000 version 13.1.2 or higher (PROVISTAX64_v13_1_2.exe can be downloaded from the Intel web site).

Add a recent IB driver (see [27]) that supports NetworkDirect (ND). Then edit the compute node template and add a “post install command” task that configures the IPoIB IP address and registers ND on the compute nodes. The IPoIB configuration can be done by the setIPoIB.vbs script provided in Appendix D.1.2. The ND registration is done by the command:

C:\> ndinstall -i

Two files used by the installation template must be edited in order to keep existing XBAS partitions untouched on compute nodes while deploying HPCS. For example, choose the fourth partition (/dev/sda4) for the HPCS deployment (see Appendix D.1.1 for more details):

  • unattend.xml

  • diskpart.txt

Create a shared C:\hosc directory and install the HOSC files listed in Appendix D.1.3 in it:

  • activate_partition_XBAS.bat

  • diskpart_commands.txt

  • from_HPCS_to_XBAS.bat

Configuration of services on HPCS head node

Note

Thanks to Christian Terboven (research associate in the HPC group of the Center for Computing and Communication at RWTH Aachen University) for his helpful contribution to this configuration phase.

The DHCP service is disabled on the HPCS head node (it was not activated during the installation step). The firewall must be enabled on the head node for the private network. It must be configured to drop all incoming network packets on local ports 67/UDP and 68/UDP in order to block any DHCP traffic that might be produced by the Windows Deployment Service (WDS). This is done by creating 2 inbound rules from the Server Manager MMC. Click on:

Server Manager → Configuration → Windows Firewall with Advanced Security → Inbound Rules → New Rule

Then select the following options:

Rule type: Port

Protocol and ports: UDP, specific local port 67 (or 68 for the second rule)

Action: Block the connection

Name: UDP/67 blocked (or UDP/68 blocked for the second rule)

Instead of blocking these ports, it is also possible to disable all inbound rules that are enabled by default on UDP ports 67 and 68.
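
Alternatively, the same two blocking rules can be created from the command line with netsh; the rule names below match those suggested above:

C:\> netsh advfirewall firewall add rule name="UDP/67 blocked" dir=in action=block protocol=UDP localport=67
C:\> netsh advfirewall firewall add rule name="UDP/68 blocked" dir=in action=block protocol=UDP localport=68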

5.2 Deployment of the operating systems on the compute nodes

The order in which the OSs are deployed is not important but must be the same on every compute node. The order should thus be decided before starting any node installation or deployment, and the installation scripts (such as diskpart.txt for HPCS or kickstart.<identifier> for XBAS) must be edited to match the chosen order (an illustrative kickstart excerpt follows the table below). In this example, we chose to deploy XBAS first. The partition table we plan to create is:

/dev/sda1   /boot   100MB   ext3   (Linux)
/dev/sda2   /       50GB    ext3   (Linux)
/dev/sda3   SWAP    16GB           (Linux)
/dev/sda4   C:\     50GB    ntfs   (Windows)
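
For the three Linux partitions, the corresponding kickstart excerpt could look like the following sketch (sizes in MB; the actual file is given in Appendix D.2.1, and the fourth, Windows partition is created later by diskpart.txt during the HPCS deployment):

part /boot --fstype ext3 --size=100 --ondisk=sda
part / --fstype ext3 --size=50000 --ondisk=sda
part swap --size=16000 --ondisk=sda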

First, check that the BIOS settings of all CNs are configured for PXE boot (and not local hard disk boot). They should boot on the eth0 Gigabit Ethernet (GE) card. For example, the following settings are correct:

Boot order:

  1. USB key

  2. USB disk

  3. GE card

  4. SATA disk

Deployment of XBAS on compute nodes

Follow the instructions given in the BAS5 for Xeon installation & configuration guide [22]. Here is the information that must be entered into the preparenfs tool in order to generate a kickstart file (on other Linux distributions, the kickstart file could also be written manually with this information):

  • RHEL DVD is copied in: /release/RHEL5.1

  • partitioning method is: automatic (i.e., a predefined partitioning is used)

  • interactive mode is: not used (the installation is unattended)

  • VNC is: not enabled

  • BULL HPC installer is in: /release/XBAS5V1.1

  • node function is: COMPUTEX

  • optional BULL HPC software is: XIB

  • IP of NFS server is the default: 192.168.0.99

  • the nodes to be installed are: xbas[1-4]

  • hard reboot done by preparenfs: No

Once generated, the kickstart file needs a few modifications in order to fulfill the HOSC disk partition requirements: see an example of these modifications in Appendix D.2.1.

When the modifications are done, boot the compute nodes: the PXE boot mechanism installs XBAS on them using the information stored in the kickstart file. Figure 16 shows the console of a CN while it is PXE booting for its XBAS deployment.

Figure 16 - XBAS compute node console while the node starts to PXE boot

It is possible to install every CN with the preparenfs tool, or to install a single CN with preparenfs and then duplicate it on all the other CNs with the help of the ksis deployment tool. However, the use of ksis is only possible if XBAS is the first installed OS, since ksis overwrites all existing partitions. It is therefore advisable to use only the preparenfs tool for CN installation on an HOSC.

Check that the /etc/hosts file is consistent on XBAS CNs (see Appendix D.2.5). Configure the IB interface on each node by editing file ifcfg-ib0 (see Appendix D.2.6) and enable the IB interface by starting the openibd service:

[xbas1:root] service openibd start

In order to be able to boot Linux with the Windows MBR (after having installed HPCS on the CNs), install the GRUB boot loader on the first sector of the /boot partition by typing on each CN:

[xbas1:root] grub-install /dev/sda1

The last step is to edit all PXE files in the /tftpboot directory and set both the TIMEOUT and PROMPT variables to 0 in order to boot the compute nodes more quickly.
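
Assuming the per-node PXE configuration files are plain text files under /tftpboot (the exact file names depend on your XBAS setup), this can be scripted, for example:

[xbas0:root] sed -i -e 's/^PROMPT.*/PROMPT 0/' -e 's/^TIMEOUT.*/TIMEOUT 0/' /tftpboot/<pxe_files>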

Deployment of HPCS on compute nodes

On the XBAS management node, change the DHCP configuration so that the compute nodes point to the Windows WDS server when they PXE boot. Edit the DHCP configuration file /etc/dhcpd.conf and, in each CN host section, change the fields as shown in Appendix D.2.2 (filename, fixed-address, host-name, next-server and server-name). These changes can be made with the switch_dhcp_host script (see Appendix D.2.3) for each compute node. Once the changes are done, the dhcpd service must be restarted for them to take effect. For example, type:

[xbas0:root] switch_dhcp_host xbas1
File /etc/dhcp.conf is updated with host hpcs1
[xbas0:root] switch_dhcp_host xbas2
File /etc/dhcp.conf is updated with host hpcs2
[...]
[xbas0:root] service dhcpd restart
Shutting down dhcpd:  [ OK ]
Starting dhcpd:  [ OK ]
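
For reference, after switching a node to HPCS, its host section in /etc/dhcpd.conf might look like the following sketch; all addresses and names below are illustrative, and the exact template is given in Appendix D.2.2:

host hpcs1 {
    hardware ethernet 00:11:22:33:44:55;   # MAC address of the compute node (example value)
    fixed-address 192.168.1.101;           # IP address of the node under HPCS (example value)
    option host-name "hpcs1";              # Windows host name of the node
    next-server 192.168.1.1;               # WDS server, i.e. the HPCS head node
    server-name "hpcs0";                   # HPCS head node name
    filename "boot\\x64\\wdsnbp.com";      # WDS network boot program
}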

Now prepare the deployment of the nodes for the HPCS management console: get the MAC addresses of all new compute nodes and create an XML file with the MAC address, compute node name and domain name of each node. An example of such an XML file (my_cluster_nodes.xml) is given in Appendix D.1.1. Import this XML file from the administrative console (see Figure 17) and assign a deployment “compute node template” to the nodes.

Figure 17 - Import node XML interface

Boot the compute nodes. Figure 18 shows the console of a CN while it is PXE booting for its HPCS deployment with a DHCP server on the XBAS management node (192.168.0.1) and a WDS server on the HPCS head node (192.168.1.1).

Figure 18 - HPCS compute node console while the node starts to PXE boot

The nodes will appear with the “provisioning” state in the management console as shown in Figure 19.

Figure 19 - Management console showing the “provisioning” compute nodes

After a while the compute node console shows that the installation is complete as in Figure 20.

Figure 20 - Compute node console shows that its installation is complete

At the end of the deployment, the compute node state is “offline” in the management console. The last step is to click on “Bring online” in order to change the state to “online”. The HPCS compute nodes can now be used.

5.3 Linux-Windows interoperability environment

In order to enhance the interoperability between the two management nodes, we set up a Unix/Linux environment on the HPCS head node using the Subsystem for Unix-based Applications (SUA). We also install supplementary SUA tools such as openssh, which are useful for HOSC administration tasks (e.g., ssh can be used to execute commands from one management node on the other in a secure manner).

The installation of SUA is not mandatory for setting up an HOSC, and many of these tools can also be found from other sources, but it is a rather easy and elegant way to obtain a homogeneous HOSC environment: firstly, it provides a lot of Unix tools on Windows systems, and secondly it provides a framework for porting and running Linux applications in a Windows environment.

The installation is done in the following three steps:

1. Installation of the Subsystem for Unix-based Applications (SUA)

The Subsystem for Unix-based Applications (SUA) is part of the Windows Server 2008 distribution. To turn the SUA features on, open the “Server Manager” MMC, select the “Features” section in the left frame of the MMC and click on “Add Features”. Then check the box for “Subsystem for UNIX-based Applications” and click on “Next” and “Install”.

2. Installation of the Utilities and SDK for Unix-based Applications

Download “Utilities and SDK for UNIX-based Applications_AMD64.exe” from Microsoft web site [28]. Run the custom installation and select the following packages in addition to those included in the default installation: “GNU utilities” and “GNU SDK”.

3. Installation of add-on tools

Download the “Power User” add-on bundle available from Interops Systems [29] on the SUA community web site [30]. The provided installer pkg-current-bundleuser60.exe handles all package integration, environment variables and dependencies. Install the bundle on the HPCS head node. This will install, configure and start an openssh server daemon (sshd) on the HPCS HN.

Other tools, such as proprietary compilers, can also be installed in the SUA environment.

5.4 User accounts

Users must have the same login name on all nodes (XBAS and HPCS). As mentioned in Section 4.4, we decided not to use LDAP on our prototype but it is advised to use it on larger clusters.

User home directories should at least be shared on all compute nodes running the same OS: for example, an NFS exported directory /home_nfs/test_user/ on XBAS CNs and a shared CIFS directory C:\Users\test_user\ on HPCS CNs for user test_user.

It is also possible (and even recommended) to have a unique home directory for both OSs by configuring Samba [36] on the XBAS nodes.

5.5 Configuration of ssh

RSA key generation

Generate your RSA keys (or DSA keys, depending on your security policy) on the XBAS MN (see [23]):

[xbas0:root] ssh-keygen -t rsa -N ""

This should also be done for each user account.

RSA key

Configure ssh so that it does not ask for a password when the root user connects from the XBAS MN to the other nodes. For the XBAS CNs (and the HPCS HN if openssh is installed with the SUA), copy the keys (private and public) generated on the XBAS MN.

For example, type:

[xbas0:root] cd /root/.ssh
[xbas0:root] cp id_rsa.pub authorized_keys
[xbas0:root] scp id_rsa id_rsa.pub authorized_keys root@hpcs0:.ssh/
[xbas0:root] scp id_rsa id_rsa.pub authorized_keys root@xbas1:.ssh/

Enter the root password when requested (it will not be requested again afterwards).

This should also be done for each user account.

For copying the RSA key on the HPCS CNs see “Configuration of freeSSHd on HPCS compute nodes” below.

By default, the first time ssh connects to a new host, it checks whether that host's “server” RSA public key (stored in /etc/ssh/) is already known and asks the user to validate the authenticity of this new host. In order to avoid typing the “yes” answer for each node of the cluster, different ssh configurations are possible:

  • The easiest, but less secure, solution is to disable the host key checking in file /etc/ssh/ssh_config by setting: StrictHostKeyChecking no

  • Another way is to merge the RSA public key of all nodes in a file that is copied on each node: the /etc/ssh/ssh_known_hosts file. A trick is to duplicate the same server private key (stored in file /etc/ssh/ssh_host_rsa_key) and thus the same public key (stored in file /etc/ssh/ssh_host_rsa_key.pub) on every node. The generation of the ssh_known_hosts file is then easier since each node has the same public key. An example of such an ssh_known_hosts file is given in Appendix D.2.7.
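
With the second approach, one way to populate the /etc/ssh/ssh_known_hosts file is to scan the nodes' public keys from the XBAS MN (the node list below is an example):

[xbas0:root] ssh-keyscan -t rsa xbas0 xbas1 xbas2 xbas3 xbas4 >> /etc/ssh/ssh_known_hosts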

Installation of freeSSHd on HPCS compute nodes

If you want to use PBS Professional and the OS balancing feature that was developed for our HOSC prototype, an ssh server daemon is required on each compute node. The sshd daemon is already installed by default on the XBAS CNs, and it should be installed on the HPCS CNs: we chose the freeSSHd freeware [34]. Its installation is straightforward: execute freeSSHd.exe, keep all the default values proposed during the installation process and accept to “run FreeSSHd as a system service”.

Configuration of freeSSHd on HPCS compute nodes

In the freeSSHd configuration window:

  1. add the user “root”

    1. select “Authorization: Public key (SSH only)”

    2. select “User can use: Shell”

  2. select “Password authentication: Disabled”

  3. select “Public key authentication: Required”

The configuration is stored in file C:\Program Files (x86)\freeSSHd\FreeSSHDService.ini so you can copy this file on each HPCS CN instead of configuring them one by one with the configuration window. You must modify the IP address field (SSHListenAddress=<CN_IP_address>) in the FreeSSHDService.ini file for each CN. The freeSSHd system service needs to be restarted to take the new configuration into account.

Then finish the setup by copying the RSA key file /root/.ssh/id_rsa.pub from the XBAS MN to file C:\Program Files (x86)\freeSSHd\ssh_authorized_keys\root on the HPCS CNs. Edit this file (C:\Program Files (x86)\freeSSHd\ssh_authorized_keys\root) and remove the @xbas0 string at the end of the file: it should end with the string root instead of root@xbas0.

5.6 Installation of PBS Professional

Note

Thanks to Laurent Aumis (SEMEA GridWorks Technical Manager at ALTAIR France) for his valuable help and expertise in setting up this PBS Professional configuration.

For installing PBS Professional on the HOSC cluster, first install the PBS Professional server on a management node (or at least on a server that shares the same subnet as all the HOSC nodes), then install PBS MOM (Machine Oriented Mini-server) on each CN (HPCS and XBAS). The basic information is given in this Section. For more detailed explanations follow the instructions of the PBS Professional Administrator’s Guide [31].

PBS Professional Server setup

Install PBS server on the XBAS MN: during the installation process, select “PBS Installation: 1. Server, execution and commands” (see [31] for detailed instructions). By default, the MOM (Machine Oriented Mini-server) is installed with the server. Since the MN should not be used as a compute node, stop PBS with “/etc/init.d/pbs stop”, disable the MOM by setting PBS_START_MOM=0 in file /etc/pbs.conf (see Appendix D.3.1) and restart PBS with “/etc/init.d/pbs start”.

If the Windows and Linux nodes do not have unified UIDs/GIDs, you need to set the flag flatuid=true with the qmgr tool; the UID/GID of the PBS server will then be used. Type:

[xbas0:root] qmgr
Qmgr: set server flatuid=True
Qmgr: exit

PBS Professional setup on XBAS compute nodes

Install PBS MOM on the XBAS CNs: during the installation process, select “PBS Installation: 2. Execution only” (see [31]). Add PBS_SCP=/usr/bin/scp in file /etc/pbs.conf (see Appendix D.3.1) and restart PBS MOM with “/etc/init.d/pbs restart”.

It would also be possible to use $usecp in PBS to move files around instead of scp. Samba [36] could be configured on Linux systems to allow the HPCS compute nodes to drop files directly to Linux servers.

PBS Professional setup on HPCS nodes

First, log on to the HPCS HN and create a new user in the cluster domain for PBS administration: pbsadmin. Create an lmhosts file on each HPCS node with the PBS server hostname and IP address (as shown in Appendix D.3.2; an illustrative entry is also given after the list below). Then install PBS Professional on each HPCS node:

  1. select setup type “Execution” (only) on CNs and “Commands” (only) on the HN

  2. enter pbsadmin user password (as defined on the PBS server: on XBAS MN in our case)

  3. enter PBS server hostname (xbas0 in our case)

  4. keep all other default values that are proposed by the PBS installer

  5. reboot the node
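
A minimal lmhosts entry, assuming the PBS server is the XBAS MN xbas0 at 192.168.0.1 (the file is usually located at C:\Windows\System32\drivers\etc\lmhosts; see Appendix D.3.2 for the actual content):

192.168.0.1    xbas0    #PRE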

5.7 Meta-Scheduler queues setup

Create one queue per OS type and set its default_chunk.arch accordingly (it must be consistent with the resources_available.arch field of the nodes). Here is a summary of the PBS Professional configuration on our HOSC prototype; the following is a selection of the most representative information reported by the PBS queue manager (qmgr), and a job submission example is given after the listing:

Qmgr: print server
# Create and define queue windowsq
create queue windowsq
set queue windowsq queue_type = Execution
set queue windowsq default_chunk.arch = windows
set queue windowsq enabled = True
set queue windowsq started = True
# Create and define queue linuxq
create queue linuxq
set queue linuxq queue_type = Execution
set queue linuxq default_chunk.arch = linux
set queue linuxq enabled = True
set queue linuxq started = True
# Set server attributes.
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 60
set server flatuid = True
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 3600
set server license_count = "Avail_Global:0 Avail_Local:1024 Used:0 High_Use:8"
set server eligible_time_enable = False
Qmgr: print node xbas1
# Create and define node xbas1
create node xbas1
set node xbas1 state = free
set node xbas1 resources_available.arch = linux
set node xbas1 resources_available.host = xbas1
set node xbas1 resources_available.mem = 16440160kb
set node xbas1 resources_available.ncpus = 4
set node xbas1 resources_available.vnode = xbas1
set node xbas1 resv_enable = True
set node xbas1 sharing = default_shared
Qmgr: print node hpcs2
# Create and define node hpcs2
create node hpcs2
set node hpcs2 state = free
set node hpcs2 resources_available.arch = windows
set node hpcs2 resources_available.host = hpcs2
set node hpcs2 resources_available.mem = 16775252kb
set node hpcs2 resources_available.ncpus = 4
set node hpcs2 resources_available.vnode = hpcs2
set node hpcs2 resv_enable = True
set node hpcs2 sharing = default_shared
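
With this configuration, a job is routed to nodes running the requested OS simply by submitting it to the corresponding queue. The job script names and resource requests below are only examples, submitted by the example user test_user:

[xbas0:test_user] qsub -q linuxq -l select=2:ncpus=4 my_linux_job.sh
[xbas0:test_user] qsub -q windowsq -l select=1:ncpus=4 my_windows_job.bat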

Just in time provisioning setup

This paragraph describes the implementation of a simple example of “just in time” provisioning (see Section 3.6). We developed a Perl script (see pbs_hosc_os_balancing.pl in Appendix D.3.2) that gets PBS server information about queues, jobs and nodes for both OSs (e.g., number of free nodes, number of nodes requested by jobs in queues, number of nodes requested by the smallest job). Based on this information, the script checks a simple rule that defines the cases when the OS type of CNs should be switched. If the rule is “true” then the script selects free CNs and switches their OS type. In our example, we defined a conservative rule (i.e., the number of automatic OS switches is kept low):

“Let us define η as the smallest number of nodes requested by a queued job for a given OS type A. Let us define α (respectively β) as the number of free nodes with the OS type A (respectively B). If η>α (i.e., there are not enough free nodes to run the submitted job with OS type A) and if β≥η-α (at least η-α nodes are free with the OS type B) then the OS type of η-α nodes should be switched from B to A”.
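
The core of this rule can be sketched in Perl as follows; $eta, $alpha and $beta are assumed to have already been extracted from the PBS server information, and switch_os_type is a hypothetical helper wrapping the switch scripts of Appendix D.2.3 (the actual implementation is pbs_hosc_os_balancing.pl in Appendix D.3.2):

# sketch of the OS balancing rule (illustrative, not the actual script)
if ($eta > $alpha && $beta >= $eta - $alpha) {
    my $n = $eta - $alpha;    # number of free nodes whose OS type must be switched
    switch_os_type($n);       # hypothetical helper: switch $n free nodes from OS type B to A
}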

The script is run periodically based on the schedule defined by the crontab of the PBS server host. The administrator can also switch more OSs manually if necessary at any time (see Sections 6.3 and 6.4). The crontab setup can be done by editing the following lines with the crontab command.

Note

“crontab -e” opens the /var/spool/cron/root file in a vi mode and restarts the cron service automatically.

[xbas0:root] crontab -e
# run HOSC Operating System balancing script every 10 minutes (noted */10)
*/10 * * * * /opt/hosc/pbs_hosc_os_balancing.pl

The OS distribution balancing is then controlled by this cron job. Instead of running the pbs_hosc_os_balancing.pl script as a cron job, it would also be possible to call it as an external scheduling resource sensor (see [31] for information about PBS Professional scheduling resources), or to call it with PBS Professional hooks (see [31]). For developing complex OS balancing rules, the Perl script could be replaced by a C program (for details about PBS Professional API see [33]).

This simple script could be further developed in order to be more reliable. For example:

  • check that the script is only run once at a time (by setting a lock file for example)

  • allow switching the OS type of more than η-α nodes at once if the number of free nodes and the number of queued jobs are high (this can happen when many small jobs are submitted)

  • impose a delay between two possible switches of OS type on each compute node

Calendar provisioning setup

This paragraph just gives the main ideas for setting up calendar provisioning (see Section 3.6). As for the previous provisioning example, the setup should rely on the cron mechanism. A script that switches the OS type of a given number of compute nodes could easily be written (by slightly modifying the scripts provided in the Appendix of this paper). This script could be run hourly as a cron job and could read the requested number of nodes for each OS type from a configuration file written by the administrator.
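
A possible crontab entry is sketched below; the set_os_distribution.pl script and its configuration file are hypothetical and correspond to the script described above:

# switch compute nodes every hour according to the calendar in os_calendar.conf
0 * * * * /opt/hosc/set_os_distribution.pl /opt/hosc/os_calendar.conf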