Appendix D: Files Used in Examples
Here are the files (scripts, configuration files, etc.) written or modified to build the HOSC prototype and to validate information given in this document.
D.1 Windows HPC Server 2008 files
The first two files are used by the deployment template and must be modified to meet the HOSC requirements. The third file, an XML file, is used to deploy the template to CNs identified by their MAC addresses.
D.1.1 Files used for compute node deployment
C:\Program Files\Microsoft HPC Pack\Data\InstallShare\unattend.xml
C:\Program Files\Microsoft HPC Pack\Data\InstallShare\Config\diskpart.txt
my_cluster_nodes.xml
<?xml version="1.0" encoding="utf-8"?>
<Nodes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/HpcNodeConfigurationFile/2007/12">
<Node Name="hpcs1" Domain="WINISV">
<MacAddress>003048334cf6</MacAddress>
</Node>
<Node Name="hpcs2" Domain="WINISV">
<MacAddress>003048334d04</MacAddress>
</Node>
<Node Name="hpcs3" Domain="WINISV">
<MacAddress>003048334d3c</MacAddress>
</Node>
<Node Name="hpcs4" Domain="WINISV">
<MacAddress>003048347990</MacAddress>
</Node>
</Nodes>
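For clusters with many nodes, a file of this shape can be generated rather than typed by hand. The following is a minimal sketch, not part of the HOSC prototype: the function name build_node_file and its parameters are hypothetical, and the namespace argument is assumed to be the xmlns value of the Nodes element in the listing above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_node_file(nodes, domain, namespace):
    """Return node-file XML text for a {hostname: MAC address} mapping.

    `namespace` is the default xmlns value to put on the <Nodes> element.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>',
             '<Nodes xmlns=%s>' % quoteattr(namespace)]
    for name, mac in sorted(nodes.items()):
        # The listing writes MAC addresses as 12 hex digits without separators.
        digits = mac.replace(":", "").replace("-", "").lower()
        lines.append('<Node Name=%s Domain=%s>'
                     % (quoteattr(name), quoteattr(domain)))
        lines.append('<MacAddress>%s</MacAddress>' % escape(digits))
        lines.append('</Node>')
    lines.append('</Nodes>')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "urn:example" is a placeholder; pass the real xmlns value instead.
    print(build_node_file({"hpcs1": "00:30:48:33:4c:f6"}, "WINISV", "urn:example"))
```

The xsi and xsd prefix declarations of the original file are omitted here because no element in the listing uses them.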
D.1.2 Script for IPoIB setup
setIPoIB.vbs
set objargs=wscript.arguments
Set fs=CreateObject("Scripting.FileSystemObject")
Set WshNetwork = WScript.CreateObject("WScript.Network")
wscript.sleep(10000)
hostname=WshNetwork.ComputerName
ip=GetIP(hostname)
Set logFile = fs.opentextfile("c:\netconfig.log",8,True)
WScript.Echo "Computername: " & hostname
WScript.Echo "IP: " & ip
logfile.writeline("Computername: " & hostname)
logfile.writeline("IP: " & ip)
res=setIPoIB(ip)
logfile.writeline(res)
wscript.echo res
'-------------------------------------------------------------------------
Function GetIP(hostname)
    ' Default result, returned if no "Address" line is found in the netsh output
    GetIP = "could not resolve IP address"
    set sh = createobject("wscript.shell")
    set fso = createobject("scripting.filesystemobject")
    workfile = "c:\PrivateIPadress.txt"
    sh.run "%comspec% /c netsh interface ip show addresses private > " & workfile,0,true
    Set ts = fso.opentextfile(workfile)
    data = split(ts.readall,vbcr)
    ts.close
    fso.deletefile workfile
    for n = 0 to ubound(data)
        if instr(data(n),"Address") then
            parts = split(data(n),":")
            GetIP= trim(cstr(parts(1)))
        end if
    Next
End Function
'---------------------------------------------------------------------
Function setIPoIB(IPAddress)
    PartialIP=Split(ipaddress,".")
    strIPAddress = Array("10.1.0." & PartialIP(3))
    strSubnetMask = Array("255.255.255.0")
    strGatewayMetric = Array(1)
    WScript.Echo "IB: " & strIPAddress(0)
    strComputer = "."
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colNetAdapters = objWMIService.ExecQuery _
        ("select * from win32_networkadapterconfiguration where IPEnabled=true and description like 'Mellanox%'")
    For Each objNetAdapter in colNetAdapters
        errEnable = objNetAdapter.EnableStatic(strIPAddress, strSubnetMask)
        If errEnable = 0 Then
            SetIPoIB="The IP address on Infiniband has been changed"
        Else
            SetIPoIB="The IP address on IB could not be changed. Error: " & errEnable
        End If
    Next
End Function
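The address logic of setIPoIB.vbs is easier to follow when separated from the WMI calls. The following sketch restates it in Python for illustration only: the function names are hypothetical, and the sample netsh line in the usage part is an assumed output format, not a captured one. As in the script, the IPoIB address keeps the last octet of the private address and uses the prefix 10.1.0.

```python
def extract_ip(netsh_output):
    """Return the text after ':' on the last line mentioning 'Address', or None."""
    found = None
    for line in netsh_output.splitlines():
        if "Address" in line and ":" in line:
            found = line.split(":", 1)[1].strip()
    return found


def ipoib_address(private_ip, ib_prefix="10.1.0."):
    """Build the IPoIB address from the last octet of the private address."""
    octets = private_ip.strip().split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % private_ip)
    return ib_prefix + octets[3]


if __name__ == "__main__":
    # Assumed shape of one line of netsh output (illustrative only).
    sample = "    IP Address:                           192.168.1.2\n"
    print(ipoib_address(extract_ip(sample)))
```

Validating the address before splitting it avoids the index error that the script would hit if the private address could not be resolved.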
D.1.3 Scripts used for OS switch
Here are the scripts developed on the HPCS head node to switch the OS type of a compute node from HPCS to XBAS:
C:\hosc\activate_partition_XBAS.bat
@echo off
rem the argument is the head node hostname for shared file system mount. For example: \\HPCS0
echo ... Partitioning disk...
diskpart.exe /s %1\hosc\diskpart_commands.txt
echo ... Shutting down node %COMPUTERNAME% ...
shutdown /r /f /t 20 /d p:2:4
C:\hosc\diskpart_commands.txt
select disk 0
select partition 1
active
C:\hosc\from_HPCS_to_XBAS.bat
@echo off
rem the argument is the node hostname. For example: hpcs1
echo Check that file dhcpd.conf is updated on the XBAS management node !
if NOT "%1"=="" clusrun /nodes:%1 %LOGONSERVER%\hosc\activate_partition_XBAS.bat %LOGONSERVER%
if "%1"=="" echo "usage: from_HPCS_to_XBAS.bat <hpcs_hostname>"
D.2 XBAS files
D.2.1 Kickstart and PXE files
Here is an example of the modifications that must be made to the kickstart file generated by the preparenfs tool in order to meet the HOSC requirements:
/release/ks/kickstart.<identifier> (for example kickstart.22038)
…
part / --asprimary --fstype="ext3" --ondisk=sda --size=10000
part /usr --asprimary --fstype="ext3" --ondisk=sda --size=10000
part /opt --fstype="ext3" --ondisk=sda --size=10000
part /tmp --fstype="ext3" --ondisk=sda --size=10000
…
part /boot --asprimary --fstype="ext3" --ondisk=sda --size=100
part / --asprimary --fstype="ext3" --ondisk=sda --size=50000
Here is an example of a PXE file generated by preparenfs for node xbas1. Before deployment, the DEFAULT label is set to ks; after deployment, it is automatically set to local_primary.
/tftpboot/C0A80002 (complete file before compute node deployment)
# GENERATED BY PREPARENFS SCRIPT
TIMEOUT 10
DEFAULT ks
PROMPT 1
LABEL local_primary
KERNEL chain.c32
APPEND hd0
LABEL ks
KERNEL RHEL5.1/vmlinuz
APPEND console=tty0 console=ttyS1,115200 ksdevice=eth0 lang=en ip=dhcp ks=nfs:192.168.0.99:/release/ks/kickstart.22038 initrd=RHEL5.1/initrd.img driverload=igb
LABEL rescue
KERNEL RHEL5.1/vmlinuz
APPEND console=ttyS1,115200 ksdevice=eth0 lang=en ip=dhcp method=nfs:192.168.0.99:/release/RHEL5.1 initrd=RHEL5.1/initrd.img rescue driverload=igb
/tftpboot/C0A80002 (head of the file after compute node deployment)
# GENERATED BY PREPARENFS SCRIPT
TIMEOUT 10
DEFAULT local_primary
PROMPT 1
The remainder of the file is unchanged. Set TIMEOUT and PROMPT to 0 in order to boot nodes more quickly.
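The file name C0A80002 follows the PXELINUX convention of naming a per-host configuration file after the client IP address written in uppercase hexadecimal: 192.168.0.2, the address of xbas1, gives C0 A8 00 02. A small sketch of the conversion, with a hypothetical helper name, can be used to check which file a given node will load:

```python
def pxe_config_name(ip):
    """Return the uppercase hexadecimal PXELINUX file name for an IPv4 address."""
    parts = ip.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    # Each octet becomes two hexadecimal digits, most significant octet first.
    return "".join("%02X" % int(p) for p in parts)


if __name__ == "__main__":
    print(pxe_config_name("192.168.0.2"))  # prints C0A80002
```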
D.2.2 DHCP configuration
The initial DHCP configuration file must be changed for HPCS CN deployment: the global next-server field must be deleted and each CN host section must be modified as shown below:
/etc/dhcpd.conf
next-server 192.168.0.99;
########### END GLOBAL PARAMETERS
subnet 192.168.0.0 netmask 255.255.0.0{
authoritative;
host xbas1 {
filename "pxelinux.0";
fixed-address 192.168.0.2;
hardware ethernet
# global “next-server” entry is removed.
########### END GLOBAL PARAMETERS
subnet 192.168.0.0 netmask 255.255.0.0{
authoritative;
host hpcs1 {
filename "Boot\\x64\\WdsNbp.com";
fixed-address 192.168.1.2;
hardware ethernet 00:30:48:33:4c:f6;
option host-name "hpcs1";
next-server 192.168.1.1;
Note
This modification can be done by the switch_dhcp_host script.
The NBP file path must be written with double \\ in order to be correctly interpreted during the PXE boot.
D.2.3 Scripts used for OS switch
Here are the scripts developed on the XBAS management node to switch the OS of a compute node:
/opt/hosc/switch_dhcp_host
#!/usr/bin/python -t
import os, os.path, sys
############## Cluster characteristics must be written here ################
xbas_hostname_base='xbas'
hpcs_hostname_base='hpcs'
field_dict = {hpcs_hostname_base:{'filename':'"Boot\\\\x64\\\\WdsNbp.com";\n',
'fixed-address':'192.168.1.',
'next-server':'192.168.1.1;\n',
'server-name':'"192.168.1.1";\n'},
xbas_hostname_base:{'filename':'"pxelinux.0";\n',
'fixed-address':'192.168.0.',
'next-server':'192.168.0.1;\n',
'server-name':'"192.168.0.1";\n'}}
if (len(sys.argv) <> 2):
    print ('usage: switch_dhcp_host <current compute node hostname>')
    sys.exit(1)
elif (len(str(sys.argv[1]))>1) and (str(sys.argv[1])[-2:].isdigit()):
    node_base = str(sys.argv[1])[:-2]
    node_rank = str(sys.argv[1])[-2:]
else:
    node_base = str(sys.argv[1])[:-1]
    node_rank = str(sys.argv[1])[-1:]
if (node_base == xbas_hostname_base ):
    old_hostname= xbas_hostname_base + node_rank
    new_hostname=hpcs_hostname_base + node_rank
    new_node_base = hpcs_hostname_base
elif (node_base == hpcs_hostname_base):
    old_hostname=hpcs_hostname_base + node_rank
    new_hostname= xbas_hostname_base + node_rank
    new_node_base = xbas_hostname_base
else:
    print ('unknown hostname: ' + sys.argv[1])
    sys.exit(1)
file_name = '/etc/dhcpd.conf'
if not os.path.isfile(file_name):
    print file_name + ' does not exist!'
    sys.exit(1)
status = 'File ' + file_name + ' was not modified'
file_name_save = file_name + '.save'
file_name_temp = file_name + '.temp'
old_file = open(file_name,'r')
new_file = open(file_name_temp,'w')
S = old_file.readline()
while S:
    if (S[0:11] == 'next-server'): S = old_file.readline() # Removes global next-server line
    if (S.find('host ' + old_hostname) <> -1):
        while (S.find('hardware ethernet') == -1):
            S = old_file.readline() # Skips old host section lines
        hardware_ethernet=S.split()[2] # Gets host Mac address
        while (S.find('}') == -1):
            S = old_file.readline() # Skips old host section lines
        # Writes new host section lines:
        new_file.write(' host ' + new_hostname + ' {\n')
        new_file.write(' filename ' + field_dict[new_node_base]['filename'])
        new_file.write(' fixed-address ' + field_dict[new_node_base]['fixed-address']
                       + str(int(node_rank)+1) + ';\n')
        new_file.write(' hardware ethernet ' + hardware_ethernet + '\n')
        new_file.write(' option host-name ' + '"' + new_hostname + '";\n')
        new_file.write(' next-server ' + field_dict[new_node_base]['next-server'])
        new_file.write(' server-name ' + field_dict[new_node_base]['server-name'])
        if (new_node_base == hpcs_hostname_base):
            new_file.write('option domain-name-servers '+field_dict[new_node_base]['next-server'])
        new_file.write(' }\n')
        status = 'File ' + file_name + ' is updated with host ' + new_hostname
    else: new_file.write(S) # Copies the line from the original file without modifications
    S = old_file.readline()
# End while loop
old_file.close()
new_file.close()
if os.path.isfile(file_name_save): os.remove(file_name_save)
os.rename(file_name,file_name_save)
os.rename(file_name_temp,file_name)
print status
print ('Do not forget to validate changes by typing: service dhcpd restart')
sys.exit(0)
# End of switch_dhcp_host script
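The naming and addressing convention that switch_dhcp_host relies on can be isolated from the file handling. The sketch below is an illustration in current Python, not a replacement for the script: the function names and the SUBNETS table are hypothetical, while the rules themselves (one or two trailing digits form the node rank, and the last address octet is the rank plus one) are taken from the script above.

```python
# Address prefixes used by the script for each hostname base.
SUBNETS = {"xbas": "192.168.0.", "hpcs": "192.168.1."}


def split_hostname(hostname):
    """Split a hostname into (base, rank), taking two trailing digits if present."""
    if len(hostname) > 1 and hostname[-2:].isdigit():
        return hostname[:-2], hostname[-2:]
    return hostname[:-1], hostname[-1:]


def switched_host(hostname):
    """Return (new hostname, new fixed address) after an OS switch."""
    base, rank = split_hostname(hostname)
    if base not in SUBNETS:
        raise ValueError("unknown hostname: %s" % hostname)
    new_base = "hpcs" if base == "xbas" else "xbas"
    # The last octet is the node rank plus one, as in the script.
    return new_base + rank, SUBNETS[new_base] + str(int(rank) + 1)


if __name__ == "__main__":
    print(switched_host("xbas1"))
    print(switched_host("hpcs12"))
```

For example, xbas1 maps to hpcs1 with address 192.168.1.2, which matches the hpcs1 host section shown in D.2.2.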
/opt/hosc/activate_partition_HPCS.sh
#!/bin/sh
#the argument is the node hostname. For example: xbas1
ssh $1 fdisk /dev/sda < /opt/hosc/fdisk_commands.txt
/opt/hosc/fdisk_commands.txt
a
4
a
1
w
q
/opt/hosc/from_XBAS_to_HPCS.sh
#!/bin/sh
#the argument is the node hostname. For example: xbas1
/opt/hosc/switch_dhcp_host $1
/sbin/service dhcpd restart
/opt/hosc/activate_partition_HPCS.sh $1
ssh $1 shutdown -r -t 20 now
/opt/hosc/from_HPCS_to_XBAS.sh
#!/bin/sh
#this script requires a ssh server daemon to be installed on the HPCS compute nodes
#the argument is the compute node hostname. For example: hpcs1
#HPCS head node hostname is hard coded in this script as: hpcs0
/opt/hosc/switch_dhcp_host $1
/sbin/service dhcpd restart
ssh $1 -l root cmd /c \\\\hpcs0\\hosc\\activate_partition_XBAS.bat \\\\hpcs0
D.2.4 Network interface bridge configuration
To configure two network interface bridges, xenbr0 and xenbr1, edit the file:
/etc/xen/xend-config.sxp
and replace the line:
(network-script network-bridge)
with the line:
(network-script my-network-bridges)
Then create file:
/etc/xen/scripts/my-network-bridges
#!/bin/bash
XENDIR="/etc/xen/scripts"
$XENDIR/network-bridge "$@" netdev=eth0 bridge=xenbr0 vifnum=0
$XENDIR/network-bridge "$@" netdev=eth1 bridge=xenbr1 vifnum=1
D.2.5 Network hosts
The hosts file declares the IP addresses of the network interfaces of the Linux nodes. All XBAS CNs need to have the same hosts file. Here is an example for our HOSC cluster:
/etc/hosts
127.0.0.1     localhost.localdomain localhost
192.168.0.1   xbas0
192.168.0.2   xbas1
192.168.0.3   xbas2
192.168.0.4   xbas3
192.168.0.5   xbas4
172.16.0.1    xbas0-ic0
172.16.0.2    xbas1-ic0
172.16.0.3    xbas2-ic0
172.16.0.4    xbas3-ic0
172.16.0.5    xbas4-ic0
D.2.6 IB network interface configuration
For configuring the IB interface on each node, create/edit the following file with the right IP address. Here is an example for the compute node xbas1:
/etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
ONBOOT=yes
BOOTPROTO=static
NETWORK=192.168.220.0
IPADDR=192.168.220.2
D.2.7 ssh host configuration
/etc/ssh/ssh_known_hosts
xbas0,192.168.0.1 ssh-rsa AAAB3NzaC1yc2EAAABIwAAAQE/yiPG/x5gl+dq5XXhffF456fggDFt … lC92dxQUE5qQ==
xbas1,192.168.0.2 ssh-rsa AAAB3NzaC1yc2EAAABIwAAAQE/yiPG/x5gl+dq5XXhffF456fggDFt … lC92dxQUE5qQ==
xbas2,192.168.0.3 ssh-rsa AAAB3NzaC1yc2EAAABIwAAAQE/yiPG/x5gl+dq5XXhffF456fggDFt … lC92dxQUE5qQ==
xbas3,192.168.0.4 ssh-rsa AAAB3NzaC1yc2EAAABIwAAAQE/yiPG/x5gl+dq5XXhffF456fggDFt … lC92dxQUE5qQ==
xbas4,192.168.0.5 ssh-rsa AAAB3NzaC1yc2EAAABIwAAAQE/yiPG/x5gl+dq5XXhffF456fggDFt … lC92dxQUE5qQ==
D.3 Meta-scheduler setup files
D.3.1 PBS Professional configuration files on XBAS
Here is an example of a PBS Professional configuration file for the PBS server on the XBAS MN:
/etc/pbs.conf
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=0
PBS_START_SCHED=1
PBS_SERVER=xbas0
PBS_SCP=/usr/bin/scp
Here is an example of a PBS Professional configuration file for the PBS MOM on the XBAS CNs:
/etc/pbs.conf
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=0
PBS_START_MOM=1
PBS_START_SCHED=0
PBS_SERVER=xbas0
PBS_SCP=/usr/bin/scp
D.3.2 PBS Professional configuration files on HPCS
Here is an example of the lmhosts file needed on HPCS nodes:
C:\Windows\System32\drivers\etc\lmhosts
D.3.3 OS load balancing files
This script gets information from the PBS server and switches the OS type of compute nodes according to the rule defined in Section 5.7:
“Let us define η as the smallest number of nodes requested by a queued job for a given OS type A. Let us define α (respectively β) as the number of free nodes with the OS type A (respectively B). If η>α (i.e., there are not enough free nodes to run the submitted job with OS type A) and if β≥η-α (at least η-α nodes are free with the OS type B) then the OS type of η-α nodes should be switched from B to A”.
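Before reading the Perl script, it may help to see the rule on its own. The following minimal sketch expresses it as a pure function; the name nodes_to_switch is hypothetical, and the arguments eta, alpha and beta stand for η, α and β as defined in the quotation.

```python
def nodes_to_switch(eta, alpha, beta):
    """Return how many nodes should be switched from OS type B to OS type A.

    eta:   smallest number of nodes requested by a queued job for OS type A
    alpha: number of free nodes with OS type A
    beta:  number of free nodes with OS type B
    """
    missing = eta - alpha
    # Switch only if the job cannot run now and B has enough free nodes.
    if eta > alpha and beta >= missing:
        return missing
    return 0


if __name__ == "__main__":
    print(nodes_to_switch(4, 1, 5))  # 3 nodes are missing and 5 are free on B
    print(nodes_to_switch(4, 1, 2))  # not enough free nodes on B, so 0
```

The function returns 0 both when the job can already run (η ≤ α) and when OS type B cannot supply the η-α missing nodes, so in either case no node is switched.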
/opt/hosc/pbs_hosc_os_balancing.pl
#!/usr/bin/perl
#use strict;
#Gets information with pbsnodes about free nodes
$command_pbsnodes = "/usr/pbs/bin/pbsnodes -a |";
open (PBSC, $command_pbsnodes ) or die "Failed to run command: $command_pbsnodes";
@cmd_output = <PBSC>;
close (PBSC);
foreach $line (@cmd_output) {
    if (($line !~ /^(\s+)\w+/) && ($line !~ /^(\s+)$/) &&($line =~ /^(.*)\s+/)) {
        $nodename = $1;
        push (@pbsnodelist, $nodename);
        $pbsnodes->{$nodename}->{state} = 'unknown';
        $pbsnodes->{$nodename}->{arch} = 'unknown';
    } elsif ($line =~ "state") {
        $pbsnodes->{$nodename}->{state} = (split(' ', $line))[2];
    } elsif ($line =~ "arch") {
        $pbsnodes->{$nodename}->{arch} = (split(' ', $line))[2];
    }
}
foreach my $node (@pbsnodelist) {
    if ($pbsnodes->{$node}->{state}=~"free") {
        if ($pbsnodes->{$node}->{arch}=~"linux") {
            push (@free_linux_nodes, $node);
        } else {
            push (@free_windows_nodes, $node);
        }
    }
}
#Gets information with qstat about the number of nodes requested by queued jobs
$command_qstat = "/usr/pbs/bin/qstat -a |";
open (PBSC, $command_qstat ) or die "Failed to run command: $command_qstat";
@cmd_output = <PBSC>;
close (PBSC);
$nb_windows_nodes_of_smallest_job = 1e09;
$nb_linux_nodes_of_smallest_job = 1e09;
foreach $line (@cmd_output) {
    if ((split(' ', $line))[9] =~ "Q") {
        $nb_nodes = (split(' ', $line))[5];
        if ($line =~ "windowsq") {
            $nb_windows_nodes_queued += $nb_nodes;
            if ($nb_nodes < $nb_windows_nodes_of_smallest_job) {
                $nb_windows_nodes_of_smallest_job = $nb_nodes;
            }
        } elsif ($line =~ "linuxq") {
            $nb_linux_nodes_queued += $nb_nodes;
            if ($nb_nodes < $nb_linux_nodes_of_smallest_job) {
                $nb_linux_nodes_of_smallest_job = $nb_nodes;
            }
        }
    }
}
#STDOUT is redirected to a LOG file
open LOG, ">>/tmp/pbs_hosc_log.txt";
select LOG;
#Compute the number of possible requested nodes whose OS type should be switched
$requested_windows_nodes = $nb_windows_nodes_of_smallest_job - scalar @free_windows_nodes;
$requested_linux_nodes = $nb_linux_nodes_of_smallest_job - scalar @free_linux_nodes;
#The decision rule based on previous information is applied
if (($nb_windows_nodes_of_smallest_job > scalar @free_windows_nodes) &&
    (scalar @free_linux_nodes >= $requested_windows_nodes)){
    #switch $requested_windows_nodes nodes from XBAS to HPCS
    for ($i = 0; $i < $requested_windows_nodes; $i++) {
        $command_offline = "/usr/pbs/bin/pbsnodes -o $free_linux_nodes[$i]";
        system ($command_offline);
        $command_switch_to_HPCS = "/opt/hosc/from_XBAS_to_HPCS.sh $free_linux_nodes[$i]";
        system ($command_switch_to_HPCS);
        ($new_node = $free_linux_nodes[$i]) =~ s/xbas/hpcs/;
        $command_online = "/usr/pbs/bin/pbsnodes -c $new_node";
        system ($command_online);
        print "switch OS type from XBAS to HPCS: $free_linux_nodes[$i] -> $new_node\n";
    }
} elsif (($nb_linux_nodes_of_smallest_job > scalar @free_linux_nodes) &&
         (scalar @free_windows_nodes >= $requested_linux_nodes)) {
    #switch $requested_linux_nodes nodes from HPCS to XBAS
    for ($i = 0; $i < $requested_linux_nodes; $i++) {
        $command_offline = "/usr/pbs/bin/pbsnodes -o $free_windows_nodes[$i]";
        system ($command_offline);
        $command_switch_to_XBAS= "/opt/hosc/from_HPCS_to_XBAS.sh $free_windows_nodes[$i]";
        system ($command_switch_to_XBAS);
        ($new_node = $free_windows_nodes[$i]) =~ s/hpcs/xbas/;
        $command_online = "/usr/pbs/bin/pbsnodes -c $new_node";
        system ($command_online);
        print "switch OS type from HPCS to XBAS: $free_windows_nodes[$i] -> $new_node\n";
    }
}
close LOG;
The above script is run every 10 minutes, as defined by the crontab file:
/var/spool/cron/root
# run HOSC Operating System balancing script every 10 minutes (noted */10)
*/10 * * * * /opt/hosc/pbs_hosc_os_balancing.pl