
SLURM Multinode on Azure

Launch Login Node

Prepare User Data

When launching a login node, it is worth considering which user data options to provide. User data is not required, but it allows powerful customisation at launch that can further streamline the cluster build process.

Several options can be added to change how the login node handles the compute nodes that contact it on startup.

  • Sharing public ssh key to clients:
    • Instead of manually obtaining and sharing the root public SSH key (passwordless root SSH is required for flight profile), add the line SHAREPUBKEY=true to share it over the local network.
  • Add an auth key:
    • Add the line AUTH_KEY=<string>. The login node will then only accept incoming flight hunter nodes that provide a matching authorisation key.
An example cloud-init script containing all of the lines mentioned above:
#cloud-config
write_files:
  - content: |
      SHAREPUBKEY=true
      AUTH_KEY=banana
    path: /opt/flight/cloudinit.in
    permissions: '0600'
    owner: root:root
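
The same user data can also be generated with a small script, which helps keep the AUTH_KEY consistent between the login and compute nodes. This is a minimal bash sketch: the function name make_login_userdata is hypothetical, while the SHAREPUBKEY and AUTH_KEY lines and the /opt/flight/cloudinit.in path are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print login-node cloud-init user data.
# SHAREPUBKEY, AUTH_KEY and /opt/flight/cloudinit.in come from the example above.
make_login_userdata() {
  local auth_key="$1"   # must match the AUTH_KEY later given to compute nodes
  if [ -z "$auth_key" ]; then
    echo "usage: make_login_userdata <auth-key>" >&2
    return 1
  fi
  cat <<EOF
#cloud-config
write_files:
  - content: |
      SHAREPUBKEY=true
      AUTH_KEY=${auth_key}
    path: /opt/flight/cloudinit.in
    permissions: '0600'
    owner: root:root
EOF
}
```

For example, make_login_userdata banana > login-userdata.yaml writes the script shown above; its contents can then be pasted into the Custom data text box when deploying.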

Info

More information on the user data options available for Flight Solo can be found in the user data documentation.

Deploy

To set up a cluster, you will need to import a Flight Solo image.

  1. Go to the Microsoft Azure portal.

  2. Go to Virtual Machines, and click "Create".

  3. Select "Azure virtual machine", which loads the Basics page.

  4. On the Basics page:

    1. Set Subscription to the subscription you want to use.
    2. Set Resource Group to your desired resource group (the VM will be kept in this group after creation).
    3. Set Virtual machine name to any suitable name (hyphens, "-", do not work in the name).
    4. Set Image to the imported Flight Solo Image.
      1. It may be necessary to open the drop-down and/or see all images in order to find the imported image.
      2. Scroll down to see more options.
    5. Set Size to your choice of size.
    6. Set Authentication type to SSH public key.
    7. Set Username to any suitable username.
    8. Set SSH public key source to the most suitable option, and note which key was used, as any compute nodes created later should use the same key.
    9. Fill in the Key pair name/Stored key/Use existing key as appropriate to the chosen public key source.
    10. Allow traffic to selected ports, and select SSH (22), HTTP (80) and HTTPS (443) as the allowed ports.
    11. Set the most appropriate license type.
  5. On the next page, Disks, all necessary details should already be filled in, so this page can be skipped unless you want to change something. However, it is recommended to select Delete with VM, so that the disk is removed when the VM is deleted.

  6. Go to the Networking tab and fill in the necessary options:

    1. Set Virtual network to an existing network, or create a new one by pressing "Create new" and setting a name. Note which network you chose, as the compute nodes must use the same one.
    2. Set Subnet to one of the options in the drop-down menu, if it isn't already set. Note which subnet you chose, as the compute nodes must use the same one.
    3. Set Public IP to an existing public IP or create a new one by pressing "Create new" and setting a name.
    4. Set NIC network security group to "Advanced", and press "Create new" to create a new security group.
      1. Click "Add an inbound rule" to open the inbound rule creator.
      2. Create rules in the security group to allow HTTP, HTTPS and SSH traffic from your IP address.
      3. When complete, press "OK" at the bottom left of the screen to return to VM creation.
  7. The Management, Monitoring and Tags tabs have more options that aren't necessary for setup. Skip to the Advanced tab.

  8. In the Custom data and cloud init section, paste the user data prepared earlier into the Custom data text box.

  9. Skip to the Review + create tab. Azure will take some time to review your settings; if there are no issues, click "Create" to finish creation.
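
The portal steps above can also be scripted. The following is a hedged sketch using the Azure CLI, not an official Flight Solo procedure. It assumes the CLI is installed and signed in (az login), that image is the resource ID of your imported Flight Solo image, and that userdata is a file containing the cloud-init script prepared earlier. The function name create_login_node, the <name>-nsg and <name>-ip resource names and the rule priority of 1000 are illustrative choices, and no license type is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the portal steps for the login node.
# Assumes the Azure CLI is installed and signed in (az login).
create_login_node() {
  if [ $# -ne 10 ]; then
    echo "usage: create_login_node RG NAME IMAGE SIZE USER PUBKEY VNET SUBNET MY_IP USERDATA" >&2
    return 1
  fi
  local rg="$1" name="$2" image="$3" size="$4" user="$5" pubkey="$6"
  local vnet="$7" subnet="$8" my_ip="$9" userdata="${10}"
  local nsg="${name}-nsg"

  # Security group admitting SSH, HTTP and HTTPS from your own IP only.
  az network nsg create --resource-group "$rg" --name "$nsg" || return 1
  az network nsg rule create --resource-group "$rg" --nsg-name "$nsg" \
    --name allow-ssh-http-https --priority 1000 --direction Inbound \
    --access Allow --protocol Tcp --source-address-prefixes "$my_ip" \
    --destination-port-ranges 22 80 443 || return 1

  # The VM: SSH key login, the chosen network, a public IP, an OS disk
  # deleted along with the VM, and the cloud-init file as custom data.
  az vm create --resource-group "$rg" --name "$name" --image "$image" \
    --size "$size" --admin-username "$user" --authentication-type ssh \
    --ssh-key-values "$pubkey" --vnet-name "$vnet" --subnet "$subnet" \
    --nsg "$nsg" --public-ip-address "${name}-ip" \
    --os-disk-delete-option Delete --custom-data "$userdata"
}
```

For example: create_login_node myrg login1 <image-id> Standard_D2s_v3 flight ~/.ssh/id_rsa.pub clustervnet default 203.0.113.5/32 login-userdata.yaml. Here --custom-data is given a file path, which the CLI reads.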

Launch Compute Nodes

Prepare User Data

Setting up compute nodes differs slightly from setting up a login node. The basic steps are the same, except that the virtual network, subnet and security group must match those used for the login node.

The following is the minimum cloud-init data needed. It allows the login node to find the compute nodes, as long as they are on the same network, and to SSH into them as the root user (which is required for setup).

#cloud-config
users:
  - default
  - name: root
    ssh_authorized_keys:
      - <Content of ~/.ssh/id_alcescluster.pub from root user on login node>

Tip

The above is not required if the SHAREPUBKEY option was provided to the login node. In that case, providing the SERVER option to the compute node is enough to enable root access from the login node.

Several options can be added to change how a compute node contacts the login node on startup.

  • Sending to a specific server:
    • Instead of broadcasting across a range, add the line SERVER=<private server IP> to send only to that node, which should be your login node.
  • Add an auth key:
    • Add the line AUTH_KEY=<string>. The compute node will then send its flight hunter packet with this key, which must match the auth key provided to your login node.
An example cloud-init script containing all of the lines mentioned above:
#cloud-config
write_files:
  - content: |
      SERVER=10.10.0.1
      AUTH_KEY=banana
    path: /opt/flight/cloudinit.in
    permissions: '0600'
    owner: root:root
users:
  - default
  - name: root
    ssh_authorized_keys:
      - <Content of ~/.ssh/id_alcescluster.pub from root user on login node>
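
On the login node, the compute-node user data can be assembled from root's public key rather than copied by hand. This is a minimal sketch, assuming it is run as root on the login node so that the ~/.ssh/id_alcescluster.pub key above resolves to /root/.ssh/id_alcescluster.pub; the function name make_compute_userdata is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print compute-node cloud-init user data containing
# SERVER, AUTH_KEY and root's public key from the login node.
make_compute_userdata() {
  local server_ip="$1" auth_key="$2"
  local pubkey_file="${3:-/root/.ssh/id_alcescluster.pub}"   # assumed default path
  if [ -z "$server_ip" ] || [ -z "$auth_key" ]; then
    echo "usage: make_compute_userdata <server-ip> <auth-key> [pubkey-file]" >&2
    return 1
  fi
  if [ ! -r "$pubkey_file" ]; then
    echo "cannot read $pubkey_file" >&2
    return 1
  fi
  local pubkey
  pubkey="$(cat "$pubkey_file")"   # command substitution drops the trailing newline
  cat <<EOF
#cloud-config
write_files:
  - content: |
      SERVER=${server_ip}
      AUTH_KEY=${auth_key}
    path: /opt/flight/cloudinit.in
    permissions: '0600'
    owner: root:root
users:
  - default
  - name: root
    ssh_authorized_keys:
      - ${pubkey}
EOF
}
```

For example, make_compute_userdata 10.10.0.1 banana > compute-userdata.yaml produces the example script above with the real key filled in.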

Info

More information on the user data options available for Flight Solo can be found in the user data documentation.

Deploy

  1. Go to the Microsoft Azure portal.

  2. Go to Virtual Machines, and click "Create".

  3. Select "Azure virtual machine", which loads the Basics page.

  4. On the Basics page:

    1. Set Subscription to the subscription you want to use.
    2. Set Resource Group to the same resource group as the login node.
    3. Set Virtual machine name to any suitable name.
    4. Set Image to the imported Flight Solo Image.
      1. It may be necessary to open the drop-down and/or see all images in order to find the imported image.
      2. Scroll down to see more options.
    5. Set Size to your choice of size.
    6. Set Authentication type to SSH public key.
    7. Set Username to the same username as with the login node.
    8. Set SSH public key source and the key itself to the same ones that were used for the login node.
    9. Fill in the Key pair name/Stored key/Use existing key as appropriate to the chosen public key source.
    10. Allow traffic to selected ports, and select SSH (22), HTTP (80) and HTTPS (443) as the allowed ports.
    11. Set the most appropriate license type.
  5. On the next page, Disks, all necessary details should already be filled in, so this page can be skipped unless you want to change something. However, it is recommended to select Delete with VM, so that the disk is removed when the VM is deleted.

  6. Go to the Networking tab and fill in the necessary options:

    1. Set Virtual Network to the same network that was used for the login node.
    2. Set Subnet to the same subnet that was used for the login node.
    3. Set NIC network security group to the same security group that was used for the login node.
    4. When complete, press "OK" at the bottom left of the screen to return to VM creation.
  7. The Management and Monitoring tabs have more options that aren't necessary for setup. Skip to the Advanced tab.

  8. In the Custom data and cloud init section, paste the cloud-init script prepared earlier into the Custom data text box.

  9. Skip to the Review + create tab. Azure will take some time to review your settings; if there are no issues, click "Create" to finish creation.
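
Compute nodes can likewise be created with the Azure CLI. This is a hedged sketch that assumes the CLI is installed and signed in (az login), that image is the resource ID of the imported Flight Solo image, and that userdata is the compute-node cloud-init file prepared earlier. It reuses the login node's resource group, virtual network, subnet and security group, and creates one VM per name given. The function name create_compute_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create compute nodes sharing the login node's
# resource group, virtual network, subnet and network security group.
create_compute_nodes() {
  if [ $# -lt 10 ]; then
    echo "usage: create_compute_nodes RG IMAGE SIZE USER PUBKEY VNET SUBNET NSG USERDATA NAME..." >&2
    return 1
  fi
  local rg="$1" image="$2" size="$3" user="$4" pubkey="$5"
  local vnet="$6" subnet="$7" nsg="$8" userdata="$9"
  shift 9   # the remaining arguments are VM names
  local name
  for name in "$@"; do
    az vm create --resource-group "$rg" --name "$name" --image "$image" \
      --size "$size" --admin-username "$user" --authentication-type ssh \
      --ssh-key-values "$pubkey" --vnet-name "$vnet" --subnet "$subnet" \
      --nsg "$nsg" --os-disk-delete-option Delete \
      --custom-data "$userdata" || return 1
  done
}
```

For example: create_compute_nodes myrg <image-id> Standard_D2s_v3 flight ~/.ssh/id_rsa.pub clustervnet default <login-nsg> compute-userdata.yaml node01 node02.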

General Configuration

Create Node Inventory

  1. Parse your node(s) with the command flight hunter parse.

    1. This will display a list of hunted nodes, for example:

      [flight@login-node.novalocal ~]$ flight hunter parse
      Select nodes: (Scroll for more nodes)
        login-node.novalocal - 10.10.0.1
        compute-node-1.novalocal - 10.10.101.1

    2. Select the desired node to be parsed with Space, and you will be taken to the label editor:

      Choose label: login-node.novalocal
      

    3. Here, you can edit the label like plain text

      Choose label: login1
      

      Tip

      You can clear the current node name by pressing Down in the label editor.

    4. When done editing, press Enter to save. The modified node label will appear next to the IP address and original node label.

      Select nodes: login-node.novalocal - 10.10.0.1 (login1) (Scroll for more nodes)
        login-node.novalocal - 10.10.0.1 (login1)
        compute-node-1.novalocal - 10.10.101.1

    5. From this point, you can either hit Enter to finish parsing and process the selected nodes, or continue changing nodes. Either way, you can return to this list by running flight hunter parse.

    6. Save the node inventory before moving on to the next step.

      Tip

      See flight hunter parse -h for more ways to parse nodes.

Add genders

  1. Optionally, you may add genders to the newly parsed nodes. For example, if the node login1 should have the genders cluster and all, run the command:
    flight hunter modify-groups --add cluster,all login1
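
When several nodes need the same genders, the command above can be repeated in a loop. This is a minimal sketch that only uses the flight hunter modify-groups --add form shown above, called once per node; the function name add_groups and the compute gender in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the same comma-separated genders to each node label.
add_groups() {
  if [ $# -lt 2 ]; then
    echo "usage: add_groups <gender,gender,...> <node>..." >&2
    return 1
  fi
  local groups="$1"
  shift
  local node
  for node in "$@"; do
    flight hunter modify-groups --add "$groups" "$node" || return 1
  done
}
```

For example, add_groups cluster,all,compute node01 node02 runs the command once for node01 and once for node02.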
    

SLURM Multinode Configuration

  1. Configure profile

    flight profile configure
    
    1. This brings up a UI in which several options need to be set. Use the Up and Down arrow keys to scroll through options and Enter to move to the next option. Options in brackets coloured yellow are the defaults that will be applied if nothing is entered.
      • Cluster type: The type of cluster setup needed, in this case Slurm Multinode.
      • Cluster name: The name of the cluster.
      • Setup Multi User Environment with IPA?: Boolean value to determine whether to configure a multi-user environment with IPA. If set to true, the following will also need to be filled in:
        • IPA domain: The domain for the IPA server to use.
        • IPA secure admin password: The password to be used by the admin user of the IPA installation to manage the server.
      • Default user: The user that you log in with.
      • Set user password: Set a password to be used for the chosen default user.
      • IP or FQDN for Web Access: The public IP or public hostname of the login node.
      • IP range of compute nodes: The IP range of the compute nodes, including the netmask in CIDR notation, e.g. 172.31.16.0/20.
  2. Apply identities by running the command flight profile apply.

    1. First apply an identity to the login node

      flight profile apply login1 login
      

    2. Wait for the login node identity to finish applying. You can check the status of all nodes with flight profile list.

      Tip

      You can watch the progress of the application with flight profile view login1 --watch

    3. Apply an identity to each of the compute nodes (in this example, genders-style syntax is used to apply it to node01 and node02):

      flight profile apply node[01-02] compute
      

      Tip

      You can check all available identities for the current profile with flight profile identities
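
Two pure-bash helpers may be useful when working with the values above. network_cidr derives the "IP range of compute nodes" value from any node's private IP and the subnet's prefix length; expand_nodes shows how a genders-style range such as node[01-02] expands to individual node names. Both function names are hypothetical, and expand_nodes is a simplified sketch that handles only a single numeric range, not comma-separated lists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: network address in CIDR form for an IP and prefix length.
network_cidr() {
  local ip="$1" prefix="$2" a b c d
  IFS=. read -r a b c d <<<"$ip"
  # Pack the four octets into one integer (10# avoids octal parsing of "010").
  local addr=$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
  # Keep the top $prefix bits; bash arithmetic is 64-bit, so mask to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net=$(( addr & mask ))
  printf '%d.%d.%d.%d/%d\n' $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}

# Hypothetical helper: expand a single genders-style range, e.g. node[01-02].
expand_nodes() {
  local spec="$1" re='^(.*)\[([0-9]+)-([0-9]+)\]$'
  if [[ $spec =~ $re ]]; then
    local prefix="${BASH_REMATCH[1]}" start="${BASH_REMATCH[2]}" end="${BASH_REMATCH[3]}"
    local width=${#start} i names=()
    # Zero-pad each number to the width of the range's start value.
    for (( i = 10#$start; i <= 10#$end; i++ )); do
      names+=("$(printf '%s%0*d' "$prefix" "$width" "$i")")
    done
    echo "${names[*]}"
  else
    echo "$spec"   # not a range: a single node name
  fi
}
```

For instance, if the example nodes shown earlier (10.10.0.1 and 10.10.101.1) sat in a /16 subnet, network_cidr 10.10.101.1 16 would print 10.10.0.0/16, and expand_nodes 'node[01-02]' prints node01 node02.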

Success

Congratulations, you've now created a SLURM Multinode environment! Learn more about SLURM in the HPC Environment docs.

Verifying Functionality

  1. Create a file called simplejobscript.sh, and copy this into it:

    #!/bin/bash -l
    echo "Starting running on host $HOSTNAME"
    sleep 30
    echo "Finished running - goodbye from $HOSTNAME"
    

  2. Submit the script with sbatch simplejobscript.sh. To test all of your nodes, queue up enough jobs that every node will have to run one.

  3. In the directory the job was submitted from, there should be a file slurm-X.out, where X is the job ID returned by the sbatch command. It contains the echo messages from the script created in step 1.
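
Rather than guessing how many jobs will spread across the cluster, one job can be pinned to each node. This sketch uses standard Slurm options: sinfo -h -N -o '%N' lists node names, sbatch --parsable prints just the job ID, and --nodelist targets a specific node. It assumes every node is in the default partition; the function name submit_one_per_node is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the test script once to every node Slurm knows.
submit_one_per_node() {
  local script="${1:-simplejobscript.sh}" node jobid
  # -N lists one line per node (per partition); sort -u drops duplicates.
  for node in $(sinfo -h -N -o '%N' | sort -u); do
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    jobid=$(sbatch --parsable --nodelist="$node" "$script") || return 1
    jobid=${jobid%%;*}
    echo "job $jobid -> $node (output in slurm-$jobid.out)"
  done
}
```

Once the jobs have finished, grep -h 'Finished running' slurm-*.out should show one goodbye line per node.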