
4.) Getting Started on Arctur-2: system overview and job management

Getting Started on Arctur-2


This tutorial will guide you through your first steps on Arctur-2.

Before proceeding:

  • make sure you have an account (if not, follow this procedure) and an SSH client.
  • ensure you operate from a Linux / Mac environment. Most commands below assume you are running them in a Terminal in this context. If you're running Windows, you can use PuTTY and similar tools, yet it's probably better to familiarize yourself "natively" with a Linux-based environment by using a Linux virtual machine (consider VirtualBox for that).


Arctur-2 system overview


First, we will take a quick look at how Arctur-2 is organized and go over its general specifications.
Currently, there are two partitions:

  • compute
  • gpu

The 'compute' partition consists of 14 nodes, named node01 to node14. This is the default partition, and your jobs will run on it unless specified otherwise.
Each of these nodes has 2 Intel Xeon E5-2690v4 processors, for a total of 28 cores clocked at 2.60 GHz. Every node also has 512 GB of fast DDR4 RAM and 480 GB of local SSD storage.

The 'gpu' partition consists of 8 nodes (gpu01 to gpu08). The only difference from the compute nodes is that each of them has 4 NVIDIA Tesla M60 GPUs. The Tesla M60 is a very powerful GPU, as it is made out of two physical NVIDIA Maxwell GPUs with a combined 16 GB of memory. Many applications perceive the card as two separate GPUs, so each node effectively has 8 GPUs.
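
If you want to verify this yourself once you have a shell on a gpu node, listing the GPUs with nvidia-smi should show each physical GPU on its own line (assuming the NVIDIA driver utilities are available on the gpu nodes, which is typical):

  $ nvidia-smi -L   # lists every GPU the driver exposes, one line per physical GPU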

Your home folder is located on a shared NFS, which consists of SSD-cached HDD disks. The default quota is 100 GB per user, but if you need more, feel free to contact support.

All the nodes and the filesystem are interconnected with a 2x25GbE connection.

SLURM basics


Before we dive into SLURM, it is a good idea to consult the official quickstart guide: https://slurm.schedmd.com/quickstart.html
We are going to explain and show how to use SLURM and some of our custom tools.

SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It is used on Arctur-2.

  • It allocates exclusive access to resources (compute nodes) to users for the duration of a job or reservation so that they can perform their work
  • It provides a framework for starting, executing and monitoring work
  • It arbitrates contention for resources by managing a queue of pending work
  • It allows users to schedule jobs on the cluster's resources



Commonly used SLURM commands


sacct is used to report job or job step accounting information about active or completed jobs.
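
For example, to inspect a completed job you could run something like this (JOBID is a placeholder; the format fields listed are standard accounting fields):

  $ sacct -j JOBID --format=JobID,JobName,Partition,State,Elapsed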

salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
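
A minimal salloc session might look like the sketch below (the node count is only an example):

  $ salloc -N 1        # allocate one node and open a shell within the allocation
  $ srun hostname      # launch a task on the allocated node
  $ exit               # leave the shell and release the allocation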

srun is used to run a parallel job; it will also create a resource allocation first, if necessary.


There are two types of jobs:

  • interactive: you get a shell on the first reserved node
  • passive: a classical batch job, where the script passed as an argument to sbatch is executed


We will now go through the basic SLURM commands.

Connect to Arctur-2. You can request resources in interactive mode like this:

 $ srun --pty bash


You should now be directly connected to the node you reserved, with an interactive shell. Keep in mind that only you have access to the node, and it will be billed because you are running a job. Now exit the reservation:

  $ exit  # or CTRL-D


When you run exit, you are disconnected and your reservation is terminated (billing stops).
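
If you need the interactive shell on a specific partition (for example gpu), you can pass the partition explicitly; this is the same srun call with one extra option:

  $ srun --partition=gpu --pty bash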

Currently, there are no time limits enforced on reservations or jobs.

To run a passive job, use srun or sbatch.

One example is the following:

  $ srun -N2 hostname


The 'hostname' command prints the hostname of the host it is running on, so running it on two nodes shows you which nodes were allocated to you. This should be a very short job.

Be sure to check out all the optional arguments srun can take by typing 'man srun' or by looking at the official documentation on https://slurm.schedmd.com/srun.html
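
As an illustration, the same short job can be combined with a few of those options (the values here are chosen arbitrarily):

  $ srun --partition=compute -N2 --ntasks-per-node=1 --job-name=test hostname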

Using srun like this will give the job output in your terminal session, and you can't really do anything else in that session until the job is done. A better approach for submitting jobs is to use sbatch.
The command sbatch takes a batch script as an argument and submits the job. In the script you specify all options such as the partition you want resources from, the number of nodes and similar.

An example of a simple batch script which runs a command (for example the command 'hostname') on 2 compute nodes is this:

#!/bin/bash -l
#SBATCH --account=EXAMPLE
#SBATCH --partition=compute
#SBATCH --nodes=2
# --ntasks-per-node can be up to 28 (one task per core)
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:20:00
#SBATCH --job-name=my_job
srun hostname



Using your favourite text editor, save this as myjob.sh and use sbatch to run it:

  $ sbatch myjob.sh


You will get a jobID back from sbatch, which you can use to control your job (we cover this later). Unless specified otherwise, the output will be stored in a text file in the directory from which you submitted the job.
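
If you want the output written somewhere specific, you can add output and error directives to the script; the file names below are only an example (%j expands to the job ID):

#SBATCH --output=my_job_%j.out
#SBATCH --error=my_job_%j.err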

Running jobs on different partitions

Just define the partition name in the appropriate place in the job submission script like this:

#SBATCH --partition=gpu

As shown before, available partitions are: compute and gpu.
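
A minimal batch script targeting the gpu partition might look like the sketch below. Whether you also need a --gres request (and what the GRES is called) depends on the cluster's SLURM configuration, so treat that line as an assumption and check with support if unsure:

#!/bin/bash -l
#SBATCH --account=EXAMPLE
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:20:00
#SBATCH --job-name=my_gpu_job

# list the GPUs visible on the allocated node
srun nvidia-smi -L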


Job management


To check the state of the cluster (idle and allocated nodes) run:

  $ sinfo

This is useful to see the state of the resources and how many are available to you immediately. All the idle nodes are ready for use. If you need more nodes than are currently available (because other jobs are running on the system), just submit your job and it will wait in the queue until the requested resources are available.
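
sinfo accepts options that change the view; for example, a node-oriented long listing or a single-partition view (both are standard sinfo options):

  $ sinfo -N -l          # one line per node, with state and resources
  $ sinfo -p gpu         # show only the gpu partition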

Sometimes we will also run our internal low-priority jobs on the cluster. They run in a low-priority queue and will be suspended when you start your jobs. Unfortunately, with 'sinfo' you won't be able to determine how many nodes are running low-priority jobs. For that, we have developed another tool called 'savail'. Try it and check whether there are any nodes running low-priority jobs:

  $ savail


You can check the status of your (and only your) running jobs using the squeue command:

  $ squeue
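
A couple of common variations (both are standard squeue options):

  $ squeue -l            # long format with time limits and job state reasons
  $ squeue -j JOBID      # show only the given job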


You can then cancel (delete) your job by running the command:

  $ scancel JOBID


You can see the system-level utilization (memory, I/O, energy) of a *running* job using:

  $ sstat -j JOBID
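
sstat also takes a --format option if you only want specific fields; the fields below are standard SLURM accounting fields:

  $ sstat -j JOBID --format=JobID,AveCPU,MaxRSS,MaxDiskRead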


In all remaining reservation examples in this section, remember to delete the reserved jobs afterwards (using scancel or CTRL-C).


Pausing and resuming jobs


To prevent a waiting job from being scheduled, and later to allow it to be scheduled again:

  $ scontrol hold JOBID
  $ scontrol release JOBID



To pause a running job and then resume it:

  $ scontrol suspend JOBID
  $ scontrol resume JOBID


For obvious reasons, non-root users have only a subset of all SLURM commands available to them (most of them will work fine, but display only data for the user who runs them).
