
How to Build a Submission Script

Step-by-step guide

Jobs are submitted through a job script, which is a shell script (usually written in bash). Since it is a shell script, it must begin with a "shebang":

#!/bin/bash

This is followed by a preamble describing the resource requests the job is making. Each request begins with #SBATCH followed by an option.
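For example, a request line looks like the following (the job name here is only an illustration; you can choose your own):

#SBATCH --job-name=myjob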

-------------------------------------------------------------------------------------------------------------------------------------------

#SBATCH --time=00-02:00:00

This option sets the wall-clock time, i.e. the maximum time the job is allowed to run. It is written in the format

#SBATCH --time=day-hour:minute:second
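For example, the following (illustrative) request asks for one day, twelve hours and thirty minutes:

#SBATCH --time=01-12:30:00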

-------------------------------------------------------------------------------------------------------------------------------------------

#SBATCH --partition=general

This option specifies the requested partition. Here, we use the partition named general. What is a partition? In SLURM, nodes are grouped into partitions, which are sets of nodes with associated limits for wall-clock time, job size, etc. To see the status and limits of the different partitions, use the sinfo command:
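For example (the partition names, limits, and node names below are purely illustrative; your cluster's output will differ):

[user@acres-login0~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
general*     up 7-00:00:00     40   idle node[001-040]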

From here, you can see the TIMELIMIT of the different partitions. If your requested wall-clock time exceeds the TIMELIMIT of the partition you choose, your job will not run.

-------------------------------------------------------------------------------------------------------------------------------------------

#SBATCH --ntasks=8

This option specifies the requested number of CPUs (tasks). Here we request 8 CPUs. This option is important if you run a parallel code.
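As a minimal sketch, the body of the script can then launch an MPI program with the number of tasks SLURM granted (YourCode is a placeholder for your own executable; many MPI installations detect the allocation automatically, so -np may not be required):

mpirun -np $SLURM_NTASKS ./YourCode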

-------------------------------------------------------------------------------------------------------------------------------------------

These are the basic options. Many more options are available; for a code simply to run you may not need them, but they can be helpful if you have other needs.

See below to check if you need others.

#SBATCH --nodes= #number of nodes

You may not need this: when you specify the number of CPUs, SLURM will automatically assign the needed nodes for you.

#SBATCH --mem= #total memory per node in megabytes
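For example, the following (illustrative) request asks for 4000 MB of memory per node:

#SBATCH --mem=4000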

Standard error file:

#SBATCH -e slurm%j.err

Standard output file:

#SBATCH -o slurm%j.out
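In these file names, %j is replaced by the job identification number, so job 831 would produce slurm831.err and slurm831.out.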

Another thing to mention is that most SLURM options have two forms: a short (single-letter) form preceded by a single hyphen and followed by a space, and a long form preceded by a double hyphen and followed by an equals sign. For example, we can either use

#SBATCH --partition=general

or 

#SBATCH -p general

Similarly, we can either use

#SBATCH --ntasks=

or

#SBATCH -n


Here is a sample script called jobscript.sh for submitting an MPI job.

#!/bin/bash
#SBATCH -t 00-02:00:00        # wall-clock time limit (2 hours)
#SBATCH -p general            # partition
#SBATCH -n 120                # number of tasks (CPUs)

RUN=/path_of_directory/       # directory containing the executable
cd $RUN
mpirun ./YourCode

Note that RUN=/path_of_directory/ specifies the directory where your executable is located, and cd $RUN changes into that directory before the code is launched.

**submit a job**

Job scripts are submitted with the sbatch command, e.g.:

[user@acres-login0~]$ sbatch jobscript.sh

The job identification number is returned when you submit the job, e.g.:

[user@acres-login0~]$ sbatch jobscript.sh

Submitted batch job 831

**display job status**

The squeue command is used to obtain status information about jobs submitted to all queues, like:
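For example (the job shown below is purely illustrative; your job IDs, names, and nodes will differ):

[user@acres-login0~]$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   831   general  jobscri     user  R       5:31      3 node[001-003]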

The TIME field indicates the elapsed walltime. JOBID is the job identification number which is returned when you submit the job.

The ST field lists the state of the job. Commonly listed states include:

PD: Pending, job is waiting for resources (e.g. idle CPUs) to become available

R: Running, job has the allocated CPUs and is running

S: Suspended, job has the allocated resources, but execution has been suspended

CG: Completing, job is in the process of completing; some processes may still be active, often finishing I/O, and the job cannot yet be terminated
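To list only your own jobs, squeue can be restricted to a user name (replace user with your own account):

[user@acres-login0~]$ squeue -u user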

**cancel a job**

SLURM provides the scancel command for deleting jobs from the system using the job identification number:

[user@acres-login0~]$ scancel 831
