How to Build a Submission Script
Step-by-step guide
Jobs are submitted through a job script, which is a shell script (usually written in bash). Because it is a shell script, it must begin with a "shebang" line:
    #!/bin/bash
This is followed by a preamble describing the resource requests the job is making. Each request begins with #SBATCH followed by an option.
-------------------------------------------------------------------------------------------------------------------------------------------
    #SBATCH --time=00-02:00:00
This option sets the wall-clock time limit: the job may run no longer than this. It is written in the format
    #SBATCH --time=day-hour:minute:second
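As a rough illustration of the day-hour:minute:second format, the sketch below converts such a string to seconds. The helper name walltime_to_seconds is hypothetical (it is not a SLURM command), and it handles only this one layout.

```bash
# Convert a day-hour:minute:second walltime string to seconds.
# walltime_to_seconds is a hypothetical helper, not a SLURM command.
walltime_to_seconds() {
    # Accept only the day-hour:minute:second layout described above.
    if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+-[0-9]+:[0-9]+:[0-9]+$'; then
        echo "unsupported time format: $1" >&2
        return 1
    fi
    # Split on '-' and ':' and combine the four fields.
    printf '%s\n' "$1" | awk -F'[-:]' '{ print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }'
}

walltime_to_seconds 00-02:00:00   # prints 7200 (two hours)
```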
-------------------------------------------------------------------------------------------------------------------------------------------
    #SBATCH --partition=general
This option specifies the partition requested. Here, we use the partition named general. What is a partition? In SLURM, nodes can be grouped into partitions, which are sets of
nodes with associated limits on wall-clock time, job size, etc. To see the status and limits of the different partitions, use the sinfo command.
Its output includes a TIMELIMIT column for each partition. If the wall-clock time you request exceeds the TIMELIMIT of the partition you choose, your job will not run.
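One way to focus on the limits is to pass sinfo a custom output format. The following is a minimal sketch: the function name show_partition_limits is hypothetical, while %P, %l, %D, and %a are sinfo format fields for partition name, time limit, node count, and availability. If sinfo is not on the PATH, the function only prints a message.

```bash
# Hypothetical wrapper around sinfo; meant to be run on a cluster login node.
show_partition_limits() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a cluster login node" >&2
        return 1
    fi
    # %P = partition, %l = time limit, %D = node count, %a = availability
    sinfo --format="%P %l %D %a"
}
```

Calling show_partition_limits on a login node should list one line per partition, which can be compared with the value given to --time.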
-------------------------------------------------------------------------------------------------------------------------------------------
    #SBATCH --ntasks=8
This option specifies the number of tasks requested; by default each task gets one CPU, so here we request 8 CPUs. This option is important if you run parallel code.
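Inside a running job, SLURM exports the requested task count in the SLURM_NTASKS environment variable when --ntasks (or -n) is given. A minimal sketch of reading it in a job script follows; the helper name task_count and the fallback value of 1 are assumptions made for this illustration.

```bash
# Hypothetical helper: report how many tasks the job was given.
task_count() {
    # SLURM_NTASKS is set by SLURM inside the job when --ntasks (or -n) is used.
    # The fallback of 1 is an assumption for running the script outside SLURM.
    echo "${SLURM_NTASKS:-1}"
}

echo "This job was allocated $(task_count) task(s)"
```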
-------------------------------------------------------------------------------------------------------------------------------------------
These are the basic options. Many more are available; most jobs will not need them, but they can help if you have other requirements.
See below to check whether you need any of them.
    #SBATCH --nodes= #number of nodes
You may not need this option: when you specify the number of CPUs, SLURM automatically assigns the nodes needed.
    #SBATCH --mem= #total memory per node in megabytes
System error file:
    #SBATCH -e slurm%j.err
System output file:
    #SBATCH -o slurm%j.out
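In these file names, SLURM replaces %j with the job identification number. The sketch below only imitates that one substitution to show the resulting names; expand_job_pattern is a hypothetical helper, and 831 is the example job number used later on this page.

```bash
# Hypothetical helper that mimics only SLURM's %j replacement.
expand_job_pattern() {
    # $1 = file name pattern, $2 = job ID
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

expand_job_pattern "slurm%j.err" 831   # prints slurm831.err
expand_job_pattern "slurm%j.out" 831   # prints slurm831.out
```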
Most SLURM options have two forms: a short (single-letter) form preceded by a single hyphen and followed by a space,
and a longer form preceded by a double hyphen and followed by an equals sign. For example, we can either use
    #SBATCH --partition=general
or
    #SBATCH -p general
Similarly, we can either use
    #SBATCH --ntasks=
or
    #SBATCH -n
Here is a sample script called jobscript.sh for submitting an MPI job.
    #!/bin/bash
    #SBATCH -t 00-02:00:00
    #SBATCH -p general
    #SBATCH -n 120
    # Directory that contains the executable
    RUN=/path_of_directory/
    cd "$RUN" || exit 1
    mpirun ./YourCode
Note that RUN=/path_of_directory/ specifies the directory where your executable file is located.
Submit a job
Job scripts are submitted with the sbatch command, e.g.:
    [user@acres-login0~]$ sbatch jobscript.sh
The job identification number is returned when you submit the job, e.g.:
    [user@acres-login0~]$ sbatch jobscript.sh
    Submitted batch job 831
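If a script needs the job identification number, it can be picked out of that message. The following is a minimal sketch; extract_job_id is a hypothetical helper that assumes the "Submitted batch job" wording shown above.

```bash
# Hypothetical helper: read sbatch's message on stdin and print the job ID.
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Typical use on the cluster (requires sbatch, so it is left commented out):
# job_id=$(sbatch jobscript.sh | extract_job_id)

echo "Submitted batch job 831" | extract_job_id   # prints 831
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which may be simpler where it is available.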
Display job status
The squeue command displays status information about jobs submitted to all queues.
In its output, the TIME field shows the elapsed wall-clock time, and JOBID is the job identification number returned when you submitted the job.
The ST field lists the state of the job. Commonly listed states include:
PD: Pending, job is waiting for idle CPUs
R: Running, job has the allocated CPUs and is running
S: Suspended, job has the allocated resources, but execution has been suspended
CG: Completing, job is finishing and cannot be terminated yet, probably because of an I/O operation
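The state codes above can be turned into readable labels in a script. This is a minimal sketch covering only the four states listed here; describe_state is a hypothetical helper, and the commented squeue call assumes the standard -u, -h, and -o options with the %i (job ID) and %t (short state) format fields.

```bash
# Hypothetical helper: map a squeue ST code to a readable label.
describe_state() {
    case "$1" in
        PD) echo "Pending" ;;
        R)  echo "Running" ;;
        S)  echo "Suspended" ;;
        CG) echo "Completing" ;;
        *)  echo "Unknown state: $1" ;;
    esac
}

# Possible use on the cluster (requires squeue, so it is left commented out):
# squeue -u "$USER" -h -o "%i %t" | while read -r id st; do
#     echo "$id: $(describe_state "$st")"
# done

describe_state PD   # prints Pending
```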
Cancel a job
SLURM provides the scancel command for deleting jobs from the system using the job identification number:
    [user@acres-login0~]$ scancel 831
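A small wrapper can guard against typos before calling scancel. This is only a sketch: cancel_job is a hypothetical helper, and it deliberately accepts plain numeric job IDs such as 831 and nothing else, which is stricter than scancel itself.

```bash
# Hypothetical wrapper: validate the job ID, then hand it to scancel.
cancel_job() {
    case "$1" in
        "" | *[!0-9]*)
            echo "not a numeric job ID: '$1'" >&2
            return 2
            ;;
    esac
    if ! command -v scancel >/dev/null 2>&1; then
        echo "scancel not found; run this on a cluster login node" >&2
        return 1
    fi
    scancel "$1"
}
```

On a login node, cancel_job 831 would then behave like the scancel command shown above.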