To create a batch job, follow the example in the demo section above. First, create a script that starts the job. This is a plain shell script containing the commands to set up and start your job. The following environment variables are available to the script (and to the job):
A script will look something like:
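#!/bin/sh
# Count the lines in the node file to get the number of CPUs assigned to the job
NP=$(wc -l "$PBS_NODEFILE" | awk '{print $1}')
echo "nodes ($NP cpu total):"
sort "$PBS_NODEFILE" | uniq
echo
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR" || exit 1
# Start LAM on the assigned nodes, make the MPI call, then shut LAM down
lamboot "$PBS_NODEFILE"
mpirun -np "$NP" ./yourMPIProgram
lamhalt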
where yourMPIProgram is the MPI executable to be run via PBS.
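To see what the node file gives the script, the following sketch prints the total number of lines and how often each host appears. It assumes, as the script above does, that the file holds one line per assigned CPU. The helper name summarize_nodefile and the compute-0-* hostnames are made up for illustration and are not part of PBS.

```shell
#!/bin/sh
# summarize_nodefile: hypothetical helper, not part of PBS.
# Prints the total number of lines in a node file, then one
# "host: count" line per distinct host.
summarize_nodefile() {
    nodefile=$1
    # Assumption: one line per assigned CPU, so lines = CPU slots.
    total=$(wc -l < "$nodefile" | tr -d ' ')
    echo "total slots: $total"
    # uniq -c prefixes each host with its number of occurrences.
    sort "$nodefile" | uniq -c | awk '{print $2 ": " $1}'
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    # Inside a PBS job: use the real node file.
    summarize_nodefile "$PBS_NODEFILE"
else
    # Outside a job: use a small sample file with made-up hostnames.
    sample=$(mktemp)
    printf '%s\n' compute-0-0 compute-0-0 compute-0-1 > "$sample"
    summarize_nodefile "$sample"
    rm -f "$sample"
fi
```

Run outside a job, the sample file reports 3 slots, two on compute-0-0 and one on compute-0-1.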
After creating the script in (say) myscript.pbs, use the following command to submit the job to PBS:
qsub -lnodes=n myscript.pbs
where n is the number of nodes required. If -lnodes=n is omitted, the default request is 1 node.
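qsub prints the identifier of the new job on standard output, so a wrapper can capture it for later use. The sketch below is one way to do that. The helper short_job_id and the request for 2 nodes are illustrative assumptions, and the submission step only runs if qsub and myscript.pbs are actually present.

```shell
#!/bin/sh
# short_job_id: hypothetical helper that keeps only the numeric part
# of a job identifier such as 424.frontend-0.
short_job_id() {
    # Remove everything from the first dot onwards.
    echo "${1%%.*}"
}

# Submit only when qsub is installed and the script exists, so the
# sketch is harmless on a machine without PBS.
if command -v qsub >/dev/null 2>&1 && [ -f myscript.pbs ]; then
    jobid=$(qsub -lnodes=2 myscript.pbs)
    echo "submitted job $(short_job_id "$jobid") (full id: $jobid)"
fi
```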
If you need to run in interactive mode, use the following command:
qsub -lnodes=n -I
You will be presented with a new shell on one of the compute nodes. Run
cat $PBS_NODEFILE
to find out which nodes have been assigned to your interactive session. You can use this PBS_NODEFILE as your machinefile in MPI runs.
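As a rough illustration of using the node file as a machinefile, the sketch below builds an mpirun command line from it. It assumes an MPICH-style mpirun that accepts -machinefile. With LAM/MPI, as in the batch script earlier, the node file is passed to lamboot instead. The helper build_mpirun_cmd is hypothetical and only prints the command rather than running it.

```shell
#!/bin/sh
# build_mpirun_cmd: hypothetical helper. Prints (does not execute) an
# mpirun command that starts one process per line of the node file.
# Assumption: the local mpirun understands -machinefile.
build_mpirun_cmd() {
    nodefile=$1
    program=$2
    # One process per line in the node file.
    np=$(wc -l < "$nodefile" | tr -d ' ')
    echo "mpirun -np $np -machinefile $nodefile $program"
}

# Inside an interactive PBS session, show the command for this session.
if [ -n "${PBS_NODEFILE:-}" ]; then
    build_mpirun_cmd "$PBS_NODEFILE" ./yourMPIProgram
fi
```

Printing the command first makes it easy to check the process count before launching anything.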
Here is a list of useful arguments to qsub
You can use qstat to check the status of a job.
A call to qstat without any arguments gives you something like:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
424.frontend-0   W.3.12.bwci      reyes            00:01:06 R short
425.frontend-0   W.3.13.bwci      reyes            00:01:09 R short
426.frontend-0   W.3.14.bwci      reyes            00:01:11 R short
427.frontend-0   W.3.15.bwci      reyes                   0 Q short
428.frontend-0   W.3.16.bwci      reyes                   0 Q short
438.frontend-0   W.3.26.bwci      reyes                   0 Q short
439.frontend-0   W.3.normci       reyes                   0 H short
The first column gives the job identifier. 424.frontend-0 means that the job is number 424 on frontend-0. The ".frontend-0" part can usually be ignored.
The second column gives the name of the job, as set by the user through the -N argument to qsub. If -N is not given, PBS uses the name of the submitted script as the job name.
The third column is the username of the owner of the job.
The fourth column gives how much CPU time the job has used so far.
The fifth column gives the state of the job, which can be one of the following:
The sixth column gives the queue in which the job resides.
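Because the state is the fifth whitespace-separated field, the default qstat listing is easy to summarise with awk. The sketch below counts jobs per state. The helper name count_job_states is made up, and it assumes the two header lines shown above and job names without spaces.

```shell
#!/bin/sh
# count_job_states: hypothetical helper. Reads default qstat output on
# standard input and prints "STATE COUNT" lines sorted by state letter.
count_job_states() {
    # Skip the two header lines; field 5 is the state column.
    awk 'NR > 2 && NF >= 6 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demo using the listing shown above instead of a live qstat call.
count_job_states <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
424.frontend-0   W.3.12.bwci      reyes            00:01:06 R short
425.frontend-0   W.3.13.bwci      reyes            00:01:09 R short
426.frontend-0   W.3.14.bwci      reyes            00:01:11 R short
427.frontend-0   W.3.15.bwci      reyes                   0 Q short
428.frontend-0   W.3.16.bwci      reyes                   0 Q short
438.frontend-0   W.3.26.bwci      reyes                   0 Q short
439.frontend-0   W.3.normci       reyes                   0 H short
EOF
```

For the listing above this prints H 1, Q 3 and R 3, one per line. On the front end the same function can be fed directly with: qstat | count_job_states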
A call to qstat with the arguments -f and a job identifier lists the attributes of that job, which is very useful for debugging PBS problems.
qstat -q gives you the status of all the queues:
server: frontend-0

Queue            Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
dqueue             --      --       --     --   12   1 --   E R
                                               --- ---
                                                12   1
Deleting a job is simple. First find the job identifier with qstat. Then do: