On a personal computer, deploying a job on multiple CPUs usually requires extra coding work with parallel packages, e.g., {foreach} or {parallel}. On an HPC system, however, parallel execution can be greatly simplified by setting up array jobs in a .slurm script. The latter also brings a bonus: it gets around the limitations that would otherwise require MPI.
1. Parallel by Array Jobs
#!/bin/sh
#SBATCH --job-name=JOBNAME
#SBATCH --array=1-10
#SBATCH --account=ACCOUNT
#SBATCH --qos=QOS
#SBATCH --mail-type=ALL
#SBATCH --mail-user=USER@ufl.edu
#SBATCH --ntasks=1 # Each array task runs as a single task (on one node)
#SBATCH --cpus-per-task=16 # Run on 16 CPUs per task
#SBATCH --mem=32gb # Memory limit
#SBATCH --time=48:00:00 # RunTimeLimit: hrs:min:sec
#SBATCH --output=JOBNAME_%A_%a.out # Output and error log (one file per array task)
pwd; hostname; date
module load R
echo "Running"
R CMD BATCH --vanilla JOBNAME.R impute-by-folds.$SLURM_ARRAY_TASK_ID.Rout
date
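For reference, a minimal sketch of what JOBNAME.R might contain: each array task reads its index from the SLURM_ARRAY_TASK_ID environment variable and processes only its own fold of the data. The file names and the fold-splitting logic below are hypothetical placeholders.

# Read the array index assigned by SLURM (1-10, as set by --array=1-10)
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))

# Hypothetical input: split the rows of a data set into 10 folds
dat <- readRDS("data.rds")                                  # placeholder input file
folds <- split(seq_len(nrow(dat)), rep(1:10, length.out = nrow(dat)))

# Each array task works on exactly one fold
fold_rows <- folds[[task_id]]
result <- summary(dat[fold_rows, ])                         # placeholder computation

# Save one result file per array task
saveRDS(result, sprintf("result_fold_%02d.rds", task_id))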
2. Parallel by {foreach}
Pros
- The parallelism can be programmed and shipped within a package, so no manual job setup is required.
Cons
- The total number of working CPUs is limited: a single node (computer) provides up to 128 CPUs, so MPI is required if more CPUs are requested.
- It seems that a nested for loop in {foreach} (using %:%) can only parallelize one of the loops, which raises the question of which loop to parallelize. Also, %:% passes a foreach object to the next loop, which means no intermediate operations are allowed between the two for loops (see the sketch below).
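To make the cons above concrete, here is a minimal sketch of a nested {foreach} loop using %:% with the {doParallel} backend. The 4-worker cluster and the toy computation are assumptions for illustration; note that only the merged loop body is parallelized and nothing can be placed between the two foreach() calls.

library(foreach)
library(doParallel)

# Register a local backend with 4 workers (an arbitrary choice for illustration)
cl <- makeCluster(4)
registerDoParallel(cl)

# %:% merges the two loops into a single foreach object; only the merged loop
# is parallelized, and no statements may appear between the outer and inner calls
res <- foreach(i = 1:3, .combine = rbind) %:%
  foreach(j = 1:4, .combine = c) %dopar% {
    i * j                                 # toy computation
  }

stopCluster(cl)
res                                       # a 3 x 4 matrix of products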