1 Commonly Used Commands on HiPerGator

  • squeue -A [accountname] Show the jobs of an account
  • sbatch *.slurm Submit a job
  • dos2unix *.slurm Convert a Slurm script from Windows line endings to Unix line endings
  • showAssoc [username] Show the affiliation of a user
  • showQos [QOSname] Show the information of a QOS
  • slurmInfo -g [groupname] Show the information of a group
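
The dos2unix step matters because a script saved on Windows ends each line with a carriage return plus a newline, and sbatch typically refuses scripts with such line breaks. The following is a minimal sketch of how to detect and strip the carriage returns by hand, for instance when dos2unix is not available. The file name job.slurm and the helper names has_crlf and strip_cr are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers; job.slurm is a made-up file name.

# A literal carriage return character, built portably with printf.
cr=$(printf '\r')

# Return success if the file contains at least one carriage return.
has_crlf() {
    grep -q "$cr" "$1"
}

# Write a copy of the file ($1) with all carriage returns removed to $2.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Demo: create a script with Windows line endings in a temp directory.
workdir=$(mktemp -d)
printf '#!/bin/bash\r\n#SBATCH --ntasks=1\r\necho hello\r\n' > "$workdir/job.slurm"

if has_crlf "$workdir/job.slurm"; then
    strip_cr "$workdir/job.slurm" "$workdir/job_unix.slurm"
    echo "converted"
fi
```

Unlike dos2unix, this sketch writes a new file instead of converting in place, so the original script is left untouched.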

2 HPC Slurm

2.1 Parallel Categories

Applications can be parallelized in different ways, and knowing which method your application uses is critical.

Common categories of parallelization:

  • OpenMP, Threaded, Pthreads
    • All cores on one server, shared memory
  • MPI (Message Passing Interface)
    • Can use multiple servers (cross-machine)
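
As a rough sketch of how the two categories translate into resource requests, the hypothetical helper request_for below prints the matching #SBATCH lines. The option names are standard sbatch options; the core counts are made-up values, and real MPI jobs often need further options.

```bash
#!/bin/bash
# Hypothetical helper: print a resource request for a parallel category.
# Usage: request_for <threaded|mpi> <number of cores>
request_for() {
    kind=$1
    cores=$2
    case "$kind" in
        threaded)
            # Shared memory: one task whose threads share one node.
            echo "#SBATCH --nodes=1"
            echo "#SBATCH --ntasks=1"
            echo "#SBATCH --cpus-per-task=$cores"
            ;;
        mpi)
            # Message passing: one single-core task per MPI rank;
            # Slurm may place the tasks on more than one node.
            echo "#SBATCH --ntasks=$cores"
            echo "#SBATCH --cpus-per-task=1"
            ;;
        *)
            echo "unknown category: $kind" >&2
            return 1
            ;;
    esac
}

request_for threaded 8
request_for mpi 16
```

In both cases the total number of cores is the number of tasks times the cores per task; the categories differ in whether those cores must sit on one node.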

2.2 Threaded Jobs

For threaded applications, all cores need to be on a single node.

Cited from HiPerGator Doc (https://help.rc.ufl.edu/doc/R): Parallel threads in an R job will be bound to the same CPU core even if multiple ntasks are specified in the job script. Use cpus-per-task to use R ‘parallel’ module correctly. For example, for an 8-thread parallel job use the following resource request in your job script:

```bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
```

This requests 1 node (a physical computer in HiPerGator) with 1 task and 8 cores for that task, so the application has 8 cores in total.
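
Inside the job, Slurm exposes the requested core count through the SLURM_CPUS_PER_TASK environment variable, so the script does not have to hard-code the thread count a second time. A minimal sketch, assuming an OpenMP-style program that reads OMP_NUM_THREADS; the helper name threads_for_job and the fallback value of 1 are assumptions made for illustration:

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Hypothetical helper: number of threads the job may use.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# fall back to 1 when it is unset (e.g. when testing outside Slurm).
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# OpenMP programs read their thread count from OMP_NUM_THREADS.
export OMP_NUM_THREADS="$(threads_for_job)"
echo "Running with $OMP_NUM_THREADS thread(s)"
```

In an R script, the same value can be read with Sys.getenv("SLURM_CPUS_PER_TASK") and passed on to makeCluster().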


2.3 Slurm Options

  • --nodes If you are not using MPI, just set it to 1. One node represents a single physical computer in the HPC cluster, and jobs are often submitted to a single node. For example, in 2018 HiPerGator added 32 Dell C6420 nodes (computers), each with 2 sockets, 32 cores, and 192 GB RAM. Reading the 32 cores as a per-socket count, one node has 2 * 32 = 64 cores; if 32 is instead the per-node total, each socket holds 16 cores.

    • A socket is the hardware component on the motherboard that holds a CPU and connects it to the motherboard circuitry. One socket accepts exactly one CPU, so a socket corresponds to one physical CPU.
  • --ntasks If you are not using MPI, just set it to 1. Specifies how many instances of your command are executed. See https://stackoverflow.com/questions/39186698/what-does-the-ntasks-or-n-tasks-does-in-slurm for details: “For a common cluster setup and if you start your command with "srun" this corresponds to the number of MPI ranks. In contrast the option "--cpus-per-task" specify how many CPUs each task can use.”

  • --cpus-per-task USE THIS TO SPECIFY THE NUMBER OF CORES IN ONE CLUSTER! Generally, this is the number of cores or threads you want each task to use. It is the relevant option when using things like OpenMP or R functions that create cluster objects (e.g. makeCluster()).

  • --partition A partition represents a group of nodes in the HPC cluster. Large clusters generally have multiple partitions, meaning that nodes may be grouped in various ways. For example, nodes belonging to a single group of users may be in one partition, while nodes dedicated to working with large data may be in another. Partitions are usually associated with account privileges, so users may need to specify which account they are using when telling Slurm which partition they plan to use.

  • --account Accounts may be associated with partitions. Accounts can have privileges to use a partition or set of nodes. Often, users need to specify the account when submitting jobs to a particular partition.

  • --array Slurm supports job arrays. In simple terms, a job array is a job that Slurm repeats multiple times; that is, Slurm replicates a single job as many times as the user requests. In the case of R, a single R script is run as multiple jobs when this option is used, so the user can take advantage of this to parallelize work across multiple nodes. Besides the fact that the jobs within a job array may be spread across multiple nodes, each job in the array has a unique ID that is available to the user via environment variables, in particular SLURM_ARRAY_TASK_ID.

    • Within R, and hence within the R script submitted to Slurm, users can access this environment variable with Sys.getenv("SLURM_ARRAY_TASK_ID"). Some of the functionality of slurmR relies on job arrays. More information on job arrays can be found at https://slurm.schedmd.com/job_array.html.

    • More information regarding CPUs in Slurm can be found at https://slurm.schedmd.com/cpu_management.html. Information regarding how Slurm counts CPUs/cores/threads can be found at https://slurm.schedmd.com/faq.html#cpu_count.
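
To illustrate the --array option, the sketch below shows one way a job script might use SLURM_ARRAY_TASK_ID to give each job in the array its own input. The parameter values, the array range 1-3, and the helper name param_for_task are hypothetical; the default task ID of 1 is only there so the script can be tried outside Slurm.

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-3

# Hypothetical parameter list: array task i uses the i-th value.
params="0.1 0.5 0.9"

# Print the parameter for a given 1-based task ID.
param_for_task() {
    echo "$params" | cut -d ' ' -f "$1"
}

# SLURM_ARRAY_TASK_ID is set by Slurm for each job in the array;
# default to 1 so the script can be tried outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
echo "Task $task_id uses parameter $(param_for_task "$task_id")"
# A real job would start its program here and pass it the parameter.
```

Submitting this script once with sbatch creates three jobs, each of which sees a different SLURM_ARRAY_TASK_ID and therefore picks a different parameter.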
