Resources
Different compilers are available on the HPC system as modules (refer to the Module Tutorials).
Some of them are listed below.
GNU Compilers for Serial Code
C Compiler: gcc, C++ Compiler: g++, Fortran Compiler: gfortran
compiler/gcc/6.5.0/compilervars
compiler/gcc/7.1.0/compilervars
compiler/gcc/7.3.0/compilervars
compiler/gcc/7.4.0/compilervars
compiler/gcc/9.1.0
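For example, a serial C program could be compiled after loading one of these modules (a minimal sketch; the source file name hello.c is only illustrative):
module purge
module load compiler/gcc/9.1.0
gcc hello.c -o hello.exec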
Compilers for Parallel Code (OpenMPI/MPICH)
MPI C Compiler: mpicc, MPI C++ Compiler: mpic++, MPI Fortran Compiler: mpifort
compiler/gcc/6.5/openmpi/4.0.2
compiler/gcc/9.1/openmpi/4.0.2
compiler/gcc/9.1/mpich/3.3.1
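As a sketch, the sample MPI C program mentioned later in this section could be compiled with the GNU/OpenMPI toolchain as follows (the output name openmpi.exec is just an example):
module purge
module load compiler/gcc/9.1/openmpi/4.0.2
mpicc matrix_multi.c -o openmpi.exec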
Intel Compilers (Recommended)
Compilers for Serial Code:
C Compiler: icc, C++ Compiler: icpc, Fortran Compiler: ifort
Compilers for Parallel Code:
MPI C Compiler: mpiicc, MPI C++ Compiler: mpiicpc, MPI Fortran Compiler: mpiifort
suite/intel/parallelStudio/2015
suite/intel/parallelStudio/2018
suite/intel/parallelStudio/2019
suite/intel/parallelStudio/2020
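For example, serial code can be compiled after loading one of these modules (a minimal sketch; hello.c and hello.f90 are only illustrative file names):
module purge
module load suite/intel/parallelStudio/2020
icc hello.c -o hello.exec
ifort hello.f90 -o hello_f90.exec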
CUDA Compiler (nvcc)
compiler/cuda/11.0/compilervars
compiler/cuda/10.1/compilervars
compiler/cuda/10.2/compilervars
compiler/cuda/10.0/compilervars
compiler/cuda/9.2/compilervars
compiler/cuda/8.0/compilervars
compiler/cuda/7.5/compilervars
compiler/cuda/7.0/compilervars
compiler/cuda/6.5/compilervars
compiler/cuda/6.0/compilervars
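For example, a CUDA program could be compiled as follows (a minimal sketch; vector_add.cu is only an illustrative file name):
module purge
module load compiler/cuda/11.0/compilervars
nvcc vector_add.cu -o cuda.exec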
NVIDIA HPC SDK
suite/nvidia-hpc-sdk/20.11/cuda11.0
suite/nvidia-hpc-sdk/20.7/cuda11.0
suite/nvidia-hpc-sdk/20.9/cuda11.0
suite/nvidia-hpc-sdk/21.7/cuda11.0
The C, C++, and Fortran compiler wrappers for Intel MPI are mpiicc, mpiicpc, and mpiifort; for OpenMPI/MPICH/MVAPICH they are mpicc, mpic++, and mpifort. These wrappers become available once the respective module is loaded.
To compile code using Intel MPI (Recommended)
Load any Intel Parallel Studio module, e.g.:
module purge
module load suite/intel/parallelStudio/2018
General Compilation Command:
[compiler name] [name of file with MPI code] -o [name of the executable to generate]
e.g., MPI C code compilation:
mpiicc matrix_multi.c -o intelmpi.exec
Successful compilation will create an executable file with the name:
intelmpi.exec
A sample MPI C program (Matrix Multiplication) is available at:
/home/apps/skeleton/examples/MPI/matrix_multi.c
(Compilation steps are the same for C++ and Fortran code; simply change the compiler name, input file, etc., as shown below.)
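For instance, equivalent C++ and Fortran compilations would look like this (the source file names here are only illustrative; only the C sample is provided at the path above):
mpiicpc matrix_multi.cpp -o intelmpi_cpp.exec
mpiifort matrix_multi.f90 -o intelmpi_f90.exec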
To run the generated executable on multiple cores via a PBS job, load the respective module and run:
mpirun -np $PBS_NTASKS [path to the executable e.g. intelmpi.exec]
To compile code using OpenMPI/MPICH/MVAPICH, follow the same steps as above, using the respective modules and their corresponding compiler commands.
Sample batch job submission scripts are also available at:
/home/apps/skeleton/examples/MPI
Refer to the PBS Tutorials.
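A minimal batch script for the executable generated above might look like the following sketch (the job name, project code, and resource values are placeholders; adapt them to your allocation and see the sample scripts above for site-specific details):
#!/bin/bash
#PBS -N intelmpi_test
#PBS -P [project code]
#PBS -l select=2:ncpus=10:mpiprocs=10
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
module purge
module load suite/intel/parallelStudio/2018
mpirun -np $PBS_NTASKS ./intelmpi.exec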
To check resource utilization of a running job, ssh to each node allocated to your job (use qstat -n to get the list of allocated nodes).
Use the top -u $USER command (exit using Q) to check whether the job processes are running on the requested number of CPU cores and to see their CPU utilization on that node.
If you requested GPU nodes for your job, you can additionally check GPU utilization on the allocated node using the nvidia-smi command, which shows GPU utilization as well as the processes running on each GPU card.
Running nvidia-smi -l 1 refreshes the output at an interval of 1 second (exit using Ctrl + C).
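Putting these checks together, a typical inspection of a running job might look like this (the job ID and node name are placeholders):
qstat -n [job id]            # list the nodes allocated to the job
ssh [allocated node name]    # log in to one of the allocated nodes
top -u $USER                 # check processes and CPU utilization (exit with Q)
nvidia-smi -l 1              # on GPU nodes, watch GPU utilization (exit with Ctrl + C)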
To avoid out-of-memory issues, a basic step is to submit the job with the #PBS -l place=scatter option, with a select value larger than the ncpus value, and with the centos=skylake option.
(Skylake nodes have more memory than Haswell nodes, but there are fewer of them, so this may increase the waiting time for your job.)
e.g., for a memory-intensive job, 120 cores can be requested as:
#PBS -l select=12:ncpus=10:mpiprocs=10:centos=skylake
#PBS -l place=scatter
A more advanced step is to use the memory monitoring script: run it inside your batch script to measure the memory requirement of your job on each node, get a clear idea of that requirement, and then use the mem option in the select statement.
Usage:
### Put this before setting up the environment for the job, i.e., before the module load commands ####
export NODES=`cat ${PBS_NODEFILE}|sort|uniq|tr '\n' ','|sed 's:,$::g'`   # comma-separated list of unique allocated nodes
echo ${NODES}
pdsh -w ${NODES} "/home/apps/mem_monitor_script/monitorStats.sh ${PBS_O_WORKDIR}/MEM_MONITOR_${PBS_JOBID}" 2> /dev/null &   # start the monitoring script on every node in the background
export mem_check_pid=$!   # PID of the background pdsh process, used to stop monitoring at the end
#####
## Execution commands here e.g. mpirun -np ...... etc ##
## Put this at the end of the script ##
kill -9 $mem_check_pid
##################################
Output:
The script will create a folder named MEM_MONITOR_${PBS_JOBID} containing one file per node with CPU RAM metrics. It also detects whether a GPU is allocated to the job; if so, it additionally creates a file containing GPU metrics.
------------
Once you have an accurate idea of the memory requirement, you can use the mem option in the select statement.
e.g.,
If the overall memory requirement for your job is 120 GB, the select statement can be #PBS -l select=12:ncpus=10:mpiprocs=10:mem=10gb, i.e., 12 * 10 GB = 120 GB.
To avoid unreliable connections, especially for interactive jobs, the screen command is one possible solution. Using screen you can maintain a virtual session corresponding to the current terminal, provided you run your commands inside the screen session. The screen session will remain available even if you get disconnected from your current connection for some reason.
How to Use:
Note: screen sessions are specific to a particular node; a screen session can be reattached, detached, or removed only from the node on which it was created. Hence, keep a note of the login node from which it was created.
From any login node on HPC.
1. Create a screen session
screen -S [session name you want to give]
2. Execute command inside screen session.
Example : Submit your interactive job
qsub -I -P [project code] -lselect=1:ncpus=4:mpiprocs=4 -lwalltime=01:00:00
Execute the operations you want to perform.
The session will remain available even if the connection is lost.
You can also detach from the session using the following command inside the screen session.
Ctrl A + Ctrl D OR screen -d
Note: Executing the quit/exit command inside the screen session will terminate the session.
Please make sure you do not create nested screen sessions inside a screen session.
3. To reattach to a particular screen session, use:
screen -r [screen session name]
4. If there are multiple screen sessions running on a particular node, you can list them using:
screen -ls
5. To close/terminate a particular screen session, use:
screen -XS [screen session name] quit
Note: Using the screen command takes practice; it is the user's responsibility to make sure that all unwanted screen sessions are closed.
The examples above cover basic use of the screen command; to learn more, use:
man screen OR screen --help
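As a concrete walk-through (the session name mysession and the resource values are placeholders):
screen -S mysession
qsub -I -P [project code] -lselect=1:ncpus=4:mpiprocs=4 -lwalltime=01:00:00
# ... work inside the interactive job, then detach with Ctrl+A, D ...
# later, from the same login node:
screen -ls
screen -r mysession
screen -XS mysession quit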
Transfer of data from outside IIT Delhi to HPC:
For small data size (* less than 100MB)
- Use the Proxy Server to access the internet. Please follow the steps given here.
For large data size but a one-time download (* 100MB to 1TB)
- Use the IIT Delhi Download server. Steps are given here.
- Use the Proxy Server to access the internet. Please follow the steps given here.
For large data size and multiple downloads (* greater than 1TB)
- Use the Proxy Server to access the internet (Not Recommended). Please follow the steps given here.
- Use the IIT Delhi Download server. Steps are given here.
- Use of tunneling (Highly Recommended). Steps are given here.
* Approximate