Last modified: September 19, 2017.

hpchelp@iitd.ac.in

Frequently Asked Questions

  • How to Cite/acknowledge

    Acknowledgement text: The authors thank IIT Delhi HPC facility for computational resources.
  • What are the best practices for users?

    Best practices are listed here.
  • Accessing the HPC Cluster from outside IITD

    For IITD Users
    For non-IITD users: Please have a look at the Usage Charges. The proposal form is available on request.
  • Using "LOW" queue for Job submission

    Within your job submission script use:
    #PBS -q low

    At the time of job submission use:
    qsub -q low <job script>

    If your job is already in the queued (Q) state, use:
    qmove low <job id>
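
    A minimal job script using the low queue might look like the following sketch (the project name "cc" and the resource request are illustrative; adjust them for your project, and myprog stands for your executable):
    #!/bin/bash
    #PBS -q low
    #PBS -P cc
    #PBS -l select=1:ncpus=16
    cd $PBS_O_WORKDIR
    ./myprog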
  • How to check disk quota

    Currently, users are limited to 30GB of space on /home and 200TB on /scratch. Users can check their quota usage with:
    lfs quota -hu $USER /home
    lfs quota -hu $USER /scratch
    Users can request more diskspace for their home directories via their supervisors. Current upper limit is 900GB.
  • How to find older files

    The following commands list all regular files in a user directory that are more than 30 days old:
     lfs find $HOME -mtime +30 -type f -print | xargs du -ch 
     lfs find $SCRATCH -mtime +30 -type f -print | xargs du -ch 
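    The same pattern can be used to reclaim scratch space by deleting old files. A sketch (the 90-day cutoff is illustrative, and rm -f removes files irreversibly; review the listing produced by the commands above before running it):
     lfs find $SCRATCH -mtime +90 -type f -print | xargs rm -f 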
  • How To Set Up SSH Keys

    Step One

    Create the RSA Key Pair : The first step is to create the key pair on the client machine. In the case of the HPC cluster, this will be any one of the login nodes.
    ssh-keygen -t rsa

    Step Two

    Store the Keys and Passphrase. Once you have entered the keygen command, you will be asked a few more questions:
    Enter file in which to save the key (/home/demo/.ssh/id_rsa): You can press Enter here, saving the file to the user's home directory (in this example, the user is called demo).

    Enter passphrase (empty for no passphrase):
    The entire key generation process looks like this:
    ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/demo/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/demo/.ssh/id_rsa.
    Your public key has been saved in /home/demo/.ssh/id_rsa.pub.
    The key fingerprint is:
    4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 demo@a
    The key's randomart image is:
    +--[ RSA 2048]----+
    |          .oo.   |
    |         .  o.E  |
    |        + .  o   |
    |     . = = .     |
    |      = S = .    |
    |     o + = +     |
    |      . o + o .  |
    |           . o   |
    |                 |
    +-----------------+
    The public key is now located in /home/demo/.ssh/id_rsa.pub. The private key (identification) is now located in /home/demo/.ssh/id_rsa.

    Step Three

    Copy the Public Key to the server. Once the key pair is generated, it's time to place the public key on the server that we want to use.

    Transfer the generated key
    ssh-copy-id hpc.iitd.ac.in
    and enter your password.
    -- OR --
    Alternatively, you can add the key manually:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Test passwordless login :
    ssh login01
    A password should NOT be required. If you are asked for a password, something is wrong with your setup. Please repeat the steps carefully.
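    If a password is still requested, also check the permissions on your key files; sshd silently ignores an authorized_keys file that is group- or world-writable. The usual fix:
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/id_rsa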
  • Compiling and testing GPU and Xeon Phi programs

    Two login nodes are available for GPU and Xeon Phi development. For GPUs, users can log in to gpu.hpc.iitd.ac.in, and for Xeon Phi, to mic.hpc.iitd.ac.in. These nodes have two accelerator cards each.
  • Accessing HPC facility using Windows/Linux

    Please see How to access.
  • Environment Modules


    What are Environment Modules?

    The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command, which interprets modulefiles. Typically, modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. Modulefiles may be shared by many users on a system, and users may have their own collection to supplement or replace the shared ones.

    Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh and tcsh, as well as some scripting languages such as perl and python. Modules are useful for managing different versions of applications; modules can also be bundled into metamodules that load an entire suite of different applications. Examples of usage:
    • List of available modules:
      $ module avail
    • Loads specific module:
      $ module load apps/lammps/gpu-mixed
    • Provide a brief description of the module:
      $ module whatis apps/lammps/gpu-mixed
      apps/lammps/gpu-mixed: LAMMPS MIXED PRECISION GPU 9 Dec 2014
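    A few more standard module commands that work the same way (generic Environment Modules functionality, shown with the same example module):
    • List currently loaded modules:
      $ module list
    • Unload a specific module:
      $ module unload apps/lammps/gpu-mixed
    • Unload all loaded modules:
      $ module purge
    • Show what a modulefile changes in the environment:
      $ module show apps/lammps/gpu-mixed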

  • Gtk-WARNING **: cannot open display
    -OR-
    Error: Can't open display:

    If you are using a Linux system, use ssh -X <username>@hpc.iitd.ac.in. Windows users: please see "X11 Forwarding in Windows" below.
  • X11 Forwarding in Windows


    You can run graphical programs remotely on the Linux/Solaris machines of the IITD HPC facility and display them on your desktop computer running Windows. This is done by running two applications together on your Windows machine: Xming and PuTTY.

    What is Xming?

    Xming is a PC X Window Server. This enables programs being run remotely to be displayed on your desktop. Download and run the installation program from: http://sourceforge.net/projects/xming/


    Navigate to the Files section and download:
     a) the Xming setup from the Xming folder
     b) the fonts package installer from the Xming-fonts folder

    Note:

    1.) By default both programs will be installed into the same location, so don't worry about overwriting files. Both packages are required.
    2.) Once installed, running All Programs > Xming > XLaunch is a good idea to see what the configuration looks like. In most cases, the default options should be just fine.
    3.) Finally run All Programs > Xming > Xming to start the PC X Server. The "X" icon should be visible on the Windows Taskbar. The X Server must be started before setting up an SSH connection to the HPC facility.

    What is PuTTY?


    PuTTY is a free SSH client. Use PuTTY to connect to the HPC facility. Download the single Windows executable from: http://www.putty.org

    Configuring PuTTY


    Under Session, enter the hostname you want to connect to: hpc.iitd.ac.in on port 22. Make sure the connection type is SSH.
    1. Next, scroll to Connection > SSH > X11. Check the box next to Enable X11 Forwarding. By default the X display location is empty; you can enter localhost:0. The remote authentication should be set to MIT-Magic-Cookie-1.
    2. Finally, go back to Session. You can also save your session, and load it each time you want to connect.
    3. Click Open to bring up the terminal and log in using your username/password.
  • SCP not functional

    Sometimes scp (or rsync) breaks "suddenly". Here is a list of things to check:
    1. Do you have enough disk space?
    2. Are you copying to the correct path?
    3. Is there any entry in your ~/.bashrc file that prints messages? Output from ~/.bashrc confuses scp and rsync in non-interactive sessions.
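    If your ~/.bashrc must print something (a banner, a reminder, etc.), one common fix is to guard the output so it only runs in interactive shells. A minimal sketch:
    # at the top of ~/.bashrc: stop here for non-interactive shells (scp, rsync, sftp)
    case $- in
        *i*) ;;
        *) return ;;
    esac
    echo "Welcome back!"    # only interactive logins reach this line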
  • Old Nodes

    The old K20 nodes can be accessed using the flag "K20GPU=true", e.g.
    1. CPU job: 2 nodes, 16 cores each from the "cc" project
      qsub -IP cc -l select=2:ncpus=16:K20GPU=true
    2. GPU job: 2 nodes, 16 cores, 2 GPUs each from the "cc" project
      qsub -IP cc -l select=2:ncpus=16:ngpus=2:K20GPU=true
  • Large Jobs

    When submitting jobs spanning multiple nodes, you can be assigned any node - CPU, GPU, Xeon Phi or the old K20 node(s). If you want to explicitly exclude the old nodes from your submission, use the K20GPU=false flag:
    1. CPU job: 2 nodes, 16 cores each from the "cc" project, exclude old nodes
      qsub -IP cc -l select=2:ncpus=16:K20GPU=false
    2. GPU job: 2 nodes, 16 cores, 2 GPUs each from the "cc" project, exclude old nodes
      qsub -IP cc -l select=2:ncpus=16:ngpus=2:K20GPU=false
  • Bad Interpreter

    Issue:
     -bash: /var/spool/PBS/mom_priv/jobs/58524.hn1.hpc.iitd.ac.in.SC: /bin/bash^M: bad interpreter: No such file or directory 
    This error occurs when a job script written on a Windows system is submitted to the HPC (Linux) environment: Windows line endings leave a stray ^M after /bin/bash. Use dos2unix, the program that converts plain text files from DOS format to UNIX format.
    Example : dos2unix submit.sh   (converts the file in place)
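    To confirm the diagnosis before converting, you can check the script's line endings with the file utility, which reports DOS line endings explicitly (the exact wording varies between file versions):
    $ file submit.sh
    submit.sh: Bourne-Again shell script, ASCII text executable, with CRLF line terminators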
    
  • Accessing Internet

    By default, HPC users have access to the IITD intranet from the login nodes. The following procedure obtains internet access from the login02 node with the lynx web browser; it will work on any IITD HPC login node:

    IITD proxy login page can be accessed via the terminal based lynx web browser. Please set the SSL_CERT_FILE variable to the path of your IITD CA certificate.
    [user1@login02]$ export SSL_CERT_FILE=$HOME/mycerts/CCIITD-CA.crt
    
    
    Access the proxy login URL via lynx or firefox (ssh -X) browser after logging in to the IITD HPC account.
    [user1@login02]$ lynx https://proxy82.iitd.ernet.in/cgi-bin/proxy.cgi
                                                       
    
                                              IIT Delhi Proxy Login
    
    
                                          User ID:  ____________________
    
                                          Password: ____________________
    
                                                    Log on
    
    
    NOTE: The URL varies from user to user. For staff, the URL is https://proxy21.iitd.ernet.in/cgi-bin/proxy.cgi

    After successful authentication, you should be able to see the following output on your terminal :-
    
                                              IIT Delhi Proxy Login
    
    
    
                         You are logged in successfully as user1 from xx.xx.xx.x
    
    
    
                                        Date and Time for your last Kerberos:
    
                        Password Change         Successful Authentication   Unsuccessful Authentication
                        10-11-2015 10:22:04     18-03-2016 10:52:56         16-03-2016 10:27:34
    
                             *Please change your password at least once in three months*
    
                             Click to continue browsing: http://www.cc.iitd.ernet.in/
    
                                         Check your Proxy Usage/Quota here
    
         For non-browser Applications (Proxy_Name: proxy82.iitd.ernet.in Proxy_IP: 10.10.79.29 Proxy_port: 3128)
    
                                         * Click "Log out" to log out:
    
    
                                                     Log out
    
    
                           Please keep this page open and browse from another window/tab
    
    
    Note down the proxy IP & port (Proxy_IP is 10.10.79.29 and Proxy_port is 3128) and the login node's hostname (login02).

    From a new terminal, log in to your HPC account, go to the same login node (login02) where lynx is running, and set the http_proxy and https_proxy environment variables in your terminal as:
    [user1@login02]$ export http_proxy=10.10.79.29:3128
    [user1@login02]$ export https_proxy=10.10.79.29:3128
    
    
    Now you can use commands like wget & git clone to access the internet.
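    For example, once both variables are exported (the URLs below are only illustrations; any external site works):
    [user1@login02]$ wget http://example.com/index.html
    [user1@login02]$ git clone https://github.com/lammps/lammps.git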
  • How to install python packages

    Set up internet connectivity (see the FAQ entry "Accessing Internet") and load a Python module in your environment.
    example:
    module load compiler/python/2.7.10/compilervars
    
    Now, using pip, specify the directory for package installation as:
    pip install --ignore-installed --install-option="--prefix=${HOME}/MYPYTHONMODULES" package_name
    
    Set the environment variable PYTHONPATH as:
    export PYTHONPATH=${HOME}/MYPYTHONMODULES/lib/python2.7/site-packages:$PYTHONPATH
    
    Now you should be able to import and use the installed Python modules.
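    A quick check that the package is picked up from your prefix (package_name is the same placeholder as above):
    python -c "import package_name; print(package_name.__file__)"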
    
  • How to check Budget Status

    LC_ALL=en_IN /opt/alloc_db/user_scripts/budget_status -P cc -p 2016.q2 -v
    where:
    -P is the project name
    -p is the year and quarter
  • How to check Project Summary

    LC_ALL=en_IN /opt/alloc_db/user_scripts/project_summary -P cc -p 2016.q2 -s hn1
    where:
    -P is the project name
    -p is the year and quarter
    -s is the system
  • Out of memory / segmentation fault

    It is possible that the program runs out of the default memory limits, in particular the default stack size. Users are advised to update the "ulimit" using the following command:
    $ ulimit -s unlimited
    
    If this resolves the issue, the same command should be added to the ~/.bashrc file.
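    To see the limits currently in effect in your shell (ulimit is a bash builtin):
    $ ulimit -s     # current stack size limit, in kB
    $ ulimit -a     # all resource limits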
  • PBS Error: Alloc DB reservation failed

    ERROR: Not Running: PBS Error: Alloc DB reservation failed, holding job
    Resolution:
    1. Submit the job to the low queue (e.g. qsub -q low scriptname.sh), or
    2. Move your queued job to the low queue (e.g. qmove low jobid).
    
  • Non-availability of slots and jobs on hold

    ERROR: Not Running: Either request fewer slots for your application, or make more slots available for use
    Resolution: specify the "mpiprocs" resource in your job submission script.
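    For example, a select statement requesting 16 MPI ranks on each of 2 nodes (the values are illustrative):
    #PBS -l select=2:ncpus=16:mpiprocs=16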
    
  • qsub: Error: Insufficient balance

    ERROR: Not Running: qsub:
    Error: insufficient balance
    The project has insufficient funds to run the requested job. Users should check the allocation budget and contact the HPC representative in case the allocation is insufficient. The job may also be submitted to the "low" queue.
    
  • Intel MPI Infiniband Usage

    Intel MPI Library enables you to select a communication fabric at runtime without having to recompile your application.
    By default, it automatically selects the most appropriate fabric based on your software and hardware configuration.
    This means that in most cases you do not have to select a fabric manually.
    

    I_MPI_FABRICS

    Fabric   Network hardware and software used
    shm      Shared memory (for intra-node communication only)
    dapl     Direct Access Programming Library* (DAPL)-capable network fabrics, such as InfiniBand*
    ofa      OpenFabrics Alliance* (OFA)-capable network fabrics, such as InfiniBand* (through OFED* verbs)
    tcp      TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)

    The default fabric selection order is dapl, ofa, tcp, tmi, ofi: the Intel MPI library first checks whether the available "dapl" network is appropriate/fast enough to run the application; if that fails it tries "ofa", then "tcp", and so on.
    
    To force the InfiniBand network to be used, set the following environment variables in your job script or ${HOME}/.bashrc:
    
    For MPI:
    export I_MPI_FALLBACK="0"       # do not fall back to other available networks
    export I_MPI_FABRICS="ofa"
    or 
    export I_MPI_FABRICS="dapl" ; export I_MPI_DAPL_PROVIDER="ofa-v2-mlx5_0-1u"    # if using DAPL; ofa-v2-mlx5_0-1u is valid for the IITD HPC cluster
    
    For OpenMP+MPI (hybrid applications):
    export I_MPI_FALLBACK="0"       # do not fall back to other available networks
    export I_MPI_FABRICS="shm:ofa"
    or
    export I_MPI_FABRICS="shm:dapl"     # if using DAPL
    
    For OpenMP:
    export I_MPI_FABRICS="shm:shm"
    To check which fabric is currently used, you can set the I_MPI_DEBUG environment variable to 2:
              mpirun -np n -genv I_MPI_DEBUG=2 your_command/command_path    (where "n" is the number of processes)
    For Ex. : mpirun -np 48 -genv I_MPI_DEBUG=2 myprog 
    You can also specify the above variables in your mpirun command:
              mpirun -n n -genv I_MPI_FALLBACK=0 -genv I_MPI_FABRICS="shm:ofa" your_command/command_path
    For Ex. : mpirun -n 48 -genv I_MPI_FALLBACK=0 -genv I_MPI_FABRICS="shm:ofa" myprog
    For more information, please see:
    
    https://software.intel.com/en-us/node/535584
  • OpenMPI infiniband flags

    For MPI :
    Tell Open MPI to include *only* the components listed here and implicitly ignore all the rest:
             mpirun --mca btl openib,self -np n  your_executable/command_path
    For Ex.: mpirun --mca btl openib,self -np 48 myprog
    
    For (OpenMP + MPI):
              mpirun --mca btl sm,openib,self -np n   your_command/command_path
    For Ex. : mpirun --mca btl sm,openib,self -np 48 myprog
    For more information, please see the section "How do I select which components are used?" at:
    
    https://www.open-mpi.org/faq/?category=tuning
  • MVAPICH Infiniband command

              mpirun -iface ib0 -np n  your_command/command_path
    For Ex. : mpirun -iface ib0 -np 48 myprog
    
  • Granting Access to Specific Users

    The onus of controlling access to the $HOME and $SCRATCH directories is on the user. One way to allow access to other users is via Linux file mode bits, modified using chmod, which grants read/write/execute permissions to the owner, the group, and everyone else. There is no way, however, to target individual users: you can provide access to all students of your batch or to none, but you cannot provide access to a specific user without providing access to everyone else. ACLs solve this.

    ACL (Access Control List)
    Grant user student1 read access to file1:
    setfacl -m u:student1:r file1
    Revoke user student1's ACL entry on file1:
    setfacl -x u:student1 file1
    Remove all ACL entries from file1:
    setfacl -b file1
    The aforementioned commands can be used for directories too!
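    To inspect the ACLs currently set on a file or directory, use getfacl (the output below is illustrative):
    getfacl file1
    # file: file1
    # owner: faculty1
    # group: faculty_group
    user::rw-
    user:student1:r--
    group::r--
    mask::r--
    other::---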

    Example - user faculty1 allows full access to specific directories for specific users.
    #student1 will have full access within STUDENT1 directory inside $HOME of faculty1
    #student2 will have full access within STUDENT2 directory inside $HOME of faculty1

    1. Disallow all other users from accessing your $HOME directory's contents:
      [faculty1@login01 ~]$ chmod -R og-rwx $HOME
      [faculty1@login01 ~]$ ll ${HOME}/..|grep ${USER}
       drwx------ 4 faculty1 faculty_group     4096 Sep  7 21:15 HOME
      
    2. Create working directories for the users, disallowing access to everyone else (except faculty1):
      [faculty1@login01 ~]$ mkdir -p ${HOME}/MYSTUDENTS/STUDENT1 ${HOME}/MYSTUDENTS/STUDENT2
      [faculty1@login01 ~]$ chmod go-rwx ${HOME}/MYSTUDENTS/STUDENT1 ${HOME}/MYSTUDENTS/STUDENT2
      
    3. Now allow student1 and student2 read and execute permissions on your ${HOME} and ${HOME}/MYSTUDENTS directories:
      [faculty1@login01 ~]$ setfacl -m u:student1:rx ${HOME}
      [faculty1@login01 ~]$ setfacl -m u:student2:rx ${HOME}
      
      [faculty1@login01 ~]$ setfacl -m u:student1:rx ${HOME}/MYSTUDENTS
      [faculty1@login01 ~]$ setfacl -m u:student2:rx ${HOME}/MYSTUDENTS
      
    4. Finally, allow full access on the STUDENT1 and STUDENT2 directories for student1 and student2 respectively:
      [faculty1@login01 ~]$ setfacl -m u:student1:rwx ${HOME}/MYSTUDENTS/STUDENT1
      [faculty1@login01 ~]$ setfacl -m u:student2:rwx ${HOME}/MYSTUDENTS/STUDENT2
      

    Users student1 and student2 should now be able to submit jobs from the STUDENT1 and STUDENT2 directories respectively (inside your $HOME) from their own login sessions. Example-
    [student1@login01 ~]$ cd ~faculty1/MYSTUDENTS/STUDENT1
    [student1@login01 STUDENT1]$ cp -r $HOME/MY_INPUT .
    [student1@login01 STUDENT1]$ qsub myJob.sh
    
    NOTE: The disk quota used is charged to user faculty1!