Frequently Asked Questions
How to Cite/acknowledge
Acknowledgement text: The authors thank IIT Delhi HPC facility for computational resources.
How to access the cluster?
Please see this link.
What are the best practices for users?
Best practices are listed here.
Which IP address should I use to access HPC?
Do NOT use IP address(es) to access HPC! Please use hpc.iitd.ac.in.
Access of HPC Cluster outside IITD
For IITD Users
For Non IITD users: Please have a look at the Usage Charges. The proposal form is available on request.
I have graduated/my IIT Delhi account has been deactivated. How do I retrieve my data?
Please have your supervisor contact hpchelp. The data may be transferred to the PI's account.
Can I run MS Windows applications on HPC?
No.
How is the job priority calculated?
The job priority depends on the job's current wait time, the queue priority, the size of the job (jobs smaller than 3 nodes or 6 GPUs have lower priority), and the job walltime.
How do I access the "high" queue?
Supervisors and PIs can request access to the higher priority "high" queue by transferring funds to HPC. Details are available on request.
How do I access the "top" queue?
This is done on a case to case basis for specific jobs. Supervisors/PIs can write to hpchelp.
How to check disk quota
Currently, users are limited to 100GB of space on /home and 25TB on /scratch. Users can check their quota usage:
lfs quota -hu $USER /home
lfs quota -hu $USER /scratch
Please see the policies page to request an increase in HOME quota.
How to check for old files
The following commands list all regular files in a user directory that are more than 30 days old:
lfs find $HOME -mtime +30 -type f -print | xargs du -ch
lfs find $SCRATCH -mtime +30 -type f -print | xargs du -ch
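To see which of these old files take up the most space, the same listing can be piped through sort. This is a quick sketch; the "tail -n 20" cut-off is arbitrary:
lfs find $SCRATCH -mtime +30 -type f -print | xargs du -ch | sort -h | tail -n 20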
How To Set Up SSH Keys
Step One
Create the RSA Key Pair: The first step is to create the key pair on the client machine. In the case of the HPC cluster, this will be any one of the login nodes.
ssh-keygen -t rsa
Step Two
Store the Keys and Passphrase: Once you have entered the keygen command, you will get a few more questions:
Enter file in which to save the key (/home/demo/.ssh/id_rsa):
You can press enter here, saving the file to the user home (in this case, my example user is called demo).
Enter passphrase (empty for no passphrase):
The entire key generation process looks like this:
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/demo/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/demo/.ssh/id_rsa.
Your public key has been saved in /home/demo/.ssh/id_rsa.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 demo@a
The key's randomart image is:
+--[ RSA 2048]----+
|       .oo.      |
|      . o.E      |
|       + . o     |
|      . = = .    |
|       = S = .   |
|      o + = +    |
|     . o + o .   |
|          . o    |
|                 |
+-----------------+
The public key is now located in /home/demo/.ssh/id_rsa.pub. The private key (identification) is now located in /home/demo/.ssh/id_rsa.
Step Three
Copy the Public Key to the server. Once the key pair is generated, it's time to place the public key on the server that we want to use.
Transfer the generated key:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Test passwordless login:
ssh login01
A password should NOT be required. If a password is asked for, something is wrong with your setup. Please repeat the steps carefully.
SSH key error during login
Sometimes users may face an SSH host key error while logging in to HPC.
Resolution: Please clear the mentioned key from the known_hosts file, as indicated in the error message. The correct key will be added the next time you log in.
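As a sketch, assuming the offending entry is for hpc.iitd.ac.in (use whichever hostname the error message actually reports), the stale key can be removed with ssh-keygen:
# remove the stale host key from ~/.ssh/known_hosts
ssh-keygen -R hpc.iitd.ac.in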
Error: Unable to activate conda environment in case of a batch job
Solution: Copy the lines between the # >>> conda initialize >>> markers of the .bashrc file in your HPC home directory to a separate file. Source that file inside your batch script before the conda activate command. For example, if you copied the # >>> conda initialize >>> section to a file named condaBaseSetup, then the conda environment activation inside the batch script needs to look like:
source "full path to condaBaseSetup file"
conda activate "environment name"
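A minimal batch script sketch is given below, assuming the conda initialization block was copied to $HOME/condaBaseSetup; the project name (cc), resources, environment name (myenv) and workload (my_script.py) are placeholders to be replaced with your own:
#!/bin/bash
#PBS -P cc
#PBS -l select=1:ncpus=4
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR
# initialize conda for this non-interactive shell
source $HOME/condaBaseSetup
# activate the environment and run the workload
conda activate myenv
python my_script.py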
Compiling and testing GPU
Two login nodes for K40 GPUs are available. For GPUs, users can login to gpu.hpc.iitd.ac.in. These nodes have two accelerator cards each.
Accessing HPC facility using Windows/Linux
Please see How to access.
Environment Modules
What are Environment Modules?
The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command which interprets modulefiles. Typically modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. modulefiles may be shared by many users on a system and users may have their own collection to supplement or replace the shared modulefiles. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl and python. Modules are useful in managing different versions of applications. Modules can also be bundled into metamodules that will load an entire suite of different applications. Examples of usage:
- List of available modules:
$ module avail
- Search module for keyword
$ module -i apropos gromacs
apps/gromacs/4.6.2/intel: Gromacs-4.6.2(intel MPI+CUDA-6.0)
apps/gromacs/4.6.2/intel1: Gromacs-4.6.2(intel MPI+CUDA-6.0+plumed-1.3)
apps/gromacs/4.6.5/intel: Gromacs-4.6.5 + IntelMKL + CUDA-6.0
apps/gromacs/4.6.7/intel: Gromacs-4.6.7(intel MPI+CUDA-7.0)
apps/gromacs/5.1.1/gnu: Gromacs-5.1.1(intel MPI+CUDA-7.5)
.
.
.
apps/test/gromacs-5.1.1: Gromacs-5.1.1(intel MPI+OpenMP+CUDA-7.0)
This searches for all modules mentioning "gromacs"; the "-i" makes the search case insensitive.
- Search module files for a keyword: this searches for all module files containing the given keyword, e.g., tensorflow:
$ module -i keyword tensorflow
apps/tensorflow/1.1.0/gpu: tensorflow - 0.11
apps/tensorflow/1.3.1/gpu: tensorflow-1.3.1 ( cuda-8.0.44 + python-2.7.13 )
apps/tensorflow/1.5.0/gpu: tensorflow-1.5.0 ( cuda-8.0 + cuDNN-6.0.21 + python-2.7.13 )
pythonpackages/2.7.13/tensorflow_tensorboard/0.1.2/gnu: tensorflow_tensorboard-0.1.2
pythonpackages/2.7.13/tensorflow_tensorboard/1.5.0/gnu: tensorflow_tensorboard-1.5.0
- Load a specific module:
$ module load apps/lammps/gpu-mixed
- Provide a brief description of the module:
$ module whatis apps/lammps/gpu-mixed
apps/lammps/gpu-mixed: LAMMPS MIXED PRECISION GPU 9 Dec 2014
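A few other everyday module commands, shown as a sketch using the LAMMPS module from the example above:
$ module load apps/lammps/gpu-mixed      # load the module
$ module list                            # show currently loaded modules
$ module show apps/lammps/gpu-mixed      # show what the modulefile sets (PATH, etc.)
$ module unload apps/lammps/gpu-mixed    # unload the module
$ module purge                           # unload all loaded modules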
Gtk-WARNING **: cannot open display
-OR-
Error: Can't open display:
If you are using a Linux system, use ssh -X.
X11 Forwarding in Windows
You can run graphical programs on the Linux/Solaris machines of the IITD HPC facility remotely and display them on your desktop computer running Windows. This is done by running two applications together on your Windows machine: Xming and PuTTY.
What is Xming?
Xming is a PC X Window Server. It enables programs being run remotely to be displayed on your desktop. Download and run the installation program from: https://sourceforge.net/projects/xming/
Navigate to the Files section and download:
a) the Xming setup from the Xming folder
b) the fonts package installer from the Xming-fonts folder
Note:
1.) By default both programs will be installed into the same location, so don't worry about overwriting files. Both packages are required.
2.) Once installed, running All Programs > Xming > XLaunch is a good idea to see what the configuration looks like. In most cases, the default options should be just fine.
3.) Finally run All Programs > Xming > Xming to start the PC X Server. The "X" icon should be visible on the Windows Taskbar. The X Server must be started before setting up an SSH connection to the HPC facility.
What is PuTTY?
PuTTY is a free SSH client, used here to connect to the HPC facility. Download the single Windows executable from: https://www.putty.org
Configuring PuTTY
Under Session, enter the hostname you want to connect to: hpc.iitd.ac.in on port 22. Make sure the connection type is ssh.
- Next, scroll to Connection > SSH > X11. Check the box next to Enable X11 Forwarding. By default the X Display location is empty. You can enter localhost:0. The remote authentication should be set to MIT-Magic-Cookie-1
- Finally go back to Session. You can save your session too, and load it each time you want to connect.
- Click Open to bring up the terminal and log in using your username/password.
SCP not functional
Sometimes scp (or rsync) breaks "suddenly". Here is a list of things to check:
- Do you have enough disk space?
- Are you copying to the correct path?
- Is there any entry in your ~/.bashrc files which is generating messages?
Bad Interpreter
Issue: -bash: /var/spool/PBS/mom_priv/jobs/58524.hn1.hpc.iitd.ac.in.SC: /bin/bash^M: bad interpreter: No such file or directory
When submitting a job script written on a Windows system to HPC (a Linux environment), use dos2unix, the program that converts plain text files in DOS format to UNIX format.
Example : dos2unix submitscript.sh
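If dos2unix is not available, an equivalent fix is to strip the carriage returns with sed; a quick sketch, run on the Linux side:
sed -i 's/\r$//' submitscript.sh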
Job is in the Q state but you are getting a job start mail?
This may happen when the job encounters a hardware failure or some other issue after PBS has completed the resource allocation, and the job is sent back to the Q state. This happens because the job's rerun parameter is set to true by default. To avoid this you can use #PBS -r n (i.e., set the rerun parameter to false) in your batch job submission script.
External User: Not getting PBS job status mails on the provided mail ID with the #PBS -M option
External users will not get any PBS job status mails, because the use of #PBS -M is restricted to IIT mail IDs only.
Accessing Internet
NOTE: Do not use the normal proxy for downloads.
By default, HPC users have access to the IITD intranet from the login nodes. In the following procedure we obtain internet access from the login02 node with the lynx web browser. The procedure will work on any IITD HPC node:
The IITD proxy login page can be accessed via the terminal-based lynx web browser. Please set the SSL_CERT_FILE variable to the path of your IITD CA certificate.
[user1@login02]$ export SSL_CERT_FILE=$HOME/mycerts/CCIITD-CA.crt
Access the proxy login URL via the lynx or firefox (ssh -X) browser after logging in to the IITD HPC account.
[user1@login02]$ lynx https://proxy82.iitd.ernet.in/cgi-bin/proxy.cgi
IIT Delhi Proxy Login
User ID: ____________________
Password: ____________________
Log on
NOTE: The URL varies on a per-user basis. For staff the URL is https://proxy21.iitd.ernet.in/cgi-bin/proxy.cgi
After successful authentication, you should be able to see the following output on your terminal:
IIT Delhi Proxy Login
You are logged in successfully as user1 from xx.xx.xx.x
Date and Time for your last Kerberos Password Change: 10-11-2015 10:22:04
Successful Authentication: 18-03-2016 10:52:56
Unsuccessful Authentication: 16-03-2016 10:27:34
*Please change your password at least once in three months*
Click to continue browsing: https://www.cc.iitd.ernet.in/
Check your Proxy Usage/Quota here
For non-browser applications (Proxy_Name: proxy82.iitd.ernet.in Proxy_IP: 10.10.79.29 Proxy_port: 3128)
* Click "Log out" to logout: Log out
Please keep this page open and browse from another window/tab.
Note down the proxy IP & port (Proxy_IP is 10.10.79.29 and Proxy_port is 3128) and the node's hostname (login02).
From a new terminal, log in to the HPC account and go to the same node (login02) where lynx is running, then set the http_proxy and https_proxy environment variables within your terminal as:
[user1@login02]$ export http_proxy=10.10.79.29:3128
[user1@login02]$ export https_proxy=10.10.79.29:3128
Now you can use commands like wget to access the internet.
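As a quick sanity check (the URL below is only a placeholder), a download through the proxy would look like:
[user1@login02]$ wget https://example.com/somefile.tar.gz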
NOTE: Only one user can access the internet at a time on any particular node (with the same proxy). If another user is already accessing the internet on the node you are on, please shift to a different node for internet access.
Procedure to download data from an outside server to HPC (via sshfs)
NOTE: Do not use the normal proxy for downloads.
- Get download server access. Please read the instructions here: Download Server Access Instructions
- Login to the download server e.g., ssh youruserid@download.iitd.ac.in
- Create a directory on download server e.g., mkdir -p $HOME/hpc.scratch
- Now mount your desired HPC folder on download server using sshfs e.g., sshfs hpc:${HOME/home/scratch} hpc.scratch
- Now your HPC folder is mounted on the hpc.scratch folder. You can start downloading data directly from the outside server to HPC.
- After finishing your work, please unmount the HPC folder e.g., fusermount -u $HOME/hpc.scratch
Important: Whatever changes you make in the folder on which you mounted your HPC folder are directly reflected in your HPC folder, so be careful while making any changes. A worked example of the whole procedure is sketched below.
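Putting the steps together, a sketch of a complete session on the download server; the prompts, the "hpc" ssh alias from the example above and the dataset URL are illustrative:
[youruserid@download ~]$ mkdir -p $HOME/hpc.scratch
[youruserid@download ~]$ sshfs hpc:${HOME/home/scratch} hpc.scratch
[youruserid@download ~]$ cd hpc.scratch
[youruserid@download hpc.scratch]$ wget https://example.com/dataset.tar.gz
[youruserid@download hpc.scratch]$ cd .. && fusermount -u $HOME/hpc.scratch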
Proxy Table
Users can check their authorized proxy from the table below and use it accordingly while accessing the internet on HPC.
Category | faculty | retfaculty | phd | mtech | btech | dual | integrated | staff | irdstaff | mba | mdes | msc | msr | pgdip | diit | research
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Proxy | 82 | 82 | 61 | 62 | 22 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | xen03 | | |
How to check your project
First you need to log in to amgr using your kerberos password.
[jrahul.vfaculty@login03 ~]$ amgr login
Password:
$ amgr ls project
cc
Error during amgr login
Error Type 1
Note: Those who are using Mac systems, or whose locale is not set properly, should try setting the locale explicitly.
Error Message:
e.g., Click discovered that you exported a UTF-8 locale but the locale system could not pick up from it because it does not exist. The exported locale is "en_IN.UTF-8" but it is not supported
Resolution:
export LC_ALL=en_IN.UTF-8
export LANG=en_IN.UTF-8
export LANGUAGE=en_IN.UTF-8
Error Type 2
Error Message: e.g., urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:719)> Unable to connect to Allocation Manager
Resolution: Unset the http_proxy and https_proxy environment variables.
How to check Deadline Project Budget Status
First you need to log in to amgr using your kerberos password.
[pbsadmin@login01 ~]$ amgr login
Password:
$ amgr checkbalance project -n tapannayak.spons
{'Cost': 140214.06}
Where:
"tapannayak.spons" is the deadline project name. Please use your authorized deadline project name after "-n".
qsub: Budgets: Required budget is not available for this transaction.
The project does not have sufficient funds to run the job. Please contact your supervisor/PI.
Not Running: Job would conflict with reservation or top job
This would indicate that your job would run after the next higher priority job(s) has/have started.
Permission denied error while installing software/application?
HPC users don't have access to locations such as /home/apps/.., /home/soft/.., /opt/.., etc. Please use a location in your allocated HOME or SCRATCH space for installing software/applications. (Hint: for most software/applications the installation path is generally provided using --prefix=)
HPC account disabled because of running jobs on login nodes?
Please go through: https://supercomputing.iitd.ac.in/?pols#misuse
My HPC account expired, how to renew HPC account?
The steps are exactly the same as those you followed while applying for the account. Reapply for the account using this link: (Accessible within IIT Delhi Network or over VPN only)
PBS Jobs Rerun
To make the job rerunnable after any crash/failure:
#PBS -r y
To make the job not rerunnable:
#PBS -r n
Out of memory / segmentation fault
It is possible that the program runs out of the default memory available. Users are advised to update the "ulimit" using the following command:
$ ulimit -s unlimited
If this resolves the issue, the same command should be added to the ~/.bashrc file.
Non-availability of slots and jobs on hold
ERROR: Not Running: Either request fewer slots for your application, or make more slots available for use
Resolution: Please set the "mpiprocs" resource value to the same value as ncpus in your job submission script.
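For example, a select statement of this form (the chunk and core counts are illustrative) keeps mpiprocs equal to ncpus:
#PBS -l select=2:ncpus=24:mpiprocs=24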
qsub: Error: Insufficient balance
ERROR: Not Running: qsub:Error: insufficient balance
The project has insufficient funds to run the requested job. Users should check the allocation budget and contact the supervisor/PI or HPC representative in case the allocation is insufficient.
Intel MPI Infiniband Usage
Intel MPI Library enables you to select a communication fabric at runtime without having to recompile your application. By default, it automatically selects the most appropriate fabric based on your software and hardware configuration. This means that in most cases you do not have to bother about manually selecting a fabric.
I_MPI_FABRICS values:

Fabric/Network | Network hardware and software used
---|---
shm | Shared memory (for intra-node communication only)
dapl | Direct Access Programming Library* (DAPL)-capable network fabrics, such as InfiniBand*
ofa | OpenFabrics Alliance* (OFA)-capable network fabrics, such as InfiniBand* (through OFED* verbs)
tcp | TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)
The default fabric list is dapl,ofa,tcp,tmi,ofi; i.e., the Intel MPI library first checks whether the available "dapl" network is appropriate/fast enough to run the code/application; if that fails it tries ofa, then tcp, and so on.
To force the InfiniBand network to be used, set the following environment variables in your job script or ${HOME}/.bashrc.
For MPI:
export I_MPI_FALLBACK="0"   [ do not switch to other available networks ]
export I_MPI_FABRICS="ofa"
or
export I_MPI_FABRICS="dapl" ; export I_MPI_DAPL_PROVIDER="ofa-v2-mlx5_0-1u"   (if using DAPL; ofa-v2-mlx5_0-1u is valid for the IITD HPC Cluster)
For OpenMP+MPI:
If your application is hybrid:
export I_MPI_FALLBACK="0"   [ do not switch to other available networks ]
export I_MPI_FABRICS="shm:ofa"
or
export I_MPI_FABRICS="shm:dapl"   (if using DAPL)
For OpenMP:
export I_MPI_FABRICS="shm:shm" To check which fabric is currently used, you can set the I_MPI_DEBUG environment variable to 2: mpirun –np n -genv I_MPI_DEBUG=2 your_command/command_path ; where "n" => number of processes. For Ex. : mpirun -np 48 -genv I_MPI_DEBUG=2 myprog You can also specify above variables in your mpirun command : mpirun –n n -genv I_MPI_FALLBACK=0 -genv I_MPI_FABRICS="shm:ofa" your_command/command_path For Ex. : mpirun –n 48 -genv I_MPI_FALLBACK=0 -genv I_MPI_FABRICS="shm:ofa" myprog For more information please visit below link :https://software.intel.com/en-us/node/535584
OpenMPI infiniband flags
For MPI: tell Open MPI to include *only* the components listed here and implicitly ignore all the rest:
mpirun --mca btl openib,self -np n your_executable/command_path
For example: mpirun --mca btl openib,self -np 48 myprog
For OpenMP + MPI:
mpirun --mca btl sm,openib,self -np n your_command/command_path
For example: mpirun --mca btl sm,openib,self -np 48 myprog
For more information, please visit (see the section "How do I select which components are used?"): https://www.open-mpi.org/faq/?category=tuning
MVAPICH Infiniband command
mpirun -iface ib0 -np n your_command/command_path
For example: mpirun -iface ib0 -np 48 myprog
Granting Access to Specific Users
The onus of controlling access to the $HOME and $SCRATCH directories is on the user. One way to allow access to other users is via Linux file mode bits, modified using chmod, which provides read/write/execute permissions to a group of users. There is no way, however, to control which users: e.g., you can provide access to all students of your batch or to no student of your batch, but you cannot provide access to a specific user without providing access to everyone else.
ACL (Access Control List)
Grant user student1 read access to file1.
setfacl -m u:student1:r file1
Revoke read access from user student1 for file1.
setfacl -x u:student1 file1
Delete the ACL from file1.
setfacl -bn file1
The aforementioned commands can be used for directories too!
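To verify which ACL entries are currently set on a file or directory, getfacl can be used; a quick sketch:
getfacl file1    # lists the owner, group and any user-specific ACL entries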
Example - Allowing full access to specific directories for specific users by user faculty1.
#student1 will have full access within STUDENT1 directory inside $HOME of faculty1
#student2 will have full access within STUDENT2 directory inside $HOME of faculty1
- The following command disallows all students/users from accessing your $HOME directory's contents:
[faculty1@login01 ~]$ chmod -R og-rwx $HOME
[faculty1@login01 ~]$ ll ${HOME}/.. | grep ${USER}
drwx------ 4 faculty1 faculty_group 4096 Sep 7 21:15 HOME
- Create working directories for the users and disallow access to everyone else (except faculty1):
[faculty1@login01 ~]$ mkdir -p ${HOME}/MYSTUDENTS/STUDENT1 ${HOME}/MYSTUDENTS/STUDENT2
[faculty1@login01 ~]$ chmod go-rwx ${HOME}/MYSTUDENTS/STUDENT1 ${HOME}/MYSTUDENTS/STUDENT2
- Now allow student1 and student2 read and execute permissions on your ${HOME} and ${HOME}/MYSTUDENTS directories:
[faculty1@login01 ~]$ setfacl -m u:student1:rx ${HOME}
[faculty1@login01 ~]$ setfacl -m u:student2:rx ${HOME}
[faculty1@login01 ~]$ setfacl -m u:student1:rx ${HOME}/MYSTUDENTS
[faculty1@login01 ~]$ setfacl -m u:student2:rx ${HOME}/MYSTUDENTS
- Finally, allow full access on the STUDENT1 and STUDENT2 directories for student1 and student2:
[faculty1@login01 ~]$ setfacl -m u:student1:rwx ${HOME}/MYSTUDENTS/STUDENT1
[faculty1@login01 ~]$ setfacl -m u:student2:rwx ${HOME}/MYSTUDENTS/STUDENT2
student1 and student2 should now be able to submit jobs from the STUDENT1 and STUDENT2 directories respectively (in your $HOME) from their own login sessions. Example:
[student1@login01 ~]$ cd ~faculty1/MYSTUDENTS/STUDENT1
[student1@login01 STUDENT1]$ cp -r $HOME/MY_INPUT .
[student1@login01 STUDENT1]$ qsub myJob.sh
NOTE: The disk quota used is charged to user faculty1!
GUI/Visualization
Use of GPU nodes for visualization is being released on a test basis on HPC. Use cases include GUI-based interactive jobs, heavy rendering and visualization, etc. Currently tested with VMD, Ansys Fluent and MATLAB. Instructions are as follows:
NOTE: GUI/Visualization is not available on skylake nodes.
Visualization without GPU Rendering
- Log in to HPC with -X.
- Submit an interactive job with -X for haswell nodes.
- For example: qsub -I -X -P cc -q standard -lselect=1:ncpus=1:centos=haswell -lwalltime=01:00:00
- Load the particular module.
- Run the command (a sketch follows this list).
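A sketch of the last two steps, using the paraview module name that appears in the GPU section below; the exact module and application names for your case may differ, so check module avail first:
$ module avail 2>&1 | grep -i paraview        # module avail prints to stderr, hence 2>&1
$ module load apps/visualization/paraview/3.12.0/gnu
$ paraview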
Visualization with OpenGL/GPU Rendering: use any of the methods mentioned below.
NOTE: If the application requires GPU rendering, do submit the job to a GPU node.
STEP 1: Get an interactive session on a GPU node (job submission)
$ qsub -I -P Project_name -l select=1:ncpus=8:ngpus=2 -l walltime=HH:MM:SS
ex.
[user@login01 ~]$ qsub -I -P cc -l select=1:ncpus=8:ngpus=2 -l walltime=01:00:00
qsub: waiting for job 513670.hn1.hpc.iitd.ac.in to start
qsub: job 513670.hn1.hpc.iitd.ac.in ready
cd /scratch/pbs/pbs.513670.hn1.hpc.iitd.ac.in.x8z
[user@khas001 /scratch/pbs/pbs.513670.hn1.hpc.iitd.ac.in.x8z]$
STEP 2: Log in to HPC (with X support) from a different window and go to the GPU node (the node you got in the interactive session) via any of the following three methods.
Method 1: VGLCONNECT
$ vglconnect -f Hostname_of_GPU_Node
ex.
[user@login01 ~]$ vglconnect -f khas001
VirtualGL Client 64-bit v2.4 (Build 20150505)
vglclient is already running on this X display and accepting unencrypted connections on port 4242.
Last login: Mon Dec 4 14:53:00 2017 from login01.hpc.iitd.ac.in
[user@khas001 ~]$
Method 2: X11 forwarding
$ ssh -X Hostname_of_GPU_Node
ex.
[user@login01 ~]$ ssh -X khas001
Last login: Mon Dec 4 14:53:50 2017 from login01.hpc.iitd.ac.in
[user@khas001 ~]$
Method 3: VNC
1. Perform the following on the GPU node (the machine on which you took the interactive session) to start the vnc server.
Load VNC module.
$ module load apps/turbovnc/1.2.1/precompiled
ex.
[user@khas001 ~]$ module load apps/turbovnc/1.2.1/precompiled
[user@khas001 ~]$
Set the vnc password if you haven't set it yet (this is a one-time activity). The password is a 6-character alphanumeric string. Do NOT use your IITD kerberos password! Once set, the same password will be used for all your future VNC sessions.
$ vncpasswd
ex.
[user@khas001 ~]$ vncpasswd
Password:
Verify:
Would you like to enter a view-only password (y/n)? n
[user@khas001 ~]$
Start the vnc server.
$ vncserver
ex.
[user@khas001 ~]$ vncserver
Desktop 'TurboVNC: khas001:2 (user)' started on display khas001:2
Starting applications specified in /home/cc/faculty/user/.vnc/xstartup.turbovnc
Log file is /home/cc/faculty/user/.vnc/khas001:2.log
[user@khas001 ~]$
NOTE: Note down the display "khas001:2".
2. Open a new window, log in to HPC and perform the following.
Load the vnc module.
$ module load apps/turbovnc/1.2.1/precompiled
ex.
[user@login01 ~]$ module load apps/turbovnc/1.2.1/precompiled
[user@login01 ~]$
Connect to the vnc server through vncviewer.
$ vncviewer Hostname_of_GPU_Node:number_got_at_time_of_start_vnc_server
ex.
[user@login01 ~]$ vncviewer khas001:2
libjawt.so path: /usr/local/java/jre1.7.0_79/lib/amd64
CConn: connected to host khas001 port 5902
CConnection: Server supports RFB protocol version 3.8
CConnection: Using RFB protocol version 3.8
CConn: Using pixel format depth 24 (32bpp) little-endian rgb888
CConn: Requesting Tight encoding
TurboVNC Helper: Disabling X11 full-screen mode for window 0x0100002a
CConn: Enabling GII
No extended input devices.
CConn: Enabling continuous updates
TurboVNC Helper: Disabling X11 full-screen mode for window 0x01000054
No extended input devices.
NOTE: This step will ask for the vnc password.
STEP 3: Load the module of the appropriate visualization application.
ex.
[user@khas001 ~]$ module load apps/visualization/paraview/3.12.0/gnu
[user@khas001 ~]$
Run it with the vglrun command.
ex. [user@khas001 ~]$ vglrun paraview
NOTE: After visualization, safely shut down the vnc server on the GPU node:
ex. vncserver -kill khas001:2
qsub: You are not allowed to use skylake Nodes
- The user must have passed the advanced HPC test. If the user has missed the exam, another one will be announced soon.
- The user must be associated with either an internal HPC proposal or an externally funded proposal.
NOTE: Some skylake nodes are available for phase1 irrespective of any internal/external proposal but the user must clear the HPC Advanced test to access these nodes.
qsub: Maximum number of jobs for project "P1" already in queue high
qsub: Job violates queue and/or server resource limits
qsub: would exceed queue generic's per-user limit
These errors indicate that the queued-jobs limit for this queue is already fully utilized.
High Memory Flag
"highmem" flag has been deprecated. Users can use "mem" and "place=excl" flag.
e.g.,qsub -P cc -I -l select=2:ncpus=24:mem=30gb -l place=excl
Job Submission on Skylake Nodes
You have to use "centos=skylake" flag in your select statement to submit the job on skylake nodes.
e.g.,qsub -P cc -I -l select=2:ncpus=40:centos=skylake
What happens when I graduate?
- HPC access is provided on the basis of your central kerberos-authenticated account. The access and the data are directly linked to this account.
- When the account expires or is deleted, your HOME data will be backed up and then deleted. Your scratch data will be deleted. There is no backup for scratch.
- In order to preserve the data, please have it transferred to your supervisor's account (an email from your supervisor is sufficient). If continuation of work is needed, your supervisor can create a visitor account and provide the HPC access.
Cannot see Apply for HPC account on IITD supercomputing website
You need to be on the IITD network to apply for an HPC account, as well as to use HPC. If you are outside campus, please use VPN. Your supervisor can provide you with VPN access if you do not have it currently. Faculty can use the link below to apply for VPN for themselves and for their advisees.
How to apply for VPN
Faculty members who do not have access to VPN and are outside the IITD network may please write to sysadm[@]cc.iitd.ac.in.
Proxy error while using conda
In order to avoid the proxy-related issues associated with conda, please follow the steps mentioned below.
1. Log in to the proxy on a particular node using lynx in one terminal (see "How to access internet" and the Proxy Table above), i.e.:
lynx https://proxy[your proxy no.].iitd.ernet.in/cgi-bin/proxy.cgi
2. On the other terminal, instead of exporting the proxy environment variables, create a file named .condarc in your home location, i.e., cd $HOME and create the .condarc file with the following content (Note: use exactly the same indentation).
You can copy the sample file /home/apps/skeleton/.condarc to avoid a yaml error; replace [proxy ip] with your proxy IP:
proxy_servers:
  http: http://[proxy ip]:3128
  https: https://[proxy ip]:3128
Save the file and check (a quick check is sketched below).
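To confirm that conda picked up the proxy settings, the configured values can be printed back:
conda config --show proxy_servers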
Note: this procedure is only for the conda-related proxy setup.
Common possible reasons for job not producing output/slower output generation etc.
1. Insufficient storage space to create output files (see "How to check disk quota" above).
2. Wrong output generation path.
Example: The output path/directory doesn't exist and the application/software is not creating it automatically.
Example: Wrong output/error file path given with the PBS option. See the PBS Tutorial.
3. Multinode job not producing output / slower output generation than expected.
(The HPC account setup is incomplete, especially the SSH key setup required for multinode jobs.)
* A basic check is passwordless ssh between the login nodes after logging in to HPC (see "How To Set Up SSH Keys" above), OR execute:
source /home/apps/skeleton/oneTimeHPCAccountEnvSetup.sh
* Check whether the command used for execution is correct, and also go through the HPC Tutorials.
4. For reasons other than these, you can report the issue to us.
Note: For a better understanding of the HPC facility, go through the HPC Tutorials.
Job waiting in the queue for a long time?
The waiting time/priority of a job depends on a number of factors: the number of requested resources, the type of resources, resource availability, queue priority, the requested walltime, etc. The job will run when the resource requirements are met.
Jobs with a smaller number of cores will have lower priority.
Skylake and Ice Lake nodes: these are in heavy demand. Jobs scheduled on skylake/icelake (especially V100/A100) nodes may sometimes experience long wait times in all queues.
High queue: Submitting jobs to the high queue doesn't mean they will run quickly. Yes, they will have higher priority than in the standard and low queues, but a job will run only when its resource requirements are met.
Licensed software: If you are using licensed software, the availability of licenses can also be one of the reasons.
To check the reason reported by PBS for your job, use the command:
qstat -saw [JobID]
To check the estimated start time of your job, use the command:
qstat -awT [JobID]
Note: It may take some time for the estimated start time to be shown. The estimated time may change as per the priority of other jobs; it doesn't mean your job will run exactly at that time.
You can try different select & ncpus value combinations for the same total number of resources, as in the sketch below.
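For example, both of the following request 48 cores in total but packed differently; the values are illustrative, and actual limits depend on the node type and queue:
qsub -P cc -l select=2:ncpus=24 -l walltime=24:00:00 jobscript.sh
qsub -P cc -l select=4:ncpus=12 -l walltime=24:00:00 jobscript.sh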
Before submitting long jobs on HPC, go through the Best Practices.