ATOMIN Cluster Software Info


COMPILERS AND OTHER SOFTWARE STUFF

The standard Linux GNU compilers (version 6.3) are available, along with a lot of other software such as Matlab, Mathematica, GSL and various useful libraries. If something is not installed, please ask the administrator. Matlab is provided in the R2016b version, since the newer releases cannot run headless. The same software is installed on all machines, including MKL - the Intel Math Kernel Library (containing, among other things, highly optimized LAPACK and FFTW3 routines) - and the Intel MPI implementation. Both are provided by Intel free of license fees and reside in the /opt/intel/mkl and /opt/intel/mpi directories, respectively.
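
As an illustration, here is a minimal sketch of a C program calling an optimized BLAS routine through MKL's CBLAS interface. The compile line in the comment is an assumption based on the standard MKL layout under /opt/intel/mkl; the exact flags depend on the installed MKL version, so ask the administrator if it does not link.

/* mkl_dot.c - minimal MKL/CBLAS example (a sketch; the compile line is an assumption)
 * Possible compile line:
 *   gcc mkl_dot.c -I/opt/intel/mkl/include -L/opt/intel/mkl/lib/intel64 -lmkl_rt -lpthread -lm -ldl -o mkl_dot
 */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    /* dot product computed by MKL's optimized BLAS */
    double d = cblas_ddot(3, x, 1, y, 1);
    printf("dot = %f\n", d);   /* expected: 32.000000 */
    return 0;
}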

OPENMP

The GNU compilers support multi-threading via the OpenMP extension, which enables parallel (multithreaded) jobs within a single node - up to 96 threads (64 on complex07 and complex08). Some trial-and-error testing is needed to check whether the most efficient set-up uses slightly fewer threads than the number of available cores, since the operating system itself sometimes needs a fair amount of computing power, especially with active glusterfs connections.
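
For reference, a minimal OpenMP program looks like the sketch below. The -fopenmp flag and the OMP_NUM_THREADS variable are standard GNU/OpenMP usage, not anything specific to this cluster.

/* omp_sum.c - minimal OpenMP example
 * Compile:  gcc -fopenmp -O2 omp_sum.c -o omp_sum
 * Run:      OMP_NUM_THREADS=94 ./omp_sum     (leaving a couple of cores for the OS)
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const long n = 100000000L;
    double sum = 0.0;

    /* loop iterations are divided among the threads */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);

    printf("threads available: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}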

INTEL MPI

OpenMP is confined to a single node. If even more computing power is needed, one can run parallel jobs via MPI (Message Passing Interface), which should also perform well since the cluster nodes are connected via both Ethernet and InfiniBand, the latter providing the fast interconnect. Intel MPI has been tested and found to behave quite well; it can be found in the /opt/intel/mpi directory. For jobs spanning more than a single machine, Intel MPI requires a file called mpd.hosts specifying the nodes on which the program should run. It should contain a list of nodes with the appropriate number of processes per node, in a form similar to:
complex01:96
complex02:96
complex03:96
complex04:96
complex05:96
complex06:96
complex07:64
complex08:64
This file needs to be created in the working directory of the program, i.e., the one from which mpirun is invoked; an option to the mpirun command also allows a file other than the default to be used. In the multi-node case one also needs to define the communication channel, which for Intel MPI on our cluster is "-r ssh". Another parameter useful for communication optimization, "-genv I_MPI_DEVICE rdssm", selects mixed (hybrid) communication between cores (shared memory plus InfiniBand). The full command running such an MPI job (still with Intel MPI) is:
/opt/intel/mpi/intel64/bin/mpirun -r ssh -genv I_MPI_DEVICE rdssm -np total_core_number ./program_name

PBS ADVICE: as the queue system assigns the nodes dynamically, the file mpd.hosts cannot be created in advance. Instead, the system creates a file containing all assigned nodes and passes its name in the environment variable PBS_NODEFILE. This file needs to be used instead of the default mpd.hosts or, alternatively, one may copy it to mpd.hosts within the batch script before running mpirun.
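
To test the set-up, a minimal MPI program such as the sketch below can be compiled with the mpicc wrapper shipped with Intel MPI (assumed here to live next to mpirun in /opt/intel/mpi/intel64/bin) and launched with the mpirun command shown above.

/* mpi_hello.c - minimal MPI test program
 * Compile (path is an assumption - adjust to the actual installation):
 *   /opt/intel/mpi/intel64/bin/mpicc mpi_hello.c -o mpi_hello
 * Run on the nodes listed in mpd.hosts, e.g.:
 *   /opt/intel/mpi/intel64/bin/mpirun -r ssh -genv I_MPI_DEVICE rdssm -np 16 ./mpi_hello
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* number of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(host, &len);     /* node the process runs on */

    printf("process %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}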

OTHER MPI SOFTWARE

All other MPI-related software (libraries etc.) may be listed (and also managed) via:
mpi-selector --list
The description of all that software is far too long to include here; for details contact the administrator and be prepared for LONG reading.

QUEUE SYSTEM (SLURM)

At the moment the following queues are defined:
Q. name   Node no.   Cores per node   RAM per node   Walltime
bigone    1-6        96               256 GB         infinite
small     1-2        64               128 GB         infinite
The "small" queue is served by complex07 and 08 only, whereas complex01 to complex06 serve the bigone.