Submitting Jobs on Discovery Cluster

CONTENTS

Migration to SLURM: We will be migrating to the SLURM scheduler at 5:00 PM on August 31, 2016. Users can test their SLURM scripts with trial runs ONLY in the respective partitions. The partitions are similar to those that were implemented under LSF. The LSF queues will be closed at 5:00 PM on August 30, 2016, and LSF will be turned off at 5:00 PM on August 31, 2016. You are strongly urged to migrate from LSF and to test your SLURM scripts on the cluster. Please note that until LSF is turned off on the cluster, the SLURM queues are for test runs only, not exceeding a few seconds.
SLURM manual is available for download by clicking HERE
Full SLURM documentation is available here – http://www.slurm.schedmd.com.
1) Interactive Jobs
2) Regular Runs: TCP/IP 10Gb/s, RDMA IB, and TCP/IP IPOIB 56Gb/s
3) MATLAB DISTRIBUTED COMPUTING SERVER (MDCS)
4) PYTHON 2.7.5 with SAGE 5.12 and mpi4py 1.3.1
5) Rmpi 0.6-6 and SNOW 0.4-1 using R 3.0.2
6a) NAMD 2.9 with MPICH-3.0.4
6b) ORCA 3.0.2 (and FOAM EXTEND 3.2) with OPENMPI-1.8.3
7) SAS 9.4 Interactive and Batch Jobs
8) Gaussian 09 Interactive and Batch Jobs and GaussView5
9) Submitting and Running Jobs on the GPU Queue “par-gpu”
10) Submitting and Running Stata (version 14) Jobs
11) Submitting and Running Ansys Fluent (version 14) Jobs
12) Submitting and Running Mathematica (version 10.02) Jobs
13) Submitting and Running Serial Jobs using a Compute Node's Local Storage Space
14) Submitting and Running Interactive and Batch jobs using OpenFoam 3.0+ with mpich
15) Dealing with zombie jobs on Discovery Cluster
16) Compiling OpenACC accelerated code on Discovery Cluster for use on GPU nodes
17) Using Schrodinger 2016 on Discovery Cluster

The SLURM scheduler needs the SLURM module, with its prerequisites, to be set in the ".bashrc" file in your home directory. "module whatis slurm-14.11.8" will give you usage. After you do this, log out and log in again; "module list" will then show at a minimum four modules, including the slurm module. Check that SLURM is working with "sinfo -le" and "sinfo -Nle", which show the partition details. See the image below – click the image for a larger resolution.
slurm_0

1) Interactive Jobs:

SLURM – set SLURM modules as shown HERE:
Using SLURM an interactive node is obtained and released from login nodes discovery2 or discovery4 using the following steps:
1) Use "salloc" with "-N 1", "--exclusive" and "-p partition_name". NOTE: -N is always 1, and "--exclusive" locks the interactive node for the user; no other user can use this node for interactive and/or batch jobs. This is the preferred way to get an interactive node.
2) Once the allocation is done by SLURM, use “squeue -l -u user_id” to find out which compute node from the requested partition is allocated for the interactive job.
3) ssh -X into this compute node for your interactive work. (NOTE: The interactive session on the general partitions is for 24 hours only. Partitions owned by Faculty have no time limit for interactive or batch jobs.)
4) Once done exit from the compute node. Then release the allocation using “scancel Job_Allocation_ID”, or exit again to automatically kill the allocation.
5) Type “exit” once more to release the shell.

The steps 1) to 5) described above are shown below (click image for better resolution):
slurm_28
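For reference, the same sequence in command form is sketched below; the partition name, compute node name and job allocation ID are placeholders that will differ for your allocation.

[user@discovery2 ~]$ salloc -N 1 --exclusive -p ser-par-10g    # step 1: request one exclusive node
[user@discovery2 ~]$ squeue -l -u <user-id>                    # step 2: note the allocated compute node and the job allocation ID
[user@discovery2 ~]$ ssh -X compute-0-005                      # step 3: log in to the allocated node for interactive work
[user@compute-0-005 ~]$ exit                                   # step 4: leave the compute node when done
[user@discovery2 ~]$ scancel <Job_Allocation_ID>               # step 4: release the allocation (or type "exit" again)
[user@discovery2 ~]$ exit                                      # step 5: release the shell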

2) Regular Runs

Regular runs on the 10g backplane, open to all users for serial, embarrassingly parallel or parallel jobs, are to be done using the "ser-par-10g" and similar partitions.

SLURM – set SLURM modules as shown HERE:
A typical SLURM template for submitting batch jobs is shown below. Jobs are submitted using “sbatch name_of_bash_slurm_submit_script”. Make sure you are familiar with sbatch and other SLURM commands explained in the manual available here, and basic bash shell scripting available from here.
slurm_5
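If the image is hard to read, the sketch below shows the general shape of such a submit script, modeled on the batch scripts that appear later in this document; the job name, paths, partition and executable are placeholders to replace with your own.

#!/bin/bash
#set a job name
#SBATCH --job-name=run1
#a file for job output, you can check job progress
#SBATCH --output=run1.out
# a file for errors from the job
#SBATCH --error=run1.err
#time you think you need; default is one day
#SBATCH --time=20:00
#number of tasks you are requesting
#SBATCH -n 32
#SBATCH --exclusive
#partition to use
#SBATCH --partition=ser-par-10g
#number of nodes to distribute n tasks across
#SBATCH -N 2

work=/home/<user-id>/mpi-test
cd $work

mpirun -prot -TCP -srun ./mpi_mm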

Submit script for TCP/IP 10g backplane runs:
SLURM:
In the case of IBM Platform MPI, the default MPI on the cluster, add the -TCP flag to the mpirun command for 10g backplane runs (this is the default if omitted).

The mpirun line in the SLURM submit script template shown above will now look like:

mpirun -prot -TCP -srun ./mpi_mm

Submit script for IB RDMA backplane runs:
SLURM:
For RDMA backplane runs use the "parallel-ib" partition. The basic SLURM submit script is similar, except that the -IBV, -spawn and -1sided options are used with mpirun instead of -TCP. Please note that the code being run must be compiled with RDMA enabled, i.e. have the IB-VERBS API library included in the code for RDMA socket communications.

The mpirun line in the SLURM submit script template shown above will now look like:

mpirun -prot -IBV -spawn -1sided -srun ./mpi_mm

Submit script for IB IPOIB 56Gb/s backplane runs:
SLURM:
To use the Infiniband IPOIB TCP backplane, which has 56Gb/s bandwidth (FDR Infiniband) as opposed to the 10Gb/s bandwidth of the 10g backplane, use the "parallel-ib" SLURM partition. The basic SLURM submit script is similar to the template shown above, except that the options to the mpirun command change as shown below:

mpirun -prot -TCP -netaddr 10.100.68.0/24 -srun ./mpi_mm

(NOTE if the executable is in your path drop the “./”.)

3) MATLAB DISTRIBUTED COMPUTING SERVER (MDCS)

Type "module whatis matlab_mdcs_2016a" to see the modules required for MATLAB, and put these module loads in your .bashrc file. Log out and log in again and you should be able to run MATLAB with all toolboxes and the Distributed Computing Server on the Discovery Cluster. See below – click image for larger resolution.
matlab_slurm_6

Now obtain an interactive node as described here to launch and configure MATLAB for first use, following the steps below.
1) Launch matlab on the compute node and run “configCluster”. This is to be done once only – on matlab first run.
2) Check that ClusterInfo.getQueueName is empty.
3) Then set it to the partition you wish MDCS to launch jobs on using ClusterInfo.setQueueName('name_of_slurm_partition').
4) Check that the queue is correctly set by running ClusterInfo.getQueueName.
5) Now exit matlab and submit your batch jobs.

Please note that every time you need to change the queue used for MDCS jobs, run steps 3) to 5) above from any interactive node.

Image below shows first run configure procedure and setting of the MDCS queue. Click image for a better resolution.

matlab_slurm_7

The contents of par_parallel.m are below. It is advisable to run this test first to make sure your submit scripts are correct and your environment is set up correctly.

%  ----------------------------------------------------------------------
%  Nilay K Roy PhD
%  Northeastern University, Information Technology Services, Research Computing
%  Boston, MA
%  ----------------------------------------------------------------------
%  Demo MATLAB PI Program:  Local Thread Parallel Version
%  ----------------------------------------------------------------------
%  This MATLAB script calculates PI using the trapezoidal rule from the
%  integral of the arctangent (1/(1+x**2)). This is a simple parallel code
%  which uses a 'parfor' loop and runs with a MATLAB pool of 'numprocs' workers.
%  ----------------------------------------------------------------------
%
%  Clear environment and set output format
%
  clear all; format long eng;
%
%  Set processor (lab) pool size
%
  numprocs = 4;
  poolobj = parpool(numprocs);
%
%
%   Open an output file.
%
  fid=fopen('/scratch/nroy/matlab-test/par_PI.txt','wt');
%
%   Define and initialize global variables
%
 mypi = 0.0;
 ttime = 0.0;
%
%   Define and initialize 'for' loop integration variables.
%
  nv = 10000;    %  Set default number of intervals and accuracy
% nv = input('Please define the number of intervals: ')
  ht = 0.0;
  wd = 1.0 / nv;
%
% Start stopwatch timer to measure compute time.
%
  tic;
%
% This parallel 'parfor' loop divides the interval count 'nv' implicitly among the
% processors (labs) and computes partial sums on each of the arctangent function's value
% at the assigned intervals. MATLAB then combines the partial sums implicitly
% as it leaves the 'parfor' loop construct placing the global sum into 'ht'.
%
  parfor i = 1 : nv
    x = wd * (i - 0.50);
    ht = ht + (1/(1+(x*x)));
  end
%
% The numerical integration is completed by multiplying the summed
% function values by the constant interval (differential) 'wd' to get
% the area under the curve.
%
  mypi = wd * ht;
%
%  Stop stopwatch timer.
%
  ttime = toc;
%
% Print total time and calculated value of PI.
%
fprintf('Number of intervals chosen (nv) was: %d\n', nv);
fprintf('Number of processors (labs) used was: %d\n', numprocs);
fprintf('Computed value for PI is: %3.20f\n with error of %3.20f\n', mypi, abs(pi-mypi));
fprintf('Time to complete the computation was: %6.6f\n', ttime);
%
%
fprintf(fid,'Number of intervals chosen (nv) was: %d\n', nv);
fprintf(fid,'Number of processors (labs) used was: %d\n', numprocs);
fprintf(fid,'Computed value for PI is: %3.20f\n with error of %3.20f\n', mypi, abs(pi-mypi));
fprintf(fid,'Time to complete the computation was: %6.6f\n', ttime);
%
%   Close output file.
%
fclose(fid);
%
delete(poolobj);
%
%
% End of script
%

You will have to change the path in the code above to the correct one for your case:

 fid=fopen('/scratch/nroy/matlab-test/par_PI.txt','wt');

Now you can submit and check your job and output as shown below – click image for better resolution: (Note when the parpool is open “scontrol show licenses” gives the number of MDCS slots remaining)

matlab_slurm_8
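From a login node you can check the MDCS job and the remaining license slots with standard SLURM commands, for example:

squeue -l -u <user-id>        # check the state of the MDCS job submitted from MATLAB
scontrol show licenses        # while the parpool is open, shows the MDCS slots remaining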

Similar examples can be run using parallel loops, SPMD (single program multiple data), and matlabpool using MDCS from multiple nodes (not the simple case shown above), and compiling MPI with matlab using MEX. These examples can be found using the link here.

To configure the MATLAB Compiler Toolbox (6.2) for first-time use, run "mbuild -setup" at the MATLAB prompt (>>) from your home directory. After this you will be configured correctly to use it. Note: do not run this again – it is needed only once.

4) PYTHON 2.7.5 with SAGE 5.12 and mpi4py 1.3.1

The Python 2.7.5 module comes with SAGE 5.12 and mpi4py 1.3.1. Type "module whatis python-2.7.5" to see usage and dependent module details. The output is shown below:

[nroy@discovery4 ~]$ module whatis python-2.7.5
python-2.7.5 : loads the modules environment for Python 2.7.5 and Sage 5.12.

Needs the following modules to be loaded as prerequisites

module load gnu-4.8.1-compilers
module load gnu-4.4-compilers
module load fftw-3.3.3
module load platform-mpi

Now run

>>source /shared/apps/sage/sage-5.12/sage

Do this if you are running sage from an interactive node. You do not need to do this if you are running sage using a SLURM submit script.

Put these module load commands in your .bashrc file that is found in your /home/<user-id> directory.

NOTE: To run SAGE as Python 2.7.5, into which you can import sage, use "sage -python". For SAGE/PYTHON scripts the first line should read "#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python".
Both versions of python "python" and "sage -python" have mpi4py 1.3.1 installed. Contact researchcomputing@neu.edu if you need help or more details or have problems.

[nroy@discovery4 ~]$
[nroy@discovery4 ~]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.8.1-compilers   2) gnu-4.4-compilers     3) fftw-3.3.3            4) platform-mpi          5) python-2.7.5
[nroy@discovery4 ~]$

To run SAGE interactively get an interactive node as shown here, then run sage.

[nroy@compute-0-005 ~]$ source /shared/apps/sage/sage-5.12/sage
┌────────────────────────────────────────────────────────────────────┐
│ Sage Version 5.12, Release Date: 2013-10-07 │
│ Type “notebook()” for the browser-based notebook interface. │
│ Type “help()” for help. │
└────────────────────────────────────────────────────────────────────┘
sage: quit()
Exiting Sage (CPU time 0m0.05s, Wall time 0m6.25s).
[nroy@compute-0-005 ~]$
You can also run SAGE as python as shown below.

[nroy@compute-0-005 ~]$ sage -python
Python 2.7.5 (default, Oct  7 2013, 07:36:31) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from sage.all import *
>>> quit()
[nroy@compute-0-005 ~]$ exit
exit
[nroy@discovery4 ~]$

An example run using mpi4py with PYTHON/SAGE is shown below. This will be the basis for submitting PYTHON/SAGE jobs to the cluster via SLURM. First let us look at the python MPI enabled code.

[nroy@discovery4 mpi4py-test]$ cat helloworld.py 
#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python
"""
Parallel Hello World
"""

from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))

[nroy@discovery4 mpi4py-test]$

Notice the "#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python" if you plan to use SAGE with python or this could be "#!/shared/apps/python/Python-2.7.5/INSTALL/bin/python" if using regular python without SAGE. Note that mpi4py 1.3.1 is linked to both SAGE python and python 2.7.5 available via this module.

Now let's look at the SLURM submit script template.

[nroy@compute-0-000 mpi4py-test]$ cat slurm_mpi4py.bash 
#!/bin/bash
#set a job name  
#SBATCH --job-name=run1
#################  
#a file for job output, you can check job progress
#SBATCH --output=run1.out
#################
# a file for errors from the job
#SBATCH --error=run1.err
#################
#time you think you need; default is one day
#in minutes in this case, hh:mm:ss
#SBATCH --time=20:00
#################
#number of tasks you are requesting
#SBATCH -n 32
#SBATCH --exclusive
#################
#partition to use
#SBATCH --partition=parallel-ib
#################
#number of nodes to distribute n tasks across
#SBATCH -N 2
#################

work=/home/nroy/mpi4py-test

cd $work

mpirun -np 8 -prot -TCP -srun ./helloworld.py
[nroy@compute-0-000 mpi4py-test]$ 

On submission with "sbatch slurm_mpi4py.bash" the job will run in the specified partition, and on completion you should see output as shown below. Ignore error output like "helloworld.py: Rank 0:7: MPI_Init_thread: Unable to open /opt/ibm/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_bor - running with failsafe mpi collective algorithms." This just falls back to the failsafe collective algorithms and is a bug that has been closed by the mpi4py developers.

[nroy@discovery4 mpi4py-test]$ tail -41 output_file 
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :               5.45 sec.
    Max Memory :             7 MB
    Average Memory :         7.00 MB
    Total Requested Memory : -
    Delta Memory :           -
    (Delta: the difference between total requested memory and actual max usage.)
    Max Swap :               37 MB

    Max Processes :          1
    Max Threads :            1

The output (if any) follows:

Host 0 -- ip 10.100.8.45 -- ranks 0 - 7

 host | 0
======|======
    0 : SHM

 Prot -  All Intra-node communication is: SHM

Hello, World! I am process 3 of 8 on compute-0-005.
Hello, World! I am process 6 of 8 on compute-0-005.
Hello, World! I am process 5 of 8 on compute-0-005.
Hello, World! I am process 0 of 8 on compute-0-005.
Hello, World! I am process 7 of 8 on compute-0-005.
Hello, World! I am process 4 of 8 on compute-0-005.
Hello, World! I am process 1 of 8 on compute-0-005.
Hello, World! I am process 2 of 8 on compute-0-005.

PS:

Read file <error_file> for stderr output of this job.

[nroy@discovery4 mpi4py-test]$ 

5) Rmpi 0.6-6 and SNOW 0.4-1 using R 3.0.2:

R 3.0.2 on the Discovery Cluster, compiled with GNU compilers 4.8.1, can run parallel MPI jobs using openmpi-1.8.3 with the Rmpi and SNOW packages. Parallel R runs are submitted with SLURM submit scripts in the same way and integrate with SLURM. We use OpenMPI instead of the default Platform MPI in this case because R does not support Platform MPI. The modules required to run parallel R jobs are shown below and can be obtained by using "module whatis R-3.0.2-gnu".

[nroy@discovery4 R-3.0.2]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) gnu-4.8.1-compilers   3) fftw-3.3.3            4) perl-5.20.0           5) slurm-14.11.8         6) openmpi-1.8.3         7) R-3.0.2-gnu
[nroy@discovery4 R-3.0.2]$ 

The SLURM submit script used is shown below.

[nroy@discovery4 test1]$ cat r-rmpi_jobsubmit.bash 
#!/bin/bash
#set a job name  
#SBATCH --job-name=run1
#################  
#a file for job output, you can check job progress
#SBATCH --output=run1.out
#################
# a file for errors from the job
#SBATCH --error=run1.err
#################
#time you think you need; default is one day
#in minutes in this case, hh:mm:ss
#SBATCH --time=20:00
#################
#number of tasks you are requesting
#SBATCH -n 32
#SBATCH --exclusive
#################
#partition to use
#SBATCH --partition=ser-par-10g
#################
#number of nodes to distribute n tasks across
#SBATCH -N 4
#################

work=/home/nroy/rmpi/test1

cd $work

mpirun -np 1 R --no-save < snow_test.R

The script that calls the R slaves using Rmpi and SNOW is shown below.

[nroy@discovery4 test1]$ cat snow_test.R 
library(Rmpi)
library(snow)

# Initialize SNOW using MPI communication. The first line will get the
# number of MPI processes the scheduler assigned to us. Everything else 
# is standard SNOW

np=8
cluster <- makeCluster(np, type="MPI")

# Print the hostname for each cluster member
sayhello <- function()
{
	info <- Sys.info()[c("nodename", "machine")]
	paste("Hello from", info[1], "with CPU type", info[2])
}

names <- clusterCall(cluster, sayhello)
print(unlist(names))

# Compute row sums in parallel using all processes,
# then a grand sum at the end on the master process
parallelSum <- function(m, n)
{
	A <- matrix(rnorm(m*n), nrow = m, ncol = n)
	row.sums <- parApply(cluster, A, 1, sum)
	print(sum(row.sums))
}

parallelSum(500, 500)

stopCluster(cluster)
mpi.exit()

[nroy@discovery4 test1]$ 

Notice in the script above "library(Rmpi)" and "library(snow)" load the Rmpi and snow packages, and in the end we must call "stopCluster(cluster)" and "mpi.exit()". Failure to do so will leave the jobs running on the nodes.

To run this use "sbatch r-rmpi_jobsubmit.bash". After completion output will be as shown below.

[nroy@discovery4 test1]$ cat run1.out 

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rmpi)
> library(snow)
> 
> # Initialize SNOW using MPI communication. The first line will get the
> # number of MPI processes the scheduler assigned to us. Everything else 
> # is standard SNOW
> 
> np=8
> cluster <- makeCluster(np, type="MPI")
	8 slaves are spawned successfully. 0 failed.
> 
> # Print the hostname for each cluster member
> sayhello <- function()
+ {
+ 	info <- Sys.info()[c("nodename", "machine")]
+ 	paste("Hello from", info[1], "with CPU type", info[2])
+ }
> 
> names <- clusterCall(cluster, sayhello)
> print(unlist(names))
[1] "Hello from compute-0-001 with CPU type x86_64"
[2] "Hello from compute-0-001 with CPU type x86_64"
[3] "Hello from compute-0-001 with CPU type x86_64"
[4] "Hello from compute-0-001 with CPU type x86_64"
[5] "Hello from compute-0-001 with CPU type x86_64"
[6] "Hello from compute-0-001 with CPU type x86_64"
[7] "Hello from compute-0-001 with CPU type x86_64"
[8] "Hello from compute-0-002 with CPU type x86_64"
> 
> # Compute row sums in parallel using all processes,
> # then a grand sum at the end on the master process
> parallelSum <- function(m, n)
+ {
+ 	A <- matrix(rnorm(m*n), nrow = m, ncol = n)
+ 	row.sums <- parApply(cluster, A, 1, sum)
+ 	print(sum(row.sums))
+ }
> 
> parallelSum(500, 500)
[1] -522.1495
> 
> stopCluster(cluster)
[1] 1
> mpi.exit()
[1] "Detaching Rmpi. Rmpi cannot be used unless relaunching R."
> 
> 
[nroy@discovery4 test1]$ 

6a) NAMD 2.9 with MPICH-3.0.4:

NAMD 2.9 with VMD 1.9.1 on the Discovery Cluster is built for use with MPICH-3.0.4. To use NAMD with MPICH, load the modules as shown below:

nroy@discovery4 ~]$ module list
No Modulefiles Currently Loaded.
nroy@discovery4 ~]$ module load gnu-4.4-compilers;module load gnu-4.8.1-compilers;module load fftw-3.3.3-single;module load fftw-3.3.3;module load mpich-3.0.4;module load NAMD-2.9_VMD-1.9.1
nroy@discovery4 ~]$
nroy@discovery4 ~]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) gnu-4.8.1-compilers   3) fftw-3.3.3-single     4) fftw-3.3.3            5) mpich-3.0.4           6) NAMD-2.9_VMD-1.9.1
nroy@discovery4 ~]$

Now "namd2" will be in your path. To use "namd2" with the appropriate input files and mpirun, use a SLURM submit script like the example shown below.

slurm_6
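As a rough guide, a namd2 batch script follows the same pattern as the other templates in this document. The sketch below is illustrative only: the configuration file name apoa1.namd is a placeholder, and the exact mpirun options should follow the template shown in the image above.

#!/bin/bash
#SBATCH --job-name=namd_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 32
#SBATCH -N 2
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

# NAMD-2.9_VMD-1.9.1, mpich-3.0.4 and their prerequisite modules are assumed
# to be loaded via .bashrc as shown above.
work=/home/<user-id>/namd-test
cd $work

# apoa1.namd is a placeholder NAMD configuration file.
mpirun -np 32 namd2 apoa1.namd > apoa1.log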

More details on these parameters and input files, depending on what you want to do, are on the NAMD documentation pages here.

For visualization and analysis of structures and other interactive GUI work using VMD, get an interactive node as described here. To start VMD on an interactive node, issue "vmd". A screenshot of this is shown below (click the image for better resolution).

slurm_7

6b) ORCA 3.0.2 (and FOAM EXTEND 3.2) with OPENMPI-1.8.3:

Full details on how to run OPENMPI (v1.8.3) jobs, with sample scripts and example code, are given here - http://nuweb12.neu.edu/rc/wp-content/uploads/2014/10/USING_OPENMPI_ON_DISCOVERY_CLUSTER.pdf

Once you have familiarized yourself with OPENMPI you may run ORCA or FOAM EXTEND 3.2. ORCA (v3.0.2) is made available precompiled against OPENMPI. Similarly FOAM EXTEND 3.2 is also compiled against OPENMPI. The same OPENMPI procedure described for ORCA also applies to FOAM EXTEND 3.2.

OPENMPI is needed because ORCA is only distributed precompiled against OPENMPI - see http://www.cec.mpg.de/forum/portal.php. Example runs with sample scripts and input files are given here - http://nuweb12.neu.edu/rc/wp-content/uploads/2014/10/USING_ORCA_ON_DISCOVERY_CLUSTER.pdf.

After downloading the tar.gz archives linked within the PDF documents above for the OPENMPI and ORCA examples, extract them with the command "tar -zxvf name_of_tar_gz_file".

Please use the SLURM submit templates shown below for the TCP and RDMA runs and the examples, replacing the LSF templates in the OPENMPI and ORCA documentation.

TCP

slurm_8
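A minimal sketch of an OpenMPI TCP submit script is below, assuming the openmpi-1.8.3 module and its prerequisites are loaded via .bashrc; the paths and executable are placeholders, and the exact flags used on Discovery are in the linked OPENMPI PDF and in the template image above.

#!/bin/bash
#SBATCH --job-name=ompi_tcp_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=20:00
#SBATCH -n 32
#SBATCH -N 2
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

work=/home/<user-id>/openmpi-test
cd $work

# OpenMPI's mpirun detects the SLURM allocation; -np should match #SBATCH -n.
mpirun -np 32 ./mpi_program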

RDMA

slurm_9

ORCA EXAMPLE

slurm_10

Should you have further questions, require more assistance or need clarifications email us at "researchcomputing@neu.edu".

7) SAS 9.4 Interactive and Batch Jobs:

SAS 9.4 on the Discovery Cluster can be run in two modes: interactively, using the SAS GUI after obtaining and logging into an interactive node from SLURM, or in batch mode using the SLURM partitions. DO NOT RUN SAS on any login node or on compute nodes directly by bypassing SLURM.

To run interactively first get an interactive node from SLURM as shown here. Make sure that the proper modules are loaded before requesting an interactive node from SLURM. After login to the interactive node "sas &" will launch the SAS GUI. Use "scancel job_id" to release the interactive node as shown here when done.

To run a batch job, use a SLURM submit script like the one shown below to run the SAS script. The SAS script generally has a ".sas" extension, which can be omitted in the SLURM submit script. A typical SLURM submit script template is shown below and is submitted using the "sbatch name-of-submit-script" command.

[nroy@compute-0-000 sas_test]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) gnu-4.8.1-compilers   3) fftw-3.3.3            4) perl-5.20.0           5) slurm-14.11.8         6) sas-9.4
[nroy@compute-0-000 sas_test]$ cat slurm_submit_sas.bash 
#!/bin/bash
#set a job name  
#SBATCH --job-name=run1
#################  
#a file for job output, you can check job progress
#SBATCH --output=run1.out
#################
# a file for errors from the job
#SBATCH --error=run1.err
#################
#time you think you need; default is one day
#in minutes in this case, hh:mm:ss
#SBATCH --time=40:00
#################
#number of tasks you are requesting
#SBATCH -n 16
#SBATCH --exclusive
#################
#partition to use
#SBATCH --partition=ser-par-10g
#################
#number of nodes to distribute n tasks across
#SBATCH -N 1
#################

work=/home/nroy/sas_test

cd $work

sas test
[nroy@compute-0-000 sas_test]$ 

[nroy@discovery2 sas_test]$ cat test.sas
data grades;
input h1 h2;
cards;
48 66
72 75
61 70
88 79
75 91
92 93
77 90
58 69
63 68
;
filename file1 'h1.eps';
goptions reset=global DEVICE=pslepsfc gsfmode=replace gsfname=file1 hsize=4 vsize=3;
proc gplot;
plot h1*h2;
run;
filename file2 'h2.eps';
goptions reset=global DEVICE=pslepsfc gsfmode=replace gsfname=file2 hsize=4 vsize=3;
proc gplot;
plot h2*h1;
run;
[nroy@compute-0-000 sas_test]$ sbatch slurm_submit_sas.bash 
Submitted batch job 621
[nroy@compute-0-000 sas_test]$ squeue -l
Mon Aug  8 16:48:20 2016
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
               621 ser-par-1     run1     nroy  RUNNING       0:03     40:00      1 compute-0-001
[nroy@compute-0-000 sas_test]$

Further examples and full documentation are available here.

8) Gaussian 09 Interactive and Batch Jobs and GaussView5:
Gaussian 09 on the Discovery Cluster can be run in two modes: interactively, by obtaining and then logging into an interactive node from SLURM, or in batch mode using the SLURM partitions. DO NOT RUN Gaussian 09 on any login node or on compute nodes directly by bypassing SLURM.

First set up your .bashrc with the correct "module load" and "source" directives. "module whatis gaussian-09" will show you the required modules and the source command to use. After you do this, log out and log in again and type "module list" to make sure the modules are loaded in the correct order. You can now run Gaussian 09. Make sure you have emailed "researchcomputing@neu.edu" to be added to the "Gaussian" group on the Discovery Cluster before you start configuring your .bashrc and using Gaussian.

For obtaining an interactive node refer to the procedure here. You can now run from command line using "g09".

For running in batch mode you will use a SLURM submit script that you submit using "sbatch your_submit_script". At a minimum the submit script should have the lines shown below for an example called "NKR_slurm_job19.job" that uses an input file to "g09" called "NRK_19.com". The contents of this input file are also shown below:

slurm_11
slurm_12
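The sketch below illustrates the general structure of such a Gaussian submit script. It is not a copy of NKR_slurm_job19.job; the directives and file names should be taken from the images above, but it shows how "g09" is typically driven by the ".com" input file from within the script.

#!/bin/bash
#SBATCH --job-name=NKR_slurm_job19
#SBATCH --output=run19.out
#SBATCH --error=run19.err
#SBATCH --time=24:00:00
#SBATCH -n 32
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

# The gaussian-09 module and its "source" line are assumed to be set in .bashrc.
work=/home/<user-id>/gaussian-test
cd $work

# Memory and processor counts for Gaussian itself are set inside the .com input file.
g09 < NRK_19.com > NRK_19.log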

Please note that in the example above the SLURM submit script requested and locked one compute node and all the memory in it.

To run GaussView5 you must obtain an interactive node using SLURM. Login to this node and run "gview &". Make sure correct modules are loaded in the order indicated and the proper "source" command is also in your .bashrc. As with gaussian-09 you must be added to the gaussian users group on Discovery Cluster.

Once you exit gview, log out from the interactive node assigned to you by SLURM and execute "scancel job_id" to kill the interactive job that you used to run gview.

9) Submitting and Running Jobs on the GPU Queue "par-gpu":
Jobs (Interactive or Batch) should be run on this queue only if using CUDA. This queue is not for jobs that do not make use of the GPU present in these nodes. The entire NVIDIA GPU computing SDK and CUDA Toolkit is available on the Discovery Cluster. The SLURM templates are the same as that for the 10Gb/s TCP-IP backplane for interactive jobs here, and for batch jobs here. To use the CUDA SDK and Toolkit load the correct module via your .bashrc. Type "module whatis cuda-7.0" or "module whatis cuda-6.5" to see usage instructions.

Examples are available here for CUDA. I have a copy of my notes here on CUDA / OPENCL - MPI that may be helpful to Discovery Cluster users. This gives among other things examples of using CUDA SDK or OPENCL to run hybrid GPGPU / MPI code on the Discovery Cluster. Replace any LSF submit scripts in it with SLURM submit scripts. Contact me directly if you have any questions at n.roy@neu.edu.

If you do use or plan to use the "par-gpu"/"par-gpu-2" partition on the Discovery Cluster please note and ensure the following without exception:

1) The "par-gpu" partition consists of 32 compute nodes, where each compute node has only one NVIDIA Tesla K-20m GPU. The "par-gpu-2" partition consists of 16 compute nodes, where each compute node has only one NVIDIA Tesla K-40m GPU.

2a) Each compute node in par-gpu has 32 logical cores, hence 32 SLURM compute slots per node. There are 32 compute nodes, and hence 32 GPUs in this queue.
2b) Each compute node in par-gpu-2 has 48 logical cores, hence 48 SLURM compute slots per node. There are 16 compute nodes, and hence 16 GPUs in this queue.

3) When submitting and running interactive or batch jobs use all 32 or 48 cores in every compute node. This way SLURM will close the node to other users. Every node is locked down and no one but you will use the GPU. If you do not do this SLURM may assign another user to the same node who will then begin using the same GPU. You do not want this. For debugging, code development, test and production runs you want one or more GPU(s) for yourself.

4) For interactive jobs get an exclusive compute node as is shown HERE.

5) For batch jobs the "#SBATCH" directives must include all of the following in addition to the other ones:

a) #SBATCH -n 32 or -n 48
(or multiples of 32 or 48)
b) #SBATCH --exclusive
c) #SBATCH --partition par-gpu or par-gpu-2
d) #SBATCH -N 1
(or 2 if -n 64/96, 3 if -n 96/144 etc)

(Note that -n above should be a multiple of 32/48 if you want more than one node for a batch run. Please also note that the time limit on these partitions for both interactive and batch runs is 24 hours.)
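Putting the required directives together, a batch script for a single par-gpu node might look like the sketch below; the working directory and executable are placeholders, and the CUDA module is assumed to be loaded via .bashrc.

#!/bin/bash
#SBATCH --job-name=gpu_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#number of tasks: 32 for par-gpu, 48 for par-gpu-2 (or multiples for more nodes)
#SBATCH -n 32
#SBATCH --exclusive
#SBATCH --partition=par-gpu
#SBATCH -N 1

work=/home/<user-id>/cuda-test
cd $work

# Placeholder CUDA executable; the node's single GPU is yours while the job runs.
./my_cuda_program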

10) Submitting and Running Stata (version 14) Jobs

Type "module whatis stata-14" to see usage details for Stata version 14 on Discovery Cluster. Typically Stata can be run interactively or in batch mode. An example of an interactive Stata session is shown below. Note that SLURM is used to get a node, then the node is logged into with either the -X or -Y option for X11 forwarding and Stata is launched. (Click any image below for better resolution. Use the back button of your browser to go back.)

slurm_13
slurm_14

After use, exit from Stata, exit from the node as shown above, and use scancel to kill the interactive job.

If you want to run Stata in batch mode you will need a SLURM submit script similar to that shown below. You will also need an appropriate *.do Stata "do" file in your work directory. The "stata -b" option tells Stata to run in batch mode. Finally, submit the SLURM submit script using "sbatch submit.bash", where in this case, as shown below, "submit.bash" is the name of the SLURM submit script. (If needed click image below for better resolution.)

slurm_15
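A minimal sketch of such a submit script is below, assuming the stata-14 module is loaded via .bashrc; "myfile.do" and the other names are placeholders for your own files and partition.

#!/bin/bash
#SBATCH --job-name=stata_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

work=/home/<user-id>/stata-test
cd $work

# "stata -b" runs the do-file in batch mode; Stata writes its log to myfile.log.
stata -b do myfile.do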

Note this module provides stata, xstata, stata-se, xstata-se, stata-sm and xstata-sm. Note that stata-mp and xstata-mp will not run as we are not licensed for them.

11) Submitting and Running Ansys Fluent (version 14) Jobs

DO NOT RUN THIS ON THE LOGIN NODES. These login nodes are discovery2 and discovery4. Running on login nodes is also against usage policy.

PLEASE NOTE: To use this software you need the prior permission of the ECE Faculty who own the licensing. Please email researchcomputing@neu.edu to get this set up. There are a limited number of licenses available, and usage is coordinated with ECE Faculty use.

The best practice is to ask for all cores on a compute node (interactive mode), or multiples thereof when running Fluent in batch mode. You can check out a node for 24 hours; if you need more than 24 hours, checkpoint your job. Parallel settings must all be the "default" selection in the GUI for interconnect type and MPI type.
(Click image below for better resolution and back button of browser to go back.)

Below is an example of running Fluent on an interactive node (click image for larger resolution). After the run, log out from the node and scancel the interactive job allocation, as also shown below.

slurm_16
slurm_17

For batch jobs you will need a SLURM submit script that uses a SLURM partition, similar to that described here. Parallel shared memory and distributed runs for Ansys are described in more detail here.

An example of a FLUENT run using a SLURM batch submit script is shown below. The partition here is ser-par-10g and we use 2 cores on a compute node, although the node is locked down. The run uses "fluent 2d" and the run.input file, which is also shown as an example.

slurm_18
slurm_19
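The sketch below is an illustrative version of such a script, not a copy of the one in the images. The Fluent command-line flags shown (-g for no GUI, -t for process count, -i for the journal file) are standard Fluent options, but check them against the images above and the Ansys documentation before relying on them.

#!/bin/bash
#SBATCH --job-name=fluent_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 2
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

work=/home/<user-id>/fluent-test
cd $work

# 2D solver, no GUI (-g), two processes (-t2), journal commands read from run.input.
fluent 2d -g -t2 -i run.input > run1.log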

Part of the output file after the run is also shown below. Click image below for better resolution.

slurm_20

You may need to use "fluent-cleanup.pl" to clean up your Fluent runs if a run does not complete or exit properly. ANSYS FLUENT creates a cleanup-fluent script file that can be used to clean up all ANSYS FLUENT related processes. The cleanup script is created in the current working folder with a file name that includes the machine name and the process identification number (PID), e.g. cleanup-fluent-mymachine-1234. Again, if you have issues contact us.

12) Submitting and Running Mathematica (version 10.02) Jobs
To use Mathematica 10 for interactive and batch jobs, on first use you need to do the following:
1) Launch the Mathematica 10 GUI from an interactive compute node using "mathematica &". (More details on getting an interactive compute node are here.)
2) In the "Activate online" window, select the "Other ways to activate" button.
3) Select the third option, "Connect to a network license server". In the Server name box enter "discovery2" and select the "Activate" button.
4) Check the "I accept the terms of the agreement" box and select the "OK" button.
Your user account is now configured to use Mathematica 10.0.2 on the Discovery Cluster. Note that this procedure is to be done only once.

To use Mathematica on the Discovery Cluster, log in and type "module whatis mathematica-10", then set your .bashrc. This is one of the three methods of accessing the modules described here. After the modules are loaded, check whether licenses are free by running "check_mathematica_licenses_avail.sh". If not enough licenses are free, interactive and batch runs will fail.

For running Mathematica in batch mode using the partitions on the Discovery Cluster you will need a submit script like the "submit.bash" shown below. This script is then submitted using "sbatch submit.bash". Again, before submitting, run the "check_mathematica_licenses_avail.sh" command to make sure there are enough licenses for your script. If SLURM starts your job and enough licenses are not available when it goes to the RUN state, your job will terminate.
slurm_21
slurm_22
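As a rough sketch, a batch submit script of this kind has the shape shown below; "myscript.m" is a placeholder, and the assumption that the command-line kernel is invoked as "math -script" should be checked against the submit.bash shown in the images above.

#!/bin/bash
#SBATCH --job-name=mathematica_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

# Run check_mathematica_licenses_avail.sh before submitting to confirm free licenses.
work=/home/<user-id>/mathematica-test
cd $work

# Runs myscript.m with the command-line kernel; printed results go to run1.out.
math -script myscript.m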

Advanced scripts using "LaunchKernels" and "LaunchSlaves" on slave nodes within SLURM are also possible. Users should contact researchcomputing@neu.edu for using this and other features of Mathematica with SLURM.

13) Submitting and Running Serial Jobs using a Compute Node's Local Storage Space

If you have a serial job that needs no more than one compute node, and hence has no requirement for shared storage, you can run it using the local storage disk on the compute node.
Typically every compute node has around 400-500GB of space in its /tmp folder, which is on the local disk. This is much faster than shared storage, so it is advisable to stage your files there before running your job. Once the job is done, copy the results over to your /home or /gss_gpfs_scratch space and then delete the folder and files from /tmp. This way you avoid using the network for file I/O while using the compute node exclusively.
Please note that you should do this only if your job will run within the cores of a single compute node.
A typical script that does this is shown below, and explained further. Click image below for better resolution.

slurm_23
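A minimal sketch of the stage-in / run / stage-out pattern is below, assuming your input lives in /gss_gpfs_scratch; all paths and the executable name are placeholders, and the script in the image above remains the reference.

#!/bin/bash
#SBATCH --job-name=serial_tmp_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g

# Stage input files from shared storage to the node-local /tmp disk.
src=/gss_gpfs_scratch/<user-id>/myjob
tmpdir=/tmp/<user-id>_$SLURM_JOB_ID
mkdir -p $tmpdir
cp -r $src/* $tmpdir
cd $tmpdir

# Run the serial program against the local copies.
./my_serial_program

# Copy results back to shared storage and clean up /tmp so the node stays usable.
cp -r $tmpdir/* $src/
rm -rf $tmpdir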

The script uses only one node, so the job is constrained to a single compute node, and SLURM locks this node. /tmp is the local storage space on each compute node and is local to that node. Before the job runs, the files are staged to /tmp; once the job completes successfully, everything is copied back to your /home or /gss_gpfs_scratch location. This is configurable by the user as indicated in the script comments above.

This will reduce the network overhead of reading files from network storage locations and serial jobs that do large I/O will see a significant speed up.

Please note that /tmp is around 400-500GB on each compute node, so be considerate of other users. If you modify the script, make sure you delete all files staged to /tmp. If /tmp fills up on any compute node, other users' jobs will crash and the node will become unusable; the node then has to be rebooted, as access to it is lost.

14) Submitting and Running Interactive and Batch jobs using OpenFoam 3.0+ with mpich

OpenFoam 3.0+ (http://www.openfoam.com/) is now available on the Discovery Cluster. Please use the ser-par-10g-4 partition for interactive and batch runs of this package for best performance. To use it, set your ".bashrc" with the appropriate modules, including prerequisites. This is shown below – usage is obtained by running "module whatis openfoam-3.0+" (click image below for better resolution). Then get an interactive node as described here for interactive runs.

slurm_24

slurm_25

An example SLURM template for an interFoam run using mpich is shown below - click the image below for better resolution.

slurm_26
slurm_27
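For orientation, the sketch below outlines a parallel interFoam run of the kind the template above describes; the case directory and subdomain count are placeholders, and the decomposePar / reconstructPar steps assume the case already contains a matching decomposeParDict.

#!/bin/bash
#SBATCH --job-name=interfoam_run1
#SBATCH --output=run1.out
#SBATCH --error=run1.err
#SBATCH --time=24:00:00
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --partition=ser-par-10g-4

# openfoam-3.0+, mpich-3.0.4 and their prerequisites are assumed loaded via .bashrc.
case_dir=/gss_gpfs_scratch/<user-id>/interfoam-case
cd $case_dir

decomposePar                                      # split the mesh across the MPI ranks
mpirun -np 16 interFoam -parallel > log.interFoam
reconstructPar                                    # reassemble the decomposed results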

15) Dealing with zombie jobs on Discovery Cluster

Jobs will often exit leaving zombie processes on various compute nodes of the Discovery Cluster. If you find this happening in your case, whether while debugging or for other reasons such as program exceptions, you may want to modify your submit script to help deal with such zombie processes, which use resources outside the scheduler and disrupt your user environment. A detailed example template and procedure is given in the PDF document HERE (http://nuweb12.neu.edu/rc/wp-content/uploads/2016/08/Dealing_with_zombies_Discovery_Cluster_SLURM.pdf) for you to use as a reference in dealing with such processes.

16) Compiling OpenACC accelerated code on Discovery Cluster for use on GPU nodes

To use OpenACC 2.0 with the NVIDIA CUDA SDK 7.0, use the GCC 5.2.1 compiler on the Discovery Cluster. Full details of the modules needed, with example code and runs, are HERE (http://nuweb12.neu.edu/rc/wp-content/uploads/2015/09/GNU-5.2_OpenACC-2.0_cuda-7.0_discovery_gpu-nodes.pdf).

17) Using Schrodinger 2016 on Discovery Cluster

Please note that the current version of Schrodinger is Schrodinger 2016-2. "module whatis schrodinger-2016" will give usage.

Please note that the current Schrodinger license is owned by Prof. Hicham Fenniri (h.fenniri@neu.edu), Chemical Engineering, Northeastern University. Before using Schrodinger 2016 on Discovery Cluster please contact him to work out your usage requirements.

Schrodinger 2016 can be run both interactively and in batch mode using the "par-gpu" GPGPU queue on the Discovery Cluster. Full details of the modules needed, with example code and runs, are HERE (http://nuweb12.neu.edu/rc/wp-content/uploads/2015/12/Running_Schrodinger_2015-Interactively-Batch_Discovery-Cluster.pdf).

Please contact researchcomputing@neu.edu if you need further help, or have issues of any sort.

A SLURM command reference is available at http://www.slurm.schedmd.com, with more information on the various SLURM directives and commands that can be used in submit scripts and from the command line. A concise manual is here.

Again, if you have any questions or difficulties, require specialized assistance with SLURM, Platform MPI, mpich, or software (including new software), or require training or clarifications, contact us.
