Submitting Jobs on Discovery Cluster

CONTENTS

1) Interactive Jobs
2) Regular Runs – TCP/IP 10Gb/s, RDMA IB, TCP/IP IPOIB 56Gb/s
3) MATLAB DISTRIBUTED COMPUTING SERVER
4) PYTHON 2.7.5 with SAGE 5.12 and mpi4py 1.3.1
5) Rmpi 0.6-3 and SNOW 0.3-13 using R 3.0.2
6) NAMD 2.9 with MPICH-3.0.4
7) SAS 9.4 Interactive and Batch Jobs
8) Gaussian 09 Interactive and Batch Jobs and GaussView5
9) Submitting and Running Jobs on the GPU Queue “par-gpu”
10) Submitting and Running Stata (version 13) Jobs
11) Submitting and Running Ansys Fluent (version 14) Jobs
12) Submitting and Running Mathematica (version 9) Jobs
13) Submitting and Running Serial Jobs using a Compute Node's Local Storage Space

1) Interactive Jobs:

All users can obtain an interactive node to work on using bsub as shown below:

bsub_interactive-10g

The -I option above requests an interactive job, -q specifies the interactive queue to use, and -n requests the number of processors. For "interactive-10g" or "interactive-ib" you cannot get more than 16. In general one processor should suffice, so please be considerate. If no slots are available you will have to wait; you will be logged in when a slot frees up. You can check the cores you are using and the free interactive nodes as shown below.

bhosts_interactive
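In command form, the request and checks in the screenshots above look roughly like the following (queue name taken from the text; the exact output will differ for your account):

# Request one interactive core on the "interactive-10g" queue and start a shell there
bsub -I -q interactive-10g -n 1 /bin/bash

# From a login node, check your own jobs and per-host slot usage
bjobs -w
bhosts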

When done, type exit; the interactive job will end and the node core will become available again, as shown below.

bhosts_interactive_exit

A similar procedure applies to the other interactive queue, "interactive-ib", but that queue is limited to users approved by RCC. Not all users have access to it.

For debugging parallel code use the regular queue "ser-par-10g" for non-RDMA-enabled (TCP/IP) code and "parallel-ib" for RDMA-enabled code, but request a smaller number of cores in the "#BSUB -n" option. Once debugging is done you can scale up the number of cores.

You can also get an interactive node from which you can launch GUIs such as Matlab. In this case the "bsub" submission is different: the "-IX" option is used and the command is run in the background using "&". When a compute node is obtained, hit the "enter" key, then log into the compute node and launch the GUI. This is shown below for the paraFoam CFD GUI from the OpenFOAM 2.3 package. Some users who extensively configure their .bashrc, .ssh/authorized_keys and .ssh/known_hosts may have issues with -IX; in that case use the -Is option with the -XF flag for X11 window forwarding, which is similar to using -IX.
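A minimal command-line sketch of this workflow, assuming the "interactive-10g" queue and an OpenFOAM module whose exact name may differ on the cluster:

# Request an interactive node with X11 forwarding; run the request in the
# background and press Enter once a node is assigned
bsub -IX -n 1 -q interactive-10g /bin/bash &

# Log into the assigned node with X forwarding (replace the node name with
# the one LSF actually gave you)
ssh -X compute-0-006

# On the compute node, load the modules your GUI needs and launch it, e.g.
# module load openfoam-2.3      (module name is an assumption)
# paraFoam &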

NOTE: If you get a "<<Terminated while pending>>" message, try the "-Is" option with the "-XF" flag for explicit X11 forwarding. The command is shown below:

[nilay.roy@discovery4 ~]$  bsub -Is -XF -n 1 -q ht-10g /bin/bash 
Job <215703> is submitted to queue <ht-10g>.
<<ssh X11 forwarding job>>
<<Waiting for dispatch ...>>
Warning: Permanently added '10.100.8.46' (RSA) to the list of known hosts.
<<Starting on compute-0-006>>
[nilay.roy@compute-0-006 ~]$ exit
exit
[nilay.roy@discovery4 ~]$ bjobs -w
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
215703  nilay.roy RUN   ht-10g     discovery4  compute-0-006 /bin/bash  Aug  6 12:10
[nilay.roy@discovery4 ~]$
[nilay.roy@discovery4 ~]$ bkill 215703
Job <215703> is being terminated
[nilay.roy@discovery4 ~]$ 
[nilay.roy@discovery4 ~]$ bjobs -w
No unfinished job found
[nilay.roy@discovery4 ~]$ 

Remember to "bkill <job_id>" when done.

Click image below for better resolution.
int-10g-gui

After you are done, log out and issue "bjobs" to get the job ID of the interactive job. Then issue "bkill job_id". This is also shown below. Click the image below for better resolution.

int-10g-gui_logout

2) Regular Runs

Regular runs over the 10g backplane, open to all users for serial, embarrassingly parallel or parallel jobs, are done using the "ser-par-10g" queue. A typical template for submitting such jobs is given below and explained in detail. Make sure you are familiar with bsub and the other Platform LSF commands available from here, and with basic bash shell scripting, available from here.

Submit script for TCP/IP 10g backplane runs:

A typical submit script for this is shown below and the template can be downloaded from here. Cut and paste from the pdf into a file ending in .bash, then modify it for your use. For example, if the file is named job125.bash you would submit it at the command prompt using

>>bsub < job125.bash

Use the “bjobs” or “bjobs -w” command to see your running or pending jobs and “bjobs -w -u all” to see all running or pending jobs on the cluster.

TCP LSF TEMPLATE

bsub_tcp_template

The #BSUB directives always come first. -J is a unique job name and must be changed. -R has the span[ptile] option, which can be omitted unless you want to use a fixed number of cores on every node the job is dispatched to; for the parallel-ib or ser-par-10g queues this should not exceed 16, and for the ht-10g queue it should not exceed 32. You can also specify the minimum memory you need in the -R option.
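Since the template image may be hard to read, a condensed sketch of its structure is given below. The job name, paths and core counts are placeholders that you must change, and the downloadable template (which also contains the host-list generation block shown in full in later sections) should be preferred over this sketch:

#!/bin/sh
#BSUB -J my_unique_job_name            # -J: unique job name, change this
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 16                            # total cores requested
#BSUB -q ser-par-10g
#BSUB -R "span[ptile=16]"              # optional: cores per node (max 16 on this queue)
#BSUB -cwd /home/<user-id>/my_run

work=/home/<user-id>/my_run
cd $work

# (the downloadable template builds a host list from $LSB_MCPU_HOSTS here;
#  see the mpi4py and Rmpi sections below for that block in full)

# Platform MPI run over the 10Gb/s TCP backplane
mpirun -np 16 -prot -TCP -lsf ./my_mpi_program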

The bsub submit command also has a -w option that lets you specify dependencies before a job can run, for example if you want the job to run only after another job completes. For further details on this and other options, or if you have questions or problems, contact us.

- Submit script for IB RDMA backplane runs:

For RDMA backplane runs using the "parallel-ib" queue the template is slightly different. We use the -IBV, -spawn and -1sided options with mpirun instead of -TCP, drop the -lsf option, and give a host list. However, the user just has to use the right template and does not have to worry about making these changes. The template for a typical submit script is below and can be downloaded from here; the information in it is self-explanatory. Only users authorized to use this queue should download and use this template. Again, the use of the -R span[ptile] option is up to the user; it prevents a job from getting fragmented across nodes.

IBV LSF TEMPLATE

bsub_IBV_template
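For reference, the mpirun line in the IBV template differs from the TCP one roughly as follows (a sketch based on the description above; the host-list file is the one generated by the template's host-list block, and the executable name is a placeholder):

# RDMA run on the "parallel-ib" queue: -IBV replaces -TCP, -spawn and -1sided
# are added, -lsf is dropped and an explicit host list is given instead
mpirun -np 16 -prot -IBV -spawn -1sided -machinefile hostlist-ib ./my_mpi_program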
Submit script for IB IPOIB 56Gb/s backplane runs:

The infiniband IPOIB TCP backplane has 56Gb/s bandwidth (FDR Infiniband), as opposed to the 10Gb/s bandwidth of the 10g backplane, and requires the "parallel-ib" queue. The template for this case is slightly different and is shown below.

IPOIB TCP LSF TEMPLATE
bsub_ipoib_template

Notice that the template is similar to the 10Gb/s TCP template except for the "-netaddr" option to "mpirun". This tells MPI to use the IB TCP stack (at 56Gb/s) rather than the usual 10Gb/s TCP stack common to all compute nodes. Compute nodes compute-1-064 to compute-1-127 have IB; since the queue "parallel-ib" uses these nodes, this will work (like RDMA) on this queue only. This template can be downloaded from here.
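The relevant mpirun line looks roughly like the one below (a sketch; take the actual subnet value from the downloadable IPOIB template rather than this placeholder):

# Same as the TCP template, but -netaddr points MPI at the IB TCP (IPoIB) stack
mpirun -np 16 -prot -TCP -lsf -netaddr <ib-subnet>/<netmask> ./my_mpi_program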

3) MATLAB DISTRIBUTED COMPUTING SERVER

Type “module whatis matlab_dce_2013b” to see modules required for matlab and put these module loads in your .bashrc file. Log out and log in again and you should be able to run matlab using all toolboxes and the distributed computing server on the Discovery Cluster.

At first login to your account, or before you start using matlab, you must copy the cluster configuration directory called ".matlab" into your home directory (note the "." before matlab – this is required). This is shown below. Only after doing this should you start matlab (>>matlab &), run the command ">>configCluster('discovery')" from matlab, and then begin to use it. This sets up the configuration environment and needs to be done only once, on first use. Failure to do this will make the Discovery Cluster queues unavailable for use with matlab, and you would have to run jobs locally on the login node, which is against Discovery Cluster usage guidelines.

[nroy@discovery2 ~]$ cp -R /shared/apps/matlab/matlab-2013b/env_script/.matlab ~/.

Many examples of parallel and distributed computing are available for Matlab on their website here.

Check that you have the matlab environment correctly set up by running matlab from a compute node obtained interactively, as shown below. All the toolboxes should show up. (Click the image below for full resolution.)
matlab_dcs_1

You can also run an interactive matlab GUI from the nodes to look at your code. However, do not use the login nodes for computation via the GUI; they are only for editing your code. If you want to run interactively, get a compute node from LSF as described below.

Let's say we want to run an interactive Matlab session on one core using the "interactive-10g" queue on Discovery. Obtain an interactive node with X-forwarding enabled using the -IX option to bsub, then log into that node with the "-X" or "-Y" option to ssh. You will not be logged in automatically: press enter when you see the <<Starting on ....>> line to get back to the prompt before you ssh. Next load the required modules and start Matlab. This is shown below:
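In text form, those steps look roughly like this (queue and module names are assumptions; check "module whatis matlab_dce_2013b" for the exact module list):

# 1. Request an interactive node with X11 forwarding; press Enter when assigned
bsub -IX -n 1 -q interactive-10g /bin/bash &

# 2. Log into the assigned node with X forwarding (use the node you were given)
ssh -X compute-0-006

# 3. Load the MATLAB modules and start MATLAB
# module load matlab_dce_2013b      (module name is an assumption)
matlab &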

Matalb_Interactive_Discovery

After you are done, exit from Matlab, then exit from the node. Then type "bjobs" and find the job id of the interactive session. Next use bkill with the job id to kill the interactive job running in LSF. Check again with bjobs that this job has been terminated by LSF. This is shown below.

Matalb_Interactive_Discovery_Exit

The figure below shows an interactive matlab session on the cluster.

matlab_gui_discovery_login_node

Now let us run an example that starts a matlab process using the parallel computing toolbox and spawns 8 worker threads to compute the value of PI in parallel. This example makes use of an LSF submit script. With the parallel computing toolbox alone one cannot run jobs on more than one node; that is where the Distributed Computing Server component of Matlab is used. However, all cores on the node can be used. For spanning jobs across nodes, launch and use the Matlab GUI, or use an LSF submit script to run a batch job. The former is described in greater detail later, but both methods can be used to run Matlab with the Distributed Computing Server component. Not all jobs will benefit from this; many jobs show speedup when run on multiple cores but restricted to within a node. Each node has 16 physical cores (or 32/40 logical cores if hyper-threading with turbo boost up to 4.0GHz is turned on). Testing your runs both ways is advised.

The LSF submit script for using the parallel computing toolbox is shown below:

matlab_dcs_bsub

In the script above we chose “#BSUB -n” as 9 since the master matlab dcs thread will spawn 8 matlab worker threads. The input matlab file is called par_parallel.m. Note we do not include the .m, as it is assumed.
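The submit script in the image is roughly of the following form; this is a sketch, and the working directory, job name and exact MATLAB invocation are assumptions (the job name is only a guess consistent with the bjobs output shown further below):

#!/bin/sh
#BSUB -J matlabJob.01                  # job name (assumption)
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 9                             # 1 master + 8 workers
#BSUB -q ht-10g
#BSUB -cwd /home/<user-id>/matlab_dcs_test

cd /home/<user-id>/matlab_dcs_test

# Run the script without the GUI; the .m extension is omitted
matlab -nodisplay -r par_parallel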

The contents of par_parallel.m is below and can be downloaded from here – par_parallel.m. It is advisable to run this test first to make sure you have correct submit scripts and your environment is set up correctly. Make sure you rename the file par_parallel.m removing the .txt extension.

%  ----------------------------------------------------------------------
%  Nilay K Roy PhD
%  Northeastern University, Information Technology Services, Research Computing
%  Boston, MA
%  ----------------------------------------------------------------------
%  Demo MATLAB PI Program:  Local Thread Parallel Version
%  ----------------------------------------------------------------------
%  This MATLAB script calculates PI using the trapezoidal rule from the
%  integral of the arctangent (1/(1+x**2)). This is a simple parallel code
%  which uses a 'parfor' loop and runs with a matlab pool size of 8.
%  ----------------------------------------------------------------------
%
%  Clear environment and set output format
%
  clear all; format long eng;
%
%  Set processor (lab) pool size
%
  matlabpool open 8;
  numprocs = matlabpool('size');
%
%
%   Open an output file.
%
  fid=fopen('/scratch/nroy/matlab-test/par_PI.txt','wt');
%
%   Define and initialize global variables
%
 mypi = 0.0;
 ttime = 0.0;
%
%   Define and initialize 'for' loop integration variables.
%
  nv = 10000;    %  Set default number of intervals and accuracy
% nv = input('Please define the number of intervals: ')
  ht = 0.0;
  wd = 1.0 / nv;
%
% Start stopwatch timer to measure compute time.
%
  tic;
%
% This parallel 'parfor' loop divides the interval count 'nv' implicitly among the
% processors (labs) and computes partial sums on each of the arctangent function's value
% at the assigned intervals. MATLAB then combines the partial sums implicitly
% as it leaves the 'parfor' loop construct placing the global sum into 'ht'.
%
  parfor i = 1 : nv
    x = wd * (i - 0.50);
    ht = ht + (1/(1+(x*x)));
  end
%
% The numerical integration is completed by multiplying the summed
% function values by the constant interval (differential) 'wd' to get
% the area under the curve.
%
  mypi = wd * ht;
%
%  Stop stopwatch timer.
%
  ttime = toc;
%
% Print total time and calculated value of PI.
%
fprintf('Number of intervals chosen (nv) was: %d\n', nv);
fprintf('Number of processors (labs) used was: %d\n', numprocs);
fprintf('Computed value for PI is: %3.20f\n with error of %3.20f\n', mypi, abs(pi-mypi));
fprintf('Time to complete the computation was: %6.6f\n', ttime);
%
%
fprintf(fid,'Number of intervals chosen (nv) was: %d\n', nv);
fprintf(fid,'Number of processors (labs) used was: %d\n', numprocs);
fprintf(fid,'Computed value for PI is: %3.20f\n with error of %3.20f\n', mypi, abs(pi-mypi));
fprintf(fid,'Time to complete the computation was: %6.6f\n', ttime);
%
%   Close output file.
%
fclose(fid);
%
matlabpool close;
%
%
% End of script
%

You will have to change the path in the code above to the correct one for your case:

 fid=fopen('/scratch/nroy/matlab-test/par_PI.txt','wt');

Now you can submit and check your job and output as shown below:

[nroy@discovery4 matlab_dcs_test]$ bsub < bsub_matlab.bash 
 Job <36768> is submitted to queue <ht-10g>.
 [nroy@discovery4 matlab_dcs_test]$ bjobs
 JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
 36768   nroy    RUN   ht-10g     discovery4  9*compute-0 *labJob.01 Oct 25 12:07
 [nroy@discovery4 matlab_dcs_test]$

You can peek into your running job as shown below:

[nroy@discovery4 matlab_dcs_test]$ bpeek 36768 
<< output from stdout >>

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                   R2013b (8.2.0.701) 64-bit (glnxa64)
                              August 13, 2013

  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.

Starting matlabpool using the 'local' profile ...

<< output from stderr >>

[nroy@discovery4 matlab_dcs_test]$

When the job is complete you can view the results as shown below:

[nroy@discovery4 matlab-test]$ cat par_PI.txt 
 Number of intervals chosen (nv) was: 10000
 Number of processors (labs) used was: 8
 Computed value for PI is: 0.78539816360578118548
  with error of 2.35619448998401193052
 Time to complete the computation was: 1.411928
 [nroy@discovery4 matlab-test]$ pwd
 /scratch/nroy/matlab-test
 [nroy@discovery4 matlab-test]$ pwd
 /scratch/nroy/matlab-test
 [nroy@discovery4 matlab-test]$

One can also submit jobs directly from the Matlab GUI launched from a login node; these can span all cores in one node (using the parallel computing toolbox as in the example described above) or span many cores across multiple nodes using both the parallel computing toolbox and the Distributed Computing Server. This is explained in detail below. For this example we will use the same code ("par_parallel.m") but will comment out the following two lines: "matlabpool open 8" and "matlabpool close".

Steps:

  1. Launch the Matlab GUI after login in to a login node as shown above.
  2. Go to the "Parallel" icon and select -> "Current Cluster" -> "discovery_local_2013b". This is shown in detail in the image below. (Click on the image for full resolution and the back arrow on the browser to get back to this document.) matlab_dcs_01
  3. Next run the cluster configuration script for this cluster and "cd" to the working directory. Now set the queue to use, one for which you have permissions, such as ht-10g or ser-par-10g. (Hint: for any command you can autocomplete using the tab key and see the full sub-command set.) matlab_dcs_02
  4. Now submit a job using the Matlab batch command. Let's say we want to use 32 cores spanning two nodes, with 16 cores on each node. We first set this by running ClusterInfo.setProcsPerNode(16) and then run the batch command. In the Job Monitor window we can see the progress of the job, and using "bjobs" in the terminal window we can also see the job status. This is shown below. matlab_dcs_03

Another example using SPMD (single program multiple data) is shown below, and the file for this, "spmd_parallel.m", can be downloaded from here. Make sure you rename the file spmd_parallel.m, removing the .txt extension. Note that using a matlabpool size of 32 spans 16 workers on each of two nodes and the master worker runs from a third node: since each node (except hyper-threading enabled nodes) cannot run more than 16 threads, the master needs a third node. If you use a matlabpool size of 31 then all 32 (31 workers + 1 master) run from two nodes (16 each). Note also that hyper-threading enabled nodes can run up to 32 threads. The command "j.wait" forces the matlab GUI to wait for the job to complete before allowing the user to continue working; use this for short jobs and debugging. This is shown below. matlab_dcs_04

Similar examples can be run using parallel loops, SPMD (single program multiple data), and matlabpool using MDCS from multiple nodes (not the simple case shown above), and compiling MPI with matlab using MEX. These examples can be found using the link  here.

4) PYTHON 2.7.5 with SAGE 5.12 and mpi4py 1.3.1

Python 2.7.5 module comes with SAGE 5.12 and mpi4py 1.3.1. Type “module whatis python-2.7.5″ to see usage and dependent module details. The output is shown below:

[nroy@discovery4 ~]$ module whatis python-2.7.5
python-2.7.5 : loads the modules environment for Python 2.7.5 and Sage 5.12.

Needs the following modules to be loaded as prerequisites

module load gnu-4.8.1-compilers
module load gnu-4.4-compilers
module load fftw-3.3.3
module load platform-mpi

Now run

>>source /shared/apps/sage/sage-5.12/sage

Do this if you are running sage from an interactive node. You do not need to do this if you are running sage using a LSF submit script.

Put these module load commands in your .bashrc file that is found in your /home/<user-id> directory.

NOTE: To run sage as python 2.7.5 (into which you can import sage), run or use "sage -python". For SAGE/PYTHON scripts the first line should read "#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python".
Both versions of python, "python" and "sage -python", have mpi4py 1.3.1 installed. Contact researchcomputing@neu.edu if you need help, more details, or have problems.

[nroy@discovery4 ~]$
[nroy@discovery4 ~]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.8.1-compilers   2) gnu-4.4-compilers     3) fftw-3.3.3            4) platform-mpi          5) python-2.7.5
[nroy@discovery4 ~]$

To run SAGE interactively get an interactive node as shown below and then run SAGE.

[nroy@discovery4 ~]$ bsub -Is -n 2 -q ht-10g /bin/bash
Job <...> is submitted to queue <ht-10g>.
<<Waiting for dispatch ...>>
<<Starting on compute-0-005>>
[nroy@compute-0-005 ~]$ source /shared/apps/sage/sage-5.12/sage 
┌────────────────────────────────────────────────────────────────────┐
│ Sage Version 5.12, Release Date: 2013-10-07                        │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
sage: quit()
Exiting Sage (CPU time 0m0.05s, Wall time 0m6.25s).
[nroy@compute-0-005 ~]$

You can also run SAGE as python as shown below.

[nroy@compute-0-005 ~]$ sage -python
Python 2.7.5 (default, Oct  7 2013, 07:36:31) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from sage.all import *
>>> quit()
[nroy@compute-0-005 ~]$ exit
exit
[nroy@discovery4 ~]$

An example run using mpi4py with PYTHON/SAGE is shown below. This will be the basis for submitting PYTHON/SAGE jobs to the cluster via LSF. First let us look at the python MPI enabled code.

[nroy@discovery4 mpi4py-test]$ cat helloworld.py 
#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python
"""
Parallel Hello World
"""

from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))

[nroy@discovery4 mpi4py-test]$

Notice the "#!/shared/apps/sage/sage-5.12/spkg/bin/sage -python" shebang if you plan to use SAGE with python; this could instead be "#!/shared/apps/python/Python-2.7.5/INSTALL/bin/python" if using regular python without SAGE. Note that mpi4py 1.3.1 is linked to both SAGE python and the python 2.7.5 available via this module.

Now let's look at the LSF submit script template.

[nroy@discovery4 mpi4py-test]$ cat bsub_mpi4py.bash 
#!/bin/sh
#BSUB -J JOB.1
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 8
#BSUB -q ht-10g
#BSUB -cwd /home/nroy/mpi4py-test
######## THIS IS A TEMPLATE FILE FOR TCP ENABLED MPI RUNS ON THE DISCOVERY CLUSTER ########
#### #BSUB -n has a value equal to the given value for the -np option ####
# prefix for next run is entered below
# file staging code is entered below

#### Enter your working directory below - this is the string returned from issuing the command 
#### "pwd"
#### IF you stage your files this is your run directory in the high speed scratch space mounted 
#### across all compute nodes
work=/home/nroy/mpi4py-test
#####################################################
########DO NOT EDIT ANYTHING BELOW THIS LINE#########
#####################################################
cd $work
tempfile1=hostlistrun
tempfile2=hostlist-tcp
echo $LSB_MCPU_HOSTS > $tempfile1
declare -a hosts
read -a hosts < ${tempfile1}
for ((i=0; i<${#hosts[@]}; i += 2)) ; 
do 
   HOST=${hosts[$i]}
   CORE=${hosts[(($i+1))]} 
   echo $HOST:$CORE >> $tempfile2
done
#####################################################
########DO NOT EDIT ANYTHING ABOVE THIS LINE#########
#####################################################
###### Change only the -np option giving the number of MPI processes and the executable to use 
###### with options to it
###### IN the example below this would be "8", "helloworld.py" and the options for the executable 
###### DO NOT CHANGE ANYTHING ELSE BELOW FOR mpirun OPTIONS
###### MAKE SURE THAT THE "#BSUB -n" is equal to the "-np" number below. IN this example it is 8.

mpirun -np 8 -prot -TCP -lsf helloworld.py

#/home/nroy/mpi4py-test/helloworld.py
# any clean up tasks and file migration code is entered below
#####################################################
########DO NOT EDIT ANYTHING BELOW THIS LINE#########
#####################################################
rm $work/$tempfile1
rm $work/$tempfile2
#####################################################
########DO NOT EDIT ANYTHING ABOVE THIS LINE#########
#####################################################
[nroy@discovery4 mpi4py-test]$

Note above how the helloworld.py python script is called using mpirun. The "-np" value of mpirun must equal the "#BSUB -n" value. On submitting this with "bsub < bsub_mpi4py.bash" the job will run in the specified queue, and on completion you should see output as shown below. Ignore error output like "helloworld.py: Rank 0:7: MPI_Init_thread: Unable to open /opt/ibm/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_bor – running with failsafe mpi collective algorithms." This just falls back to the failsafe algorithms and is a bug that has been closed by the mpi4py developers.

[nroy@discovery4 mpi4py-test]$ tail -41 output_file 
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :               5.45 sec.
    Max Memory :             7 MB
    Average Memory :         7.00 MB
    Total Requested Memory : -
    Delta Memory :           -
    (Delta: the difference between total requested memory and actual max usage.)
    Max Swap :               37 MB

    Max Processes :          1
    Max Threads :            1

The output (if any) follows:

Host 0 -- ip 10.100.8.45 -- ranks 0 - 7

 host | 0
======|======
    0 : SHM

 Prot -  All Intra-node communication is: SHM

Hello, World! I am process 3 of 8 on compute-0-005.
Hello, World! I am process 6 of 8 on compute-0-005.
Hello, World! I am process 5 of 8 on compute-0-005.
Hello, World! I am process 0 of 8 on compute-0-005.
Hello, World! I am process 7 of 8 on compute-0-005.
Hello, World! I am process 4 of 8 on compute-0-005.
Hello, World! I am process 1 of 8 on compute-0-005.
Hello, World! I am process 2 of 8 on compute-0-005.

PS:

Read file <error_file> for stderr output of this job.

[nroy@discovery4 mpi4py-test]$ 

5) Rmpi 0.6-3 and SNOW 0.3-13 using R 3.0.2:

R 3.0.2 on the Discovery cluster, compiled with gnu compilers 4.8.1, can run parallel MPI jobs using mpich 3.0.4 with the Rmpi and SNOW packages. Parallel R runs are submitted in a similar way using LSF submit scripts and integrate with Platform LSF. We use mpich instead of the default Platform MPI in this case because R does not support Platform MPI. The modules required to run parallel R jobs are shown below and can be obtained by using "module whatis R-3.0.2-gnu".

[nroy@discovery4 test1]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) gnu-4.8.1-compilers   3) fftw-3.3.3            4) mpich-3.0.4           5) R-3.0.2-gnu
[nroy@discovery4 test1]$ 

The LSF submit script used is shown below.

[nroy@discovery4 test1]$ cat r-rmpi_jobsubmit.bash 
#!/bin/sh 
#BSUB -J JOB.125 
#BSUB -o output_file 
#BSUB -e error_file 
#BSUB -n 63 
#BSUB -q ht-10g 
#BSUB -cwd /scratch/nroy/rmpi/test1

######## THIS IS A TEMPLATE FILE FOR TCP ENABLED MPI RUNS ON THE DISCOVERY CLUSTER ########

#### #BSUB -n has a value equal to the given value for the -np option ####

# prefix for next run is entered below

# file staging code is entered below 
#mkdir /scratch/nroy/rmpi/test1

#### Enter your working directory below - this is the string returned from issuing the command "pwd" #### 
#### IF you stage your files this is your run directory in the high speed scratch space mounted across all compute nodes #### 
work=/scratch/nroy/rmpi/test1

##################################################### 
########DO NOT EDIT ANYTHING BELOW THIS LINE######### 
##################################################### 
cd $work 
tempfile1=hostlistrun 
tempfile2=hostlist-tcp 
echo $LSB_MCPU_HOSTS > $tempfile1 
declare -a hosts 
read -a hosts < ${tempfile1} 
for ((i=0; i<${#hosts[@]}; i += 2)) ; 
do
     HOST=${hosts[$i]}
     CORE=${hosts[(($i+1))]}
     echo $HOST:$CORE >> $tempfile2 
done 
##################################################### 
########DO NOT EDIT ANYTHING ABOVE THIS LINE######### 
#####################################################

###### The example below runs a R program using Rmpi and SNOW.
###### Change only the -np option giving the number of MPI processes and the executible to use with options to it
###### DO NOT CHANGE ANYTHING ELSE BELOW FOR mpirun OPTIONS
###### MAKE SURE THAT the mpirun "-np" option below is 1. R will spawn the number of slaves as indicated in the R script snow_test.R.
###### This will be the number in #BSUB -n above less 1. In this case #BSUB -n is 63 and in the script we request 62 slaves.
###### When using the parallel-ib queue (IB backplane on Discovery IB nodes) for non-RDMA enabled code but the regular code, the faster
###### 56Gb/s IB TCP backplane can be used. To specify this use the -netaddr option exactly as shown. Many types of parallel code benefit
###### from this and you should test if the faster backplane shows performance improvement for 64 or more cores.

mpirun -np 1 -machinefile hostlist-tcp R --no-save < snow_test.R

# any clean up tasks and file migration code is entered below

##################################################### 
########DO NOT EDIT ANYTHING BELOW THIS LINE######### 
##################################################### 
rm $work/$tempfile1 
rm $work/$tempfile2 
##################################################### 
########DO NOT EDIT ANYTHING ABOVE THIS LINE######### 
#####################################################

[nroy@discovery4 test1]$

The script that calls the R slaves using Rmpi and SNOW is shown below.

[nroy@discovery4 test1]$ pwd
/scratch/nroy/rmpi/test1
[nroy@discovery2 test1]$ cat snow_test.R 
library(Rmpi)
library(snow)

# Initialize SNOW using MPI communication. The first line will get the
# number of MPI processes the scheduler assigned to us. Everything else 
# is standard SNOW

np=62 
cluster <- makeMPIcluster(np)

# Print the hostname for each cluster member
sayhello <- function()
{
 info <- Sys.info()[c("nodename", "machine")]
 paste("Hello from", info[1], "with CPU type", info[2])
}

 names <- clusterCall(cluster, sayhello)
 print(unlist(names))                

# Compute row sums in parallel using all processes,
# then a grand sum at the end on the master process
parallelSum <- function(m, n)
{
 A <- matrix(rnorm(m*n), nrow = m, ncol = n)
 row.sums <- parApply(cluster, A, 1, sum)
 print(sum(row.sums))
}

parallelSum(500, 500)

stopCluster(cluster)
mpi.exit()

Notice in the script above that "library(Rmpi)" and "library(snow)" load the Rmpi and snow packages, and that at the end we must call "stopCluster(cluster)" and "mpi.exit()". Failure to do so will leave the jobs running on the nodes.

To run this use “bsub < r-rmpi_jobsubmit.bash”. After completion output will be as shown below.

[nroy@discovery4 test1]$ cat output_file 
Sender: LSF System <lsfadmin@compute-0-005>
Subject: Job 48395:  in cluster <mghpcc_cluster1> Done

Job  was submitted from host  by user  in cluster <mghpcc_cluster1>.
Job was executed on host(s) <32*compute-0-005>, in queue , as user  in cluster <mghpcc_cluster1>.
                            <31*compute-0-006>
 was used as the home directory.
 was used as the working directory.
Started at Wed Nov 27 15:04:45 2013
Results reported at Wed Nov 27 15:04:49 2013

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh
#BSUB -J JOB.125
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 63
#BSUB -q ht-10g
#BSUB -cwd /scratch/nroy/rmpi/test1

######## THIS IS A TEMPLATE FILE FOR TCP ENABLED MPI RUNS ON THE DISCOVERY CLUSTER ########

#### #BSUB -n has a value equal to the given value for the -np option ####

# prefix for next run is entered below

# file staging code is entered below
#mkdir /scratch/nroy/rmpi/test1

#### Enter your working directory below - this is the string returned from issuing the command "pwd" ####
#### IF you stage your files this is your run directory in the high speed scratch space mounted across all compute nodes #### 
work=/scratch/nroy/rmpi/test1

#####################################################
########DO NOT EDIT ANYTHING BELOW THIS LINE#########
#####################################################
cd $work
tempfile1=hostlistrun
tempfile2=hostlist-tcp
echo $LSB_MCPU_HOSTS > $tempfile1
declare -a hosts
read -a hosts < ${tempfile1}
for ((i=0; i<${#hosts[@]}; i += 2)) ; do     HOST=${hosts[$i]}     CORE=${hosts[(($i+1))]}     echo $HOST:$CORE >> $tempfile2
done
#####################################################
########DO NOT EDIT ANYTHING ABOVE THIS LINE#########
#####################################################

###### The example below runs a R program using Rmpi.
###### Change only the -np option giving the number of MPI processes and the executible to use with options to it
###### DO NOT CHANGE ANYTHING ELSE BELOW FOR mpirun OPTIONS
###### MAKE SURE THAT THE "#BSUB -n" is equal to the "-np" number below. IN this example it is 80. 
###### When using the parallel-ib queue (IB backplane on Discovery IB nodes) for non-RDMA enabled code but the regular code, the faster
###### 56Gb/s IB TCP backplane can be used. To specify this use the -netaddr option exactly as shown. Many types of parallel code benefit
###### from this and you should test if the faster backplane shows performance improvement for 64 or more cores.

(... more ...)
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :               98.57 sec.
    Max Memory :             4 MB
    Average Memory :         4.00 MB
    Total Requested Memory : -
    Delta Memory :           -
    (Delta: the difference between total requested memory and actual max usage.)
    Max Swap :               37 MB

    Max Processes :          1
    Max Threads :            1

The output (if any) follows:

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rmpi)
> library(snow)
> 
> # Initialize SNOW using MPI communication. The first line will get the
> # number of MPI processes the scheduler assigned to us. Everything else 
> # is standard SNOW
> 
> np=62 
> cluster <- makeMPIcluster(np)
> 
> # Print the hostname for each cluster member
> sayhello <- function()
+ {
+  info <- Sys.info()[c("nodename", "machine")]
+  paste("Hello from", info[1], "with CPU type", info[2])
+ }
> 
>  names <- clusterCall(cluster, sayhello)
>  print(unlist(names))
 [1] "Hello from compute-0-005 with CPU type x86_64"
 [2] "Hello from compute-0-005 with CPU type x86_64"
 [3] "Hello from compute-0-005 with CPU type x86_64"
 [4] "Hello from compute-0-005 with CPU type x86_64"
 [5] "Hello from compute-0-005 with CPU type x86_64"
 [6] "Hello from compute-0-005 with CPU type x86_64"
 [7] "Hello from compute-0-005 with CPU type x86_64"
 [8] "Hello from compute-0-005 with CPU type x86_64"
 [9] "Hello from compute-0-005 with CPU type x86_64"
[10] "Hello from compute-0-005 with CPU type x86_64"
[11] "Hello from compute-0-005 with CPU type x86_64"
[12] "Hello from compute-0-005 with CPU type x86_64"
[13] "Hello from compute-0-005 with CPU type x86_64"
[14] "Hello from compute-0-005 with CPU type x86_64"
[15] "Hello from compute-0-005 with CPU type x86_64"
[16] "Hello from compute-0-005 with CPU type x86_64"
[17] "Hello from compute-0-005 with CPU type x86_64"
[18] "Hello from compute-0-005 with CPU type x86_64"
[19] "Hello from compute-0-005 with CPU type x86_64"
[20] "Hello from compute-0-005 with CPU type x86_64"
[21] "Hello from compute-0-005 with CPU type x86_64"
[22] "Hello from compute-0-005 with CPU type x86_64"
[23] "Hello from compute-0-005 with CPU type x86_64"
[24] "Hello from compute-0-005 with CPU type x86_64"
[25] "Hello from compute-0-005 with CPU type x86_64"
[26] "Hello from compute-0-005 with CPU type x86_64"
[27] "Hello from compute-0-005 with CPU type x86_64"
[28] "Hello from compute-0-005 with CPU type x86_64"
[29] "Hello from compute-0-005 with CPU type x86_64"
[30] "Hello from compute-0-005 with CPU type x86_64"
[31] "Hello from compute-0-005 with CPU type x86_64"
[32] "Hello from compute-0-006 with CPU type x86_64"
[33] "Hello from compute-0-006 with CPU type x86_64"
[34] "Hello from compute-0-006 with CPU type x86_64"
[35] "Hello from compute-0-006 with CPU type x86_64"
[36] "Hello from compute-0-006 with CPU type x86_64"
[37] "Hello from compute-0-006 with CPU type x86_64"
[38] "Hello from compute-0-006 with CPU type x86_64"
[39] "Hello from compute-0-006 with CPU type x86_64"
[40] "Hello from compute-0-006 with CPU type x86_64"
[41] "Hello from compute-0-006 with CPU type x86_64"
[42] "Hello from compute-0-006 with CPU type x86_64"
[43] "Hello from compute-0-006 with CPU type x86_64"
[44] "Hello from compute-0-006 with CPU type x86_64"
[45] "Hello from compute-0-006 with CPU type x86_64"
[46] "Hello from compute-0-006 with CPU type x86_64"
[47] "Hello from compute-0-006 with CPU type x86_64"
[48] "Hello from compute-0-006 with CPU type x86_64"
[49] "Hello from compute-0-006 with CPU type x86_64"
[50] "Hello from compute-0-006 with CPU type x86_64"
[51] "Hello from compute-0-006 with CPU type x86_64"
[52] "Hello from compute-0-006 with CPU type x86_64"
[53] "Hello from compute-0-006 with CPU type x86_64"
[54] "Hello from compute-0-006 with CPU type x86_64"
[55] "Hello from compute-0-006 with CPU type x86_64"
[56] "Hello from compute-0-006 with CPU type x86_64"
[57] "Hello from compute-0-006 with CPU type x86_64"
[58] "Hello from compute-0-006 with CPU type x86_64"
[59] "Hello from compute-0-006 with CPU type x86_64"
[60] "Hello from compute-0-006 with CPU type x86_64"
[61] "Hello from compute-0-006 with CPU type x86_64"
[62] "Hello from compute-0-006 with CPU type x86_64"
> 
> # Compute row sums in parallel using all processes,
> # then a grand sum at the end on the master process
> parallelSum <- function(m, n)
+ {
+  A <- matrix(rnorm(m*n), nrow = m, ncol = n)
+  row.sums <- parApply(cluster, A, 1, sum)
+  print(sum(row.sums))
+ }
> parallelSum(500, 500)
[1] -80.99394
> 
> stopCluster(cluster)
[1] 1
> mpi.exit()
[1] "Detaching Rmpi. Rmpi cannot be used unless relaunching R."
> 
> 

PS:

Read file <error_file> for stderr output of this job.

[nroy@discovery4 test1]$

6) NAMD 2.9 with MPICH-3.0.4:

NAMD 2.9 with VMD 1.9.1 on Discovery cluster is for use with MPICH-3.0.4. To use NAMD with MPICH load the modules as shown below:

[nroy@discovery4 ~]$ module list
No Modulefiles Currently Loaded.
[nroy@discovery4 ~]$ module load gnu-4.4-compilers;module load gnu-4.8.1-compilers;module load fftw-3.3.3-single;module load fftw-3.3.3;module load mpich-3.0.4;module load NAMD-2.9_VMD-1.9.1
[nroy@discovery4 ~]$
[nroy@discovery4 ~]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) gnu-4.8.1-compilers   3) fftw-3.3.3-single     4) fftw-3.3.3            5) mpich-3.0.4           6) NAMD-2.9_VMD-1.9.1
[nroy@discovery4 ~]$

Now "namd2" will be in your path. To use "namd2" with the appropriate files and mpirun, use the LSF submit template for Rmpi runs above, "r-rmpi_jobsubmit.bash", modifying the queue name, the mpirun parameters and the NAMD input files. In the mpirun line the "R" call is replaced with "namd2" and the input parameters and config files that follow are changed accordingly. More details on these parameters and input files, based on what you want to do, are on the NAMD documentation pages here.
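As a sketch, if the Rmpi template requested 63 cores, the modified mpirun line for NAMD might look like the line below; the configuration and log file names are placeholders, and -np must still match "#BSUB -n":

mpirun -np 63 -machinefile hostlist-tcp namd2 my_simulation.conf > my_simulation.log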

For visualization and analysis of structures and other interactive GUI work using VMD you may use the login nodes. To start VMD on the login nodes, issue "vmd". A screen shot of this is shown below (click the image for better resolution).

VMD_Discovery_Cluster

7) SAS 9.4 Interactive and Batch Jobs:

SAS 9.4 on Discovery Cluster can be run in two modes: Interactively using the SAS GUI by obtaining and then logging into an interactive node from LSF or by running batch jobs using LSF queues. DO NOT RUN SAS on any login nodes or compute nodes directly by bypassing LSF.

To run interactively, first get an interactive node from LSF using the "bsub -IX" option. Note that when you do this you will have to log in to the node assigned to you manually. After you are done, exit from the node, then type "bjobs" to get the interactive job id and use "bkill <job id>" to kill this job and release the resources. Make sure that the proper modules are loaded before requesting an interactive node from LSF. This is shown below; click the image for better resolution. Also, once a node is assigned, press enter to return to the shell prompt, after which you can log in to the assigned interactive compute node.

SAS_Interactive_GUI
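In command form, the interactive SAS workflow looks roughly like this (queue choice and node name are assumptions; load the SAS modules shown by "module whatis sas-9.4" first):

# Request an interactive node with X forwarding; press Enter when assigned
bsub -IX -n 1 -q ser-par-10g /bin/bash &

# Log into the assigned node with X forwarding (use the node LSF gave you)
ssh -X compute-0-043

# Launch the SAS GUI on the compute node
sas &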

To run a batch job, an LSF submit script like the one shown below should be used to run the SAS script. The SAS script generally has a ".sas" extension, which can be omitted in the LSF submit script. A typical LSF submit script template is shown below; it is submitted using the "bsub < name-of-submit-script" command.

[nroy@discovery2 sas_test]$ module list
Currently Loaded Modulefiles:
  1) gnu-4.4-compilers     2) fftw-3.3.3            3) platform-mpi          4) gnu-4.8.1-compilers   5) sas-9.4
[nroy@discovery2 sas_test]$ cat bsubmit_sas.bash 
#!/bin/sh
#BSUB -J JOB.1-sas
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 1
#BSUB -q ser-par-10g
#BSUB -cwd /home/nroy/sas_test
######## THIS IS A TEMPLATE FILE FOR BATCH RUNS ON THE DISCOVERY CLUSTER ########
# prefix for next run is entered below
# file staging code is entered below
#### Enter your working directory below - this is the string returned from issuing the command "pwd" ####
#### IF you stage your files this is your run directory in the high ####
#### speed scratch space mounted across all compute nodes ####
work=/home/nroy/sas_test
#####################################################
########DO NOT EDIT ANYTHING BELOW THIS LINE#########
#####################################################
cd $work
#####################################################
########DO NOT EDIT ANYTHING ABOVE THIS LINE#########
#####################################################
###### The example below runs a SAS script test.sas using one compute core (-n 1 in BSUB above).
sas test
# no need to enter the sas extension in the file name for .sas script files
# any clean up tasks and file migration code is entered below
[nroy@discovery2 sas_test]$ cat test.sas
data grades;
input h1 h2;
cards;
48 66
72 75
61 70
88 79
75 91
92 93
77 90
58 69
63 68
;
filename file1 'h1.eps';
goptions reset=global DEVICE=pslepsfc gsfmode=replace gsfname=file1 hsize=4 vsize=3;
proc gplot;
plot h1*h2;
run;
filename file2 'h2.eps';
goptions reset=global DEVICE=pslepsfc gsfmode=replace gsfname=file2 hsize=4 vsize=3;
proc gplot2;
plot h2*h1;
run;
[nroy@discovery2 sas_test]$ bsub < bsubmit_sas.bash 
Job <104834> is submitted to queue <ser-par-10g>.
[nroy@discovery2 sas_test]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
104834  nroy    RUN   ser-par-10 discovery2  compute-0-0 JOB.1-sas  Feb 11 18:52
[nroy@discovery2 sas_test]$ bjobs
No unfinished job found
[nroy@discovery2 sas_test]$

The example run is “test.sas” and that file is shown above.

After the run you will see the results in the work folder as shown below: the file "output_file" has the details of the run, the file test.log is the SAS log file, and any errors will be in the file "error_file".

[nroy@discovery2 sas_test]$ ls -la
total 379
drwxrwxr-x  2 nroy nroy   191 Feb 11 18:52 .
drwx------ 86 nroy nroy  8244 Feb 11 18:51 ..
-rw-rw-r--  1 nroy nroy  1135 Feb 11 18:15 bsubmit_sas.bash
-rw-rw-r--  1 nroy nroy     0 Feb 11 18:52 error_file
-rw-rw-r--  1 nroy nroy 17408 Feb 11 18:52 h1.eps
-rw-rw-r--  1 nroy nroy 17449 Feb 11 18:52 h2.eps
-rw-rw-r--  1 nroy nroy  2309 Feb 11 18:52 output_file
-rw-rw-r--  1 nroy nroy  3445 Feb 11 18:52 test.log
-rw-rw-r--  1 nroy nroy   369 Feb 11 18:51 test.sas
[nroy@discovery2 sas_test]$ cat output_file 
Sender: LSF System <lsfadmin@compute-0-043>
Subject: Job 104834:  in cluster <mghpcc_cluster1> Exited

Job  was submitted from host  by user  in cluster <mghpcc_cluster1>.
Job was executed on host(s) , in queue , as user  in cluster <mghpcc_cluster1>.
 was used as the home directory.
 was used as the working directory.
Started at Tue Feb 11 18:52:03 2014
Results reported at Tue Feb 11 18:52:06 2014

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh
#BSUB -J JOB.1-sas
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 1
#BSUB -q ser-par-10g
#BSUB -cwd /home/nroy/sas_test
######## THIS IS A TEMPLATE FILE FOR BATCH RUNS ON THE DISCOVERY CLUSTER ########
# prefix for next run is entered below
# file staging code is entered below
#### Enter your working directory below - this is the string returned from issuing the command "pwd" ####
#### IF you stage your files this is your run directory in the high ####
#### speed scratch space mounted across all compute nodes ####
work=/home/nroy/sas_test
#####################################################
########DO NOT EDIT ANYTHING BELOW THIS LINE#########
#####################################################
cd $work
#####################################################
########DO NOT EDIT ANYTHING ABOVE THIS LINE#########
#####################################################
###### The example below runs a SAS script test.sas using one compute core (-n 1 in BSUB above).
sas test
# no need to enter the sas extension in the file name for .sas script files
# any clean up tasks and file migration code is entered below

------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :               0.48 sec.
    Max Memory :             8 MB
    Average Memory :         8.00 MB
    Total Requested Memory : -
    Delta Memory :           -
    (Delta: the difference between total requested memory and actual max usage.)
    Max Swap :               37 MB

    Max Processes :          1
    Max Threads :            1

The output (if any) follows:

PS:

Read file <error_file> for stderr output of this job.

[nroy@discovery2 sas_test]$

Further examples and full documentation are available here.

8) Gaussian 09 Interactive and Batch Jobs and GaussView5:
Gaussian 09 on Discovery Cluster can be run in two modes: Interactively by obtaining and then logging into an interactive node from LSF or by running batch jobs using LSF queues. DO NOT RUN Gaussian 09 on any login nodes or compute nodes directly by bypassing LSF.

First set up your .bashrc with the correct "module load" and "source" directives. "module whatis gaussian-09" will show you the required modules and the source command to use. After you do this, log out and log in again and type "module list" to make sure the modules are loaded in the correct order. You can now run Gaussian 09. Make sure you have emailed "researchcomputing@neu.edu" to get added to the "Gaussian" group on the Discovery Cluster before you start configuring your .bashrc and using Gaussian.

For obtaining an interactive node refer to the procedure here. You can then run from the command line using "g09".

For running in batch mode you will use an LSF submit script that you submit using "bsub < your_submit_script". At a minimum the submit script should have the lines shown below for an example called "NKR_job19.job" that uses an input file to "g09" called "NRK_19.com". The contents of this input file are also shown below:

gauss09_ex

Please note that in the example above the LSF submit script requested one core. To specify memory requirements more exactly per core, use the #BSUB -R "rusage[mem=____]" directive.
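For reference, a minimal sketch of such a submit script is given below; the job and input file names are taken from the paragraph above, while the queue, working directory, memory value and log file name are placeholders:

#!/bin/sh
#BSUB -J NKR_job19.job
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 1
#BSUB -q ser-par-10g
#BSUB -R "rusage[mem=4000]"            # optional per-core memory request (placeholder value)
#BSUB -cwd /home/<user-id>/gaussian_test

cd /home/<user-id>/gaussian_test
# Run Gaussian 09 on the input file from the example above
g09 < NRK_19.com > NRK_19.log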

To run GaussView5 you must obtain an interactive node using the “bsub -IX” option as shown below (click image below for better resolution). Login to this node and run “gview”. Make sure correct modules are loaded in the order indicated and the proper “source” command is also in your .bashrc. As with gaussian-09 you must be added to the gaussian users group on Discovery Cluster.

gaussview5_example

Once you exit gview, log out from the interactive node assigned to you by LSF, and execute "bkill <job id>" to kill the interactive job that you used to run gview.

9) Submitting and Running Jobs on the GPU Queue “par-gpu”:
Jobs (interactive or batch) should be run on this queue only if they use CUDA. This queue is not for jobs that do not make use of the GPUs present in these nodes. The entire NVIDIA GPU Computing SDK and CUDA Toolkit is available on the Discovery Cluster. The interactive and batch job templates are the same as those for the 10Gb/s TCP-IP backplane above. To use the CUDA SDK and Toolkit, load the correct module via your .bashrc. Type "module whatis cuda-5.5" or "module whatis cuda-6.0" to see usage instructions.

A typical interactive run using two cores from a node in the "par-gpu" queue is shown below. Here we run "deviceQuery" and "bandwidthTest" on the GPU of that node, using code compiled with NVIDIA's CUDA compiler.

par_gpu_ineteractive_cuda_run
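A command-line sketch of that session, assuming the CUDA samples have already been built in your own directory (the exact options in the screenshot may differ):

# Request two cores on a GPU node and open a shell there
bsub -Is -n 2 -q par-gpu /bin/bash

# On the GPU node, load CUDA and run the compiled SDK samples
# module load cuda-6.0
./deviceQuery
./bandwidthTest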

Further examples are available here for CUDA. I have a copy of my notes here on CUDA / OPENCL – MPI that may be helpful to Discovery Cluster users. This gives among other things examples of using CUDA SDK 6 or OPENCL to run hybrid GPGPU / MPI code on the Discovery Cluster. Contact me directly if you have any questions at n.roy@neu.edu.

10) Submitting and Running Stata (version 13) Jobs

Type "module whatis stata-13" to see usage details for Stata version 13 on the Discovery Cluster. Stata can be run interactively or in batch mode. An example of an interactive Stata session is shown below. Note that LSF is used to get a node, the node is then logged into with either the -X or -Y option for X11 forwarding, and Stata is launched. (Click any image below for better resolution; use the back button of your browser to go back.)

stata-13-interactive

After use, exit from Stata, exit from the node as shown below, and use bkill to kill the interactive job.

stata-13-interactive-exit

If you want to run Stata in batch mode you will need an LSF submit script similar to that shown below. The #BSUB -n directive gives the number of cores you request. You will also need an appropriate *.do Stata "do" file in your work directory. The "stata -b" option tells Stata to run in batch mode. Finally, submit the LSF submit script using "bsub < submit.bash", where in this case, as shown below, "submit.bash" is the name of the LSF submit script.

stata-13-batch-job
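A minimal sketch of such a Stata batch submit script, with the paths, core count and "do" file name as placeholders:

#!/bin/sh
#BSUB -J stata_job.01
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 4                             # number of cores requested
#BSUB -q ser-par-10g
#BSUB -cwd /home/<user-id>/stata_test

cd /home/<user-id>/stata_test
# Run the "do" file in batch mode
stata -b do myanalysis.do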

11) Submitting and Running Ansys Fluent (version 14) Jobs
The only queue configured for this is "ser-par-10g-2" on the Discovery cluster.
DO NOT RUN THIS ON THE LOGIN NODES. IT WILL NOT RUN. The login nodes are discovery2 and discovery4. Running on login nodes is also against usage policy.
The ser-par-10g-2 nodes are compute-0-064 to compute-0-095, both included, and each node has 40 cores and 128GB RAM.
The best practice is to ask for all 40 cores when running Fluent on an interactive node. You can check out a node for 24 hours; if you need more than 24 hours, checkpoint your job and then get an interactive node again. Note that in the example below, since all 40 cores were requested, the node is now closed.
Parallel settings must all be “default” selection in GUI for interconnect type and MPI type.
(Click image below for better resolution and back button of browser to go back.)
ansys-fluent-14
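In command form, checking out a full node for an interactive Fluent session looks roughly like this (the node name and module name are assumptions):

# Ask for all 40 cores of one ser-par-10g-2 node, with X forwarding
bsub -IX -n 40 -q ser-par-10g-2 /bin/bash &

# Log into the assigned node with X forwarding, load the Ansys module and start Fluent
ssh -X compute-0-064
# module load ansys-14      (module name is an assumption)
fluent &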
For batch jobs you will need an LSF submit script that uses the "ser-par-10g-2" queue, similar to that described here. Note the "ser-par-10g-2" queue has a 10Gb/s TCP-IP backplane only and no IB. Parallel shared memory and distributed runs for Ansys are described in more detail here.

12) Submitting and Running Mathematica (version 9) Jobs

To use Mathematica on the Discovery Cluster the following modules are needed, as shown in the .bashrc below. This is one of the three methods to access the modules, as described here. After the modules are loaded, check if licenses are free by running "check_mathematica_licenses_avail.sh" as shown below. If there are not enough free licenses, runs in interactive or batch mode will fail. This is also shown below. Click the image below for better resolution.
mathematica-1

To run Mathematica interactively on the Discovery Cluster, obtain an interactive node from LSF with X11 forwarding. This is shown below, where the "-IX" option is used with bsub. DO NOT RUN Mathematica on the LOGIN NODES; this is against the usage policy. Once you get a compute node with the cores requested, log in to it. When done with the interactive run, log out of the node assigned to you by LSF and use "bkill" to kill the interactive job. This is shown below. Remember to log in to the interactive node assigned to you with either the "-X" or "-Y" option. -n below is the number of cores you need. One can use either the "ht-10g" queue, which has fewer and slower nodes, or the "ser-par-10g-2" queue, which has many more and faster nodes; note that in the latter queue there is a default wait of 60s before any job starts. If you want to run Mathematica without the GUI, change the "-IX" option to "-Is" and drop the "&" at the end. You will then be logged in to the node automatically on submission, and when you exit, the interactive job is killed automatically. The former case is more involved because by default LSF has X11 forwarding disabled unless it is explicitly requested, as with the "-IX" option.
mathematica-2
mathematica-3

For running Mathematica in batch mode using the queues on the Discovery cluster you will need a submit script like "submit.bash", shown below. This script is then submitted using "bsub < submit.bash". Again, before submitting, run the "check_mathematica_licenses_avail.sh" command to make sure there are enough licenses to run your script. If LSF submits your script and enough licenses are not available when it goes from PEND to RUN state, your job will terminate.
mathematica-4
mathematica-5
mathematica-6
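As a rough sketch, a batch submit script of this kind might look like the following; the paths, script name and queue are placeholders, and "math -script" is used here as the command-line kernel invocation:

#!/bin/sh
#BSUB -J math_job.01
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 1
#BSUB -q ser-par-10g-2
#BSUB -cwd /home/<user-id>/mathematica_test

cd /home/<user-id>/mathematica_test
# Run the Mathematica script without the notebook front end
math -script myscript.m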

Advanced scripts using "LaunchKernels" and "LaunchSlaves" on slave nodes within LSF are also possible. This is beyond the scope of this document; users should contact researchcomputing@neu.edu to use these and other features of Mathematica with LSF.

13) Submitting and Running Serial Jobs using a Compute Node's Local Storage Space

If you have a serial job that does not need more than one compute node, and hence has no requirement for shared storage, you can run it using the local storage disk on the compute node.
Typically every compute node has around 400-500GB of space in its /tmp folder, which is on its local disk. This is much faster, so it is advised to stage your files here before running your job. Once done, copy the results back to your /home or /scratch space and delete the folder and files. This way you avoid using the network for file I/O while you use the compute node exclusively.
Please note that you should do this only if your job will run within the cores of a single compute node.
A typical script that does this is shown below, and explained further. Click image below for better resolution.

serial_job_local_disk

The script checks that only one node has been issued by LSF and that the job is constrained to one compute node only. If this is not the case, it generates an "error_file.error" in the /home directory that you specified in the script. /tmp is the local storage space on each compute node, and this folder is local to that compute node. Once the job runs successfully it copies everything over to your /home or /scratch location; similarly, before the job is run, the files are staged to the /tmp location. This is configurable by the user as indicated in the script comments above.
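A simplified sketch of that logic is given below; it is not the downloadable script itself, and the input/results directories and executable name are placeholders:

#!/bin/sh
#BSUB -J serial_local.01
#BSUB -o output_file
#BSUB -e error_file
#BSUB -n 1                                 # serial job: one core on one node
#BSUB -q ser-par-10g
#BSUB -cwd /home/<user-id>/serial_test

home_dir=/home/<user-id>/serial_test       # where input lives and results return
tmp_dir=/tmp/<user-id>_$LSB_JOBID          # per-job directory on the node-local disk

# Abort with an error file in $home_dir if LSF spread the job over more than
# one host; this technique only works on a single compute node
nhosts=$(echo $LSB_MCPU_HOSTS | awk '{print NF/2}')
if [ "$nhosts" -gt 1 ]; then
    echo "job spans more than one node" > $home_dir/error_file.error
    exit 1
fi

# Stage input to the local disk, run there, copy results back, clean up /tmp
mkdir -p $tmp_dir
cp -r $home_dir/input/* $tmp_dir/
cd $tmp_dir
./my_serial_program                        # placeholder for your executable
cp -r $tmp_dir/* $home_dir/results/
rm -rf $tmp_dir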

This will reduce the network overhead of reading files from network storage locations and serial jobs that do large I/O will see a significant speed up.

Please note /tmp is around 400-500GB on each compute node, so be considerate of other users. If you modify the script, make sure you delete all files staged to /tmp. If /tmp fills up on any compute node, other users will have their jobs crash and the node will become unusable; the compute node then has to be rebooted because access to the node is lost.

Please contact researchcomputing@neu.edu if you need further help, or have issues.

Platform LSF command reference can be obtained from here – for more information on the various LSF directives and commands that can be used in the submit scripts and from the command line.

If you have any questions or difficulties, require specialized assistance with respect to LSF, platform MPI, mpich, software including new software or require training or clarifications contact us.

Back To The Top