Queen Bee Users Guide
Table of Contents
- Log on to Queen Bee via GSISSH using TeraGrid certificates
- Setting up your environment for TeraGrid users
- Log on to Queen Bee via SSH for LONI users
- Setting up your environment for LONI users
- Creating home and work directories
- Changing your password and shell
- Compiling on Queen Bee
- Running on Queen Bee
- Commands for Monitoring
- Queue limits and descriptions
- Job queuing priority
- Using NCSA Archival Storage
Log on to Queen Bee via GSISSH using TeraGrid certificates
TeraGrid users can access the Queen Bee system at login1-qb.loni-lsu.teragrid.org or queenbee.loni-lsu.teragrid.org using their TeraGrid certificates via gsissh. On any TeraGrid resource, or on a non-TeraGrid resource with a recent Globus installation (4.0.1 or later) that supports myproxy and gsissh, run:
$ myproxy-logon -l <username> -s myproxy.teragrid.org
Enter MyProxy pass phrase:
Note: Please replace <username> with your TeraGrid User Portal username.
Once your credential has been received, run the following to log on to Queen Bee:
$ gsissh login1-qb.loni-lsu.teragrid.org
SSH Keys
SSH is used for communication between the nodes, so it must work correctly for parallel processing. For this reason, be careful when modifying anything in your ~/.ssh directory. If you ever need to reset your SSH keys, use the following commands (accept the default file, answer "y" to overwrite, and use an empty passphrase):
$ cd ~/.ssh
$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_dsa):
/home/username/.ssh/id_dsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/username/.ssh/id_dsa.
Your public key has been saved in /home/username/.ssh/id_dsa.pub.
The key fingerprint is:
55:66:77:cc:aa:66:bb:aa:33:ff:bb:bb:33:66:88:44 username@qb1.loni.org
Setting up your environment for TeraGrid users
Queen Bee is one of the TeraGrid systems that use Softenv as the default method for setting up user environments. TeraGrid users may activate the modules method, provided as part of the TeraGrid Common User Environment, by placing a file named .nosoft in their home directory. Modules will be activated at the next login. Either modules or Softenv will create a default environment.
By default, the TeraGrid .soft file contains:
# TeraGrid wide basic software suite
@teragrid-basic
# TeraGrid wide Globus 4 and Grid software suite
@globus-4.0
# Platform recommended development software suite
@teragrid-dev
If you edit this file, update your environment with:
$ resoft
For those choosing to use modules, the TeraGrid .modules file contains:
module load default-teragrid
# These lines enable elements of the TeraGrid Common User Environment.
# Uncomment to activate.
#module load cue-tg
#module load cue-login-env
#module load cue-build
#module load cue-comm
#module load cue-math
TeraGrid users access Queen Bee via gsissh using their TeraGrid certificates only, so no password is needed to log in. However, if you need a password to log in to Queen Bee using SSH, please request one using the password reset form at https://allocations.loni.org/user_reset.php.
Log on to Queen Bee via SSH for LONI users
Queen Bee has two head nodes, qb1.loni.org and qb2.loni.org. You can log in by connecting via ssh to either of them. If you are a Windows user, you can find a good ssh client here.
Your SSH keys must be set up properly, or it will be impossible to run parallel jobs. If you cannot freely ssh between Queen Bee nodes, your parallel program will not run, and you need to reset your SSH keys with the following commands:
$ cd ~/.ssh
$ ssh-keygen -t dsa
$ cp -p id_dsa.pub authorized_keys
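Before regenerating keys, it can help to check whether your current public key is actually listed in authorized_keys, since a missing entry is one common reason that ssh between nodes fails. The following is a minimal sketch; key_is_authorized is a hypothetical helper name, and its default file names assume the DSA key and authorized_keys file used in the commands above.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the public key in
# PUBKEY appears verbatim as a whole line of AUTHKEYS.
# Defaults assume the DSA file names used in the ssh-keygen example.
key_is_authorized() {
    local pubkey="${1:-$HOME/.ssh/id_dsa.pub}"
    local authkeys="${2:-$HOME/.ssh/authorized_keys}"
    [ -r "$pubkey" ] && [ -r "$authkeys" ] || return 1
    # -q quiet, -x whole-line match, -F fixed strings,
    # -f read the pattern(s) from the public key file
    grep -qxFf "$pubkey" "$authkeys"
}
```

For example, `key_is_authorized || echo "id_dsa.pub is not in authorized_keys"`. Note that the cp command above replaces authorized_keys entirely; appending instead (`cat id_dsa.pub >> authorized_keys`) keeps any other keys you have authorized.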
Queen Bee has 2 head nodes (qb1 and qb2) and 668 compute nodes (qb001 to qb668). You will compile your code on a head node and execute it on one or more compute nodes. The remainder of this tutorial walks you through an example of executing a parallel job on the compute nodes.
Setting up your environment for LONI users
First, set up your environment. Decide which packages you want from the big list, and take note of the magic strings under the "softenv" column. In this example, the magic strings we want are:
- +intel-fc-11.1
- +intel-cc-11.1
- +mvapich-1.1-intel-11.1
Note that the suffix "intel-11.1" on the mvapich package name indicates that this copy of mvapich was compiled with the 11.1 compilers from Intel.
Next, add the appropriate variables to your environment using softenv: add these magic strings to the .soft file in your home directory (${HOME}/.soft), then reset your environment with the resoft command:
[user_name@qb ~]$ vi ${HOME}/.soft
@default
+intel-cc-11.1
+intel-fc-11.1
+mvapich-1.1-intel-11.1
[user_name@qb ~]$ resoft
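If you prefer not to edit .soft by hand, the keywords can also be appended from the shell. This is only a sketch: add_soft_key is a hypothetical helper, and it assumes that appending the keywords at the end of the file (after the existing @default line) is acceptable for your setup.

```shell
#!/bin/bash
# Hypothetical helper: append a softenv keyword (e.g. +intel-cc-11.1) to a
# .soft file, but only if that exact line is not already there.
add_soft_key() {
    local softfile="$1" key="$2"
    touch "$softfile"
    # If the file does not end in a newline, add one so the new key
    # starts on its own line.
    if [ -s "$softfile" ] && [ -n "$(tail -c 1 "$softfile")" ]; then
        printf '\n' >> "$softfile"
    fi
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$key" "$softfile"; then
        printf '%s\n' "$key" >> "$softfile"
    fi
}
```

Usage, for the three keywords in this example:

for key in +intel-cc-11.1 +intel-fc-11.1 +mvapich-1.1-intel-11.1; do add_soft_key "$HOME/.soft" "$key"; done
resoft

Running the loop twice leaves the file unchanged the second time, because existing lines are skipped.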
Creating home and work directories
Your home directory (/home/your_username) is created automatically the first time you log in to Queen Bee. Queen Bee has its own /home disk, with quotas set at 5 GB. Please do not use the /home volume for batch job I/O; use the /work volume instead.
Your directory on the /work volume (/work/your_username) will be created automatically within an hour after your first login.
Please limit the number of files per directory to 10,000. No disk quotas are currently in effect on the /work volume, but all files will be purged after 30 days. Should disk space become critically low, files may be purged sooner. Please do not try to circumvent the removal process; doing so may lead to restrictions on your access to the /work volume. If you need large storage, please contact us at sys-help@loni.org, and project-based storage will be added for you on request.
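To see whether any of your directories exceed the 10,000-files guideline, a short find-based check like the one below can help. It is a sketch: dirs_over_limit is a hypothetical name, it counts all entries (files and subdirectories) directly inside each directory, and walking a very large /work tree may take a while.

```shell
#!/bin/bash
# Hypothetical helper: list directories under ROOT that directly contain
# more than LIMIT entries. Output lines look like "<count> <directory>".
dirs_over_limit() {
    local root="${1:-/work/$USER}" limit="${2:-10000}"
    local dir n
    # -print0 / read -d '' keep directory names with spaces intact
    find "$root" -type d -print0 |
    while IFS= read -r -d '' dir; do
        # $(( )) strips any padding that some wc versions add
        n=$(( $(find "$dir" -mindepth 1 -maxdepth 1 | wc -l) ))
        if [ "$n" -gt "$limit" ]; then
            printf '%d %s\n' "$n" "$dir"
        fi
    done
}
```

For example, `dirs_over_limit /work/$USER 10000`. For a rough list of files approaching the 30-day limit, `find /work/$USER -type f -mtime +25` shows files not modified in over 25 days, although this guide does not say whether the purge is based on modification time or access time.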
TeraGrid users can transfer files to NCSA's archival storage for long-term retention. Please refer to the Using NCSA Archival Storage section at the bottom of this guide.
Changing your password and shell
You can change or reset your password for Queen Bee online at https://allocations.loni.org/user_reset.php (the form requires you to enter the email address associated with your account). You can also change your shell online at https://allocations.loni.org/profile.php.
Compiling on Queen Bee
So, you've managed to log in and set up your environment on Queen Bee. You've done whatever tweaking you like to do on any Linux machine you've worked on in the past, and your environment points to the Intel compilers and MPI packages. What now?
Let's assume we have a Fortran or C/C++ MPI program that we wish to compile and run under MPI. There are several flavors (versions) of MPI available on Queen Bee, and using the MPICH_HOME variable in your makefiles will make it easier to switch flavors if you need to.
For TeraGrid users, the default MPI is MVAPICH2 version 1.4 compiled with the Intel 11.1 compilers (softenv keyword +mvapich2-1.4-intel11.1). MVAPICH2, developed at Ohio State University, is an implementation of MPICH2 designed to make efficient use of the InfiniBand network.
You can verify that this is set up correctly by checking that the corresponding mpif90 and mpirun are in your path:
$ which mpif90
/usr/local/packages/mvapich2/1.4/intel-11.1/bin/mpif90
$ which mpirun
/usr/local/packages/mvapich2/1.4/intel-11.1/bin/mpirun
Once the correct environment is set, compile your program with one of the following commands.
For C code:
$ mpicc test.c -O3 -o a.out
For Fortran code:
$ mpif90 test.F -O3 -o a.out
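The advice about using MPICH_HOME applies to build scripts as well as makefiles. The sketch below assumes MPICH_HOME points at an MPI installation whose bin directory contains wrappers named mpicc, mpicxx and mpif90 (check $MPICH_HOME/bin for the names your MPI flavor actually provides); mpi_compiler_for is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the MPI compiler wrapper to use for a source
# file, taken from the installation that MPICH_HOME points to, so that
# switching MPI flavors only requires changing MPICH_HOME.
mpi_compiler_for() {
    if [ -z "${MPICH_HOME:-}" ]; then
        echo "MPICH_HOME is not set" >&2
        return 1
    fi
    case "$1" in
        *.c)                  echo "$MPICH_HOME/bin/mpicc" ;;
        *.cpp|*.cxx|*.cc|*.C) echo "$MPICH_HOME/bin/mpicxx" ;;
        *.f|*.F|*.f90|*.F90)  echo "$MPICH_HOME/bin/mpif90" ;;
        *) echo "unknown source type: $1" >&2; return 1 ;;
    esac
}
```

For example, `"$(mpi_compiler_for test.c)" test.c -O3 -o a.out`. If your environment does not set MPICH_HOME, set it yourself to the installation prefix, i.e. the directory above the bin directory reported by `which mpicc`.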
Running on Queen Bee
To run a parallel job on Queen Bee, submit it to the batch queue. Queen Bee uses the Torque Portable Batch System (PBS) for workload management and Moab as the job scheduler, both from Cluster Resources. Jobs are submitted with the "qsub" command.
The following PBS script is an example of running an MVAPICH2 job. It is submitted with this command:
[user_name@qb ~]$ qsub test.qsub
The contents of test.qsub are as follows:
#!/bin/bash
#PBS -q workq
# the queue to be used.
#
#PBS -A allocation_code
# specify your project allocation code (e.g. TG-ABC123456)
#
#PBS -l nodes=4:ppn=8
# number of nodes and number of processors on each node to be used.
# Do NOT use ppn = 1. Note that there are 8 processors on each Queen Bee node.
#
#PBS -l walltime=20:00:00
# requested Wall-clock time. SUs will be #hours x #cores
#
#PBS -o myoutput2
# name of the standard output file (here "myoutput2").
#
#PBS -j oe
# standard error output merge to the standard output file.
#
#PBS -N s_type
# name of the job (that will appear on executing the qstat command).
#
# Following are non PBS commands. PLEASE ADOPT THE SAME EXECUTION SCHEME
# i.e. execute the job by copying the necessary files from your home directory
# to the scratch space, execute in the scratch space, and copy back
# the necessary files to your home directory.
#
export WORK_DIR=/work/$USER/your_code_directory
cd $WORK_DIR
# changing to your working directory (we recommend you to use work volume for batch job run)
#
export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`
#
date
# record the time the job starts
#
mpirun_rsh -np $NPROCS -hostfile $PBS_NODEFILE $WORK_DIR/your_executable
date
# record the time the job ends
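The walltime comment in the script says SUs are #hours x #cores. As a rough sketch, the function below turns a walltime request and node/ppn counts into that product. It assumes the job runs for its full requested walltime, which gives an upper bound; this guide does not say whether requested or used time is charged. su_estimate is a hypothetical helper name.

```shell
#!/bin/bash
# Upper-bound SU estimate (hours x cores) for a job that runs for its full
# requested walltime. su_estimate is a hypothetical helper name.
# Usage: su_estimate HH:MM:SS NODES PPN
su_estimate() {
    local walltime="$1" nodes="$2" ppn="$3" h m s secs
    IFS=: read -r h m s <<< "$walltime"
    # 10# forces base 10, so fields such as "08" are not parsed as octal
    secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    awk -v secs="$secs" -v cores="$(( nodes * ppn ))" \
        'BEGIN { printf "%.2f\n", secs / 3600 * cores }'
}
```

For the request in the script above (walltime=20:00:00, nodes=4:ppn=8), `su_estimate 20:00:00 4 8` prints 640.00.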
TeraGrid users who prefer MVAPICH to MVAPICH2 can add the following line to the .soft file in their home directory. For LONI users, and for TeraGrid users who have not set up the TeraGrid environment, the default MPI is MVAPICH version 1.1 built with the Intel 11.1 compilers.
+mvapich-1.1-intel-11.1
Alternatively, you can achieve the same effect by adding the following environment variables to the .bashrc file in your home directory and then sourcing it again (source ~/.bashrc):
export MPICH_HOME=/usr/local/packages/mvapich/1.1/intel-11.1
export LD_LIBRARY_PATH=$MPICH_HOME/lib:$LD_LIBRARY_PATH
export PATH=$MPICH_HOME/bin:$PATH
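One detail of these exports: if LD_LIBRARY_PATH was empty beforehand, $MPICH_HOME/lib:$LD_LIBRARY_PATH leaves a trailing colon, and on Linux (glibc) an empty entry in LD_LIBRARY_PATH is treated as the current directory. The sketch below wraps the same three settings in a function that only adds the separator when needed; use_mpi is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: point MPICH_HOME, PATH and LD_LIBRARY_PATH at one
# MPI installation. ${VAR:+...} adds ":$LD_LIBRARY_PATH" only when it is
# non-empty, so no empty (current-directory) entry is created.
use_mpi() {
    export MPICH_HOME="$1"
    export PATH="$MPICH_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$MPICH_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

In .bashrc this would be used as `use_mpi /usr/local/packages/mvapich/1.1/intel-11.1`.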
You can verify that this is set up correctly by checking that the corresponding mpif90 and mpirun are in your path:
$ which mpif90
/usr/local/packages/mvapich/1.1/intel-11.1/bin/mpif90
$ which mpirun
/usr/local/packages/mvapich/1.1/intel-11.1/bin/mpirun
Once the correct environment is set, compile your program with one of the following commands:
$ mpicc test.c -O3 -o a.out
$ mpif90 test.F -O3 -o a.out
To run an application built with MVAPICH, you don't need to run the mpd daemon. To run your program interactively on 16 processors, first submit an interactive job request to PBS:
$ qsub -I -l nodes=2:ppn=8 -l walltime=00:30:00 -l cput=00:30:00
When your job request is granted, change to the directory containing your parallel executable and launch it:
$ mpirun -np 16 myexecutable
The following is a sample PBS script for submitting your MVAPICH application to the PBS queue:
#!/bin/bash
#PBS -q checkpt
# the queue to be used.
#PBS -M your_mail_address@somehost.edu
# your notification email address
#PBS -A your_TG_ALLOCATION
# the project allocation
#
#PBS -l nodes=16:ppn=8
#
# number of nodes and number of processors on each node to be used.
# Do NOT use ppn = 1 except for serial jobs submitted to the single queue.
#
#PBS -l cput=01:00:00
# requested CPU time.
#
#PBS -l walltime=01:00:00
# requested Wall-clock time.
#
#PBS -V
#
#PBS -o stdout
#PBS -e stdout
# names of the standard output and standard error files (both "stdout" here).
#
#PBS -j oe
# standard error output merge to the standard output file.
#
#PBS -N pbs-test
# name of the job (that will appear on executing the qstat command), here "pbs-test".
#
# Following are non PBS commands. PLEASE ADOPT THE SAME EXECUTION SCHEME
# i.e. execute the job by copying the necessary files from your home directory
# to the scratch space, execute in the scratch space, and copy back
# the necessary files to your home directory.
#
export WORK_DIR=/work/$USER/your_working_directory
export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`
# NPROCS is required by the mpirun command below: one process per line of the node file.
# Copy any necessary files from your home directory to the scratch space here.
cd $WORK_DIR
# changing the working directory to the scratch space
mpirun -machinefile $PBS_NODEFILE -np $NPROCS $WORK_DIR/test
# executes the executable.
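Both scripts compute NPROCS by counting the lines in $PBS_NODEFILE. Under Torque, the node file lists each allocated node once per requested processor, so nodes=16:ppn=8 yields 128 lines and 16 distinct host names. The sketch below reports both counts; nodefile_summary is a hypothetical name, and `wc -l < file` is a simpler equivalent of the wc and gawk pipeline used in the scripts.

```shell
#!/bin/bash
# Print "<process slots> <distinct nodes>" for a PBS node file.
# Under Torque each node appears once per requested processor (ppn).
# nodefile_summary is a hypothetical helper name.
nodefile_summary() {
    local nodefile="${1:-$PBS_NODEFILE}"
    local nprocs nnodes
    # $(( )) strips any padding that some wc versions add
    nprocs=$(( $(wc -l < "$nodefile") ))
    nnodes=$(( $(sort -u "$nodefile" | wc -l) ))
    echo "$nprocs $nnodes"
}
```

Inside a job script, `read -r NPROCS NNODES <<< "$(nodefile_summary)"` sets both variables at once.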
So now you've successfully submitted your job to the queue -- but is it actually running? And if it does run, how can you analyze how well it did?
Commands for monitoring
- qstat: this will show you the status of your job and the jobs of others in the queue. It can show you various other bits of information about your job as well, such as the number of nodes it intends to use, the name of the queue it's in, etc.
- mshow: this command displays various diagnostic messages about the system and job queues. It lists all the jobs in the queue, first those that are running, then those that are queued in the order that they will be run.
- showq: this command displays information about jobs within the batch system.
- showstart: this command gives an estimated starting time for your job.
- qdel: this command deletes a PBS job in the queue.
Our systems staff have also written some tools:
- qshow: this command shows the load on each compute node that your job is using. It can also show, and optionally kill, user processes on remote nodes, or execute commands on them.
- qfree: this command shows how the nodes in the cluster are allocated, along with system usage.
More detailed information on using the Torque PBS commands and Moab to schedule and monitor jobs can be found in the Cluster Resources online documentation.
Queue limits and descriptions
There are currently four queues on Queen Bee: workq, checkpt, preempt, and priority.
| workq | default, for non-preemptable parallel jobs |
| checkpt | preemptable, for parallel jobs that can be checkpointed |
| preempt | for urgent parallel jobs that will preempt jobs in the checkpt queue; requires special permission |
| priority | for on-demand parallel jobs that will have higher priority; requires special permission |
Each user may run up to 8 jobs at once, using a maximum of 384 nodes in total. Additional user limits may also be enforced. Please contact us at sys-help@loni.org for more information on user limits or for special requests.
Job queuing priority
The queuing system schedules jobs by job priority, which takes several factors into account. Jobs with a higher priority are scheduled ahead of jobs with a lower priority. The scheduler also supports backfill: while it waits for the start time of a large job that requires many nodes, it can run jobs that are short in duration or require only a few nodes on the nodes that would otherwise sit idle.
In determining which jobs to run first, Moab uses the following formula to calculate job priority:
Job priority = credential priority + fairshare priority + resource priority + service priority
(1) Credential Priority Subcomponent:
credential priority = credweight * (userweight * job.user.priority)
credential priority = 100 * (10 * 100) = 100000 (a constant)
(2) Fairshare Priority Subcomponent:
fairshare priority = fsweight * min(fscap, fsuserweight * DeltaUserFSUsage)
fairshare priority = 100 * (10 * DeltaUserFSUsage)
A user's fair share usage is the sum of the user's daily processor-seconds over the past seven days, weighted by the daily decay factor, divided by the sum of the total daily processor-seconds used over the same seven days, weighted the same way. The decay factor is 0.9. DeltaUserFSUsage is the fair share target percentage for each user (20 percent) minus the calculated fair share usage percentage.
In other words, it is the target percentage minus the percentage actually used. For a user who has not used the cluster for a week:
fairshare priority = 100 * (10 * 20) = 20000
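The fair-share calculation above can be sketched in a few lines of awk wrapped in a shell function. This is an illustration under assumptions, not Moab's actual implementation: fairshare is a hypothetical name, the input is one line per day (newest first) giving the user's and the system's processor-seconds, and the decay factor is assumed to compound, so the day i days back is weighted by 0.9^i. The fscap cap is ignored, as in the simplified formula.

```shell
#!/bin/bash
# Sketch of the fair-share priority described above (assumptions noted).
# stdin: one line per day, newest first: "<user_proc_secs> <total_proc_secs>"
# Only the first seven lines are used; day i (0 = today) gets weight 0.9^i.
# Prints: <usage_percent> <fairshare_priority>
fairshare() {
    awk -v decay=0.9 -v target=20 '
        NR > 7 { next }
        {
            w = decay ^ (NR - 1)
            user  += $1 * w
            total += $2 * w
        }
        END {
            if (total <= 0) { print "no usage data" > "/dev/stderr"; exit 1 }
            usage = 100 * user / total          # percent of decayed usage
            delta = target - usage              # DeltaUserFSUsage
            printf "%.2f %.0f\n", usage, 100 * 10 * delta
        }'
}
```

A user with no usage in the past week gets 0.00 usage and a fair-share priority of 20000, matching the example above; a user who consumed half of all processor time every day gets 50.00 and -30000.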
(3) Resource Priority Subcomponent
resource priority = resweight * min(rescap, procweight * TotalProcessorsRequested)
resource priority = 30 * min(26720, 10 * TotalProcessorsRequested)
For a 32-processor job:
resource priority = 30 * min(26720, 10 * 32) = 30 * 320 = 9600
(4) Service Priority Subcomponent
service priority = serviceweight * (queuetimeweight * QUEUETIME + xfactorweight * XFACTOR)
service priority = 2 * (2 * QUEUETIME + 20 * XFACTOR)
QUEUETIME is the time the job has been queued in minutes.
XFACTOR = 1 + QUEUETIME / WALLTIMELIMIT
For a one-hour job (WALLTIMELIMIT = 60 minutes) that has been in the queue for one day (QUEUETIME = 1440 minutes):
service priority = 2 * (2 * 1440 + 20 * (1 + 1440 / 60))
service priority = 2 * (2880 + 500) = 6760
These factors are adjusted as needed to make jobs of all sizes start fairly.
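Putting the four subcomponents together, the following sketch reproduces the worked numbers from this section. job_priority is a hypothetical helper; it hard-codes the weights quoted above, which may be adjusted at any time, and takes DeltaUserFSUsage (in percent), the number of processors requested, and QUEUETIME and WALLTIMELIMIT (both in minutes, as in the service-priority example).

```shell
#!/bin/bash
# Sketch of the job priority formula above, using the weights quoted in
# this guide. job_priority is a hypothetical helper name.
# Usage: job_priority DELTA_FS_PERCENT PROCS QUEUETIME_MIN WALLTIME_MIN
job_priority() {
    awk -v dfs="$1" -v procs="$2" -v qt="$3" -v wall="$4" 'BEGIN {
        cred = 100 * (10 * 100)                                 # constant
        fs   = 100 * (10 * dfs)                                 # fair share
        res  = 30 * ((10 * procs < 26720) ? 10 * procs : 26720) # capped
        xf   = 1 + qt / wall                                    # XFACTOR
        svc  = 2 * (2 * qt + 20 * xf)                           # service
        printf "%.0f\n", cred + fs + res + svc
    }'
}
```

For example, `job_priority 20 32 1440 60` prints 136360, i.e. 100000 + 20000 + 9600 + 6760 for a 32-processor, one-hour job from a user idle for a week that has waited one day.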
Using NCSA Archival Storage
TeraGrid users can transfer files to NCSA's archival storage for long-term retention. They can use globus-url-copy and/or uberftp to transfer files between Queen Bee and NCSA's archival storage.
Obtaining a Proxy
Before you can transfer files, you must create a temporary credential called a certificate proxy. Here is a more detailed explanation of the process.
To obtain the proxy, you will use myproxy. The general form of the command is:
myproxy-logon -l <username> -s myproxy.teragrid.org
You should use your TeraGrid User Portal username and password when requesting your proxy. If your username was abc, the following command would be used to obtain the proxy:
myproxy-logon -l abc -s myproxy.teragrid.org
You can use the grid-proxy-info command to verify that you have a valid proxy.
$ grid-proxy-info
subject  : /C=US/O=National Center for Supercomputing Applications/CN=Allen Carlilse
issuer   : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
identity : /C=US/O=National Center for Supercomputing Applications/CN=Allen Carlilse
type     : end entity credential
strength : 512 bits
path     : /tmp/x509up_u1217
timeleft : 11:47:45
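A proxy is only useful while its timeleft is greater than zero, so a script can check it before starting a transfer. The sketch below parses the timeleft line of grid-proxy-info output like the sample above and prints the remaining seconds; proxy_seconds_left is a hypothetical name, and the parsing assumes the "label : value" layout shown here.

```shell
#!/bin/bash
# Hypothetical helper: read grid-proxy-info output on stdin and print the
# remaining proxy lifetime in seconds (0 if no timeleft line is found).
proxy_seconds_left() {
    awk -F' : ' '$1 ~ /^timeleft/ {
        # value looks like HH:MM:SS, possibly followed by extra text
        n = split($2, t, ":")
        if (n == 3) secs = t[1] * 3600 + t[2] * 60 + t[3]
    }
    END { print secs + 0 }'
}
```

For example, to renew the proxy when less than an hour remains:

if [ "$(grid-proxy-info | proxy_seconds_left)" -lt 3600 ]; then myproxy-logon -l abc -s myproxy.teragrid.org; fi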
Using globus-url-copy
globus-url-copy can be used to copy a file between Queen Bee and NCSA's archival storage. Here is TeraGrid's documentation for globus-url-copy. The general form of the command is:
globus-url-copy [-dbg] file:///absolute/path/tofile gsiftp://lsumss.ncsa.uiuc.edu/~/destination-name
Here is an example:
globus-url-copy -dbg file:///home/abc/test4.txt gsiftp://lsumss.ncsa.uiuc.edu/~/test4.txt
Note that complete absolute paths must be used.
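Because globus-url-copy needs absolute paths, it can be convenient to build the file:/// URL from a relative path. to_file_url is a hypothetical helper; it resolves the directory with cd and pwd -P (so symbolic links are resolved), and it does not URL-encode special characters, so paths containing spaces may need extra handling.

```shell
#!/bin/bash
# Hypothetical helper: turn a (possibly relative) path to a file into the
# absolute file:/// URL form that globus-url-copy expects.
to_file_url() {
    local dir base
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    base=$(basename "$1")
    # ${dir%/} avoids a double slash when the file is in /
    printf 'file://%s\n' "${dir%/}/$base"
}
```

For example, from /home/abc: `globus-url-copy "$(to_file_url test4.txt)" gsiftp://lsumss.ncsa.uiuc.edu/~/test4.txt`.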
Using uberftp
When transferring more than one file, you may find it preferable to use uberftp instead of globus-url-copy. Here is TeraGrid's documentation on uberftp.
uberftp operates very much like ftp. You connect and transfer files in the same manner. Here is an example session:
$ uberftp lsumss.ncsa.uiuc.edu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
UNIX Archive FTP server (DiskXtender Version 2.9.1) active.
Checking DiskXtender.conf
220 UNIX Archive FTP server ready.
230 User abc logged in.
uberftp> dir
drwx------ 2 abc ac DK common    1024 Dec 12  2005 .trash
-rw------- 1 abc ac DK common 3145728 Mar  6 13:56 test3.txt
-rw------- 1 abc ac DK common 3145728 Mar  6 14:32 test4.txt
uberftp> get test3.txt
test3.txt: 3145728 bytes in 2.09 seconds. 1501.90 KB/sec
uberftp> quit
221 Goodbye.
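As an alternative to an interactive uberftp session (a different approach from the one this guide describes), globus-url-copy can read a list of source/destination URL pairs from a file with its -f option; confirm with `globus-url-copy -help` that the installed version supports it. The sketch below generates such a list for local files, targeting the same ~/ archive directory as the examples above. archive_url_pairs is a hypothetical helper, and it does not add the quoting that paths containing spaces would need.

```shell
#!/bin/bash
# Hypothetical helper: print one "sourceURL destURL" pair per local file,
# copying each file to the archive home directory used in this guide.
archive_url_pairs() {
    local dest="gsiftp://lsumss.ncsa.uiuc.edu/~"
    local f dir base
    for f in "$@"; do
        dir=$(cd "$(dirname "$f")" && pwd -P) || return 1
        base=$(basename "$f")
        printf 'file://%s %s/%s\n' "${dir%/}/$base" "$dest" "$base"
    done
}
```

Usage:

archive_url_pairs results/*.dat > urls.txt
globus-url-copy -f urls.txt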
Sample Archival Storage Session
A typical session will include a proxy request followed by file transfers. Here is a typical session where the user abc will transfer a file to archival storage and then retrieve it.
Get a certificate proxy
[abc@qb1 ~]$ myproxy-logon -l abc -s myproxy.teragrid.org
Enter MyProxy pass phrase:
A credential has been received for user abc in /tmp/x509up_u1217.
Verify proxy cert exists and is still valid
[abc@qb1 ~]$ grid-proxy-info
subject  : /C=US/O=National Center for Supercomputing Applications/CN=Allen Carlilse
issuer   : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
identity : /C=US/O=National Center for Supercomputing Applications/CN=Allen Carlilse
type     : end entity credential
strength : 512 bits
path     : /tmp/x509up_u1217
timeleft : 11:47:45
Note that the timeleft field at the bottom must be greater than 0 for the cert to be useful. If it is 0, simply request your proxy again.
Copy a file to archival storage
[abc@qb1 ~]$ globus-url-copy file:///home/abc/test4.txt gsiftp://lsumss.ncsa.uiuc.edu/~/test4.txt
If you have problems, add the -dbg (debug) option to provide feedback during the process. The output may help you determine what went wrong.
At this point the user could delete the file from Queen Bee, since it is safely archived at NCSA.
Retrieve a file from archival storage
[abc@qb1 ~]$ globus-url-copy gsiftp://lsumss.ncsa.uiuc.edu/~/test4.txt file:///home/abc/test4.txt