UK High Performance Computing, November 23, 2009
UKy HPC Frequently-Asked Questions

Frequently-Asked Questions related to: LSF (Batch System)

For questions about this FAQ, please contact: help-hpc@uky.edu


FAQ Revised: Thursday 02 September 2004 13:37:44

Table of Contents

1. LSF (Batch System)

1. LSF (Batch System)

1.1. What is LSF?

LSF (Load Sharing Facility) is a layer of software services on top of UNIX and Windows NT operating systems. LSF creates a single system image on a network of heterogeneous computers such that the whole network of computing resources can be effectively utilized and easily managed.

LSF uses the LSF Batch system to select the most suitable hosts and to submit and interact with the individual tasks of parallel batch jobs. When you run a job under LSF, you submit it to a queue with the bsub command, and the LSF Batch system then attends to all of the details of running the job. The job waits in the queue until it reaches the front and the appropriate host resources become available; at that point it is dispatched to the most suitable host(s) for execution. Consequently, a batch job may not run immediately after it is submitted: it is delayed whenever the host resources it needs are not available.

1.2. What is the advantage of submitting a job under LSF?

The main advantage is that users do not have to specify on which node(s) their job should run. LSF itself selects the least loaded node(s) appropriate for the queue that the job was submitted to. Since a user is unlikely to know which nodes are least loaded for a given queue, running a job outside the batch system may place it on one or more nodes that are already heavily loaded. Moreover, since the load on any individual node can change dramatically between the time a job is submitted and the time it reaches the front of a queue, a user has little chance of choosing the most appropriate host resources by hand.

The capability of LSF to restrict jobs to using only nodes that are most appropriate for a given queue is also an advantage from the administrative side since it allows for much better accounting and tracking of jobs.

1.3. What type(s) of jobs must be submitted to the HP Superdome cluster under LSF?

The batch system (LSF) must be used whenever possible. Non-batch jobs on any node except the login node will be killed, unless special permission has been obtained in advance from a system administrator (help-hpc@uky.edu). The login node is a special case: interactive jobs of up to 120 CPU-minutes may be run in the interactive partition on the login node. This is intended for editing files, compiling code, short test runs, and similar activities. If your jobs run past this limit, please investigate whether you can run them as batch jobs instead; see the LSF Examples page for examples.
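A minimal sketch for checking an interactive process against the 120 CPU-minute limit. It assumes you read the process's accumulated CPU time from ps in the common [[dd-]hh:]mm:ss format (for example ps -o time= -p PID; ps options differ between platforms, so check man ps on the login node). The function names are hypothetical.

```shell
# Convert a ps-style CPU time "[[dd-]hh:]mm:ss" into whole seconds.
cpu_seconds() {
  printf '%s\n' "$1" | awk '{
    days = 0; t = $1
    dash = index(t, "-")
    if (dash > 0) { days = substr(t, 1, dash - 1); t = substr(t, dash + 1) }
    n = split(t, part, ":")
    s = 0
    for (i = 1; i <= n; i++) s = s * 60 + part[i]
    print s + days * 86400
  }'
}

# Succeed (exit status 0) if the CPU time exceeds the interactive limit
# of 120 CPU-minutes (7200 seconds) on the login node.
over_interactive_limit() {
  [ "$(cpu_seconds "$1")" -gt $((120 * 60)) ]
}
```

For example, over_interactive_limit "$(ps -o time= -p "$pid")" && echo "consider running this as a batch job", where pid is the process you want to check.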

1.4. How do I submit a non-MPI job under LSF?

For details on the commands for submitting a non-MPI job for batch execution on the HP Superdomes, see LSF Examples.
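The LSF Examples page remains the reference for this site. As a rough sketch only, a serial job can be described by a script whose #BSUB lines carry bsub options: -q (queue), -J (job name), and -o/-e (output and error files, where %J is replaced by the job ID). The generator function and file names are hypothetical; the serial queue is the one described in question 1.6.

```shell
# Print an LSF job script for a serial (non-MPI) program.
# $1 = job name; the remaining arguments form the command to run.
# Arguments containing spaces are not re-quoted in the generated script.
make_serial_job() {
  name=$1
  shift
  cat <<EOF
#!/bin/sh
#BSUB -q serial
#BSUB -J $name
#BSUB -o $name.%J.out
#BSUB -e $name.%J.err
$*
EOF
}
```

Usage: make_serial_job myrun ./a.out input.dat > myrun.sh, then submit it with bsub < myrun.sh.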

1.5. How do I submit an MPI job under LSF?

All jobs are submitted to the LSF Batch system using the bsub command, which submits a job for batch execution and assigns it a unique numerical job ID. However, the details of the command depend on whether your job is to run on a single node (single host) or on multiple nodes (multiple hosts) within the cluster. If your MPI job requests 64 or fewer processors in total, it should be run on a single node (single host).
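The 64-processor rule above can be sketched as a small wrapper. It assumes the site's LSF honors the standard resource requirement -R "span[hosts=1]" for placing all slots on one host (confirm against the site's own MPI examples), and that for larger jobs you pass the multi-host launcher, such as runpam from question 1.15, as part of the command. The wrapper name and the BSUB override are hypothetical.

```shell
# Submit an MPI job with bsub, keeping it on a single host when it
# requests 64 or fewer processors. Set BSUB to another command (for
# example a function that prints its arguments) for a dry run.
# Usage: submit_mpi NPROCS [bsub options...] command [args...]
submit_mpi() {
  nprocs=$1
  shift
  if [ "$nprocs" -le 64 ]; then
    # span[hosts=1]: place every requested slot on the same host.
    ${BSUB:-bsub} -n "$nprocs" -R "span[hosts=1]" "$@"
  else
    ${BSUB:-bsub} -n "$nprocs" "$@"
  fi
}
```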

For details on the commands for submitting an MPI job for batch execution on the cluster, see:



1.6. What are the various queues for?

LSF queues are defined with particular sorts of jobs in mind, to keep jobs with differing requirements from interfering with one another. For example, Gaussian jobs and highly parallel MPI jobs run on the same node can slow each other down significantly. There are several sorts of queue:

  • serial - The serial queue is intended for jobs that haven't been parallelized or cannot be run in parallel.
  • Gaussian - These queues (gauss, gaussshort) are used for Gaussian jobs, whether or not the jobs are parallel. In fact, most highly parallel Gaussian jobs contain long non-parallel sections, due to the algorithms used; this is the major reason that running them alongside MPI jobs works poorly.
  • Parallel queues - These queues (para, para64, parax, parashort) should be reserved for parallel jobs. Do not run serial (non-parallel) jobs in these queues; doing so can cause problems for the parallel jobs.
  • Short queues - These queues (parashort, gaussshort) are intended for testing, or for short jobs that are needed fairly quickly. "Short" currently means something on the order of 8 hours of wall-clock time, but this may change.
  • Background queues - These queues (gauss_bg, para_bg, etc.) can be used by users who have run through their allocation and are prevented from using other LSF queues. Jobs in these queues run at low priority and will not be started unless the host is lightly loaded.
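As a rough illustration of the guidance above, the choice of queue for a given kind of job can be written as a small lookup. It covers only the queues named in the list (serial, gauss/gaussshort, para/parashort); choosing among para, para64, and parax, or a _bg queue, still depends on the LSF queue table. The function name and the job-kind labels are hypothetical.

```shell
# Suggest an LSF queue following the queue descriptions above.
# $1 = kind of job (serial | gaussian | parallel)
# $2 = "short" for a test or short job needed quickly (optional)
pick_queue() {
  case "$1:$2" in
    serial:*)       echo serial ;;
    gaussian:short) echo gaussshort ;;
    gaussian:*)     echo gauss ;;
    parallel:short) echo parashort ;;
    parallel:*)     echo para ;;
    *)              echo "unknown job kind: $1" >&2; return 1 ;;
  esac
}
```

For example, bsub -q "$(pick_queue gaussian short)" ... would send a short Gaussian job to gaussshort.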


For more information on individual queues, see the LSF queue table. If you are logged in on the system, you can see the queue names and some status information with the bqueues command.


Please do not run jobs in queues that aren't appropriate for them; if you have questions, or need to do something that you can't do under the queue system, please send email to help-hpc@uky.edu describing your problem or question.

1.7. Are there any limits on how many jobs I can run at once?

Job queues have been established to ensure equitable distribution of, and access to, the entire complex. Each userid is limited to [N] LSF job-slots on the complex at a time. Projects (groups) are typically limited to (N + N/2) job-slots on the cluster at a time. A reasonable default for [N] has been chosen, but it may be adjusted up or down as needed. See question 1.8 for how to check your current limit.

Note: When a userid belongs to more than one project, the job-slots belonging to that userid will count against the limit of each project that userid could be running under.

Depending on what combination of queues you wish to use, you may not be able to run the full [N] job-slot limit at once; this will depend on the limits of the individual queues (more about this below).

A serial job uses 1 job-slot; an N-way parallel job uses N job-slots. For example, a userid might run two 32-way jobs or eight 8-way jobs at a time.

If a user requests a specific number of job-slots when submitting a job, the higher of the requested number and the number actually in use counts towards the limit. When the resources on the complex are underutilized, these restrictions may be relaxed in various ways to optimize throughput. Some of the very large queues may require special permission; contact help-hpc@uky.edu if you have a question.

In addition to the overall job-slot limits, each individual queue may limit the number of job-slots a user may have running at one time, and may also limit how many job-slots a single job can have. For example, the Gaussian queues do not allow jobs bigger than sixteen slots, while the per-user job-slot limit on each Gaussian queue may allow a user to run several 8-slot jobs in that queue. The overall limit controls the total number of job-slots a user may have running at one time on the whole system, possibly spread over multiple jobs in multiple queues.
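The counting rules above can be sketched as shell arithmetic. N stays a parameter because the actual value is set by the administrators; 64 appears in the usage below only as an illustrative value consistent with the "two 32-way jobs or eight 8-way jobs" example. Rounding N/2 down is an assumption, per-queue limits are not modeled, and the function names are hypothetical.

```shell
# Slots a job counts against the limit: the higher of the number
# requested at submission ($1) and the number actually in use ($2).
counted_slots() {
  if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Typical project (group) limit N + N/2, using integer division.
project_slot_limit() {
  echo $(( $1 + $1 / 2 ))
}

# Succeed if the given jobs fit under a per-user limit N.
# Usage: fits_limit N slots_job1 slots_job2 ...
fits_limit() {
  limit=$1
  shift
  total=0
  for slots in "$@"; do
    total=$((total + $slots))
  done
  [ "$total" -le "$limit" ]
}
```

With N = 64, fits_limit 64 32 32 and fits_limit 64 8 8 8 8 8 8 8 8 both succeed, while one more 1-slot job would exceed the limit.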

For information on individual queue limits, look at the LSF queue table.

Jobs in excess of a job-slot limit will stay in the PENDING state until enough other jobs finish to let them run.

1.8. How many LSF job-slots am I permitted to use?

To see the currently configured upper limit for your LSF job slots, run the LSF busers command. You will see some output similar to this:
USER/GROUP          JL/P    MAX  NJOBS   PEND    RUN  SSUSP  USUSP    RSV
The number in the MAX field is your maximum number of job slots. You can also run busers for your group/project; see man busers for more information.
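To pull just the limit out of the busers output, a short awk filter is enough. This sketch assumes the column layout shown above, with MAX as the third field; a "-" in that field typically means no limit is configured. The function name is hypothetical.

```shell
# Print each user/group name and its MAX job-slot limit from busers
# output read on stdin, skipping the header line.
max_slots() {
  awk '$1 != "USER/GROUP" { print $1, $3 }'
}
```

Usage: busers | max_slots, or busers yourgroup | max_slots for a project (yourgroup is a placeholder for the ID reported by the usage command).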

Note: To find your project/group ID, run the usage command.

Note: These limits may be adjusted up or down by the admins depending on the availability of resources.

See question 1.7 for more on how these limits are applied.

1.9. How do I check on the status of my job?

The bjobs command shows the status of LSF Batch jobs. Without options, bjobs shows only jobs belonging to the userid the command is run from. The most useful options are:

  • -a   shows jobs completed in the recent past, as well as unfinished jobs.
  • -p   shows pending jobs (status PEND, not yet started) and why they are pending.
  • -s   shows suspended jobs (for example status SSUSP or USUSP, paused after starting) and why they were suspended.
  • -u userid   where userid is either another userid or the keyword all; shows the jobs belonging to userid, or all the jobs in the LSF system if all is used.
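When you have many jobs, it can help to filter the bjobs listing by status. This sketch assumes the default bjobs column layout (JOBID USER STAT ...), in which STAT is always the third field; the header line and the continuation lines that list extra hosts for parallel jobs are skipped because they do not carry a STAT value. The function name is hypothetical.

```shell
# Print the IDs of jobs whose STAT column matches $1 (e.g. PEND, RUN,
# SSUSP), reading bjobs output on stdin. Only lines whose first field
# starts with a digit (a job ID) are considered.
jobs_in_state() {
  awk -v want="$1" '$1 ~ /^[0-9]/ && $3 == want { print $1 }'
}
```

Usage: bjobs | jobs_in_state PEND lists your pending jobs, and bjobs -p followed by one of those IDs shows why it is pending.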


1.10. Why is my job status PENDing for so long?

It is the task of the batch system software (e.g., LSF) to manage the flow of jobs and to let jobs run, in a fair and orderly fashion, when the system resources are available.

For example, when the system is less busy you may submit jobs that are able to schedule and execute quite quickly. When the system is busier, your jobs could potentially wait for longer periods. This interval will vary based on several factors.

If your job has been in PEND status for longer than you expect, take the time to find out why before submitting a trouble report. Several system commands provide this information, and more often than not there is no system problem: the job is just waiting for the requested resources, i.e. the system is simply "busy".

To see why your job status is in PEND status, add the "-p" option to your LSF bjobs command.

  • If LSF indicates "Job slot limit reached" this means there are currently not enough processor slots (job-slots) available to fulfill your request. This can be for more than one reason; the host may not have the slots available or you may have exceeded the number of slots allocated to you (or your group).

    The LSF command bhosts shows the slots in use on each host and the number of jobs in each state (RUN, SSUSP, USUSP, etc.). The LSF command busers shows the slots you are allocated; you may need to check at both the user level and the group/project level. See question 1.8 for more information.

  • If bjobs -p reports something like "The CPU utilization (ut) is beyond threshold: 1 host;" this means that the host LSF would like to use to run your job is too busy right now; see the output of the LSF command lsload and note the ut column.

  • There may be PENDing jobs ahead of yours.

    You can see the PENDing jobs your_queue has (and why) with:

    bjobs -q your_queue -u all -p

    Depending on factors such as your job's anticipated run time, you may be able to pick a different LSF queue for quicker turn-around. See question 1.6 for more information on the queues.



    Note: jobs in the _bg queues are scheduled and executed at lower than normal priority.

Summary: your job should be scheduled automatically by LSF when the requested resources become available.
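The checks above can be collected into one function so they are easy to run together. It uses only the commands named in this answer; the job ID and queue name are arguments you supply, and the function name is hypothetical.

```shell
# Run the diagnostics for a pending job in one go.
# $1 = job ID, $2 = name of the queue the job was submitted to.
why_pending() {
  echo "== Pending reason for job $1 =="
  bjobs -p "$1"
  echo "== Your job-slot limits =="
  busers
  echo "== Slots in use per host =="
  bhosts
  echo "== Host load (note the ut column) =="
  lsload
  echo "== Other pending jobs in queue $2 =="
  bjobs -q "$2" -u all -p
}
```

Usage: why_pending 1234 para | less, with your own job ID and queue.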

1.11. Since the node(s) on which my job will run are determined by the LSF Batch system, how do I find out on which node(s) my job is actually running once it has been submitted?

The bjobs command can be used to view the status and resource usage of batch jobs in the LSF Batch system. Typing bjobs displays the following information about each of your unfinished jobs on the HP Superdome cluster:

    JOBID  USER  STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME

The host names listed under EXEC_HOST identify the nodes on which each of your jobs is currently running. JOBID is the unique numerical ID assigned to each job by LSF, and QUEUE is the name of the queue to which each job was submitted.
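To extract the execution hosts for one job, you can filter the bjobs listing. This sketch assumes wide output from bjobs -w, in which (an assumption; check it against your own output) a multi-host job's EXEC_HOST field lists the hosts separated by colons, each optionally prefixed by a slot count such as 4*. The function name is hypothetical.

```shell
# Print the execution hosts of running job $1, one per line, from
# "bjobs -w" output on stdin. EXEC_HOST is the sixth field of a RUN line.
exec_hosts() {
  awk -v id="$1" '$1 == id && $3 == "RUN" {
    n = split($6, host, ":")
    for (i = 1; i <= n; i++) print host[i]
  }'
}
```

Usage: bjobs -w | exec_hosts 1234, with your own job ID.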

1.12. How do I monitor the status of my MPI job when it is running in the LSF Batch system?

First, identify the node(s) on which your job is running using the bjobs command.

Then use the remote login command ssh nodeid to connect to the execution host nodeid (the node on which the job you wish to monitor is running).

Finally, type the command top -h to obtain status information on the job running on nodeid. When you are done, type q to leave top, and exit to return to the login host.
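The steps above can be wrapped in a small helper. It assumes an EXEC_HOST entry may carry a slot-count prefix such as 4*node3 and strips it before calling ssh; -t is ssh's standard option for allocating a terminal, which top needs. The top -h invocation is the one given above. The function names are hypothetical.

```shell
# Strip an "N*" slot-count prefix from an EXEC_HOST entry (4*node3 -> node3).
host_name() {
  printf '%s\n' "${1#*\*}"
}

# Connect to the job's execution host and run top there.
# Pressing q leaves top, which also ends the ssh session.
monitor_host() {
  ssh -t "$(host_name "$1")" top -h
}
```

Usage: monitor_host '4*node3' (quote the argument so the shell does not treat * as a wildcard).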

1.13. How do I terminate a job once it is running in the LSF Batch system?

The bkill command terminates batch jobs in the LSF Batch system. Typing bkill job_id terminates the batch job whose unique numerical ID (JOBID) is job_id.
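To terminate several jobs at once, you can call bkill once per job ID. This sketch (function name hypothetical) reports any ID that bkill rejects and returns a non-zero status if any of them failed.

```shell
# Kill each job ID given as an argument, reporting failures on stderr.
kill_jobs() {
  status=0
  for id in "$@"; do
    if ! bkill "$id"; then
      echo "could not kill job $id" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: kill_jobs 1234 1235 1236.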

The time required to terminate a batch job varies depending on how busy the system and network are, and on how many parallel processes the job is running (if any).

When you kill a batch job that is not running on the login node, you do not have to clean up your scratch directory on the machine it was running on. LSF copies the files back to your scratch directory on the login node, and deletes them from the execution host as soon as you have no other jobs running on that machine. Follow the usual procedure to clean your scratch directory: wait until you have no LSF jobs running and all the files from your previous jobs have been copied back to the login host, then clean your scratch directory.
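Before cleaning your scratch directory, you can check from a script that bjobs lists no unfinished jobs. This sketch (function name hypothetical) treats only lines whose first field starts with a digit as jobs, so the header and any "no job found" message are ignored. It cannot tell whether your files have finished copying back to the login host; check that separately.

```shell
# Succeed (exit 0) when bjobs output on stdin lists no jobs.
no_active_jobs() {
  awk '$1 ~ /^[0-9]/ { found = 1 } END { exit found ? 1 : 0 }'
}
```

Usage: bjobs 2>/dev/null | no_active_jobs && echo "no LSF jobs left; confirm your files are back before cleaning scratch".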

1.14. How do I checkpoint and/or restart a job?

Two web pages cover this topic: one concerns checkpointing, the other restarting checkpointed jobs. There are several methods of checkpointing; read both pages before attempting to checkpoint or restart a job.

1.15. How do I do a timing run?

A timing run (a job used to determine how effectively a program or algorithm performs) needs to run without sharing a CPU with any other job. Otherwise the measurements are not valid, because some CPU time is lost to time-sharing with the other job(s); exactly how much is unpredictable and varies between runs. The cluster is set up to allow timing runs on up to 32 processors; if you need a larger timing run, send email to help-hpc@uky.edu. (Note: depending on system load, larger timing runs may not be possible even with several weeks' prior notice.)

Do not try a timing run until you are certain your program runs properly. Debug your program, algorithm, and test data using the normal LSF queues before trying a timing run; this saves considerable time and is much less frustrating.

To submit a timing run, send the job to the parax queue and make sure you specify the proper number of processors. If you are running an MPI job, you must use the runpam command; see the section on submitting MPI jobs that run on two or more nodes for more information. Because parax requires a runtime limit (see question 1.17), the command to submit a 16-processor MPI timing run would look something like:

    bsub -n 16 -q parax -Wxxxx runpam mympiopts myscript

where xxxx is the runtime limit, myscript is the script that actually runs the job, and mympiopts are any required mpirun options (if the job doesn't require any, leave this blank).

Since a timing run requires that no other job run on the same processors, LSF holds jobs submitted to parax until it can reserve as many processors as needed. If the cluster is heavily loaded, there may be a long wait.
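A sketch of the timing-run submission above with two constraints from this FAQ built in: parax needs a runtime limit (question 1.17), and timing runs are limited to 32 processors unless arranged with the administrators. The argument order passed to runpam follows the example command (mpirun options, then the script); the wrapper name and the BSUB override are hypothetical.

```shell
# Submit an MPI timing run to the parax queue through runpam.
# Usage: submit_timing_run NPROCS LIMIT SCRIPT [mpirun options...]
# Set BSUB to another command (e.g. one that prints its arguments) for a dry run.
submit_timing_run() {
  if [ $# -lt 3 ]; then
    echo "usage: submit_timing_run NPROCS LIMIT SCRIPT [mpirun options...]" >&2
    return 2
  fi
  nprocs=$1 limit=$2 script=$3
  shift 3
  if [ "$nprocs" -gt 32 ]; then
    echo "timing runs above 32 processors must be arranged via help-hpc@uky.edu" >&2
    return 1
  fi
  # -W sets the runtime limit that parax requires.
  ${BSUB:-bsub} -n "$nprocs" -q parax -W "$limit" runpam "$@" "$script"
}
```

Usage: submit_timing_run 16 2:00 myscript submits a 16-processor timing run with a two-hour limit.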



1.16. What is a "background" queue, and what is it good for?

Background queues schedule and run jobs at low priority. On this cluster, all "background" queues have names ending in "_bg". The main reason to use a background queue is that the other queues are closed to users who have run through their CPU allocation. Expect jobs in background queues to wait until a machine is relatively unloaded before starting, to run relatively slowly, and to be the first jobs suspended when the machine they run on becomes more heavily loaded.

1.17. Why do the jobs I submit to parax, para64, or para128 fail immediately when I submit them? What is the "-W" option?

The parax, para64, and para128 queues all require that each job be submitted with its own runtime limit. If you submit a job with no runtime limit, or with one longer than the queue allows, the job will not be accepted. The bsub option to specify a runtime limit is

    -Wxxxx

where xxxx is the limit, specified either as hours:minutes or just as minutes.

In addition to the above requirement, para128 requires special permission to use. If you want to use the para128 queue, send email to help-hpc@uky.edu.
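Because a job whose -W limit is longer than the queue allows is rejected, it can help to convert the limit to minutes and compare it with the queue's maximum before submitting. The maximum is a parameter here (look it up in the LSF queue table or with bqueues -l); the function names are hypothetical.

```shell
# Convert a bsub -W value ("hours:minutes" or plain "minutes") to minutes.
runtime_minutes() {
  printf '%s\n' "$1" | awk -F: 'NF == 2 { print $1 * 60 + $2; next }
                                NF == 1 { print $1 + 0 }'
}

# Succeed if a -W value ($1) is within a queue's maximum runtime ($2, minutes).
within_queue_limit() {
  [ "$(runtime_minutes "$1")" -le "$2" ]
}
```

For example, runtime_minutes 2:30 prints 150.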

1.18. Where can I obtain additional information on LSF?

Additional information on LSF can be obtained from Platform Computing Corporation's Web site at www.platform.com.

1.19. I want to run a job using multiple-level parallelism (an MPI job with sub-processes that use parallelized library routines, OpenMP, or loop-parallelism; also known as mixed-mode). How do I do it?

See the page "Running a job using multi-level parallelism".

1.20. My job is in status "SSUSP". What does that mean?

Your job has been suspended by the system. The CPU usage or another measure of system load passed a set limit, and the system started suspending jobs to keep itself from becoming overloaded. When the offending load measure drops below another set point, your job resumes running. This most commonly happens to jobs in the background (*_bg) queues, which can also be suspended if other, higher-priority jobs are submitted and need the resources being used by the background job.

In other cases, the reason why particular jobs are suspended before other jobs in the same queue is less obvious. This is because the system uses "fairshare" scheduling, which tries to give everyone a fair share of the available resources. This is mainly useful because when one person submits many jobs at once, the scheduler runs other people's jobs alongside theirs instead of scheduling jobs in strict first-come, first-served order. It also means that long-running jobs may be suspended before newer jobs, since they have had more resources for a longer period.


FAQ generated by: makefaq