News:2010-07-14- Upgrade and new structure of SUN gridengine

Summary

The queue structure has been redesigned to provide better "fairness". The key point is that users now have to know the expected execution time of a job at the moment of submission.

Details

New queues:

  1. fast
    This queue is for very short (test) jobs that require at most 10 minutes of wall-clock time. Jobs that take longer will be killed.
    To use it, jobs must be submitted with qsub -q fast -P fast job.sh. Jobs submitted without these arguments (i.e. qsub job.sh) will not run in this queue.
    The queue has access to almost all slots on all machines.
  2. long
    This queue is for long-lasting jobs (more than 12 hours); there is no enforced upper limit.
    The queue can only use 4 slots on cimbi2-4 and 16 on hms2, i.e. long-lasting jobs cannot saturate those machines.
  3. himem
    For jobs needing up to 12 hours of CPU time; they will be executed on either cimbi2-4 or hms1-2.
  4. himem-long
    For jobs needing more than 12 hours and running on either cimbi2-4 or hms1-2.
  5. himem2
    For jobs needing up to 12 hours; they will be executed on hms2.

The main new issue is that, at the moment of submission, the user has to know whether the job needs more than 12 hours.
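As a rough illustration of the decision a user now has to make, the sketch below prints a suggested qsub command line from the expected run time. The function name suggest_qsub and its arguments are hypothetical. Only the form for the fast queue is taken from the description above; passing just -q for the other queues, and using a plain qsub job.sh for jobs between 10 minutes and 12 hours, are assumptions. The himem2 queue is not covered.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a suggested qsub command line.
# Usage: suggest_qsub <expected_minutes> <himem: yes|no> <job_script>
suggest_qsub() {
    minutes=$1
    himem=$2
    script=$3
    if [ "$himem" = "yes" ]; then
        # himem covers up to 12 hours (720 minutes), himem-long anything above.
        if [ "$minutes" -le 720 ]; then
            queue=himem
        else
            queue=himem-long
        fi
        echo "qsub -q $queue $script"      # assumption: -q alone is enough
    elif [ "$minutes" -le 10 ]; then
        echo "qsub -q fast -P fast $script" # form given for the fast queue
    elif [ "$minutes" -gt 720 ]; then
        echo "qsub -q long $script"        # assumption: -q alone is enough
    else
        echo "qsub $script"                # assumption: plain submission
    fi
}
```

For example, suggest_qsub 5 no job.sh prints the fast-queue command, while suggest_qsub 900 no job.sh prints a submission to the long queue.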

The benefit of the above structure is that long-lasting jobs cannot saturate the system (only approx. 50% of the capacity). Slots should therefore become available at a reasonably high frequency, giving new users access.

Other "features":

  • By default, the qstat command only shows one's own jobs. To see all jobs in the queue, type qstat -u '*'.
  • For multi-threaded jobs one can make reservations. Submit the job like this: qsub -R y job.sh. This makes it possible to avoid parallel job starvation, where single-threaded jobs overrule the higher priority of multi-threaded jobs. However, the reservation mechanism is not perfect: if one submits a job with reservation to e.g. the himem queue (qsub -R y -q himem job.sh), the scheduler will make a reservation on a specific host belonging to the himem queue (e.g. cimbi2). If a slot becomes available on another host (e.g. cimbi3), that slot will not be reserved but allocated to another single-threaded job in the queue.
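
The two commands above can be wrapped in small shell functions, sketched here with a dry-run switch so they can be tried without submitting anything. The names run, show_all_jobs and submit_reserved and the DRY_RUN variable are hypothetical; only the qstat and qsub options mentioned in the list above are used.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the jobs of all users; the quoted '*' is passed to qstat unexpanded.
show_all_jobs() {
    run qstat -u '*'
}

# Submit a job with reservation to a given queue.
# Usage: submit_reserved <queue> <job_script>
submit_reserved() {
    run qsub -R y -q "$1" "$2"
}
```

With the default dry-run setting, submit_reserved himem job.sh only prints the qsub command; the dry-run output shows arguments after the shell has removed the quotes. Setting DRY_RUN=0 runs the commands for real.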