Priority Queues

From UniCluster

Adding a Priority Queue

There are several ways to create priority queues in Grid Engine. Depending on your requirements, a simple "nice" level may be adequate, but often more control is necessary.


Configuring the Queue Nice Level

Let's assume you have three queues: high.q, medium.q and low.q. You want jobs running in high.q to be prioritized over jobs in medium.q, which in turn are prioritized over jobs in low.q. You want all of the jobs to be able to run simultaneously, with the higher-priority jobs receiving preference. Simply run:

$ qconf -mq high.q

and change the priority field to -10. Then run:

$ qconf -mq low.q

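and change the priority field to 10. The priority field is the nice value applied to jobs in the queue; medium.q keeps its default of 0.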
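
The same change can be scripted instead of made in the editor. The following is a minimal sketch, assuming the -mattr option of qconf is available in your Grid Engine version. The function name set_queue_nice and the QCONF override variable are illustrative only and are not part of Grid Engine.

```shell
# Set the "priority" (nice level) attribute of a queue without opening an editor.
# QCONF is a hypothetical override: run with QCONF=echo to preview the arguments.
set_queue_nice() {
    queue=$1    # queue name, e.g. high.q
    level=$2    # nice value: negative means more CPU preference, positive means less
    "${QCONF:-qconf}" -mattr queue priority "$level" "$queue"
}

# Example calls (uncomment on a host with Grid Engine admin rights):
# set_queue_nice high.q -10
# set_queue_nice low.q 10
```

Afterwards, qconf -sq high.q should show the new value in the priority line.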

Now if you run:

$ qsub -q high.q -N High_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -q medium.q -N Medium_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -q low.q -N Low_Worker $SGE_ROOT/examples/jobs/worker.sh

You can observe the three worker processes running simultaneously:

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high.q@cbrunner.univa.com      BIP   1/1       1.76     lx24-x86
      5 0.55500 High_Worke globus       r     11/19/2007 11:36:29     1
----------------------------------------------------------------------------
low.q@cbrunner.univa.com       BIP   1/1       1.76     lx24-x86
      7 0.55500 Low_Worker globus       r     11/19/2007 11:36:29     1
----------------------------------------------------------------------------
medium.q@cbrunner.univa.com    BIP   1/1       1.76     lx24-x86
      6 0.55500 Medium_Wor globus       r     11/19/2007 11:36:29     1

and use top to see how the different processes receive different percentages of the CPU:

$ top
Cpu(s): 43.3%us, 48.4%sy,  7.8%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   2071144k total,  2013504k used,    57640k free,   132280k buffers
Swap:  2031608k total,      304k used,  2031304k free,   458660k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31304 globus    15 -10  1856  520  436 R  100  0.0   0:24.92 work
31319 globus    25   0  1856  516  436 R   58  0.0   0:14.53 work
31367 globus    35  10  1856  524  436 R   30  0.0   0:07.14 work
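
To compare nice values and CPU shares without reading the full top screen, the output can be filtered. This is a small sketch that assumes the column layout shown above (NI in column 4, %CPU in column 9, COMMAND in column 12); the function name nice_vs_cpu is made up for this example.

```shell
# Print the nice value and CPU share of every process whose command name
# equals $1, reading top's batch output from standard input.
# Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
nice_vs_cpu() {
    awk -v cmd="$1" '$12 == cmd { printf "nice=%s cpu=%s%%\n", $4, $9 }'
}

# Example: top -b -n 1 | nice_vs_cpu work
```

Other top versions may order the columns differently, in which case the field numbers need adjusting.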


Adding Priority Queues With Preemption and Protecting Against Over-Subscription

If you wish to prevent over-subscription of a machine (you have 3 queues per exec host but may only have 1 or 2 cores), you have to configure things a bit differently. First, we need to create 2 "resources" to help prioritize queued jobs. By creating a high_priority and a low_priority resource with a non-zero urgency, we can affect the order in which the scheduler dispatches jobs.

$ qconf -sc > /tmp/resourceTmp   # start from the current complexes: -Mc overwrites the whole list
$ echo "high_priority hp BOOL == FORCED NO FALSE 100" >> /tmp/resourceTmp
$ echo "low_priority lp BOOL == FORCED NO FALSE -100" >> /tmp/resourceTmp
$ echo >> /tmp/resourceTmp
$ qconf -Mc /tmp/resourceTmp
$ rm /tmp/resourceTmp
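
Each definition line has eight columns, which, as described in the complex(5) man page, are: name, shortcut, type, relop, requestable, consumable, default and urgency. The sketch below builds such a line so the meaning of each position is visible; the helper name priority_complex_line is hypothetical.

```shell
# Build one line of the complex configuration for a boolean priority resource.
# Columns, in order:
#   name shortcut type relop requestable consumable default urgency
# FORCED means a job must request the resource to run in a queue that offers it.
priority_complex_line() {
    name=$1; shortcut=$2; urgency=$3
    printf '%s %s BOOL == FORCED NO FALSE %s\n' "$name" "$shortcut" "$urgency"
}

priority_complex_line high_priority hp 100
priority_complex_line low_priority lp -100
```

A positive urgency moves jobs requesting the resource toward the front of the pending list, and a negative urgency moves them back.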

Now use qconf to update the "complex_values" attribute of high.q by adding "hp=TRUE", and of low.q by adding "lp=TRUE".
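
One way to make this change without the editor is the -aattr option of qconf, which adds an entry to a list attribute. This is a sketch under that assumption; add_queue_complex and the QCONF override variable are illustrative names, not Grid Engine features.

```shell
# Add one name=value entry to the complex_values list of a queue.
# QCONF is a hypothetical override: run with QCONF=echo to preview the arguments.
add_queue_complex() {
    queue=$1    # queue name, e.g. high.q
    value=$2    # entry to add, e.g. hp=TRUE
    "${QCONF:-qconf}" -aattr queue complex_values "$value" "$queue"
}

# Example calls (uncomment on a host with Grid Engine admin rights):
# add_queue_complex high.q hp=TRUE
# add_queue_complex low.q lp=TRUE
```

The result can be checked with qconf -sq high.q, looking at the complex_values line.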

Now if you run:

$ qsub -N High_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -N Medium_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -N Low_Worker $SGE_ROOT/examples/jobs/worker.sh

You will notice that only one job runs at a time, and it runs in medium.q:

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high.q@cbrunner.univa.com      BIP   0/1       1.92     lx24-x86
----------------------------------------------------------------------------
low.q@cbrunner.univa.com       BIP   0/1       1.92     lx24-x86
----------------------------------------------------------------------------
medium.q@cbrunner.univa.com    BIP   1/1       1.92     lx24-x86
    11 0.55500 High_Worke globus       r     11/19/2007 14:55:01     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    12 0.55500 Medium_Wor globus       qw    11/19/2007 14:54:56     1
    13 0.55500 Low_Worker globus       qw    11/19/2007 14:54:56     1

This is because the resources that were added were marked "FORCED", which means a job must explicitly request the resource in order to run in that queue. If you only use -q to select high.q or low.q, without requesting the resource, the job stays pending, as qstat -j shows for a job submitted with -q high.q:

$ qstat -j 8
==============================================================
job_number:                 8
exec_file:                  job_scripts/8
submission_time:            Mon Nov 19 14:47:10 2007
owner:                      globus
uid:                        501
group:                      globus
gid:                        501
sge_o_home:                 /usr/local/express
sge_o_log_name:             globus
.
.
.
notify:                     FALSE
job_name:                   High_Worker
jobshare:                   0
hard_queue_list:            high.q
shell_list:                 /bin/sh
env_list:
script_file:                /usr/local/express/sge/examples/jobs/worker.sh
scheduling info:            cannot run in queue "medium.q" because it is not contained in its hard   queue list (-q)
                           does not request 'forced' resource "high_priority" of queue instance  high.q@cbrunner.univa.com
                           cannot run in queue "low.q" because it is not contained in its hard queue list (-q)

Notice the scheduling info component.
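
When the job details are long, it can help to print only the scheduler's reasons. The following sketch assumes, as in the listing above, that the scheduling info section runs from its label to the end of the qstat -j output; the function name scheduling_info is made up for this example.

```shell
# Print only the "scheduling info" section of `qstat -j` output read from stdin.
# Everything from the line starting with "scheduling info:" to the end is kept.
scheduling_info() {
    sed -n '/^scheduling info:/,$p'
}

# Example: qstat -j 8 | scheduling_info
```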


At this point, 3 jobs could still run concurrently (one in each queue), as in this example:

$ qsub -l hp=TRUE -N High_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -N Medium_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -l lp=TRUE -N Low_Worker $SGE_ROOT/examples/jobs/worker.sh

And the results of qstat:

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high.q@cbrunner.univa.com      BIP   1/1       2.05     lx24-x86
     14 0.60500 High_Worke globus       r     11/19/2007 15:01:31     1
----------------------------------------------------------------------------
low.q@cbrunner.univa.com       BIP   1/1       2.05     lx24-x86
     16 0.60500 Low_Worker globus       r     11/19/2007 15:01:31     1
----------------------------------------------------------------------------
medium.q@cbrunner.univa.com    BIP   1/1       2.05     lx24-x86
     15 0.50500 Medium_Wor globus       r     11/19/2007 15:01:46     1

To solve the concurrency issue we can use subordinate queues: run qconf -mq high.q and replace the NONE value of subordinate_list with "medium.q=1,low.q=1". Jobs in low.q or medium.q on a host are then suspended as soon as a job occupies a slot in high.q on that host. A low.q job and a medium.q job can still run at the same time, though.
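
The subordinate list can also be set from a script. This is a minimal sketch, assuming the -rattr option of qconf (which replaces the whole value of an attribute) behaves the same in your version; set_subordinates and the QCONF override variable are illustrative names.

```shell
# Replace the subordinate_list of a queue in one command.
# Each entry has the form queue=threshold: the named queue is suspended once
# that many slots of the superordinate queue are in use on the same host.
# QCONF is a hypothetical override: run with QCONF=echo to preview the arguments.
set_subordinates() {
    queue=$1    # superordinate queue, e.g. high.q
    list=$2     # comma-separated entries, e.g. medium.q=1,low.q=1
    "${QCONF:-qconf}" -rattr queue subordinate_list "$list" "$queue"
}

# Example call (uncomment on a host with Grid Engine admin rights):
# set_subordinates high.q medium.q=1,low.q=1
```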

First submit a low and medium priority job:

$ qsub -N Medium_Worker $SGE_ROOT/examples/jobs/worker.sh \
 && qsub -l lp=TRUE -N Low_Worker $SGE_ROOT/examples/jobs/worker.sh

Now make sure the jobs are running...

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high.q@cbrunner.univa.com      BIP   0/1       0.99     lx24-x86
----------------------------------------------------------------------------
low.q@cbrunner.univa.com       BIP   1/1       0.99     lx24-x86
     18 0.60500 Low_Worker globus       r     11/19/2007 15:11:01     1
----------------------------------------------------------------------------
medium.q@cbrunner.univa.com    BIP   1/1       0.99     lx24-x86
     17 0.50500 Medium_Wor globus       r     11/19/2007 15:11:01     1

Now submit the high priority job:

$ qsub -l hp=TRUE -N High_Worker $SGE_ROOT/examples/jobs/worker.sh

And notice how the low and medium jobs were suspended when the high priority job started running:

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high.q@cbrunner.univa.com      BIP   1/1       0.99     lx24-x86
     19 0.60500 High_Worke globus       r     11/19/2007 15:11:31     1
----------------------------------------------------------------------------
low.q@cbrunner.univa.com       BIP   1/1       0.99     lx24-x86      S
     18 0.60500 Low_Worker globus       S     11/19/2007 15:11:01     1
----------------------------------------------------------------------------
medium.q@cbrunner.univa.com    BIP   1/1       0.99     lx24-x86      S
     17 0.50500 Medium_Wor globus       S     11/19/2007 15:11:01     1
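
On a busy cluster the suspended jobs are easier to spot with a filter. The sketch below assumes the qstat -f layout shown above, where job lines start with a numeric job id and carry the job state in the fifth column; the function name suspended_jobs is made up for this example.

```shell
# List job id and job name for suspended jobs, reading `qstat -f` output
# from standard input. Queue lines and separators are skipped because their
# first field is not a number. States s, S and T all indicate suspension.
suspended_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /[sST]/ { print $1, $3 }'
}

# Example: qstat -f | suspended_jobs
```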

Now all you need to do is add some policy so that users do not all submit to the high priority queue all of the time. This can be done easily by configuring Queue Limits.

