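Which real-time priority is the highest priority in Linux

In the Linux real-time process priority range of 1 to 99, it is unclear to me which is the highest priority, 1 or 99.

Section 7.2.2 of "Understanding the Linux Kernel" (O'Reilly) says 1 is the highest priority, which makes sense considering that normal processes have static priorities from 100 to 139, with 100 being the highest priority:

"Every real-time process is associated with a real-time priority, which is a value ranging from 1 (highest priority) to 99 (lowest priority)."

On the other hand, the sched_setscheduler man page (RHEL 6.1) claims that 99 is the highest:

"Processes scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high)."

Which is the highest real-time priority?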


Your assumption that normal processes have static priorities from 100 to 139 is volatile at best and invalid at worst. What I mean is that sched_setscheduler only allows sched_priority to be 0 (which indicates the dynamic priority scheduler) with SCHED_OTHER, SCHED_BATCH and SCHED_IDLE (true as of 2.6.16).

Programmatically, static priorities are 1-99 only for SCHED_RR and SCHED_FIFO.

Now, you may see priorities from 100-139 being used internally by a dynamic scheduler. However, what the kernel does internally to manage dynamic priorities (including flipping the meaning of high vs. low priority to make comparison or sorting easier) should be opaque to user space.

Remember in SCHED_OTHER you are mostly stuffing the processes in the same priority queue.

The idea is to make the kernel easier to debug and avoid goofy out-of-bound mistakes.

So the rationale for switching the meaning could be that a kernel developer doesn't want to use math like 139-idx (just in case idx > 139) ... it is better to do math with idx-100 and reverse the concept of low vs. high, because idx < 100 is well understood.

Also a side effect is that niceness becomes easier to deal with. 100 - 100 <=> nice == 0; 101-100 <=> nice == 1; etc. is easier. It collapses to negative numbers nicely as well (NOTHING to do with static priorities) 99 - 100 <=> nice == -1 ...

To determine the highest realtime priority you can set programmatically, make use of the sched_get_priority_max function.

On Linux 2.6.32 a call to sched_get_priority_max(SCHED_FIFO) returns 99.

See http://linux.die.net/man/2/sched_get_priority_max

This comment in sched.h is pretty definitive:

/*
* Priority of a process goes from 0..MAX_PRIO-1, valid RT
* priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
* tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
* values are inverted: lower p->prio value means higher priority.
*
* The MAX_USER_RT_PRIO value allows the actual maximum
* RT priority to be separate from the value exported to
* user-space.  This allows kernel threads to set their
* priority to a value higher than any user task. Note:
* MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
*/

Note this part:

Priority values are inverted: lower p->prio value means higher priority.

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 second, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is process1 appears to preempt process2, not allowing it to run for a full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.

This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
    BUG_ON(p->se.on_rq);

    p->policy = policy;
    p->rt_priority = prio;
    p->normal_prio = normal_prio(p);
    /* we are holding p->pi_lock already */
    p->prio = rt_mutex_getprio(p);
    if (rt_prio(p->prio))
        p->sched_class = &rt_sched_class;
    else
        p->sched_class = &fair_sched_class;
    set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().

  1. Yes, the real-time priority applies to the RT policies (FIFO and RR) and ranges from 0 to 99.
  2. The non-real-time policies (BATCH, OTHER) have 40 priority levels, shown as 0 to 39 rather than 100 to 139. You can observe this by looking at any non-real-time process in the system: by default it has a PR of 20 and a niceness of 0. If you decrease the niceness of a process (the lower or more negative the number, the less nice and the hungrier the process), say from 0 to -1, you will see the PR drop from 20 to 19. In other words, making a process hungrier by decreasing its nice value also decreases its PR number: the lower the PR number, the higher the priority.

    Example:
    
    
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2079 admin     10 -10  280m  31m 4032 S  9.6  0.0  21183:05 mgmtd
    [admin@abc.com ~]# renice -n -11 2079
    2079: old priority -10, new priority -11
    [admin@abc.com ~]# top -b | grep mgmtd
    2079 admin      9 -11  280m  31m 4032 S  0.0  0.0  21183:05 mgmtd
    ^C
    

I hope this practical example clears up the doubt and helps correct the wording in any source that has it wrong.

Short Answer

99 will be the winner for real time priority.

PR is the priority level (range -100 to 39). The lower the PR, the higher the priority of the process will be.

PR is calculated as follows:

  • for normal processes: PR = 20 + NI (NI is nice and ranges from -20 to 19)
  • for real time processes: PR = - 1 - real_time_priority (real_time_priority ranges from 1 to 99)

Long Answer

There are 2 types of processes, the normal ones and the real-time ones. For the normal ones (and only for those), nice is applied as follows:

Nice

The "niceness" scale goes from -20 to 19, where -20 is the highest priority and 19 the lowest priority. The priority level is calculated as follows:

PR = 20 + NI

Where NI is the nice level and PR is the priority level. So as we can see, the -20 actually maps to 0, while the 19 maps to 39.

By default, a program's nice value is 0, but it is possible for a root user to launch programs with a specified nice value by using the following command:

nice -n <nice_value> ./myProgram

Real Time

We can go even further. The nice priority is actually used for user programs. The overall UNIX/Linux priority has a range of 140 values, and the nice value maps a process onto the last part of that range (from 100 to 139). This leaves the values from 0 to 99 unreachable; they correspond to a negative PR level (from -100 to -1). To reach those values, the process must be declared "real time".

There are 5 scheduling policies in a LINUX environment that can be displayed with the following command:

chrt -m

Which will show the following list:

1. SCHED_OTHER   the standard round-robin time-sharing policy
2. SCHED_BATCH   for "batch" style execution of processes
3. SCHED_IDLE    for running very low priority background jobs.
4. SCHED_FIFO    a first-in, first-out policy
5. SCHED_RR      a round-robin policy

The scheduling policies can be divided into 2 groups: the normal scheduling policies (1 to 3) and the real-time scheduling policies (4 and 5). Real-time processes always have priority over normal processes. A real-time process can be started with the following command (this example uses the SCHED_RR policy):

chrt --rr <priority between 1-99> ./myProgram

To obtain the PR value for a real time process the following equation is applied:

PR = -1 - rt_prior

Where rt_prior is the priority between 1 and 99. For that reason, the process that takes priority over all the others is the one started with the number 99.

It is important to note that for real time processes, the nice value is not used.

To see the current "niceness" and PR value of a process the following command can be executed:

top

Which shows the following output:

(screenshot of top output, not reproduced here)

In the figure the PR and NI values are displayed. It is good to note the process with PR value -51 that corresponds to a real time value. There are also some processes whose PR value is stated as "rt". This value actually corresponds to a PR value of -100.

The Linux kernel implements two separate priority ranges:

  1. Nice value: -20 to +19; larger nice values correspond to lower priority.

  2. Real-time priority: 0 to 99; higher real-time priority values correspond to a greater priority.