Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which
is disabled by default at the moment): it relies on se.wait_start_fair
being 0 while update_stats_wait_end() did not recognize a 0 value,
so instead of 'skipping' the initial interval we gave the new child
a maximum boost of +runtime-limit ...
(No impact on the default kernel, but nice to fix for completeness.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
update the fair-clock before using it for the key value.
[ mingo@elte.hu: small cleanups. ]
Signed-off-by: Ting Yang <tingy@cs.umass.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
de-HZ-ification of the granularity defaults unearthed a pre-existing
property of CFS: while it correctly converges to the granularity goal,
it does not prevent run-time fluctuations in the range of
[-gran ... 0 ... +gran].
With the increase of the granularity due to the removal of HZ
dependencies, this becomes visible in chew-max output (with 5 tasks
running):
out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40
out: 27 . 27. 32 | flu: 0 . 0 | ran: 17 . 13 | per: 44 . 40
out: 27 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 36 . 40
out: 29 . 27. 32 | flu: 2 . 0 | ran: 17 . 13 | per: 46 . 40
out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40
out: 29 . 27. 32 | flu: 0 . 0 | ran: 18 . 13 | per: 47 . 40
out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40
average slice is the ideal 13 msecs and the period is picture-perfect 40
msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
mechanism in CFS to keep that from happening: it's a perfectly valid
solution that CFS finds.
to fix this we add a granularity/preemption rule that knows about
the "target latency", which makes tasks that run longer than the ideal
latency run a bit less. The simplest approach is to simply decrease the
preemption granularity when a task overruns its ideal latency. For this
we have to track how much the task executed since its last preemption.
( this adds a new field to task_struct, but we can eliminate that
overhead in 2.6.24 by putting all the scheduler timestamps into an
anonymous union. )
with this change in place, chew-max output is fluctuation-less all
around:
out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40
out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40
out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40
out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40
out: 28 . 27. 39 | flu: 0 . 1 | ran: 13 . 13 | per: 41 . 40
out: 28 . 27. 39 | flu: 0 . 1 | ran: 13 . 13 | per: 41 . 40
this patch has no impact on any fastpath or on any globally observable
scheduling property. (unless you have sharp enough eyes to see
millisecond-level ruckles in glxgears smoothness :-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
There is an Amarok song switch time increase (regression) under
hefty load.
What is happening is that sleeper_bonus is never consumed, and only
rarely goes below runtime_limit, so for the most part, Amarok isn't
getting any bonus at all. We're keeping sleeper_bonus right at
runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever, ie
we don't consume if we're lower that that, and don't add if we're above
it. One Amarok thread waking (or anybody else) will push us past the
threshold, so the next thread waking gets nada, but will reap pain from
the previous thread waking until we drop back to runtime_limit. It
looks to me like under load, some random task gets a bonus, and
everybody else pays, whether deserving or not.
This diff fixed the regression for me at any load rate.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fix bogus DEBUG_PREEMPT warning on x86_64, when cpu brought online after
bootup: current_is_keventd is right to note its use of smp_processor_id
is preempt-safe, but should use raw_smp_processor_id to avoid the warning.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
runtime limit and wakeup granularity used to be a function of
granularity and that was incorrect changed to sched_latency.
Fix this to make wakeup granularity a function of min-granularity,
and the runtime limit equal to latency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
due to adaptive granularity scheduling the role of sched_granularity
has changed to "minimum granularity", so rename the variable (and the
tunable) accordingly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Instead of specifying the preemption granularity, specify the wanted
latency. By fixing the granlarity to a constany the wakeup latency
it a function of the number of running tasks on the rq.
Invert this relation.
sysctl_sched_granularity becomes a minimum for the dynamic granularity
computed from the new sysctl_sched_latency.
Then use this latency to do more intelligent granularity decisions: if
there are fewer tasks running then we can schedule coarser. This helps
performance while still always keeping the latency target.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the lockdep sysctls not depend on CONFIG_SCHED_DEBUG.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix task startup penalty miscalculation: sysctl_sched_granularity is
unsigned int and wait_runtime is long so we first have to convert it
to long before turning it negative ...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
current code:
delta = calc_delta_mine(delta_exec, curr->load.weight, lw);
delta = min((u64)delta, cfs_rq->sleeper_bonus);
Notice that this calc_delta_mine() line is exactly delta_mine, which
gives:
delta = min((u64)delta_mine, cfs_rq->sleeper_bonus);
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
current code:
delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec);
delta = calc_delta_mine(delta, curr->load.weight, lw);
delta = min((u64)delta, cfs_rq->sleeper_bonus);
drop the first min(), because we clip against sleeper_bonus in the 3rd line
again. That gives:
delta = calc_delta_mine(delta_exec, curr->load.weight, lw);
delta = min((u64)delta, cfs_rq->sleeper_bonus);
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make the bonus balance more consistent: do not hand out a bonus if
there's too much in flight already, and only deduct as much from a
runner as it has the capacity. This makes the bonus engine a zero-sum
game (as intended).
this also simplifies the code:
text data bss dec hex filename
34770 2998 24 37792 93a0 sched.o.before
34749 2998 24 37771 938b sched.o.after
and it also avoids overscheduling in sleep-happy workloads like
hackbench.c.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mitchell Erblich suggested a quality-of-implementation change to
not requeue SCHED_RR tasks if there's only a single task on the
runqueue, by checking for rq->nr_running == 1.
provide a more efficient implementation of that, to check that
particular RT priority-queue only.
[ From: mingo@elte.hu ]
Also first requeue the task then set need_resched - results in slightly
better machine-instruction ordering. Also clean up the code a bit.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove HZ dependency from the granularity default. Use 10 msec for
the base granularity, 1 msec for wakeup granularity and 25 msec for
batch wakeup granularity. (These defaults are close to the values
that the default HZ=250 setting got previously, and thus it's the
most common setting.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when I built with CONFIG_FAIR_GROUP_SCHED=y, I need the following change
to make things right.
[ From: mingo@elte.hu ]
this config option is not upstream-configurable right now but lets fix
this for completeness.
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
sysfs: don't warn on removal of a nonexistent binary file
HOWTO: latest lxr url address changed
HOWTO: korean translation of Documentation/HOWTO
Fix Off-by-one in /sys/module/*/refcnt
sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()
Michael Gerdau reported reniced task CPU usage weirdnesses.
Such symptoms can be caused by limit underruns so double the
sched_runtime_limit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Was playing with sched_smt_power_savings/sched_mc_power_savings and
found out that while the scheduler domains are reconstructed when sysfs
settings change, rebalance_domains() can get triggered with null domain
on other cpus, which is setting next_balance to jiffies + 60*HZ.
Resulting in no idle/busy balancing for 60 seconds.
Fix this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On a four package system with HT - HT load balancing optimizations were
broken. For example, if two tasks end up running on two logical threads
of one of the packages, scheduler is not able to pull one of the tasks
to a completely idle package.
In this scenario, for nice-0 tasks, imbalance calculated by scheduler
will be 512 and find_busiest_queue() will return 0 (as each cpu's load
is 1024 > imbalance and has only one task running).
Similarly MC scheduler optimizations also get fixed with this patch.
[ mingo@elte.hu: restored fair balancing by increasing the fuzz and
adding it back to the power decision, without the /2
factor. ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are two remaining gotchas:
- The directories have impossible permissions (writeable).
- The ctl_name for the kernel directory is inconsistent with
everything else. It should be CTL_KERN.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
construct a more or less wall-clock time out of sched_clock(), by
using ACPI-idle's existing knowledge about how much time we spent
idling. This allows the rq clock to work around TSC-stops-in-C2,
TSC-gets-corrupted-in-C3 type of problems.
( Besides the scheduler's statistics this also benefits blktrace and
printk-timestamps as well. )
Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
callbacks allow the scheduler to get out the most of the period where
the CPU has a reliable TSC. This results in slightly more precise
task statistics.
the ACPI bits were acked by Len.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
dequeue_signal:
if (__SI_TIMER) {
spin_unlock(&tsk->sighand->siglock);
do_schedule_next_timer(info);
spin_lock(&tsk->sighand->siglock);
}
Unless tsk == curent, this is absolutely unsafe: nothing prevents tsk from
exiting. If signalfd was passed to another process, do_schedule_next_timer()
is just wrong.
Add yet another "tsk == current" check into dequeue_signal().
This patch fixes an oopsable bug, but breaks the scheduling of posix timers
if the shared __SI_TIMER signal was fetched via signalfd attached to another
sub-thread. Mostly fixed by the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Roland McGrath <roland@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_timer_create() sets ->it_process and unlocks ->siglock, then checks
tmr->it_sigev_notify to define if get_task_struct() is needed.
We already passed ->it_id to the caller, another thread can delete this timer
and free its memory in between.
As a minimal fix, move this code under ->siglock, sys_timer_delete() takes it
too before calling release_posix_timer(). A proper serialization would be to
take ->it_lock, we add a partly initialized timer on posix_timers_id, not
good.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
timer_delete does:
lock_timer();
timer->it_process = NULL;
unlock_timer();
release_posix_timer();
timer->it_process is checked in lock_timer() to prevent access to a
timer, which is on the way to be deleted, but the check happens after
idr_lock is dropped. This allows release_posix_timer() to delete the
timer before the lock code can check the timer:
CPU 0 CPU 1
lock_timer();
timer->it_process = NULL;
unlock_timer();
lock_timer()
spin_lock(idr_lock);
timer = idr_find();
spin_lock(timer->lock);
spin_unlock(idr_lock);
release_posix_timer();
spin_lock(idr_lock);
idr_remove(timer);
spin_unlock(idr_lock);
free_timer(timer);
if (timer->......)
Change the locking to prevent this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we're going to run the handler from free_irq() then we must do it with
local irq's disabled. Otherwise lockdep complains that the handler is taking
irq-safe spinlocks in a non-irq-safe fashion.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid futex_unlock_pi returning -EFAULT (which results in deadlock), by
clearing uval before jumping to retry_locked.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes an off-by-one in a BUG_ON() spotted by the Coverity
checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Amy Griffis <amy.griffis@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerd Hoffmann pointed out that my patch from yesterday can lead
to a null pointer dereference if the kernel is booted with no
console, and no earlyprintk defined. This fixes that issue.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a followup to the cleanups for earlyprintk patch from Gerd Hoffmann
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69331af79cf29e26d1231152a172a1a10c2df511
This ensures that a bootconsole is unregistered if it is not replaced.
The current implementation spews garbage out the bootconsole in this case,
since the bootconsole structure is normally in the init section, and is
freed, but still used.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the second inclusion of linux/capability.h, which has been
introduced with "[PATCH] move capable() to capability.h" (commit
c59ede7b78)
Signed-off-by: Christian Heim <phreak@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Level type interrupts are resent by the interrupt hardware when they are
still active at irq_enable().
Suppress the resend mechanism for interrupts marked as level.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5a43a066b1: "genirq: Allow fasteoi
handler to retrigger disabled interrupts" was erroneously applied to
handle_level_irq(). This added the irq retrigger / resend functionality
to the level irq handler.
Revert the offending bits.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rebalance_domains(SCHED_IDLE) looks strange (typo), change it to CPU_IDLE.
the effect of this bug was slightly more agressive idle-balancing on
SMP than intended.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Ziljstra noticed that the sleeper bonus deduction code
was not properly rate-limited: a task that scheduled more
frequently would get a disproportionately large deduction.
So limit the deduction to delta_exec.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the following needlessly global code static:
- arch_reinit_sched_domains()
- struct attr_sched_mc_power_savings
- struct attr_sched_smt_power_savings
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
gcc-4.2 is a lot more picky about its symbol handling. EXPORT_SYMBOL no
longer works on symbols that are undefined or defined with static scope.
For example, with CONFIG_PROFILE off, I see:
kernel/profile.c:206: error: __ksymtab_profile_event_unregister causes a section type conflict
kernel/profile.c:205: error: __ksymtab_profile_event_register causes a section type conflict
This patch moves the EXPORTs inside the #ifdef CONFIG_PROFILE, so we
only try to export symbols that are defined.
Also, in kernel/kprobes.c there's an EXPORT_SYMBOL_GPL() for
jprobes_return, which if CONFIG_JPROBES is undefined is a static
inline and gives the same error.
And in drivers/acpi/resources/rsxface.c, there's an
ACPI_EXPORT_SYMBOPL() for a static symbol. If it's static, it's not
accessible from outside the compilation unit, so should bot be exported.
These three changes allow building a zx1_defconfig kernel with gcc 4.2
on IA64.
[akpm@linux-foundation.org: export jpobe_return properly]
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I find a function(clockevents_unregister_notifier) which is not called by
anything in tree.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some systems some PFNs reported by the early initialization code as
'nosave' may be invalid. If we try to set the corresponding bits in the
hibernation bitmap, BUG_ON() in memory_bm_find_bit() will be triggered and
the system won't be able to boot (cf.
https://bugzilla.novell.com/show_bug.cgi?id=296242).
Prevent this from happening by verifying if the 'nosave' PFNs are valid in
mark_nosave_pages().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Misplaced #endif is hiding the numa_zonelist_order sysctl when !SECURITY.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arjan van de Ven pointed out that we should not print kernel addresses
in world-readable /proc files - fix that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
improve the rq-clock overflow logic: limit the absolute rq->clock
delta since the last scheduler tick, instead of limiting the delta
itself.
tested by Arjan van de Ven - whole laptop was misbehaving due to
an incorrectly calibrated cpu_khz confusing sched_clock().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits)
sched: refine negative nice level granularity
sched: fix update_stats_enqueue() reniced codepath
sched: round a bit better
sched: make the multiplication table more accurate
sched: optimize update_rq_clock() calls in the load-balancer
sched: optimize activate_task()
sched: clean up set_curr_task_fair()
sched: remove __update_rq_clock() call from entity_tick()
sched: move the __update_rq_clock() call to scheduler_tick()
sched debug: remove the 'u64 now' parameter from print_task()/_rq()
sched: remove the 'u64 now' local variables
sched: remove the 'u64 now' parameter from deactivate_task()
sched: remove the 'u64 now' parameter from dequeue_task()
sched: remove the 'u64 now' parameter from enqueue_task()
sched: remove the 'u64 now' parameter from dec_nr_running()
sched: remove the 'u64 now' parameter from inc_nr_running()
sched: remove the 'u64 now' parameter from dec_load()
sched: remove the 'u64 now' parameter from inc_load()
sched: remove the 'u64 now' parameter from update_curr_load()
sched: remove the 'u64 now' parameter from ->task_new()
...
This reverts commit 0fc4969b86. It was
always meant to be temporary, but it's generating more useless noise
than anything else, and we probably should never have done it in the
generic kernel (only had the people involved test it on their own).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
refine the granularity of negative nice level tasks: let them
reschedule more often to offset the effect of them consuming
their wait_runtime proportionately slower. (This makes nice-0
task scheduling smoother in the presence of negatively
reniced tasks.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the key has to be rescaled to /weight even if it has a positive value.
(this change only affects the scheduling of reniced tasks)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
round a tiny bit better in high-frequency rescheduling scenarios,
by rounding around zero instead of rounding down.
(this is pretty theoretical though)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
optimize update_rq_clock() calls in the load-balancer: update them
right after locking the runqueue(s) so that the pull functions do
not have to call it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
optimize activate_task() by removing update_rq_clock() from it.
(and add update_rq_clock() to all callsites of activate_task() that
did not have it before.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clean up set_curr_task_fair().
( identity transformation that causes no change in functionality. )
text data bss dec hex filename
39170 3750 36 42956 a7cc sched.o.before
39170 3750 36 42956 a7cc sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove __update_rq_clock() call from entity_tick().
no change in functionality because scheduler_tick() already calls
__update_rq_clock().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the __update_rq_clock() call from update_cpu_load() to
scheduler_tick().
( identity transformation that causes no change in functionality. )
this allows the direct use of rq->clock in ->task_tick() functions.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from sched_debug.c:print_task()/_rq().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
final step: remove all (now superfluous) 'u64 now' variables.
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from deactivate_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dequeue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from enqueue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dec_nr_running().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from inc_nr_running().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dec_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from inc_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_curr_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->task_new().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->put_prev_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from pick_next_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->pick_next_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->dequeue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->enqueue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_curr_rt().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from put_prev_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from pick_next_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from set_next_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dequeue_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from enqueue_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from enqueue_sleeper().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from __enqueue_sleeper().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_curr_end().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_dequeue().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_curr_start().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_wait_end().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from __update_stats_wait_end().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_enqueue().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_stats_wait_start().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_curr().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from print_cfs_rq().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change all 'now' timestamp uses in assignments to rq->clock.
( this is an identity transformation that causes no functionality change:
all such new rq->clock is necessarily preceded by an update_rq_clock()
call. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
eliminate __rq_clock() use by changing it to:
__update_rq_clock(rq)
now = rq->clock;
identity transformation - no change in behavior.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
eliminate rq_clock() use by changing it to:
update_rq_clock(rq)
now = rq->clock;
identity transformation - no change in behavior.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add the [__]update_rq_clock(rq) functions. (No change in functionality,
just reorganization to prepare for elimination of the heavy 64-bit
timestamp-passing in the scheduler.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are two problems with balance_tasks() and how it used:
1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).
2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.
The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.
load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel.sched_domain hierarchy is under CTL_UNNUMBERED and thus
unreachable to sysctl(2). Generating .ctl_number's in such situation is
not useful.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>