* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In my device I get many interrupts from a high speed USB device in a very
short period of time. The system spends a lot of time reprogramming the
hardware timer which is in a slower timing domain as compared to the CPU.
This results in the CPU spending a huge amount of time waiting for the
timer posting to be done. All of this reprogramming is useless as the
wake up time has not changed.
As measured using ETM trace this drops my reprogramming penalty from
almost 60% CPU load down to 15% during high interrupt rate. I can send
traces to show this.
Suppress setting of duplicate timer event when timer already stopped.
Timer programming can be very costly and can result in long cpu stall/wait
times.
[akpm@linux-foundation.org: coding-style fixes]
[tglx@linutronix.de: move the check to the right place and avoid raising
the softirq for nothing]
Signed-off-by: Richard Woodruff <r-woodruff2@ti.com>
Cc: johnstul@us.ibm.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: remove false positive warning
After a cpu was taken down during cpu hotplug (read: disabled for interrupts)
it still might have pending softirqs. However take_cpu_down makes sure
that the idle task will run next instead of ksoftirqd on the taken down cpu.
The idle task will call tick_nohz_stop_sched_tick which might warn about
pending softirqs just before the cpu kills itself completely.
However the pending softirqs on the dead cpu aren't a problem because they
will be moved to an online cpu during CPU_DEAD handling.
So make sure we warn only for online cpus.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, move all hrtimer processing into hardirq context
This is an attempt at removing some of the hrtimer complexity by
reducing the number of callback modes to 1.
This means that all hrtimer callback functions will be ran from HARD-irq
context.
I went through all the 30 odd hrtimer callback functions in the kernel
and saw only one that I'm not quite sure of, which is the one in
net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
Furthermore, the hrtimer core now calls callbacks directly with IRQs
disabled in case you try to enqueue an expired timer. If this timer is a
periodic timer (which should use hrtimer_forward() to advance its time)
then it might be possible to end up in an inf. recursive loop due to the
fact that hrtimer_forward() doesn't round up to the next timer
granularity, and therefore keeps on calling the callback - obviously
this needs a fix.
Aside from that, this seems to compile and actually boot on my dual core
test box - although I'm sure there are some bugs in, me not hitting any
makes me certain :-)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: (future) size reduction for large NR_CPUS.
Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t
is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: nohz powersavings and wakeup regression
commit fb02fbc14d (NOHZ: restart tick
device from irq_enter()) causes a serious wakeup regression.
While the patch is correct it does not take into account that spurious
wakeups happen on x86. A fix for this issue is available, but we just
revert to the .27 behaviour and let long running softirqs screw
themself.
Disable it for now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit fb02fbc14d (NOHZ: restart tick
device from irq_enter())
solves the problem of stale jiffies when long running softirqs happen
in a long idle sleep period, but it has a major thinko in it:
When the interrupt which came in _is_ the timer interrupt which should
expire ts->sched_timer then we cancel and rearm the timer _before_ it
gets expired in hrtimer_interrupt() to the next period. That means the
call back function is not called. This game can go on for ever :(
Prevent this by making sure to only rearm the timer when the expiry
time is more than one tick_period away. Otherwise keep it running as
it is either already expired or will expiry at the right point to
update jiffies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Venkatesch Pallipadi <venkatesh.pallipadi@intel.com>
We did not restart the tick device from irq_enter() to avoid double
reprogramming and extra events in the return immediate to idle case.
But long lasting softirqs can lead to a situation where jiffies become
stale:
idle()
tick stopped (reprogrammed to next pending timer)
halt()
interrupt
jiffies updated from irq_enter()
interrupt handler
softirq function 1 runs 20ms
softirq function 2 arms a 10ms timer with a stale jiffies value
jiffies updated from irq_exit()
timer wheel has now an already expired timer
(the one added in function 2)
timer fires and timer softirq runs
This was discovered when debugging a timer problem which happend only
when the ath5k driver is active. The debugging proved that there is a
softirq function running for more than 20ms, which is a bug by itself.
To solve this we restart the tick timer right from irq_enter(), but do
not go through the other functions which are necessary to return from
idle when need_resched() is set.
Reported-by: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Elias Oltmanns <eo@nebensachen.de>
We have two separate nohz function calls in irq_enter() for no good
reason. Just call a single NOHZ function from irq_enter() and call
the bits in the tick code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
export get_cpu_idle_time_us() for it to be used in ondemand governor.
Last update time can be current time when the CPU is currently non-idle,
accounting for the busy time since last idle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: per CPU hrtimers can be migrated from a dead CPU
The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.
Explicitely mark the timers as per CPU and use a more understandable
mode descriptor for the interrupts safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: possible hang on CPU onlining in timer one shot mode.
The tick_next_period variable is only used during boot on nohz/highres
enabled systems, but for CPU onlining it needs to be maintained when
the per cpu clock events device operates in one shot mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: rare hang which can be triggered on CPU online.
tick_do_timer_cpu keeps track of the CPU which updates jiffies
via do_timer. The value -1 is used to signal, that currently no
CPU is doing this. There are two cases, where the variable can
have this state:
boot:
necessary for systems where the boot cpu id can be != 0
nohz long idle sleep:
When the CPU which did the jiffies update last goes into
a long idle sleep it drops the update jiffies duty so
another CPU which is not idle can pick it up and keep
jiffies going.
Using the same value for both situations is wrong, as the CPU online
code can see the -1 state when the timer of the newly onlined CPU is
setup. The setup for a newly onlined CPU goes through periodic mode
and can pick up the do_timer duty without being aware of the nohz /
highres mode of the already running system.
Use two separate states and make them constants to avoid magic
numbers confusion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the tickless system(CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after
I made an offlined cpu online, I found this cpu's event handler was
tick_handle_periodic, not tick_nohz_handler.
After debuging, I found this bug was caused by the wrong tick mode. the
tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu is offline.
This patch fixes this bug.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Found an interactivity problem on a quad core test-system - simple
CPU loops would occasionally delay the system un an unacceptable way.
After much debugging with Peter Zijlstra it turned out that the problem
is caused by the string of sched_clock() changes - they caused the CPU
clock to jump backwards a bit - which confuses the scheduler arithmetics.
(which is unsigned for performance reasons)
So revert:
# c300ba2: sched_clock: and multiplier for TSC to gtod drift
# c0c8773: sched_clock: only update deltas with local reads.
# af52a90: sched_clock: stop maximum check on NO HZ
# f7cce27: sched_clock: widen the max and min time
This solves the interactivity problems.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: add PCI ID for 6300ESB force hpet
x86: add another PCI ID for ICH6 force-hpet
kernel-paramaters: document pmtmr= command line option
acpi_pm clccksource: fix printk format warning
nohz: don't stop idle tick if softirqs are pending.
pmtmr: allow command line override of ioport
nohz: reduce jiffies polling overhead
hrtimer: Remove unused variables in ktime_divns()
hrtimer: remove warning in hres_timers_resume
posix-timers: print RT watchdog message
Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.
What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.
The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.
The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.
This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.
This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In case a cpu goes idle but softirqs are pending only an error message is
printed to the console. It may take a very long time until the pending
softirqs will finally be executed. Worst case would be a hanging system.
With this patch the timer tick just continues and the softirqs will be
executed after the next interrupt. Still a delay but better than a
hanging system.
Currently we have at least two device drivers on s390 which under certain
circumstances schedule a tasklet from process context. This is a reason
why we can end up with pending softirqs when going idle. Fixing these
drivers seems to be non-trivial.
However there is no question that the drivers should be fixed.
This patch shouldn't be considered as a bug fix. It just is intended to
keep a system running even if device drivers are buggy.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Glauber <jan.glauber@de.ibm.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix (probably theoretical only) rq->clock update bug:
in tick_nohz_update_jiffies() [which is called on all irq
entry on all cpus where the irq entry hits an idle cpu] we
call touch_softlockup_watchdog() before we update jiffies.
That works fine most of the time when idle timeouts are within
60 seconds. But when an idle timeout is beyond 60 seconds,
jiffies is updated with a jump of more than 60 seconds,
which causes a jump in cpu-clock of more than 60 seconds,
triggering a false positive.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Miller reported:
|--------------->
the following commit:
| commit 27ec440779
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Thu Feb 28 21:00:21 2008 +0100
|
| sched: make cpu_clock() globally synchronous
|
| Alexey Zaytsev reported (and bisected) that the introduction of
| cpu_clock() in printk made the timestamps jump back and forth.
|
| Make cpu_clock() more reliable while still keeping it fast when it's
| called frequently.
|
| Signed-off-by: Ingo Molnar <mingo@elte.hu>
causes watchdog triggers when a cpu exits NOHZ state when it has been
there for >= the soft lockup threshold, for example here are some
messages from a 128 cpu Niagara2 box:
[ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
[ 168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
[ 168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
[ 168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
[ 169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
[ 169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
[ 169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
[ 169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
[ 169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
[ 169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
[ 169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
[ 169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
[ 169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
[ 169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
[ 169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
[ 169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
[ 169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
[ 169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
[ 169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
[ 169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
[ 169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]
It's simple enough to trigger this by doing a 10 minute sleep after a
fresh bootup then starting a parallel kernel build.
I suspect this might be reintroducing a problem we've had and fixed
before, see the thread:
http://marc.info/?l=linux-kernel&m=119546414004065&w=2
<---------------|
touch the softlockup watchdog when exiting NOHZ state - we are
obviously not locked up.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Various SMP balancing algorithms require that the bandwidth period
run in sync.
Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call
ts = &per_cpu(tick_cpu_sched, cpu);
and
cpu = smp_processor_id();
once instead of twice.
No functional change done, as changed code runs with local irq off.
Reduces source lines and text size (20bytes on x86_64).
[ akpm@linux-foundation.org: Build fix ]
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz(), which appeared
before caused by (repeated) calls to:
$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 1 > /sys/devices/system/cpu/cpu1/online
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Cc: johnstul@us.ibm.com
Cc: Rafael Wysocki <rjw@sisk.pl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The
idle CPU will not progress the RCU through its grace period and a
synchronize_rcu my get stuck. Without this patch I have a box that will
not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine
with this patch.
This patch comes from the -rt kernel where it has been tested for
several months.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Function timekeeping_is_continuous() no longer checks flag
CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename
the function accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Current idle time in kstat is based on jiffies and is coarse grained.
tick_sched.idle_sleeptime is making some attempt to keep track of idle time
in a fine grained manner. But, it is not handling the time spent in
interrupts fully.
Make tick_sched.idle_sleeptime accurate with respect to time spent on
handling interrupts and also add tick_sched.idle_lastupdate, which keeps
track of last time when idle_sleeptime was updated.
This statistics will be crucial for cpufreq-ondemand governor, which can
shed some conservative gaurd band that is uses today while setting the
frequency. The ondemand changes that uses the exact idle time is coming
soon.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I was confused by FSEC = 10^15 NSEC statement, plus small whitespace
fixes. When there's copyright, there should be GPL.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Small cleanups to tick-related code. Wrong preempt count is followed
by BUG(), so it is hardly KERN_WARNING.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In order to more easily allow for the scheduler to use timers, clean up
the locking a bit.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We need to teach no_hz about the rt throttling because its tick driven.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Miller reported soft lockup false-positives that trigger
on NOHZ due to CPUs idling for more than 10 seconds.
The solution is touch the softlockup watchdog when we return from
idle. (by definition we are not 'locked up' when we were idle)
http://bugzilla.kernel.org/show_bug.cgi?id=9409
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the unused
EXPORT_SYMBOL_GPL(tick_nohz_get_sleep_length).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>