This was writeable in 2.6.23 but the cpuidle merge made it read-only. But
some people's scripts (ie: Mark's) were writing to it.
As an unhappy compromise, make max_cstate writeable again if the kernel was
configured without CONFIG_CPU_IDLE.
http://bugzilla.kernel.org/show_bug.cgi?id=9683
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
The timer broadcast code might access HPET, which should not be
accessed after the busmaster disable.
In acpi_idle_enter_simple() this change also prevents, that we modify
the busmaster state without going actually idle. This might leave the
ACPI bm state in a stale state, when we leave the function early in
the need_resched() check.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: fix APIC related bootup crash on Athlon XP CPUs
time: add ADJ_OFFSET_SS_READ
x86: export the symbol empty_zero_page on the 32-bit x86 architecture
x86: fix kprobes_64.c inlining borkage
pci: use pci=bfsort for HP DL385 G2, DL585 G2
x86: correctly set UTS_MACHINE for "make ARCH=x86"
lockdep: annotate do_debug() trap handler
x86: turn off iommu merge by default
x86: fix ACPI compile for LOCAL_APIC=n
x86: printk kernel version in WARN_ON and other dump_stack users
ACPI: Set max_cstate to 1 for early Opterons.
x86: fix NMI watchdog & 'stopped time' problem
AMD Opteron processors before CG revision don't like C-states > 1.
This solves the long standing bugzilla #5303 and probably some more
on affected machines:
http://bugzilla.kernel.org/show_bug.cgi?id=5303
[ tglx@linutronix.de: reworked the patch so it does not wreck ia64 ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix for http://bugzilla.kernel.org/show_bug.cgi?id=9355
cpuidle always used to fallback to C2 if there is some bm activity while
entering C3. But, presence of C2 is not always guaranteed. Change cpuidle
algorithm to detect a safe_state to fallback in case of bm_activity and
use that state instead of C2.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Port 2aa44d0567
(sched: sched_clock_idle_[sleep|wakeup]_event()) to cpuidle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Port 18eab85503
(Enable C3 even when PM2_control is zero) to cpuidle.
Without this patch, some systems will notice a regression
when enabling CPU_IDLE -- C3 would no longer be available.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (41 commits)
ACPICA: hw: Don't carry spinlock over suspend
ACPICA: hw: remove use_lock flag from acpi_hw_register_{read, write}
ACPI: cpuidle: port idle timer suspend/resume workaround to cpuidle
ACPI: clean up acpi_enter_sleep_state_prep
Hibernation: Make sure that ACPI is enabled in acpi_hibernation_finish
ACPI: suppress uninitialized var warning
cpuidle: consolidate 2.6.22 cpuidle branch into one patch
ACPI: thinkpad-acpi: skip blanks before the data when parsing sysfs
ACPI: AC: Add sysfs interface
ACPI: SBS: Add sysfs alarm
ACPI: SBS: Add ACPI_PROCFS around procfs handling code.
ACPI: SBS: Add support for power_supply class (and sysfs)
ACPI: SBS: Make SBS reads table-driven.
ACPI: SBS: Simplify data structures in SBS
ACPI: SBS: Split host controller (ACPI0001) from SBS driver (ACPI0002)
ACPI: EC: Add new query handler to list head.
ACPI: Add acpi_bus_generate_event4() function
ACPI: Battery: add sysfs alarm
ACPI: Battery: Add sysfs support
ACPI: Battery: Misc clean-ups, no functional changes
...
Fix up conflicts in drivers/misc/thinkpad_acpi.[ch] manually
The conversion of x86-64 to clock events makes the
#ifdef CONFIG_GENERIC_CLOCKEVENTS
n the timer broadcast functions useless. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Some timers stop during C2 and C3, and so there are various
generations of timer broadcast workarounds to deal with that.
But that (already complex) code gets confused during suspend.
As it is unlikely that deep C-states would save much power
during the actual suspend/resume process anyway, deep C-states
were disabled via the addition of .suspend/.resume hooks
in to the ACPI processor driver.
Here that workaround is ported to the cpuidle version of
the ACPI idle loop. Technically, ACPI could un-register
itself from cpuidle on .suspend, but that code path
is currently quite cumbersome. So instead,
we simply invoke C1 from the C2 and C3 handlers
for the duration of .suspend/.resume.
Signed-off-by: Len Brown <len.brown@intel.com>
commit e5a16b1f9eec0af7cfa0830304b41c1c0833cf9f
Author: Len Brown <len.brown@intel.com>
Date: Tue Oct 2 23:44:44 2007 -0400
cpuidle: shrink diff
processor_idle.c | 440 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 429 insertions(+), 11 deletions(-)
Signed-off-by: Len Brown <len.brown@intel.com>
commit dfbb9d5aedfb18848a3e0d6f6e3e4969febb209c
Author: Len Brown <len.brown@intel.com>
Date: Wed Sep 26 02:17:55 2007 -0400
cpuidle: reduce diff size
Reduces the cpuidle processor_idle.c diff vs 2.6.22 from this
processor_idle.c | 2006 ++++++++++++++++++++++++++-----------------
1 file changed, 1219 insertions(+), 787 deletions(-)
to this:
processor_idle.c | 502 +++++++++++++++++++++++++++++++++++++++----
1 file changed, 458 insertions(+), 44 deletions(-)
...for the purpose of making the cpuilde patch less invasive
and easier to review.
no functional changes. build tested only.
Signed-off-by: Len Brown <len.brown@intel.com>
commit 889172fc915f5a7fe20f35b133cbd205ce69bf6c
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Sep 13 13:40:05 2007 -0700
cpuidle: Retain old ACPI policy for !CONFIG_CPU_IDLE
Retain the old policy in processor_idle, so that when CPU_IDLE is not
configured, old C-state policy will still be used. This provides a
clean gradual migration path from old ACPI policy to new cpuidle
based policy.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 9544a8181edc7ecc33b3bfd69271571f98ed08bc
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Sep 13 13:39:17 2007 -0700
cpuidle: Configure governors by default
Quoting Len "Do not give an option to users to shoot themselves in the foot".
Remove the configurability of ladder and menu governors as they are
needed for default policy of cpuidle. That way users will not be able to
have cpuidle without any policy loosing all C-state power savings.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 8975059a2c1e56cfe83d1bcf031bcf4cb39be743
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:27:07 2007 -0400
CPUIDLE: load ACPI properly when CPUIDLE is disabled
Change the registration return codes for when CPUIDLE
support is not compiled into the kernel. As a result, the ACPI
processor driver will load properly even if CPUIDLE is unavailable.
However, it may be possible to cleanup the ACPI processor driver further
and eliminate some dead code paths.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit e0322e2b58dd1b12ec669bf84693efe0dc2414a8
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:26:06 2007 -0400
CPUIDLE: remove cpuidle_get_bm_activity()
Remove cpuidle_get_bm_activity() and updates governors
accordingly.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 18a6e770d5c82ba26653e53d240caa617e09e9ab
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:25:58 2007 -0400
CPUIDLE: max_cstate fix
Currently max_cstate is limited to 0, resulting in no idle processor
power management on ACPI platforms. This patch restores the value to
the array size.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 1fdc0887286179b40ce24bcdbde663172e205ef0
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:25:40 2007 -0400
CPUIDLE: handle BM detection inside the ACPI Processor driver
Update the ACPI processor driver to detect BM activity and
limit state entry depth internally, rather than exposing such
requirements to CPUIDLE. As a result, CPUIDLE can drop this
ACPI-specific interface and become more platform independent. BM
activity is now handled much more aggressively than it was in the
original implementation, so some testing coverage may be needed to
verify that this doesn't introduce any DMA buffer under-run issues.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 0ef38840db666f48e3cdd2b769da676c57228dd9
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:25:14 2007 -0400
CPUIDLE: menu governor updates
Tweak the menu governor to more effectively handle non-timer
break events. Non-timer break events are detected by comparing the
actual sleep time to the expected sleep time. In future revisions, it
may be more reliable to use the timer data structures directly.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit bb4d74fca63fa96cf3ace644b15ae0f12b7df5a1
Author: Adam Belay <abelay@novell.com>
Date: Tue Aug 21 18:24:40 2007 -0400
CPUIDLE: fix 'current_governor' sysfs entry
Allow the "current_governor" sysfs entry to properly handle
input terminated with '\n'.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit df3c71559bb69b125f1a48971bf0d17f78bbdf47
Author: Len Brown <len.brown@intel.com>
Date: Sun Aug 12 02:00:45 2007 -0400
cpuidle: fix IA64 build (again)
Signed-off-by: Len Brown <len.brown@intel.com>
commit a02064579e3f9530fd31baae16b1fc46b5a7bca8
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Sun Aug 12 01:39:27 2007 -0400
cpuidle: Remove support for runtime changing of max_cstate
Remove support for runtime changeability of max_cstate. Drivers can use
use latency APIs.
max_cstate can still be used as a boot time option and dmi override.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 0912a44b13adf22f5e3f607d263aed23b4910d7e
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Sun Aug 12 01:39:16 2007 -0400
cpuidle: Remove ACPI cstate_limit calls from ipw2100
ipw2100 already has code to use accetable_latency interfaces to limit the
C-state. Remove the calls to acpi_set_cstate_limit and acpi_get_cstate_limit
as they are redundant.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit c649a76e76be6bff1fd770d0a775798813a3f6e0
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Sun Aug 12 01:35:39 2007 -0400
cpuidle: compile fix for pause and resume functions
Fix the compilation failure when cpuidle is not compiled in.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Adam Belay <adam.belay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 2305a5920fb8ee6ccec1c62ade05aa8351091d71
Author: Adam Belay <abelay@novell.com>
Date: Thu Jul 19 00:49:00 2007 -0400
cpuidle: re-write
Some portions have been rewritten to make the code cleaner and lighter
weight. The following is a list of changes:
1.) the state name is now included in the sysfs interface
2.) detection, hotplug, and available state modifications are handled by
CPUIDLE drivers directly
3.) the CPUIDLE idle handler is only ever installed when at least one
cpuidle_device is enabled and ready
4.) the menu governor BM code no longer overflows
5.) the sysfs attributes are now printed as unsigned integers, avoiding
negative values
6.) a variety of other small cleanups
Also, Idle drivers are no longer swappable during runtime through the
CPUIDLE sysfs inteface. On i386 and x86_64 most idle handlers (e.g.
poll, mwait, halt, etc.) don't benefit from an infrastructure that
supports multiple states, so I think using a more general case idle
handler selection mechanism would be cleaner.
Signed-off-by: Adam Belay <abelay@novell.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit df25b6b56955714e6e24b574d88d1fd11f0c3ee5
Author: Len Brown <len.brown@intel.com>
Date: Tue Jul 24 17:08:21 2007 -0400
cpuidle: fix IA64 buid
Signed-off-by: Len Brown <len.brown@intel.com>
commit fd6ada4c14488755ff7068860078c437431fbccd
Author: Adrian Bunk <bunk@stusta.de>
Date: Mon Jul 9 11:33:13 2007 -0700
cpuidle: static
make cpuidle_replace_governor() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit c1d4a2cebcadf2429c0c72e1d29aa2a9684c32e0
Author: Adrian Bunk <bunk@stusta.de>
Date: Tue Jul 3 00:54:40 2007 -0400
cpuidle: static
This patch makes the needlessly global struct menu_governor static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit dbf8780c6e8d572c2c273da97ed1cca7608fd999
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Tue Jul 3 00:49:14 2007 -0400
export symbol tick_nohz_get_sleep_length
ERROR: "tick_nohz_get_sleep_length" [drivers/cpuidle/governors/menu.ko] undefined!
ERROR: "tick_nohz_get_idle_jiffies" [drivers/cpuidle/governors/menu.ko] undefined!
And please be sure to get your changes to core kernel suitably reviewed.
Cc: Adam Belay <abelay@novell.com>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 29f0e248e7017be15f99febf9143a2cef00b2961
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Tue Jul 3 00:43:04 2007 -0400
tick.h needs hrtimer.h
It uses hrtimers.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit e40cede7d63a029e92712a3fe02faee60cc38fb4
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:40:34 2007 -0400
cpuidle: first round of documentation updates
Documentation changes based on Pavel's feedback.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 83b42be2efece386976507555c29e7773a0dfcd1
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:39:25 2007 -0400
cpuidle: add rating to the governors and pick the one with highest rating by default
Introduce a governor rating scheme to pick the right governor by default.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit d2a74b8c5e8f22def4709330d4bfc4a29209b71c
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:38:08 2007 -0400
cpuidle: make cpuidle sysfs driver governor switch off by default
Make default cpuidle sysfs to show current_governor and current_driver in
read-only mode. More elaborate available_governors and available_drivers with
writeable current_governor and current_driver interface only appear with
"cpuidle_sysfs_switch" boot parameter.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 1f60a0e80bf83cf6b55c8845bbe5596ed8f6307b
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:37:00 2007 -0400
cpuidle: menu governor: change the early break condition
Change the C-state early break out algorithm in menu governor.
We only look at early breakouts that result in wakeups shorter than idle
state's target_residency. If such a breakout is frequent enough, eliminate
the particular idle state upto a timeout period.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 45a42095cf64b003b4a69be3ce7f434f97d7af51
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:35:38 2007 -0400
cpuidle: fix uninitialized variable in sysfs routine
Fix the uninitialized usage of ret.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 80dca7cdba3e6ee13eae277660873ab9584eb3be
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:34:16 2007 -0400
cpuidle: reenable /proc/acpi//power interface for the time being
Keep /proc/acpi/processor/CPU*/power around for a while as powertop depends
on it. It will be marked deprecated and removed in future. powertop can use
cpuidle interfaces instead.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 589c37c2646c5e3813a51255a5ee1159cb4c33fc
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jul 3 00:32:37 2007 -0400
cpuidle: menu governor and hrtimer compile fix
Compile fix for menu governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 0ba80bd9ab3ed304cb4f19b722e4cc6740588b5e
Author: Len Brown <len.brown@intel.com>
Date: Thu May 31 22:51:43 2007 -0400
cpuidle: build fix - cpuidle vs ipw2100 module
ERROR: "acpi_set_cstate_limit" [drivers/net/wireless/ipw2100.ko] undefined!
Signed-off-by: Len Brown <len.brown@intel.com>
commit d7d8fa7f96a7f7682be7c6cc0cc53fa7a18c3b58
Author: Adam Belay <abelay@novell.com>
Date: Sat Mar 24 03:47:07 2007 -0400
cpuidle: add the 'menu' governor
Here is my first take at implementing an idle PM governor that takes
full advantage of NO_HZ. I call it the 'menu' governor because it
considers the full list of idle states before each entry.
I've kept the implementation fairly simple. It attempts to guess the
next residency time and then chooses a state that would meet at least
the break-even point between power savings and entry cost. To this end,
it selects the deepest idle state that satisfies the following
constraints:
1. If the idle time elapsed since bus master activity was detected
is below a threshold (currently 20 ms), then limit the selection
to C2-type or above.
2. Do not choose a state with a break-even residency that exceeds
the expected time remaining until the next timer interrupt.
3. Do not choose a state with a break-even residency that exceeds
the elapsed time between the last pair of break events,
excluding timer interrupts.
This governor has an advantage over "ladder" governor because it
proactively checks how much time remains until the next timer interrupt
using the tick infrastructure. Also, it handles device interrupt
activity more intelligently by not including timer interrupts in break
event calculations. Finally, it doesn't make policy decisions using the
number of state entries, which can have variable residency times (NO_HZ
makes these potentially very large), and instead only considers sleep
time deltas.
The menu governor can be selected during runtime using the cpuidle sysfs
interface like so:
"echo "menu" > /sys/devices/system/cpu/cpuidle/current_governor"
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit a4bec7e65aa3b7488b879d971651cc99a6c410fe
Author: Adam Belay <abelay@novell.com>
Date: Sat Mar 24 03:47:03 2007 -0400
cpuidle: export time until next timer interrupt using NO_HZ
Expose information about the time remaining until the next
timer interrupt expires by utilizing the dynticks infrastructure.
Also modify the main idle loop to allow dynticks to handle
non-interrupt break events (e.g. DMA). Finally, expose sleep ticks
information to external code. Thomas Gleixner is responsible for much
of the code in this patch. However, I've made some additional changes,
so I'm probably responsible if there are any bugs or oversights :)
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 2929d8996fbc77f41a5ff86bb67cdde3ca7d2d72
Author: Adam Belay <abelay@novell.com>
Date: Sat Mar 24 03:46:58 2007 -0400
cpuidle: governor API changes
This patch prepares cpuidle for the menu governor. It adds an optional
stage after idle state entry to give the governor an opportunity to
check why the state was exited. Also it makes sure the idle loop
returns after each state entry, allowing the appropriate dynticks code
to run.
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 3a7fd42f9825c3b03e364ca59baa751bb350775f
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Apr 26 00:03:59 2007 -0700
cpuidle: hang fix
Prevent hang on x86-64, when ACPI processor driver is added as a module on
a system that does not support C-states.
x86-64 expects all idle handlers to enable interrupts before returning from
idle handler. This is due to enter_idle(), exit_idle() races. Make
cpuidle_idle_call() confirm to this when there is no pm_idle_old.
Also, cpuidle look at the return values of attch_driver() and set
current_driver to NULL if attach fails on all CPUs.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 4893339a142afbd5b7c01ffadfd53d14746e858e
Author: Shaohua Li <shaohua.li@intel.com>
Date: Thu Apr 26 10:40:09 2007 +0800
cpuidle: add support for max_cstate limit
With CPUIDLE framework, the max_cstate (to limit max cpu c-state)
parameter is ingored. Some systems require it to ignore C2/C3
and some drivers like ipw require it too.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 43bbbbe1cb998cbd2df656f55bb3bfe30f30e7d1
Author: Shaohua Li <shaohua.li@intel.com>
Date: Thu Apr 26 10:40:13 2007 +0800
cpuidle: add cpuidle_fore_redetect_devices API
add cpuidle_force_redetect_devices API,
which forces all CPU redetect idle states.
Next patch will use it.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit d1edadd608f24836def5ec483d2edccfb37b1d19
Author: Shaohua Li <shaohua.li@intel.com>
Date: Thu Apr 26 10:40:01 2007 +0800
cpuidle: fix sysfs related issue
Fix the cpuidle sysfs issue.
a. make kobject dynamicaly allocated
b. fixed sysfs init issue to avoid suspend/resume issue
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 7169a5cc0d67b263978859672e86c13c23a5570d
Author: Randy Dunlap <randy.dunlap@oracle.com>
Date: Wed Mar 28 22:52:53 2007 -0400
cpuidle: 1-bit field must be unsigned
A 1-bit bitfield has no room for a sign bit.
drivers/cpuidle/governors/ladder.c:54:16: error: dubious bitfield without explicit `signed' or `unsigned'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 4658620158dc2fbd9e4bcb213c5b6fb5d05ba7d4
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Wed Mar 28 22:52:41 2007 -0400
cpuidle: fix boot hang
Patch for cpuidle boot hang reported by Larry Finger here.
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.2/2025.html
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Larry Finger <larry.finger@lwfinger.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit c17e168aa6e5fe3851baaae8df2fbc1cf11443a9
Author: Len Brown <len.brown@intel.com>
Date: Wed Mar 7 04:37:53 2007 -0500
cpuidle: ladder does not depend on ACPI
build fix for CONFIG_ACPI=n
In file included from drivers/cpuidle/governors/ladder.c:21:
include/acpi/processor.h:88: error: expected specifier-qualifier-list before âacpi_integerâ
include/acpi/processor.h:106: error: expected specifier-qualifier-list before âacpi_integerâ
include/acpi/processor.h:168: error: expected specifier-qualifier-list before âacpi_handleâ
Signed-off-by: Len Brown <len.brown@intel.com>
commit 8c91d958246bde68db0c3f0c57b535962ce861cb
Author: Adrian Bunk <bunk@stusta.de>
Date: Tue Mar 6 02:29:40 2007 -0800
cpuidle: make code static
This patch makes the following needlessly global code static:
- driver.c: __cpuidle_find_driver()
- governor.c: __cpuidle_find_governor()
- ladder.c: struct ladder_governor
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Adam Belay <abelay@novell.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 0c39dc3187094c72c33ab65a64d2017b21f372d2
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Wed Mar 7 02:38:22 2007 -0500
cpu_idle: fix build break
This patch fixes a build breakage with !CONFIG_HOTPLUG_CPU and
CONFIG_CPU_IDLE.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 8112e3b115659b07df340ef170515799c0105f82
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Mar 6 02:29:39 2007 -0800
cpuidle: build fix for !CPU_IDLE
Fix the compile issues when CPU_IDLE is not configured.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Adam Belay <abelay@novell.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 1eb4431e9599cd25e0d9872f3c2c8986821839dd
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Feb 22 13:54:57 2007 -0800
cpuidle take2: Basic documentation for cpuidle
Documentation for cpuidle infrastructure
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit ef5f15a8b79123a047285ec2e3899108661df779
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Feb 22 13:54:03 2007 -0800
cpuidle take2: Hookup ACPI C-states driver with cpuidle
Hookup ACPI C-states onto generic cpuidle infrastructure.
drivers/acpi/procesor_idle.c is now a ACPI C-states driver that registers as
a driver in cpuidle infrastructure and the policy part is removed from
drivers/acpi/processor_idle.c. We use governor in cpuidle instead.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 987196fa82d4db52c407e8c9d5dec884ba602183
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Thu Feb 22 13:52:57 2007 -0800
cpuidle take2: Core cpuidle infrastructure
Announcing 'cpuidle', a new CPU power management infrastructure to manage
idle CPUs in a clean and efficient manner.
cpuidle separates out the drivers that can provide support for multiple types
of idle states and policy governors that decide on what idle state to use
at run time.
A cpuidle driver can support multiple idle states based on parameters like
varying power consumption, wakeup latency, etc (ACPI C-states for example).
A cpuidle governor can be usage model specific (laptop, server,
laptop on battery etc).
Main advantage of the infrastructure being, it allows independent development
of drivers and governors and allows for better CPU power management.
A huge thanks to Adam Belay and Shaohua Li who were part of this mini-project
since its beginning and are greatly responsible for this patchset.
This patch:
Core cpuidle infrastructure.
Introduces a new abstraction layer for cpuidle:
* which manages drivers that can support multiple idles states. Drivers
can be generic or particular to specific hardware/platform
* allows pluging in multiple policy governors that can take idle state policy
decision
* The core also has a set of sysfs interfaces with which administrato can know
about supported drivers and governors and switch them at run time.
Signed-off-by: Adam Belay <abelay@novell.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Three main sets of changes:
1) dmi_get_system_info() return value should have been marked const,
since callers should not be changing that data.
2) const-ify DMI internals, since DMI firmware tables should,
whenever possible, be marked const to ensure we never ever write to
that data area.
3) const-ify DMI API, to enable marking tables const where possible
in low-level drivers.
And if we're really lucky, this might enable some additional
optimizations on the part of the compiler.
The bulk of the changes are #2 and #3, which are interrelated. #1 could
have been a separate patch, but it was so small compared to the others,
it was easier to roll it into this changeset.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
device_suspend() calls ACPI suspend functions, which seems to have undesired
side effects on lower idle C-states. It took me some time to realize that
especially the VAIO BIOSes (both Andrews jinxed UP and my elfstruck SMP one)
show this effect. I'm quite sure that other bug reports against suspend/resume
about turning the system into a brick have the same root cause.
After fishing in the dark for quite some time, I realized that removing the ACPI
processor module before suspend (this removes the lower C-state functionality)
made the problem disappear. Interestingly enough the propability of having a
bricked box is influenced by various factors (interrupts, size of the ram image,
...). Even adding a bunch of printks in the wrong places made the problem go
away. The previous periodic tick implementation simply pampered over the
problem, which explains why the dyntick / clockevents changes made this more
prominent.
We avoid complex functionality during the boot process and we have to do the
same during suspend/resume. It is a similar scenario and equaly fragile.
Add suspend / resume functions to the ACPI processor code and disable the lower
idle C-states across suspend/resume. Fall back to the default idle
implementation (halt) instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
construct a more or less wall-clock time out of sched_clock(), by
using ACPI-idle's existing knowledge about how much time we spent
idling. This allows the rq clock to work around TSC-stops-in-C2,
TSC-gets-corrupted-in-C3 type of problems.
( Besides the scheduler's statistics this also benefits blktrace and
printk-timestamps as well. )
Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
callbacks allow the scheduler to get out the most of the period where
the CPU has a reliable TSC. This results in slightly more precise
task statistics.
the ACPI bits were acked by Len.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Enable C3 without bm control only for CST based C3.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On systems that do not have pm2_control_block, we cannot really use
ARB_DISABLE before C3. We used to disable C3 totally on such systems.
To be compatible with Windows, we need to enable C3 on such systems now.
We just skip ARB_DISABLE step before entering the C3-state and assume
hardware is handling things correctly. Also, ACPI spec is not clear
about pm2_control is _needed_ for C3 or not.
We have atleast one system that need this to enable C3.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Always disable/enable interrupts in the acpi idle routine,
even in the error path.
This is required as the 2.6.20 change in git commit d331e739f5ad2aaa9...
"Fix interrupt race in idle callback" expects the idle handler
to enable interrupt before returning.
There was a case in acpi idle routine, in which interrupt was not being
enabled before return, which caused the system to hang at bootup, while
enabling C-states on an SMP system.
The signature of the hang was that "processor.nocst"
was required to enable boot.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.
This is then displayed the first time mark_tsc_unstable is called.
This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Thomas's patch for including <asm/apic.h> for x86 UP builds came into
Linus's tree from two different directions, both of which were merged.
This reverts the latter, yanking out the duplicate #include and comment.
Signed-off-by: Ray Lee <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.
Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 25496caec1, which
broke bootup on at least Ingo's ThinkPad T60. Need to figure out
exactly what is wrong before we can re-do the logic.
Requested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use IPI for blacklisted CPUs, add parameter IPI vs LAPIC
Currently, Linux disables lapic timer for all machines with C2 and higher
C-state support.
According to Intel only specific Intel models (Banias/Dothan) are broken
in respect of not waking up from C2 with lapic.
However, I am not sure about the naming of the parameter and how it
could/should get integrated into the dyntick part
(CONFIG_GENERIC_CLOCKEVENTS). There, a more fine grained check (TSC
still running?, ..) is needed? Does this make sense (always use
CLOCK_EVT_NOTIFY_BROADCAST_ON, but use OFF if forced by use_ipi=0:
clockevents_notify(use_ipi ? CLOCK_EVT_NOTIFY_BROADCAST_ON :
CLOCK_EVT_NOTIFY_BROADCAST_OFF, &pr->id);
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a preperatory patch for highres/dyntick:
- replace the big #ifdef ARCH_APICTIMER_STOPS_ON_C3 hackery by functions
- remove the double switch in the power verify function (in the worst case
we switched ipi to apic and 20usec later apic to ipi)
- keep track of the the state which stops local APIC timer
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <len.brown@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
apic.h does not get included on UP compiles. That way the
APICTIMER_STOPS_ON_C3 is not there and UP boxen have no support for timer
broadcasting. This was never noticed, because the lapic timer is only used
for profiling on UP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
apic.h does not get included on UP compiles. That way the
APICTIMER_STOPS_ON_C3 is not there and UP boxen have no support for timer
broadcasting. This was never noticed, because the lapic timer is only used
for profiling on UP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
cosmetic only
Make "module name" actually match the file name.
Invoke with ';' as leaving it off confuses Lindent and gcc doesn't care.
Fix indentation where Lindent did get confused.
Signed-off-by: Len Brown <len.brown@intel.com>
drivers/acpi/namespace/nsparse.c:126: warning: int format, different type arg (arg 7)
drivers/acpi/tables/tbfadt.c:224: warning: unsigned int format, different type arg (arg 6)
drivers/acpi/utilities/utdebug.c:184: warning: cast from pointer to integer of different size
drivers/acpi/utilities/utdebug.c:184: warning: cast from pointer to integer of different size
drivers/acpi/utilities/utdebug.c:197: warning: cast from pointer to integer of different size
drivers/acpi/processor_idle.c:1093: warning: long long unsigned int format, u64 arg (arg 5)
Signed-off-by: Len Brown <len.brown@intel.com>
Remove flags parameter for acpi_{get,set}_register().
It is no longer necessary now that these functions use a
spinlock for mutual exclusion.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
ACPI: replace kmalloc+memset with kzalloc
ACPI: Add support for acpi_load_table/acpi_unload_table_id
fbdev: update after backlight argument change
ACPI: video: Add dev argument for backlight_device_register
ACPI: Implement acpi_video_get_next_level()
ACPI: Kconfig - depend on PM rather than selecting it
ACPI: fix NULL check in drivers/acpi/osl.c
ACPI: make drivers/acpi/ec.c:ec_ecdt static
ACPI: prevent processor module from loading on failures
ACPI: fix single linked list manipulation
ACPI: ibm_acpi: allow clean removal
ACPI: fix git automerge failure
ACPI: ibm_acpi: respond to workqueue update
ACPI: dock: add uevent to indicate change in device status
ACPI: ec: Lindent once again
ACPI: ec: Change #define to enums there possible.
ACPI: ec: Style changes.
ACPI: ec: Acquire Global Lock under EC mutex.
ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
ACPI: ec: Rename gpe_bit to gpe
...
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.
After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (c01262ba 0 0)
IRQ_19-1479 1D..1 0us : resched_task (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __spin_unlock_irqrestore (try_to_wake_up)
...
<idle>-0 1...1 11us!: default_idle (cpu_idle)
...
<idle>-0 0Dn.1 602us : smp_apic_timer_interrupt (c0103baf 1 0)
...
<...>-5856 0D..2 618us : __switch_to (__schedule)
<...>-5856 0D..2 618us : __schedule <<idle>-0> (20 162)
<...>-5856 0D..2 619us : __spin_unlock_irq (__schedule)
<...>-5856 0...1 619us : trace_stop_sched_switched (__schedule)
<...>-5856 0D..1 619us : trace_stop_sched_switched <<...>-5856> (37 0)
what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!
the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:
commit 495ab9c045
Author: Andi Kleen <ak@suse.de>
Date: Mon Jun 26 13:59:11 2006 +0200
[PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
the problem is this type of change:
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- clear_thread_flag(TIF_POLLING_NRFLAG);
+ current_thread_info()->status &= ~TS_POLLING;
smp_mb__after_clear_bit();
while (!need_resched()) {
local_irq_disable();
this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:
clear_bit(flag, &ti->flags);
and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:
static inline void clear_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( LOCK_PREFIX
"btrl %1,%0"
hence smp_mb__after_clear_bit() is defined as a simple compile barrier:
#define smp_mb__after_clear_bit() barrier()
but the explicit TS_POLLING clearing introduced by the patch:
+ current_thread_info()->status &= ~TS_POLLING;
is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.
CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]
Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.
( And it also turned out that the reason why these latencies never
triggered on my testsystems is that i routinely use idle=poll, which
was the only idle variant not affected by this bug. )
The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch breaks C-state discovery on my IBM IntelliStation Z30 because
the return value of acpi_processor_get_power_info_fadt is not assigned to
"result" in the case that acpi_processor_get_power_info_cst returns
-ENODEV. Thus, if ACPI provides C-state data via the FADT and not _CST (as
is the case on this machine), we incorrectly exit the function with -ENODEV
after reading the FADT. The attached patch sets the value of result so
that we don't exit early.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Acked-by: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/acpi/processor_idle.c:1112: warning: 'smp_callback' defined but not used
Cc: Len Brown <lenb@kernel.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ACPI processor init functions should be marked as __cpuinit as they use
structures marked with __cpuinitdata.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Intel processors starting with the Core Duo support
support processor native C-state using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm
Platform firmware exports the support for Native C-state to OS using
ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm
With Processor Native C-state, we use 'MWAIT' instruction on the processor
to enter different C-states (C1, C2, C3). We won't use the special IO
ports to enter C-state and no SMM mode etc required to enter C-state.
Overall this will mean better C-state support.
One major advantage of using MWAIT for all C-states is, with this and
"treat interrupt as break event" feature of MWAIT, we can now get accurate
timing for the time spent in C1, C2, .. states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Add infrastructure to track "maximum allowable latency" for power saving
policies.
The reason for adding this infrastructure is that power management in the
idle loop needs to make a tradeoff between latency and power savings
(deeper power save modes have a longer latency to running code again). The
code that today makes this tradeoff just does a rather simple algorithm;
however this is not good enough: There are devices and use cases where a
lower latency is required than that the higher power saving states provide.
An example would be audio playback, but another example is the ipw2100
wireless driver that right now has a very direct and ugly acpi hook to
disable some higher power states randomly when it gets certain types of
error.
The proposed solution is to have an interface where drivers can
* announce the maximum latency (in microseconds) that they can deal with
* modify this latency
* give up their constraint
and a function where the code that decides on power saving strategy can
query the current global desired maximum.
This patch has a user of each side: on the consumer side, ACPI is patched
to use this, on the producer side the ipw2100 driver is patched.
A generic maximum latency is also registered of 2 timer ticks (more and you
lose accurate time tracking after all).
While the existing users of the patch are x86 specific, the infrastructure
is not. I'd like to ask the arch maintainers of other architectures if the
infrastructure is generic enough for their use (assuming the architecture
has such a tradeoff as concept at all), and the sound/multimedia driver
owners to look at the driver facing API to see if this is something they
can use.
[akpm@osdl.org: cleanups]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jesse.barnes@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>