Dave,
Attached is an update of my patch against the cpufreq fixes branch.
Before applying the patch I compiled and booted the tree to see if the panic
was still there -- to my surprise it was not. This is because of the conversion
of cpufreq_cpu_governor to a char[].
While the panic is kaput, the problem of stale data continues and my patch is
still valid. It is possible to end up with the wrong governor after hotplug
events because CPUFREQ_DEFAULT_GOVERNOR is statically linked to a default,
while the cpu siblings may have had a different governor assigned by a user.
ie) the patch is still needed in order to keep the governors assigned
properly when hotplugging devices
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Currently on governer backup/restore path we storing governor's pointer.
This is wrong because one may unload governor's module after cpu goes
offline. As result use-after-free will take place on restored cpu.
It is not easy to exploit this bug, but still we have to close this
issue ASAP. Issue was introduced by following commit
084f349394
##TESTCASE##
#!/bin/sh -x
modprobe acpi_cpufreq
# Any non default governor, in may case it is "ondemand"
modprobe cpufreq_ondemand
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
rmmod acpi_cpufreq
rmmod cpufreq_ondemand
modprobe acpi_cpufreq # << use-after-free here.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Dave Jones <davej@redhat.com>
remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
commit 42a06f2166
Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.
Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.
However, the rwlock write-lock is not needed upon governor stop.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: trenn@suse.de
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.
Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
on a managed CPU will result in:
Jul 22 15:15:37 linux kernel: [ 80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6()
Jul 22 15:15:37 linux kernel: [ 80.013866] Hardware name: To Be Filled By O.E.M.
Jul 22 15:15:37 linux kernel: [ 80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Jul 22 15:15:37 linux kernel: [ 80.013870] Modules linked in: powernow_k8
Jul 22 15:15:37 linux kernel: [ 80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40
Jul 22 15:15:37 linux kernel: [ 80.013876] Call Trace:
Jul 22 15:15:37 linux kernel: [ 80.013879] [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013884] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 15:15:37 linux kernel: [ 80.013888] [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e
Jul 22 15:15:37 linux kernel: [ 80.013891] [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013894] [<ffffffff8112f213>] create_dir+0x58/0x87
Jul 22 15:15:37 linux kernel: [ 80.013898] [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f
Jul 22 15:15:37 linux kernel: [ 80.013902] [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de
Jul 22 15:15:37 linux kernel: [ 80.013905] [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e
Jul 22 15:15:37 linux kernel: [ 80.013908] [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57
Jul 22 15:15:37 linux kernel: [ 80.013913] [<ffffffff810667bc>] ? mark_lock+0x22/0x228
Jul 22 15:15:37 linux kernel: [ 80.013918] [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4
...
This bug slipped in by git commit:
150b06f7f223cfd0f808737a5243cceca8ea47fa
When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function
is not left anymore, only cpufreq_add_dev_policy.
This patch should reconstruct the identical functionality again as it
was before the split.
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Commit 4bc5d34135 is broken and causes regressions:
(1) cpufreq_driver->resume() and ->suspend() were only called on
__powerpc__, but you could set them on all architectures. In fact,
->resume() was defined and used before the PPC-related commit
42d4dc3f4e complained about in 4bc5d34135.
(2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
would never be called.
(3) This means speedstep-smi would be unusuable after suspend or resume.
The _real_ problem was calling cpufreq_driver->get() with interrupts
off, but it re-enabling interrupts on some platforms. Why is ->get()
necessary?
Some systems like to change the CPU frequency behind our
back, especially during BIOS-intensive operations like suspend or
resume. If such systems also use a CPU frequency-dependant timing loop,
delays might be off by large factors. Therefore, we need to ascertain
as soon as possible that the CPU frequency is indeed at the speed we
think it is. You can do this two ways: either setting it anew, or trying
to get it. The latter is what was done, the former also has the same IRQ
issue.
So, let's try something different: defer the checking to after interrupts
are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
Timings may be off until this later stage, so let's watch out for
resume regressions caused by the deferred handling of frequency changes
behind the kernel's back.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
The suspend code runs with interrupts disabled, and the powerpc workaround we
do in the cpufreq suspend hook calls the drivers ->get method.
powernow-k8's ->get does an smp_call_function_single
which needs interrupts enabled
cpufreq's suspend/resume code was added in 42d4dc3f4e to work around
a hardware problem on ppc powerbooks. If we make all this code
conditional on powerpc, we avoid the issue above.
Signed-off-by: Dave Jones <davej@redhat.com>
The first offline/online cycle is successful, the second not.
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
echo 0 >cpu1/online
The last command will trigger:
Jul 22 14:39:50 linux kernel: [ 593.210125] ------------[ cut here ]------------
Jul 22 14:39:50 linux kernel: [ 593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b()
Jul 22 14:39:50 linux kernel: [ 593.210144] Hardware name: To Be Filled By O.E.M.
Jul 22 14:39:50 linux kernel: [ 593.210148] Modules linked in: powernow_k8
Jul 22 14:39:50 linux kernel: [ 593.210158] Pid: 378, comm: kondemand/2 Tainted: G W 2.6.31-rc2 #38
Jul 22 14:39:50 linux kernel: [ 593.210163] Call Trace:
Jul 22 14:39:50 linux kernel: [ 593.210171] [<ffffffff812008e8>] ? kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [ 593.210181] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 14:39:50 linux kernel: [ 593.210190] [<ffffffff81041962>] warn_slowpath_null+0xf/0x11
Jul 22 14:39:50 linux kernel: [ 593.210198] [<ffffffff812008e8>] kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [ 593.210206] [<ffffffff811ffa19>] kobject_get+0x1a/0x22
Jul 22 14:39:50 linux kernel: [ 593.210214] [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb
Jul 22 14:39:50 linux kernel: [ 593.210222] [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67
Jul 22 14:39:50 linux kernel: [ 593.210231] [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f
Jul 22 14:39:50 linux kernel: [ 593.210240] [<ffffffff810529ea>] worker_thread+0x200/0x313
...
The output continues on every do_dbs_timer ondemand freq checking poll.
This regression was introduced by git commit:
3f4a782b5c
The policy is released when the cpufreq device is removed in:
__cpufreq_remove_dev():
/* if this isn't the CPU which is the parent of the kobj, we
* only need to unlink, put and exit
*/
Not creating the symlink is not sever at all.
As long as:
sysfs_remove_link(&sys_dev->kobj, "cpufreq");
handles it gracefully that the symlink did not exist.
Possibly no error should be returned at all, because ondemand
governor would still provide the same functionality.
Userspace in userspace gov case might be confused if the link
is missing.
Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Suspend/Resume fails on multi socket, multi core systems because the cpufreq
code erroneously sets the per_cpu policy_cpu value when a logical cpu is
offline.
This most notably results in missing sysfs files that are used to set the
cpu frequencies of the various cpus.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.
This is step one of fixing the overall locking dependency mess in cpufreq.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
CC: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Commit b14893a62c although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.
Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here
http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.htmlhttp://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html
and few others..
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject : cpufreq timer teardown problem
> Submitter : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date : 2009-04-23 14:00 (24 days old)
> References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch : http://patchwork.kernel.org/patch/19754/
> http://patchwork.kernel.org/patch/19753/
The patches linked above depend on the following patch to remove
circular locking dependency :
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call
(the following issue was faced when using cancel_delayed_work_sync() in the
timer teardown (which fixes a race).
* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
>
> A: work -> do_dbs_timer() -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
>
>
Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.
The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().
Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :
cpufreq.c: __cpufreq_remove_dev()
...
if (cpufreq_driver->target)
__cpufreq_governor(data, CPUFREQ_GOV_STOP);
unlock_policy_rwsem_write(cpu);
To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.
Applies on top of the 2.6.30-rc5 tree.
Required to remove circular dep in teardown of both conservative and
ondemande governors so they can use cancel_delayed_work_sync().
CPUFREQ_GOV_STOP does not modify the policy, therefore this locking seemed
unneeded.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
[CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
[CPUFREQ] Make cpufreq-nforce2 less obnoxious
[CPUFREQ] p4-clockmod reports wrong frequency.
[CPUFREQ] powernow-k8: Use a common exit path.
[CPUFREQ] Change link order of x86 cpufreq modules
[CPUFREQ] conservative: remove 10x from def_sampling_rate
[CPUFREQ] conservative: fixup governor to function more like ondemand logic
[CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
[CPUFREQ] conservative: amend author's email address
[CPUFREQ] Use swap() in longhaul.c
[CPUFREQ] checkpatch cleanups for acpi-cpufreq
[CPUFREQ] powernow-k8: Only print error message once, not per core.
[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
[CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
[CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
[CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
[CPUFREQ] checkpatch cleanups for powernow-k8
[CPUFREQ] checkpatch cleanups for ondemand governor.
[CPUFREQ] checkpatch cleanups for powernow-k7
[CPUFREQ] checkpatch cleanups for speedstep related drivers.
...
This reverts commit e088e4c9cd.
Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.
Course of action:
- Find out the remaining causes of overheating, and fix them
if possible. ACPI should be doing the right thing automatically.
If it isn't, we need to fix that.
- mark p4-clockmod ui as deprecated
- try again with the removal in six months.
It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.
Signed-off-by: Dave Jones <davej@redhat.com>
It's not only useful for the ondemand and conservative governors, but
also for userspace daemons to know about the HW transition latency of
the CPU.
It is especially useful for userspace to know about this value when
the ondemand or conservative governors are run. The sampling rate
control value depends on it and for userspace being able to set sane
tuning values there it has to know about the transition latency.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: use new cpumask API to reduce memory usage
This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previously driver resume would always set the current policy min/max with
the cpuinfo min/max, defined by user_policy.min/max. Resulting in a reset
of policy settings when policy.min/max != cpuinfo.min/max when coming out
of suspend. Now user_policy is saved as the policy instead of cpuinfo to
preserve what the user actually set.
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>
p4-clockmod has a long history of abuse. It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power. This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.
However p4-clockmod does have a purpose. It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.
This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
After calling cpufreq_cpu_get, error handling code should call
cpufreq_cpu_put.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@
(
if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
|
x = cpufreq_cpu_get@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != cpufreq_cpu_put(x)
when != if (x) { ... cpufreq_cpu_put(x); ...}
return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
cpufreq_cpu_put(x)
)
@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@
* x = cpufreq_cpu_get@p1(...)
<...
* if@p3 (...)
S
...>
* return@p2 \(NULL\|ret\);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec74)
But it can still happen that _PPC is called at processor driver
initialization time.
This patch should make sure that this is not possible anymore.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
Format string bug. Not exploitable, as this is only writable by root,
but worth fixing all the same.
Spotted-by: Ilja van Sprundel <ilja@netric.org>
Signed-off-by: Dave Jones <davej@redhat.com>
If cpu specific cpufreq driver(i.e. longrun) has "setpolicy" function,
governor object isn't set into cpufreq_policy object at "__cpufreq_set_policy"
function in driver/cpufreq/cpufreq.c .
This causes a null object access at "store_scaling_setspeed" and
"show_scaling_setspeed" function in driver/cpufreq/cpufreq.c when reading or
writing through /sys interface (ex. cat
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed)
Addresses:
http://bugzilla.kernel.org/show_bug.cgi?id=10654https://bugzilla.redhat.com/show_bug.cgi?id=443354
Signed-off-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the
error exit 'err_out_unregister' from different places once with the
'cpu_policy_rwsem' lock held, once with the lock released:
| if (ret)
| goto err_out_unregister;
| }
|
| policy->governor = NULL; /* to assure that the starting sequence is
| * run in cpufreq_set_policy */
|
| /* set default policy */
| ret = __cpufreq_set_policy(policy, &new_policy);
| policy->user_policy.policy = policy->policy;
| policy->user_policy.governor = policy->governor;
|
| unlock_policy_rwsem_write(cpu);
|
| if (ret) {
| dprintk("setting policy failed\n");
| goto err_out_unregister;
| }
This leads to the following error message in case of a failing
__cpufreq_set_policy() call:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
[<c01b4564>] unlock_policy_rwsem_write+0x30/0x40
but there are no more locks to release!
other info that might help us debug this:
1 lock held by swapper/1:
#0: (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130
stack backtrace:
[<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104)
[<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c)
r6:00000028 r5:c3c1ab80 r4:c01b4564
[<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c)
r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80
[<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30)
r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c
[<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40)
r4:c3c34884
[<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398)
[<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130)
[<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174)
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change cpufreq_policy and cpufreq_governor pointer tables
from arrays to per_cpu variables in the cpufreq subsystem.
Also some minor complaints from checkpatch.pl fixed.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.
To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
If cpufreq_register_notifier is called before pure initcalls,
init_cpufreq_transition_notifier_list will overwrite whatever it did,
causing notifiers to be ignored.
Print some noise to the kernel log if that happens.
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Fix the following warnings:
WARNING: vmlinux.o(.text+0xfe6711): Section mismatch in reference from the function cpufreq_unregister_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier
WARNING: vmlinux.o(.text+0xfe68af): Section mismatch in reference from the function cpufreq_register_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier
WARNING: vmlinux.o(.exit.text+0xc4fa): Section mismatch in reference from the function cpufreq_stats_exit() to the variable .cpuinit.data:cpufreq_stat_cpu_notifier
The warnings were casued by references to unregister_hotcpu_notifier()
from normal functions or exit functions.
This is flagged by modpost as a potential error because
it does not know that for the non HOTPLUG_CPU
scenario the unregister_hotcpu_notifier() is a nop.
Silence the warning by replacing the __initdata
annotation with a __refdata annotation.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
The cpufreq core should not take an extra kobject reference count for no
reason, and then refuse to release it. This has been reported as
keeping machines from properly powering down all the way.
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Yi Yang <yi.y.yang@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Frans Pop <elendil@planet.nl>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eliminate cpufreq_userspace scaling_setspeed deadlock.
Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.
Detailed examination of the deadlock showed the exact flow before the deadlock
as:
Thread 1 Thread 2
________ ________
cats files under /sys/devices/.../cpufreq/
Set governor to userspace
Adds a new sysfs entry for
scaling_setspeed
cats files under /sys/devices/.../cpufreq/
Set governor to performance
Holds cpufreq_rw_sem in write
mode
Sends a STOP notify to
userspace governor
cat /sys/devices/.../cpufreq/scaling_setspeed
Gets a handle on the above sysfs entry with
sysfs_get_active
Blocks while trying to get cpufreq_rw_sem
in read mode
Remove a sysfs entry for
scaling_setspeed
Blocks on sysfs_deactivate
while waiting for earlier
get_active (on other thread)
to drain
At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.
There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.
Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>