The patch below moves the cpu hotplugging higher up in the cpufreq
layering; this is needed to avoid recursive taking of the cpu hotplug
lock and to otherwise detangle the mess.
The new rules are:
1. you must do lock_cpu_hotplug() around the following functions:
__cpufreq_driver_target
__cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
__cpufreq_set_policy
2. governer methods (.governer) must NOT take the lock_cpu_hotplug()
lock in any way; they are called with the lock taken already
3. if your governer spawns a thread that does things, like calling
__cpufreq_driver_target, your thread must honor rule #1.
4. the policy lock and other cpufreq internal locks nest within
the lock_cpu_hotplug() lock.
I'm not entirely happy about how the __cpufreq_governor rule ended up
(conditional locking rule depending on the argument) but basically all
callers pass this as a constant so it's not too horrible.
The patch also removes the cpufreq_governor() function since during the
locking audit it turned out to be entirely unused (so no need to fix it)
The patch works on my testbox, but it could use more testing
(otoh... it can't be much worse than the current code)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/cpufreq/cpufreq_ondemand.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_ondemand.c:374: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_ondemand.c:381: warning: implicit declaration of function 'unlock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_conservative.c:425: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c:432: warning: implicit declaration of function 'unlock_cpu_hotplug'
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rootcaused the bug to a deadlock in cpufreq and ondemand. Due to non-existent
ordering between cpu_hotplug lock and dbs_mutex. Basically a race condition
between cpu_down() and do_dbs_timer().
cpu_down() flow:
* cpu_down() call for CPU 1
* Takes hot plug lock
* Calls pre down notifier
* cpufreq notifier handler calls cpufreq_driver_target() which takes
cpu_hotplug lock again. OK as cpu_hotplug lock is recursive in same
process context
* CPU 1 goes down
* Calls post down notifier
* cpufreq notifier handler calls ondemand event stop which takes dbs_mutex
So, cpu_hotplug lock is taken before dbs_mutex in this flow.
do_dbs_timer is triggerred by a periodic timer event.
It first takes dbs_mutex and then takes cpu_hotplug lock in
cpufreq_driver_target().
Note the reverse order here compared to above. So, if this timer event happens
at right moment during cpu_down, system will deadlok.
Attached patch fixes the issue for both ondemand and conservative.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Keep the value of ignore_nice_load and freq_step of the conservative
governor after the governor is deselected and reselected.
Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Dave Jones <davej@redhat.com>
Venki, author of cpufreq_ondemand, came up with a neater way to remove the
initialiser code from the main loop of my code and out to the point when the
governor is actually initialised.
Not only does it look but it also feels cleaner, plus its simpler to
understand. It also saves a bunch of pointless conditional statements in the
main loop.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
All these changes should make cpufreq_conservative safe in regards to the x86
for_each_cpu cpumask.h changes and whatnot.
Whilst making it safe a number of pointless for loops related to the cpu
mask's were removed. I was never comfortable with all those for loops,
especially as the iteration is over the same data again and again for each
CPU you had in a single poll, an O(n^2) outcome to frequency scaling.
The approach I use is to assume by default no CPU's exist and it sets the
requested_freq to zero as a kind of flag, the reasoning is in the source ;)
If the CPU is queried and requested_freq is zero then it initialises the
variable to current_freq and then continues as if nothing happened which
should be the same net effect as before?
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The sensible approach to making conservative less responsive than ondemand :)
As mentioned in patch [1/4]. We do not want conservative to shoot through
all the frequencies, its point (by default) is to slowly move through them.
By default its ten times less responsive.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Since the conservative govenor was released its codebase has drifted from the
the direction and updates that have been applied to the ondemand govornor.
This patch addresses the lack of updates in that period and brings
conservative back up to date. The resulting diff file between
cpufreq_ondemand.c and cpufreq_conservative.c is now much smaller and shows
more clearly the differences between the two.
Another reason to do this is ages ago, knowingly, I did a piss poor attempt
at making conservative less responsive by knocking up
DEF_SAMPLING_RATE_LATENCY_MULTIPLIER by two orders of magnitude. I did fix
this ages ago but in my dis-organisation I must have toasted the diff and
left it the way it was. About two weeks ago a user contacted me saying he
was having problems with the conservative governor with his AMD Athlon XP-M
2800+ as /sys/devices/system/cpu/cpu0/cpufreq/conservative showed
sampling_rate_min 9950000
sampling_rate_max 1360065408
Nine seconds to decide about changing the frequency....not too responsive :)
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
The use of the 'ignore_nice' sysfs file is confusing to anyone using it.
This removes the sysfs file 'ignore_nice' and in its place creates a
'ignore_nice_load' entry that defaults to '0'; meaning nice'd processes
_are_ counted towards the 'business' calculation.
WARNING: this obvious breaks any userland tools that expected ignore_nice'
to exist, to draw attention to this fact it was concluded on the mailing
list that the entry should be removed altogether so the userland app breaks
and so the author can build simple to detect workaround. Having said that
it seems currently very few tools even make use of this functionality; all
I could find was a Gentoo Wiki entry.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Don't try to access not-present CPUs. Conservative governor will always
oops on SMP without this fix.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=4781
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] [3/5] ondemand,conservative governor idle_tick clean-up
Ondemand and conservative governor clean-up, it factorises the idle ticks
measurement.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [2/5] ondemand,conservative governor store the idle ticks for all cpus
Ondemand, conservative governor did not store prev_cpu_idle_up into
prev_cpu_idle_down for other CPUs than the current CPU.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [1/5] ondemand,conservative minor bug-fix and cleanup
Attached patch fixes some minor issues with Alexander's patch and related
cleanup in both ondemand and conservative governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
A new cpufreq module, based on the ondemand one with my additional patches
just posted. This one is more suitable for battery environments where its
probably more appealing to have the cpu freq gracefully increase and decrease
rather than flip between the min and max freq's.
N.B. Bruno Ducrot pointed out that the amd64's "do have unacceptable latency
between min and max freq transition, due to the step-by-step requirements
(200MHz IIRC)"; so AMD64 users would probably benefit from this too.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>