I have had four separate system lockups attributable to this exact problem
in two days of testing. Instead of trying to handle all the weird end
cases and wrap, how about changing it to look for exactly what we appear
to want.
The following patch removes a couple races in setup_APIC_timer. One occurs
when the HPET advances the COUNTER past the T0_CMP value between the time
the T0_CMP was originally read and when COUNTER is read. This results in
a delay waiting for the counter to wrap. The other results from the counter
wrapping.
This change takes a snapshot of T0_CMP at the beginning of the loop and
simply loops until T0_CMP has changed (a tick has happened).
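A minimal userspace sketch of the snapshot-and-wait idea described above, not the actual patch. The simulated register model and every name in it (fake_hpet, fake_read_t0_cmp, wait_for_tick) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated periodic HPET timer 0. All names here are hypothetical. */
struct fake_hpet {
    uint32_t counter;   /* free-running main counter */
    uint32_t t0_cmp;    /* next comparator match value */
    uint32_t period;    /* counter increments per tick */
};

/* Advance the counter by `step` on every read and move T0_CMP forward
 * each time the counter reaches it, as periodic-mode hardware would.
 * The signed difference keeps the comparison valid across a wrap. */
static uint32_t fake_read_t0_cmp(struct fake_hpet *h, uint32_t step)
{
    h->counter += step;
    while ((int32_t)(h->counter - h->t0_cmp) >= 0)
        h->t0_cmp += h->period;
    return h->t0_cmp;
}

/* Snapshot T0_CMP once, then spin until it changes (a tick happened).
 * COUNTER is never compared against T0_CMP here, so neither a late
 * read nor a counter wrap can extend the wait. Returns the poll count. */
static unsigned int wait_for_tick(struct fake_hpet *h, uint32_t step)
{
    uint32_t start;
    unsigned int polls = 0;

    assert(step > 0 && h->period > 0);
    start = fake_read_t0_cmp(h, step);
    while (fake_read_t0_cmp(h, step) == start)
        polls++;
    return polls;
}
```

Starting the simulated counter just below the 32-bit wrap point shows that the loop still exits after a single tick.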
<later>
I have one small concern about the patch. I am not sure it meets the intent
as well as it should. I think we are trying to match APIC timer interrupts up
with the hpet counter increment. The event which appears to be disturbing
this loop in our test environment is the NMI watchdog. What we believe has
been happening with the existing code is the setup_APIC_timer loop has read
the CMP value, and the NMI watchdog code fires for the first time. This
results in a series of icache miss slowdowns and by the time we get back to
things it has wrapped.
I think this code is trying to get the CMP as close to the counter value as
possible. If that is the intent, maybe we should really be testing against a
"window" around the CMP. Something like COUNTER = CMP +/- 2. It appears COUNTER
should get advanced every 89nSec (IIRC). The above seems like an unreasonably
small window, but may be necessary. Without documentation, I am not sure of
the original intent with this code.
In summary, this code fixes my boot hangs, but since I am not certain of the
intent of the existing code, I am not certain this has not introduced new bugs
or unexpected behaviors.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: "Aaron Durbin" <adurbin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a little problem in Documentation/vm/slabinfo.c
The code is using "%d" in a printf() call to print an 'unsigned long'.
This patch corrects it to use "%lu" instead.
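For illustration only, a small sketch of the format-specifier fix in isolation. The helper name format_count is hypothetical and is not taken from slabinfo.c.

```c
#include <assert.h>
#include <stdio.h>

/* Format an unsigned long with the matching "%lu" conversion.
 * Passing an unsigned long to "%d" is undefined behaviour and, where
 * long is wider than int, typically prints a truncated or negative value. */
static int format_count(char *buf, size_t len, unsigned long count)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%lu", count);
}
```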
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
The dynamic dma kmalloc creation can run into trouble if a
GFP_ATOMIC allocation is the first one performed for a certain size
of dma kmalloc slab.
- Move the adding of the slab to sysfs into a workqueue
(sysfs does GFP_KERNEL allocations)
- Do not call kmem_cache_destroy() (uses slub_lock)
- Only acquire the slub_lock once and--if we cannot wait--do a trylock.
This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
for a range of sizes failing due to another process holding the slub_lock.
However, we only need to acquire the spinlock once in order to establish
each power of two DMA kmalloc cache. The possible conflict is with the
slub_lock taken during slab management actions (create / remove slab cache).
It is rather typical that a driver will first fill its buffers using
GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
Drivers will also create their slab caches first outside of an atomic
context before starting to use atomic kmalloc from an interrupt context.
If there are any failures then they will occur early after boot or when
loading multiple drivers concurrently. Drivers can already accommodate
failures of GFP_ATOMIC for other reasons. Retries will then create the slab.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
The MAX_PARTIAL checks were supposed to be an optimization. However, slab
shrinking is a manually triggered process either through running slabinfo
or by the kernel calling kmem_cache_shrink.
If one really wants to shrink a slab then all operations should be done
regardless of the size of the partial list. This also fixes an issue that
could surface if the number of partial slabs was initially above MAX_PARTIAL
in kmem_cache_shrink and later drops below MAX_PARTIAL through the
elimination of empty slabs on the partial list (rare). In that case a few
slabs may be left off the partial list (and only be put back when they
are empty).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
UIO currently contains a rather dubious statement which wants removing.
The actual question of whether user space code that depends tightly
on kernel GPL code designed to co-work with it is a derivative work of
the kernel is extremely complex, and since we neither have space for
a master's-length essay on legal issues nor need to start flamewars, let's
simply remove the comment and leave law to lawyers.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The default definition in asm-generic conflicts with Alpha's O_DIRECT,
so, like several other arches, it needs to be redefined.
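A sketch of the override pattern, assuming the generic header guards its default with #ifndef so that an architecture definition seen first takes precedence. The DEMO_O_DIRECT name and both octal values are illustrative assumptions, not copied from the headers.

```c
#include <assert.h>

/* "Architecture" part: define the flag before the generic defaults. */
#define DEMO_O_DIRECT 02000000   /* illustrative arch-specific value */

/* "Generic" part: supplies a default only if the arch did not. */
#ifndef DEMO_O_DIRECT
#define DEMO_O_DIRECT 00040000   /* illustrative generic default */
#endif

/* Compile-time check that the architecture definition won. */
static_assert(DEMO_O_DIRECT == 02000000, "arch override must take precedence");

static int demo_has_direct(int flags)
{
    return (flags & DEMO_O_DIRECT) != 0;
}
```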
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends
NFS: Replace flush_scheduled_work with cancel_work_sync() and friends
SUNRPC: Don't call gss_delete_sec_context() from an rcu context
NFSv4: Don't call put_rpccred() from an rcu callback
NFS: Fix NFSv4 open stateid regressions
NFSv4: Fix a locking regression in nfs4_set_mode_locked()
NFS: Fix put_nfs_open_context
SUNRPC: Fix a race in rpciod_down()
Trivial fix: mark the buffer to hexdump as const so callers could avoid
casting their const buffers when calling print_hex_dump().
The patch is really trivial and I suggest considering it a fix
(it fixes GCC warnings) and pushing it to the current tree.
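To illustrate why the const qualifier matters, here is a hypothetical userspace stand-in (demo_hex_dump is not the kernel's print_hex_dump()). Because the buffer parameter is const void *, a caller can pass const data without a cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `len` bytes of `buf` as lowercase hex into `out`.
 * Returns the number of characters written, or -1 if `out` is too small. */
static int demo_hex_dump(char *out, size_t outlen, const void *buf, size_t len)
{
    const unsigned char *p = buf;   /* no cast needed from const void * */
    size_t i, used = 0;

    assert(out != NULL && buf != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        int n = snprintf(out + used, outlen - used, "%02x", p[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```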
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix memory leak when cpu hotplugging.
[SPARC64]: Do not assume sun4v chips have load-twin/store-init support.
[SPARC64]: Fix hard-coding of cpu type output in /proc/cpuinfo on sun4v.
[SPARC]: Centralize find_in_proplist() instead of duplicating N times.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: at91_mci: remove whitespace at the end of lines
mmc: reorganize bounce buffer init
wbsd: fix section mismatch warnings
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits)
sched: refine negative nice level granularity
sched: fix update_stats_enqueue() reniced codepath
sched: round a bit better
sched: make the multiplication table more accurate
sched: optimize update_rq_clock() calls in the load-balancer
sched: optimize activate_task()
sched: clean up set_curr_task_fair()
sched: remove __update_rq_clock() call from entity_tick()
sched: move the __update_rq_clock() call to scheduler_tick()
sched debug: remove the 'u64 now' parameter from print_task()/_rq()
sched: remove the 'u64 now' local variables
sched: remove the 'u64 now' parameter from deactivate_task()
sched: remove the 'u64 now' parameter from dequeue_task()
sched: remove the 'u64 now' parameter from enqueue_task()
sched: remove the 'u64 now' parameter from dec_nr_running()
sched: remove the 'u64 now' parameter from inc_nr_running()
sched: remove the 'u64 now' parameter from dec_load()
sched: remove the 'u64 now' parameter from inc_load()
sched: remove the 'u64 now' parameter from update_curr_load()
sched: remove the 'u64 now' parameter from ->task_new()
...
Some versions of ld.so mmap the shared libraries right in over guest
memory, so compile lguest statically by default.
[ FC7 maps shared libraries very low, where the launcher maps the guest's
physical memory. Quick fix is to link Launcher static, real fix is
for 2.6.24. ]
-static is a simple fix. I expect this problem will be more common than we
like, as different distros make different "improvements" to ld.so.
Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a Guest makes a hypercall which sets a GDT entry to not present, we
currently set any segment registers using that GDT entry to 0.
Unfortunately, this is not sufficient: there are other ways of
altering GDT entries which will cause a fault.
The correct solution is to do what Linux does: let them set any GDT value
they want and handle the #GP when popping causes a fault. This has
the added benefit of making our Switcher slightly more robust in the
case of any other bugs which cause it to fault.
We kill the Guest if it causes a fault in the Switcher: it's the
Guest's responsibility to make sure it's not using segments when it
changes them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lguest uses a host-supplied wallclock-based clocksource when the TSC
is not reliable. As this is already in nanoseconds, I naively used a
multiplier of 1 and a shift of 0.
But update_wall_time() in its infinite wisdom decides to adjust the
clock a little (where does it think it's getting a more accurate time
from?)
It will happily tweak the multiplier... to 0, then -1.
So the "fix" is to use a shift of 22 like everyone else, and a
multiplier of 1 << 22.
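The arithmetic behind this can be sketched with the usual clocksource conversion, ns = (cycles * mult) >> shift. The helper name demo_cyc2ns is hypothetical, and the sketch ignores overflow for very large cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Convert clocksource cycles to nanoseconds: (cycles * mult) >> shift. */
static uint64_t demo_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (cycles * mult) >> shift;
}
```

With mult = 1 and shift = 0, adjusting the multiplier down by one gives 0, so time stops advancing. With mult = 1 << 22 and shift = 22, the same adjustment changes the rate by only about one part in four million: demo_cyc2ns(1000, (1u << 22) - 1, 22) yields 999 rather than 0.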
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0fc4969b86. It was
always meant to be temporary, but it's generating more useless noise
than anything else, and we probably should never have done it in the
generic kernel (only had the people involved test it on their own).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some cleanup with whitespace/tab at the end of lines.
Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Reorganize the code that initializes mmc_block's bounce buffer in
order to avoid warnings when MMC_BLOCK_BOUNCE isn't used.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
This patch fixes the following section mismatch warnings
...
WARNING: vmlinux.o(.init.text+0x29d40): Section mismatch: reference to .exit.text:wbsd_release_resources (between 'wbsd_init' and 'wbsd_probe')
WARNING: vmlinux.o(.init.text+0x29d49): Section mismatch: reference to .exit.text:wbsd_free_mmc (between 'wbsd_init' and 'wbsd_probe')
WARNING: vmlinux.o(.init.text+0x29f28): Section mismatch: reference to .exit.text:wbsd_free_mmc (between 'wbsd_init' and 'wbsd_probe')
...
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
refine the granularity of negative nice level tasks: let them
reschedule more often to offset the effect of them consuming
their wait_runtime proportionately slower. (This makes nice-0
task scheduling smoother in the presence of negatively
reniced tasks.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the key has to be rescaled to /weight even if it has a positive value.
(this change only affects the scheduling of reniced tasks)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
round a tiny bit better in high-frequency rescheduling scenarios,
by rounding around zero instead of rounding down.
(this is pretty theoretical though)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
optimize update_rq_clock() calls in the load-balancer: update them
right after locking the runqueue(s) so that the pull functions do
not have to call it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
optimize activate_task() by removing update_rq_clock() from it.
(and add update_rq_clock() to all callsites of activate_task() that
did not have it before.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clean up set_curr_task_fair().
( identity transformation that causes no change in functionality. )
   text    data     bss     dec     hex filename
  39170    3750      36   42956    a7cc sched.o.before
  39170    3750      36   42956    a7cc sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove __update_rq_clock() call from entity_tick().
no change in functionality because scheduler_tick() already calls
__update_rq_clock().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the __update_rq_clock() call from update_cpu_load() to
scheduler_tick().
( identity transformation that causes no change in functionality. )
this allows the direct use of rq->clock in ->task_tick() functions.
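A simplified sketch of the resulting call pattern, assuming the clock is refreshed once per tick and the per-class tick handler then reads rq->clock directly instead of receiving a 'now' argument. The demo_rq structure and the demo_* functions are hypothetical, not the scheduler's own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced runqueue. */
struct demo_rq {
    uint64_t clock;        /* updated once per tick */
    uint64_t exec_start;   /* when the current task was last accounted */
    uint64_t sum_exec;     /* accumulated runtime of the current task */
};

static void demo_update_rq_clock(struct demo_rq *rq, uint64_t hw_now)
{
    rq->clock = hw_now;
}

/* Tick handler: no 'u64 now' parameter, it uses rq->clock. */
static void demo_task_tick(struct demo_rq *rq)
{
    assert(rq->clock >= rq->exec_start);
    rq->sum_exec += rq->clock - rq->exec_start;
    rq->exec_start = rq->clock;
}

/* The clock update happens here, before any tick handler runs. */
static void demo_scheduler_tick(struct demo_rq *rq, uint64_t hw_now)
{
    demo_update_rq_clock(rq, hw_now);
    demo_task_tick(rq);
}
```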
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from sched_debug.c:print_task()/_rq().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
final step: remove all (now superfluous) 'u64 now' variables.
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from deactivate_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dequeue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from enqueue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dec_nr_running().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from inc_nr_running().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dec_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from inc_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_curr_load().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->task_new().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->put_prev_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from pick_next_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->pick_next_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->dequeue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from ->enqueue_task().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from update_curr_rt().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from put_prev_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from pick_next_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from set_next_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the 'u64 now' parameter from dequeue_entity().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>