Commit Graph

40454 Commits

Author SHA1 Message Date
Mel Gorman
0c6cb97463 [PATCH] Calculation fix for memory holes beyong the end of physical memory
absent_pages_in_range() made the assumption that users of the
arch-independent zone-sizing API would not care about holes beyound the end
of physical memory.  This was not the case and was "fixed" in a patch
called "Account for holes that are outside the range of physical memory".
However, when given a range that started before a hole in "real" memory and
ended beyond the end of memory, it would get the result wrong.  The bug is
in mainline but a patch is below.

It has been tested successfully on a number of machines and architectures.
Additional credit to Keith Mannthey for discovering the problem, helping
identify the correct fix and confirming it Worked For Him.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: keith mannthey <kmannth@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Alan Stern
057647fc47 [PATCH] workqueue: update kerneldoc
This patch (as812) changes the kerneldoc comments explaining the return
values from queue_work(), queue_delayed_work(), and
queue_delayed_work_on().  The updated comments explain more accurately the
meaning of the return code and avoid suggesting that a 0 value means the
routine was unsuccessful.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Alan Cox
c333526f48 [PATCH] JMB 368 PATA detection
The Jmicron JMB368 is PATA only so has the PATA on function zero.  Don't
therefore skip function zero on this device when probing

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Satoru Takeuchi
8fa1d7d3b2 [PATCH] cpu-hotplug: release `workqueue_mutex' properly on CPU hot-remove
_cpu_down() acquires `workqueue_mutex' on its process, but doen't release it
if __cpu_disable() fails.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Jim Houston
bb1d860551 [PATCH] time_adjust cleared before use
I notice that the code which implements adjtime clears the time_adjust
value before using it.  The attached patch makes the obvious fix.

Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Jim Houston <jim.houston@ccur.com>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Randy Dunlap
eba6cd6714 [PATCH] move SYS_HYPERVISOR inside the Generic Driver menu
Put SYS_HYPERVISOR inside the Generic Driver Config menu where it should
be.  Otherwise xconfig displays it as a dangling (lost) menu item under
Device Drivers, all by itself (when all options are displayed).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <holzheu@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Oleg Nesterov
d7c3f5f231 [PATCH] fill_tgid: cleanup delays accounting
fill_tgid() should skip not only an already exited group leader.  If the
task has ->exit_state != 0 it already did exit_notify(), so it also did
fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it
to avoid a double accounting.

This patch doesn't close the race completely, but it cleanups the code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:55 -07:00
Oleg Nesterov
a98b609426 [PATCH] taskstats: don't use tasklist_lock
Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe.
->siglock allows us to traverse subthread without tasklist.

Q: delay accounting looks wrong to me.  If sub-thread has already called
taskstats_exit_send() but didn't call release_task(self) yet it will be
accounted twice.  The window is big.  No?

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Oleg Nesterov
b8534d7bd8 [PATCH] taskstats: kill ->taskstats_lock in favor of ->siglock
signal_struct is (mostly) protected by ->sighand->siglock, I think we don't
need ->taskstats_lock to protect ->stats.  This also allows us to simplify the
locking in fill_tgid().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Oleg Nesterov
17b02695b2 [PATCH] taskstats_tgid_alloc: optimization
Every subthread (except first) does unneeded kmem_cache_alloc/kmem_cache_free.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Oleg Nesterov
093a8e8aec [PATCH] taskstats_tgid_free: fix usage
taskstats_tgid_free() is called on copy_process's error path. This is wrong.

	IF (clone_flags & CLONE_THREAD)
		We should not clear ->signal->taskstats, current uses it,
		it probably has a valid accumulated info.
	ELSE
		taskstats_tgid_init() set ->signal->taskstats = NULL,
		there is nothing to free.

Move the callsite to __exit_signal(). We don't need any locking, entire
thread group is exiting, nobody should have a reference to soon to be
released ->signal.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Oleg Nesterov
05d5bcd60e [PATCH] bacct_add_tsk: fix unsafe and wrong parent/group_leader dereference
1. ts = timespec_sub(uptime, current->group_leader->start_time);

   It is possible that current != tsk. Probably it was supposed
   to be 'tsk->group_leader->start_time. But why we are reading
   group_leader's start_time ? This accounting is per thread,
   not per procees, I changed this to 'tsk->start_time.
   Please corect me.

2. stats->ac_ppid = (tsk->parent) ? tsk->parent->pid : 0;

   tsk->parent never == NULL, and it is unsafe to dereference it.
   Both the task and it's parent may exit after the caller unlocks
   tasklist_lock, the memory could be unmapped (DEBUG_SLAB).
   (And we should use ->real_parent->tgid in fact).

Q: I don't understand the 'if (thread_group_leader(tsk))' check.
Why it is needed ?

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Oleg Nesterov
fca178c0c6 [PATCH] fill_tgid: fix task_struct leak and possible oops
1. fill_tgid() forgets to do put_task_struct(first).

2. release_task(first) can happen after fill_tgid() drops tasklist_lock,
   it is unsafe to dereference first->signal.

This is a temporary fix, imho the locking should be reworked.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Michael Holzheu
6e6d9fa6f9 [PATCH] strstrip remove last blank fix
strstrip() does not remove the last blank from strings which only consist
of blanks.

Example:
char string[] = "  ";
strstrip(string);

results in " ", but should produce an empty string!

The following patch solves this problem:

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by Joern Engel <joern@wh.fh-wedel.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
Stephen Rothwell
5fa3839a64 [PATCH] Constify compat_get_bitmap argument
This means we can call it when the bitmap we want to fetch is declared
const.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:54 -07:00
David Howells
f87135762d [PATCH] VFS: Fix an error in unused dentry counting
With Vasily Averin <vvs@sw.ru>

Fix an error in unused dentry counting in shrink_dcache_for_umount_subtree()
in which the count is modified without the dcache_lock held.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Vasily Averin <vvs@sw.ru>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Vasily Averin
6eac3f93f5 [PATCH] missing unused dentry in prune_dcache()?
On the the following patch:
http://linux.bkbits.net:8080/linux-2.6/gnupatch@449b144ecSF1rYskg3q-SeR2vf88zg

# ChangeSet
#   2006/06/22 15:05:57-07:00 neilb@suse.de
#   [PATCH] Fix dcache race during umount

#   If prune_dcache finds a dentry that it cannot free, it leaves it where it
#   is (at the tail of the list) and exits, on the assumption that some other
#   thread will be removing that dentry soon.

However as far as I see this comment is not correct: when we cannot take
s_umount rw_semaphore (for example because it was taken in do_remount) this
dentry is already extracted from dentry_unused list and we do not add it
into the list again.  Therefore dentry will not be found by prune_dcache()
and shrink_dcache_sb() and will leave in memory very long time until the
partition will be unmounted.

The patch adds this dentry into tail of the dentry_unused list.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Cc: Neil Brown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Hugh Dickins
ebed4bfc8d [PATCH] hugetlb: fix absurd HugePages_Rsvd
If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative".  Reinstate my
preliminary i_size check before attempting to allocate the page (though this
only fixes the most obvious case: more work will be needed here).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Hugh Dickins
856fc29505 [PATCH] hugetlb: fix prio_tree unit
hugetlb_vmtruncate_list was misconverted to prio_tree: its prio_tree is in
units of PAGE_SIZE (PAGE_CACHE_SIZE) like any other, not HPAGE_SIZE (whereas
its radix_tree is kept in units of HPAGE_SIZE, otherwise slots would be
absurdly sparse).

At first I thought the error benign, just calling __unmap_hugepage_range on
more vmas than necessary; but on 32-bit machines, when the prio_tree is
searched correctly, it happens to ensure the v_offset calculation won't
overflow.  As it stood, when truncating at or beyond 4GB, it was liable to
discard pages COWed from lower offsets; or even to clear pmd entries of
preceding vmas, triggering exit_mmap's BUG_ON(nr_ptes).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Hugh Dickins
b9d7e6ae82 [PATCH] hugetlb: fix size=4G parsing
On 32-bit machines, mount -t hugetlbfs -o size=4G gave a 0GB filesystem,
size=5G gave a 1GB filesystem etc: there's no point in masking size with
HPAGE_MASK just before shifting its lower bits away, and since HPAGE_MASK is a
UL, that removed all the higher bits of the unsigned long long size.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Randy Dunlap
7b92aadfda [PATCH] cciss: fix printk format warning
Fix printk format warnings:
drivers/block/cciss.c:2000: warning: long long int format, long unsigned int arg (arg 2)
drivers/block/cciss.c:2035: warning: long long int format, long unsigned int arg (arg 2)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Randy Dunlap
760fe9ad16 [PATCH] ioc4: fix printk format warning
Fix printk format warning:
drivers/misc/ioc4.c:213: warning: long long int format, u64 arg (arg 3)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Jan Dittmer
1d4d262769 [PATCH] Add missing space in module.c for taintskernel
Obvious fix.

Signed-off-by: Jan Dittmer <jdi@l4x.org>
Acked-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:53 -07:00
Andrey Panin
08d892f11a [PATCH] visws build fix
Fix this:

> Subject    : CONFIG_X86_VISWS=3Dy, CONFIG_SMP=3Dn compile error
> References : http://lkml.org/lkml/2006/10/7/51
> Submitter  : Jesper Juhl <jesper.juhl@gmail.com>
> Caused-By  : David Howells <dhowells@redhat.com>
>              commit 7d12e780e0
> Status     : unknown

Via undescribed means.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:52 -07:00
Giridhar Pemmasani
52fd24ca1d [PATCH] __vmalloc with GFP_ATOMIC causes 'sleeping from invalid context'
If __vmalloc is called to allocate memory with GFP_ATOMIC in atomic
context, the chain of calls results in __get_vm_area_node allocating memory
for vm_struct with GFP_KERNEL, causing the 'sleeping from invalid context'
warning.  This patch fixes it by passing the gfp flags along so
__get_vm_area_node allocates memory for vm_struct with the same flags.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:52 -07:00
Pavel Emelianov
6a2aae06cc [PATCH] Fix potential OOPs in blkdev_open()
blkdev_open() calls bc_acquire() to get a struct block_device.  Since
bc_acquire() may return NULL when system is out of memory an appropriate
check is required.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:52 -07:00
Yasunori Goto
f2d0aa5bf8 [PATCH] memory hotplug: __GFP_NOWARN is better for __kmalloc_section_memmap()
Add __GFP_NOWARN flag to calling of __alloc_pages() in
__kmalloc_section_memmap().  It can reduce noisy failure message.

In ia64, section size is 1 GB, this means that order 8 pages are necessary
for each section's memmap.  It is often very hard requirement under heavy
memory pressure as you know.  So, __alloc_pages() gives up allocation and
shows many noisy stack traces which means no page for each sections.
(Current my environment shows 32 times of stack trace....)

But, __kmalloc_section_memmap() calls vmalloc() after failure of it, and it
can succeed allocation of memmap.  So, its stack trace warning becomes just
noisy.  I suppose it shouldn't be shown.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:52 -07:00
Randy Dunlap
969b755aad [PATCH] md: fix printk format warnings, seen on powerpc64:
drivers/md/raid1.c:1479: warning: long long unsigned int format, long unsigned int arg (arg 4)
drivers/md/raid10.c:1475: warning: long long unsigned int format, long unsigned int arg (arg 4)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:52 -07:00
NeilBrown
750a8f3e8f [PATCH] md: fix up maintenance of ->degraded in multipath
A recent fix which made sure ->degraded was initialised properly exposed a
second bug - ->degraded wasn't been updated when drives failed or were
hot-added.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
NeilBrown
01ab5662f5 [PATCH] md: simplify checking of available size when resizing an array
When "mdadm --grow --size=xxx" is used to resize an array (use more or less of
each device), we check the new siza against the available space in each
device.

We already have that number recorded in rdev->size, so calculating it is
pointless (and wrong in one obscure case).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
NeilBrown
2b6e845986 [PATCH] md: fix bug where spares don't always get rebuilt properly when they become live
If save_raid_disk is >= 0, then the device could be a device that is already
in sync that is being re-added.  So we need to default this value to -1.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
bibo,mao
ae74589cb3 [PATCH] fix efi_memory_present_wrapper()
efi_memory_present_wrapper() parameter start/end is physical address, but
function memory_present parameter is PFN, this patch converts physical
address to PFN.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
Eric Sandeen
9b57988db9 [PATCH] jbd2: journal_dirty_data re-check for unmapped buffers
When running several fsx's and other filesystem stress tests, we found
cases where an unmapped buffer was still being sent to submit_bh by the
ext3 dirty data journaling code.

I saw this happen in two ways, both related to another thread doing a
truncate which would unmap the buffer in question.

Either we would get into journal_dirty_data with a bh which was already
unmapped (although journal_dirty_data_fn had checked for this earlier, the
state was not locked at that point), or it would get unmapped in the middle
of journal_dirty_data when we dropped locks to call sync_dirty_buffer.

By re-checking for mapped state after we've acquired the bh state lock, we
should avoid these races.  If we find a buffer which is no longer mapped,
we essentially ignore it, because journal_unmap_buffer has already decided
that this buffer can go away.

I've also added tracepoints in these two cases, and made a couple other
tracepoint changes that I found useful in debugging this.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
Eric Sandeen
f58a74dca8 [PATCH] jbd: journal_dirty_data re-check for unmapped buffers
When running several fsx's and other filesystem stress tests, we found
cases where an unmapped buffer was still being sent to submit_bh by the
ext3 dirty data journaling code.

I saw this happen in two ways, both related to another thread doing a
truncate which would unmap the buffer in question.

Either we would get into journal_dirty_data with a bh which was already
unmapped (although journal_dirty_data_fn had checked for this earlier, the
state was not locked at that point), or it would get unmapped in the middle
of journal_dirty_data when we dropped locks to call sync_dirty_buffer.

By re-checking for mapped state after we've acquired the bh state lock, we
should avoid these races.  If we find a buffer which is no longer mapped,
we essentially ignore it, because journal_unmap_buffer has already decided
that this buffer can go away.

I've also added tracepoints in these two cases, and made a couple other
tracepoint changes that I found useful in debugging this.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
Randy Dunlap
1939e49a0c [PATCH] ext4: fix printk format warnings
fs/ext4/resize.c:72: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:76: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:81: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:85: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:89: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:89: warning: long long unsigned int format, __u64 arg (arg 5)
fs/ext4/resize.c:93: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:93: warning: long long unsigned int format, __u64 arg (arg 5)
fs/ext4/resize.c:98: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:103: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:109: warning: long long unsigned int format, __u64 arg (arg 4)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
Martin Bligh
bbdb396a60 [PATCH] Use min of two prio settings in calculating distress for reclaim
If try_to_free_pages / balance_pgdat are called with a gfp_mask specifying
GFP_IO and/or GFP_FS, they will reclaim the requisite number of pages, and the
reset prev_priority to DEF_PRIORITY (or to some other high (ie: unurgent)
value).

However, another reclaimer without those gfp_mask flags set (say, GFP_NOIO)
may still be struggling to reclaim pages.  The concurrent overwrite of
zone->prev_priority will cause this GFP_NOIO thread to unexpectedly cease
deactivating mapped pages, thus causing reclaim difficulties.

Fix this is to key the distress calculation not off zone->prev_priority, but
also take into account the local caller's priority by using
min(zone->prev_priority, sc->priority)

Signed-off-by: Martin J. Bligh <mbligh@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
Martin Bligh
3bb1a852ab [PATCH] vmscan: Fix temp_priority race
The temp_priority field in zone is racy, as we can walk through a reclaim
path, and just before we copy it into prev_priority, it can be overwritten
(say with DEF_PRIORITY) by another reclaimer.

The same bug is contained in both try_to_free_pages and balance_pgdat, but
it is fixed slightly differently.  In balance_pgdat, we keep a separate
priority record per zone in a local array.  In try_to_free_pages there is
no need to do this, as the priority level is the same for all zones that we
reclaim from.

Impact of this bug is that temp_priority is copied into prev_priority, and
setting this artificially high causes reclaimers to set distress
artificially low.  They then fail to reclaim mapped pages, when they are,
in fact, under severe memory pressure (their priority may be as low as 0).
This causes the OOM killer to fire incorrectly.

From: Andrew Morton <akpm@osdl.org>

__zone_reclaim() isn't modifying zone->prev_priority.  But zone->prev_priority
is used in the decision whether or not to bring mapped pages onto the inactive
list.  Hence there's a risk here that __zone_reclaim() will fail because
zone->prev_priority ir large (ie: low urgency) and lots of mapped pages end up
stuck on the active list.

Fix that up by decreasing (ie making more urgent) zone->prev_priority as
__zone_reclaim() scans the zone's pages.

This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created.  It should be
possible to remove that now, and to just start out at DEF_PRIORITY?

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:50 -07:00
Nick Piggin
2ae88149a2 [PATCH] mm: clean up pagecache allocation
- Consolidate page_cache_alloc

- Fix splice: only the pagecache pages and filesystem data need to use
  mapping_gfp_mask.

- Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:50 -07:00
Takashi Ohmasa
e0f205d9c6 [ARM] 3900/1: Fix VFP Division by Zero exception handling.
The SIGFPE signal should be generated if Division by Zero exception is detected.

Signed-off-by: Takashi Ohmasa <ohmasa.takashi@jp.panasonic.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Takashi Ohmasa
e816d71a50 [ARM] 3899/1: Fix the normalization of the denormal double precision number.
The significand should be shifted until the value of bit [62] is 1
to normalize the denormal double number.

Signed-off-by: Takashi Ohmasa <ohmasa.takashi@jp.panasonic.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Kevin Hilman
75e31aaaf4 [ARM] 3909/1: Disable UWIND_INFO for ARM (again)
According to Daniel Jacobowitz, UNWIND_INFO is not useful on ARM, and
in fact doesn't even compile.

This patch disables the option for ARM.

Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Russell King
9957329800 [ARM] Add __must_check to uaccess functions
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Russell King
a233bf9ee8 [ARM] Add realview SMP default configuration
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Russell King
c97d4869a2 [ARM] Fix SMP irqflags support
The IRQ changes a while back broke the build for SMP machines.
Fix up the SMP code to use set_irq_regs/get_irq_regs as
appropriate.  Also, fix a warning in arch/arm/kernel/time.c
where 'regs' becomes unused for SMP builds.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-28 10:15:31 +01:00
Linus Torvalds
858cbcdd4f Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC]: Fix bus_id[] string overflow.
2006-10-27 15:36:21 -07:00
Linus Torvalds
fe31eb6797 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
  PCI: Remove quirk_via_abnormal_poweroff
  PCI: reset pci device state to unknown state for resume
  PCI: x86-64: mmconfig missing printk levels
  PCI: fix pci_fixup_video as it blows up on sparc64
  acpiphp: fix latch status
2006-10-27 15:35:28 -07:00
Jesper Juhl
efbfe96c5d [PATCH] silence 'make xmldocs' warning by adding missing description of 'raw' in nand_base.c:1485
Add description of 'raw' in comments for
drivers/mtd/nand/nand_base.c::nand_write_page_syndrome() so 'make xmldocs'
will not spew a warning at us.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-27 15:34:51 -07:00
Andrew Morton
735a7ffb73 [PATCH] drivers: wait for threaded probes between initcall levels
The multithreaded-probing code has a problem: after one initcall level (eg,
core_initcall) has been processed, we will then start processing the next
level (postcore_initcall) while the kernel threads which are handling
core_initcall are still executing.  This breaks the guarantees which the
layered initcalls previously gave us.

IOW, we want to be multithreaded _within_ an initcall level, but not between
different levels.

Fix that up by causing the probing code to wait for all outstanding probes at
one level to complete before we start processing the next level.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-27 15:34:51 -07:00
Andrew Morton
61ce1efe6e [PATCH] vmlinux.lds: consolidate initcall sections
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.

This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-27 15:34:51 -07:00
Karsten Wiese
3560cc5ec3 PCI: Remove quirk_via_abnormal_poweroff
My K8T800 mobo resumes fine from suspend to ram with and without patch
applied against 2.6.18.

quirk_via_abnormal_poweroff makes some boards not boot 2.6.18, so IMO patch
should go to head, 2.6.18.2 and everywhere "ACPI: ACPICA 20060623" has been
applied.


Remove quirk_via_abnormal_poweroff

Obsoleted by "ACPI: ACPICA 20060623":
<snip>
    Implemented support for "ignored" bits in the ACPI
    registers.  According to the ACPI specification, these
    bits should be preserved when writing the registers via
    a read/modify/write cycle. There are 3 bits preserved
    in this manner: PM1_CONTROL[0] (SCI_EN), PM1_CONTROL[9],
    and PM1_STATUS[11].
    http://bugzilla.kernel.org/show_bug.cgi?id=3691
</snip>

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Cc: Bob Moore <robert.moore@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-10-27 11:20:33 -07:00