It turns out that we can't stop the watchdog from
triggering here. If we touch the timer (which just uses the current jiffie
value) before we enable interrupts, it does nothing because jiffies
are not mass-updated until after we enable interrupts. If we touch the
timer after we enable interrupts, its too late because the softlockup
watchdog will already have triggered. The touch_softlockup_watchdog
call removed below does nothing.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to prod everyone here since this is the only CPU that is
guaranteed to be running after the ibm,suspend-me RTAS call returns.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Correctly return the status from the RTAS call. rtas_call expects
to return the status as a return value.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch/powerpc/kernel/rtas.c is getting hvcall.h via spinlock.h, but when we're
building for UP we don't include spinlock.h.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This addresses two items, which are unlikely to be hit if we
trust drivers.
The first is moving a memory barrier below where the vmerged SG count
is passed back, but before the list is set to end. If those
instructions were reordered, there could be an issue in iommu_unmap_sg().
The second is making sure we terminate the list on the failure case of
iommu_map_sg(). If a driver does not look at the failure return code,
it could pass a ill-formed SG list to iommu_unmap_sg().
Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
You can't boot a kdump kernel via OF, not reliably anyway, the kernel being at
32 MB conflicts with the zImage wrapper etc. and it blows up.
It's trivial to check in prom_init though, and this is early enough that we can
actually drop back to OF where a reset-all will get you going again, which is
kinda nice. I think this should go in for 2.6.16.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To prevent problems later in boot, make sure we don't create zero-size lmb
regions.
I've checked all the callers, and at the moment no one should ever hit this.
All callers use a constant size, or they check the computed size before they
call us.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In prom.c we run finish_node() on allnodes twice. The first time we just
calculate how much memory we'll need, the second time we do the actual work.
If the calculation stage determines that we need 0 bytes, then we should skip
the lmb allocation. Although an alloc of zero will work, it has been seen to
lead to a BUG_ON() in reserve_bootmem() on at least one machine.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The last two 8MB TLB entries are being incorrectly set by initial_mmu on 8xx.
The first entry is written with the same virtual/physical address, which
renders it invalid:
BDI>rms 792 0x00001e00
BDI>rms 824 1
BDI>rds 824
SPR 824 : 0xc08000c0 -1065353024
BDI>rds 825
SPR 825 : 0xc0800de0 -1065349664
BDI>rds 826
SPR 826 : 0x00000000 0
And the second entry, in addition, does not have its TLB index set
correctly.
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is a small fix to get the spufs init sequence right.
init_spu_base() in spu_base.c should be called (via
module_init(init_spu_base)) before spufs_init() (via
module_init(spufs_init)) in spufs/inode.c gets called.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When loading up the FPU, we were using a 'ld' (load doubleword)
instruction to get the FP exception mode from the thread_struct,
but it's only an int field. This changes the ld to lwz (load
word and zero-extend).
Signed-off-by: Paul Mackerras <paulus@samba.org>
I thought we had fixed up all non-gpl USB drivers, and was wrong to do
this.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The non-NUMA case would do an unmatched "free_alien_cache()" on an alien
pointer that had never been allocated.
It might not matter from a code generation standpoint (since in the
non-NUMA case, the code doesn't actually _do_ anything), but it not only
results in a compiler warning, it's really really ugly too.
Fix the compiler warning by just having a matching dummy allocation.
That also avoids an unnecessary #ifdef in the code.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On a system where libintl.h is present, but the NLS functionality is
supplied by a separate library instead of the system C library, an attempt
to "make config" or "make menuconfig" will fail with link errors, ex:
scripts/kconfig/mconf.o:mconf.c:(.text+0xf63): undefined reference to
`_libintl_gettext'
This patch attempts to correct the problem by detecting whether or not NLS
support requires linking with libintl.
Signed-off-by: Samuel J Robb <sam.robb@timesys.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Due to the usage of set_64bit in include/asm-i386/pgtable-3level.h,
HIGHMEM64G must depend on X86_CMPXCHG64.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix gcc4.1 compile warnings "value computed is not used" with
set_current_state() and set_task_state() on i386/SMP and x86-64.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Show first field of kernel version in register dumps like x86_64 does.
Changes output from e.g.:
(2.6.16-rc1)
to:
(2.6.16-rc1 #12)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 CPU init code accesses freed init memory when booting a newly-started
processor after CPU hotplug. The cpu_devs array is searched to find the
vendor and it contains pointers to freed data.
Fix that by:
1. Zeroing entries for freed vendor data after bootup.
2. Changing Transmeta, NSC and UMC to all __init[data].
3. Printing a warning (once only) and setting this_cpu
to a safe default when the vendor is not found.
This does not change behavior for AMD systems. They were broken already
but no error was reported.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
(for instance NFS) in order to determine whether or not it is looking up
the last component of the path. It this is the case, it may have to look
at the intent information in order to perform various tasks such as atomic
open.
A problem currently occurs when link_path_walk() hits a symlink. In this
case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
path passed by __vfs_follow_link() (i.e. the end of the symlink path)
rather than when we hit the end of the path passed by the user.
The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
if that flag was unset when we entered the function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes locking and bugs in cpu_down and cpu_up paths of the NUMA slab
allocator. Sonny Rao <sonny@burdell.org> reported problems sometime back on
POWER5 boxes, when the last cpu on the nodes were being offlined. We could
not reproduce the same on x86_64 because the cpumask (node_to_cpumask) was not
being updated on cpu down. Since that issue is now fixed, we can reproduce
Sonny's problems on x86_64 NUMA, and here is the fix.
The problem earlier was on CPU_DOWN, if it was the last cpu on the node to go
down, the array_caches (shared, alien) and the kmem_list3 of the node were
being freed (kfree) with the kmem_list3 lock held. If the l3 or the
array_caches were to come from the same cache being cleared, we hit on
badness.
This patch cleans up the locking in cpu_up and cpu_down path. We cannot
really free l3 on cpu down because, there is no node offlining yet and even
though a cpu is not yet up, node local memory can be allocated for it. So l3s
are usually allocated at keme_cache_create and destroyed at
kmem_cache_destroy. Hence, we don't need cachep->spinlock protection to get
to the cachep->nodelist[nodeid] either.
Patch survived onlining and offlining on a 4 core 2 node Tyan box with a 4
dbench process running all the time.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier, we had to disable on chip interrupts while taking the
cachep->spinlock because, at cache_grow, on every addition of a slab to a slab
cache, we incremented colour_next which was protected by the cachep->spinlock,
and cache_grow could occur at interrupt context. Since, now we protect the
per-node colour_next with the node's list_lock, we do not need to disable on
chip interrupts while taking the per-cache spinlock, but we just need to
disable interrupts when taking the per-node kmem_list3 list_lock.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
colour_next is used as an index to add a colouring offset to a new slab in the
cache (colour_off * colour_next). Now with the NUMA aware slab allocator, it
makes sense to colour slabs added on the same node sequentially with
colour_next.
This patch moves the colouring index "colour_next" per-node by placing it on
kmem_list3 rather than kmem_cache.
This also helps simplify locking for CPU up and down paths.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just spent some time researching a Bus Error. Turns out that the huge
page fault handler can return VM_FAULT_SIGBUS for various conditions where
no huge page is available.
Add a note explaining the reasoning in the source.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ben points out that:
When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
significant drop in throughput as the disk sits idle. The patch below
results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
IDE test box) when writing out files using O_SYNC.
So optimise the batching code by omitting it entirely if the process which is
doing a sync write is the same as the one which did the most recent sync
write. If that's true, we're unlikely to get any other processes joining the
transaction.
(Has been in -mm for ages - it took me a long time to get on to performance
testing it)
Numbers, on write-cache-disabled IDE:
/usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name
Unpatched:
40 seconds
Patched:
35 seconds
Batching disabled:
35 seconds
This is the problematic single-process-doing-fsync case. With multiple
fsyncing processes the numbers are AFACIT unaltered by the patch.
Aside: performance testing and instrumentation shows that the transaction
batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
dir-name on non-writeback-caching IDE). This is because by the time one
process is running a synchronous commit, a bunch of other processes already
have a transaction handle open, so they're all going to batch into the same
transaction anyway.
The batching seems to offer maybe 5-10% speedup with this workload, but I'm
pretty sure it was more important than that when it was first developed 4-odd
years ago...
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_REISERFS_FS_XATTR=y, CONFIG_REISERFS_FS_POSIX_ACL=n:
fs/reiserfs/xattr.c: In function `reiserfs_check_acl':
fs/reiserfs/xattr.c:1330: called object is not a function
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dump_stack() on page allocation failure presently has an irritating habit
of shouting just "====" at everyone: please stop it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make SELinux depend on SECURITY_NETWORK (which depends on SECURITY), as it
requires the socket hooks for proper operation even in the local case.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It may suck something awful, but it shouldn't taint the kernel.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pktcdvd driver uses a compile time macro constant to define the maximum
supported packet length. I changed this from 32 sectors to 128 sectors
because that allows over 100 MB of additional usable space on a 700 MB cdrw,
and increases throughput.
Note that you need a modified cdrwtool program that can format a CDRW disc
with larger packets to benefit from this change.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allocate memory for read-gathering at open time, when it is known just how
much memory is needed. This avoids wasting kernel memory when the real packet
size is smaller than the maximum packet size supported by the driver. This is
always the case when using DVD discs.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Unless the help text is outdated, this seems to be logical.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The version information is not useful for a driver that is maintained in
Linus' kernel tree.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pktcdvd driver was using an 8 bit field to store the packet length
obtained from the disc track info. This causes it to overflow packet length
values of 128KB or more. I changed the field to 32 bits to fix this.
The pktcdvd driver defaulted to its maximum allowed packet length when it
detected a 0 in the track info field. I changed this to fail the operation
and refuse to access the media. This seems more sane than attempting to
access it with a value that almost certainly will not work.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
migration_cost prints after every CPU hotplug event. Make it print only
once at boot.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last fix for this function in fact opened up a much more often
triggering race.
It was uncommented tricky code, that was buggy. Add comment, make it less
tricky and fix bug.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix http://bugzilla.kernel.org/show_bug.cgi?id=5923
When a scsi command failed, an oops would result.
Back-to-back SMART queries would make the Seagate drives unhappy. The
second SMART query would timeout, and the command would be aborted.
Acked-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Kenny Simpson <theonetruekenny@yahoo.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
q->ordcolor must not be flipped on SOFTBARRIER.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously, if a fs request which was being drained failed and got
requeued, blk_do_ordered() didn't allow it to be reissued, which causes
queue stall. This patch makes blk_do_ordered() use the sequence of each
request to determine whether a request can be issued or not. This fixes
the bug and simplifies code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 10f4dc8b27.
Quoth Andi Kleen:
"Kiran decided that it makes the problem worse than it was before.
Fixing it fully requires more work which is too much for 2.6.16. So
please revert that commit for now."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After DNAT the original dst_entry needs to be released if present
so the packet doesn't skip input routing with its new address. The
current check for DNAT in ip_nat_in is reversed and checks for SNAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv4 and IPv6 version of the policy match are identical besides address
comparison and the data structure used for userspace communication. Unify
the data structures to break compatiblity now (before it is released), so
we can port it to x_tables in 2.6.17.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix two bugs in ip6t_policy address matching:
- misorder arguments to ip6_masked_addrcmp, mask must be the second argument
- inversion incorrectly applied to the entire expression instead of just
the address comparison
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>