* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix timekeeping on PowerPC 601
[POWERPC] Don't expose clock vDSO functions when CPU has no timebase
[POWERPC] spusched: Fix null pointer dereference in find_victim
Add a workaround to address warnings generated on the "n" constraint by
GCC 3.3 and below.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map(). Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.
Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.
Default system policy should not require additional ref counting, nor
should the current task's task policy. However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.
This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy. Again, shared policy is already reference counted on lookup. A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy. We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.
Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy. In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage(). The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.
Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:
w/o patch w/ refcount patch
Avg Std Devn Avg Std Devn
Real: 100.59 0.38 100.63 0.43
User: 1209.60 0.37 1209.91 0.31
System: 81.52 0.42 81.64 0.34
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e. MUCH later.
On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.
Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting. The subsequent free_uid() will complete the
user_struct destruction.
For example simple program
#include <sched.h>
char stack[2 * 1024 * 1024];
int f(void *foo)
{
return 0;
}
int main(void)
{
clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
return 0;
}
run on kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.
This was spotted during OpenVZ kernel testing.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses
list_heads, thus occupying twice as much place as it could. Convert it to
hlist_heads.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to the timekeeping code broke support for the PowerPC 601
processor which doesn't have the usual timebase facility but a slightly
different thing called (yuck) the RTC.
This fixes it, boot tested on an old 601 based PowerMac 7200.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson pointed out that swapper_pg_dir actually need to be
PGD_TABLE_SIZE bytes long not PAGE_SIZE. This actually saves 64k in
the bss for a kernel ppc64_defconfig built with CONFIG_PPC_64K_PAGES.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Create a helper function (alloc_maybe_bootmem) that is marked __init_refok
to limit the chances of mistakenly referring to other __init routines.
WARNING: vmlinux.o(.text+0x2a9c4): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.update_dn_pci_info' and '.pci_dn_reconfig_notifier')
WARNING: vmlinux.o(.text+0x36430): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.mpic_msi_init_allocator' and '.find_ht_magic_addr')
WARNING: vmlinux.o(.text+0x5e804): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config')
WARNING: vmlinux.o(.text+0x5e8e8): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config')
WARNING: vmlinux.o(.text+0x5e968): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config')
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Low-power mode implementation for Lite5200b.
Some I/O registers are also saved here.
A recent U-Boot that supports this (lite5200b_PM_config) is needed.
Signed-off-by: Domen Puncer <domen.puncer@telargo.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rework spufs_coredump_extra_notes_write() to check for and return errors.
If we're coredumping to a pipe we can't trust file->f_pos, we need to
maintain the foffset value passed to us. The cleanest way to do this is
to have the low level write routine increment foffset when we've
successfully written.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To start with, arch_notes_size() etc. is a little too ambiguous a name for
my liking, so change the function names to be more explicit.
Calling through macros is ugly, especially with hidden parameters, so don't
do that, call the routines directly.
Use ARCH_HAVE_EXTRA_ELF_NOTES as the only flag, and based on it decide
whether we want the extern declarations or the empty versions.
Since we have empty routines, actually use them in the coredump code to
save a few #ifdefs.
We want to change the handling of foffset so that the write routine updates
foffset as it goes, instead of using file->f_pos (so that writing to a pipe
works). So pass foffset to the write routine, and for now just set it to
file->f_pos at the end of writing.
It should also be possible for the write routine to fail, so change it to
return int and treat a non-zero return as failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because spufs might be built as a module, we can't have other parts of the
kernel calling directly into it, we need stub routines that check first if the
module is loaded.
Currently we have two structures which hold callbacks for these stubs, the
syscalls are in spufs_calls and the coredump calls are in spufs_coredump_calls.
In both cases the logic for registering/unregistering is essentially the same,
so we can simplify things by combining the two.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_create and spu_run are wrapped by the cell syscall layer, so
we don't need the asmlinkage.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, a built-in spufs will not use the spufs_calls callbacks, but
directly call sys_spu_create. This saves us an indirect branch, but
means we have duplicated functions - one for CONFIG_SPU_FS=y and one for
=m.
This change unifies the spufs syscall path, and provides access to the
spufs_calls structure through a get/put pair. At present, the only user
of the spufs_calls structure is spu_syscalls.c, but this will facilitate
adding the coredump calls later.
Everyone likes numbers, right? Here's a before/after comparison with
CONFIG_SPU_FS=y, doing spu_create(); close(); 64k times.
Before:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.075s
user 0m0.146s
sys 0m23.925s
After:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.777s
user 0m0.141s
sys 0m24.631s
So, we're adding around 11us per syscall, at the benefit of having
only one syscall path.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Current status of APUS:
- arch/powerpc/: removed in 2.6.23
- arch/ppc/: marked BROKEN since 2 years
This therefore removes the remaining parts of APUS support from
arch/ppc, include/asm-ppc, arch/powerpc and include/asm-powerpc.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Warn user if cpu is ignored.
[SPARC64]: Fix lockdep, particularly on SMP.
[SPARC64]: Update defconfig.
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[VLAN]: Fix net_device leak.
[PPP] generic: Fix receive path data clobbering & non-linear handling
[PPP] generic: Call skb_cow_head before scribbling over skb
[NET] skbuff: Add skb_cow_head
[BRIDGE]: Kill clone argument to br_flood_*
[PPP] pppoe: Fill in header directly in __pppoe_xmit
[PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
[PPP] pppoe: Fix skb_unshare_check call position
[SCTP]: Convert bind_addr_list locking to RCU
[SCTP]: Add RCU synchronization around sctp_localaddr_list
[PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
[PKTGEN]: srcmac fix
[IPV6]: Fix source address selection.
[IPV4]: Just increment OutDatagrams once per a datagram.
[IPV6]: Just increment OutDatagrams once per a datagram.
[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
[NET_SCHED] protect action config/dump from irqs
[NET]: Fix two issues wrt. SO_BINDTODEVICE.
When CONFIG_ISA is disabled, the isa_driver support will not be compiled
in. Define stubs so that we don't get link-time errors.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.
This can be used in encapsulating paths where we only need to modify the
header. As it is, this can be used in PPPOE and bridging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU. This includes the
sctp_bind_addrs structure and it's list of bound addresses.
This list is currently protected by an external rw_lock and that
looks like an overkill. There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other. These are also relatively rare, so we
should be good with RCU.
The readers are varied and they are easily converted to RCU.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers. As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted by Al Viro, when we try to call prom_set_trap_table()
in the SMP trampoline code we try to take the PROM call spinlock
which doesn't work because the current thread pointer isn't
valid yet and lockdep depends upon that being correct.
Furthermore, we cannot set the current thread pointer register
because it can't be properly dereferenced until we return from
prom_set_trap_table(). Kernel TLB misses only work after that
call.
So do the PROM call to set the trap table directly instead of
going through the OBP library C code, and thus avoid the lock
altogether.
These calls are guarenteed to be serialized fully.
Since there are now no calls to the prom_set_trap_table{_sun4v}()
library functions, they can be deleted.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-xtensa.org/kernel/xtensa-feed:
[patch 1/2] Xtensa: enable arbitary tty speed setting ioctls
[patch 2/2] xtensa console.c: remove duplicate #include
[XTENSA] Add support for cache-aliasing
[XTENSA] Add kernel module support
[XTENSA] Add support for executable/non-executable feature in the mmu
[XTENSA] Use the generic version of get_order
[XTENSA] Initialize semaphore_wake_lock
[XTENSA] Add typecast macro for constants
[XTENSA] Fix timer instabilities.
[XTENSA] Fix fadvise64_64
[XTENSA] Remove extraneous include statement
[XTENSA] Move string-io functions to io.c from pci.c
[XTENSA] Move pre-initialized structures to init_task.c
[XTENSA] Add freestanding option to CFLAGS
[XTENSA] Add getpgrp system-call to unistd.h
[XTENSA] add missing system calls
[XTENSA] fix wrong usage of __init and __initdata in traps.c
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix some bugs in lib/string.h functions found by our string testing modules
Blackfin arch: fix the aliased write macros
Blackfin arch: Update/Fix PM support add new pm_ops valid
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] 20Kc: Disable use of WAIT instruction.
[MIPS] Workaround for 4Kc machine check exception
[MIPS] Malta: Fix off by one bug in interrupt handler.
[MIPS] No ide_default_io_base() if PCI IDE was not found
[MIPS] Add #include <linux/profile.h> to arch/mips/kernel/time.c
[MIPS] N32 needs to use compat_sys_futimesat
[MIPS] rtlx: Fix build error.
[MIPS] rtlx: fix int vs. long bug.
Revert b543858209 and add
no_pci_devices() check to avoid crash due to early calling of
pci_get_class().
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The current definition of struct ccsr_guts in immap_86xx.h was for 85xx.
This patch fixes that and replaces the vague integer types with sized types
of the correct endianness. The unused struct ccsr_pci is also deleted.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch adds the clrsetbits_xxx() macros, which are used to set and clear
multiple bits in a single read-modify-write operation. Specify the bits to
clear in the 'clear' parameter and the bits to set in the 'set' parameter.
These macros can also be used to set a multiple-bit bit pattern using a mask,
by specifying the mask in the 'clear' parameter and the new bit pattern in the
'set' parameter. There are big-endian and little-endian versions for 8, 16,
32, and 64 bits.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is needed to configure and control QE pario pins from the kernel.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
These I/O accessors will be used in code under drivers/,
which is expected to still work in arch/ppc.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We get warnings like the following from the various ppc32 head*.S files:
WARNING: vmlinux.o(.text+0x358): Section mismatch: reference to .init.text:early_init (between 'skpinv' and 'interrupt_base')
WARNING: vmlinux.o(.text+0x380): Section mismatch: reference to .init.text:machine_init (between 'skpinv' and 'interrupt_base')
WARNING: vmlinux.o(.text+0x384): Section mismatch: reference to .init.text:MMU_init (between 'skpinv' and 'interrupt_base')
WARNING: vmlinux.o(.text+0x3aa): Section mismatch: reference to .init.text:start_kernel (between 'skpinv' and 'interrupt_base')
WARNING: vmlinux.o(.text+0x3ae): Section mismatch: reference to .init.text:start_kernel (between 'skpinv' and 'interrupt_base')
Added a .text.head section simliar to what other architectures do since
modpost already excludes this from its warnings.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Make it so that SPE support can be determined at runtime. This is similiar
to how we handle AltiVec. This allows us to have SPE support built in and
work on processors with and without SPE.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Added basic board port for MPC8572 DS reference platform that is
similiar to the MPC8544/33 DS reference platform in uniprocessor mode.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add new error codes that may be returned by the LV1 hypervisor
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some versions of PWRficient 1682M have an interrupt controller in which
the first register in each pair for interrupt sources doesn't always
read with the right polarity/sense values.
To work around this, keep a software copy of the register instead. Since
it's not modified from the mpic itself, it's a feasible solution. Still,
keep it under a config option to avoid wasting memory on other platforms.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's no need to call the runlatch on functions on processors that
don't implement them (CPU_FTR_CTRL).
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Export some of the implementation-specific registers via sysfs.
Useful when debugging, etc.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The majority of irq_host implementations (3 out of 4) are associated
with a device_node, and need to stash it somewhere. Rather than having
it somewhere different for each host, add an optional device_node pointer
to the irq_host structure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit f629307c85 introduced uses of
kernel_termios_to_user_termios_1 and user_termios_to_kernel_termios_1
on all architectures. However, powerpc, s390, avr32 and frv don't
currently define those functions since their termios struct didn't
need to be changed when the arbitrary baud rate stuff was added, and
thus the kernel won't currently build on those architectures.
This adds definitions of kernel_termios_to_user_termios_1 and
user_termios_to_kernel_termios_1 to include/asm-generic/termios.h
which are identical to kernel_termios_to_user_termios and
user_termios_to_kernel_termios respectively. The definitions are the
same because the "old" termios and "new" termios are in fact the same
on these architectures (which are the same ones that use
asm-generic/termios.h).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e1612 aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.
The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
->poll handler. The new code makes ->poll always there and returns 0 by
default, which is not correct. Return DEFAULT_POLLMASK instead.
Steps to reproduce:
install emacs, SBCL, SLIME
emacs
M-x slime in *inferior-lisp* buffer
[watch it doing "Connecting to Swank on port X.."]
Please, apply before 2.6.23.
P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The AdvanSys driver wants to align some pointers, and the ALIGN macro
doesn't work for pointers. Rather than try to make it work, add a new
PTR_ALIGN macro which is typesafe.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>