The runlatch SPR can take a lot of time to write. My original runlatch
code would set it on every exception entry even though most of the time
this was not required. It would also continually set it in the idle
loop, which is an issue on an SMT capable processor.
Now we cache the runlatch value in a threadinfo bit, and only check for
it in decrementer and hardware interrupt exceptions as well as the idle
loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On the 83xx platform to ensure the PCI inbound memory is handled properly we
have to turn on coherency for all pages in the MMU. Otherwise we see
corruption if inbound "prefetching/streaming" is enabled on the PCI controller.
Signed-off-by: Randy Vinson <rvinson@mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For kexec we need to know the size of the MMU hash table.
Currently we calculate the size once in the htab code, and then twice more in
the kexec code, once using htab_hash_mask and once using ppc64_pft_size.
On some machines the ppc64_pft_size calculation is broken because
ppc64_pft_size is not set.
So we need to fix the second calculation, but better still we should just
calculate the size once and use it everywhere else.
Tested on Power5 LPAR, Power4 non-LPAR and Power3.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We often just do an atomic_dec(&x->refcnt) on an xfrm_state object
because we know there is more than 1 reference remaining and thus
we can elide the heavier xfrm_state_put() call.
Do this behind an inline function called __xfrm_state_put() so that is
more obvious and also to allow us to more cleanly add refcount
debugging later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Andrew Victor
This patch adds the at91_set_multi_drive() function to enable/disable
the multi-drive (open collector) pin capability on the AT91RM9200
processor.
This is necessary to fix the UDC (USB Gadget) driver for the AT91RM9200
board as it will not allow the board reset line to be pulled low if the
pullup is not driven as an open collector output as the boards are wired
to the USB connector on both the DK/EK.
This version of the patch updates it to 2.6.16-rc4.
Orignal patch by Jeff Warren.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Alessandro Zummo
The I2C pin assignment for the Iomega NAS100d board was incorrect. This
patch fixes it. The correct assignment has now been tested using the
new RTC class and a new driver for the RTC on the NAS100d.
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This change reverts the 033b96fd30 commit
from Kay Sievers that removed the mount/umount uevents from the kernel.
Some older versions of HAL still depend on these events to detect when a
new device has been mounted. These events are not correctly emitted,
and are broken by design, and so, should not be relied upon by any
future program. Instead, the /proc/mounts file should be polled to
properly detect this kind of event.
A feature-removal-schedule.txt entry has been added, noting when this
interface will be removed from the kernel.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems current get_user() incorrectly sign-extend an unsigned int
value on 64bit kernel. I think this is because '(__typeof__(val))'
cast in final assignment. I suppose the cast should be
'(__typeof__(*(addr))'.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes a bug of include/asm-m32r/system.h:__cmpxchg_u32().
static __inline__ unsigned long
__cmpxchg_u32(volatile unsigned int *p, unsigned int old, unsigned int new);
In __cmpxchg_u32(), the "old" value must not be changed to the previous "*p"
value. But the former code modifies the previous "*p" value.
A deadlock at _atomic_dec_and_lock sometimes happened due to this bug.
Signed-off-by: Hayato Fujiwara <fujiwara@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Revert dasd eer module until we have a common understanding of how the
interface should be.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The compat syscalls are added to sys_ni.c since they are not defined if the
above CONFIG options are off. Also, nfs would not build with CONFIG_SYSCTL
off.
Noticed by Arthur Othieno.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Luke Yang <luke.adi@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, acpi video options can only be set on kernel command line. That's
little inflexible; I'd like userland s2ram application that just works, and
modifying kernel command line according to whitelist is not fun. It is better
to just allow s2ram application to set video options just before suspend
(according to the whitelist).
This implements sysctl to allow setting suspend video options without reboot.
(akpm: Documentation updates for this new sysctl are pending..)
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.
In particular going postal and more or less randomly shooting at processes
is not likely going to help the situation but may just lead to suicide (the
whole system coming down).
It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is
similar to what we do when running out of hugepages.
This patch adds a check before we kill processes. At that point
performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes. If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should kill the
current process.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes ata_for_each_sg() start with pad_sgent when
qc->n_elem is zero. Previously, ata_for_each_sg() unconditionally
started with qc->__sg, handling the first sg to fill_sg() routines
even when the entry was invalid. And while at it, unwind ?: in
ata_qc_next_sg() into if statement.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
A few symbols are exported twice, remove them from ppc_ksyms.c
Remove users of sys_ctrler in arch/ppc/
WARNING: vmlinux: duplicate symbol '__delay' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__up' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down_interruptible' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'sys_ctrler' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncat' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncmp' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strrchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strnlen' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strpbrk' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'memscan' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strstr' previous definition was in vmlinux
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Patch claiming to remove enable_irq_nosync() had left it alive but killed
disable_irq_nosync() instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
One of the parameters to the __pud_free_tlb() macro for powerpc is
incorrect (see patch) . We get away with it by accident, because the one
place the macro is called, the second parameter is a variable named "pud".
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce additional_cpus command line option. By default no additional cpu
can be attached to the system anymore. Only the cpus present at IPL time can
be switched on/off. If it is desired that additional cpus can be attached to
the system the maximum number of additional cpus needs to be specified with
this option.
This change is necessary in order to limit the waste of per_cpu data
structures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do not mask TIF_SINGLESTEP bit in _TIF_WORK_MASK. Masking this stopped
do_notify_resume() from being called when it should have been.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer. The value returned is a fixed-point number in 52.12 format
in nanoseconds. The reason for this format is that it gives the
full precision that the timekeeping code is using internally.
The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used. PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic. (This method of getting the time is also
exported to userspace via the VDSO.) However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.
Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now. It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do. Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.
The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer. If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday. In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.
Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on. The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled. (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.) Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.
Also I moved the variable to a more appropiate place and make
it independent from sysctl
And marked __read_mostly which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Nicolas Pitre
With EABI the multiplex sys_ipc and sys_socketcall syscalls are
unavailable and their support code even removed from the compiled
kernel, and the new unmuxed syscalls must be used instead.
Make those syscall numbers visible.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
A change to the SMP initialisation caused the following oops:
CPU1: Booted secondary processor
CPU1: D VIPT write-back cache
CPU1: I cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
CPU1: D cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
<7>Calibrating delay loop... 83.14 BogoMIPS (lpj=415744)
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000001c
...
PC is at enqueue_task+0x1c/0x64
LR is at activate_task+0xcc/0xe4
SMP initialisation now requires cpu_possible_map to be initialised in
setup_arch(). Move this from smp_prepare_cpus() to smp_init_cpus()
and call it from our setup_arch() if CONFIG_SMP is enabled.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For two macros the arguments were expanded twice, change them to inline
functions to avoid it.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all
arches. The idea is to make it possible to use them portably even before
distros include them in libc headers.
Move common flags to asm-generic/mman.h
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There were two mistakes in the register-read-on-(un)blank approach.
- First, without proper register (un)locking the value read back will always
be zero, and this is what I missed entirely until just now. Due to this,
the logic could not be verified at all and I tried some bogus checks which
are completely stupid.
- Second, the LCD status bit will always be set to zero when the backlight
has been turned off. Reading the value back during unblank will disable the
LCD unconditionally, regardless of the state it is supposed to be in, since
we set it to zero beforehand.
So this is what we do now:
- create a new variable in struct neofb_par, and use that to determine
whether to read back registers (initialized to true)
- before actually blanking the screen, read back the register to sense any
possible change made through Fn key combo
- use proper neoUnlock() / neoLock() to actually read something
- every call to neofb_blank() determines if we read back next time: blanking
disables readback, unblanking (FB_BLANK_UNBLANK) enables it
This should give us a nice and clean state machine. Has been thoroughly
tested on a Dell Latitude CPiA / NM220 Chip docked to a C/Dock2 with attached
CRT in all possible combinations of LCD/CRT on/off. I changed the config via
Fn key, let the console blank, unblanked by keypress - works flawlessly.
Signed-off-by: Christian Trefzer <ctrefzer@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nf_hook() is supposed to call the netfilter hook and return control of the
packet back to the caller in case it may pass, the okfn is only used for
queueing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.
This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs: a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.
The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks. This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods. If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().
This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible. It implements two simple
alternative udelay() routines: one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix XPC so that it does not deliver any messages until the connected
callout has returned, as well as, prevent the disconnected callout to
occur before the disconnecting callout has returned.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The __sn_cnodeid_to_nasid array was incorrectly sized at MAX_NUMNODES.
On a large system, this array could overflow. The following patch
corrects this by defining it to MAX_COMPACT_NODES.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
General SN2 code cleanup:
- Do not initialize global variables to zero
- Use kzalloc instead of kmalloc+memset
- Check kmalloc return values
- Do not obfuscate spin lock calls
- Remove some unused code
- Various formatting cleanups
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>