As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.
Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf. The round robin code executes during
lookup and thus with the rwlock taken as a reader. A small local
spinlock tries to provide protection but this does not work at all
for two reasons:
1) The round-robin list manipulation, as coded, goes like this (with
read lock held):
walk routes finding head and tail
spin_lock();
rotate list using head and tail
spin_unlock();
While one thread is rotating the list, another thread can
end up with stale values of head and tail and then proceed
to corrupt the list when it gets the lock. This ends up causing
the OOPS in fib6_add() later onthat many people have been hitting.
2) All the other code paths that run with the rwlock held as
a reader do not expect the list to change on them, they
expect it to remain completely fixed while they hold the
lock in that way.
So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.
Reimplement using a per-fib6_node round-robin pointer. This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself. We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.
The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code when in.
Signed-off-by: David S. Miller <davem@davemloft.net>
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.
The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.
I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a patch from Patrick McHardy.
The fib_rules netlink attribute policy introduced in 2.6.19 broke
userspace compatibilty. When specifying a rule with "from all"
or "to all", iproute adds a zero byte long netlink attribute,
but the policy requires all addresses to have a size equal to
sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a
validation error.
Check attribute length of FRA_SRC/FRA_DST in the generic framework
by letting the family specific rules implementation provide the
length of an address. Report an error if address length is non
zero but no address attribute is provided. Fix actual bug by
checking address length for non-zero instead of relying on
availability of attribute.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported by Julian Deng that configuring the pxa27x i2c SCL line as output generates a short negative pulse on it during the call to pxa_gpio_mode(GPIO117_I2CSCL_MD); as it first switches it to output and then configures it for the alternate function. The SCL line is in fact bidirectional and can also be configured as 117 | GPIO_ALT_FN_1_IN, in which case the pulse is not generated. This is exactly what this patch does.
Author: Julian Deng <dengtj@sitek.cn>
Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The missing cast did result a warning when calling an 32-bit ARC firmware
function that takes 5 arguments where the 5th argument is a pointer from a
64-bit kernel.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In the the sequence:
ei
..
mfc0 $x, $status
the mfc0 may not see the SR_IE bit set. This was a deliberate bug in the
kernel code because we knew this was a safe thing to do on all R2 silicon
so far but new silicon is changing this.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes two places where we used plain 'x - PAGE_OFFSET' to
achieve virtual to physical address convertions. This type of convertion
is no more allowed since commit 6f284a2ce7.
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
[Build fixes for machines that don't use the generic dma-coherence.h]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The segment register slots in struct pt_regs are padded to 32 bits.
Some of these are stored with instructions like "pushl %es", which
leaves the high 16 bits as they were. So the high bits of these
fields in struct pt_regs contain kernel stack garbage. These bits are
ignored by everything and never leak to user space, except in core
dumps. The user struct pt_regs is always at the base of the thread's
kernel stack and so it seems unlikely the information that leaks from
here is ever worthwhile so as to be a security concern, but I'm not
sure about that. It has been this way for ages; userland consumers of
core dumps all mask off these high bits themselves. So it is not urgent.
This change masks off the padding bits of the segment register slots
in core dumps. ptrace already masks off these high bits, so this
makes the values in core dumps consistent with what ptrace would
report just before the process died.
As I read the processor manuals, the cs and ss values will always be
padded with zero bits rather than stack garbage. But unlike "pushl %es",
this is not simple to test with a userland program. So I added the two
instructions rather than wonder if they are really never necessary.
I think that x86_64 does not have this problem (for either 32-bit or
64-bit processes). It only uses "mov" instructions from segment
registers, which zero-extend.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Needed for any architecture that claims ARCH_APICTIMER_STOPS_ON_C3,
not just i386.
I'm hoping Thomas will clean this up a bit later..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.
Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: IA64: fix %ll build warnings
ACPI: IA64: fix allnoconfig build
ACPI: Only use IPI on known broken machines (AMD, Dothan/BaniasPentium M)
ACPI: ibm-acpi: allow module to load when acpi notifiers can't be set (v2)
ACPI: parse 2nd MADT by default
ACPICA: revert "acpi_serialize" changes
sony-laptop: MAINTAINERS fix entry, add L: and W:
ACPI: resolve HP nx6125 S3 immediate wakeup regression
ACPI: Add support to parse 2nd MADT
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Bypass hcall stats until cpu features have run
[POWERPC] Avoid hypervisor statistics calculation in real mode
[POWERPC] Fix atomicity of TIF update in flush_thread()
lockdep found a bug during a run of workqueue function - this could be also
caused by a bug from other code running simultaneously.
lockdep really shouldn't be used when debug_locks == 0!
Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix unannotated variable declarations. Variables that have allocation
section annotations (such as __meminitdata) on their definitions must also
have them on their declarations as not doing so may affect the addressing
mode used by the compiler and may result in a linker error.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETFILTER]: nat: avoid rerouting packets if only XFRM policy key changed
[NETFILTER]: nf_conntrack_netlink: add missing dependency on NF_NAT
[NET]: fix up misplaced inlines.
[SCTP]: Correctly reset ssthresh when restarting association
[BRIDGE]: Fix fdb RCU race
[NET]: Fix fib_rules dump race
[XFRM]: ipsecv6 needs a space when printing audit record.
[X25] x25_forward_call(): fix NULL dereferences
[SCTP]: Reset some transport and association variables on restart
[SCTP]: Increment error counters on user requested HBs.
[SCTP]: Clean up stale data during association restart
[IrDA]: Calling ppp_unregister_channel() from process context
[IrDA]: irttp_dup spin_lock initialisation
[IrDA]: Delay needed when uploading firmware chunks
kexec invokes plpar_hcall hypervisor call in real mode. plpar_hcall
refers to per cpu variables for accounting hypervisor statistics.
These variables may not be in the RMO region, so accesses to them
in real mode may result in a data storage exception.
This fixes this problem by using a new plpar_hcall_raw function which
does not update the hypervisor call statistics. Thanks to Anton for
suggesting this idea.
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix wrong /proc/iomem on SGI Altix
[IA64] Altix: ioremap vga_console_iobase
[IA64] Fix typo/thinko in crash.c
[IA64] Fix get_model_name() for mixed cpu type systems
[IA64] min_low_pfn and max_low_pfn calculation fix
We have seen bad_pte_print when testing crashdump on an SN machine in
recent 2.6.20 kernel. There are tons of bad pte print (pfn < max_low_pfn)
reports when the crash kernel boots up, all those reported bad pages
are inside initmem range; That is because if the crash kernel code and
data happens to be at the beginning of the 1st node. build_node_maps in
discontig.c will bypass reserved regions with filter_rsvd_memory. Since
min_low_pfn is calculated in build_node_map, so in this case, min_low_pfn
will be greater than kernel code and data.
Because pages inside initmem are freed and reused later, we saw
pfn_valid check fail on those pages.
I think this theoretically happen on a normal kernel. When I check
min_low_pfn and max_low_pfn calculation in contig.c and discontig.c.
I found more issues than this.
1. min_low_pfn and max_low_pfn calculation is inconsistent between
contig.c and discontig.c,
min_low_pfn is calculated as the first page number of boot memmap in
contig.c (Why? Though this may work at the most of the time, I don't
think it is the right logic). It is calculated as the lowest physical
memory page number bypass reserved regions in discontig.c.
max_low_pfn is calculated include reserved regions in contig.c. It is
calculated exclude reserved regions in discontig.c.
2. If kernel code and data region is happen to be at the begin or the
end of physical memory, when min_low_pfn and max_low_pfn calculation is
bypassed kernel code and data, pages in initmem will report bad.
3. initrd is also in reserved regions, if it is at the begin or at the
end of physical memory, kernel will refuse to reuse the memory. Because
the virt_addr_valid check in free_initrd_mem.
So it is better to fix and clean up those issues.
Calculate min_low_pfn and max_low_pfn in a consistent way.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes some "const"s that I introduced thinking they mean
the same thing as the "const"s introduced here. So it fixes three warnings.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If the association has been restarted, we need to reset the
transport congestion variables as well as accumulated error
counts and CACC variables. If we do not, the association
will use the wrong values and may terminate prematurely.
This was found with a scenario where the peer restarted
the association when lksctp was in the last HB timeout for
its association. The restart happened, but the error counts
have not been reset and when the timeout occurred, a newly
restarted association was terminated due to excessive
retransmits.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During association restart we may have stale data sitting
on the ULP queue waiting for ordering or reassembly. This
data may cause severe problems if not cleaned up. In particular
stale data pending ordering may cause problems with receive
window exhaustion if our peer has decided to restart the
association.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acpi_integer is 64-bits on all platforms, and so was defined as a u64.
i386 and x86_64 define u64 as unsigned long long.
ia64 defines u64 as long.
While these are all 64-bits, the kernel build warns about formating
a "long" with %ll:
drivers/ata/libata-acpi.c:176: warning: long long unsigned int format, acpi_integer arg (arg 5)
So skip using "u64" and define acpi_integer as "unsigned long long"
to make gcc happy with %ll.
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] zcrypt: fix possible race when unloading zcrypt driver modules
[S390] zcrypt: fix possible dead lock in AP bus module
[S390] Wire up sys_utimes.
[S390] reboot from and dump to SCSI under z/VM fails.
[S390] Wire up compat_sys_epoll_pwait.
[S390] strlcpy is smart enough
[S390] memory detection: fix off by one bug.
[S390] cio: qdio slsb setup
We used wrong length values for ipl and dump hardware structures.
Since z/VM checks the ipl parameters more accurately than LPAR,
the operations fail there.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since d9a9cdfb07 <linux/sysfs.h> is using
ENOSYS without including <linux/errno.h> if CONFIG_SYSFS is disabled.
Fixed by including <linux/errno.h>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a missing #define for the platform_kernel_launch_event. Without this
fix, a call to platform_kernel_launch_event() becomes a noop on generic
kernels. SN systems require this fix to successfully kdump/kexec from
certain hardware errors.
[bwalle@suse.de: fix it]
Signed-off-by: John Keller <jpk@sgi.com>
Cc: Bernhard Walle <bwalle@suse.de>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Jay Lan <jlan@sgi.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa:
[ALSA] hda-intel - Fix HDA buffer alignment
[ALSA] hda-codec - Add model for HP Compaq d5750
[ALSA] hda-codec - Add support for MacBook Pro 1st generation
[ALSA] version 1.0.14rc3
[ALSA] hda-codec - Add model for HP Compaq d5700
[ALSA] intel8x0 - Fix Oops at kdump crash kernel
[ALSA] hda-codec - Fix speaker output on MacPro
[ALSA] hda-codec - more systems for Analog Devices
[ALSA] hda-intel - Fix codec probe with ATI contorllers
[ALSA] hda-codec - Add suppoprt for Asus M2N-SLI motherboard
[ALSA] intel8x0 - Fix speaker output after S2RAM
[ALSA] ac97 - fix AD shared shared jack control logic
[ALSA] soc - Fix dependencies in Kconfig files
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
ide: remove CONFIG_IDEDMA_{ICS,PCI}_AUTO config options
ide: don't allow DMA to be enabled if CONFIG_IDEDMA_{ICS,PCI}_AUTO=n
scc_pata: dependency fix
jmicron: make ide jmicron driver play nice with libata ones
ide: remove static prototypes from include/asm-mips/mach-au1x00/au1xxx_ide.h
ide: au1xxx: fix use of mixed declarations and code
cmd64x: fix recovery time calculation (take 3)
This patch removes the static prototypes from the au1xxx_ide.h, some of
them were not even implemented. Also, they caused build breakage since
they differed from the functions actually implemented in
drivers/ide/mips/au1xxx-ide.c.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The maximum seconds value we can handle on 32bit is LONG_MAX.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typo in sync_constant_test_bit()'s name, so sync_bitops.h is consistent
with bitops.h
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current NFS client congestion logic is severly broken, it marks the
backing device congested during each nfs_writepages() call but doesn't
mirror this in nfs_writepage() which makes for deadlocks. Also it
implements its own waitqueue.
Replace this by a more regular congestion implementation that puts a cap on
the number of active writeback pages and uses the bdi congestion waitqueue.
Also always use an interruptible wait since it makes sense to be able to
SIGKILL the process even for mounts without 'intr'.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the console is in VT_AUTO+KD_GRAPHICS mode, switching to the
SUSPEND_CONSOLE fails, resulting in vt_waitactive() waiting indefinitely or
until the task is interrupted. This patch tests if a console switch can
occur in set_console() and returns early if a console switch is not
possible.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Andrew Johnson <ajohnson@intrinsyc.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's been pointed out that output GPIOs should have an initial value, to
avoid signal glitching ... among other things, it can be some time before
a driver is ready. This patch corrects that oversight, fixing
- documentation
- platforms supporting the GPIO interface
- users of that call (just one for now, others are pending)
There's only one user of this call for now since most platforms are still
using non-generic GPIO setup code, which in most cases already couples the
initial value with its "set output mode" request.
Note that most platforms are clear about the hardware letting the output
value be set before the pin direction is changed, but the s3c241x docs are
vague on that topic ... so those chips might not avoid the glitches.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Andrew Victor <andrew@sanpeople.com>
Acked-by: Milan Svoboda <msvoboda@ra.rockwell.com>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a bug in the cleanup of an spi_bitbang bus.
The workqueue associated with the bus was destroyed before the call to
spi_unregister_master. That meant that spi devices on that bus would be
unable to do IO in their remove method. The shutdown flag should have been
able to prevent a segfault, but was never getting set. By waiting to
destroy the workqueue until after the master is unregistered, devices are
able to do IO in their remove methods. An added benefit is that neither
the shutdown flag nor a wait for the queue of messages to empty is needed.
Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch corrects work with time in UFS2 case.
1) According to UFS2 disk layout modification/access and so on "time"
should be hold in two variables one 64bit for seconds and another 32bit for
nanoseconds,
at now for some unknown reason we suppose that "inode time" holds in
three variables 32bit for seconds, 32bit for milliseconds and 32bit for
nanoseconds.
2) We set amount of nanoseconds in "VFS inode" to 0 during read, instead of
getting values from "on disk inode"(this should close
http://bugzilla.kernel.org/show_bug.cgi?id=7991).
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Bjoern Jacke <bjoern@j3e.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>