Urgent events may be delayed if we already have a non-urgent event
queued for that device. This patch changes this by making sure that
an urgent event is always looked at immediately.
I've replaced the LW_RUNNING flag by LW_URGENT since whether work
is scheduled is already kept track by the work queue system.
The only complication is that we have to provide some exclusion for
the setting linkwatch_nextevent which is available in the actual
work function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the jiffies wrap around or when the system boots up for the first
time, down events can be delayed indefinitely since we no longer
update linkwatch_nextevent when only urgent events are processed.
This patch fixes this by setting linkwatch_nextevent when a
wrap-around occurs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize teql_enqueue so that it first checks limits before enqueing.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| CC net/mac80211/ieee80211_sta.o
| In file included from linux/net/mac80211/ieee80211_sta.c:31:
| include2/asm/delay.h: In function '__const_udelay':
| include2/asm/delay.h:33: error: 'loops_per_jiffy' undeclared (first use in this function)
| include2/asm/delay.h:33: error: (Each undeclared identifier is reported only once
| include2/asm/delay.h:33: error: for each function it appears in.)
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all link carrier events are delayed by up to a second
before they're processed to prevent link storms. This causes
unnecessary packet loss during that interval.
In fact, we can achieve the same effect in preventing storms by
only delaying down events and unnecssary up events. The latter
is defined as up events when we're already up.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
These days the link watch mechanism is an integral part of the
network subsystem as it manages the carrier status. So it now
makes sense to allocate some memory for it in net_device rather
than allocating it on demand.
In fact, this is necessary because we can't tolerate a memory
allocation failure since that means we'd have to potentially
throw a link up event away.
It also simplifies the code greatly.
In doing so I discovered a subtle race condition in the use
of singleevent. This race condition still exists (and is
somewhat magnified) without singleevent but it's now plugged
thanks to an smp_mb__before_clear_bit.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for struct class_device -> struct device input core
conversion, switch to using input_dev->dev.parent when specifying
device position in sysfs tree.
Also, do not access input_dev->private directly, use helpers and
do not use kfree() on input device, use input_free_device() instead.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] update default configuration.
[S390] Kconfig: no wireless on s390.
[S390] Kconfig: use common Kconfig files for s390.
[S390] Kconfig: common config options for s390.
[S390] Kconfig: unwanted menus for s390.
[S390] Kconfig: menus with depends on HAS_IOMEM.
[S390] Kconfig: refine depends statements.
[S390] Avoid compile warning.
[S390] qdio: re-add lost perf_stats.tl_runs change in qdio_handle_pci
[S390] Avoid sparse warnings.
[S390] dasd: Fix modular build.
[S390] monreader inlining cleanup.
[S390] cio: Make some structures and a function static.
[S390] cio: Get rid of _ccw_device_get_device_number().
[S390] fix subsystem removal fallout
Disable some more menus in the configuration files that are of no
interest to a s390 machine.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
While the comment says:
* To prevent rpciod from hanging, this allocator never sleeps,
* returning NULL if the request cannot be serviced immediately.
The function does not actually check for NULL pointers being returned.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use a cleaner method to find the size of an rpc_buffer. This actually
works on x86-64!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This while loop has an overly complex condition, which performs a couple of
assignments. This hurts readability.
We don't really need a loop at all. We can just return -EAGAIN and (providing
we set SK_DATA), the function will be called again.
So discard the loop, make the complex conditional become a few clear function
calls, and hopefully improve readability.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If I send a RPC_GSS_PROC_DESTROY message to NFSv4 server, it will reply with a
bad rpc reply which lacks an authentication verifier. Maybe this patch is
needed.
Send/recv packets as following:
send:
RemoteProcedureCall
xid
rpcvers = 2
prog = 100003
vers = 4
proc = 0
cred = AUTH_GSS
version = 1
gss_proc = 3 (RPCSEC_GSS_DESTROY)
service = 1 (RPC_GSS_SVC_NONE)
verf = AUTH_GSS
checksum
reply:
RemoteProcedureReply
xid
msg_type
reply_stat
accepted_reply
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have been investigating a module reference count leak on the server for
rpcsec_gss_krb5.ko. It turns out the problem is a reference count leak for
the security context in net/sunrpc/auth_gss/svcauth_gss.c.
The problem is that gss_write_init_verf() calls gss_svc_searchbyctx() which
does a rsc_lookup() but never releases the reference to the context. There is
another issue that rpc.svcgssd sets an "end of time" expiration for the
context
By adding a cache_put() call in gss_svc_searchbyctx(), and setting an
expiration timeout in the downcall, cache_clean() does clean up the context
and the module reference count now goes to zero after unmount.
I also verified that if the context expires and then the client makes a new
request, a new context is established.
Here is the patch to fix the kernel, I will start a separate thread to discuss
what expiration time should be set by rpc.svcgssd.
Acked-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not necessarily correct to assume that the xdr_buf used to hold the
server's reply must have page data whenever it has tail data.
And there's no need for us to deal with that case separately anyway.
Acked-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
register_rpc_pipefs() needs to clean up rpc_inode_cache
by kmem_cache_destroy() on register_filesystem() failure.
init_sunrpc() needs to unregister rpc_pipe_fs by unregister_rpc_pipefs()
when rpc_init_mempool() returns error.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:
RPC request reserved 164 but used 208
While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726
This is probably also a problem for other sec= types and for NFSv4. The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.
This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum. It also fixes up the appropriate callers of svc_reserve to
call the wrapper. For now, it just uses a hardcoded value that I
determined via testing. That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.
Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that sk_defer_lock protects two different things, make the name more
generic.
Also don't bother with disabling _bh as the lock is only ever taken from
process context.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginnig, I missed this). So we can unify
flush_work_keventd and flush_work.
Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.
(akpm: this is why the earlier patches bypassed maintainers)
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/ipv4/ipvs/ip_vs_core.c
module_exit
ip_vs_cleanup
ip_vs_control_cleanup
cancel_rearming_delayed_work
// done
This is unsafe. The module may be unloaded and the memory may be freed
while defense_work's handler is still running/preempted.
Do flush_work(&defense_work.work) after cancel_rearming_delayed_work().
Alternatively, we could add flush_work() to cancel_rearming_delayed_work(),
but note that we can't change cancel_delayed_work() in the same manner
because it may be called from atomic context.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementation of force feedback for HID devices is
USB-transport only and therefore calling hid_ff_init() from hidp code is
not going to work (plus it creates unwanted dependency of hidp on usbhid).
Remove the hid_ff_init() until either the hid-ff is made
transport-independent, or at least support for bluetooth transport is
added.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit c5a4dd8b7c introduced the following
compiler warnings:
net/sunrpc/sched.c:766: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'size_t'
net/sunrpc/sched.c:785: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'size_t'
- Use %zu to format size_t
- Kill 2 useless casts
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) Introduces a new method in 'struct dentry_operations'. This method
called d_dname() might be called from d_path() to build a pathname for
special filesystems. It is called without locks.
Future patches (if we succeed in having one common dentry for all
pipes/sockets) may need to change prototype of this method, but we now
use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);
2) Adds a dynamic_dname() helper function that eases d_dname() implementations
3) Defines d_dname method for sockets : No more sprintf() at socket
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...
4) Defines d_dname method for pipes : No more sprintf() at pipe
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...
A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :
3.090 s instead of 3.450 s
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ieee80211, the output of scan results lists channels, but not
frequencies, which are needed by NetworkManager. This patch uses
the new ieee80211_channel_to_freq routine to add the frequency to the output.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The routines that interrogate the ieee80211_geo struct are missing a
channel to frequency entry. This patch adds it.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: rfkill: add support for input key to control wireless radio
[NET] net/core: Fix error handling
[TG3]: Update version and reldate.
[TG3]: Eliminate spurious interrupts.
[TG3]: Add ASPM workaround.
[Bluetooth] Correct SCO buffer for another Broadcom based dongle
[Bluetooth] Add support for Targus ACB10US USB dongle
[Bluetooth] Disconnect L2CAP connection after last RFCOMM DLC
[Bluetooth] Check that device is in rfcomm_dev_list before deleting
[Bluetooth] Use in-kernel sockets API
[Bluetooth] Attach host adapters to the Bluetooth bus
[Bluetooth] Fix L2CAP and HCI setsockopt() information leaks
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The RF kill patch that provides infrastructure for implementing
switches controlling radio states on various network and other cards.
[dtor@insightbb.com: address review comments]
[akpm@linux-foundation.org: cleanups, build fixes]
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon failure to register "ptype" procfs entry, "softnet_stat" was not
removed, and an incorrect attempt was made to remove the "ptype" entry.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits)
kconfig: fix mconf segmentation fault
kbuild: enable use of code from a different dir
kconfig: error out if recursive dependencies are found
kbuild: scripts/basic/fixdep segfault on pathological string-o-death
kconfig: correct minor typo in Kconfig warning message.
kconfig: fix path to modules.txt in Kconfig help
usr/Kconfig: fix typo
kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs'
kbuild: be more explicit on missing .config file
kbuild: clarify the creation of the LOCALVERSION_AUTO string.
kbuild: propagate errors from find in scripts/gen_initramfs_list.sh
kconfig: refer to qt3 if we cannot find qt libraries
kbuild: handle compressed cpio initramfs-es
kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text
kbuild: remove stale comment in modpost.c
kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE
kbuild: fix make mrproper for Documentation/DocBook/man
kbuild: remove kconfig binaries during make mrproper
kconfig/menuconfig: do not hardcode '.config'
kbuild: override build timestamp & version
...
Export various mac80211 internal variables through debugfs.
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
CC [M] net/iucv/af_iucv.o
net/iucv/af_iucv.c: In function `iucv_fragment_skb':
net/iucv/af_iucv.c:984: error: structure has no member named `h'
net/iucv/af_iucv.c:985: error: structure has no member named `nh'
net/iucv/af_iucv.c:988: error: incompatible type for argument 1 of
`skb_queue_tail'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits)
NFS: Fix a compile glitch on 64-bit systems
NFS: Clean up nfs_create_request comments
spkm3: initialize hash
spkm3: remove bad kfree, unnecessary export
spkm3: fix spkm3's use of hmac
NFS4: invalidate cached acl on setacl
NFS: Fix directory caching problem - with test case and patch.
NFS: Set meaningful value for fattr->time_start in readdirplus results.
NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
SUNRPC: RPC client should retry with different versions of rpcbind
SUNRPC: remove old portmapper
NFS: switch NFSROOT to use new rpcbind client
SUNRPC: switch the RPC server to use the new rpcbind registration API
SUNRPC: switch socket-based RPC transports to use rpcbind
SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
SUNRPC: Eliminate side effects from rpc_malloc
SUNRPC: RPC buffer size estimates are too large
NLM: Shrink the maximum request size of NLM4 requests
NFS: Use pgoff_t in structures and functions that pass page cache offsets
NFS: Clean up nfs_sync_mapping_wait()
...
The RFCOMM specification says that the device closing the last DLC on
a particular session is responsible for closing the multiplexer by
closing the corresponding L2CAP channel.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If RFCOMM_RELEASE_ONHUP flag is on and rfcomm_release_dev is called
before connection is closed, rfcomm_dev is deleted twice from the
rfcomm_dev_list and refcount is messed up. This patch adds a check
before deleting device that the device actually is listed.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The kernel provides a new convenient way to access the sockets API for
in-kernel users. It is a good idea to actually use it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth host adapters are attached to the Bluetooth class and the
low-level connections are children of these class devices. Having class
devices as parent of bus devices breaks a lot of reasonable assumptions
about sysfs. The host adapters should be attached to the Bluetooth bus
to simplify the dependency resolving. For compatibility an additional
symlink from the Bluetooth class will be used.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The L2CAP and HCI setsockopt() implementations have a small information
leak that makes it possible to leak kernel stack memory to userspace.
If the optlen parameter is 0, no data will be copied by copy_from_user(),
but the uninitialized stack buffer will be read and stored later. A call
to getsockopt() can now retrieve the leaked information.
To fix this problem the stack buffer given to copy_from_user() must be
initialized with the current settings.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
During the INIT/COOKIE-ACK collision cases, it's possible to get
into a situation where the association id is not yet set at the time
of the user event generation. As a result, user events have an
association id set to 0 which will confuse applications.
This happens if we hit case B of duplicate cookie processing.
In the particular example found and provided by Oscar Isaula
<Oscar.Isaula@motorola.com>, flow looks like this:
A B
---- INIT-------> (lost)
<---------INIT------
---- INIT-ACK--->
<------ Cookie ECHO
When the Cookie Echo is received, we end up trying to update the
association that was created on A as a result of the (lost) INIT,
but that association doesn't have the ID set yet.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the SO_REUSEADDR handling to also check for listen state. This
was muliple listening server sockets can't be created and they will
not steal packets from each other.
Reported by Paolo Galtieri <pgaltieri@mvista.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to make sure that all destination ports are the same, since
the association really must not connect to multiple different ports
at once. This was reported on the sctp-impl list.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregate the SPD info TLVs.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregate the SAD info TLVs.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sort out the MTU determination and handling in AF_RXRPC:
(1) If it's present, parse the additional information supplied by the peer at
the end of the ACK packet (struct ackinfo) to determine the MTU sizes
that peer is willing to support.
(2) Initialise the MTU size to that peer from the kernel's routing records.
(3) Send ACKs rather than ACKALLs as the former carry the additional info,
and the latter do not.
(4) Declare the interface MTU size in outgoing ACKs as a maximum amount of
data that can be stuffed into an RxRPC packet without it having to be
fragmented to come in this computer's NIC.
(5) If sendmsg() is given MSG_MORE then it should allocate an skb of the
maximum size rather than one just big enough for the data it's got left
to process on the theory that there is more data to come that it can
append to that packet.
This means, for example, that if AFS does a large StoreData op, all the
packets barring the last will be filled to the maximum unfragmented size.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing section annotations and found and fixed some
Coding Style issues.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.
Using a skb backlog queue is fixing all of these problems .
Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove bogus BUG_ON(mutex_is_locked(nlk_sk(sk)->cb_mutex)), when the
netlink_kernel_create caller specifies an external mutex it might
validly be locked.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the server drops its connection, NFS client reconnects using the
same socket after disconnecting. If the new connection's SYN,ACK
doesn't contain the TCP timestamp option and the old connection's did,
tp->tcp_header_len is recomputed assuming no timestamp header but
tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options()
adds in a timestamp option past the end of the allocated TCP header,
overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[],
overwriting skb_shinfo(skb) causing a crash soon after. (The issue was
debugged from such a crash.)
Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK
packet but not reset on disconnect, since they are zeroed out at
initialization. The patch zeroes out the entire tp->rx_opt struct in
tcp_disconnect() to avoid this sort of problem.
Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reuse limited slow-start (RFC3742) included into tcp_cong instead
of having another implementation in High Speed TCP.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate the common push/pull sequences into a few helper functions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I needed to use this recently to talk to a Cisco server. In my case
I only did SNAT while the Cisco server used a different address for
RTP traffic than the one for SIP. I discovered that nf_nat_sip NATed
the RTP address to the SIP one which was unnecessary but OK. However,
in doing so it did not DNAT the destination address on the RTP traffic
to the Cisco back to the original RTP address.
This patch corrects this by noting down the RTP address and using it
when the expectation fires.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also accept the --random option for DNAT to allow randomly selecting a
destination port from the given range.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user passes in MSG_TRUNC the skb is used after getting freed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we can still receive packets until all references to the
socket are gone, we don't need to kill the CB until that happens.
This also aligns ourselves with the receive queue purging which
happens at that point.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete the apparently unused header file net/ipv4/tcp_yeah.h.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make miscellaneous fixes to AFS and AF_RXRPC:
(*) Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig.
(*) Don't use FS_BINARY_MOUNTDATA.
(*) Remove a done 'TODO' item in a comemnt on afs_get_sb().
(*) Don't pass a void * as the page pointer argument of kmap_atomic() as this
breaks on m68k. Patch from Geert Uytterhoeven <geert@linux-m68k.org>.
(*) Use match_*() functions rather than doing my own parsing.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/modules.txt doesn't exist, but
Documentation/kbuild/modules.txt does.
Signed-off-by: Alexander E. Patrakov
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
There's an initialization step here I missed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We're kfree()'ing something that was allocated on the stack!
Also remove an unnecessary symbol export while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I think I botched an attempt to keep an spkm3 patch up-to-date with a recent
crypto api change.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eventually this interface will support versions 3 and 4 of the rpcbind
protocol, which will allow the Linux RPC server to register services on
IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a version of the portmapper that supports versions 3 and 4
of the rpcbind protocol, use it for new RPC client connections over
sockets.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a replacement for the in-kernel portmapper client that supports
all 3 versions of the rpcbind protocol. This code is not used yet.
Original code by Groupe Bull updated for the latest kernel, with multiple
bug fixes.
Note that rpcb_clnt.c does not yet support registering via versions 3 and
4 of the rpcbind protocol. That is planned for a later patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently rpc_malloc sets req->rq_buffer internally. Make this a more
generic interface: return a pointer to the new buffer (or NULL) and
make the caller set req->rq_buffer and req->rq_bufsize. This looks much
more like kmalloc and eliminates the side effects.
To fix a potential deadlock, this patch also replaces GFP_NOFS with
GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside
the RPC's task scheduler while allocating their buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.
To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.
Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two. That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.
And, we can finally be rid of RPC_SLACK_SPACE!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When allocating local ports, do not allow a bind to a port
with a specific local address when a bind to that port with
a wildcard local address already exists.
Noticed by Linus.
Signed-off-by: David S. Miller <davem@davemloft.net>
I accidently applied an earlier version of Eric Dumazet's patch, from
March 21st. His version from March 30th didn't have these bugs, so
this just interdiffs to the correct patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
[IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
[IPV4] SNMP: Support InMcastPkts and InBcastPkts
[IPV4] SNMP: Support InTruncatedPkts
[IPV4] SNMP: Support InNoRoutes
[SNMP]: Add definitions for {In,Out}BcastPkts
[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
[TCP] FRTO: Delay skb available check until it's mandatory
[XFRM]: Restrict upper layer information by bundle.
[TCP]: Catch skb with S+L bugs earlier
[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
[L2TP]: Add the ability to autoload a pppox protocol module.
[SKB]: Introduce skb_queue_walk_safe()
[AF_IUCV/IUCV]: smp_call_function deadlock
[IPV6]: Fix slab corruption running ip6sic
[TCP]: Update references in two old comments
[XFRM]: Export SPD info
[IPV6]: Track device renames in snmp6.
[SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
[NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
[NETPOLL]: Remove CONFIG_NETPOLL_RX
...
A transmitted IP multicast datagram should be counted as OutMcastPkts.
By the same token, a transmitted IP broadcast datagram should be
counted as OutBcastPkts.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A received IP multicast datagram should be counted as InMcastPkts.
By the same token, a received IP broadcast datagram should be
counted as InBcastPkts.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An IP datagram which is being discarded because the datagram frame
didn't carry enough data should be counted as InTruncatedPkts.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An IP datagram which is being discarded because of no routes in the
forwarding path should be counted as InNoRoutes.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.
Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
No new data is needed until the first ACK comes, so no need to check
for application limitedness until then.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.
It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.
1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.
2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some people want to have many UDP sockets, binded to a single port but
many different addresses. We currently hash all those sockets into a
single chain. Processing of incoming packets is very expensive,
because the whole chain must be examined to find the best match.
I chose in this patch to hash UDP sockets with a hash function that
take into account both their port number and address : This has a
drawback because we need two lookups : one with a given address, one
with a wildcard (null) address.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling smp_call_function can lead to a deadlock if it is called
from tasklet context.
Fixing this deadlock requires to move the smp_call_function from the
tasklet context to a work queue. To do that queue the path pending
interrupts to a separate list and move the path cleanup out of
iucv_path_sever to iucv_path_connect and iucv_path_pending.
This creates a new requirement for iucv_path_connect: it may not be
called from tasklet context anymore.
Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and
another one when walking the cpu_online mask. When doing this,
we must disable cpu hotplug.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This updates references to drafts in comments which must be about 10
years old. Internet draft draft-ietf-tcpimpl-prob-03.txt expired in 1998
and was replaced by RFC 2525 in March 1999.
Section 3.10 of the draft maps almost identically into section 2.17 of RFC
2525: both are entitled "Failure to RST on close with data pending", the
differences in text body amount to a typo and minor sentence change.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch you can use iproute2 in user space to efficiently see
how many policies exist in different directions.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
When network device's are renamed, the IPV6 snmp6 code
gets confused. It doesn't track name changes so it will OOPS
when network device's are removed.
The fix is trivial, just unregister/re-register in notify handler.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_getsockopt_local_addrs_old() in net/sctp/socket.c calls
copy_to_user() while the spinlock addr_lock is held. this should not
be done as copy_to_user() might sleep. the call to
sctp_copy_laddrs_to_user() while holding the lock is also problematic
as it calls copy_to_user()
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu conviced me that a new flag was overkill; every driver
currently overrides get_stats, so we might as well make the internal
one the default. If someone did fail to set get_stats, they would now
get all 0 stats instead of "No statistics available".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>