We need to try to ensure that we always use the same credentials whenever
we re-establish the clientid on the server. If not, the server won't
recognise that we're the same client, and so may not allow us to recover
state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
With the recent change to generic creds, we can no longer use
cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and
AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name
instead...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want to ensure that req->rq_private_buf.len is updated before
req->rq_received, so that call_decode() doesn't use an old value for
req->rq_rcv_buf.len.
In 'call_decode()' itself, instead of using task->tk_status (which is set
using req->rq_received) must use the actual value of
req->rq_private_buf.len when deciding whether or not the received RPC reply
is too short.
Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
request. A typo meant that we were resetting req->rq_private_buf.len in
call_decode(), and then clobbering that value with the old rq_rcv_buf.len
again in xprt_transmit().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
..and always destroy using a 'soft' RPC call. Destroying GSS credentials
isn't mandatory; the server can always cope with a few credentials not
getting destroyed in a timely fashion.
This actually fixes a hang situation. Basically, some servers will decide
that the client is crazy if it tries to destroy an RPC context for which
they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk to it
for a while.
The regression therefor probably was introduced by commit
0df7fb74fb.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.
SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.
Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
call_verify() can, under certain circumstances, free the RPC slot. In that
case, our cached pointer 'req = task->tk_rqstp' is invalid. Bug was
introduced in commit 220bcc2afd (SUNRPC:
Don't call xprt_release in call refresh).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Use new node_to_cpumask_ptr. This creates a pointer to the
cpumask for a given node. This definition is in mm patch:
asm-generic-add-node_to_cpumask_ptr-macro.patch
* Use new set_cpus_allowed_ptr function.
Depends on:
[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
[sched-devel]: sched: add new set_cpus_allowed_ptr function
[x86/latest]: x86: add cpus_scnprintf function
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().
So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.
This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.
[AV: merged mkdir handling, added missing nfsd pieces]
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: fix up documentation for security_module_enable
Security: Introduce security= boot parameter
Audit: Final renamings and cleanup
SELinux: use new audit hooks, remove redundant exports
Audit: internally use the new LSM audit hooks
LSM/Audit: Introduce generic Audit LSM hooks
SELinux: remove redundant exports
Netlink: Use generic LSM hook
Audit: use new LSM hooks instead of SELinux exports
SELinux: setup new inode/ipc getsecid hooks
LSM: Introduce inode_getsecid and ipc_getsecid hooks
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
Don't use SELinux exported selinux_get_task_sid symbol.
Use the generic LSM equivalent instead.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Paul Moore <paul.moore@hp.com>
This patch effectively reverts commit d0498d9ae1
aka "[NET]: Do not allocate unneeded memory for dev->priv alignment."
It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST
removal.
For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size"
was also 2048 bytes, but allocator with debugging options turned on started
giving out !32-byte aligned memory resulting in redzones overwrites.
Patch does small optimization in ->priv'less case: bumping size to next
32-byte boundary was always done to ensure ->priv will also be aligned.
But, no ->priv, no need to do that.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes bugzilla #8895
If a super-tree leaf has 'rt' assigned to it and we
get an error from fib6_add_rt2node(), we'll leave
a reference to 'rt' in pn->leaf and then do an
unconditional dst_free().
We should prune such references.
Based upon a report by Vincent Perrier.
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_get_by_index() may return NULL if nothing is found. In
net/netlabel/netlabel_unlabeled.c::netlbl_unlabel_staticlist_gen() the
function is called, but the return value is never checked. If it returns
NULL then we'll deref a NULL pointer on the very next line.
I checked the callers, and I don't think this can actually happen today,
but code changes over time and in the future it might happen and it does
no harm to be defensive and check for the failure, so that if/when it
happens we'll fail gracefully instead of crashing.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
datalen is unsigned so it can never be less than zero,
but that's ok because the attribute passed to nla_len()
has been validated and therefore a negative return
value is impossible.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This deblats ~200 bytes when ipv6 and dccp are 'y'.
Besides, this will ease compilation issues for patches
I'm working on to make inet hash tables more scalable
wrt net namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As I can see from the code, two places (tcp_v6_syn_recv_sock and
dccp_v6_request_recv_sock) that call this one already run with
BHs disabled, so it's safe to call __inet_inherit_port there.
Besides (in case I missed smth with code review) the calltrace
tcp_v6_syn_recv_sock
`- tcp_v4_syn_recv_sock
`- __inet_inherit_port
and the similar for DCCP are valid, but assumes BHs to be disabled.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC4960 7.2.2,
When all of the data transmitted by the sender has
been acknowledged by the recerver, partial_bytes_acked is initialized to 0.
This patch conforms to rfc requirement.
Without this fix, cwnd might be error incremented.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().
Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.
Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.
The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch is a largely cosmetic cleanup of the TIPC reference
table code.
- The object reference field in each table entry is now single
32-bit integer instead of a union of two 32-bit integers.
- Variable naming has been made more consistent.
- Error message output has been made more consistent.
- Useless #includes have been eliminated.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies TIPC's reference table code to delay initializing
table entries until they are actually needed by applications.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the TIPC reference table locking routines
into non-inlined routines, since they are mainly called from
non-performance critical areas of TIPC and the added code
footprint incurred through inlining can no longer be justified.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reworks the scanning code (ieee80211_rx_bss_info) to take
more parameters from beacons and keep a BSS info structure alive when
only beacons for it are received. This fixes a problem with iwlwifi
drivers (where we don't understand the root cause of the problem yet)
and another driver for some broken hardware (which cannot send probe
requests unless associated, so can't always actively scan.)
Signed-off-by: Bill Moss <bmoss@clemson.edu>
[jmberg: reformatted comments, make probe_resp a bool]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows creating interfaces in WDS mode or switching
existing ones into WDS mode (both via cfg80211 and wext.)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we add multiple todo entries, we rely on them being executed
mostly in the right order, especially when a key is being replaced.
But when a default key is replaced, the todo list order will differ
from the order when the key being replaced is not a default key, so
problems will happen. Hence, just move each todo item to the end of
the list when it is added so we can in the other code ensure that
hw accel for a key will be disabled before it is enabled for the
replacement.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When STAs are expired, we need to hold the sta_lock. Using
the same lock for keys too would then mean we'd need another
key free function, and that'll just lead to confusion, so just
use a new spinlock for all key lists.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is no need to send BSS changes to driver from beacons processed
during scanning. We are more interested in beacons from an AP with which
we are associated - these will still be used to send updates to driver as
the beacons are received without scanning.
This change·removes the requirement that bss_info_changed needs to be atomic.
The beacons received during scanning are processed from a tasklet, but if we
do not call bss_info_changed for these beacons there is no need for it to be
atomic. This function (bss_info_changed) is called either from workqueue or
ioctl in all other instances.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There were a few more instances of sta_info_get calls not being
protected by RCU, fix them.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The previous key locking patch left a small race: it would be possible
to add a key and take the interface down before the key todo is run so
that hwaccel for that key is enabled on an interface that is down. Avoid
this by running the todo list when an interface is brought up or down.
This patch also fixes a small bug: before this change, a few functions
used the key list without the lock that protects it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This one got renamed, complicating the merge a bit...this should restore
it to its intended state.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Add return value indication to tcp_prune_ofo_queue().
PS3: gelic: fix the oops on the broken IE returned from the hypervisor
b43legacy: fix DMA mapping leakage
mac80211: remove message on receiving unexpected unencrypted frames
Update rt2x00 MAINTAINERS entry
Add rfkill to MAINTAINERS file
rfkill: Fix device type check when toggling states
b43legacy: Fix usage of struct device used for DMAing
ssb: Fix usage of struct device used for DMAing
MAINTAINERS: move to generic repository for iwlwifi
b43legacy: fix initvals loading on bcm4303
rtl8187: Add missing priv->vif assignments
netconsole: only set CON_PRINTBUFFER if the user specifies a netconsole
[CAN]: Update documentation of struct sockaddr_can
MAINTAINERS: isdn4linux@listserv.isdn4linux.de is subscribers-only
[TCP]: Fix never pruned tcp out-of-order queue.
[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop
Describe debug parameters with their names (and not their values).
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alloc_netdev_mq() tries to produce 32-bytes alignment for both
the net_device itself and its private data. The second alignment is
achieved by adding the NETDEV_ALIGN_CONST to the whole size of
the memory to be allocated.
However, for those devices that do not need the private area, this
addition just makes the net_device weight 1024 + 32 = 1068 bytes,
i.e. consume twice as much memory.
Since loopback device is such (sizeof_priv == 0 for it), and each
net namespace creates one, this can save a noticeable amount of
memory for kernel with net namespaces turned on.
After this set the lo device is actually allocated from a size-1024
kmem cache on i386 box even with NETPOLL and WIRELESS_EXT turned on.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_net is called for
- just allocated devices
- devices moving from one namespace to another
release_net has proper check inside to distinguish these cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocol control sockets and netlink kernel sockets should not prevent the
namespace stop request. They are initialized and disposed in a special way by
sk_change_net/sk_release_kernel.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make release_net/hold_net noop for performance-hungry people. This is a debug
staff and should be used in the debug mode only.
Add check for net != NULL in hold/release calls. This will be required
later on.
[ Added minor simplifications suggested by Brian Haley. -DaveM ]
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
And no need in some IPPROTO_XXX enabling, since ipv6 code
doesn't have any filtering.
So, just set proper net and mark device with NETNS_LOCAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the ip_route_output_key(), dev_get_by_...() and ipv6_chk_addr()
calls are now stubbed with init_net.
Fortunately, all the places already have where to get the proper
net from.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move hashes in the struct ip6_tnl_net, replace tnls_xxx[] with
ip6n->tnlx_xxx[] and handle init and exit appropriately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the code, that reference it already has the ip6_tnl_net pointer,
so s/ip6_fb_tnl_dev/ip6n->fb_tnl_dev/ and move creation/releasing
code into net init/exit ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to ip6_tnl_lookup were stubbed with init_net - give them
a proper one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hashes and fallback device used in them will be per-net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sit-generated traffic enter the namespace.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set proper net and mark a new device as NETNS_LOCAL before registering.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. replace init_net stubs in ip_route_output_key() calls.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just move all the hashes on the sit_net structure and
patch the rest of the code appropriately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate and register one in sit_init_net, use sitn->fb_tunnel_dev
over the code and unregister one in sit_exit_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace introduced in the previous patch init_net stubs
with the proper net pointer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
... to make them prepared for future hashes and fallback device
move on the struct sit_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one was also disabled by default for sanity.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. set the proper net and mark as NETNS_LOCAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for the IPIP tunnel, there are some ip_route_output_key()
calls in there that require a proper net so give one to them.
And a proper net for the __get_dev_by_index hanging around.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Very similar to what was done for the IPIP code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Everything is prepared for this change now. Create on in
init callback, use it over the code and destroy on net exit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the part#2 of the patch #2 - get the proper net for
these functions. This change in a separate patch in order not
to get lost in a large previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fallback device and hashes are to become per-net, but many
code doesn't have anything to get the struct net pointer from.
So pass the proper net there with an extra argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the proper net before calling register_netdev and disable
the tunnel device netns changing.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some ip_route_output_key() calls in there that require
a proper net so give one to them.
Besides - give a proper net to a single __get_dev_by_index call
in ipip_tunnel_bind_dev().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Either net or ipip_net already exists in all the required
places, so just use one.
Besides, tune net_init and net_exit calls to respectively
initialize the hashes and destroy devices.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the part#2 of the previous patch - get the proper
net for these functions.
I make it in a separate patch, so that this change does not
get lost in a large previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashes of tunnels will be per-net too, so prepare all the
functions that uses them for this change by adding an argument.
Use init_net temporarily in places, where the net does not exist
explicitly yet.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create on in ipip_init_net(), use it all over the code (the
proper place to get the net from already exists) and destroy
in ipip_net_exit().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When van device is moved to another namespace proc files,
related to this device, should also change one.
Use the netdev REGISTER and UNREGISTER event handlers for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is similar to what I've done for TUN - set the proper
net after device allocation and clean VLANs on net exit (use the
rtnl_kill_links helper finally).
Plus, drop explicit init_net usage and net != &init_net checks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes moving one on the struct vlan_net and
s/vlan_name_type/vn->name_type/ over the code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is created in a proper net, so make is show info, related
to this particular net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc_vlan_dir and proc_vlan_conf migrate on the struct
vlan_net and their creation uses the struct net.
The devices' entries use the corresponding device's net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All proc files will be created in each net, so prepare them for
this change now, not to mess it with real creation patch.
The net != &init_net checks in them are for git-bisect sanity,
but I will drop them soon.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike TUN, it is empty from the very beginning, and will
be eventually populated later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently vlan group is searched using one key - the ifindex.
We'll have to lookup the vlan_group by two keys - ifindex and
net. Turning the vlan_group lookup key to struct net_device
pointer will make this process easier.
Besides, this will eliminate one more place in the networking,
that assumes that indexes are unique in the kernel.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each potential list_del (happening from inside a ->dellink call)
is followed by goto restart, so there's no need in _safe iteration.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed can only be more strict than what was checked by the
earlier common case check for non-tail skbs, thus
cwnd_len <= needed will never match in that case anyway.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returns non-zero if tp->out_of_order_queue was seen non-empty.
This allows tcp_try_rmem_schedule() to return early.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is not possible to read/write to an eeprom larger than
128k in size because the buffer used for temporarily storing the
eeprom contents is allocated using kmalloc. kmalloc can only allocate
a maximum of 128k depending on architecture.
Modified ethtool_get/set_eeprom to only allocate a page of memory and
then copy the eeprom a page at a time.
Updated original patch as per suggestions from Joe Perches.
Signed-off-by: Mandeep Singh Baines <msb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of hrtimers to support high resolution capabilities, when
provided by the system clocksource.
The conversion to hrtimers additionally discovered and solved an
unlikely race condition that has been reproduced under (unrealistic)
massive receive load, which can only be produced on vcan software devices.
[ Fix printf format warnings on 64-bit -DaveM ]
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that TIPC properly handles incoming messages
that have incorrect or unexpected formats. Most significantly,
it now ensures that each sl_buff has at least as much data as
the message header indicates it should, and that the entire
message header is stored contiguously; this prevents TIPC from
accidentally accessing memory that is not part of the sk_buff.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows TIPC to process incoming messages that are
stored in a fragmented sk_buff, by forcing the linearization
of any such messages it receives.
Note: This is an interim solution to allow TIPC to operate with
Ethernet devices that generate non-linear buffers (such as the
gianfar driver), until such time as the rest of TIPC is enhanced
to handle sk_buffs with multiple data areas.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch causes TIPC to allocate fast clonable sk_buffs,
rather than standard ones. This speeds up the cloning
operation done by the link code each time a message is sent
off-node.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates a null pointer check when discarding a
TIPC message buffer, since kfree_skb() already handles this
situation.
Acknowledgements to Florian Westphal (fw@strlen.de> for
suggesting this enhancement.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some people are getting this message a lot, and we have traced it to
broken access points that much too often send completely empty frames
(all bytes zeroed, which they shouldn't do at all.)
Since we cannot do anything about such frames in any case except the
special case where we're debugging an AP, just remove the message.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rfkill_switch_all() is supposed to only switch all the interfaces of a
given type, but does not actually do this; instead, it just switches
everything currently in the same state.
Add the necessary type check in.
(This fixes a bug I've been seeing while developing an rfkill laptop
driver, with both bluetooth and wireless simultaneously changing state
after only pressing either KEY_WLAN or KEY_BLUETOOTH).
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the elastic array of void * pointer to the struct net.
The access rules are simple:
1. register the ops with register_pernet_gen_device to get
the id of your private pointer
2. call net_assign_generic() to put the private data on the
struct net (most preferably this should be done in the
->init callback of the ops registered)
3. do not store any private reference on the net_generic array;
4. do not change this pointer while the net is alive;
5. use the net_generic() to get the pointer.
When adding a new pointer, I copy the old array, replace it
with a new one and schedule the old for kfree after an RCU
grace period.
Since the net_generic explores the net->gen array inside rcu
read section and once set the net->gen->ptr[x] pointer never
changes, this grants us a safe access to generic pointers.
Quoting Paul: "... RCU is protecting -only- the net_generic
structure that net_generic() is traversing, and the [pointer]
returned by net_generic() is protected by a reference counter
in the upper-level struct net."
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make some per-net generic pointers, we need some way to address
them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
subsystems.
Addressing questions about potential checkpoint/restart problems:
these IDs are "lite-offsets" within the net structure and are by no
means supposed to be exported to the userspace.
Since it will be used in the nearest future by devices only (tun,
vlan, tunnels, bridge, etc), I make it resemble the functionality
of register_pernet_device().
The new ids is stored in the *id pointer _before_ calling the init
callback to make this id available in this callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_prune_queue() doesn't prune an out-of-order queue at all.
Therefore sk_rmem_schedule() can fail but the out-of-order queue isn't
pruned . This can lead to tcp deadlock state if the next two
conditions are held:
1. There are a sequence hole between last received in
order segment and segments enqueued to the out-of-order queue.
2. Size of all segments in the out-of-order queue is more than tcp_mem[2].
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even kernel 2.2.26 (sic) already contains the
#undef CONFIG_IRLAN_SEND_GRATUITOUS_ARP
with the comment "but for some reason the machine crashes if you use DHCP".
Either someone finally looks into this or it's simply time to remove
this dead code.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies TIPC's socket code to follow the same approach
used by other protocols. This change eliminates the need for a
mutex in the TIPC-specific portion of the socket protocol data
structure -- in its place, the standard Linux socket backlog queue
and associated locking routines are utilized. These changes fix
a long-standing receive queue bug on SMP systems, and also enable
individual read and write threads to utilize a socket without
unnecessarily interfering with each other.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes TIPC's connect routine to conform to Linux
kernel style norms of indentation, line length, etc.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch causes TIPC to return an error indication if the non-
blocking form of connect() is requested (which TIPC does not yet
support).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bug that allowed TIPC to queue 1 more message
than allowed by the socket receive queue threshold limits. The
patch also improves the threshold code's logic and naming to help
prevent this sort of error from recurring in the future.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that padding bytes appearing at the end of
an incoming TIPC message are not returned as valid stream data.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a stream socket to receive data from multiple
TIPC messages in its receive queue, without requiring the use of
the MSG_WAITALL flag.
Acknowledgements to Florian Westphal <fw-tipc@strlen.de> for
identifying this issue and suggesting how to correct it.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimizes the receive path for SOCK_DGRAM and SOCK_RDM
messages by skipping over code that handles connection-based flow
control.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC_H_MAJ(parentid) for root classes is the same as for ingress, and if
ingress qdisc is created qdisc_lookup() returns its pointer (without
ingress NULL is returned). After this all qdisc_lookups give the same,
and we get endless loop. (I don't know how this could hide for so long
- it should trigger with every leaf class deleted if it's qdisc isn't
empty.)
After this fix qdisc_lookup() is omitted both for ingress and root
parents, but looking for root is only wasting a little time here...
Many thanks to Enrico Demarin for finding a test for catching this
bug, which probably bothered quite a lot of admins.
Reported-by: Enrico Demarin <enrico@superclick.com>,
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SECURITY_NETWORK_XFRM is undefined the following warnings appears:
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1576: warning: 'ctx' may be used uninitialized in this function
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1340: warning: 'ctx' may be used uninitialized in this function
(security_xfrm_policy_alloc is noop for the case).
It seems that they are result of the commit
03e1ad7b5d ("LSM: Make the Labeled IPsec
hooks more stack friendly")
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
[BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
[NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
[IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
[IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
[IPV6]: Use appropriate sock tclass setting for routing lookup.
[IPV6]: IPv6 extension header structures need to be packed.
[IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
[NET]: Return more appropriate error from eth_validate_addr().
[ISDN]: Do not validate ISDN net device address prior to interface-up
[NET]: Fix kernel-doc for skb_segment
[SOCK] sk_stamp: should be initialized to ktime_set(-1L, 0)
net: check for underlength tap writes
net: make struct tun_struct private to tun.c
[SCTP]: IPv4 vs IPv6 addresses mess in sctp_inet[6]addr_event.
[SCTP]: Fix compiler warning about const qualifiers
[SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK
[SCTP]: Add check for hmac_algo parameter in sctp_verify_param()
[NET_SCHED] cls_u32: refcounting fix for u32_delete()
[DCCP]: Fix skb->cb conflicts with IP
[AX25]: Potential ax25_uid_assoc-s leaks on module unload.
...
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.
Can we please add a comment about why we do not need one?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These sockets now have a bit other names and are no longer global.
Shame on me, I haven't provided a good comment for this when
sending DCCP netnsization patches.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ebtables nflog watcher to the kernel in order to
allow ebtables log through the nfnetlink_log backend.
Signed-off-by: Peter Warasin <peter@endian.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Directly call IPv4 and IPv6 variants where the address family is
easily known.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Invalid states can cause out-of-bound memory accesses of the state table.
Also don't insist on having a new state contained in the netlink message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.
The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.
Also change NF_CT_ASSERT to use WARN_ON to get backtraces.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.
The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Locally generated ICMP packets have a reference to the conntrack entry
of the original packet manually attached by icmp_send(). Therefore the
check for locally originated untracked ICMP redirects can never be
true.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.
Return EAGAIN so userspace can retry the operation.
Fixes (with current iptables SVN version) netfilter bugzilla #104.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Some callers pass uninitialized structures, clear the address to make
sure later comparisions work properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>