This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.
The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.
We define a large value :
#define LISTENING_NULLS_BASE (1U << 29)
So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch extends existing code:
* Confirm options divide into the confirmed value plus an optional preference
list for SP values. Previously only the preference list was echoed for SP
values, now the confirmed value is added as per RFC 4340, 6.1;
* length and sanity checks are added to avoid illegal memory (or NULL) access.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for Mandatory options is provided by this patch, which will
be used by subsequent feature-negotiation patches.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This extends the scope of two available functions,
encode|decode_value_var, to work up to 6 (8) bytes, to match maximum
requirements in the RFC.
These functions are going to be used both by general option processing
and feature negotiation code, hence declarations have been put into
feat.h.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides function to query the current TX/RX CCID dynamically,
without reliance on the minisock value, using dynamic information
available in the currently loaded CCID module.
This query function is then used to
(a) provide the getsockopt part for getting/setting CCIDs via sockopts;
(b) replace the current test for "which CCID is in use" in probe.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, TX/RX CCIDs can now be changed on a per-connection
basis, which overrides the defaults set by the global sysctl variables
for TX/RX CCIDs.
To make full use of this facility, the remaining patches of this patch
set are needed, which track dependencies and activate negotiated
feature values.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have relevant information for NETLINK protocol, in
/proc/net/protocols, we should use sock_prot_inuse_add() to
update a (percpu and pernamespace) counter of inuse sockets.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Use eq_net() in inet_netns_ok() to speedup socket creation if
!CONFIG_NET_NS
2) Reorder the tests about inet_ehash_secret generation (once only)
Use the unlikely() macro when testing if inet_ehash_secret already
generated.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove redundant argument comments in files of net/*
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove converting the MAC address to a string by a direct byte
conversion and use %pM instead, since the code is now boilerplate
use a macro to define the show functions, and also use the shorter
__ATTR_RO macro to define the attributes.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The old ieee80211 code only remains as a support library for the ipw2100
and ipw2200 drivers. So, move the code and rename it appropriately to
reflects it's true purpose and status.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These bits are shared already between ipw2x00 and hostap, and could
probably be shared both more cleanly and with other drivers. This
commit simply relocates the code to lib80211 and adjusts the drivers
appropriately.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes unnecessary #include <linux/netdevice.h> from
/net/mac80211/mlme.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch eliminates sparse warnings in pid rate scale algorithm
'debugfs: allow access to signed values' patch hit the dead end
year ago w/o much echo so I guess there is no real need to address this
properly.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Delete kernel-doc struct descriptions for fields that don't exist:
Warning(include/net/mac80211.h:1263): Excess struct/union/enum/typedef member 'conf_ht' description in 'ieee80211_ops'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'addr' description in 'sta_info'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'aid' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mesh interfaces are currently opened with the FIF_ALLMULTI rx filter flag set,
however there is no BSSID in mesh so BSSID filtering should be disabled by
setting the FIF_OTHER_BSS flag as well. Also explicitly call
ieee80211_configure_filter for mesh.
Signed-off-by: Andrey Yurovsky <andrey@cozybit.com>
Signed-off-by: Javier Cardona <javier@cozbit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adds an interface to configure the Backward Congestion Notification
(BCN) feature. In a BCN capabale network, congestion notifications
from congested points out in the network can cause the end station
limit the rate of a given traffic flow.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a netlink interface for Data Center Bridging (DCB) to get and set
the enable state of the Priority Flow Control (PFC) feature.
Primarily, this is a way to turn off PFC in the driver while DCB
remains enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds interface for Data Center Bridging (DCB) to query (and set if
supported) the number of traffic classes currently supported by the
device for the two (DCB) features: priority groups (PG) and priority
flow control (PFC).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds to the netlink interface for Data Center Bridging (DCB), allowing
the DCB capabilities supported by a device to be queried.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.
/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.
This should speedup writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like IPV4, convert the tunnel virtual devices to use net_device_ops.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to network device ops. Needed to change to directly call
the init routine since two sides share same ops. In the process
found by inspection a device ref count leak if register_netdevice failed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the HIPPI infrastructure for use with net_device_ops.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to ethernet. Convert infrastructure and the one lone FDDI
driver (for the one lone user of that hardware??). Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to new network device ops interface.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a regression reported by Max Kellermann whereby kernel profiling
showed that his clients were spending 45% of their time in
rpcauth_lookup_credcache.
It turns out that although his processes had identical uid/gid/groups,
generic_match() was failing to detect this, because the task->group_info
pointers were not shared. This again lead to the creation of a huge number
of identical credentials at the RPC layer.
The regression is fixed by comparing the contents of task->group_info
if the actual pointers are not identical.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
net: fix tiny output corruption of /proc/net/snmp6
atl2: don't request irq on resume if netif running
ipv6: use seq_release_private for ip6mr.c /proc entries
pkt_sched: fix missing check for packet overrun in qdisc_dump_stab()
smc911x: Fix printf format typo in smc911x driver.
asix: Fix asix-based cards connecting to 10/100Mbs LAN.
mv643xx_eth: fix recycle check bound
mv643xx_eth: fix the order of mdiobus_{unregister, free}() calls
sh: sh_eth: Update to change of mii_bus
TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header()
TPROXY: fill struct flowi->flags in udp_sendmsg()
net: ipg.c fix bracing on endian swapping
phylib: Fix auto-negotiation restart avoidance
net: jme.c rxdesc.flags is __le16, other missing endian swaps
phylib: fix phy name example in documentation
net: Do not fire linkwatch events until the device is registered.
phonet: fix compilation with gcc-3.4
ixgbe: fix compilation with gcc-3.4
pktgen: fix multiple queue warning
net: fix ip_mr_init() error path
...
1. Make device driver to allocate memory for netdev.
2. Convert all directly reference of netdev->priv to netdev_priv().
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because "name" is static, it can be occasionally be filled with
somewhat garbage if two processes read /proc/net/snmp6.
Also, remove useless casts and "-1" -- snprintf() correctly terminates it's
output.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ip6mr.c, /proc entries /proc/net/ip6_mr_cache and /proc/net/ip6_mr_vif
are opened with seq_open_private(), thus seq_release_private() should be
used to release them.
Should fix a small memory leak.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of xchg() hasn't been necessary since 2.2.something when proper
locking was added to packet schedulers. In the case of classifiers they
mostly weren't even necessary before that since they're mainly used
to assign a NULL pointer to the filter root in the ->destroy path;
the root is destroyed immediately after that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of xchg() hasn't been necessary since 2.2.something when proper
locking was added to packet schedulers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add classful DRR scheduler as a more flexible replacement for SFQ.
The main difference to the algorithm described in "Efficient Fair Queueing
using Deficit Round Robin" is that this implementation doesn't drop packets
from the longest queue on overrun because its classful and limits are
handled by each individual child qdisc.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start() might return NULL, causing a NULL pointer dereference.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes sparse warnings:
net/ipv4/ip_sockglue.c:146:15: warning: incorrect type in assignment (different base types)
net/ipv4/ip_sockglue.c:146:15: expected restricted __be16 [assigned] [usertype] sin_port
net/ipv4/ip_sockglue.c:146:15: got unsigned short [unsigned] [short] [usertype] <noident>
net/ipv4/ip_sockglue.c:130:6: warning: symbol 'ip_cmsg_recv_dstaddr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_sk_rebuild_header() does a new route lookup if the dst_entry
associated with a socket becomes stale. However inet_sk_rebuild_header()
didn't use struct flowi->flags, causing the route lookup to
fail for foreign-bound IP_TRANSPARENT sockets, causing an error
state to be set for the sockets in question.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
udp_sendmsg() didn't fill struct flowi->flags, which means that
the route lookup would fail for non-local IPs even if the
IP_TRANSPARENT sockopt was set.
This prevents sendto() to work properly for UDP sockets, whereas
bind(foreign-ip) + connect() + send() worked fine.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
SKF_AD_NLATTR allows us to find the first matching attribute in a
stream of netlink attributes from one offset to the end of the
netlink message. This is not suitable to look for a specific
matching inside a set of nested attributes.
For example, in ctnetlink messages, if we look for the CTA_V6_SRC
attribute in a message that talks about an IPv4 connection,
SKF_AD_NLATTR returns the offset of CTA_STATUS which has the same
value of CTA_V6_SRC but outside the nest. To differenciate
CTA_STATUS and CTA_V6_SRC, we would have to make assumptions on the
size of the attribute and the usual offset, resulting in horrible
BSF code.
This patch adds SKF_AD_NLATTR_NEST, which is a variant of
SKF_AD_NLATTR, that looks for an attribute inside the limits of
a nested attributes, but not further.
This patch validates that we have enough room to look for the
nested attributes - based on a suggestion from Patrick McHardy.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>