RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add to validate initiate tag and chunk type if verification
tag is 0 when handling ICMP message.
RFC 4960, Appendix C. ICMP Handling
ICMP6) An implementation MUST validate that the Verification Tag
contained in the ICMP message matches the Verification Tag of the peer.
If the Verification Tag is not 0 and does NOT match, discard the ICMP
message. If it is 0 and the ICMP message contains enough bytes to
verify that the chunk type is an INIT chunk and that the Initiate Tag
matches the tag of the peer, continue with ICMP7. If the ICMP message
is too short or the chunk type or the Initiate Tag does not match,
silently discard the packet.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a driver rejects a frame in it's ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
genetlink has a circular locking dependency when dumping the registered
families:
- dump start:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time
- dump continuance:
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex
Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.
Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.
Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 608961a5ec.
The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.
This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).
In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key. The ESP protocol code in the IPSEC stack can be used as a
model for implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
In net/ipv6/tcp_ipv6.c:
- Remove unneeded tcp_v6_send_check() declaration.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.
Suggested (although only half-seriously) by Evgeniy Polyakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.
The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.
Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tihomir Heidelberg - 9a4gl, reports:
--------------------
I would like to direct you attention to one problem existing in ax.25
kernel since 2.4. If listening socket is closed and its SKB queue is
released but those sockets get weird. Those "unAccepted()" sockets
should be destroyed in ax25_std_heartbeat_expiry, but it will not
happen. And there is also a note about that in ax25_std_timer.c:
/* Magic here: If we listen() and a new link dies before it
is accepted() it isn't 'dead' so doesn't get removed. */
This issue cause ax25d to stop accepting new connections and I had to
restarted ax25d approximately each day and my services were unavailable.
Also netstat -n -l shows invalid source and device for those listening
sockets. It is strange why ax25d's listening socket get weird because of
this issue, but definitely when I solved this bug I do not have problems
with ax25d anymore and my ax25d can run for months without problems.
--------------------
Actually as far as I can see, this problem is even in releases
as far back as 2.2.x as well.
It seems senseless to special case this test on TCP_LISTEN state.
Anything still stuck in state 0 has no external references and
we can just simply kill it off directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
In commits 33c732c361 ([IPV4]: Add raw
drops counter) and a92aa318b4 ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6
This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).
This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.
# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0
# cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt ---
uid timeout inode ref pointer drops
75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2286 2 ffff81042dd35c80 146348
In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Permit bonding to function rationally if max_bonds is set to
zero. This will load the module, but create no master devices (which can
be created via sysfs).
Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible. Instead, an interface from net/core is
created to create and destroy files in net_class.
Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add NETDEV_BONDING_FAILOVER event to be used in a successive patch
by bonding to announce fail-over for the active-backup mode through the
netdev events notifier chain mechanism. Such an event can be of use for the
RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER)
always be aligned with the IP stack, in the sense that they use the same
ports/links as the stack does. More usages can be done to allow monitoring
tools based on netlink events being aware to bonding fail-over.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ROSE network is organized through nodes connected via hamradio or Internet.
AX25 packet radio frames sent to a remote ROSE address destination are routed
through these nodes.
Without the present patch, automatic routing mechanism did not work optimally
due to an improper parameter checking.
rose_get_neigh() function is called either by rose_connect() or by
rose_route_frame().
In the case of a call from rose_connect(), f0 timer is checked to find if a connection
is already pending. In that case it returns the address of the neighbour, or returns a NULL otherwise.
When called by rose_route_frame() the purpose was to route a packet AX25 frame
through an adjacent node given a destination rose address.
However, in that case, t0 timer checked does not indicate if the adjacent node
is actually connected even if the timer is not null. Thus, for each frame sent, the
function often tried to start a new connexion even if the adjacent node was already connected.
The patch adds a "new" parameter that is true when the function is called by
rose route_frame().
This instructs rose_get_neigh() to check node parameter "restarted".
If restarted is true it means that the route to the destination address is opened via a neighbour
node already connected.
If "restarted" is false the function returns a NULL.
In that case the calling function will initiate a new connection as before.
This results in a fast routing of frames, from nodes to nodes, until
destination is reached, as originaly specified by ROSE protocole.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When generating the ip header for the transformed packet we just copy
the frag_off field of the ip header from the original packet to the ip
header of the new generated packet. If we receive a packet as a chain
of fragments, all but the last of the new generated packets have the
IP_MF flag set. We have to mask the frag_off field to only keep the
IP_DF flag from the original packet. This got lost with git commit
36cf9acf93 ("[IPSEC]: Separate
inner/outer mode processing on output")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix bridge netfilter code so that it uses CONFIG_IPV6 as needed:
net/built-in.o: In function `ebt_filter_ip6':
ebt_ip6.c:(.text+0x87c37): undefined reference to `ipv6_skip_exthdr'
net/built-in.o: In function `ebt_log_packet':
ebt_log.c:(.text+0x88dee): undefined reference to `ipv6_skip_exthdr'
make[1]: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, the bridge just chooses the smallest mac address as the
bridge id and mac address of bridge device. But if the administrator
has explictly set the interface address then don't change it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any frame addressed to link-local addresses should be processed by local
receive path. The earlier code would process them only if STP was enabled.
Since there are other frames like LACP for bonding, we should always
process them.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the sctp_remaddr_proc_init failed, the proper rollback is
not the sctp_remaddr_proc_exit, but the sctp_assocs_proc_exit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:
CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000004 c00a67f0
$ 4 : 802a5ad0 81657e00 00000000 00000000
$ 8 : 00000008 801461c8 00000000 80570050
$12 : 819b0280 819b04b0 00000006 00000000
$16 : 802a5a60 80000000 80b46000 80321010
$20 : 00000000 00000004 802a5ad0 00000001
$24 : 00000000 802257a8
$28 : 802a4000 802a59e8 00000004 801d4e7c
Hi : 0000000b
Lo : 00506320
epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P
ra : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403 KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
...
Call Trace:
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4f8c>] nf_hook_slow+0x9c/0x1a4
One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.
Reported-by: liannan <liannan@twsz.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly free h323_buffer when helper registration fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix three ct_extend/NAT extension related races:
- When cleaning up the extension area and removing it from the bysource hash,
the nat->ct pointer must not be set to NULL since it may still be used in
a RCU read side
- When replacing a NAT extension area in the bysource hash, the nat->ct
pointer must be assigned before performing the replacement
- When reallocating extension storage in ct_extend, the old memory must
not be freed immediately since it may still be used by a RCU read side
Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In nr_release(), one code path calls sock_orphan() which
will NULL out sk->sk_socket already.
In the other case, handling states other than NR_STATE_{0,1,2,3},
seems to not be possible other than due to bugs. Even for an
uninitialized nr->state value, that would be zero or NR_STATE_0.
It might be wise to stick a WARN_ON() here.
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't grab the sk_callback_lock, it doesn't NULL out
the sk->sk_sleep waitqueue pointer, etc.
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't grab the sk_callback_lock, it doesn't NULL out
the sk->sk_sleep waitqueue pointer, etc.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the x25 variant of changeset
9375cb8a12
("ax25: Use sock_graft() and remove bogus sk_socket and sk_sleep init.")
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the rose variant of changeset
9375cb8a12
("ax25: Use sock_graft() and remove bogus sk_socket and sk_sleep init.")
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the netrom variant of changeset
9375cb8a12
("ax25: Use sock_graft() and remove bogus sk_socket and sk_sleep init.")
Signed-off-by: David S. Miller <davem@davemloft.net>
The way that listening sockets work in ax25 is that the packet input
code path creates new socks via ax25_make_new() and attaches them
to the incoming SKB. This SKB gets queued up into the listening
socket's receive queue.
When accept()'d the sock gets hooked up to the real parent socket.
Alternatively, if the listening socket is closed and released, any
unborn socks stuff up in the receive queue get released.
So during this time period these sockets are unreachable in any
other way, so no wakeup events nor references to their ->sk_socket
and ->sk_sleep members can occur. And even if they do, all such
paths have to make NULL checks.
So do not deceptively initialize them in ax25_make_new() to the
values in the listening socket. Leave them at NULL.
Finally, use sock_graft() in ax25_accept().
Signed-off-by: David S. Miller <davem@davemloft.net>
Three major portions to this change:
1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
and IW_EV_COMPAT_POINT_LEN helper defines.
2) Delete iw_stream_check_add_*(), they are unused.
3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
size the event and pointer lengths correctly depending upon whether
IW_REQUEST_FLAG_COMPAT is set or not.
4) The mechanical transformations to the drivers and wireless stack
bits to get the iw_request_info passed down into the routines
modified in #3. Also, explicit references to IW_EV_LCP_LEN are
replaced with iwe_stream_lcp_len(info).
With a lot of help and bug fixes from Masakazu Mokuno.
Signed-off-by: David S. Miller <davem@davemloft.net>
Next we can kill the hacks in fs/compat_ioctl.c and also
dispatch compat ioctls down into the driver and 80211 protocol
helper layers in order to handle iw_point objects embedded in
stream replies which need to be translated.
Signed-off-by: David S. Miller <davem@davemloft.net>
It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Same as for inet_hashfn, prepare its ipv6 incarnation.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although this hash takes addresses into account, the ehash chains
can also be too long when, for instance, communications via lo occur.
So, prepare the inet_hashfn to take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Listening-on-one-port sockets in many namespaces produce long
chains in the listening_hash-es, so prepare the inet_lhashfn to
take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Binding to some port in many namespaces may create too long
chains in bhash-es, so prepare the hashfn to take struct net
into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every caller already has this one. The new argument is currently
unused, but this will be fixed shortly.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
They both calculate the hash chain, but currently do not have
a struct net pointer, so pass one there via additional argument,
all the more so their callers already have such.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the chain to store a UDP socket is calculated with
simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account
would make this calculation a bit more complex, so moving it into
a function would help.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Remove ICMP_MIN_LENGTH, as it is unused.
2) Remove unneeded tcp_v4_send_check() declaration.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators.
I believe this was introduced by commit 8cd850efa4
([RAW]: Cleanup IPv4 raw_seq_show.)
This trivial patch restores correct behavior, and applies to current
Linus tree (should also be applied to stable tree as well.)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Selected device feature bits can be propagated to VLAN devices, so we
can make use of TX checksum offload and TSO on VLAN-tagged packets.
However, if the physical device does not do VLAN tag insertion or
generic checksum offload then the test for TX checksum offload in
dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield
false.
This splits the checksum offload test into two functions:
- can_checksum_protocol() tests a given protocol against a feature bitmask
- dev_can_checksum() first tests the skb protocol against the device
features; if that fails and the protocol is htons(ETH_P_8021Q) then
it tests the encapsulated protocol against the effective device
features for VLANs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, any time we set a primary transport we set
the changeover_active flag. As a result, we invoke SFR-CACC
even when there has been no changeover events.
Only set changeover_active, when there is a true changeover
event, i.e. we had a primary path and we are changing to
another transport.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch remove the proc fs entry which has been created if fail to
set up proc fs entry for the SCTP protocol.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo's system is still seeing strange behavior, and he
reports that is goes away if the rest of the deferred
accept changes are reverted too.
Therefore this reverts e4c7884028
("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and
539fae89be ("[TCP]: TCP_DEFER_ACCEPT
updates - defer timeout conflicts with max_thresh").
Just like the other revert, these ideas can be revisited for
2.6.27
Signed-off-by: David S. Miller <davem@davemloft.net>
We've introduced extra need of compat layer for ip_tunnel_prl{}
for PRL (Potential Router List) management. Though compat_ioctl
is still missing in ipv4/ipv6, let's make the interface more
straight-forward and eliminate extra need for nasty compat layer
anyway since the interface is new for 2.6.26.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a htb_hysteresis parameter to htb_sch.ko and by sysfs magic make
it runtime adjustable via
/sys/module/sch_htb/parameters/htb_hysteresis mode 640.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HTB hysteresis mode reduce the CPU load, but at the
cost of scheduling accuracy.
On ADSL links (512 kbit/s upstream), this inaccuracy introduce
significant jitter, enought to disturbe VoIP. For details see my
masters thesis (http://www.adsl-optimizer.dk/thesis/), chapter 7,
section 7.3.1, pp 69-70.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change struct proto destroy function pointer to return void. Noticed
by Al Viro.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In IBSS mode prior to join/creation of new IBSS it is possible that
a frame from unknown station is received and an ibss_add_sta() is
called. This will cause a warning in rate_lowest_index() since the
list of supported rates of our station is not initialized yet.
The fix is to add ibss stations with a rate we received that frame
at; this single-element set will be extended later based on beacon
data. Also there is no need to store stations from a foreign IBSS.
Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Also change the arguments of the phase1, 2 key mixing to take
a pointer to the encrytion key and the tkip_ctx in the same
order.
Do the dereference of the encryption key in the callers.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Take a __le16 directly rather than a host-endian value.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes setting beacon interval
1. in register_hw it honors value requested by the driver
2. It uses default 100 instead of 1000 or 10000. Scanning for beacon
interval ~1sec and above is not sane
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch denies the use of framentation while ampdu is used.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Implement missing EU regulatory domain for mac80211. Based on the
information in IEEE 802.11-2007 (specifically pages 1142, 1143 & 1148)
and ETSI 301 893 (V1.4.1).
With thanks to Johannes Berg.
Signed-off-by: Tony Vroon <tony@linx.net>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds '\n' in debug printk (wme.c HT DEBUG)
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The patch checks interface status, if it is in IBSS_JOINED mode
show cell id it is associated with.
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Revert 'process defer accept as established' changes.
ipv6: Fix duplicate initialization of rawv6_prot.destroy
bnx2x: Updating the Maintainer
net: Eliminate flush_scheduled_work() calls while RTNL is held.
drivers/net/r6040.c: correct bad use of round_jiffies()
fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP
ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list()
netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
netfilter: Make nflog quiet when no one listen in userspace.
ipv6: Fail with appropriate error code when setting not-applicable sockopt.
ipv6: Check IPV6_MULTICAST_LOOP option value.
ipv6: Check the hop limit setting in ancillary data.
ipv6 route: Fix route lifetime in netlink message.
ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
dccp: Bug in initial acknowledgment number assignment
dccp ccid-3: X truncated due to type conversion
dccp ccid-3: TFRC reverse-lookup Bug-Fix
dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
dccp: Fix sparse warnings
dccp ccid-3: Bug-Fix - Zero RTT is possible
This reverts two changesets, ec3c0982a2
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adb
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.
Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.
Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.
Next is a problem noticed by Vitaliy Gusev, he noted:
----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;
Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------
Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:
----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------
So revert this thing for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
In changeset 22dd485022
("raw: Raw socket leak.") code was added so that we
flush pending frames on raw sockets to avoid leaks.
The ipv4 part was fine, but the ipv6 part was not
done correctly. Unlike the ipv4 side, the ipv6 code
already has a .destroy method for rawv6_prot.
So now there were two assignments to this member, and
what the compiler does is use the last one, effectively
making the ipv6 parts of that changeset a NOP.
Fix this by removing the:
.destroy = inet6_destroy_sock,
line, and adding an inet6_destroy_sock() call to the
end of raw6_destroy().
Noticed by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:
BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
[<c038e728>] nla_parse+0x5c/0xb0
[<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
[<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
[<c0119aee>] update_curr+0x3d/0x52
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
...
Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875
Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message "nf_log_packet: can't log since no backend logging module loaded
in! Please either load one, or disable logging explicitly" was displayed for
each logged packet when no userspace application is listening to nflog events.
The message seems to warn for a problem with a kernel module missing but as
said before this is not the case. I thus propose to suppress the message (I
don't see any reason to flood the log because a user application has crashed.)
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
When specifing the outgoing hop limit as ancillary data for sendmsg(),
the kernel doesn't check the integer hop limit value as specified in
[RFC-3542] section 6.3.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
1) We may have route lifetime larger than INT_MAX.
In that case we had wired value in lifetime.
Use INT_MAX if lifetime does not fit in s32.
2) Lifetime is valid iif RTF_EXPIRES is set.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This pacth makes IPv6 address labels per network namespace.
It keeps the global label tables, ip6addrlbl_table, but
adds a 'net' member to each ip6addrlbl_entry.
This new member is taken into account when matching labels.
Changelog
=========
* v1: Initial version
* v2:
* Minize the penalty when network namespaces are not configured:
* the 'net' member is added only if CONFIG_NET_NS is
defined. This saves space when network namespaces are not
configured.
* 'net' value is retrieved with the inlined function
ip6addrlbl_net() that always return &init_net when
CONFIG_NET_NS is not defined.
* 'net' member in ip6addrlbl_entry renamed to the less generic
'lbl_net' name (helps code search).
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This inline function, for readability, returns if the route
is a "prefix" route regardless if it was installed by RA or by
hand.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
MRT6_VERSION should be used instead of MRT_VERSION in ip6mr.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patch removes MLDV2_QQIC macro from mcast.c
as it is unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
net: Fix routing tables with id > 255 for legacy software
sky2: Hold RTNL while calling dev_close()
s2io iomem annotations
atl1: fix suspend regression
qeth: start dev queue after tx drop error
qeth: Prepare-function to call s390dbf was wrong
qeth: reduce number of kernel messages
qeth: Use ccw_device_get_id().
qeth: layer 3 Oops in ip event handler
virtio: use callback on empty in virtio_net
virtio: virtio_net free transmit skbs in a timer
virtio: Fix typo in virtio_net_hdr comments
virtio_net: Fix skb->csum_start computation
ehea: set mac address fix
sfc: Recover from RX queue flush failure
add missing lance_* exports
ixgbe: fix typo
forcedeth: msi interrupts
ipsec: pfkey should ignore events when no listeners
pppoe: Unshare skb before anything else
...
Step 8.5 in RFC 4340 says for the newly cloned socket
Initialize S.GAR := S.ISS,
but what in fact the code (minisocks.c) does is
Initialize S.GAR := S.ISR,
which is wrong (typo?) -- fixed by the patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in computing the inter-packet-interval t_ipi = s/X:
scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
than 2^26 Bps (~537 Mbps).
Using full 64-bit division now.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).
The smallest tabulated value of
10^6 * f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)
for p=0.0001 is 8172.
Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.
This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.
Also corrected the printk format (%u for u32).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes an oversight from an earlier patch, ensuring that Ack Vectors
are not processed on request sockets.
The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:
"Using CCID-specific options and feature options during a negotiation
for the corresponding CCID feature is NOT RECOMMENDED [...]"
And it is not even possible: when the server receives the Request from the
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way hanshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch fixes the following sparse warnings:
* nested min(max()) expression:
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
* Declaration of function prototypes in .c instead of .h file, resulting in
"should it be static?" warnings.
* Declared "struct dccpw" static (local to dccp_probe).
* Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
("Receivers SHOULD implement delayed acknowledgement timers ...").
* Used a different local variable name to avoid
net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
net/dccp/ackvec.c:238:33: originally declared here
* Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
In commit $(825de27d9e) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.
Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.
All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.
The fix is to use the fallback-RTT from RFC 4340, 3.4.
This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Most legacy software do not like tables > 255 as rtm_table is u8
so tb_id is sent &0xff and it is possible to mismatch for example
table 510 with table 254 (main).
This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. It makes such old applications happy, new
ones are still able to use RTA_TABLE to get a proper table id.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes people happy who try to keep a list of addresses up to date by
listening to notifications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When pfkey has no km listeners, it still does a lot of work
before finding out there aint nobody out there.
If a tree falls in a forest and no one is around to hear it, does it make
a sound? In this case it makes a lot of noise:
With this short-circuit adding 10s of thousands of SAs using
netlink improves performance by ~10%.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- No need to perform data_len = 0 in the switch command, since data_len
is initialized to 0 in the beginning of the ipq_build_packet_message()
method.
- {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is
sure to be NULL when getting there; since skb is NULL, there is no need
to check this fact and call kfree_skb().
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a conntrack is destroyed, the connection status does not get
exported to netlink. I don't see a reason for not doing so. This patch
exports the status on all conntrack events.
Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the last packet of a connection isn't accounted when its causing
abnormal termination.
Introduces nf_ct_kill_acct() which increments the accounting counters on
conntrack kill. The new function was necessary, because there are calls
to nf_ct_kill() which don't need accounting:
nf_conntrack_proto_tcp.c line ~847:
Kills ct and returns NF_REPEAT. We don't want to count twice.
nf_conntrack_proto_tcp.c line ~880:
Kills ct and returns NF_DROP. I think we don't want to count dropped
packets.
nf_conntrack_netlink.c line ~824:
As far as I can see ctnetlink_del_conntrack() is used to destroy a
conntrack on behalf of the user. There is an sk_buff, but I don't think
this is an actual packet. Incrementing counters here is therefore not
desired.
Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Encapsulate the common
if (del_timer(&ct->timeout))
ct->timeout.function((unsigned long)ct)
sequence in a new function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ksize() API is going away because it is being abused and it doesn't even
work consistenly across different allocators. Therefore, convert
net/netfilter/nf_conntrack_extend.c to use krealloc().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a port of the IPv4 security table for IPv6.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch implements a new "security" table for iptables, so
that MAC (SELinux etc.) networking rules can be managed separately to
standard DAC rules.
This is to help with distro integration of the new secmark-based
network controls, per various previous discussions.
The need for a separate table arises from the fact that existing tools
and usage of iptables will likely clash with centralized MAC policy
management.
The SECMARK and CONNSECMARK targets will still be valid in the mangle
table to prevent breakage of existing users.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds full support for SCTP to ctnetlink. This includes three
new attributes: state, original vtag and reply vtag.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch groups ctnetlink errors into three logical sets:
* Malformed messages: if ctnetlink receives a message without some mandatory
attribute, then it returns EINVAL.
* Unsupported operations: if userspace tries to perform an unsupported
operation, then it returns EOPNOTSUPP.
* Unchangeable: if userspace tries to change some attribute of the
conntrack object that can only be set once, then it returns EBUSY.
This patch reduces the number of -EINVAL from 23 to 14 and it results in
5 -EBUSY and 6 -EOPNOTSUPP.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It implements matching functions for IPv6 address & traffic class
(merged from the patch sent by Jan Engelhardt [jengelh@computergmbh.de]
http://marc.info/?l=netfilter-devel&m=120182168424052&w=2), protocol,
and layer-4 port id. Corresponding watcher logging function is also
added for IPv6.
Signed-off-by: Kuo-lang Tseng <kuo-lang.tseng@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bindv6only is tuned via sysctl. It is already on a struct net
and per-net sysctls allow for its modification (ipv6_sysctl_net_init).
Despite this the value configured in the init net is used for the
rest of them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first 4 bytes of data to be sent are stored additionally into
the message class field of the send request. A receiving target
program (not an af_iucv socket program) can make use of this
information to pre-screen incoming messages.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code used preempt_disable() to prevent cpu hotplug, however that
doesn't protect for cpus being added. So use get_online_cpus() instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WARNING: net/iucv/built-in.o(.exit.text+0x9c): Section mismatch in
reference from the function iucv_exit() to the variable
.cpuinit.data:iucv_cpu_notifier
This warning is caused by a reference from unregister_hotcpu_notifier()
from an exit function to a cpuinitdata annotated data structurre.
This is a false positive warning since for the non CPU_HOTPLUG case
unregister_hotcpu_notifier() is a nop.
Use __refdata instead of __cpuinitdata to get rid of the warning.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default sack frequency should be 2. Also fix copy/paste
error when updating all transports.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a check to the set_channel flow. When attempting to change
the channel while in IBSS mode, and the new channel does not support IBSS
mode, the flow return with an error value with no consequences on the
mac80211 and driver state.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sufficient scans (at least 2 or 3) should have been done within 7
seconds to find an existing IBSS to join. This should improve IBSS
creation latency; and since IBSS merging is still in effect, shouldn't
have detrimental effects on eventual IBSS convergence.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes the issue of slow reconnection to an IBSS cell after
disconnection from it. Now the interface's bssid is reset upon ifdown.
ieee80211_sta_find_ibss:
if (found && memcmp(ifsta->bssid, bssid, ETH_ALEN) != 0 &&
(bss = ieee80211_rx_bss_get(dev, bssid,
local->hw.conf.channel->center_freq,
ifsta->ssid, ifsta->ssid_len)))
Note:
In general disconnection is still not handled properly in mac80211
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise userspace has no idea the IBSS creation succeeded.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- Don't trust a length which is greater than the working buffer.
An invalid length could cause overflow when calculating buffer size
for decoding oid.
- An oid length of zero is invalid and allows for an off-by-one error when
decoding oid because the first subid actually encodes first 2 subids.
- A primitive encoding may not have an indefinite length.
Thanks to Wei Wang from McAfee for report.
Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch defines a few new message header manipulation routines,
and generalizes the usefulness of another, in preparation for upcoming
rework of TIPC's message rejection code.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that TIPC doesn't try to access non-existent
message header fields when rejecting a message with a short header.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates several cases where message header fields
were being set to the same value twice.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch increases the "sequence gap" field of the LINK_PROTOCOL
message header from 8 bits to 13 bits (utilizing 5 previously
unused 0 bits). This ensures that the field is big enough to
indicate the loss of up to 8191 consecutive messages on the link,
thereby accommodating the current worst-case scenario of 4000
lost messages.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
tcp: Increment OUTRSTS in tcp_send_active_reset()
raw: Raw socket leak.
lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
ipw2200: expire and use oldest BSS on adhoc create
airo warning fix
b43legacy: Fix controller restart crash
sctp: Fix ECN markings for IPv6
sctp: Flush the queue only once during fast retransmit.
sctp: Start T3-RTX timer when fast retransmitting lowest TSN
sctp: Correctly implement Fast Recovery cwnd manipulations.
sctp: Move sctp_v4_dst_saddr out of loop
sctp: retran_path update bug fix
tcp: fix skb vs fack_count out-of-sync condition
sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
xfrm: xfrm_algo: correct usage of RIPEMD-160
...
This patch ensures that the display code that traverses the
publication lists belonging to a name table entry take its
associated spinlock, to protect against a possible change to
one of its "head of list" pointers caused by a simultaneous
name table lookup operation by another thread of control.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a check to prevent TIPC's name table display code
from listing a name type entry if it exists only to hold subscription
info, rather than published names.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates the rarely-used "error code" argument
when initializing a TIPC message header, since the default
value of zero is the desired result in most cases; the few
exceptional cases now set the error code explicitly.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates a case where TIPC's link code could try reading
a field that is not present in a short message header. (The random
value obtained was not being used, but the read operation could result
in an invalid memory access exception in extremely rare circumstances.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enhances TIPC's handler for incoming messages in two
ways:
- the trivial, single-use routine for processing non-sequenced
messages has been merged into the main handler
- the interface that received a message is now identified without
having to access and/or modify the associated sk_buff
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new, out-of-range value to indicate that
a link endpoint does not have an existing session established
with its peer, eliminating the risk that the previously used
"invalid session number" value (i.e. zero) might eventually be
assigned as a valid session number and cause incorrect link
behavior.
The patch also introduces explicit bit masking when assigning a
new link session number to ensure it does not exceed 16 bits.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects two problems in the display of error code
information in TIPC messages when debugging:
- no longer tries to display error code in NAME_DISTRIBUTOR
messages, which don't have the error field
- now displays error code in 24 byte data messages, which do
have the error field
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch re-orders & re-groups the error checks performed on
messages being delivered to native API ports, in order to clarify the
similarities and differences required for the various message types.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bug that prevented TIPC from receiving a
connection setup request message on a native TIPC port.
The revised connection setup logic ensures that validation
of the source of a connection-based message is skipped if
the port is not yet connected to a peer.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.
This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.
Based on idea and initial patch from Evgeniy Polyakov.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP "resets sent" counter is not incremented when a TCP Reset is
sent via tcp_send_active_reset().
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program below just leaks the raw kernel socket
int main() {
int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
inet_aton("127.0.0.1", &addr.sin_addr);
addr.sin_family = AF_INET;
addr.sin_port = htons(2048);
sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr));
return 0;
}
Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
1) DSACK in the middle of previous SACK block must be generated.
2) In order to take that particular branch, part or all of the
DSACKed segment must already be SACKed so that we have that
in cache in the first place.
3) The new info must be top enough so that fackets_out will be
updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.
It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.
Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).
This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.
This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).
Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is not allowed to change underlying protocol for
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified. If the ipi6_ifindex is the not-exist interface,
it should be fail.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);
This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
helper functions: addrconf_timeout_fixup() and
addrconf_finite_timeout().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).
Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.
[Use unsigned int instead to generate better code - yoshfuji]
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.
Bug report and initial patch by Denis V. Lunev <den@openvz.org>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.
I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).
This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ 63.531438] =================================
[ 63.531520] [ INFO: inconsistent lock state ]
[ 63.531520] 2.6.26-rc4 #7
[ 63.531520] ---------------------------------
[ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[ 63.531520] {softirq-on-W} state was registered at:
[ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080
[ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0
[ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40
[ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...
According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4
Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field was supposed to allow the creation of an anycast route by
assigning an anycast address to an address prefix. It was never
implemented so this field is unused and serves no purpose. Remove it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.
Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an unused policy entry for an attribute which is
only used in kernel->user direction.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an obsolete check for the unused flag RTCF_MASQ.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to compute copy twice in the frags loop in
dma_skb_copy_datagram_iovec().
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbor table time of last use information is returned in the
incorrect unit. Kernel to user space ABI's need to use USER_HZ (or
milliseconds), otherwise the application has to try and discover the
real system HZ value which is problematic. Linux has standardized on
keeping USER_HZ consistent (100hz) even when kernel is running
internally at some other value.
This change is small, but it breaks the ABI for older version of
iproute2 utilities. But these utilities are already broken since they
are looking at the psched_hz values which are completely different. So
let's just go ahead and fix both kernel and user space. Older
utilities will just print wrong values.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bad type/protocol specified result in sk leak.
Fix is simple - release the sk if bad values are given,
but to make it possible just to call sk_free(), I move
some sk initialization a bit lower.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Jarek Poplawski <jarkao2@gmail.com>
There is only one function in AX25 calling skb_append(), and it really
looks suspicious: appends skb after previously enqueued one, but in
the meantime this previous skb could be removed from the queue.
This patch Fixes it the simple way, so this is not fully compatible with
the current method, but testing hasn't shown any problems.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's logic in __rfcomm_dlc_close:
rfcomm_dlc_lock(d);
d->state = BT_CLOSED;
d->state_changed(d, err);
rfcomm_dlc_unlock(d);
In rfcomm_dev_state_change, it's possible that rfcomm_dev_put try to
take the dlc lock, then we will deadlock.
Here fixed it by unlock dlc before rfcomm_dev_get in
rfcomm_dev_state_change.
why not unlock just before rfcomm_dev_put? it's because there's
another problem. rfcomm_dev_get/rfcomm_dev_del will take
rfcomm_dev_lock, but in rfcomm_dev_add the lock order is :
rfcomm_dev_lock --> dlc lock
so I unlock dlc before the taken of rfcomm_dev_lock.
Actually it's a regression caused by commit
1905f6c736 ("bluetooth :
__rfcomm_dlc_close lock fix"), the dlc state_change could be two
callbacks : rfcomm_sk_state_change and rfcomm_dev_state_change. I
missed the rfcomm_sk_state_change that time.
Thanks Arjan van de Ven <arjan@linux.intel.com> for the effort in
commit 4c8411f8c1 ("bluetooth: fix
locking bug in the rfcomm socket cleanup handling") but he missed the
rfcomm_dev_state_change lock issue.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes doubly defined sband variable
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes unbalanced locking in ieee80211_get_buffered_bc
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
e039fa4a41 ("mac80211: move TX info into
skb->cb") misplaced code for setting hardware WEP keys. Move it back.
This fixes kernel panic in b43 if WEP is used and hardware encryption
is enabled.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In commit 2e92e6f2c5 ("mac80211: use rate
index in TX control") I forgot to initialise a few new variables to -1 which
means that the rate control algorithm is never triggered and 0 is used as
the only rate index, effectively fixing the transmit bitrate at the lowest
supported.
This patch adds the missing initialisation.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>