Commit Graph

9051 Commits

Author SHA1 Message Date
Patrick McHardy
ceeff7541e netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:

BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
       00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
       00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
 [<c038e728>] nla_parse+0x5c/0xb0
 [<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
 [<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
 [<c0119aee>] update_curr+0x3d/0x52
 [<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
 [<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
 [<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
 [<c038d2ce>] netlink_rcv_skb+0x2d/0x71
 [<c0390205>] nfnetlink_rcv+0x19/0x24
 [<c038d0f5>] netlink_unicast+0x1b3/0x216
 ...

Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875

Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 17:51:10 -07:00
Eric Leblond
b66985b11b netfilter: Make nflog quiet when no one listen in userspace.
The message "nf_log_packet: can't log since no backend logging module loaded
in! Please either load one, or disable logging explicitly" was displayed for
each logged packet when no userspace application is listening to nflog events.
The message seems to warn for a problem with a kernel module missing but as
said before this is not the case. I thus propose to suppress the message (I
don't see any reason to flood the log because a user application has crashed.)

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 17:50:27 -07:00
YOSHIFUJI Hideaki
1717699cd5 ipv6: Fail with appropriate error code when setting not-applicable sockopt.
IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:09 +09:00
YOSHIFUJI Hideaki
28d4488216 ipv6: Check IPV6_MULTICAST_LOOP option value.
Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:09 +09:00
Shan Wei
e8766fc86b ipv6: Check the hop limit setting in ancillary data.
When specifing the outgoing hop limit as ancillary data for sendmsg(),
the kernel doesn't check the integer hop limit value as specified in
[RFC-3542] section 6.3.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
36e3deae8b ipv6 route: Fix route lifetime in netlink message.
1) We may have route lifetime larger than INT_MAX.
In that case we had wired value in lifetime.
Use INT_MAX if lifetime does not fit in s32.

2) Lifetime is valid iif RTF_EXPIRES is set.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
20c61fbd8d ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
9501f97229 tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki
8d26d76dd4 tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6.
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki
076fb72233 tcp md5sig: Remove redundant protocol argument.
Protocol is always TCP, so remove useless protocol argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki
7d5d5525bd tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:18 +09:00
YOSHIFUJI Hideaki
81b302a321 key: Use xfrm_addr_cmp() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:17 +09:00
YOSHIFUJI Hideaki
5f95ac9111 key: Share common code path to extract address from sockaddr{}.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:17 +09:00
YOSHIFUJI Hideaki
e5b56652c1 key: Share common code path to fill sockaddr{}.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:16 +09:00
YOSHIFUJI Hideaki
9e8b4ed8bb key: Introduce pfkey_sockaddr_len() for raw sockaddr{} length.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:15 +09:00
Benjamin Thery
3de232554a ipv6 netns: Address labels per namespace
This pacth makes IPv6 address labels per network namespace.
It keeps the global label tables, ip6addrlbl_table, but
adds a 'net' member to each ip6addrlbl_entry.
This new member is taken into account when matching labels.

Changelog
=========
* v1: Initial version
* v2:
  * Minize the penalty when network namespaces are not configured:
      *  the 'net' member is added only if CONFIG_NET_NS is
         defined. This saves space when network namespaces are not
         configured.
      * 'net' value is retrieved with the inlined function
         ip6addrlbl_net() that always return &init_net when
         CONFIG_NET_NS is not defined.
  * 'net' member in ip6addrlbl_entry renamed to the less generic
    'lbl_net' name (helps code search).

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:15 +09:00
YOSHIFUJI Hideaki
2b5ead4644 ipv6 addrconf: Introduce addrconf_is_prefix_route() helper.
This inline function, for readability, returns if the route
is a "prefix" route regardless if it was installed by RA or by
hand.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:14 +09:00
Rami Rosen
7d120c55df ipv6 mroute: Use MRT6_VERSION instead of MRT_VERSION in ip6mr.c.
MRT6_VERSION should be used instead of MRT_VERSION in ip6mr.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:13 +09:00
Rami Rosen
9cba632e24 ipv6 mcast: Remove unused macro (MLDV2_QQIC) from mcast.c.
This patch removes  MLDV2_QQIC macro from mcast.c
as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:12 +09:00
Linus Torvalds
f7f866eed0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  net: Fix routing tables with id > 255 for legacy software
  sky2: Hold RTNL while calling dev_close()
  s2io iomem annotations
  atl1: fix suspend regression
  qeth: start dev queue after tx drop error
  qeth: Prepare-function to call s390dbf was wrong
  qeth: reduce number of kernel messages
  qeth: Use ccw_device_get_id().
  qeth: layer 3 Oops in ip event handler
  virtio: use callback on empty in virtio_net
  virtio: virtio_net free transmit skbs in a timer
  virtio: Fix typo in virtio_net_hdr comments
  virtio_net: Fix skb->csum_start computation
  ehea: set mac address fix
  sfc: Recover from RX queue flush failure
  add missing lance_* exports
  ixgbe: fix typo
  forcedeth: msi interrupts
  ipsec: pfkey should ignore events when no listeners
  pppoe: Unshare skb before anything else
  ...
2008-06-11 08:39:51 -07:00
Gerrit Renker
be4c798a41 dccp: Bug in initial acknowledgment number assignment
Step 8.5 in RFC 4340 says for the newly cloned socket

           Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

           Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:10 +01:00
Gerrit Renker
7deb0f8510 dccp ccid-3: X truncated due to type conversion
This fixes a bug in computing the inter-packet-interval t_ipi = s/X: 

 scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
 sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
 than 2^26 Bps (~537 Mbps).

Using full 64-bit division now.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:10 +01:00
Gerrit Renker
1e8a287c79 dccp ccid-3: TFRC reverse-lookup Bug-Fix
This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).

The smallest tabulated value of
	 
   10^6 * f(p) =  sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p) 

for p=0.0001 is 8172. 

Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.

This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.

Also corrected the printk format (%u for u32).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:10 +01:00
Gerrit Renker
65907a433a dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
This fixes an oversight from an earlier patch, ensuring that Ack Vectors
are not processed on request sockets.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

 "Using CCID-specific options and feature options during a negotiation
  for the corresponding CCID feature is NOT RECOMMENDED [...]"

And it is not even possible: when the server receives the Request from the 
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way hanshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:09 +01:00
Gerrit Renker
1e2f0e5e83 dccp: Fix sparse warnings
This patch fixes the following sparse warnings:
 * nested min(max()) expression:
   net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
   net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
   
 * Declaration of function prototypes in .c instead of .h file, resulting in
   "should it be static?" warnings. 

 * Declared "struct dccpw" static (local to dccp_probe).
 
 * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
   ("Receivers SHOULD implement delayed acknowledgement timers ...").

 * Used a different local variable name to avoid
   net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
   net/dccp/ackvec.c:238:33: originally declared here

 * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:09 +01:00
Gerrit Renker
3294f202dc dccp ccid-3: Bug-Fix - Zero RTT is possible
In commit $(825de27d9e) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.

Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.

All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.

The fix is to use the fallback-RTT from RFC 4340, 3.4.

This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11 11:19:09 +01:00
Krzysztof Piotr Oledzki
709772e6e0 net: Fix routing tables with id > 255 for legacy software
Most legacy software do not like tables > 255 as rtm_table is u8
so tb_id is sent &0xff and it is possible to mismatch for example
table 510 with table 254 (main).

This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. It makes such old applications happy, new
ones are still able to use RTA_TABLE to get a proper table id.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 15:44:49 -07:00
Thomas Graf
573bf470e6 ipv4 addr: Send netlink notification for address label changes
Makes people happy who try to keep a list of addresses up to date by
listening to notifications.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 15:40:04 -07:00
Jamal Hadi Salim
99c6f60e72 ipsec: pfkey should ignore events when no listeners
When pfkey has no km listeners, it still does a lot of work
before finding out there aint nobody out there.
If a tree falls in a forest and no one is around to hear it, does it make
a sound? In this case it makes a lot of noise:
With this short-circuit adding 10s of thousands of SAs using
netlink improves performance by ~10%.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 14:25:34 -07:00
Arnaldo Carvalho de Melo
ce4a7d0d48 inet{6}_request_sock: Init ->opt and ->pktopts in the constructor
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 12:39:35 -07:00
John W. Linville
9a727a250c net/mac80211/ieee80211_i.h: fix-up merge damage
These definitions were originally removed in "mac80211: remove channel
use statistics".

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-10 13:31:23 -04:00
David S. Miller
65b53e4cc9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/tg3.c
	drivers/net/wireless/rt2x00/rt2x00dev.c
	net/mac80211/ieee80211_i.h
2008-06-10 02:22:26 -07:00
David S. Miller
788c0a5316 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:

	drivers/net/ps3_gelic_wireless.c
	drivers/net/wireless/libertas/main.c
2008-06-10 01:54:31 -07:00
Rami Rosen
e64bda89b8 netfilter: {ip,ip6,nfnetlink}_queue: misc cleanups
- No need to perform data_len = 0 in the switch command, since data_len
  is initialized to 0 in the beginning of the ipq_build_packet_message()
  method.

- {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is
  sure to be NULL when getting there; since skb is NULL, there is no need
  to check this fact and call kfree_skb().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 16:00:45 -07:00
Fabian Hugelshofer
e57dce60c7 netfilter: ctnetlink: include conntrack status in destroy event message
When a conntrack is destroyed, the connection status does not get
exported to netlink. I don't see a reason for not doing so. This patch
exports the status on all conntrack events.

Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:58 -07:00
Fabian Hugelshofer
718d4ad98e netfilter: nf_conntrack: properly account terminating packets
Currently the last packet of a connection isn't accounted when its causing
abnormal termination.

Introduces nf_ct_kill_acct() which increments the accounting counters on
conntrack kill. The new function was necessary, because there are calls
to nf_ct_kill() which don't need accounting:

nf_conntrack_proto_tcp.c line ~847:
Kills ct and returns NF_REPEAT. We don't want to count twice.

nf_conntrack_proto_tcp.c line ~880:
Kills ct and returns NF_DROP. I think we don't want to count dropped
packets.

nf_conntrack_netlink.c line ~824:
As far as I can see ctnetlink_del_conntrack() is used to destroy a
conntrack on behalf of the user. There is an sk_buff, but I don't think
this is an actual packet. Incrementing counters here is therefore not
desired.

Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:40 -07:00
Patrick McHardy
51091764f2 netfilter: nf_conntrack: add nf_ct_kill()
Encapsulate the common

	if (del_timer(&ct->timeout))
		ct->timeout.function((unsigned long)ct)

sequence in a new function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:06 -07:00
Pekka Enberg
31d8519c9c netfilter: nf_conntrack_extend: use krealloc() in nf_conntrack_extend.c V2
The ksize() API is going away because it is being abused and it doesn't even
work consistenly across different allocators. Therefore, convert
net/netfilter/nf_conntrack_extend.c to use krealloc().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:58:39 -07:00
James Morris
17e6e59f0a netfilter: ip6_tables: add ip6tables security table
This is a port of the IPv4 security table for IPv6.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:58:05 -07:00
James Morris
560ee653b6 netfilter: ip_tables: add iptables security table for mandatory access control rules
The following patch implements a new "security" table for iptables, so
that MAC (SELinux etc.) networking rules can be managed separately to
standard DAC rules.

This is to help with distro integration of the new secmark-based
network controls, per various previous discussions.

The need for a separate table arises from the fact that existing tools
and usage of iptables will likely clash with centralized MAC policy
management.

The SECMARK and CONNSECMARK targets will still be valid in the mangle
table to prevent breakage of existing users.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:57:24 -07:00
Pablo Neira Ayuso
a258860e01 netfilter: ctnetlink: add full support for SCTP to ctnetlink
This patch adds full support for SCTP to ctnetlink. This includes three
new attributes: state, original vtag and reply vtag.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:56:39 -07:00
Pablo Neira Ayuso
0adf9d6748 netfilter: ctnetlink: group errors into logical errno sets
This patch groups ctnetlink errors into three logical sets:

* Malformed messages: if ctnetlink receives a message without some mandatory
attribute, then it returns EINVAL.
* Unsupported operations: if userspace tries to perform an unsupported
operation, then it returns EOPNOTSUPP.
* Unchangeable: if userspace tries to change some attribute of the
conntrack object that can only be set once, then it returns EBUSY.

This patch reduces the number of -EINVAL from 23 to 14 and it results in
5 -EBUSY and 6 -EOPNOTSUPP.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:56:20 -07:00
Kuo-lang Tseng
93f6515872 netfilter: ebtables: add IPv6 support
It implements matching functions for IPv6 address & traffic class
(merged from the patch sent by Jan Engelhardt [jengelh@computergmbh.de]
http://marc.info/?l=netfilter-devel&m=120182168424052&w=2), protocol,
and layer-4 port id. Corresponding watcher logging function is also
added for IPv6.

Signed-off-by: Kuo-lang Tseng <kuo-lang.tseng@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:55:45 -07:00
Pavel Emelyanov
2e761e0532 ipv6 netns: init net is used to set bindv6only for new sock
The bindv6only is tuned via sysctl. It is already on a struct net
and per-net sysctls allow for its modification (ipv6_sysctl_net_init).

Despite this the value configured in the init net is used for the
rest of them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:53:30 -07:00
Ursula Braun
469689a4dd af_iucv: exploit target message class support of IUCV
The first 4 bytes of data to be sent are stored additionally into
the message class field of the send request. A receiving target
program (not an af_iucv socket program) can make use of this
information to pre-screen incoming messages.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:51:03 -07:00
Heiko Carstens
7b9d1b22a3 iucv: prevent cpu hotplug when walking cpu_online_map.
The code used preempt_disable() to prevent cpu hotplug, however that
doesn't protect for cpus being added. So use get_online_cpus() instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:50:30 -07:00
Heiko Carstens
f1494ed1d3 iucv: fix section mismatch warning.
WARNING: net/iucv/built-in.o(.exit.text+0x9c): Section mismatch in
reference from the function iucv_exit() to the variable
.cpuinit.data:iucv_cpu_notifier

This warning is caused by a reference from unregister_hotcpu_notifier()
from an exit function to a cpuinitdata annotated data structurre.
This is a false positive warning since for the non CPU_HOTPLUG case
unregister_hotcpu_notifier() is a nop.
Use __refdata instead of __cpuinitdata to get rid of the warning.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:49:57 -07:00
Vlad Yasevich
7bfe8bdb80 sctp: Fix problems with the new SCTP_DELAYED_ACK code
The default sack frequency should be 2.  Also fix copy/paste
error when updating all transports.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:45:05 -07:00
Assaf Krauss
be038b3764 mac80211: Checking IBSS support while changing channel in ad-hoc mode
This patch adds a check to the set_channel flow. When attempting to change
the channel while in IBSS mode, and the new channel does not support IBSS
mode, the flow return with an error value with no consequences on the
mac80211 and driver state.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-09 15:53:37 -04:00
Dan Williams
872ba53395 mac80211: decrease IBSS creation latency
Sufficient scans (at least 2 or 3) should have been done within 7
seconds to find an existing IBSS to join.  This should improve IBSS
creation latency; and since IBSS merging is still in effect, shouldn't
have detrimental effects on eventual IBSS convergence.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-09 15:51:26 -04:00
Assaf Krauss
ad81b2f97d mac80211: Fixing slow IBSS rejoin
This patch fixes the issue of slow reconnection to an IBSS cell after
disconnection from it. Now the interface's bssid is reset upon ifdown.

ieee80211_sta_find_ibss:
if (found && memcmp(ifsta->bssid, bssid, ETH_ALEN) != 0 &&
	    (bss = ieee80211_rx_bss_get(dev, bssid,
					local->hw.conf.channel->center_freq,
					ifsta->ssid, ifsta->ssid_len)))

Note:
In general disconnection is still not handled properly in mac80211

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-09 15:50:20 -04:00
Dan Williams
507b06d062 mac80211: send association event on IBSS create
Otherwise userspace has no idea the IBSS creation succeeded.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-09 15:50:19 -04:00
Chris Wright
ddb2c43594 asn1: additional sanity checking during BER decoding
- Don't trust a length which is greater than the working buffer.
  An invalid length could cause overflow when calculating buffer size
  for decoding oid.

- An oid length of zero is invalid and allows for an off-by-one error when
  decoding oid because the first subid actually encodes first 2 subids.

- A primitive encoding may not have an indefinite length.

Thanks to Wei Wang from McAfee for report.

Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-05 14:24:54 -07:00
Denis V. Lunev
9457afee85 netlink: Remove nonblock parameter from netlink_attachskb
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-05 11:23:39 -07:00
Allan Stephens
40aecb1b13 tipc: Message rejection rework preparatory changes
This patch defines a few new message header manipulation routines,
and generalizes the usefulness of another, in preparation for upcoming
rework of TIPC's message rejection code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:54:48 -07:00
Allan Stephens
99c145939b tipc: Fix bugs in rejection of message with short header
This patch ensures that TIPC doesn't try to access non-existent
message header fields when rejecting a message with a short header.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:48:25 -07:00
Allan Stephens
9bef54383d tipc: Message header creation optimizations
This patch eliminates several cases where message header fields
were being set to the same value twice.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:47:55 -07:00
Allan Stephens
bd7845337b tipc: Expand link sequence gap field to 13 bits
This patch increases the "sequence gap" field of the LINK_PROTOCOL
message header from 8 bits to 13 bits (utilizing 5 previously
unused 0 bits).  This ensures that the field is big enough to
indicate the loss of up to 8191 consecutive messages on the link,
thereby accommodating the current worst-case scenario of 4000
lost messages.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:47:30 -07:00
Linus Torvalds
3e387fcdc4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
  tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
  tcp: Increment OUTRSTS in tcp_send_active_reset()
  raw: Raw socket leak.
  lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
  USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
  ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
  libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
  ipw2200: expire and use oldest BSS on adhoc create
  airo warning fix
  b43legacy: Fix controller restart crash
  sctp: Fix ECN markings for IPv6
  sctp: Flush the queue only once during fast retransmit.
  sctp: Start T3-RTX timer when fast retransmitting lowest TSN
  sctp: Correctly implement Fast Recovery cwnd manipulations.
  sctp: Move sctp_v4_dst_saddr out of loop
  sctp: retran_path update bug fix
  tcp: fix skb vs fack_count out-of-sync condition
  sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
  xfrm: xfrm_algo: correct usage of RIPEMD-160
  ...
2008-06-04 17:39:33 -07:00
Allan Stephens
307fdf5e7d tipc: Add missing spinlock in name table display code
This patch ensures that the display code that traverses the
publication lists belonging to a name table entry take its
associated spinlock, to protect against a possible change to
one of its "head of list" pointers caused by a simultaneous
name table lookup operation by another thread of control.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:38:22 -07:00
Allan Stephens
0f15d36453 tipc: Prevent display of name table types with no publications
This patch adds a check to prevent TIPC's name table display code
from listing a name type entry if it exists only to hold subscription
info, rather than published names.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:37:59 -07:00
Allan Stephens
7571521756 tipc: Optimize message initialization routine
This patch eliminates the rarely-used "error code" argument
when initializing a TIPC message header, since the default
value of zero is the desired result in most cases; the few
exceptional cases now set the error code explicitly.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:37:34 -07:00
Allan Stephens
9c396a7bfb tipc: Prevent access of non-existent field in short message header
This patch eliminates a case where TIPC's link code could try reading
a field that is not present in a short message header.  (The random
value obtained was not being used, but the read operation could result
in an invalid memory access exception in extremely rare circumstances.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:36:58 -07:00
Allan Stephens
1265a02108 tipc: Minor optimizations to received message processing
This patch enhances TIPC's handler for incoming messages in two
ways:
- the trivial, single-use routine for processing non-sequenced
  messages has been merged into the main handler
- the interface that received a message is now identified without
  having to access and/or modify the associated sk_buff

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:32:35 -07:00
Allan Stephens
a686e6859e tipc: Fix minor bugs in link session number handling
This patch introduces a new, out-of-range value to indicate that
a link endpoint does not have an existing session established
with its peer, eliminating the risk that the previously used
"invalid session number" value (i.e. zero) might eventually be
assigned as a valid session number and cause incorrect link
behavior.

The patch also introduces explicit bit masking when assigning a
new link session number to ensure it does not exceed 16 bits.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:29:39 -07:00
Allan Stephens
e0d4e3d0d7 tipc: Fix bugs in message error code display when debugging
This patch corrects two problems in the display of error code
information in TIPC messages when debugging:
- no longer tries to display error code in NAME_DISTRIBUTOR
  messages, which don't have the error field
- now displays error code in 24 byte data messages, which do
  have the error field

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:29:09 -07:00
Allan Stephens
5307e46957 tipc: Standardize error checking on incoming messages via native API
This patch re-orders & re-groups the error checks performed on
messages being delivered to native API ports, in order to clarify the
similarities and differences required for the various message types.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:28:45 -07:00
Allan Stephens
84b07c1638 tipc: Fix bug in connection setup via native API
This patch fixes a bug that prevented TIPC from receiving a
connection setup request message on a native TIPC port.
The revised connection setup logic ensures that validation
of the source of a connection-based message is skipped if
the port is not yet connected to a peer.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 17:28:21 -07:00
Octavian Purdila
293ad60401 tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.

This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.

Based on idea and initial patch from Evgeniy Polyakov.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 15:45:58 -07:00
Sridhar Samudrala
26af65cbeb tcp: Increment OUTRSTS in tcp_send_active_reset()
TCP "resets sent" counter is not incremented when a TCP Reset is 
sent via tcp_send_active_reset().

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 15:19:35 -07:00
Denis V. Lunev
22dd485022 raw: Raw socket leak.
The program below just leaks the raw kernel socket

int main() {
        int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        inet_aton("127.0.0.1", &addr.sin_addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(2048);
        sendto(fd,  "a", 1, MSG_MORE, &addr, sizeof(addr));
        return 0;
}

Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 15:16:12 -07:00
Vlad Yasevich
b9031d9d87 sctp: Fix ECN markings for IPv6
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6.  As a
result, SCTP was not marking ECN capablity because the traffic class
was never set.  This patch brings back the markings for IPv6 traffic.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:40:15 -07:00
Vlad Yasevich
8b750ce54b sctp: Flush the queue only once during fast retransmit.
When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens.  Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:39:36 -07:00
Vlad Yasevich
62aeaff5cc sctp: Start T3-RTX timer when fast retransmitting lowest TSN
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:39:11 -07:00
Vlad Yasevich
a646523481 sctp: Correctly implement Fast Recovery cwnd manipulations.
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:38:43 -07:00
Gui Jianfeng
159c6bea37 sctp: Move sctp_v4_dst_saddr out of loop
There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:38:07 -07:00
Gui Jianfeng
4141ddc02a sctp: retran_path update bug fix
If the current retran_path is the only active one, it should
update it to the the next inactive one.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:37:33 -07:00
David S. Miller
aed5a833fb Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix 2008-06-04 12:10:21 -07:00
Ilpo Järvinen
a6604471db tcp: fix skb vs fack_count out-of-sync condition
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
  1) DSACK in the middle of previous SACK block must be generated.
  2) In order to take that particular branch, part or all of the
     DSACKed segment must already be SACKed so that we have that
     in cache in the first place.
  3) The new info must be top enough so that fackets_out will be
     updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.

It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.

Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).

This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.

This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing & 
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:07:44 -07:00
Adrian-Ken Rueegsegger
a13366c632 xfrm: xfrm_algo: correct usage of RIPEMD-160
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).

Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:04:55 -07:00
Denis V. Lunev
9596cc826e [IPV6]: Do not change protocol for UDPv6 sockets with pending sent data.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:38 +09:00
Denis V. Lunev
36d926b94a [IPV6]: inet_sk(sk)->cork.opt leak
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:38 +09:00
Denis V. Lunev
49d074f400 [IPV6]: Do not change protocol for raw IPv6 sockets.
It is not allowed to change underlying protocol for
   int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:37 +09:00
YOSHIFUJI Hideaki
91e1908f56 [IPV6] NETNS: Handle ancillary data in appropriate namespace.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:36 +09:00
YOSHIFUJI Hideaki
187e38384c [IPV6]: Check outgoing interface even if source address is unspecified.
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified.  If the ipi6_ifindex is the not-exist interface,
it should be fail.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:35 +09:00
Yang Hongyang
95b496b666 [IPV6]: Fix the data length of get destination options with short length
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
 This is because ipv6_getsockopt_sticky() returns the real length of
the option.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:35 +09:00
Yang Hongyang
05335c2220 [IPV6]: Fix the return value of get destination options with NULL data pointer
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
    getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);

This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:34 +09:00
YOSHIFUJI Hideaki
4bed72e4f5 [IPV6] ADDRCONF: Allow longer lifetime on 64bit archs.
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
  by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
  helper functions: addrconf_timeout_fixup() and
  addrconf_finite_timeout().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:34 +09:00
YOSHIFUJI Hideaki
baa2bfb8ae [IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:33 +09:00
Colin
8283637231 [IPV6] TUNNEL6: Fix incoming packet length check for inter-protocol tunnel.
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:32 +09:00
Thomas Graf
24ef0da7b8 [IPV6] ADDRCONF: Check range of prefix length
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).

Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.

[Use unsigned int instead to generate better code - yoshfuji]

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:31 +09:00
YOSHIFUJI Hideaki
a3c960899e [IPV6] UDP: Possible dst leak in udpv6_sendmsg.
ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.

Bug report and initial patch by Denis V. Lunev <den@openvz.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>
2008-06-05 04:02:31 +09:00
YOSHIFUJI Hideaki
e51171019b [SCTP]: Fix NULL dereference of asoc.
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:30 +09:00
Ilpo Järvinen
8aca6cb117 tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp))
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.

I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).

This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 11:34:22 -07:00
Jarek Poplawski
b9c6989646 netfilter: nf_conntrack_ipv6: fix inconsistent lock state in nf_ct_frag6_gather()
[   63.531438] =================================
[   63.531520] [ INFO: inconsistent lock state ]
[   63.531520] 2.6.26-rc4 #7
[   63.531520] ---------------------------------
[   63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   63.531520]  (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[   63.531520] {softirq-on-W} state was registered at:
[   63.531520]   [<c0143bba>] __lock_acquire+0x3aa/0x1080
[   63.531520]   [<c0144906>] lock_acquire+0x76/0xa0
[   63.531520]   [<c07a8f0b>] _spin_lock+0x2b/0x40
[   63.531520]   [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
 ...

According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.

Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 09:58:27 -07:00
Dong Wei
d2ee3f2c4b netfilter: xt_connlimit: fix accouning when receive RST packet in ESTABLISHED state
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4

Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 09:57:51 -07:00
Al Viro
d430a227d2 bogus format in ip6mr
ptrdiff_t is %t..., not %Z...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-04 08:06:02 -07:00
Thomas Graf
ab32cd793d route: Remove unused ifa_anycast field
The field was supposed to allow the creation of an anycast route by
assigning an anycast address to an address prefix. It was never
implemented so this field is unused and serves no purpose. Remove it.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:37:33 -07:00
Thomas Graf
bc3ed28caa netlink: Improve returned error codes
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:36:54 -07:00
Thomas Graf
1f9d11c7c9 route: Mark unused routing attributes as such
Also removes an unused policy entry for an attribute which is
only used in kernel->user direction.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:36:27 -07:00