android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Luis Carlos Cobo	8dbc1722a7	mac80211: keep mesh ifaces in allmulti mode Currently a mesh node will not forward a multicast frame if it is not subscribed to the specific multicast address. This patch addresses the issue and fixes mesh multicast forwarding. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-07 09:49:04 -04:00
Luis Carlos Cobo	e32f85f7b9	mac80211: fix use of skb->cb for mesh forwarding Now we deal with mesh forwarding before the 802.11->802.3 conversion, thus eliminating a few unnecessary steps. The next hop lookup is called from ieee80211_master_start_xmit() instead of subif_start_xmit(). Until the next hop is found, RA in the frame will be all zeroes for frames originating from the device. For forwarded frames, RA will contain the TA of the received frame, which will be necessary to send a path error if a next hop is not found. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-07 09:49:04 -04:00
Robert Olsson	e6fce5b916	pktgen: multiqueue etc. Sofar far pktgen have had a restriction to only use one device per kernel thread. With the new multiqueue architecture this is no longer adequate. The patch below is an effort to remove this by in pktgen configuration adding a tag to the device name a la eth0@0 etc. The tag is used for usual device config just as before. Also a new flag is introduced to mirror queue_map with sending threads smp_processor_id() QUEUE_MAP_CPU. An example: We use 4 CPU's to send to one 10g interface (eth0) and we use the new tagging to send a mix of packet sizes, 64, 576 and 1500 bytes. Also we use TX queues according to smp_processor_id() PGDEV=/proc/net/pktgen/kpktgend_0 pgset "add_device eth0@0" PGDEV=/proc/net/pktgen/kpktgend_1 pgset "add_device eth0@1" PGDEV=/proc/net/pktgen/kpktgend_2 pgset "add_device eth0@2" PGDEV=/proc/net/pktgen/kpktgend_3 pgset "add_device eth0@3" .... PGDEV=/proc/net/pktgen/eth0@0 pgset "pkt_size 64" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@1 pgset "pkt_size 572" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@2 pgset "pkt_size 1496" PGDEV=/proc/net/pktgen/eth0@3 pgset "pkt_size 1496" pgset "flag QUEUE_MAP_CPU" Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 02:23:01 -07:00
David S. Miller	32bb93b02d	Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	2008-08-07 02:10:27 -07:00
Jeff Garzik	3859069bc3	Merge branch 'for-jeff' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6 into tmp	2008-08-07 04:05:46 -04:00
Joe Eykholt	f982307f22	net/core: Allow receive on active slaves. If a packet_type specifies an active slave to bonding and not just any interface, allow it to receive frames that came in on that interface. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 04:00:01 -04:00
Joe Eykholt	0d7a368123	net/core: Allow certain receives on inactive slave. Allow a packet_type that specifies the exact device to receive even on an inactive bonding slave devices. This is important for some L2 protocols such as LLDP and FCoE. This can eventually be used for the bonding special cases as well. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 03:59:59 -04:00
Joe Eykholt	cc9bd5cebc	net/core: Uninline skb_bond(). Otherwise subsequent changes need multiple return values. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 03:59:58 -04:00
Gui Jianfeng	6edafaaf6f	tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup If the following packet flow happen, kernel will panic. MathineA MathineB SYN ----------------------> SYN+ACK <---------------------- ACK(bad seq) ----------------------> When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)) is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is NULL at that moment, so kernel panic happens. This patch fixes this bug. OOPS output is as following: [ 302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 [ 302.817075] Oops: 0000 [#1] SMP [ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] [ 302.849946] [ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5) [ 302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0 [ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42 [ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046 [ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54 [ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000) [ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 [ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 [ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 [ 302.900140] Call Trace: [ 302.902392] [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35 [ 302.907060] [<c05d28f8>] tcp_check_req+0x156/0x372 [ 302.910082] [<c04250a3>] printk+0x14/0x18 [ 302.912868] [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf [ 302.917423] [<c05d26be>] tcp_v4_rcv+0x563/0x5b9 [ 302.920453] [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183 [ 302.923865] [<c05bb10a>] ip_rcv_finish+0x286/0x2a3 [ 302.928569] [<c059e438>] dev_alloc_skb+0x11/0x25 [ 302.931563] [<c05a211f>] netif_receive_skb+0x2d6/0x33a [ 302.934914] [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32] [ 302.938735] [<c05a3b48>] net_rx_action+0x5c/0xfe [ 302.941792] [<c042856b>] __do_softirq+0x5d/0xc1 [ 302.944788] [<c042850e>] __do_softirq+0x0/0xc1 [ 302.948999] [<c040564b>] do_softirq+0x55/0x88 [ 302.951870] [<c04501b1>] handle_fasteoi_irq+0x0/0xa4 [ 302.954986] [<c04284da>] irq_exit+0x35/0x69 [ 302.959081] [<c0405717>] do_IRQ+0x99/0xae [ 302.961896] [<c040422b>] common_interrupt+0x23/0x28 [ 302.966279] [<c040819d>] default_idle+0x2a/0x3d [ 302.969212] [<c0402552>] cpu_idle+0xb2/0xd2 [ 302.972169] ======================= [ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 [ 303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54 [ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:50:04 -07:00
David S. Miller	ee7af8264d	pkt_sched: Fix "parent is root" test in qdisc_create(). As noticed by Stephen Hemminger, the root qdisc is denoted by TC_H_ROOT, not zero. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:35:59 -07:00
David S. Miller	11d46123bf	ipv4: Fix over-ifdeffing of ip_static_sysctl_init. Noticed by Paulius Zaleckas. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 18:30:43 -07:00
Joakim Koskela	abf5cdb89d	ipsec: Interfamily IPSec BEET, ipv4-inner ipv6-outer Here's a revised version, based on Herbert's comments, of a fix for the ipv4-inner, ipv6-outer interfamily ipsec beet mode. It fixes the network header adjustment during interfamily, as well as makes sure that we reserve enough room for the new ipv6 header if we might have something else as the inner family. Also, the ipv4 pseudo header construction was added. Signed-off-by: Joakim Koskela <jookos@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:40:25 -07:00
Joakim Koskela	eb49e63093	ipsec: Interfamily IPSec BEET Here's a revised version, based on Herbert's comments, of a fix for the ipv6-inner, ipv4-outer interfamily ipsec beet mode. It fixes the network header adjustment in interfamily, and doesn't reserve space for the pseudo header anymore when we have ipv6 as the inner family. Signed-off-by: Joakim Koskela <jookos@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:39:30 -07:00
Krzysztof Piotr Oledzki	9714be7da8	netfilter: fix two recent sysctl problems Starting with `9043476f72` ("[PATCH] sanitize proc_sysctl") we have two netfilter releated problems: - WARNING: at kernel/sysctl.c:1966 unregister_sysctl_table+0xcc/0x103(), caused by wrong order of ini/fini calls - net.netfilter is duplicated and has truncated set of records Thanks to very useful guidelines from Al Viro, this patch fixes both of them. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:35:44 -07:00
Rami Rosen	1ca615fb81	ipv6: replace dst_metric() with dst_mtu() in net/ipv6/route.c. This patch replaces dst_metric() with dst_mtu() in net/ipv6/route.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:34:21 -07:00
Rami Rosen	6d273f8d01	ipv4: replace dst_metric() with dst_mtu() in net/ipv4/route.c. This patch replaces dst_metric() with dst_mtu() in net/ipv4/route.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:33:49 -07:00
Ralf Baechle	ffb208479b	AX.25: Fix sysctl registration if !CONFIG_AX25_DAMA_SLAVE Since `49ffcf8f99` ("sysctl: update sysctl_check_table") setting struct ctl_table.procname = NULL does no longer work as it used to the way the AX.25 code is expecting it to resulting in the AX.25 sysctl registration code to break if CONFIG_AX25_DAMA_SLAVE was not set as in some distribution kernels. Kernel releases from 2.6.24 are affected. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:46:57 -07:00
Robert Olsson	ff2a79a5a9	pktgen: mac count dst_mac_count and src_mac_count patch from Eneas Hunguana We have sent one mac address to much. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:45:05 -07:00
Robert Olsson	1211a64554	pktgen: random flow Random flow generation has not worked. This fixes it. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:44:26 -07:00
Stephen Hemminger	ef647f1300	bridge: Eliminate unnecessary forward delay From: Stephen Hemminger <shemminger@vyatta.com> Based upon original patch by Herbert Xu, which contained the following problem description: -------------------- When the forward delay is set to zero, we still delay the setting of the forwarding state by one or possibly two timers depending on whether STP is enabled. This could either turn out to be instantaneous, or horribly slow depending on the load of the machine. As there is nothing preventing us from enabling forwarding straight away, this patch eliminates this potential delay by executing the code directly if the forward delay is zero. The effect of this problem is that immediately after the carrier comes on a port, the bridge will drop all packets received from that port until it enters forwarding mode, thus causing unnecessary packet loss. Note that this patch doesn't fully remove the delay due to the link watcher. We should also check the carrier state when we are about to drop an incoming packet because the port is disabled. But that's for another patch. -------------------- This version of the fix takes a different approach, in that it just does the state change directly. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:42:51 -07:00
David S. Miller	33e334950a	Merge branch 'no-ath9k' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-05 01:28:35 -07:00
Rami Rosen	ad619800e4	bridge: fix compile warning in net/bridge/br_netfilter.c This patch fixes the following warning due to incompatible pointer assignment: net/bridge/br_netfilter.c: In function 'br_netfilter_rtable_init': net/bridge/br_netfilter.c:116: warning: assignment from incompatible pointer type This warning is due to commit `4adf0af681` from July 30 (send correct MTU value in PMTU (revised)). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 01:21:22 -07:00
Jarek Poplawski	c27f339af9	net_sched: Add qdisc __NET_XMIT_BYPASS flag Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:39:11 -07:00
Jarek Poplawski	378a2f090f	net_sched: Add qdisc __NET_XMIT_STOLEN flag Patrick McHardy <kaber@trash.net> noticed: "The other problem that affects all qdiscs supporting actions is TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS even though the packet is not queued, corrupting upper qdiscs' qlen counters." and later explained: "The reason why it translates it at all seems to be to not increase the drops counter. Within a single qdisc this could be avoided by other means easily, upper qdiscs would still increase the counter when we return anything besides NET_XMIT_SUCCESS though. This means we need a new NET_XMIT return value to indicate this to the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN, return that to upper qdiscs and translate it to NET_XMIT_SUCCESS in dev_queue_xmit, similar to NET_XMIT_BYPASS." David Miller <davem@davemloft.net> noticed: "Maybe these NET_XMIT_* values being passed around should be a set of bits. They could be composed of base meanings, combined with specific attributes. So you could say "NET_XMIT_DROP \| __NET_XMIT_NO_DROP_COUNT" The attributes get masked out by the top-level ->enqueue() caller, such that the base meanings are the only thing that make their way up into the stack. If it's only about communication within the qdisc tree, let's simply code it that way." This patch is trying to realize these ideas. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:31:03 -07:00
Tomas Winkler	4c43e0d0ec	iwlwifi: HW bug fixes This patch adds few HW bug fixes. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:12 -04:00
Daniel Drake	80693ceb78	mac80211: automatic IBSS channel selection When joining an ad-hoc network, the user is currently required to specify the channel. The network will not be joined otherwise, unless it happens to be sitting on the currently active channel. This patch implements automatic channel selection when the user has not locked the interface onto a specific channel. Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:10 -04:00
Tomas Winkler	ea95bba41e	mac80211: make listen_interval be limited by low level driver This patch makes possible for a driver to specify maximal listen interval The possibility for user to configure listen interval is not implemented yet, currently the maximum provided by the driver or 1 is used. Mac80211 uses config handler to set listen interval for to the driver. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:07 -04:00
Emmanuel Grumbach	98f7dfd86c	mac80211: pass dtim_period to low level driver This patch adds the dtim_period in ieee80211_bss_conf, this allows the low level driver to know the dtim_period, and to plan power save accordingly. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:07 -04:00
Stephen Hemminger	6e583ce524	net: eliminate refcounting in backlog queue Avoid the overhead of atomic increment/decrement on each received packet. This helps performance of non-NAPI devices (like loopback). Use cleanup function to walk queue on each cpu and clean out any left over packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:29:57 -07:00
Wei Yongjun	283d07ac20	ipv6: Do not drop packet if skb->local_df is set to true The old code will drop IPv6 packet if ipfragok is not set, since ipfragok is obsoleted, will be instead by used skb->local_df, so this check must be changed to skb->local_df. This patch fix this problem and not drop packet if skb->local_df is set to true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:15:59 -07:00
Herbert Xu	f880374c2f	sctp: Drop ipfargok in sctp_xmit function The ipfragok flag controls whether the packet may be fragmented either on the local host on beyond. The latter is only valid on IPv4. In fact, we never want to do the latter even on IPv4 when PMTU is enabled. This is because even though we can't fragment packets within SCTP due to the prtocol's inherent faults, we can still fragment it at IP layer. By setting the DF bit we will improve the PMTU process. RFC 2960 only says that we SHOULD clear the DF bit in this case, so we're compliant even if we set the DF bit. In fact RFC 4960 no longer has this statement. Once we make this change, we only need to control the local fragmentation. There is already a bit in the skb which controls that, local_df. So this patch sets that instead of using the ipfragok argument. The only complication is that there isn't a struct sock object per transport, so for IPv4 we have to resort to changing the pmtudisc field for every packet. This should be safe though as the protocol is single-threaded. Note that after this patch we can remove ipfragok from the rest of the stack too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:15:08 -07:00
Yang Hongyang	cfb266c0ee	ipv6: Fix the return value of Set Hop-by-Hop options header with NULL data pointer When Set Hop-by-Hop options header with NULL data pointer and optlen is not zero use setsockopt(), the kernel successfully return 0 instead of return error EINVAL or EFAULT. This patch fix the problem. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 18:16:15 -07:00
Florian Westphal	1730554f25	ipv6: syncookies: free reqsk on xfrm_lookup error cookie_v6_check() did not call reqsk_free() if xfrm_lookup() fails, leaking the request sock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 18:13:44 -07:00
Sven Wegener	adf044c877	net: Add missing extra2 parameter for ip_default_ttl sysctl Commit `76e6ebfb40` ("netns: add namespace parameter to rt_cache_flush") acceses the extra2 parameter of the ip_default_ttl ctl_table, but it is never set to a meaningful value. When `e84f84f276` ("netns: place rt_genid into struct net") is applied, we'll oops in rt_cache_invalidate(). Set extra2 to init_net, to avoid that. Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 14:06:44 -07:00
Lennert Buytenhek	e5a4a72d4f	net: use software GSO for SG+CSUM capable netdevices If a netdevice does not support hardware GSO, allowing the stack to use GSO anyway and then splitting the GSO skb into MSS-sized pieces as it is handed to the netdevice for transmitting is likely still a win as far as throughput and/or CPU usage are concerned, since it reduces the number of trips through the output path. This patch enables the use of GSO on any netdevice that supports SG. If a GSO skb is then sent to a netdevice that supports SG but does not support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take care of doing the necessary GSO segmentation in software. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:23:10 -07:00
Chris Larson	745e203164	net: fix missing pneigh entries in the neighbor seq_file code When pneigh entries exist, but the user's read buffer isn't sufficient to hold them all, one of the pneigh entries will be missing from the results. In neigh_get_idx_any, the number of elements which neigh_get_idx encountered is not correctly subtracted from the position number before the call to pneigh_get_idx. neigh_get_idx reduces the position by 1 for each call to neigh_get_next, but it does not reduce it by one for the first element (neigh_get_first). The patch alters the neigh_get_idx and pneigh_get_idx functions to subtract one from pos, for the first element, when pos is non-zero. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:10:55 -07:00
Chris Larson	bff69732c9	net: in the first call to neigh_seq_next, call neigh_get_first, not neigh_get_idx. neigh_seq_next won't be called both with *pos > 0 && v == SEQ_START_TOKEN, so there's no point calling neigh_get_idx when we're on the start token, just call neigh_get_first directly. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:02:41 -07:00
David S. Miller	35ed4e7598	mac80211: Use queue_lock() in ieee80211_ht_agg_queue_remove(). qdisc_root_lock() is only %100 safe to use when the RTNL semaphore is held. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 23:25:50 -07:00
David S. Miller	5fb662297b	pkt_sched: Use qdisc_lock() on already sampled root qdisc. Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 20:02:43 -07:00
David S. Miller	e9e80ea5f2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-01 22:08:51 -07:00
Tomas Winkler	f8e79ddd31	mac80211: fix fragmentation kludge This patch make mac80211 transmit correctly fragmented packet after queue was stopped Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Dmitry Baryshkov	96185664f1	RFKILL: set the status of the leds on activation. Provide default activate function to set the state of the led when the led becomes bound to the trigger Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Dmitry Baryshkov	7c4f4578fc	RFKILL: allow one to specify led trigger name Allow the rfkill driver to specify led trigger name. By default it still defaults to the name of rfkill switch. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Henrique de Moraes Holschuh	6e28fbef0f	rfkill: query EV_SW states when rfkill-input (re)?connects to a input device Every time a new input device that is capable of one of the rfkill EV_SW events (currently only SW_RFKILL_ALL) is connected to rfkill-input, we must check the states of the input EV_SW switches and take action. Otherwise, we will ignore the initial switch state. We also need to re-check the states of the EV_SW switches after a device that was under an exclusive grab is released back to us, since we got no input events from that device while it was grabbed. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:32 -04:00
Linus Torvalds	9a5467fd60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits) tcp: MD5: Fix IPv6 signatures skbuff: add missing kernel-doc for do_not_encrypt net/ipv4/route.c: fix build error tcp: MD5: Fix MD5 signatures on certain ACK packets ipv6: Fix ip6_xmit to send fragments if ipfragok is true ipvs: Move userspace definitions to include/linux/ip_vs.h netdev: Fix lockdep warnings in multiqueue configurations. netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged irda: replace __FUNCTION__ with __func__ nsc-ircc: default to dongle type 9 on IBM hardware bluetooth: add quirks for a few hci_usb devices hysdn: remove the packed attribute from PofTimStamp_tag isdn: use the common ascii hex helpers tg3: adapt tg3 to use reworked PCI PM code atm: fix direct casts of pointers to u32 in the InterPhase driver atm: fix const assignment/discard warnings in the ATM networking driver net: use the common ascii hex helpers random32: seeding improvement ...	2008-08-01 11:35:16 -07:00
Al Viro	a1bc6eb4b4	[PATCH] ipv4_static_sysctl_init() should be under CONFIG_SYSCTL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:22 -04:00
Adam Langley	00b1304c4c	tcp: MD5: Fix IPv6 signatures Reported by Stefanos Harhalakis; although 2.6.27-rc1 talks to itself using IPv6 TCP MD5 packets just fine, Stefanos noted that tcpdump claimed that the signatures were invalid. I broke this in `49a72dfb88` ("tcp: Fix MD5 signatures for non-linear skbs"), it was just a typo. Note that tcpdump will still sometimes claim that the signatures are incorrect. A patch to tcpdump has been submitted for this[1]. [1] http://tinyurl.com/6a4fl2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 21:36:07 -07:00
Ingo Molnar	8a9204db66	net/ipv4/route.c: fix build error fix: net/ipv4/route.c: In function 'ip_static_sysctl_init': net/ipv4/route.c:3225: error: 'ipv4_route_path' undeclared (first use in this function) net/ipv4/route.c:3225: error: (Each undeclared identifier is reported only once net/ipv4/route.c:3225: error: for each function it appears in.) net/ipv4/route.c:3225: error: 'ipv4_route_table' undeclared (first use in this function) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:51:22 -07:00
Adam Langley	90b7e1120b	tcp: MD5: Fix MD5 signatures on certain ACK packets I noticed, looking at tcpdumps, that timewait ACKs were getting sent with an incorrect MD5 signature when signatures were enabled. I broke this in `49a72dfb88` ("tcp: Fix MD5 signatures for non-linear skbs"). I didn't take into account that the skb passed to tcp_*_send_ack was the inbound packet, thus the source and dest addresses need to be swapped when calculating the MD5 pseudoheader. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:49:48 -07:00
Wei Yongjun	77e2f14f71	ipv6: Fix ip6_xmit to send fragments if ipfragok is true SCTP used ip6_xmit() to send fragments after received ICMP packet too big message. But while send packet used ip6_xmit, the skb->local_df is not initialized. So when skb if enter ip6_fragment(), the following code will discard the skb. ip6_fragment(...) { if (!skb->local_df) { ... return -EMSGSIZE; } ... } SCTP do the following step: 1. send packet ip6_xmit(skb, ipfragok=0) 2. received ICMP packet too big message 3. if PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1) This patch fixed the problem by set local_df if ipfragok is true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:46:47 -07:00
David S. Miller	c3f26a269c	netdev: Fix lockdep warnings in multiqueue configurations. When support for multiple TX queues were added, the netif_tx_lock() routines we converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 16:58:50 -07:00
Pavel Emelyanov	967ab999a0	netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc Deleting a timer with del_timer doesn't guarantee, that the timer function is not running at the moment of deletion. Thus in the xt_hashlimit case we can get into a ticklish situation when the htable_gc rearms the timer back and we'll actually delete an entry with a pending timer. Fix it with using del_timer_sync(). AFAIK del_timer_sync checks for the timer to be pending by itself, so I remove the check. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:52 -07:00
Pavel Emelyanov	a8ddc9163c	netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations The thing is that recent_mt_destroy first flushes the entries from table with the recent_table_flush and only after this removes the proc file, corresponding to that table. Thus, if we manage to write to this file the '+XXX' command we will leak some entries. If we manage to write there a 'clean' command we'll race in two recent_table_flush flows, since the recent_mt_destroy calls this outside the recent_lock. The proper solution as I see it is to remove the proc file first and then go on with flushing the table. This flushing becomes safe w/o the lock, since the table is already inaccessible from the outside. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:31 -07:00
Patrick McHardy	ae375044d3	netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged In order to time out dead connections quicker, keep track of outstanding data and cap the timeout. Suggested by Herbert Xu. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:01 -07:00
David Howells	cba5cbd155	atm: fix const assignment/discard warnings in the ATM networking driver Fix const assignment/discard warnings in the ATM networking driver. The lane2_assoc_ind() function needed its arguments changing to match changes in the lane2_ops struct (patch `61c33e0129` "atm: use const where reasonable"). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:31:46 -07:00
Harvey Harrison	6a8341b68b	net: use the common ascii hex helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:30:15 -07:00
Simon Wunderlich	4adf0af681	bridge: send correct MTU value in PMTU (revised) When bridging interfaces with different MTUs, the bridge correctly chooses the minimum of the MTUs of the physical devices as the bridges MTU. But when a frame is passed which fits through the incoming, but not through the outgoing interface, a "Fragmentation Needed" packet is generated. However, the propagated MTU is hardcoded to 1500, which is wrong in this situation. The sender will repeat the packet again with the same frame size, and the same problem will occur again. Instead of sending 1500, the (correct) MTU value of the bridge is now sent via PMTU. To achieve this, the corresponding rtable structure is stored in its net_bridge structure. Modified to get rid of fake_net_device as well. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:27:55 -07:00
Robert P. J. Day	031cf19e6f	net: Make "networking" one-click deselectable. Use a menuconfig directive to make all of networking support one-click deselectable from the top-level menu. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:53 -07:00
Daniel Lezcano	17ef51fce0	ipv6: Fix useless proc net sockstat6 removal This call is no longer needed, sockstat6 is per namespace so it is removed at the namespace subsystem destruction. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:52 -07:00
David S. Miller	785957d3e8	tcp: MD5: Use MIB counter instead of warning for MD5 mismatch. From a report by Matti Aarnio, and preliminary patch by Adam Langley. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:25 -07:00
David S. Miller	8d50b53d66	pkt_sched: Fix OOPS on ingress qdisc add. Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 02:44:25 -07:00
Miao Xie	4a36702e01	IPv6: datagram_send_ctl() should exit immediately when an error occured When an error occured, datagram_send_ctl() should exit immediately rather than continue to run the for loop. Otherwise, the variable err might be changed and the error might be hidden. Fix this bug by using "goto" instead of "break". Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-29 23:57:58 -07:00
David S. Miller	e93dc4891d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-07-29 21:51:00 -07:00
Luis Carlos Cobo	56a6d13dfd	mac80211: fix mesh beaconing This patch fixes mesh beaconing, which was broken by "mac80211: revamp beacon configuration". Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:09 -04:00
Johannes Berg	14db74bcc3	mac80211: fix cfg80211 hooks for master interface The master interface is a virtual interface that is registered to mac80211, changing that does not seem like a good idea at the moment. However, since it has no sdata, we cannot accept any configuration for it. This patch makes the cfg80211 hooks reject any such attempt. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Johannes Berg	bba95fefb8	nl80211: fix dump callbacks Julius Volz pointed out that the dump callbacks in nl80211 were broken and fixed one of them. This patch fixes the other three and also addresses the TODOs there. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Julius Volz <juliusv@google.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Johannes Berg	d0f0980414	mac80211: partially fix skb->cb use This patch fixes mac80211 to not use the skb->cb over the queue step from virtual interfaces to the master. The patch also, for now, disables aggregation because that would still require requeuing, will fix that in a separate patch. There are two other places (software requeue and powersaving stations) where requeue can happen, but that is not currently used by any drivers/not possible to use respectively. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Rami Rosen	5422399518	mac80211: append CONFIG_ to MAC80211_VERBOSE_PS_DEBUG in net/mac80211/tx.c. In net/mac80211/tx.c, there are some #ifdef which checks MAC80211_VERBOSE_PS_DEBUG (which in fact is never set) instead of CONFIG_MAC80211_VERBOSE_PS_DEBUG, as should be. This patch replaces MAC80211_VERBOSE_PS_DEBUG with CONFIG_MAC80211_VERBOSE_PS_DEBUG in these #ifdef commands in net/mac80211/tx.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Jeremy Fitzhardinge	023a04bebe	mac80211: return correct error return from ieee80211_wep_init Return the proper error code rather than a hard-coded ENOMEM from ieee80211_wep_init. Also, print the error code on failure. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Jiri Slaby	1b0241656b	mac80211: tx, use dev_kfree_skb_any for beacon_get Use dev_kfree_skb_any(); instead of dev_kfree_skb();, since ieee80211_beacon_get function might be called from atomic. (It's in a fail path.) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Henrique de Moraes Holschuh	435307a365	rfkill: yet more minor kernel-doc fixes For some stupid reason, I sent and old version of the patch minor kernel doc-fix patch, and it got merged before I noticed the problem. This is an incremental fix on top. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:03 -04:00
Henrique de Moraes Holschuh	064af1117b	rfkill: mutex fixes There are two mutexes in rfkill: rfkill->mutex, which protects some of the fields of a rfkill struct, and is also used for callback serialization. rfkill_mutex, which protects the global state, the list of registered rfkill structs and rfkill->claim. Make sure to use the correct mutex, and to not miss locking rfkill->mutex even when we already took rfkill_mutex. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:35 -04:00
Henrique de Moraes Holschuh	37f55e9d78	rfkill: fix led-trigger unregister order in error unwind rfkill needs to unregister the led trigger AFTER a call to rfkill_remove_switch(), otherwise it will not update the LED state, possibly leaving it ON when it should be OFF. To make led-trigger unregistering safer, guard against unregistering a trigger twice, and also against issuing trigger events to a led trigger that was unregistered. This makes the error unwind paths more resilient. Refer to "rfkill: Register LED triggers before registering switch". Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Michael Buesch <mb@bu3sch.de> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:32 -04:00
Henrique de Moraes Holschuh	2fd9b2212e	rfkill: document rfkill_force_state as required (v2) While the rfkill class does work with just get_state(), it doesn't work well on devices that are subject to external events that cause rfkill state changes. Document that rfkill_force_state() is required in those cases. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:32 -04:00
Ingo Molnar	9e3ee1c39c	Merge branch 'linus' into cpus4096 Conflicts: kernel/stop_machine.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 23:32:00 +02:00
Ingo Molnar	414f746d23	Merge branch 'linus' into cpus4096	2008-07-28 21:14:43 +02:00
Al Viro	eeb61f719c	missing bits of net-namespace / sysctl Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-27 09:45:34 -07:00
David S. Miller	2ab61b0111	Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-2.6	2008-07-27 05:00:25 -07:00
Al Viro	6f9f489a4e	net: missing bits of net-namespace / sysctl Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-27 04:40:51 -07:00
David S. Miller	15d3b4a262	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-07-27 04:40:08 -07:00
David S. Miller	2c3abab7c9	ipcomp: Fix warnings after ipcomp consolidation. net/ipv4/ipcomp.c: In function ‘ipcomp4_init_state’: net/ipv4/ipcomp.c:109: warning: unused variable ‘calg_desc’ net/ipv4/ipcomp.c:108: warning: unused variable ‘ipcd’ net/ipv4/ipcomp.c:107: warning: ‘err’ may be used uninitialized in this function net/ipv6/ipcomp6.c: In function ‘ipcomp6_init_state’: net/ipv6/ipcomp6.c:139: warning: unused variable ‘calg_desc’ net/ipv6/ipcomp6.c:138: warning: unused variable ‘ipcd’ net/ipv6/ipcomp6.c:137: warning: ‘err’ may be used uninitialized in this function Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-27 03:59:24 -07:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Linus Torvalds	2284284281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON	2008-07-26 20:17:56 -07:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
Al Viro	bd7b1533cd	[PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns ones Massage ipv4 initialization - make sure that net.ipv4 appears as non-per-net-namespace before it shows up in per-net-namespace sysctls. That's the only change outside of sysctl.c needed to get sane ordering rules and data structures for sysctls (esp. for procfs side of that mess). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:10 -04:00
Al Viro	734550921e	[PATCH] beginning of sysctl cleanup - ctl_table_set New object: set of sysctls [currently - root and per-net-ns]. Contains: pointer to parent set, list of tables and "should I see this set?" method (->is_seen(set)). Current lists of tables are subsumed by that; net-ns contains such a beast. ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of that to ->list of that ctl_table_set. [folded compile fixes by rdd for configs without sysctl] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:08 -04:00
Hugh Dickins	6c3b8fc618	netns: fix ip_rt_frag_needed rt_is_expired Running recent kernels, and using a particular vpn gateway, I've been having to edit my mails down to get them accepted by the smtp server. Git bisect led to commit `e84f84f276` - netns: place rt_genid into struct net. The conversion from a != test to rt_is_expired() put one negative too many: and now my mail works. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:51:06 -07:00
Patrick McHardy	6c64825bf4	netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences As Linus points out, "ct->ext" and "new" are always equal, avoid unnecessary dereferences and use "new" directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:50:05 -07:00
Pekka Enberg	93bc4e89c2	netfilter: fix double-free and use-after free As suggested by Patrick McHardy, introduce a __krealloc() that doesn't free the original buffer to fix a double-free and use-after-free bug introduced by me in netfilter that uses RCU. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Tested-by: Dieter Ries <clip2@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:49:33 -07:00
Alexey Dobriyan	3918fed5f3	netfilter: arptables in netns for real IN, FORWARD -- grab netns from in device, OUT -- from out device. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:59 -07:00
Alexey Dobriyan	f858b4869a	netfilter: ip{,6}tables_security: fix future section mismatch Currently not visible, because NET_NS is mutually exclusive with SYSFS which is required by SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:38 -07:00
Alexey Dobriyan	e40f51a36a	netfilter: ebtables: use nf_register_hooks() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:47:53 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Mike Travis	0bc3cc03fa	cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu * Replace previous instances of the cpumask_of_cpu_ptr* macros with a the new (lvalue capable) generic cpumask_of_cpu(). Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 16:40:33 +02:00
Wei Yongjun	860239c56b	dccp: Add check for truncated ICMPv6 DCCP error packets This patch adds a minimum-length check for ICMPv6 packets, as per the previous patch for ICMPv4 payloads. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:11 +01:00
Wei Yongjun	18e1d83600	dccp: Fix incorrect length check for ICMPv4 packets Unlike TCP, which only needs 8 octets of original packet data, DCCP requires minimally 12 or 16 bytes for ICMP-payload sequence number checks. This patch replaces the insufficient length constant of 8 with a two-stage test, making sure that 12 bytes are available, before computing the basic header length required for sequence number checks. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Wei Yongjun	e0bcfb0c6a	dccp: Add check for sequence number in ICMPv6 message This adds a sequence number check for ICMPv6 DCCP error packets, in the same manner as it has been done for ICMPv4 in the previous patch. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Wei Yongjun	d68f0866f7	dccp: Fix sequence number check for ICMPv4 packets The payload of ICMP message is a part of the packet sent by ourself, so the sequence number check must use AWL and AWH, not SWL and SWH. For example: Endpoint A Endpoint B DATA-ACK --------> (SEQ=X) <-------- ICMP (Fragmentation Needed) (SEQ=X) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Gerrit Renker	73f18fdbca	dccp: Bug-Fix - AWL was never updated The AWL lower Ack validity window advances in proportion to GSS, the greatest sequence number sent. Updating AWL other than at connection setup (in the DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code. This bug lead to syslog messages such as "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...] P.ackno exists or LAWL(82947089) <= P.ackno(82948208) <= S.AWH(82948728), sending SYNC..." The difference between AWL/AWH here is 1639 packets, while the expected value (the Sequence Window) would have been 100 (the default). A closer look showed that LAWL = AWL = 82947089 equalled the ISS on the Response. The patch now updates AWL with each increase of GSS. Further changes: ---------------- The patch also enforces more stringent checks on the ISS sequence number: * AWL is initialised to ISS at connection setup and remains at this value; * AWH is then always set to GSS (via dccp_update_gss()); * so on the first Request: AWL = AWH = ISS, and on the n-th Request: AWL = ISS, AWH = ISS + n. As a consequence, only Response packets that refer to Requests sent by this host will pass, all others are discarded. This is the intention and in effect implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-07-26 11:59:10 +01:00
Gerrit Renker	59435444a1	dccp: Allow to distinguish original and retransmitted packets This patch allows the sender to distinguish original and retransmitted packets, which is in particular needed for the retransmission of DCCP-Requests: * the first Request uses ISS (generated in net/dccp/ip.c), and sets GSS = ISS; all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted Request has sequence number ISS + n (mod 48). To add generic support, the patch reorganises existing code so that: * icsk_retransmits == 0 for the original packet and * icsk_retransmits = n > 0 for the n-th retransmitted packet at the time dccp_transmit_skb() is called, via dccp_retransmit_skb(). Thanks to Wei Yongjun for pointing this problem out. Further changes: ---------------- * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head is used for all retransmissions (the exception is client-Acks in PARTOPEN state, but these do not use sk_send_head); * since sk_send_head always contains the original skb (via dccp_entail()), skb_cloned() never evaluated to true and thus pskb_copy() was never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:09 +01:00
David S. Miller	cdec7e50a4	Revert "pkt_sched: sch_sfq: dump a real number of flows" This reverts commit `f867e6af94`. Based upon discussions between Jarek and Patrick McHardy this is field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:28:09 -07:00
Florian Westphal	16df845f45	syncookies: Make sure ECN is disabled ecn_ok is not initialized when a connection is established by cookies. The cookie syn-ack never sets ECN, so ecn_ok must be set to 0. Spotted using ns-3/network simulation cradle simulator and valgrind. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:21:54 -07:00
Ilpo Järvinen	547b792cac	net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:43:18 -07:00
Linus Torvalds	1ff8419871	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipsec: ipcomp - Decompress into frags if necessary ipsec: ipcomp - Merge IPComp implementations pkt_sched: Fix locking in shutdown_scheduler_queue()	2008-07-25 17:40:16 -07:00
Stephen Hemminger	4ecb90090c	sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN Extend the permission check for networking sysctl's to allow modification when current process has CAP_NET_ADMIN capability and is not root. This version uses the until now unused permissions hook to override the mode value for /proc/sys/net if accessed by a user with capabilities. Found while working with Quagga. It is impossible to turn forwarding on/off through the command interface because Quagga uses secure coding practice of dropping privledges during initialization and only raising via capabilities when necessary. Since the dameon has reset real/effective uid after initialization, all attempts to access /proc/sys/net variables will fail. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Dave Young	717115e1a5	printk ratelimiting rewrite All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Paul E. McKenney	696adfe84c	list_for_each_rcu must die: networking All uses of list_for_each_rcu() can be profitably replaced by the easier-to-use list_for_each_entry_rcu(). This patch makes this change for networking, in preparation for removing the list_for_each_rcu() API entirely. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Herbert Xu	7d7e5a60c6	ipsec: ipcomp - Decompress into frags if necessary When decompressing extremely large packets allocating them through kmalloc is prone to failure. Therefore it's better to use page frags instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 02:55:33 -07:00
Herbert Xu	6fccab671f	ipsec: ipcomp - Merge IPComp implementations This patch merges the IPv4/IPv6 IPComp implementations since most of the code is identical. As a result future enhancements will no longer need to be duplicated. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 02:54:40 -07:00
David S. Miller	cffe1c5d7a	pkt_sched: Fix locking in shutdown_scheduler_queue() Qdisc locks need to be held with BH disabled. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 01:25:04 -07:00
Linus Torvalds	c3c2233d84	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: sch_sfq: dump a real number of flows atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups netfilter: make security table depend on NETFILTER_ADVANCED tcp: Clear probes_out more aggressively in tcp_ack(). e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call net: Update entry in af_family_clock_key_strings netdev: Remove warning from __netif_schedule(). sky2: don't stop queue on shutdown	2008-07-24 12:14:58 -07:00
Ulrich Drepper	e38b36f325	flag parameters: check magic constants This patch adds test that ensure the boundary conditions for the various constants introduced in the previous patches is met. No code is generated. [akpm@linux-foundation.org: fix alpha] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	77d2720059	flag parameters: NONBLOCK in socket and socketpair This patch introduces support for the SOCK_NONBLOCK flag in socket, socketpair, and paccept. To do this the internal function sock_attach_fd gets an additional parameter which it uses to set the appropriate flag for the file descriptor. Given that in modern, scalable programs almost all socket connections are non-blocking and the minimal additional cost for the new functionality I see no reason not to add this code. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_NONBLOCK O_NONBLOCK static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); return NULL; } int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("socket(0) set non-blocking mode"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_NONBLOCK, 0); if (fd == -1) { puts ("socket(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_NONBLOCK, 0, fds) == -1) { puts ("socketpair(SOCK_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, NULL) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr ) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } fl = fcntl (s2, F_GETFL); if (fl & O_NONBLOCK) { puts ("paccept(0) set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK); if (s2 < 0) { puts ("paccept(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (s2, F_GETFL); if ((fl & O_NONBLOCK) == 0) { puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	c019bbc612	flag parameters: paccept w/out set_restore_sigmask Some platforms do not have support to restore the signal mask in the return path from a syscall. For those platforms syscalls like pselect are not defined at all. This is, I think, not a good choice for paccept() since paccept() adds more value on top of accept() than just the signal mask handling. Therefore this patch defines a scaled down version of the sys_paccept function for those platforms. It returns -EINVAL in case the signal mask is non-NULL but behaves the same otherwise. Note that I explicitly included <linux/thread_info.h>. I saw that it is currently included but indirectly two levels down. There is too much risk in relying on this. The header might change and then suddenly the function definition would change without anyone immediately noticing. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	aaca0bdca5	flag parameters: paccept This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void ) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 \|\| errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	a677a039be	flag parameters: socket and socketpair This patch adds support for flag values which are ORed to the type passwd to socket and socketpair. The additional code is minimal. The flag values in this implementation can and must match the O_* flags. This avoids overhead in the conversion. The internal functions sock_alloc_fd and sock_map_fd get a new parameters and all callers are changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #define PORT 57392 /* For Linux these must be the same. */ #define SOCK_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("socket(0) set close-on-exec flag"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_CLOEXEC, 0); if (fd == -1) { puts ("socket(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_CLOEXEC, 0, fds) == -1) { puts ("socketpair(SOCK_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Jarek Poplawski	f867e6af94	pkt_sched: sch_sfq: dump a real number of flows Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 21:34:27 -07:00
Linus Torvalds	26dcce0fab	Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually	2008-07-23 18:37:44 -07:00
Patrick McHardy	70eed75d76	netfilter: make security table depend on NETFILTER_ADVANCED Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 16:42:42 -07:00
David S. Miller	4b53fb67e3	tcp: Clear probes_out more aggressively in tcp_ack(). This is based upon an excellent bug report from Eric Dumazet. tcp_ack() should clear ->icsk_probes_out even if there are packets outstanding. Otherwise if we get a sequence of ACKs while we do have packets outstanding over and over again, we'll never clear the probes_out value and eventually think the connection is too sick and we'll reset it. This appears to be some "optimization" added to tcp_ack() in the 2.4.x timeframe. In 2.2.x, probes_out is pretty much always cleared by tcp_ack(). Here is Eric's original report: ---------------------------------------- Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost. In order to reproduce the problem, please try the following program on linux-2.6.25.* Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000 iptables -N SLOWLO iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT iptables -A SLOWLO -j DROP iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO Then run the attached program and see the output : # ./loop State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15) write(): Connection timed out wrote 890 bytes but was interrupted after 9 seconds ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455 Exiting read() because no data available (4000 ms timeout). read 860 bytes While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15) tcpdump : 15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257 15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257 15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257 15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257 15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257 15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257 15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257 15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257 15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257 15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257 15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257 15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257 15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257 15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257 15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257 15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257 15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257 15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257 15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257 15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257 15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257 15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257 15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257 15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257 15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257 15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257 15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257 15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257 15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257 15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257 15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257 15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257 15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257 15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257 15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257 15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0 Source of program : /* * small producer/consumer program. * setup a listener on 127.0.0.1:12000 * Forks a child * child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms * Father accepts connection, and read all data / #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <stdio.h> #include <time.h> #include <sys/poll.h> int port = 12000; char buffer[4096]; int main(int argc, char argv[]) { int lfd = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in socket_address; time_t t0, t1; int on = 1, sfd, res; unsigned long total = 0; socklen_t alen = sizeof(socket_address); pid_t pid; time(&t0); socket_address.sin_family = AF_INET; socket_address.sin_port = htons(port); socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK); if (lfd == -1) { perror("socket()"); return 1; } setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int)); if (bind(lfd, (struct sockaddr )&socket_address, sizeof(socket_address)) == -1) { perror("bind"); close(lfd); return 1; } if (listen(lfd, 1) == -1) { perror("listen()"); close(lfd); return 1; } pid = fork(); if (pid == 0) { int i, cfd = socket(AF_INET, SOCK_STREAM, 0); close(lfd); if (connect(cfd, (struct sockaddr )&socket_address, sizeof(socket_address)) == -1) { perror("connect()"); return 1; } for (i = 0 ; ;) { res = write(cfd, "blablabla\n", 10); if (res > 0) total += res; else if (res == -1) { perror("write()"); break; } else break; usleep(100000); if (++i == 10) { system("ss -on dst 127.0.0.1:12000"); i = 0; } } time(&t1); fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0)); system("ss -on \| grep 127.0.0.1:12000"); close(cfd); return 0; } sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen); if (sfd == -1) { perror("accept"); return 1; } close(lfd); while (1) { struct pollfd pfd[1]; pfd[0].fd = sfd; pfd[0].events = POLLIN; if (poll(pfd, 1, 4000) == 0) { fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n"); break; } res = read(sfd, buffer, sizeof(buffer)); if (res > 0) total += res; else if (res == 0) break; else perror("read()"); } fprintf(stderr, "read %lu bytes\n", total); close(sfd); return 0; } ---------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 16:38:45 -07:00
Oliver Hartkopp	b4942af650	net: Update entry in af_family_clock_key_strings In the merge phase of the CAN subsystem the af_family_clock_key_strings[] have been added to sock.c in commit `443aef0edd` (lockdep: fixup sk_callback_lock annotation). This trivial patch adds the missing name for address family 29 (AF_CAN). Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 14:06:04 -07:00
David S. Miller	5b3ab1dbd4	netdev: Remove warning from __netif_schedule(). It isn't helping anything and we aren't going to be able to change all the drivers that do queue wakeups in strange situations. Just letting a noop_qdisc get scheduled will work because when qdisc_run() executes via net_tx_work() it will simply find no packets pending when it makes the ->dequeue() call in qdisc_restart. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 14:01:29 -07:00
Krzysztof Hałasa	a8817d2f6d	wanmain.c doesn't need syncppp.h Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>	2008-07-23 23:00:36 +02:00
Krzysztof Hałasa	a24e202e3f	Remove dead code from wanmain.c, CONFIG_WANPIPE_MULTPPP doesn't exist Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>	2008-07-23 23:00:36 +02:00
Linus Torvalds	5554b35933	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits) I/OAT: I/OAT version 3.0 support I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT: Add watchdog/reset functionality to ioatdma iop_adma: cleanup iop_chan_xor_slot_count iop_adma: document how to calculate the minimum descriptor pool size iop_adma: directly reclaim descriptors on allocation failure async_tx: make async_tx_test_ack a boolean routine async_tx: remove depend_tx from async_tx_sync_epilog async_tx: export async_tx_quiesce async_tx: fix handling of the "out of descriptor" condition in async_xor async_tx: ensure the xor destination buffer remains dma-mapped async_tx: list_for_each_entry_rcu() cleanup dmaengine: Driver for the Synopsys DesignWare DMA controller dmaengine: Add slave DMA interface dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap dmaengine: Add dma_client parameter to device_alloc_chan_resources dmatest: Simple DMA memcpy test client dmaengine: DMA engine driver for Marvell XOR engine iop-adma: fix platform driver hotplug/coldplug dmaengine: track the number of clients using a channel ... Fixed up conflict in drivers/dca/dca-sysfs.c manually	2008-07-23 12:03:18 -07:00
Linus Torvalds	c010b2f76c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits) ipw2200: Call netif_*_queue() interfaces properly. netxen: Needs to include linux/vmalloc.h [netdrvr] atl1d: fix !CONFIG_PM build r6040: rework init_one error handling r6040: bump release number to 0.18 r6040: handle RX fifo full and no descriptor interrupts r6040: change the default waiting time r6040: use definitions for magic values in descriptor status r6040: completely rework the RX path r6040: call napi_disable when puting down the interface and set lp->dev accordingly. mv643xx_eth: fix NETPOLL build r6040: rework the RX buffers allocation routine r6040: fix scheduling while atomic in r6040_tx_timeout r6040: fix null pointer access and tx timeouts r6040: prefix all functions with r6040 rndis_host: support WM6 devices as modems at91_ether: use netstats in net_device structure sfc: Create one RX queue and interrupt per CPU package by default sfc: Use a separate workqueue for resets sfc: I2C adapter initialisation fixes ...	2008-07-22 19:09:51 -07:00
Maciej Sosnowski	16a37acaaf	I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT DMA performance tuning showed different optimal values of tcp_dma_copybreak for different I/OAT versions (4096 for 1.2 and 2048 for 2.0). This patch lets ioatdma driver set tcp_dma_copybreak value according to these results. [dan.j.williams@intel.com: remove some ifdefs] Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-22 17:30:57 -07:00
Stephen Hemminger	3d0f24a74e	ipv6: icmp6_dst_gc return change Change icmp6_dst_gc to return the one value the caller cares about rather than using call by reference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:35:50 -07:00
Stephen Hemminger	75307c0fe7	ipv6: use kcalloc Th fib_table_hash is an array, so use kcalloc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:35:07 -07:00
Stephen Hemminger	a76d7345a3	ipv6: use spin_trylock_bh Now there is spin_trylock_bh, use it rather than open coding. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:34:35 -07:00
Stephen Hemminger	c8a4522245	ipv6: use round_jiffies This timer normally happens once a minute, there is no need to cause an early wakeup for it, so align it to next second boundary to safe power. It can't be deferred because then it could take too long on cleanup or DoS. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:34:09 -07:00
Stephen Hemminger	417f28bb34	netns: dont alloc ipv6 fib timer list FIB timer list is a trivial size structure, avoid indirection and just put it in existing ns. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:33:45 -07:00
Adrian Bunk	888c848ed3	ipv6: make struct ipv6_devconf static struct ipv6_devconf can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:21:58 -07:00
Adrian Bunk	4d6971e909	sctp: remove sctp_assoc_proc_exit() Commit `20c2c1fd6c` (sctp: add sctp/remaddr table to complete RFC remote address table OID) added an unused sctp_assoc_proc_exit() function that seems to have been unintentionally created when copying the assocs code. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:21:30 -07:00
Adrian Bunk	abd0b198ea	sctp: make sctp_outq_flush() static sctp_outq_flush() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:45 -07:00
Adrian Bunk	a94f779f9d	pkt_sched: make qdisc_class_hash_alloc() static This patch makes the needlessly global qdisc_class_hash_alloc() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:11 -07:00
David S. Miller	cf508b1211	netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep. The new address list lock needs to handle the same device layering issues that the _xmit_lock one does. This integrates work done by Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:16:42 -07:00
Dave Jones	d29f749e25	net: Fix build failure with 'make mandocs'. The function header comments have to go with the functions they are documenting, or things go horribly wrong when we try to process them with the docbook tools. Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq' Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; ' Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:09:06 -07:00
Greg Kroah-Hartman	16be63fd16	bluetooth: remove improper bluetooth class symlinks. Don't create symlinks in a class to a device that is not owned by the class. If the bluetooth subsystem really wants to point to all of the devices it controls, it needs to create real devices, not fake symlinks. Cc: Maxim Krasnyansky <maxk@qualcomm.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:51 -07:00
David S. Miller	b32d13102d	tcp: Fix bitmask test in tcp_syn_options() As reported by Alexey Dobriyan: CHECK net/ipv4/tcp_output.c net/ipv4/tcp_output.c:475:7: warning: dubious: !x & y And sparse is damn right! if (unlikely(!OPTION_TS & opts->options)) ^^^ size += TCPOLEN_SACKPERM_ALIGNED; OPTION_TS is (1 << 1), so condition will never trigger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 18:45:34 -07:00
Gerrit Renker	47112e25da	udplite: Protection against coverage value wrap-around This patch clamps the cscov setsockopt values to a maximum of 0xFFFF. Setsockopt values greater than 0xffff can cause an unwanted wrap-around. Further, IPv6 jumbograms are not supported (RFC 3838, 3.5), so that values greater than 0xffff are not even useful. Further changes: fixed a typo in the documentation. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:35:08 -07:00
Arjan van de Ven	6579e57b31	net: Print the module name as part of the watchdog message As suggested by Dave: This patch adds a function to get the driver name from a struct net_device, and consequently uses this in the watchdog timeout handler to print as part of the message. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:31:48 -07:00
Stephen Hemminger	7943986ca1	net: use kcalloc in netdev_queue alloc Minor nit, use size_t for allocation size and kcalloc to allocate an array. Probably makes no actual code difference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:28:44 -07:00
Stephen Hemminger	847499ce71	ipv6: use timer pending This fixes the bridge reference count problem and cleanups ipv6 FIB timer management. Don't use expires field, because it is not a proper way to test, instead use timer_pending(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:21:35 -07:00
Patrick McHardy	5547cd0ae8	netfilter: nf_conntrack_sctp: fix sparse warnings Introduced by `a258860e` (netfilter: ctnetlink: add full support for SCTP to ctnetlink): net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: incorrect type in argument 1 (different base types) net/netfilter/nf_conntrack_proto_sctp.c:483:2: expected unsigned int [unsigned] [usertype] x net/netfilter/nf_conntrack_proto_sctp.c:483:2: got restricted unsigned int const <noident> net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: incorrect type in argument 1 (different base types) net/netfilter/nf_conntrack_proto_sctp.c:487:2: expected unsigned int [unsigned] [usertype] x net/netfilter/nf_conntrack_proto_sctp.c:487:2: got restricted unsigned int const <noident> net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:532:42: warning: incorrect type in assignment (different base types) net/netfilter/nf_conntrack_proto_sctp.c:532:42: expected restricted unsigned int <noident> net/netfilter/nf_conntrack_proto_sctp.c:532:42: got unsigned int net/netfilter/nf_conntrack_proto_sctp.c:534:39: warning: incorrect type in assignment (different base types) net/netfilter/nf_conntrack_proto_sctp.c:534:39: expected restricted unsigned int <noident> net/netfilter/nf_conntrack_proto_sctp.c:534:39: got unsigned int Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:02 -07:00
Herbert Xu	c71529e42c	netfilter: nf_nat_sip: c= is optional for session According to RFC2327, the connection information is optional in the session description since it can be specified in the media description instead. My provider does exactly that and does not provide any connection information in the session description. As a result the new kernel drops all invite responses. This patch makes it optional as documented. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:02 -07:00
Jan Engelhardt	db1a75bdcc	netfilter: xt_TCPMSS: collapse tcpmss_reverse_mtu{4,6} into one function Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:01 -07:00
Eric Leblond	72961ecf84	netfilter: nfnetlink_log: send complete hardware header This patch adds some fields to NFLOG to be able to send the complete hardware header with all necessary informations. It sends to userspace: * the type of hardware link * the lenght of hardware header * the hardware header Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:00 -07:00
David Howells	280763c053	netfilter: xt_time: fix time's time_mt()'s use of do_div() Fix netfilter xt_time's time_mt()'s use of do_div() on an s64 by using div_s64() instead. This was introduced by patch `ee4411a1b1` ("[NETFILTER]: x_tables: add xt_time match"). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:59 -07:00
Krzysztof Piotr Oledzki	584015727a	netfilter: accounting rework: ct_extend + 64bit counters (v4) Initially netfilter has had 64bit counters for conntrack-based accounting, but it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are still required, for example for "connbytes" extension. However, 64bit counters waste a lot of memory and it was not possible to enable/disable it runtime. This patch: - reimplements accounting with respect to the extension infrastructure, - makes one global version of seq_print_acct() instead of two seq_print_counters(), - makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n), - makes it possible to enable/disable it at runtime by sysctl or sysfs, - extends counters from 32bit to 64bit, - renames ip_conntrack_counter -> nf_conn_counter, - enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT), - set initial accounting enable state based on CONFIG_NF_CT_ACCT - removes buggy IPCT_COUNTER_FILLING event handling. If accounting is enabled newly created connections get additional acct extend. Old connections are not changed as it is not possible to add a ct_extend area to confirmed conntrack. Accounting is performed for all connections with acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct". Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:58 -07:00
Changli Gao	0dbff689c2	netfilter: nf_nat_core: eliminate useless find_appropriate_src for IP_NAT_RANGE_PROTO_RANDOM Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:57 -07:00
David S. Miller	d3678b463d	Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe." This reverts commit `a0c80b80e0`. After discussions with Jamal and Herbert on netdev, we should provide at least minimal prioritization at the qdisc level even in multiqueue situations. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:50 -07:00
Linus Torvalds	867d79fb9a	net: In __netif_schedule() use WARN_ON instead of BUG_ON Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:49 -07:00
David S. Miller	b6b2fed1f4	net: Improve simple_tx_hash(). Based upon feedback from Eric Dumazet and Andi Kleen. Cure several deficiencies in simple_tx_hash() by using jhash + reciprocol multiply. 1) Eliminates expensive modulus operation. 2) Makes hash less attackable by using random seed. 3) Eliminates endianness hash distribution issues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:48 -07:00
Daniel Lezcano	c3ee84163e	pkt_sched: Remove unused variable skb in dev_deactivate_queue function. Removed unused variable 'skb' in the dev_deactivate_queue function Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 09:18:07 -07:00
Ingo Molnar	eb6a12c242	Merge branch 'linus' into cpus4096-for-linus Conflicts: net/sunrpc/svc.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-21 17:19:50 +02:00
Linus Torvalds	14b395e35d	Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits) nfsd: nfs4xdr.c do-while is not a compound statement nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c lockd: Pass "struct sockaddr *" to new failover-by-IP function lockd: get host reference in nlmsvc_create_block() instead of callers lockd: minor svclock.c style fixes lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock lockd: nlm_release_host() checks for NULL, caller needn't file lock: reorder struct file_lock to save space on 64 bit builds nfsd: take file and mnt write in nfs4_upgrade_open nfsd: document open share bit tracking nfsd: tabulate nfs4 xdr encoding functions nfsd: dprint operation names svcrdma: Change WR context get/put to use the kmem cache svcrdma: Create a kmem cache for the WR contexts svcrdma: Add flush_scheduled_work to module exit function svcrdma: Limit ORD based on client's advertised IRD svcrdma: Remove unused wait q from svcrdma_xprt structure svcrdma: Remove unneeded spin locks from __svc_rdma_free svcrdma: Add dma map count and WARN_ON ...	2008-07-20 21:21:46 -07:00
David Miller	702beb87d6	ipv6: Fix warning in addrconf code. Reported by Linus. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-20 21:18:26 -07:00
Linus Torvalds	d1671a9c15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: Fix build with NET_SCHED disabled.	2008-07-20 21:17:20 -07:00
David S. Miller	3a682fbd73	pkt_sched: Fix build with NET_SCHED disabled. The stab bits can't be referenced uniless the full packet scheduler layer is enabled. Reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 18:13:01 -07:00
Linus Torvalds	db6d8c7a40	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits) iucv: Fix bad merging. net_sched: Add size table for qdiscs net_sched: Add accessor function for packet length for qdiscs net_sched: Add qdisc_enqueue wrapper highmem: Export totalhigh_pages. ipv6 mcast: Omit redundant address family checks in ip6_mc_source(). net: Use standard structures for generic socket address structures. ipv6 netns: Make several "global" sysctl variables namespace aware. netns: Use net_eq() to compare net-namespaces for optimization. ipv6: remove unused macros from net/ipv6.h ipv6: remove unused parameter from ip6_ra_control tcp: fix kernel panic with listening_get_next tcp: Remove redundant checks when setting eff_sacks tcp: options clean up tcp: Fix MD5 signatures for non-linear skbs sctp: Update sctp global memory limit allocations. sctp: remove unnecessary byteshifting, calculate directly in big-endian sctp: Allow only 1 listening socket with SO_REUSEADDR sctp: Do not leak memory on multiple listen() calls sctp: Support ipv6only AF_INET6 sockets. ...	2008-07-20 17:43:29 -07:00
Alan Cox	a352def21a	tty: Ldisc revamp Move the line disciplines towards a conventional ->ops arrangement. For the moment the actual 'tty_ldisc' struct in the tty is kept as part of the tty struct but this can then be changed if it turns out that when it all settles down we want to refcount ldiscs separately to the tty. Pull the ldisc code out of /proc and put it with our ldisc code. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-20 17:12:34 -07:00
David S. Miller	fb65a7c091	iucv: Fix bad merging. Noticed by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 10:18:44 -07:00
Jussi Kivilinna	175f9c1bba	net_sched: Add size table for qdiscs Add size table functions for qdiscs and calculate packet size in qdisc_enqueue(). Based on patch by Patrick McHardy http://marc.info/?l=linux-netdev&m=115201979221729&w=2 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:47 -07:00
Jussi Kivilinna	0abf77e55a	net_sched: Add accessor function for packet length for qdiscs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:27 -07:00
Jussi Kivilinna	5f86173bdf	net_sched: Add qdisc_enqueue wrapper Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:04 -07:00
YOSHIFUJI Hideaki	a6ffb404dc	ipv6 mcast: Omit redundant address family checks in ip6_mc_source(). The caller has alredy checked for them. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:36:07 -07:00
YOSHIFUJI Hideaki	230b183921	net: Use standard structures for generic socket address structures. Use sockaddr_storage{} for generic socket address storage and ensures proper alignment. Use sockaddr{} for pointers to omit several casts. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:35:47 -07:00
YOSHIFUJI Hideaki	53b7997fd5	ipv6 netns: Make several "global" sysctl variables namespace aware. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:35:03 -07:00
YOSHIFUJI Hideaki	721499e893	netns: Use net_eq() to compare net-namespaces for optimization. Without CONFIG_NET_NS, namespace is always &init_net. Compiler will be able to omit namespace comparisons with this patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:34:43 -07:00
David S. Miller	407d819cf0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6	2008-07-19 00:30:39 -07:00
Denis V. Lunev	725a8ff04a	ipv6: remove unused parameter from ip6_ra_control Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:28:58 -07:00
Daniel Lezcano	bdccc4ca13	tcp: fix kernel panic with listening_get_next # BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 PGD 11e4b9067 PUD 11d16c067 PMD 0 Oops: 0000 [1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 3 Modules linked in: bridge ipv6 button battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor thermal_sys hwmon sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 3368, comm: slpd Not tainted 2.6.26-rc2-mm1-lxc4 #1 RIP: 0010:[<ffffffff821ed01e>] [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 RSP: 0018:ffff81011e1fbe18 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8100be0ad3c0 RCX: ffff8100619f50c0 RDX: ffffffff82475be0 RSI: ffff81011d9ae6c0 RDI: ffff8100be0ad508 RBP: ffff81011f4f1240 R08: 00000000ffffffff R09: ffff8101185b6780 R10: 000000000000002d R11: ffffffff820fdbfa R12: ffff8100be0ad3c8 R13: ffff8100be0ad6a0 R14: ffff8100be0ad3c0 R15: ffffffff825b8ce0 FS: 00007f6a0ebd16d0(0000) GS:ffff81011f424540(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 000000011dc20000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process slpd (pid: 3368, threadinfo ffff81011e1fa000, task ffff81011f4b8660) Stack: 00000000000002ee ffff81011f5a57c0 ffff81011f4f1240 ffff81011e1fbe90 0000000000001000 0000000000000000 00007fff16bf2590 ffffffff821ed9c8 ffff81011f5a57c0 ffff81011d9ae6c0 000000000000041a ffffffff820b0abd Call Trace: [<ffffffff821ed9c8>] ? tcp_seq_next+0x34/0x7e [<ffffffff820b0abd>] ? seq_read+0x1aa/0x29d [<ffffffff820d21b4>] ? proc_reg_read+0x73/0x8e [<ffffffff8209769c>] ? vfs_read+0xaa/0x152 [<ffffffff82097a7d>] ? sys_read+0x45/0x6e [<ffffffff8200bd2b>] ? system_call_after_swapgs+0x7b/0x80 Code: 31 a9 25 00 e9 b5 00 00 00 ff 45 20 83 7d 0c 01 75 79 4c 8b 75 10 48 8b 0e eb 1d 48 8b 51 20 0f b7 45 08 39 02 75 0e 48 8b 41 28 <4c> 39 78 38 0f 84 93 00 00 00 48 8b 09 48 85 c9 75 de 8b 55 1c RIP [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 RSP <ffff81011e1fbe18> CR2: 0000000000000038 This kernel panic appears with CONFIG_NET_NS=y. How to reproduce ? On the buggy host (host A) * ip addr add 1.2.3.4/24 dev eth0 On a remote host (host B) * ip addr add 1.2.3.5/24 dev eth0 * iptables -A INPUT -p tcp -s 1.2.3.4 -j DROP * ssh 1.2.3.4 On host A: * netstat -ta or cat /proc/net/tcp This bug happens when reading /proc/net/tcp[6] when there is a req_sock at the SYN_RECV state. When a SYN is received the minisock is created and the sk field is set to NULL. In the listening_get_next function, we try to look at the field req->sk->sk_net. When looking at how to fix this bug, I noticed that is useless to do the check for the minisock belonging to the namespace. A minisock belongs to a listen point and this one is per namespace, so when browsing the minisock they are always per namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:15:13 -07:00
Adam Langley	4389dded77	tcp: Remove redundant checks when setting eff_sacks Remove redundant checks when setting eff_sacks and make the number of SACKs a compile time constant. Now that the options code knows how many SACK blocks can fit in the header, we don't need to have the SACK code guessing at it. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:07:02 -07:00
Adam Langley	33ad798c92	tcp: options clean up This should fix the following bugs: * Connections with MD5 signatures produce invalid packets whenever SACK options are included * MD5 signatures are counted twice in the MSS calculations Behaviour changes: * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK This is because we can't fit any SACK blocks in a packet with MD5 + TS options. There was discussion about disabling SACK rather than TS in order to fit in better with old, buggy kernels, but that was deemed to be unnecessary. * SYNs with MD5 don't include a TS option See above. Additionally, it removes a bunch of duplicated logic for calculating options, which should help avoid these sort of issues in the future. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:04:31 -07:00
Adam Langley	49a72dfb88	tcp: Fix MD5 signatures for non-linear skbs Currently, the MD5 code assumes that the SKBs are linear and, in the case that they aren't, happily goes off and hashes off the end of the SKB and into random memory. Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy Polyakov. Also includes a couple of missed route_caps from Stephen's patch in [2]. [1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2 [2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:01:42 -07:00
Vlad Yasevich	845525a642	sctp: Update sctp global memory limit allocations. Update sctp global memory limit allocations to be the same as TCP. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:08:21 -07:00
Harvey Harrison	336d3262df	sctp: remove unnecessary byteshifting, calculate directly in big-endian Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:07:09 -07:00
Vlad Yasevich	4e54064e0a	sctp: Allow only 1 listening socket with SO_REUSEADDR When multiple socket bind to the same port with SO_REUSEADDR, only 1 can be listining. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:06:32 -07:00
Vlad Yasevich	23b29ed80b	sctp: Do not leak memory on multiple listen() calls SCTP permits multiple listen call and on subsequent calls we leak he memory allocated for the crypto transforms. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:06:07 -07:00
Vlad Yasevich	7dab83de50	sctp: Support ipv6only AF_INET6 sockets. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:05:40 -07:00
Florian Westphal	6d0ccbac68	sctp: Prevent uninitialized memory access valgrind reports uninizialized memory accesses when running sctp inside the network simulation cradle simulator: Conditional jump or move depends on uninitialised value(s) at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324) by 0x57427DA: sctp_packet_transmit (output.c:403) by 0x5710EFF: sctp_outq_flush (outqueue.c:824) by 0x5710B88: sctp_outq_uncork (outqueue.c:701) by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548) by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976) by 0x5744460: sctp_do_sm (sm_sideeffect.c:945) by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94) by 0x5725C04: __sctp_connect (socket.c:1094) by 0x57297DC: sctp_connect (socket.c:3297) Conditional jump or move depends on uninitialised value(s) at 0x575D3A5: mod_timer (timer.c:630) by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555) by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448) by 0x5753607: sctp_side_effects (sm_sideeffect.c:976) by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945) by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474) by 0x573347F: sctp_inq_push (inqueue.c:104) by 0x572EF93: sctp_rcv (input.c:256) by 0x5689623: ip_local_deliver_finish (ip_input.c:230) by 0x5689759: ip_local_deliver (ip_input.c:268) by 0x5689CAC: ip_rcv_finish (dst.h:246) #1 is due to "if (t->pmtu_pending)". `8a4794914f` "[SCTP] Flag a pmtu change request" suggests it should be initialized to 0. #2 is the heartbeat timer 'expires' value, which is uninizialised, but test by mod_timer(). T3_rtx_timer seems to be affected by the same problem, so initialize it, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:04:39 -07:00
Florian Westphal	c4e85f82ed	sctp: Don't abort initialization when CONFIG_PROC_FS=n This puts CONFIG_PROC_FS defines around the proc init/exit functions and also avoids compiling proc.c if procfs is not supported. Also make SCTP_DBG_OBJCNT depend on procfs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:03:44 -07:00
Stephen Hemminger	c1e20f7c8b	tcp: RTT metrics scaling Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in kernel units (jiffies) and this leaks out through the netlink API to user space where the units for jiffies are unknown. This patches changes the kernel to convert to/from milliseconds. This changes the ABI, but milliseconds seemed like the most natural unit for these parameters. Values available via syscall in /proc/net/rt_cache and netlink will be in milliseconds. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:02:15 -07:00
David S. Miller	30ee42be00	pkt_sched: Fix noqueue_qdisc initialization. Like noop_qdisc, it needs a dummy backpointer and explicit qdisc->q.lock initialization. Based upon a report by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:00:11 -07:00
David S. Miller	3072367300	pkt_sched: Manage qdisc list inside of root qdisc. Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse it's list. The only special case is to ignore builting qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 22:50:15 -07:00
David S. Miller	72b25a913e	pkt_sched: Get rid of u32_list. The u32_list is just an indirect way of maintaining a reference to a U32 node on a per-qdisc basis. Just add an explicit node pointer for u32 to struct Qdisc an do away with this global list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 20:54:17 -07:00
Patrick McHardy	8913336a7e	packet: add PACKET_RESERVE sockopt Add new sockopt to reserve some headroom in the mmaped ring frames in front of the packet payload. This can be used f.i. when the VLAN header needs to be (re)constructed to avoid moving the entire payload. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 18:05:19 -07:00
Mike Travis	65c0118453	cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr * This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:57 +02:00
Ingo Molnar	bb2c018b09	Merge branch 'linus' into cpus4096 Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:00:54 +02:00
Pavel Emelyanov	b6fcbdb4f2	proc: consolidate per-net single-release callers They are symmetrical to single_open ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:44 -07:00
Pavel Emelyanov	de05c557b2	proc: consolidate per-net single_open callers There are already 7 of them - time to kill some duplicate code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:21 -07:00
Pavel Emelyanov	60bdde9580	proc: clean the ip_misc_proc_init and ip_proc_init_net error paths After all this stuff is moved outside, this function can look better. Besides, I tuned the error path in ip_proc_init_net to make it have only 2 exit points, not 3. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:50 -07:00
Pavel Emelyanov	8e3461d01b	proc: show per-net ip_devconf.forwarding in /proc/net/snmp This one has become per-net long ago, but the appropriate file is per-net only now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:26 -07:00
Pavel Emelyanov	229bf0cbaa	proc: create /proc/net/snmp file in each net All the statistics shown in this file have been made per-net already. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:04 -07:00
Pavel Emelyanov	7b7a9dfdf6	proc: create /proc/net/netstat file in each net Now all the shown in it statistics is netnsizated, time to show it in appropriate net. The appropriate net init/exit ops already exist - they make the sockstat file per net - so just extend them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:05:17 -07:00
Pavel Emelyanov	d89cbbb1e6	ipv4: clean the init_ipv4_mibs error paths After moving all the stuff outside this function it looks a bit ugly - make it look better. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:51 -07:00
Pavel Emelyanov	923c6586b0	mib: put icmpmsg statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:22 -07:00
Pavel Emelyanov	b60538a0d7	mib: put icmp statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:02 -07:00

... 2 3 4 5 6 ...

9842 Commits