android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Ingo Molnar	14cc3e2b63	[PATCH] sem2mutex: misc static one-file mutexes Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jens Axboe <axboe@suse.de> Cc: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:55 -08:00
Linus Torvalds	b55813a2e5	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETFILTER] x_table.c: sem2mutex [IPV4]: Aggregate route entries with different TOS values [TCP]: Mark tcp_*mem[] __read_mostly. [TCP]: Set default max buffers from memory pool size [SCTP]: Fix up sctp_rcv return value [NET]: Take RTNL when unregistering notifier [WIRELESS]: Fix config dependencies. [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms. [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated. [MODULES]: Don't allow statically declared exports [BRIDGE]: Unaligned accesses in the ethernet bridge	2006-03-25 08:39:20 -08:00
Davide Libenzi	f348d70a32	[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notification by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Because changing the existing POLLHUP handling, which does not correctly report half-closed devices, was considered risky, this implementation leaves the current POLLHUP reporting unchanged and simply adds a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastructure, even the existing poll/select system calls will be able to use it. As for the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is the same, epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:56 -08:00
Ilia Sotnikov	cef2685e00	[IPV4]: Aggregate route entries with different TOS values When we get an ICMP need-to-frag message, the original TOS value in the ICMP payload cannot be used as a key to look up the routes to update. This is because the TOS field may have been modified by routers on the way. Similarly, ip_rt_redirect should also ignore the TOS as the router that gave us the message may have modified the TOS value. The patch achieves this objective by aggregating entries with different TOS values (that are otherwise identical) into the same bucket. This makes it easy to update them at the same time when an ICMP message is received. In future we should use a twin-hashing scheme where the aggregation occurs at the entry level. That is, the TOS goes back into the hash for normal lookups while ICMP lookups will end up with a node that gives us a list that contains all other route entries that differ only by TOS. Signed-off-by: Ilia Sotnikov <hostcc@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:38:55 -08:00
David S. Miller	b8059eadf9	[TCP]: Mark tcp_*mem[] __read_mostly. Suggested by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:36:56 -08:00
John Heffner	7b4f4b5ebc	[TCP]: Set default max buffers from memory pool size This patch sets the maximum TCP buffer sizes (available to automatic buffer tuning, not to setsockopt) based on the TCP memory pool size. The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more than 1/128 of the memory pressure threshold. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:34:07 -08:00
Alexey Dobriyan	53b3531bbb	[PATCH] s/;;/;/g Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-24 07:33:24 -08:00
Patrick McHardy	a5cdc03003	[IPV4]: Add fib rule netlink notifications To really make sense of route notifications in the presence of multiple tables, userspace also needs to be notified about routing rule updates. Notifications are sent to the so far unused RTNLGRP_NOP1 (now RTNLGRP_RULE) group. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-23 01:16:06 -08:00
Alexey Kuznetsov	1a55d57b10	[TCP]: Do not use inet->id of global tcp_socket when sending RST. The problem is in ip_push_pending_frames(), which uses: if (!df) { __ip_select_ident(iph, &rt->u.dst, 0); } else { iph->id = htons(inet->id++); } instead of ip_select_ident(). Right now I think the code is a nonsense. Most likely, I copied it from old ip_build_xmit(), where it was really special, we had to decide whether to generate unique ID when generating the first (well, the last) fragment. In ip_push_pending_frames() it does not make sense, it should use plain ip_select_ident() instead. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 14:27:59 -08:00
Patrick McHardy	6a534ee35c	[NETFILTER]: Fix undefined references to get_h225_addr get_h225_addr is exported, but declared static, which fails when linking statically. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:57:25 -08:00
Pablo Neira Ayuso	b9f78f9fca	[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand x_tables matches and targets that require nf_conntrack_ipv[4|6] to work don't have enough information to load these modules on demand. This patch introduces the following changes to solve this issue: o nf_ct_l3proto_try_module_get: try to load the layer 3 connection tracker module and increase the refcount. o nf_ct_l3proto_module_put: drop the refcount of the module. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:56:08 -08:00
Pablo Neira Ayuso	a45049c51c	[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches Set the family field in xt_[matches|targets] registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:55:40 -08:00
Pablo Neira Ayuso	4e3882f773	[NETFILTER]: conntrack: cleanup the conntrack ID initialization Currently the first conntrack ID assigned is 2, use 1 instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:55:11 -08:00
Pablo Neira Ayuso	1cde64365b	[NETFILTER]: ctnetlink: Fix expectaction mask dumping The expectation mask has some particularities that requires a different handling. The protocol number fields can be set to non-valid protocols, ie. l3num is set to 0xFFFF. Since that protocol does not exist, the mask tuple will not be dumped. Moreover, this results in a kernel panic when nf_conntrack accesses the array of protocol handlers, that is PF_MAX (0x1F) long. This patch introduces the function ctnetlink_exp_dump_mask, that correctly dumps the expectation mask. Such function uses the l3num value from the expectation tuple that is a valid layer 3 protocol number. The value of the l3num mask isn't dumped since it is meaningless from the userspace side. Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:54:15 -08:00
Jing Min Zhao	5e35941d99	[NETFILTER]: Add H.323 conntrack/NAT helper Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 23:41:17 -08:00
David S. Miller	dbeff12b4d	[INET]: Fix typo in Arnaldo's connection sock compat fixups. "struct inet_csk" --> "struct inet_connection_sock" :-) Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:52:32 -08:00
Arnaldo Carvalho de Melo	543d9cfeec	[NET]: Identation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:48:35 -08:00
Arnaldo Carvalho de Melo	dec73ff029	[ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:46:16 -08:00
Dmitry Mishin	3fdadf7d27	[NET]: {get|set}sockopt compatibility layer This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:45:21 -08:00
Catherine Zhang	2c7946a7bf	[SECURITY]: TCP/UDP getpeersec This patch implements an application of the LSM-IPSec networking controls whereby an application can determine the label of the security association its TCP or UDP sockets are currently connected to via getsockopt and the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of an IPSec security association a particular TCP or UDP socket is using. The application can then use this security context to determine the security context for processing on behalf of the peer at the other end of this connection. In the case of UDP, the security context is for each individual packet. An example application is the inetd daemon, which could be modified to start daemons running at security contexts dependent on the remote client. Patch design approach: - Design for TCP The patch enables the SELinux LSM to set the peer security context for a socket based on the security context of the IPSec security association. The application may retrieve this context using getsockopt. When called, the kernel determines if the socket is a connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry cache on the socket to retrieve the security associations. If a security association has a security context, the context string is returned, as for UNIX domain sockets. - Design for UDP Unlike TCP, UDP is connectionless. This requires a somewhat different API to retrieve the peer security context. With TCP, the peer security context stays the same throughout the connection, thus it can be retrieved at any time between when the connection is established and when it is torn down. With UDP, each read/write can have different peer and thus the security context might change every time. As a result the security context retrieval must be done TOGETHER with the packet retrieval. 
The solution is to build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). Patch implementation details: - Implementation for TCP The security context can be retrieved by applications using getsockopt with the existing SO_PEERSEC flag. As an example (ignoring error checking): getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen); printf("Socket peer context is: %s\n", optbuf); The SELinux function, selinux_socket_getpeersec, is extended to check for labeled security associations for connected (TCP_ESTABLISHED == sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of struct dst_entry values that may refer to security associations. If these have security associations with security contexts, the security context is returned. getsockopt returns a buffer that contains a security context string or the buffer is unmodified. - Implementation for UDP To retrieve the security context, the application first tells the kernel that it wants it by setting the IP_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for UDP should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_IP && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow a server socket to receive security context of the peer. A new ancillary message type, SCM_SECURITY, is also added.
When the packet is received we get the security context from the sec_path pointer which is contained in the sk_buff, and copy it to the ancillary message space. An additional LSM hook, selinux_socket_getpeersec_udp, is defined to retrieve the security context from the SELinux space. The existing function, selinux_socket_getpeersec does not suit our purpose, because the security context is copied directly to user space, rather than to kernel space. Testing: We have tested the patch by setting up TCP and UDP connections between applications on two machines using the IPSec policies that result in labeled security associations being built. For TCP, we can then extract the peer security context using getsockopt on either end. For UDP, the receiving end can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:41:23 -08:00
Rick Jones	15d99e02ba	[TCP]: sysctl to allow TCP window > 32767 sans wscale Back in the dark ages, we had to be conservative and only allow 15-bit window fields if the window scale option was not negotiated. Some ancient stacks used a signed 16-bit quantity for the window field of the TCP header and would get confused. Those days are long gone, so we can use the full 16-bits by default now. There is a sysctl added so that we can still interact with such old stacks Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:40:29 -08:00
Neil Horman	abd596a4b6	[IPV4] ARP: Alloc acceptance of unsolicited ARP via netdevice sysctl. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:39:47 -08:00
David S. Miller	edb2c34fb2	[NETFILTER]: Fix warnings in ip_nat_snmp_basic.c net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode': net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate': net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:36:21 -08:00
Ingo Molnar	57b47a53ec	[NET]: sem2mutex part 2 Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:35:41 -08:00
Arjan van de Ven	4a3e2f711a	[NET] sem2mutex: net/ Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:33:17 -08:00
Stephen Hemminger	1533306186	[NET]: dev_put/dev_hold cleanup Get rid of the old __dev_put macro that is just a hold over from pre 2.6 kernel. And turn dev_hold into an inline instead of a macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:32:28 -08:00
Stephen Hemminger	6756ae4b4e	[NET]: Convert RTNL to mutex. This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and gets rid of some of the leftover legacy. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:23:58 -08:00
Baruch Even	50bf3e224a	[TCP] H-TCP: Better time accounting Instead of estimating the time since the last congestion event, count it directly. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:23:10 -08:00
Baruch Even	0bc6d90b82	[TCP] H-TCP: Account for delayed-ACKs Account for delayed-ACKs in H-TCP. Delayed-ACKs cause H-TCP to be less aggressive than its design calls for. It is especially true when the receiver is a Linux machine where the average delayed ack is over 3 packets with values of 7 not unheard of. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:22:47 -08:00
Baruch Even	c33ad6e476	[TCP] H-TCP: Use msecs_to_jiffies Use functions to calculate jiffies from milliseconds and not the old, crude method of dividing HZ by a value. Ensures more accurate values even in the face of strange HZ values. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:22:20 -08:00
Arnaldo Carvalho de Melo	c4d9390941	[ICSK]: Introduce inet_csk_ctl_sock_create Consolidating open coded sequences in tcp and dccp, v4 and v6. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:01:03 -08:00
Robert Olsson	06ef921d60	[IPV4]: fib_trie stats fix fib_triestats has been buggy and caused oopses on some platforms such as OpenWrt. The patch below should cure those problems. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 21:35:01 -08:00
Robert Olsson	5ddf0eb2bf	[IPV4]: fib_trie initialzation fix In some kernel configs /proc functions seem to be accessed before the trie is initialized. The patch below checks for this. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 21:34:12 -08:00
John Heffner	0e7b13685f	[TCP] mtu probing: move tcp-specific data out of inet_connection_sock This moves some TCP-specific MTU probing state out of inet_connection_sock back to tcp_sock. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 21:32:58 -08:00
Patrick McHardy	a193a4abdd	[NETFILTER]: Fix skb->nf_bridge lifetime issues The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips the real hook by registering with high priority and returning NF_STOP if skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not set. The flag is only set during the simulated hook. Because skb->nf_bridge is only freed when the packet is destroyed, the packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but in the case of tunnel devices on top of the bridge also all further ones. Forwarded packets from a bridge encapsulated by a tunnel device and sent as locally outgoing packet will also still have the incorrect bridge information from the input path attached. We already have nf_reset calls on all RX/TX paths of tunnel devices, so simply reset the nf_bridge field there too. As an added bonus, the bridge information for locally delivered packets is now also freed when the packet is queued to a socket. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:23:05 -08:00
Jamal Hadi Salim	9500e8a81f	[IPSEC]: Sync series - fast path Fast path sequence updates that will generate ipsec async events Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:15:29 -08:00
Patrick McHardy	a242769248	[NETFILTER]: ctnetlink: avoid unneccessary event message generation Avoid unnecessary event message generation by checking for netlink listeners before building a message. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:03:59 -08:00
Patrick McHardy	c4b8851392	[NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:03:40 -08:00
Patrick McHardy	c498673474	[NETFILTER]: x_tables: add xt_{match,target} arguments to match/target functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:02:56 -08:00
Patrick McHardy	1c524830d0	[NETFILTER]: x_tables: pass registered match/target data to match/target functions This allows decisions to be made at runtime based on the revision (and, with a follow-up patch, the address family). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:02:15 -08:00
Patrick McHardy	aa83c1ab43	[NETFILTER]: Convert arp_tables targets to centralized error checking Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:01:28 -08:00
Patrick McHardy	1d5cd90976	[NETFILTER]: Convert ip_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:01:14 -08:00
Patrick McHardy	3cdc7c953e	[NETFILTER]: Change {ip,ip6,arp}_tables to use centralized error checking Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:00:36 -08:00
Holger Eitzenberger	f2ad52c9da	[NETFILTER]: Fix CID offset bug in PPTP NAT helper debug message The recent (kernel 2.6.15.1) fix for the PPTP NAT helper introduced a bug, though it only appears if DEBUGP is enabled. The calculation of the CID offset into a PPTP request struct is not correct, so the wrong CID is displayed if DEBUGP is enabled. This patch corrects the CID offset calculation and introduces a #define for it. Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:58:21 -08:00
Harald Welte	dc808fe28d	[NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into the target of 256 bytes. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:56:32 -08:00
John Heffner	5d424d5a67	[TCP]: MTU probing Implementation of packetization layer path mtu discovery for TCP, based on the internet-draft currently found at <http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:53:41 -08:00
Adrian Bunk	d15150f755	[IPV4] fib_rules.c: make struct fib_rules static again struct fib_rules became global for no good reason. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:46:56 -08:00
Robert Olsson	7b204afd45	[IPV4]: Use RCU locking in fib_rules. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:18:53 -08:00
Patrick McHardy	31fe4d3317	[NETFILTER]: arp_tables: fix NULL pointer dereference The check is wrong and lets NULL-ptrs slip through since !IS_ERR(NULL) is true. Coverity #190 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:40:43 -08:00
Patrick McHardy	baa829d892	[IPV4/6]: Fix UFO error propagation When ufo_append_data fails err is uninitialized, but returned back. Strangely gcc doesn't notice it. Coverity #901 and #902 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:39:40 -08:00
Patrick McHardy	4a1ff6e2bd	[TCP]: tcp_highspeed: fix AIMD table out-of-bounds access Coverity #547 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:39:39 -08:00
David S. Miller	ba244fe900	[TCP]: Fix tcp_tso_should_defer() when limit>=65536 That's >= a full sized TSO frame, so we should always return 0 in that case. Based upon a report and initial patch from Lachlan Andrew, final patch suggested by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-11 18:51:49 -08:00
Thomas Graf	850a9a4e3c	[NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption The size of the skb carrying the netlink message is not equivalent to the length of the actual netlink message due to padding. ip_queue matches the length of the payload against the original packet size to determine if packet mangling is desired. Due to the above wrong assumption, arbitrary packets may not be mangled, depending on their original size. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-07 14:56:12 -08:00
Patrick McHardy	bafac2a512	[NETFILTER]: Restore {ipt,ip6t,ebt}_LOG compatibility The nfnetlink_log infrastructure changes broke compatibility of the LOG targets. They currently use whatever log backend was registered first, which means that if ipt_ULOG was loaded first, no messages will be printed to the ring buffer anymore. Restore compatibility by using the old log functions by default and only use the nf_log backend if the user explicitly said so. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-27 13:04:17 -08:00
Herbert Xu	752c1f4c78	[IPSEC]: Kill post_input hook and do NAT-T in esp_input directly The only reason post_input exists at all is that it gives us the potential to adjust the checksums incrementally in future which we ought to do. However, after thinking about it for a bit we can adjust the checksums without using this post_input stuff at all. The crucial point is that only the inner-most NAT-T SA needs to be considered when adjusting checksums. What's more, the checksum adjustment comes down to a single u32 due to the linearity of IP checksums. We just happen to have a spare u32 lying around in our skb structure :) When ip_summed is set to CHECKSUM_NONE on input, the value of skb->csum is currently unused. All we have to do is to make that the checksum adjustment and voila, there goes all the post_input and decap structures! I've left in the decap data structures for now since it's intricately woven into the sec_path stuff. We can kill them later too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-27 13:00:40 -08:00
Herbert Xu	4bf05eceec	[IPSEC] esp: Kill unnecessary block and indentation We used to keep sg on the stack which is why the extra block was useful. We've long since stopped doing that so let's kill the block and save some indentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-27 13:00:01 -08:00
Herbert Xu	4da3089f2b	[IPSEC]: Use TOS when doing tunnel lookups We should use the TOS because it's one of the routing keys. It also means that we update the correct routing cache entry when PMTU occurs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-23 16:19:26 -08:00
Suresh Bhogavilli	8525987849	[IPV4]: Fix garbage collection of multipath route entries When garbage collecting route cache entries of multipath routes in rt_garbage_collect(), entries were deleted from the hash bucket 'i' while holding a spin lock on bucket 'k' resulting in a system hang. Delete entries, if any, from bucket 'k' instead. Signed-off-by: Suresh Bhogavilli <sbhogavilli@verisign.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-23 16:10:52 -08:00
Patrick McHardy	8e249f0881	[NETFILTER]: Fix outgoing redirects to loopback When redirecting an outgoing packet to loopback, it keeps the original conntrack reference and information from the outgoing path, which falsely triggers the check for DNAT on input and the dst_entry is released to trigger rerouting. ip_route_input refuses to route the packet because it has a local source address and it is dropped. Look at the packet itself to determine if it was NATed. Also fix a missing inversion that causes unnecessary xfrm lookups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-19 22:29:47 -08:00
Patrick McHardy	bc6e14b6f0	[NETFILTER]: Fix NAT PMTUD problems ICMP errors are only SNATed when their source matches the source of the connection they are related to, otherwise the source address is not changed. This creates problems with ICMP frag. required messages originating from a router behind the NAT: if private IPs are used, the packet has a good chance of getting dropped on the path to its destination. Always NAT ICMP errors similar to the original connection. Based on report by Al Viro. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-19 22:26:40 -08:00
Yasuyuki Kozakai	7d3cdc6b55	[NETFILTER]: nf_conntrack: move registration of __nf_ct_attach Move registration of __nf_ct_attach to nf_conntrack_core to make it usable for IPv6 connection tracking as well. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-15 15:22:21 -08:00
Patrick McHardy	48d5cad87c	[XFRM]: Fix SNAT-related crash in xfrm4_output_finish When a packet matching an IPsec policy is SNATed so it doesn't match any policy anymore it loses its xfrm bundle, which makes xfrm4_output_finish crash because of a NULL pointer dereference. This patch directs these packets to the original output path instead. Since the packets have already passed the POST_ROUTING hook, but need to start at the beginning of the original output path which includes another POST_ROUTING invocation, a flag is added to the IPCB to indicate that the packet was rerouted and doesn't need to pass the POST_ROUTING hook again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-15 15:10:22 -08:00
Patrick McHardy	ee68cea2c2	[NETFILTER]: Fix xfrm lookup after SNAT To find out if a packet needs to be handled by IPsec after SNAT, packets are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This breaks SNAT of non-unicast packets to non-local addresses because the packet is routed as incoming packet and no neighbour entry is bound to the dst_entry. In general, it seems to be a bad idea to replace the dst_entry after the packet was already sent to the output routine because its state might not match what's expected. This patch changes the xfrm lookup in POST_ROUTING to re-use the original dst_entry without routing the packet again. This means no policy routing can be used for transport mode transforms (which keep the original route) when packets are SNATed to match the policy, but it looks like the best we can do for now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-15 01:34:23 -08:00
Dave Jones	77decfc716	[IPV4] ICMP: Invert default for invalid icmp msgs sysctl isic can trigger these msgs to be spewed at a very high rate. There's already a sysctl to turn them off. Given these messages aren't useful for most people, this patch disables them by default. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-13 15:36:21 -08:00
John Heffner	6fcf9412de	[TCP]: rcvbuf lock when tcp_moderate_rcvbuf enabled The rcvbuf lock should probably be honored here. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-09 17:06:57 -08:00
Alexey Kuznetsov	28633514af	[NETLINK]: illegal use of pid in rtnetlink When a netlink message is not related to a netlink socket, it is issued by kernel socket with pid 0. Netlink "pid" has nothing to do with current->pid. I called it incorrectly, if it was named "port", the confusion would be avoided. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-09 16:43:41 -08:00
Al Viro	76edc6051e	[PATCH] ipv4 NULL noise removal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-02-07 20:57:37 -05:00
Al Viro	1b8623545b	[PATCH] remove bogus asm/bug.h includes. A bunch of asm/bug.h includes are both not needed (since it will get pulled anyway) and bogus (since they are done too early). Removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-02-07 20:56:35 -05:00
Linus Torvalds	98bd0c07b6	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2006-02-05 11:10:29 -08:00
Eric Dumazet	88a2a4ac6b	[PATCH] percpu data: only iterate over possible CPUs percpu_data blindly allocates bootmem memory to store NR_CPUS instances of cpudata, instead of allocating memory only for possible cpus. As a preparation for changing that, we need to convert various 0 -> NR_CPUS loops to use for_each_cpu(). (The above only applies to users of asm-generic/percpu.h. powerpc has gone it alone and is presently only allocating memory for present CPUs, so it's currently corrupting memory). Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <axboe@suse.de> Cc: Anton Blanchard <anton@samba.org> Acked-by: William Irwin <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-05 11:06:51 -08:00
Patrick McHardy	7918d212df	[NETFILTER]: Fix check whether dst_entry needs to be released after NAT After DNAT the original dst_entry needs to be released if present so the packet doesn't skip input routing with its new address. The current check for DNAT in ip_nat_in is reversed and checks for SNAT. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:29 -08:00
Patrick McHardy	0047c65a60	[NETFILTER]: Prepare {ipt,ip6t}_policy match for x_tables unification The IPv4 and IPv6 version of the policy match are identical besides address comparison and the data structure used for userspace communication. Unify the data structures to break compatibility now (before it is released), so we can port it to x_tables in 2.6.17. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:28 -08:00
Patrick McHardy	e55f1bc5dc	[NETFILTER]: Check policy length in policy match strict mode Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:26 -08:00
Kirill Korotaev	ee4bb818ae	[NETFILTER]: Fix possible overflow in netfilters do_replace() netfilter's do_replace() can overflow on addition within SMP_ALIGN() and/or on multiplication by NR_CPUS, resulting in a buffer overflow on the copy_from_user(). In practice, the overflow on addition is triggerable on all systems, whereas the multiplication one might require much physical memory to be present due to the check above. Either is sufficient to overwrite arbitrary amounts of kernel memory. I really hate adding the same check to all 4 versions of do_replace(), but the code is duplicate... Found by Solar Designer during security audit of OpenVZ.org Signed-Off-By: Kirill Korotaev <dev@openvz.org> Signed-Off-By: Solar Designer <solar@openwall.com> Signed-off-by: Patrck McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:25 -08:00
Patrick McHardy	6f16930078	[NETFILTER]: Fix missing src port initialization in tftp expectation mask Reported by David Ahern <dahern@avaya.com>, netfilter bugzilla #426. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:21 -08:00
Patrick McHardy	ad2ad0f965	[NETFILTER]: Fix undersized skb allocation in ipt_ULOG/ebt_ulog/nfnetlink_log The skb allocated is always of size nlbufsize, even if that is smaller than the size needed for the current packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:19 -08:00
Holger Eitzenberger	c2db292438	[NETFILTER]: ULOG/nfnetlink_log: Use better default value for 'nlbufsiz' Performance tests showed that ULOG may fail on heavily loaded systems because of failed order-N allocations (N >= 1). The default value of 4096 is not optimal in the sense that it actually allocates _two_ contiguous physical pages. Reasoning: ULOG uses alloc_skb(), which adds another ~300 bytes for skb_shared_info. This patch sets the default value to NLMSG_GOODSIZE and adds some documentation at the top. Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:18 -08:00
Pablo Neira Ayuso	34f9a2e4de	[NETFILTER]: ctnetlink: add MODULE_ALIAS for expectation subsystem Add load-on-demand support for expectation request. eg. conntrack -L expect Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:16 -08:00
Marcus Sundberg	b633ad5fbf	[NETFILTER]: ctnetlink: Fix subsystem used for expectation events The ctnetlink expectation events should use the NFNL_SUBSYS_CTNETLINK_EXP subsystem, not NFNL_SUBSYS_CTNETLINK. Signed-off-by: Marcus Sundberg <marcus@ingate.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:15 -08:00
Herbert Xu	fa60cf7f64	[ICMP]: Fix extra dst release when ip_options_echo fails When two ip_route_output_key lookups in icmp_send were combined I forgot to change the error path for ip_options_echo to not drop the dst reference since it now sits before the dst lookup. To fix it we simply jump past the ip_rt_put call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-04 23:51:14 -08:00
Horms	f00c401b9b	[IPV4]: Remove suprious use of goto out: in icmp_reply This seems to be an artifact of the following commit in February '02. e7e173af42dbf37b1d946f9ee00219cb3b2bea6a In a nutshell, goto out and return actually do the same thing, and both are called in this function. This patch removes out. Signed-Off-By: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-02 17:03:18 -08:00
Herbert Xu	f8addb3215	[IPV4] multipath_wrandom: Fix softirq-unsafe spin lock usage The spin locks in multipath_wrandom may be obtained from either process context or softirq context depending on whether the packet is locally or remotely generated. Therefore we need to disable BH processing when taking these locks. This bug was found by Ingo's lock validator. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-02 16:59:16 -08:00
Sam Ravnborg	f9d9516db7	[NET]: Do not export inet_bind_bucket_create twice. inet_bind_bucket_create was exported twice. Keep the export in the file where inet_bind_bucket_create is defined. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-31 17:47:02 -08:00
Patrick McHardy	5d39a795bf	[IPV4]: Always set fl.proto in ip_route_newports ip_route_newports uses the struct flowi from the struct rtable returned by ip_route_connect for the new route lookup and just replaces the port numbers if they have changed. If an IPsec policy exists which doesn't match port 0 the struct flowi won't have the proto field set and no xfrm lookup is done for the changed ports. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-31 17:35:35 -08:00
Linus Torvalds	dd1c1853e2	Fix ipv4/igmp.c compile with gcc-4 and IP_MULTICAST Modern versions of gcc do not like case statements at the end of a block statement: you need at least an empty statement. Using just a "break;" is preferred for visual style. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-31 13:11:41 -08:00
Baruch Even	2c74088e41	[TCP] H-TCP: Fix accounting This fixes the accounting in H-TCP, the ccount variable is also adjusted a few lines above this one. This line was not supposed to be there and wasn't there in the patches originally submitted, the four patches submitted were merged to one and in that merge the bug was introduced. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-30 20:54:39 -08:00
Dave Jones	c5d90e0004	[IPV4] igmp: remove pointless printk This is easily triggerable by sending bogus packets, allowing a malicious user to flood remote logs. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-30 20:27:17 -08:00
Alan Cox	715b49ef2d	[PATCH] EDAC: atomic scrub operations EDAC requires a way to scrub memory if an ECC error is found and the chipset does not do the work automatically. That means rewriting memory locations atomically with respect to all CPUs _and_ bus masters. That means we can't use atomic_add(foo, 0) as it gets optimised for non-SMP This adds a function to include/asm-foo/atomic.h for the platforms currently supported which implements a scrub of a mapped block. It also adjusts a few other files include order where atomic.h is included before types.h as this now causes an error as atomic_scrub uses u32. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
David L Stevens	ad12583f46	[IPV4]: Fix multiple bugs in IGMPv3 1) fix "mld_marksources()" to a) send nothing when all queried sources are excluded b) send full exclude report when source queried sources are not excluded c) don't schedule a timer when there's nothing to report 2) fix "add_grec()" to send empty-source records when it should The original check doesn't account for a non-empty source list with all sources inactive; the new code keeps that short-circuit case, and also generates the group header with an empty list if needed. 3) fix mca_crcount decrement to be after add_grec(), which needs its original value 4) add/remove delete records and prevent current advertisements when an exclude-mode filter moves from "active" to "inactive" or vice versa based on new filter additions. Items 1-3 are just IPv4 versions of the IPv6 bugs found by Yan Zheng and fixed earlier. Item #4 is a related bug that affects exclude-mode change records only (but not queries) and also occurs in IPv6 (IPv6 version coming soon). Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-18 14:20:56 -08:00
Andrew Morton	dbd2915ce8	[IPV4]: RT_CACHE_STAT_INC() warning fix BUG: using smp_processor_id() in preemptible [00000001] code: rpc.statd/2408 And it _is_ a bug, but I guess we don't care enough to add preempt_disable(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 22:46:49 -08:00
Eric Dumazet	2f970d8357	[IPV4]: rt_cache_stat can be statically defined Using __get_cpu_var(obj) is slightly faster than per_cpu_ptr(obj, raw_smp_processor_id()). 1) Smaller code and memory use For static and small objects, DEFINE_PER_CPU(type, object) is preferred over a alloc_percpu() : Better and smaller code to access them, and no extra memory (storing the pointer, and the percpu array of pointers) x86_64 code before patch mov 1237577(%rip),%rax # ffffffff803e5990 <rt_cache_stat> not %rax # part of per_cpu machinery mov %gs:0x3c,%edx # get cpu number movslq %edx,%rdx # extend 32 bits cpu number to 64 bits mov (%rax,%rdx,8),%rax # get the pointer for this cpu incl 0x38(%rax) x86_64 code after patch mov $per_cpu__rt_cache_stat,%rdx mov %gs:0x48,%rax # get percpu data offset incl 0x38(%rax,%rdx,1) 2) False sharing avoidance for SMP : For a small NR_CPUS, the array of per cpu pointers allocated in alloc_percpu() can be <= 32 bytes. This let slab code gives a part of a cache line. If the other part of this 64 bytes (or 128 bytes) cache line is used by a mostly written object, we can have false sharing and expensive per_cpu_ptr() operations. Size of rt_cache_stat is 64 bytes, so this patch is not a danger of a too big increase of bss (in UP mode) or static per_cpu data for SMP (PERCPU_ENOUGH_ROOM is currently 32768 bytes) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 02:54:36 -08:00
David S. Miller	f09484ff87	[NETFILTER]: ip_conntrack_proto_gre.c needs linux/interrupt.h Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 02:42:02 -08:00
Yasuyuki Kozakai	6dd42af790	[NETFILTER] Makefile cleanup These are replaced with x_tables matches and no longer exist. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 02:38:56 -08:00
Benoit Boissinot	ccc91324a1	[NETFILTER] ip[6]t_policy: Fix compilation warnings ip[6]t_policy argument conversion slipped when merging with x_tables Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 02:26:34 -08:00
Linus Torvalds	caf5b04c82	x86: Work around compiler code generation bug with -Os Some versions of gcc generate incorrect code for the inet_check_attr() function, apparently due to a totally bogus index -> pointer comparison transformation. At least "gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)" from FC4 is affected, possibly others too. This changes the function subtly so that the buggy gcc transformation doesn't trigger. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-14 22:08:28 -08:00
Patrick McHardy	ee51b1b6ce	[XFRM]: IPsec tunnel wildcard address support When the source address of a tunnel is given as 0.0.0.0 do a routing lookup to get the real source address for the destination and fill that into the acquire message. This allows to specify policies like this: spdadd 172.16.128.13/32 172.16.0.0/20 any -P out ipsec esp/tunnel/0.0.0.0-x.x.x.x/require; spdadd 172.16.0.0/20 172.16.128.13/32 any -P in ipsec esp/tunnel/x.x.x.x-0.0.0.0/require; Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-13 14:34:36 -08:00
Harald Welte	2e4e6a17af	[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables This monster-patch tries to do the best job for unifying the data structures and backend interfaces for the three evil clones ip_tables, ip6_tables and arp_tables. In an ideal world we would never have allowed this kind of copy+paste programming... but well, our world isn't (yet?) ideal. o introduce a new x_tables module o {ip,arp,ip6}_tables depend on this x_tables module o registration functions for tables, matches and targets are only wrappers around x_tables provided functions o all matches/targets that are used from ip_tables and ip6_tables are now implemented as xt_FOOBAR.c files and provide module aliases to ipt_FOOBAR and ip6t_FOOBAR o header files for xt_matches are in include/linux/netfilter/, include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers around the xt_FOOBAR.h headers Based on this patchset we're going to further unify the code, gradually getting rid of all the layer 3 specific assumptions. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-12 14:06:43 -08:00
Randy Dunlap	4fc268d24c	[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-11 18:42:14 -08:00
Kris Katterjohn	8b3a70058b	[NET]: Remove more unneeded typecasts on *malloc() This removes more unneeded casts on the return value for kmalloc(), sock_kmalloc(), and vmalloc(). Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-11 16:32:14 -08:00
David S. Miller	a776809755	[NETFILTER]: ip_ct_proto_gre_fini() cannot be __exit It is invoked from failures paths of __init code. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-11 16:32:12 -08:00
Nicolas Kaiser	b8ab50bc55	netfilter: headers included twice Headers included twice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-01-11 02:04:35 +01:00
Patrick McHardy	babbdb1a18	[NETFILTER]: Fix timeout sysctls on big-endian 64bit architectures The connection tracking timeout variables are unsigned long, but proc_dointvec_jiffies is used with sizeof(unsigned int) in the sysctl tables. Since there is no proc_doulongvec_jiffies function, change the timeout variables to unsigned int. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:35 -08:00
Patrick McHardy	9d28026b7e	[NETFILTER]: Remove unused function from NAT protocol helpers ->print and ->print_range are not used (and apparently never were). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:34 -08:00
Patrick McHardy	c07bc1ffbd	[NETFILTER]: Fix return value confusion in PPTP NAT helper ip_nat_mangle_tcp_packet doesn't return NF_* values but 0/1 for failure/success. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:33 -08:00
Patrick McHardy	03b9feca89	[NETFILTER]: Fix another crash in ip_nat_pptp The PPTP NAT helper calculates the offset at which the packet needs to be mangled as difference between two pointers to the header. With non-linear skbs however the pointers may point to two separate buffers on the stack and the calculation results in a wrong offset being used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:32 -08:00
Patrick McHardy	15db34702c	[NETFILTER]: Fix crash in ip_nat_pptp When an inbound PPTP_IN_CALL_REQUEST packet is received the PPTP NAT helper uses a NULL pointer in pointer arithmetic to calculate the offset in the packet which needs to be mangled and corrupts random memory or crashes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:30 -08:00
Patrick McHardy	bb94aa169e	[NETFILTER]: net/ipv[46]/netfilter.c cleanups Don't wrap entire file in #ifdef CONFIG_NETFILTER, remove a few unnecessary includes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:29 -08:00
Kris Katterjohn	d3f4a687f6	[NET]: Change memcmp(,,ETH_ALEN) to compare_ether_addr() This changes some memcmp(one,two,ETH_ALEN) to compare_ether_addr(one,two). Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 12:54:28 -08:00
Linus Torvalds	a457aa6c2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial	2006-01-09 17:06:53 -08:00
Adrian Bunk	93b1fae491	spelling: s/trough/through/ Additionally, one comment was reformulated by Joe Perches <joe@perches.com>. Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-01-10 00:13:33 +01:00
Arnaldo Carvalho de Melo	dff2c03534	[INET_DIAG]: Introduce sk_diag_fill To be called from inet_diag_get_exact, also rename inet_diag_fill to inet_csk_diag_fill, for consistency with inet_twsk_diag_fill. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:56:56 -08:00
Arnaldo Carvalho de Melo	c7d58aabdc	[INET_DIAG]: Introduce inet_twsk_diag_dump & inet_twsk_diag_fill To properly dump TIME_WAIT sockets and to reduce complexity a bit by having per socket class accessor routines. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:56:38 -08:00
Arnaldo Carvalho de Melo	4e852c0279	[INET_DIAG]: whitespace/simple cleanups Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:56:19 -08:00
Arnaldo Carvalho de Melo	7dbf075524	[INET_DIAG]: Use inet_twsk() with TIME_WAIT sockets The fields being accessed in inet_diag_dump are outside sock_common, the common part of struct sock and struct inet_timewait_sock. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:56:03 -08:00
Patrick McHardy	cfacb0577e	[IPV4]: ip_output.c needs xfrm.h This patch fixes a warning from my IPsec patches: CC net/ipv4/ip_output.o net/ipv4/ip_output.c: In function 'ip_finish_output': net/ipv4/ip_output.c:208: warning: implicit declaration of function 'xfrm4_output_finish' Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:16:28 -08:00
Kris Katterjohn	09a626600b	[NET]: Change some "if (x) BUG();" to "BUG_ON(x);" This changes some simple "if (x) BUG();" statements to "BUG_ON(x);" Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:16:18 -08:00
Patrick McHardy	2941a48631	[NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-09 14:16:03 -08:00
Adrian Bunk	97dc627fb3	[IPV4]: make ip_fragment() static Since there's no longer any external user of ip_fragment() we can make it static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 13:23:39 -08:00
Joe Kappus	da7bc6ee8e	[NETFILTER]: ip_conntrack_proto_sctp.c needs linux/interrupt.h Signed-off-by: Joe Kappus <joecool1029@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:41 -08:00
Patrick McHardy	e16a8f0b8c	[NETFILTER]: Add ipt_policy/ip6t_policy matches Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:38 -08:00
Patrick McHardy	eb9c7ebe69	[NETFILTER]: Handle NAT in IPsec policy checks Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi of the original packet from the conntrack information for IPsec policy checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:37 -08:00
Patrick McHardy	b59c270104	[NETFILTER]: Keep conntrack reference until IPsec policy checks are done Keep the conntrack reference until policy checks have been performed for IPsec NAT support. The reference needs to be dropped before a packet is queued to avoid having the conntrack module unloadable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:36 -08:00
Patrick McHardy	5c901daaea	[NETFILTER]: Redo policy lookups after NAT when neccessary When NAT changes the key used for the xfrm lookup it needs to be done again. If a new policy is returned in POST_ROUTING the packet needs to be passed to xfrm4_output_one manually after all hooks were called because POST_ROUTING is called with fixed okfn (ip_finish_output). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:35 -08:00
Patrick McHardy	4e8e9de7c2	[NETFILTER]: Use conntrack information to determine if packet was NATed Preparation for IPsec support for NAT: Use conntrack information instead of saving and comparing the addresses to determine if a packet was NATed and needs to be rerouted to make it easier to extend the key. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:34 -08:00
Patrick McHardy	3e3850e989	[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder ip_route_me_harder doesn't use the port numbers of the xfrm lookup and uses ip_route_input for non-local addresses which doesn't do a xfrm lookup, ip6_route_me_harder doesn't do a xfrm lookup at all. Use xfrm_decode_session and do the lookup manually, make sure both only do the lookup if the packet hasn't been transformed already. Making sure the lookup only happens once needs a new field in the IP6CB, which exceeds the size of skb->cb. The size of skb->cb is increased to 48b. Apparently the IPv6 mobile extensions need some more room anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:33 -08:00
Patrick McHardy	8cdfab8a43	[IPV4]: reset IPCB flags when neccessary Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit function before the packet reenters IP. This is necessary so the encapsulated packets are checked not to be oversized in xfrm4_output.c again. Reset all flags in sit when a packet changes its address family. Also remove some obsolete IPSKB flags. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:32 -08:00
Patrick McHardy	b05e106698	[IPV4/6]: Netfilter IPsec input hooks When the innermost transform uses transport mode the decapsulated packet is not visible to netfilter. Pass the packet through the PRE_ROUTING and LOCAL_IN hooks again before handing it to upper layer protocols to make netfilter-visibility symmetrical to the output path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:31 -08:00
Patrick McHardy	16a6677fdf	[XFRM]: Netfilter IPsec output hooks Call netfilter hooks before IPsec transforms. Packets visit the FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode transform. Patch from Herbert Xu <herbert@gondor.apana.org.au>: Move the loop from dst_output into xfrm4_output/xfrm6_output since they're the only ones who need it. xfrm{4,6}_output_one() processes the first SA and all subsequent transport mode SAs and is called in a loop that calls the netfilter hooks between each two calls. In order to avoid the tail call issue, I've added the inline function nf_hook which is nf_hook_slow plus the empty list check. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:28 -08:00
Linus Torvalds	d8d8f6a4fd	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2006-01-06 15:24:28 -08:00
Alexey Dobriyan	76ab608d86	[NET]: Endian-annotate struct iphdr And fix trivial warnings that emerged. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-06 13:24:29 -08:00
Joe	3cbc4ab58f	[NETFILTER]: ipt_helper.c needs linux/interrupt.h From: Joe <joecool1029@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-06 13:15:11 -08:00
Sam Ravnborg	367cb70421	kbuild: un-stringnify KBUILD_MODNAME Now when kbuild passes KBUILD_MODNAME with "" do not __stringify it when used. Remove __stringify for all users. This also fixes the output of: $ ls -l /sys/module/ drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia_core drwxr-xr-x 3 root root 0 2006-01-05 14:24 "processor" drwxr-xr-x 3 root root 0 2006-01-05 14:24 "psmouse" The quoting of the module names will be gone again. Thanks to GregKH + Kay Sievers for reporting this. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2006-01-06 21:17:50 +01:00
Kris Katterjohn	46f25dffba	[NET]: Change 1500 to ETH_DATA_LEN in some files These patches add the header linux/if_ether.h and change 1500 to ETH_DATA_LEN in some files. Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 16:48:56 -08:00
Andrew Morton	e924283bf9	[IPVS]: Another file needs linux/interrupt.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 16:48:55 -08:00
Yasuyuki Kozakai	e8eaedf2f8	[NETFILTER]: Use HOPLIMIT metric as TTL of TCP reset sent by REJECT The HOPLIMIT metric is more appropriate for a TCP reset sent by the REJECT target than a hard-coded max TTL. Thanks to David S. Miller for the hint. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:28:57 -08:00
Patrick McHardy	0ae2cfe7f3	[NETFILTER]: nf_conntrack_l3proto_ipv4.c needs net/route.h CC [M] net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c: In function 'ipv4_refrag': net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:198: error: dereferencing pointer to incomplete type make[3]: *** [net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o] Error 1 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:21:52 -08:00
Patrick McHardy	1bd9bef6f9	[NETFILTER]: Call POST_ROUTING hook before fragmentation Call POST_ROUTING hook before fragmentation to get rid of the okfn use in ip_refrag and save the useless fragmentation/defragmentation step when NAT is used. The patch introduces one user-visible change, the POSTROUTING chain in the mangle table gets entire packets, not fragments, which should simplify use of the MARK and CLASSIFY targets for queueing as a nice side-effect. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:59 -08:00
Patrick McHardy	abbcc73982	[NETFILTER]: Remove okfn usage in ip_vs_core.c okfn should only be used from different contexts to avoid deep call chains, i.e. by nf_queue. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:40 -08:00
Patrick McHardy	a9b305c4e5	[NETFILTER]: ctnetlink: Fix dumping of helper name Properly dump the helper name instead of internal kernel data. Based on patch by Marcus Sundberg <marcus@ingate.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:02 -08:00
Patrick McHardy	e7be6994ec	[NETFILTER]: Fix module_param types and permissions Fix netfilter module_param types and permissions. Also fix an off-by-one in the ipt_ULOG nlbufsiz < 128k check. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:19:46 -08:00
Pablo Neira Ayuso	c1d10adb4a	[NETFILTER]: Add ctnetlink port for nf_conntrack Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:19:05 -08:00
Pablo Neira Ayuso	205d67c7d9	[NETFILTER]: ctnetlink: remove unused variable Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:44 -08:00
Pablo Neira Ayuso	d4d6bb41e0	[NETFILTER]: ctnetlink: fix conntrack mark race Set conntrack mark before it is in hashes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:25 -08:00
Pablo Neira Ayuso	0368309cb4	[NETFILTER]: ctnetlink: ctnetlink_event cleanup Cleanup: Use 'else if' instead of a ugly 'goto' statement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:08 -08:00
Pablo Neira Ayuso	47116eb201	[NETFILTER]: ctnetlink: use u_int32_t instead of unsigned int Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:50 -08:00
Pablo Neira Ayuso	984955b3d7	[NETFILTER]: ctnetlink: propagate ctnetlink_dump_tuples_proto return value back Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:29 -08:00
Yasuyuki Kozakai	90c4656eb4	[NETFILTER]: ctnetlink: Add sanity checkings for ICMP Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:03 -08:00
Pablo Neira Ayuso	684f7b296c	[NETFILTER]: ctnetlink: remove bogus checks in ICMP protocol at dumping Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:16:41 -08:00
Adrian Bunk	4ffd2e4907	[IPVS]: Fix compilation Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:14:43 -08:00
Thomas Young	74cb879822	[TCP] tcp_vegas: Fix slow start Vegas' slow start was only adding one MSS per RTT rather than one for every ack. Slow start behavior should now match Reno. Signed-off-by: Thomas Young <tyo@ee.mu.oz.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:59:32 -08:00
Arnaldo Carvalho de Melo	f190055ff5	[IPVS]: Add missing include <linux/net.h> CC [M] net/ipv4/ipvs/ip_vs_conn.o /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In function 'ip_vs_conn_new': /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:606: warning: implicit declaration of function 'net_ratelimit' /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In function 'ip_vs_random_dropentry': /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:810: warning: implicit declaration of function 'net_random' Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 02:02:20 -02:00
Arnaldo Carvalho de Melo	80e40daa47	[TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected CC net/ipv4/tcp_ipv4.o /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/tcp_ipv4.c:665: warning: 'syn_flood_warning' defined but not used Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 01:58:06 -02:00
Stephen Hemminger	40efc6fa17	[TCP]: less inline's TCP inline usage cleanup: * get rid of inline in several places * replace __inline__ with inline where possible * move functions used in one file out of tcp.h * let compiler decide on used once cases On x86_64: text data bss dec hex filename 3594701 648348 567400 4810449 4966d1 vmlinux.orig 3593133 648580 567400 4809113 496199 vmlinux On sparc64: text data bss dec hex filename 2538278 406152 530392 3474822 350586 vmlinux.ORIG 2536382 406384 530392 3473158 34ff06 vmlinux Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 16:03:49 -08:00
Stephen Hemminger	cd8787ab04	[IPV4] fib_trie: build fix Need this to fix build of fib_trie in net-2.6.16 (rebased) tree. The code needs the new inet_make_mask inline. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:38:34 -08:00
Roberto Nibali	4b5bdf5cc3	[IPVS]: Cleanup IP_VS_DBG statements. From: Roberto Nibali <ratz@drugphish.ch> The attached patch (against current -GIT) is a cleanup patch which does following: o lookup debug messages shifted back to 9 o added more informational value to flags and refcnt since those entries can be in multiple referenced structures o cleanup 80 char violation It's the prepatch to the session pool implementation and helps very much to debug and monitor important variables and structures regarding the threshold limitation and persistency without the thousands of lookup messages which noone is interested in. Signed-off-by: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:22:59 -08:00
Christoph Hellwig	b5e5fa5e09	[NET]: Add a dev_ioctl() fallback to sock_ioctl() Currently all network protocols need to call dev_ioctl as the default fallback in their ioctl implementations. This patch adds a fallback to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD. This way all the procotol ioctl handlers can be simplified and we don't need to export dev_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:18:33 -08:00
Arnaldo Carvalho de Melo	14c850212e	[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:21 -08:00
Eric Dumazet	90ddc4f047	[NET]: move struct proto_ops to const I noticed that some of 'struct proto_ops' used in the kernel may share a cache line used by locks or other heavily modified data. (default linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at least) This patch makes sure a 'struct proto_ops' can be declared as const, so that all cpus can share all parts of it without false sharing. This is not mandatory : a driver can still use a read/write structure if it needs to (and eventually a __read_mostly) I made a global stubstitute to change all existing occurences to make them const. This should reduce the possibility of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:15 -08:00
Robert Olsson	fd9662555c	[IPV4] fib_trie: Add credits. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:10 -08:00
Stephen Hemminger	9eb2d62719	[TCP] cubic: use Newton-Raphson Replace cube root algorithim with a faster version using Newton-Raphson. Surprisingly, doing the scaled div64_64 is faster than a true 64 bit division on 64 bit CPU's. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:09 -08:00
Stephen Hemminger	89b3d9aaf4	[TCP] cubic: precompute constants Revised version of patch to pre-compute values for TCP cubic. * d32,d64 replaced with descriptive names * cube_factor replaces srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale) * beta_scale replaces 8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta); Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:08 -08:00
Arnaldo Carvalho de Melo	d83d8461f9	[IP_SOCKGLUE]: Remove most of the tcp specific calls As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo	a7f5e7f164	[INET]: Generalise tcp_v4_hash_connect Renaming it to inet_hash_connect, making it possible to ditch dccp_v4_hash_connect and share the same code with TCP instead. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:55 -08:00
Arnaldo Carvalho de Melo	6d6ee43e0b	[TWSK]: Introduce struct timewait_sock_ops So that we can share several timewait sockets related functions and make the timewait mini sockets infrastructure closer to the request mini sockets one. Next changesets will take advantage of this, moving more code out of TCP and DCCP v4 and v6 to common infrastructure. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:54 -08:00
Arnaldo Carvalho de Melo	0fa1a53e1f	[IPV6]: Introduce inet6_timewait_sock Out of tcp6_timewait_sock, that now is just an aggregation of inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct inet_timewait_sock, that is common to the IPv6 transport protocols that use timewait sockets, like DCCP and TCP. tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic code to find the IPv6 area in a timewait sock. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:47 -08:00
Roberto Nibali	f1f71e03b1	[IPVS]: remove dead code This patch removes dead code. I don't see the reason to keep this cruft around, besides cluttering the nice and functionally working code. Signed-off-by: Roberto Nibali <ratz@drugphish.ch> Signed-off-by: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:43 -08:00
Stephen Hemminger	65a45441d7	[UDP]: udp_checksum_init return value Since udp_checksum_init always returns 0 there is no point in having it return a value. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:42 -08:00
Herbert Xu	3305b80c21	[IP]: Simplify and consolidate MSG_PEEK error handling When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled it is left on the socket receive queue. This means that when we detect a checksum error we have to be careful when trying to free the packet as someone could have dequeued it in the time being. Currently this delicate logic is duplicated three times between UDPv4, UDPv6 and RAWv6. This patch moves them into a one place and simplifies the code somewhat. This is based on a suggestion by Eric Dumazet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:41 -08:00
Arnaldo Carvalho de Melo	af05dc9394	[ICSK]: Move v4_addr2sockaddr from TCP to icsk Renaming it to inet_csk_addr2sockaddr. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:39 -08:00
Arnaldo Carvalho de Melo	8292a17a39	[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops And move it to struct inet_connection_sock. DCCP will use it in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:38 -08:00
Arnaldo Carvalho de Melo	ca304b6104	[IPV6]: Introduce inet6_rsk() And inet6_rsk_offset in inet_request_sock, for the same reasons as inet_sock's pinfo6 member. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:37 -08:00
Arnaldo Carvalho de Melo	c2977c2213	[ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:34 -08:00
Arnaldo Carvalho de Melo	971af18bbf	[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:33 -08:00
Herbert Xu	89cee8b1cb	[IPV4]: Safer reassembly Another spin of Herbert Xu's "safer ip reassembly" patch for 2.6.16. (The original patch is here: http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2 and my only contribution is to have tested it.) This patch (optionally) does additional checks before accepting IP fragments, which can greatly reduce the possibility of reassembling fragments which originated from different IP datagrams. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:31 -08:00
Eric Dumazet	3183606469	[NETFILTER] ip_tables: NUMA-aware allocation Part of a performance problem with ip_tables is that memory allocation is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch separate cache lines) Even with small iptables rules, the cost of this misplacement can be high on common workloads. Instead of using one vmalloc() area (located in the node of the iptables process), we now allocate an area for each possible CPU, using vmalloc_node() so that memory should be allocated in the CPU's node if possible. Port to arp_tables and ip6_tables by Harald Welte. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:29 -08:00
Stephen Hemminger	df3271f336	[TCP] BIC: CUBIC window growth (2.0) Replace existing BIC version 1.1 with new version 2.0. The main change is to replace the window growth function with a cubic function as described in: http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:28 -08:00
Stephen Hemminger	05d054503a	[TCP] BIC: spelling and whitespace Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:27 -08:00
Stephen Hemminger	018da8f44c	[TCP] BIC: remove low utilization code. The latest BICTCP patch at: http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm disables the low_utilization feature of BICTCP because it doesn't work in some cases. This patch removes it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:26 -08:00
Patrick McHardy	9e999993c7	[XFRM]: Handle DCCP in xfrm{4,6}_decode_session Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:03:46 -08:00
Patrick McHardy	0476f171af	[NETFILTER]: Fix NAT init order As noticed by Phil Oester, the GRE NAT protocol helper is initialized before the NAT core, which makes registration fail. Change the linking order to make NAT be initialized first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 13:53:09 -08:00
Herbert Xu	1542272a60	[GRE]: Fix hardware checksum modification The skb_postpull_rcsum introduced a bug to the checksum modification. Although the length pulled is offset bytes, the origin of the pulling is the GRE header, not the IP header. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-14 12:55:24 -08:00
Marcus Sundberg	2f9616d4c4	[NETFILTER]: ip_nat_tftp: Fix expectation NAT When a TFTP client is SNATed so that the port is also changed, the port is never changed back for the expected connection. Signed-off-by: Marcus Sundberg <marcus@ingate.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-12 15:02:48 -08:00
David S. Miller	dfb4b9dceb	[TCP] Vegas: timestamp before clone We have to store the congestion control timestamp on the SKB before we clone it, not after. Else we get no timestamping information at all. tcp_transmit_skb() has been reworked so that we can do the timestamp still in one spot, instead of at all the call sites. Problem discovered, and initial fix, from Tom Young <tyo@ee.unimelb.edu.au>. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-06 16:24:52 -08:00
Thomas Young	0d7bef600a	[TCP] Vegas: Remove extra call to tcp_vegas_rtt_calc Remove unneeded call to tcp_vegas_rtt_calc. The more accurate microsecond value has already been registered prior to calling tcp_vegas_cong_avoid. Signed-off-by: Thomas Young <tyo@ee.mu.oz.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-06 16:17:11 -08:00
Thomas Young	5b49561381	[TCP] Vegas: stop resetting rtt every ack Move the resetting of rtt measurements to inside the once per RTT block of code. Signed-off-by: Thomas Young <tyo@ee.mu.oz.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-06 16:16:34 -08:00
Patrick McHardy	2fdf1faa8e	[NETFILTER]: Don't use conntrack entry after dropping the reference Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:38:16 -08:00
Patrick McHardy	266c854348	[NETFILTER]: Fix unbalanced read_unlock_bh in ctnetlink NFA_NEST calls NFA_PUT which jumps to nfattr_failure if the skb has no room left. We call read_unlock_bh at nfattr_failure for the NFA_PUT inside the locked section, so move NFA_NEST inside the locked section too. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:37:33 -08:00
Patrick McHardy	a795756333	[NETFILTER]: Mark ctnetlink as EXPERIMENTAL Should have been marked EXPERIMENTAL from the beginning, as the current bunch of fixes show. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:36:25 -08:00
Patrick McHardy	0be7fa92ca	[NETFILTER]: Fix CTA_PROTO_NUM attribute size in ctnetlink CTA_PROTO_NUM is a u_int8_t. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:34:51 -08:00
Patrick McHardy	afe5c6bb03	[NETFILTER]: Fix ip_conntrack_flush abuse in ctnetlink ip_conntrack_flush() used to be part of ip_conntrack_cleanup(), which needs to drop _all_ references on module unload. Table flushed using ctnetlink just needs to clean the table and doesn't need to flush the event cache or wait for any references attached to skbs. Move everything but pure table flushing back to ip_conntrack_cleanup(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:33:50 -08:00
Pablo Neira Ayuso	8d1ca69984	[NETFILTER]: Fix incorrect argument to ip_nat_initialized() in ctnetlink ip_nat_initialized() takes enum ip_nat_manip_type as it's second argument, not a hook number. Noticed and initial patch by Marcus Sundberg <marcus@ingate.com>. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:32:14 -08:00
Herbert Xu	86c8f9d158	[IPV4] Fix EPROTONOSUPPORT error in inet_create There is a coding error in inet_create that causes it to always return ESOCKTNOSUPPORT. It should return EPROTONOSUPPORT when there are protocols registered for a given socket type but none of them match the requested protocol. This is based on a patch by Jayachandran C. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-02 20:43:26 -08:00
David Stevens	24c6927505	[IGMP]: workaround for IGMP v1/v2 bug From: David Stevens <dlstevens@us.ibm.com> As explained at: http://www.cs.ucsb.edu/~krishna/igmp_dos/ With IGMP version 1 and 2 it is possible to inject a unicast report to a client which will make it ignore multicast reports sent later by the router. The fix is to only accept the report if is was sent to a multicast or unicast address. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-02 20:32:59 -08:00
Thomas Graf	ea86575eaf	[NETLINK]: Fix processing of fib_lookup netlink messages The receive path for fib_lookup netlink messages is lacking sanity checks for header and payload and is thus vulnerable to malformed netlink messages causing illegal memory references. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:30:00 -08:00
Phil Oester	2a43c4af3f	[NETFILTER]: Fix recent match jiffies wrap mismatches Around jiffies wrap time (i.e. within first 5 mins after boot), recent match rules which contain both --seconds and --hitcount arguments experience false matches. This is because the last_pkts array is filled with zeros on creation, and when comparing 'now' to 0 (+ --seconds argument), time_before_eq thinks it has found a hit. Below patch adds a break if the packet value is zero. This has the unfortunate side effect of causing mismatches if a packet was received when jiffies really was equal to zero. The odds of that happening are slim compared to the problems caused by not adding the break however. Plus, the author used this same method just below, so it is "good enough". This fixes netfilter bugs #383 and #395. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:29:24 -08:00
Jozsef Kadlecsik	73f306024c	[NETFILTER]: Ignore ACKs ACKs on half open connections in TCP conntrack Mounting NFS file systems after a (warm) reboot could take a long time if firewalling and connection tracking was enabled. The reason is that the NFS clients tends to use the same ports (800 and counting down). Now on reboot, the server would still have a TCB for an existing TCP connection client:800 -> server:2049. The client sends a SYN from port 800 to server:2049, which elicits an ACK from the server. The firewall on the client drops the ACK because (from its point of view) the connection is still in half-open state, and it expects to see a SYNACK. The client will eventually time out after several minutes. The following patch corrects this, by accepting ACKs on half open connections as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:28:58 -08:00
Adrian Bunk	d127e94a5c	[NETFILTER] ipv4: small cleanups This patch contains the following cleanups: - make needlessly global code static - ip_conntrack_core.c: ip_conntrack_flush() -> ip_conntrack_flush(void) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:28:18 -08:00
Adrian Bunk	4b30b1c6a3	[IPV4]: make two functions static This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:27:20 -08:00
Arjan van de Ven	9b5b5cff9a	[NET]: Add const markers to various variables. the patch below marks various variables const in net/; the goal is to move them to the .rodata section so that they can't false-share cachelines with things that get written to, as well as potentially helping gcc a bit with optimisations. (these were found using a gcc patch to warn about such variables) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:21:38 -08:00
Mike Stroyan	18955cfcb2	[IPV4] tcp/route: Another look at hash table sizes The tcp_ehash hash table gets too big on systems with really big memory. It is worse on systems with pages larger than 4KB. It wastes memory that could be better used. It also makes the netstat command slow because reading /proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table. The default value should not be larger for larger page sizes. It seems that the effect of page size is an unintended error dating back a long time. I also wonder if the default value really should be a larger fraction of memory for systems with more memory. While systems with really big ram can afford more space for hash tables, it is not clear to me that they benefit from increasing the allocation ratio for this table. The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and mm/page_alloc.c:alloc_large_system_hash. tcp_init calls alloc_large_system_hash passing parameters- bucketsize=sizeof(struct tcp_ehash_bucket) numentries=thash_entries scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT) limit=0 On i386, PAGE_SHIFT is 12 for a page size of 4K On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K The num_physpages test above makes the allocation take a larger fraction of the total memory on systems with larger memory. The threshold size for a i386 system is 512MB. For an ia64 system with 16KB pages the threshold is 2GB. For smaller memory systems- On i386, scale = (27 - 12) = 15 On ia64, scale = (27 - 14) = 13 For larger memory systems- On i386, scale = (25 - 12) = 13 On ia64, scale = (25 - 14) = 11 For the rest of this discussion, I'll just track the larger memory case. The default behavior has numentries=thash_entries=0, so the allocated size is determined by either scale or by the default limit of 1/16 of total memory. In alloc_large_system_hash- | numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages; | numentries += (1UL << (20 - PAGE_SHIFT)) - 1; | numentries >>= 20 - PAGE_SHIFT; | numentries <<= 20 - PAGE_SHIFT; At this point, numentries is pages for all of memory, rounded up to the nearest megabyte boundary. | /* limit to 1 bucket per 2^scale bytes of low memory */ | if (scale > PAGE_SHIFT) | numentries >>= (scale - PAGE_SHIFT); | else | numentries <<= (PAGE_SHIFT - scale); On i386, numentries >>= (13 - 12), so numentries is 1/8192 of bytes of total memory. On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of bytes of total memory. | log2qty = long_log2(numentries); | | do { | size = bucketsize << log2qty; bucketsize is 16, so size is 16 times numentries, rounded down to a power of two. On i386, size is 1/512 of bytes of total memory. On ia64, size is 1/128 of bytes of total memory. For smaller systems the results are On i386, size is 1/2048 of bytes of total memory. On ia64, size is 1/512 of bytes of total memory. The large page effect can be removed by just replacing the use of PAGE_SHIFT with a constant of 12 in the calls to alloc_large_system_hash. That makes them more like the other uses of that function from fs/inode.c and fs/dcache.c Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:12:55 -08:00

758 Commits