Commit Graph

231 Commits

Author SHA1 Message Date
Jesper Juhl
93765d8a43 [IPV4]: [3/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch changes the type of the local variable 'i' in 
raw_probe_proto_opt() from 'int' to 'unsigned int'. The only use of 'i' in 
this function is as a counter in a for() loop and subsequent index into 
the msg->msg_iov[] array.
Since 'i' is compared in a loop to the unsigned variable msg->msg_iovlen 
gcc -W generates this warning : 

net/ipv4/raw.c:340: warning: comparison between signed and unsigned

Changing 'i' to unsigned silences this warning and is safe since the array 
index can never be negative anyway, so unsigned int is the logical type to 
use for 'i' and also enables a larger msg_iov[] array (but I don't know if 
that will ever matter).

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:00:15 -07:00
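The loop-counter change in 93765d8a43 can be illustrated with a minimal userspace sketch. The struct and function names below (iov_sketch, total_len_sketch) are hypothetical and are not taken from net/ipv4/raw.c; the point is only that an unsigned counter compares cleanly against an unsigned element count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one element of msg->msg_iov[]. */
struct iov_sketch {
    const void *base;
    size_t len;
};

/* Sum the segment lengths. 'i' is unsigned, like the count it is
 * compared against, so there is no signed/unsigned comparison. */
static size_t total_len_sketch(const struct iov_sketch *iov, size_t iovlen)
{
    size_t total = 0;
    unsigned int i;

    assert(iov != NULL || iovlen == 0);
    for (i = 0; i < iovlen; i++)
        total += iov[i].len;
    return total;
}
```

Calling total_len_sketch() on two segments of 3 and 5 bytes returns 8.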
Jesper Juhl
926d4b8122 [IPV4]: [2/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch gets rid of the following gcc -W warning in net/ipv4/raw.c :

net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false

Since 'len' is of type size_t it is unsigned and can thus never be <0, and 
since this is obvious from the function declaration just a few lines above 
I think it's ok to remove the pointless check for len<0.


Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:00:00 -07:00
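As a rough illustration of why the len<0 check in 926d4b8122 was dead code: a negative value converted to size_t becomes a very large positive one, so only an upper-bound check can catch it. The function name, the limit parameter and the choice of -EMSGSIZE are assumptions made for this sketch, not a copy of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical length check with a caller-supplied upper bound. */
static int check_len_sketch(size_t len, size_t max)
{
    assert(max > 0); /* a zero limit would reject every message */

    /* No 'len < 0' test: size_t is unsigned, so it could never be true. */
    if (len > max)
        return -EMSGSIZE;
    return 0;
}
```

Passing (size_t)-1 is rejected by the upper bound alone, which is the only check that can ever fire.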
Jesper Juhl
5418c6926f [IPV4]: [1/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch silences these two gcc -W warnings in net/ipv4/raw.c :

net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression
net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression

It doesn't change the behaviour of the code, simply writes the conditional 
expression with plain 'if()' syntax instead of '? :' , but since this 
breaks it into separate statements gcc no longer complains about having
both a signed and unsigned value in the same conditional expression.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:59:45 -07:00
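The rewrite described in 5418c6926f might look roughly like the sketch below. The helper name and its parameters are hypothetical; it only shows the general pattern of replacing a '? :' whose arms have different signedness with plain if/else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a negative error code if one is set,
 * otherwise the number of bytes handled. */
static long result_or_len_sketch(int err, size_t len)
{
    long ret;

    assert(err <= 0);

    /* Written as 'err ? err : len', both arms would be converted to a
     * common type (size_t here), which is what gcc -W warns about.
     * Separate statements keep each assignment in its own type. */
    if (err)
        ret = err;
    else
        ret = (long)len;
    return ret;
}
```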
Thomas Graf
94df109a8c [PKT_SCHED]: noop/noqueue qdisc style cleanups
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:59:08 -07:00
Thomas Graf
f87a9c3ddf [PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary initializers,
and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:53 -07:00
Thomas Graf
321090e7a4 [PKT_SCHED]: Add and use prio2list() in the pfifo_fast qdisc
prio2list() returns the relevant sk_buff_head for the
band specified by the priority for a given skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:35 -07:00
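A helper in the spirit of prio2list() from 321090e7a4 can be sketched in userspace as follows. The types, the table contents and the _sketch names are illustrative assumptions; the real helper works on sk_buff_head lists inside the qdisc.

```c
#include <assert.h>
#include <stddef.h>

#define PRIO_MAX_SKETCH 15

/* Hypothetical stand-in for a per-band packet list. */
struct list_sketch {
    unsigned int qlen;
};

/* Priority-to-band table; the values are illustrative only. */
static const unsigned char prio2band_sketch[PRIO_MAX_SKETCH + 1] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Return the list for the band that 'priority' maps to. */
static struct list_sketch *prio2list_sketch(struct list_sketch *lists,
                                            unsigned int priority)
{
    assert(lists != NULL);
    return &lists[prio2band_sketch[priority & PRIO_MAX_SKETCH]];
}
```

Masking the priority keeps the table index in range, so callers do not need their own bounds check.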
Thomas Graf
821d24ae74 [PKT_SCHED]: Transform pfifo_fast to use generic queue management interface
Gives pfifo_fast a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:15 -07:00
Thomas Graf
6fc8e84f4c [PKT_SCHED]: Cleanup fifo qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:00 -07:00
Thomas Graf
aaae3013d1 [PKT_SCHED]: Transform fifo qdisc to use generic queue management interface
The simplicity of the fifo qdisc allows several qdisc operations to be
redirected to the relevant queue management function directly. Saves
a lot of code lines and gives the pfifo a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:57:42 -07:00
Herbert Xu
1e061ab2e5 [SCTP]: Replace spin_lock_irqsave with spin_lock_bh
This patch replaces the spin_lock_irqsave call on the receive queue
lock in SCTP with spin_lock_bh.  Despite the proliferation of
spin_lock_irqsave calls in this stack, it is only entered from the
IPv4/IPv6 stack and user space.  That is, it is never entered from
hardirq context.

The call in question is only called from recvmsg which means that
IRQs aren't disabled.  Therefore it is safe to replace it with
spin_lock_bh.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:56:42 -07:00
Herbert Xu
e0f9f8586a [IPV4/IPV6]: Replace spin_lock_irq with spin_lock_bh
In light of my recent patch to net/ipv4/udp.c that replaced the
spin_lock_irq calls on the receive queue lock with spin_lock_bh,
here is a similar patch for all other occurrences of spin_lock_irq
on receive/error queue locks in IPv4 and IPv6.

In these stacks, we know that they can only be entered from user
or softirq context.  Therefore it's safe to disable BH only.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:56:18 -07:00
Jamal Hadi Salim
9ed19f339e [NETLINK]: Set correct pid for ioctl originating netlink events
This patch ensures that netlink events created as a result of programs
using ioctls (such as ifconfig, route etc) contain the correct PID of
those events.
 
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:51 -07:00
Jamal Hadi Salim
e431b8c004 [NETLINK]: Explicit typing
This patch converts "unsigned flags" to use more explicit types like u16
instead and incrementally introduces NLMSG_NEW().
 
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:31 -07:00
Thomas Graf
58b82150da [DECNET]: Remove unnecessary initialization of unused variable entries
This patch was supposed to be part of the neighbour tables related
patchset but apparently got lost.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:02 -07:00
Herbert Xu
0603eac0d6 [IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification
This patch changes the format of the XFRM_MSG_DELSA and
XFRM_MSG_DELPOLICY notification so that the main message
sent is of the same format as that received by the kernel
if the original message was via netlink.  This also means
that we won't lose the byid information carried in km_event.

Since this user interface is introduced by Jamal's patch
we can still afford to change it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:54:36 -07:00
Jamal Hadi Salim
b6544c0b4c [NETLINK]: Correctly set NLM_F_MULTI without checking the pid
This patch rectifies some rtnetlink message builders that derive the
flags from the pid. It is now explicit like the other cases
which get it right. Also fixes half a dozen dumpers which did not
set NLM_F_MULTI at all.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:54:12 -07:00
Thomas Graf
1797754ea7 [NETLINK]: Introduce NLMSG_NEW macro to better handle netlink flags
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
a flags argument. NLMSG_PUT stays there for compatibility but now
calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
NLMSG_NEW_ANSWER which now also takes a flags argument.

Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
and fixes the two direct users of __nlmsg_put to either provide
the flags or use NLMSG_NEW(_ANSWER).

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:48 -07:00
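The macro layering described in 1797754ea7 (an older macro kept for compatibility that forwards to a newer one with flags == 0) can be sketched like this. Everything here, including the header layout and the _SKETCH names, is a simplified assumption and not the kernel's actual NLMSG_NEW definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified message header. */
struct hdr_sketch {
    uint32_t pid;
    uint32_t seq;
    uint16_t type;
    uint16_t flags;
};

/* Fill in a header; stands in for the low-level put routine. */
static struct hdr_sketch *put_sketch(struct hdr_sketch *h, uint32_t pid,
                                     uint32_t seq, uint16_t type,
                                     uint16_t flags)
{
    assert(h != NULL);
    h->pid = pid;
    h->seq = seq;
    h->type = type;
    h->flags = flags;
    return h;
}

/* New-style macro: the caller states the flags explicitly. */
#define MSG_NEW_SKETCH(h, pid, seq, type, flags) \
    put_sketch((h), (pid), (seq), (type), (flags))

/* Old-style macro kept for compatibility: same call with flags == 0. */
#define MSG_PUT_SKETCH(h, pid, seq, type) \
    MSG_NEW_SKETCH((h), (pid), (seq), (type), 0)
```

Existing callers of the old macro keep compiling unchanged, while new callers can pass flags explicitly.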
Thomas Graf
af0d114176 [PKT_SCHED]: Logic simplifications and codingstyle/whitespace cleanups
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:29 -07:00
Thomas Graf
02f23f095f [PKT_SCHED]: Make dsmark use the new dumping macros
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:12 -07:00
Thomas Graf
758cc43c6d [PKT_SCHED]: Fix dsmark to apply changes consistently
Fixes dsmark to do all configuration sanity checks first and
only apply the changes if all of them can be applied without
any errors. Also fixes the weak sanity checks for DSMARK_VALUE
and DSMARK_MASK.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:52:54 -07:00
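The validate-first, apply-later approach described in 758cc43c6d can be sketched as below. The structs, field names and the 0..0xff range check are assumptions chosen for illustration and do not mirror the dsmark code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical current configuration. */
struct cfg_sketch {
    uint8_t mask;
    uint8_t value;
};

/* Hypothetical change request; has_* marks which fields are present. */
struct req_sketch {
    int has_mask;
    unsigned int mask;
    int has_value;
    unsigned int value;
};

static int change_sketch(struct cfg_sketch *cfg, const struct req_sketch *req)
{
    assert(cfg != NULL && req != NULL);

    /* Phase 1: check everything before touching the configuration. */
    if (req->has_mask && req->mask > 0xff)
        return -1;
    if (req->has_value && req->value > 0xff)
        return -1;

    /* Phase 2: all checks passed, so apply every change. */
    if (req->has_mask)
        cfg->mask = (uint8_t)req->mask;
    if (req->has_value)
        cfg->value = (uint8_t)req->value;
    return 0;
}
```

A request with one bad field leaves the configuration untouched instead of applying the fields that happened to be checked first.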
Thomas Graf
e386c6eb43 [NEIGH]: Fix use of uninitialized variable when trimming in neightbl_fill_parms
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:52:09 -07:00
Thomas Graf
4b6ea82dd1 [NETLINK]: Kill bogus NLMSG_SET_MULTIPART uses.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:51:43 -07:00
Thomas Graf
c7fb64db00 [NETLINK]: Neighbour table configuration and statistics via rtnetlink
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence carries all non-device-specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.

Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.

To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attributes include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:50:55 -07:00
David S. Miller
e52c1f17e4 [NET]: Move sysctl_max_syn_backlog into request_sock.c
This fixes the CONFIG_INET=n build failure noticed
by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:49:40 -07:00
Arnaldo Carvalho de Melo
2ad69c55a2 [NET] rename struct tcp_listen_opt to struct listen_sock
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:48:55 -07:00
Arnaldo Carvalho de Melo
0e87506fcc [NET] Generalise tcp_listen_opt
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.

Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:59 -07:00
Arnaldo Carvalho de Melo
60236fdd08 [NET] Rename open_request to request_sock
Ok, this one just renames some stuff to have a better namespace and to
dissociate it from TCP:

struct open_request  -> struct request_sock
tcp_openreq_alloc    -> reqsk_alloc
tcp_openreq_free     -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free

With this most of the infrastructure closely resembles a struct
sock methods subset.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:21 -07:00
Arnaldo Carvalho de Melo
2e6599cb89 [NET] Generalise TCP's struct open_request minisock infrastructure
Kept this first changeset minimal, without changing existing names to
ease peer review.

Basically tcp_openreq_alloc now receives the or_calltable, which in turn
has two new members:

->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
  a specific protocol

The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.

I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.

Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)

Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:46:52 -07:00
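The class-hierarchy idea in 2e6599cb89 (a base request embedded first in a protocol-specific struct, with an ops table that records the full object size) can be sketched in userspace as follows. All names are hypothetical, and calloc() stands in for the slab allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ops table: obj_size is the size of the most derived struct. */
struct ops_sketch {
    size_t obj_size;
};

/* Base object, always the first member of its descendants. */
struct base_req_sketch {
    const struct ops_sketch *ops;
};

struct inet_req_sketch {
    struct base_req_sketch req; /* must stay first */
    unsigned int rmt_addr;
};

struct tcp_req_sketch {
    struct inet_req_sketch req; /* must stay first */
    unsigned int rcv_isn;
};

/* Allocate whatever descendant the ops table describes. */
static struct base_req_sketch *req_alloc_sketch(const struct ops_sketch *ops)
{
    struct base_req_sketch *req;

    assert(ops != NULL && ops->obj_size >= sizeof(*req));
    req = calloc(1, ops->obj_size);
    if (req)
        req->ops = ops;
    return req;
}

static const struct ops_sketch tcp_ops_sketch = {
    sizeof(struct tcp_req_sketch)
};
```

Because the base struct is the first member at every level, a pointer returned by req_alloc_sketch(&tcp_ops_sketch) can be cast to struct tcp_req_sketch * by the protocol code, while generic code only ever sees the base type.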
Jamal Hadi Salim
ee57eef99b [IPSEC] Use NLMSG_LENGTH in xfrm_exp_state_notify
Small fixup to use netlink macros instead of hardcoding.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:45:56 -07:00
Patrick McHardy
7d6dfe1f5b [IPSEC] Fix xfrm_state leaks in error path
Herbert Xu wrote:
> @@ -1254,6 +1326,7 @@ static int pfkey_add(struct sock *sk, st
>       if (IS_ERR(x))
>               return PTR_ERR(x);
>
> +     xfrm_state_hold(x);

This introduces a leak when xfrm_state_add()/xfrm_state_update()
fail. We hold two references (one from xfrm_state_alloc(), one
from xfrm_state_hold()), but only drop one. We need to take the
reference because the reference from xfrm_state_alloc() can
be dropped by __xfrm_state_delete(), so the fix is to drop both
references on error. Same problem in xfrm_user.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:45:31 -07:00
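The reference-counting problem described in 7d6dfe1f5b can be modelled with a plain counter. This is a simplified sketch with hypothetical names; it does not reproduce pfkey_add(), only the rule that an error path must drop every reference taken so far.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object. */
struct state_sketch {
    int refcnt;
};

static void state_hold_sketch(struct state_sketch *x)
{
    x->refcnt++;
}

static void state_put_sketch(struct state_sketch *x)
{
    assert(x->refcnt > 0);
    x->refcnt--;
}

/* 'insert_err' simulates the result of adding the state to a table. */
static int state_add_sketch(struct state_sketch *x, int insert_err)
{
    assert(x != NULL);
    x->refcnt = 1;          /* reference from allocation */
    state_hold_sketch(x);   /* extra reference held by this function */

    if (insert_err) {
        /* Two references are held here, so both must be dropped. */
        state_put_sketch(x);
        state_put_sketch(x);
        return insert_err;
    }

    /* Success: the table keeps the allocation reference. */
    state_put_sketch(x);
    return 0;
}
```

Dropping only one reference on the error path would leave refcnt at 1 with no owner, which is the leak the commit message describes.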
Herbert Xu
f60f6b8f70 [IPSEC] Use XFRM_MSG_* instead of XFRM_SAP_*
This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
The netlink interface is meant to map directly onto the underlying
xfrm subsystem.  Therefore rather than using a new independent
representation for the events we can simply use the existing ones
from xfrm_user.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:44:37 -07:00
Herbert Xu
e7443892f6 [IPSEC] Set byid for km_event in xfrm_get_policy
This patch fixes policy deletion in xfrm_user so that it sets
km_event.data.byid.  This puts xfrm_user on par with what af_key
does in this case.
   
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:44:18 -07:00
Herbert Xu
bf08867f91 [IPSEC] Turn km_event.data into a union
This patch turns km_event.data into a union.  This makes code that
uses it clearer.
  
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:44:00 -07:00
Herbert Xu
4f09f0bbc1 [IPSEC] Fix xfrm to pfkey SA state conversion
This patch adjusts the SA state conversion in af_key such that
XFRM_STATE_ERROR/XFRM_STATE_DEAD will be converted to SADB_STATE_DEAD
instead of SADB_STATE_DYING.

According to RFC 2367, SADB_STATE_DYING SAs can be turned into
mature ones through updating their lifetime settings.  Since SAs
which are in the states XFRM_STATE_ERROR/XFRM_STATE_DEAD cannot
be resurrected, this value is unsuitable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:43:43 -07:00
Herbert Xu
4666faab09 [IPSEC] Kill spurious hard expire messages
This patch ensures that the hard state/policy expire notifications are
only sent when the state/policy is successfully removed from their
respective tables.

As it is, it's possible for a state/policy to both expire through
reaching a hard limit, as well as being deleted by the user.

Note that this behaviour isn't actually forbidden by RFC 2367.
However, it is a quality of implementation issue.

As an added bonus, the restructuring in this patch will help
eventually in moving the expire notifications from softirq
context into process context, thus improving their reliability.

One important side-effect from this change is that SAs reaching
their hard byte/packet limits are now deleted immediately, just
like SAs that have reached their hard time limits.

Previously they were announced immediately but only deleted after
30 seconds.

This is bad because it prevents the system from issuing an ACQUIRE
command until the existing state is deleted by the user or expires
after the time is up.

In the scenario where the expire notification was lost this introduces
a 30 second delay into the system for no good reason.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:43:22 -07:00
Jamal Hadi Salim
26b15dad9f [IPSEC] Add complete xfrm event notification
Here's the final patch.
What this patch provides:

- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
  and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:42:13 -07:00
Linus Torvalds
19fa95e9e9 Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6 2005-06-18 13:54:12 -07:00
Linus Torvalds
0e396ee43e Manual merge of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
This is a fixed-up version of the broken "upstream-2.6.13" branch, where
I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
the history is all good.
2005-06-18 11:42:35 -07:00
David Woodhouse
0107b3cf32 Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-18 08:36:46 +01:00
David S. Miller
bcfff0b471 [NETFILTER]: ipt_recent: last_pkts is an array of "unsigned long" not "u_int32_t"
This fixes various crashes on 64-bit when using this module.

Based upon a patch by Juergen Kreileder <jk@blackdown.de>.

Signed-off-by: David S. Miller <davem@davemloft.net>
ACKed-by: Patrick McHardy <kaber@trash.net>
2005-06-15 20:51:14 -07:00
Patrick McHardy
a96aca88ac [NETFILTER]: Advance seq-file position in exp_next_seq()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 18:27:13 -07:00
J. Simonetti
1c2fb7f93c [IPV4]: Sysctl configurable icmp error source address.
This patch allows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.

In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.

With the new behaviour (when the sysctl variable is toggled on), the
message is sent with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the latter
behaviour.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:19:03 -07:00
Sridhar Samudrala
6a6ddb2a9c [SCTP] Fix incorrect setting of sk_bound_dev_if when binding/sending to a ipv6
link local address.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:13:05 -07:00
Neil Horman
cdac4e0774 [SCTP] Add support for ip_nonlocal_bind sysctl & IP_FREEBIND socket option
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:12:33 -07:00
Vladislav Yasevich
bca735bd0d [SCTP] Extend the info exported via /proc/net/sctp to support netstat for SCTP.
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:11:57 -07:00
Neil Horman
0fd9a65a76 [SCTP] Support SO_BINDTODEVICE socket option on incoming packets.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:11:24 -07:00
Vladislav Yasevich
4243cac1e7 [SCTP]: Fix bug in restart of peeled-off associations.
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:10:49 -07:00
Rémi Denis-Courmont
77bd91967a [IPv6] Don't generate temporary for TUN devices
Userland layer-2 tunneling devices allocated through the TUNTAP driver 
(drivers/net/tun.c) have a type of ARPHRD_NONE, and have no link-layer 
address. The kernel complains at regular intervals when IPv6 Privacy
extensions are enabled because it can't find a hardware address:

Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00): 
cannot get EUI64 identifier; use random bytes.

IPv6 Privacy extensions should probably be disabled on that sort of 
device. They won't work anyway. If userland wants a more usual 
Ethernet-ish interface with usual IPv6 autoconfiguration, it will use a 
TAP device with an emulated link-layer  and a random hardware address 
rather than a TUN device.

As far as I could find, the TUN virtual device from TUNTAP is the only
sort of device using ARPHRD_NONE as kernel device type.

Signed-off-by: Rémi Denis-Courmont <rdenis@simphalempin.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:01:34 -07:00
YOSHIFUJI Hideaki
84427d5330 [IPV6]: Ensure to use icmpv6_socket in non-preemptive context.
We saw following trace several times:

|BUG: using smp_processor_id() in preemptible [00000001] code: httpd/30137
|caller is icmpv6_send+0x23/0x540
| [<c01ad63b>] smp_processor_id+0x9b/0xb8
| [<c02993e7>] icmpv6_send+0x23/0x540

This is because of icmpv6_socket, which is the only user of
smp_processor_id() in icmpv6_send(), AFAIK.

Since it should be used in non-preemptive context,
let's defer the dereference after disabling preemption
(by icmpv6_xmit_lock()).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 14:59:44 -07:00
Ralf Baechle
979b6c135f [NET]: Move the netdev list to vger.kernel.org.
From: Ralf Baechle <ralf@linux-mips.org>

There are archives of the old list at http://oss.sgi.com/archives/netdev

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 14:30:40 -07:00
Randy Dunlap
6efd8455cf [IPV4]: Multipath modules need a license to prevent kernel tainting.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 14:29:06 -07:00
Andi Kleen
e7626486c3 [TCP]: Adjust TCP mem order check to new alloc_large_system_hash
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 14:24:52 -07:00
Thomas Graf
98e5640552 [PKT_SCHED]: Fix numeric comparison in meta ematch
This patch is brought to you by the department of applied stupidity.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:11:19 -07:00
Thomas Graf
e1e284a4bd [PKT_SCHED]: Dump classification result for basic classifier
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:11:02 -07:00
Thomas Graf
4890062960 [PKT_SCHED]: Allow socket attributes to be matched on via meta ematch
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:10:48 -07:00
Thomas Graf
b824979aec [PKT_SCHED]: Fix typo in NET_EMATCH_STACK help text
Spotted by Geert Uytterhoeven <geert@linux-m68k.org>.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:10:22 -07:00
Stephen Hemminger
e387660545 [NET]: Fix sysctl net.core.dev_weight
Changing the sysctl net.core.dev_weight has no effect because the weight
of the backlog devices is set during initialization and never changed.

This patch propagates any changes to the global value affected by sysctl
to the per-cpu devices. It is done every time the packet handler
function is run.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 14:56:01 -07:00
Stephen Hemminger
699a411451 [NET]: Allow controlling NAPI device weight with sysfs
Simple interface to allow changing network device scheduling weight
with sysfs. Please consider this for 2.6.12, since risk/impact is small.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 14:55:42 -07:00
Gabor Fekete
8181b8c1f3 [IPV6]: Update parm.link in ip6ip6_tnl_change()
Signed-off-by: Gabor Fekete <gfekete@cc.jyu.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 14:54:38 -07:00
David S. Miller
fa04ae5c09 [ETHTOOL]: Check correct pointer in ethtool_set_coalesce().
It was checking the "GET" function pointer instead of
the "SET" one.  Looks like a cut&paste error :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-06 15:07:19 -07:00
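The kind of cut-and-paste slip fixed in fa04ae5c09 is easy to show with a small ops table. The struct, callbacks and return code below are hypothetical stand-ins, not the ethtool code: the point is that the guard must test the same pointer that is about to be called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table with optional get/set callbacks. */
struct coalesce_ops_sketch {
    int (*get)(int *usecs);
    int (*set)(int usecs);
};

static int stored_usecs_sketch;

static int get_cb_sketch(int *usecs)
{
    *usecs = stored_usecs_sketch;
    return 0;
}

static int set_cb_sketch(int usecs)
{
    stored_usecs_sketch = usecs;
    return 0;
}

static int set_coalesce_sketch(const struct coalesce_ops_sketch *ops,
                               int usecs)
{
    assert(ops != NULL);

    /* Check the "set" pointer, since that is the one called below.
     * Checking ops->get here would let a NULL ops->set through. */
    if (!ops->set)
        return -EOPNOTSUPP;
    return ops->set(usecs);
}
```

With a table that only provides get, set_coalesce_sketch() returns -EOPNOTSUPP instead of calling through a NULL pointer.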
91bcc018f9 Automatic merge of /spare/repo/netdev-2.6 branch we18 2005-06-04 17:08:24 -04:00
Adrian Bunk
4fef0304ee [IPV6]: Kill export of fl6_sock_lookup.
There is no usage of this EXPORT_SYMBOL in the kernel.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-02 13:06:36 -07:00
Adrian Bunk
64a6c7aa38 [IPVS]: remove net/ipv4/ipvs/ip_vs_proto_icmp.c
ip_vs_proto_icmp.c was never finished.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-02 13:02:25 -07:00
David Woodhouse
1c3f45ab2f Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-02 16:39:11 +01:00
David Woodhouse
4bcff1b37e AUDIT: Fix user pointer deref thinko in sys_socketcall().
I cunningly put the audit call immediately after the 
copy_from_user().... but used the _userspace_ copy of the args still. 
Let's not do that.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-06-02 12:13:21 +01:00
Edgar E Iglesias
36839836e8 [IPSEC]: Fix esp_decap_data size verification in esp4.
Signed-off-by: Edgar E Iglesias <edgar@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 17:08:05 -07:00
Thomas Graf
08e9cd1fc5 [PKT_SCHED]: Disable dsmark debugging messages by default
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:17:28 -07:00
Thomas Graf
486b53e59c [PKT_SCHED]: make dsmark try using pfifo instead of noop while grafting
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:16:52 -07:00
Thomas Graf
0451eb074e [PKT_SCHED]: Fix dsmark to count ignored indices while walking
Unused indices which are ignored while walking must still
be counted to avoid dumping the same index twice.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:15:58 -07:00
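The walker fix in 0451eb074e can be sketched as follows, assuming a walker that resumes a dump by skipping the first 'skip' positions. The struct and function names are hypothetical; 'visited' stands in for invoking the dump callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical walker state: skip = positions already handled by an
 * earlier pass, count = positions seen so far in this pass. */
struct walker_sketch {
    unsigned int skip;
    unsigned int count;
    unsigned int visited;
};

/* Walk 'n' slots; slots with used[i] == 0 are ignored. */
static void walk_sketch(const unsigned char *used, unsigned int n,
                        struct walker_sketch *w)
{
    unsigned int i;

    assert(used != NULL && w != NULL);
    for (i = 0; i < n; i++) {
        if (used[i] && w->count >= w->skip)
            w->visited++; /* stand-in for the dump callback */

        /* Count the position even when the slot was ignored, so that
         * 'count' always equals the slot index on the next pass. */
        w->count++;
    }
}
```

If ignored slots were not counted, 'count' would lag behind the slot index and a resumed walk could visit a slot that an earlier pass had already dumped.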
Herbert Xu
208d89843b [IPV4]: Fix BUG() in 2.6.x, udp_poll(), fragments + CONFIG_HIGHMEM
Steven Hand <Steven.Hand@cl.cam.ac.uk> wrote:
> 
> Reconstructed forward trace: 
> 
>   net/ipv4/udp.c:1334   spin_lock_irq() 
>   net/ipv4/udp.c:1336   udp_checksum_complete() 
> net/core/skbuff.c:1069   skb_shinfo(skb)->nr_frags > 1
> net/core/skbuff.c:1086   kunmap_skb_frag()
> net/core/skbuff.h:1087   local_bh_enable()
> kernel/softirq.c:0140   WARN_ON(irqs_disabled());

The receive queue lock is never taken in IRQs (and should never be) so
we can simply substitute bh for irq.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-30 15:50:15 -07:00
Harald Welte
9bb7bc942d [NETFILTER]: Fix deadlock with ip_queue and tcp local input path.
When we have ip_queue being used from LOCAL_IN, then we end up with a
situation where the verdicts coming back from userspace traverse the TCP
input path from syscall context.  While this seems to work most of the
time, there's an ugly deadlock:

syscall context is interrupted by the timer interrupt.  When the timer
interrupt leaves, the timer softirq gets scheduled and calls
tcp_delack_timer() and the like.  They themselves do bh_lock_sock(sk),
which is already held from somewhere else -> boom.

I've now tested the suggested solution by Patrick McHardy and Herbert Xu to
simply use local_bh_{en,dis}able().

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-30 15:35:26 -07:00
David S. Miller
d1102b59ca [NET]: Use %lx for netdev->features sysfs formatting.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:28:25 -07:00
David S. Miller
6c94d3611b [IPV6]: Clear up user copy warning in flowlabel code.
We are intentionally ignoring the copy_to_user() value,
make it clear to the compiler too.

Noted by Jeff Garzik.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:28:01 -07:00
Jon Mason
69f6a0fafc [NET]: Add ethtool support for NETIF_F_HW_CSUM.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:27:24 -07:00
Pravin B. Shelar
37e20a66db [IPV4]: Kill MULTIPATHHOLDROUTE flag.
It cannot work properly, so just ignore it in drr
and rr multipath algorithms just like the random
multipath algorithm does.

Suggested by Herbert Xu.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:26:44 -07:00
Harald Welte
8f937c6099 [IPV4]: Primary and secondary addresses
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.

Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:23:46 -07:00
Stephen Hemminger
7ce54e3f42 [BRIDGE]: receive path optimization
This improves the bridge local receive path by avoiding going
through another softirq.  The bridge receive path is already being called
from netif_receive_skb(), so there is no point in going through another
receiveq round trip.

Recursion is limited because bridge can never be a port of a bridge
so handle_bridge() always returns.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 14:16:48 -07:00
Stephen Hemminger
85967bb46d [BRIDGE]: prevent bad forwarding table updates
Avoid poisoning of the bridge forwarding table by frames that have been
dropped by filtering. This prevents spoofed source addresses on hostile
side of bridge from causing packet leakage, a small but possible security
risk.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 14:15:55 -07:00
Stephen Hemminger
81d35307dd [BRIDGE]: set features based on enslaved devices
Make features of the bridge pseudo-device be a subset of the underlying
devices.  Motivated by Xen and others who use bridging to do failover.

Signed-off-by: Catalin BOIE <catab at umbrella.ro>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 14:15:17 -07:00
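One way to read "a subset of the underlying devices" is as the bitwise AND of every port's feature mask. The sketch below is only an illustration of that idea, not the bridge code itself: the FEAT_* bits and common_features() are hypothetical names, and the choice to advertise nothing when there are no ports is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_SG   (1u << 0)
#define FEAT_CSUM (1u << 1)
#define FEAT_TSO  (1u << 2)

/*
 * Return the feature bits that every port supports.
 * Assumption: with no ports enslaved, no features are advertised.
 */
unsigned int common_features(const unsigned int *port_features, size_t nports)
{
	unsigned int features;
	size_t i;

	if (nports == 0)
		return 0;

	assert(port_features != NULL);

	features = ~0u;
	for (i = 0; i < nports; i++)
		features &= port_features[i];

	return features;
}
```

A port that lacks a feature clears that bit for the whole pseudo-device, so the result can only shrink as ports are added.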
Stephen Hemminger
81e8157583 [BRIDGE]: make dev->features unsigned
The features field in netdevice is really a bitmask, and bitmasks should
be unsigned.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 14:14:35 -07:00
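A short sketch of why an unsigned type suits a bitmask: with a signed int, setting the top bit by shifting 1 into the sign position is undefined behaviour in C, and right-shifting a negative value is implementation-defined. The helper names below are hypothetical and are not taken from the kernel.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits available in the mask type. */
#define FEATURE_BITS ((unsigned int)(sizeof(unsigned int) * CHAR_BIT))

/* Set bit n in an unsigned feature mask. */
unsigned int feature_set(unsigned int features, unsigned int n)
{
	assert(n < FEATURE_BITS);
	return features | (1u << n);
}

/* Return 1 if bit n is set, 0 otherwise. */
int feature_test(unsigned int features, unsigned int n)
{
	assert(n < FEATURE_BITS);
	return (int)((features >> n) & 1u);
}
```

Because the mask is unsigned, shifting the top bit back down yields exactly 1, with no sign extension.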
Stephen Hemminger
d8a33ac435 [BRIDGE]: features change notification
Resend of earlier patch (no changes) from Catalin used to provide
device feature change notification.

Signed-off-by: Catalin BOIE <catab at umbrella.ro>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 14:13:47 -07:00
1f15d69452 Automatic merge of /spare/repo/netdev-2.6 branch master 2005-05-27 22:07:02 -04:00
Alexey Dobriyan
c8b35d2a29 [TOKENRING]: net/802/tr.c: s/struct rif_cache_s/struct rif_cache/
"_s" suffix is certainly of hungarian origin.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:59:42 -07:00
Alexey Dobriyan
c6b3365391 [TOKENRING]: be'ify trh_hdr, trllc, rif_cache_s
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:59:05 -07:00
Hideaki YOSHIFUJI
92d63decc0 From: Kazunori Miyazawa <kazunori@miyazawa.org>
[XFRM] Call dst_check() with appropriate cookie

This fixes infinite loop issue with IPv6 tunnel mode.

Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org>
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:58:04 -07:00
Stephen Hemminger
0dca51d362 [PKT_SCHED] netem: allow random reordering (with fix)
Here is a fixed-up version of the reorder feature of netem.
It is the same as the earlier patch, plus the bugfix from Julio merged in.
It has the expected backwards-compatible behaviour.

Go ahead and merge this one, the TCP strangeness I was seeing was due
to the reordering bug, and previous version of TSO patch.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:55:48 -07:00
Stephen Hemminger
0f9f32ac65 [PKT_SCHED] netem: use only inner qdisc -- no private skbuff queue
Netem works better if packets are just queued in the inner discipline
rather than having a separate delayed queue. Change to use the dequeue/requeue
to peek like TBF does.

By doing this potential qlen problems with the old method are avoided. The problems
happened when the netem_run that moved packets from the inner discipline to the nested
discipline failed (because inner queue was full). This happened in dequeue, so the
effective qlen of the netem would be decreased (because of the drop), but there was
no way to keep the outer qdisc (caller of netem dequeue) in sync.

The problem window is still there since this patch doesn't address the issue of
requeue failing in netem_dequeue, but that shouldn't happen since the sequence dequeue/requeue
should always work.  The correct long-term fix is to implement qdisc->peek in all the qdiscs
to allow for this (needed by several other qdiscs as well).

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:55:01 -07:00
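The dequeue/requeue "peek" described above can be sketched with a small ring buffer. This is a simplified illustration under stated assumptions, not the netem code: struct fifo, the fifo_* helpers, and the use of plain integers as send times are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8

/* Toy inner queue; each item is the time at which it may be sent. */
struct fifo {
	int items[QLEN];
	size_t head;
	size_t count;
};

int fifo_enqueue(struct fifo *q, int send_time)
{
	if (q->count == QLEN)
		return -1;
	q->items[(q->head + q->count) % QLEN] = send_time;
	q->count++;
	return 0;
}

int fifo_dequeue(struct fifo *q, int *send_time)
{
	if (q->count == 0)
		return -1;
	*send_time = q->items[q->head];
	q->head = (q->head + 1) % QLEN;
	q->count--;
	return 0;
}

/* Put an item back at the head of the queue. */
int fifo_requeue(struct fifo *q, int send_time)
{
	if (q->count == QLEN)
		return -1;
	q->head = (q->head + QLEN - 1) % QLEN;
	q->items[q->head] = send_time;
	q->count++;
	return 0;
}

/*
 * Peek by dequeue/requeue: take the head item, and if it is not yet
 * due, put it straight back.  The requeue cannot fail here because the
 * dequeue has just freed a slot.
 */
int fifo_dequeue_ready(struct fifo *q, int now, int *send_time)
{
	int rc;

	if (fifo_dequeue(q, send_time) < 0)
		return -1;
	if (*send_time > now) {
		rc = fifo_requeue(q, *send_time);
		assert(rc == 0);
		(void)rc;
		return -1;
	}
	return 0;
}
```

In this toy version the queue length is unchanged whenever the head item is not yet due, which is the property the outer qdisc relies on.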
Stephen Hemminger
0afb51e728 [PKT_SCHED]: netem: reinsert for duplication
Handle duplication of packets in netem by re-inserting at top of qdisc tree.
This avoids problems with qlen accounting with nested qdiscs. This recursion
requires no additional locking but will potentially increase stack depth.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:53:49 -07:00
Herbert Xu
180e425033 [IPV6]: Fix xfrm tunnel oops with large packets
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-23 13:11:07 -07:00
David S. Miller
314324121f [TCP]: Fix stretch ACK performance killer when doing ucopy.
When we are doing ucopy, we try to defer the ACK generation to
cleanup_rbuf().  This works most of the time very well, but if the
ucopy prequeue is large, this ACKing behavior kills performance.

With TSO, it is possible to fill the prequeue so large that by the
time the ACK is sent and gets back to the sender, most of the window
has emptied of data and performance suffers significantly.

This behavior does help in some cases, so we should think about
re-enabling this trick in the future, using some kind of limit in
order to avoid the bug case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-23 12:03:06 -07:00
Tommy S. Christensen
aa1c6a6f7f [NETLINK]: Defer socket destruction a bit
In netlink_broadcast() we're sending shared skb's to netlink listeners
when possible (saves some copying). This is OK, since we hold the only
other reference to the skb.

However, this implies that we must drop our reference on the skb, before
allowing a receiving socket to disappear. Otherwise, the socket buffer
accounting is disrupted.

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 13:07:32 -07:00
Tommy S. Christensen
68acc024ea [NETLINK]: Move broadcast skb_orphan to the skb_get path.
Cloned packets don't need the orphan call.

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 13:06:35 -07:00
Tommy S. Christensen
db61ecc335 [NETLINK]: Fix race with recvmsg().
This bug causes:

assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122)

What's happening is that:

1) The skb is sent to socket 1.
2) Someone does a recvmsg on socket 1 and drops the ref on the skb.
   Note that the rmalloc is not returned at this point since the
   skb is still referenced.
3) The same skb is now sent to socket 2.

This version of the fix resurrects the skb_orphan call that was moved
out, last time we had 'shared-skb troubles'. It is practically a no-op
in the common case, but still prevents the possible race with recvmsg.

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:46:59 -07:00
Herbert Xu
31c26852cb [IPSEC]: Verify key payload in verify_one_algo
We need to verify that the payload contains enough data so that
attach_one_algo can copy alg_key_len bits from the payload.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:39:49 -07:00
Herbert Xu
b9e9dead05 [IPSEC]: Fixed alg_key_len usage in attach_one_algo
The variable alg_key_len is in bits and not bytes.  The function
attach_one_algo is currently using it as if it were in bytes.
This causes it to read memory which may not be there.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:39:04 -07:00
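The two IPSEC fixes above come down to converting a bit count to a byte count, rounding up, and checking that count against the payload before copying. A minimal sketch, assuming hypothetical helper names key_bytes() and copy_key() that are not the kernel's own functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a key length in bits to bytes, rounding up. */
size_t key_bytes(unsigned int key_len_bits)
{
	return ((size_t)key_len_bits + 7) / 8;
}

/*
 * Copy a key of key_len_bits bits out of payload.
 * Returns 0 on success, or -1 if the payload or the destination
 * is too small to hold that many bytes.
 */
int copy_key(unsigned char *dst, size_t dst_len,
	     const unsigned char *payload, size_t payload_len,
	     unsigned int key_len_bits)
{
	size_t n = key_bytes(key_len_bits);

	assert(dst != NULL && payload != NULL);

	if (n > payload_len || n > dst_len)
		return -1;

	memcpy(dst, payload, n);
	return 0;
}
```

Treating the bit count as a byte count would ask for eight times as much data as the key actually occupies, which is how the read past the end of the payload arises.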
David S. Miller
8be58932ca [NETFILTER]: Do not be clever about SKB ownership in ip_ct_gather_frags().
Just do an skb_orphan() and be done with it.
Based upon discussions with Herbert Xu on netdev.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:36:33 -07:00
Julian Anastasov
d9fa0f392b [IP_VS]: Remove extra __ip_vs_conn_put() for incoming ICMP.
Remove extra __ip_vs_conn_put for incoming ICMP in direct routing
mode. Mark de Vries reports that IPVS connections are not leaked anymore.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:29:59 -07:00
Christoph Hellwig
f81a0bffa1 [AF_UNIX]: Use lookup_create().
Currently it open-codes it, but that's in the way of changing the
lookup_hash interface.

I'd prefer to disallow modular af_unix over exporting lookup_create,
but I'll leave that to you.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:26:43 -07:00
Herbert Xu
2fdba6b085 [IPV4/IPV6] Ensure all frag_list members have NULL sk
Having frag_list members which hold wmem of an sk leads to nightmares
with partially cloned frag skb's.  The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter.  Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.

So let's go the other way and make this an invariant:

	For any skb on a frag_list, skb->sk must be NULL.

That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.

The above invariant is actually violated in the following patch
for a short duration inside ip_fragment.  This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-18 22:52:33 -07:00
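The invariant stated above can be expressed as a simple check over a fragment list. The sketch uses stand-in types (struct sock_stub, struct buf) rather than the real sk_buff, and frag_list_clear_owners() only clears the pointers; the destructor calls and wmem accounting that real ownership changes involve are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for struct sock and struct sk_buff, for illustration only. */
struct sock_stub {
	int id;
};

struct buf {
	struct sock_stub *sk;   /* owning socket, or NULL */
	struct buf *next;       /* next fragment on a frag_list */
	struct buf *frag_list;  /* head skb only: list of fragments */
};

/* Return 1 if no fragment on head's frag_list has an owner, else 0. */
int frag_list_is_clean(const struct buf *head)
{
	const struct buf *f;

	assert(head != NULL);

	for (f = head->frag_list; f != NULL; f = f->next)
		if (f->sk != NULL)
			return 0;
	return 1;
}

/* Enforce the invariant: ownership stays with the head buffer only. */
void frag_list_clear_owners(struct buf *head)
{
	struct buf *f;

	assert(head != NULL);

	for (f = head->frag_list; f != NULL; f = f->next)
		f->sk = NULL;
}
```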
Evgeniy Polyakov
d48102007d [XFRM]: skb_cow_data() does not set proper owner for new skbs.
It looks like skb_cow_data() does not set 
proper owner for newly created skb.

If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there 
might be a situation when we require recreating skb and 
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
associated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.

If above "analysis" is correct then attached patch fixes that.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-18 22:51:45 -07:00