Provide an API for formatting the remote peer address for printing without
exposing its internal structure. The address could be dynamic, so we
support a function call to get the address rather than reading it straight
out of a structure.
Test-plan:
Destructive testing (unplugging the network temporarily). Probably need
to rig a server where certain services aren't running, or that returns an
error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the details of how the RPC client stores remote peer addresses from
the RPC portmapper.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for retrieving the remote peer address without allowing
direct access to the rpc_xprt struct.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms. For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The previous patches removed the last user of RPC child tasks, so we can
remove support for child tasks from net/sunrpc/sched.c now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add comments for external functions, use modern function definition style,
and fix up dprintk formatting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure. This will allow the creation of
a clean API for plugging in different types of bind mechanisms.
This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding. A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).
Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on
Mobile Node.
The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home
Addresses for 2 purposes. 1) We need to check this on receipt of
Type 2 Routing Header (RFC3775 Secion 6.4), 2) We prefer Home
Address(es) in source address selection (RFC3484 Section 5 Rule 4).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFA_F_NODAD flag, similar to IN6_IFF_NODAD in BSDs, is introduced
to skip DAD.
This flag should be set to Mobile IPv6 Home Address(es) on Mobile
Node because DAD would fail if we should perform DAD; our Home Agent
protects our Home Address(es).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We did not send appropriate IsRouter flag if the forwarding setting is
positive even value. Let's give 1/0 value to ndisc_send_na().
Also, existing users of ndisc_send_na() give 0/1 to override,
we can omit redundant operation in that function.
Bug hinted by Nicolas Dichtel <nicolas.dichtel@6wind.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not always need proxy NDP functionality even we
enable forwarding.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have sent NA with router flag from the node-wide forwarding
configuration. This is not appropriate for proxy NA, and it should be
set according to each proxy entry's configuration.
This is used by Mobile IPv6 home agent to support physical home link
in acting as a proxy router for mobile node which is not a router,
for example.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This aims at proxying router not updating neighbor cache entry for proxied
address when it receives NA because either the proxied node is off link or
it has already sent a NA to the proxied router.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Proxying router can't forward traffic sent to link-local address, so signal
the sender and discard the packet. This behavior is clarified by Mobile IPv6
specification (RFC3775) but might be required for all proxying router.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is required to respond to NDP messages sent directly to the "target"
unicast address. Proxying node (router) is required to handle such
messages. To achieve this, check if the packet in forwarding patch is
NDP message.
With this patch, the proxy neighbor entries are always looked up in
forwarding path. We may want to optimize further.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
When the master PPTP connection times out while still having unfullfilled
expectations (and a GRE keymap entry) associated with it, the keymap entry
is not destroyed.
Add a destroy callback to struct ip_conntrack_helper and use it to destroy
PPTP siblings when the master is destroyed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When destroying the GRE expectations without having seen the GRE connection
the keymap entry is not freed, leading to a memory leak and, in case of
a following call within the same session, failure during expectation setup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrectly used message types and call IDs:
- PPTP_IN_CALL_REQUEST (PAC->PNS) contains a PptpInCallRequest (icreq)
message and the PAC call ID
- PPTP_IN_CALL_REPLY (PNS->PAC) contains a PptpInCallReply (icack)
message and the PNS call ID
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For rejected calls the state is set to PPTP_CALL_NONE even for non-matching
call ids.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also make sure not to hand packets received in an invalid state to the
NAT helper since it will mangle the packet with invalid data.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also make sure not to pass undersized messages to the NAT helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated expectation handling in the NAT helper and simplify
the remains in the conntrack helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just the values are needed, not the memory locations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a few header definitions to match RFC2637. Most importantly the
PptpOutCallRequest header included an invalid padding field and a
size check was disabled because of this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculated sequence numbers are not used for anything.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call ID in reply packets is never changed, remove the code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack structure contains the call ID in host byte order for no
reason, get rid of back and forth conversions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the xt_compat_match/xt_compat_target into smaller type-safe functions
performing just one operation. Handle all alignment and size-related
conversions centrally in these function instead of requiring each module to
implement a full-blown conversion function. Replace ->compat callback by
->compat_from_user and ->compat_to_user callbacks, responsible for
converting just a single private structure.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparse "defined twice" warning
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix regression introduced by the incremental checksum patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On SMP environments the maximum number of conntracks can be overpassed
under heavy stress situations due to an existing race condition.
CPU A CPU B
atomic_read() ...
early_drop() ...
... atomic_read()
allocate conntrack allocate conntrack
atomic_inc() atomic_inc()
This patch moves the counter incrementation before the early drop stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the bits to dump the conntrack table and the ones to dump and
zero counters in a single piece of code. This patch does not change
the default behaviour if accounting is not enabled.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While standard_target has target->me == NULL, module_put() should be
called for it as for others, because there were try_module_get() before.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that IPv6 supports policy routing we need to reroute in NF_IP6_LOCAL_OUT
when the mark value changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The limit match reinitializes its state whenever the ruleset changes,
which means it will forget about previously used credits.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- remove debugging cruft
- remove printk for reallocation failures
- remove unused addition
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix whitespace error
- break lines at 80 characters
- reformat some expressions to be more readable
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fix some whitespace errors and use the NAT bits instead of deriving
the state manually.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill listhelp.h and use the list.h functions instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling of ipip and ip_gre ICMP error relaying is b0rken; it accesses
8bit field + 3 reserved octets as host-endian 32bit, does comparison,
subtraction and stuffs the result back. That breaks on big-endian.
Fixed, made endian-clean.
[ Note that this effected code is permanently commented out with
and ifdef, so this error couldn't actually cause problems for
anyone. -DaveM ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce methods which manipulate interesting congestion control
state such as pipe and rtt estimate. This is useful for people
wishing to monitor the variables of CCID and instrument the code
[perhaps using Kprobes]. Personally, I am a fan of
encapsulation---that justifies this change =D.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple losses occur in one RTT, the window should be halved
only once [a single "congestion event"]. This is now implemented,
although not perfectly. Slightly changed the interface for changing
the cwnd: pass hctx instead of dp. This is required in order to allow
for change_cwnd to be called from _init().
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate more sequence state on demand. Each time a packet is sent
out by CCID2, a record of it needs to be kept. This list of records
grows proportionally to cwnd. Previously, the length of this list was
hardcored and therefore the cwnd could only grow to this value (of
128). Now, records are allocated on demand as necessary---cwnd may
grow as it wishes. The exceptional case of when memory is not
available is not handled gracefully. Perhaps, cwnd should be capped
at that point.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the user to choose whether or not to enable CCID2 debugging via
Kconfig.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If not enough cwnd is available, tell the sender to check again as
soon as possible. This will increase CPU utilization (polling
frequently for cwnd) but will improve network performance. That is,
the sender will need to wait less before detecting the increase of
cwnd. A better architecture would be for the CCID to call-back (or
dequeue) from DCCP when it is able to transmit traffic -- not the
other way around as it currently occurs.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize the slow-start threshold to infinity. This way, upon connection
initiation, slow-start will be exited only upon a packet loss. This patch will
allow connections to quickly gain speed.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no
problem.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of unused variables in ackvector state.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is
defined as appears in the packet, therefore bit shifting is not
required. This fix allows CCID2 to correctly detect losses.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ackvector length calculation upon receiving an "ack-of-ack". This
patch avoids the ackvector from growing too large which causes it to
not be inserted into packets.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not affect either mss-sized connections (obviously) or
connections controlled by Nagle (because there is only one small
segment in flight).
The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and
still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
receive buffer.
In other words, it is a "soft" each-2nd-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following cleanups:
- make the following needlessly global function static:
- socket.c: sctp_apply_peer_addr_params()
- add proper prototypes for the several global functions in
include/net/sctp/sctp.h
Note that this fixes wrong prototypes for the following functions:
- sctp_snmp_proc_exit()
- sctp_eps_proc_exit()
- sctp_assocs_proc_exit()
The latter was spotted by the GNU C compiler and reported
by David Woodhouse.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iproute2 doesn't provide the NLM_F_CREATE flag when adding addresses,
it is assumed to be implied. The existing code issues a check on
said flag when the modify operation fails (likely due to ENOENT)
before continueing to create it, this leads to a hard to predict
result, therefore the NLM_F_CREATE check is removed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same behaviour as IPv4, using IFF_UP is a no-no anyway.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replaces INET6_IFADDR_RTA_SPACE with a new function calculating
the total required message size for all address messages.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change some netfilter tunables to __read_mostly. Also fixed some
incorrect file reference comments while I was in there.
(this will be my last __read_mostly patch unless someone points out
something else that needs it)
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change some bridge sysctl tunables to __read_mostly.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Additionaly exports the following information when providing
the list of registered generic netlink families:
- protocol version
- header size
- maximum number of attributes
- list of available operations including
- id
- flags
- avaiability of policy and doit/dumpit function
libnl HEAD provides a utility to read this new information:
0x0010 nlctrl version 1
hdrsize 0 maxattr 6
op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT]
0x0011 NLBL_MGMT version 1
hdrsize 0 maxattr 0
op unknown (0x02) [DOIT]
op unknown (0x03) [DOIT]
....
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When changing a device by ifindex without including a IFLA_IFNAME
attribute, the ifname variable contains random garbage and is used
to change the device name.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that during the MIPv6 respin, some code which was originally
conditionally compiled around CONFIG_XFRM_ADVANCED was accidently left
in after the config option was removed.
This patch removes an extraneous pointer (xerr_idxp) which is no
longer needed.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose IPSEC modes output path to take an xfrm state as input param.
This makes it consistent with the input mode processing (which already
takes the xfrm state as a param).
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg
needlock = 0, while socket is not locked at that moment. In order to avoid
this and similar issues in the future, use rcu for sk->sk_filter field read
protection.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
By passing a Linux-generated TSO packet straight back into Linux, Xen
becomes our first LRO user :) Unfortunately, there is at least one spot
in our stack that needs to be changed to cope with this.
The receive MSS estimate is computed from the raw packet size. This is
broken if the packet is GSO/LRO. Fortunately the real MSS can be found
in gso_size so we simply need to use that if it is non-zero.
Real LRO NICs should of course set the gso_size field in future.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typo introduced by myself which leads to freeing the skb
containing the netlink message when it should free the newly
allocated skb for the reply.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global struct simp_hash_info static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some missing include files to the NetLabel related header files.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Arnaldo Carvalho de Melo points out I should be using list_entry in case
the structure changes in future. Current code functions but is reliant
on position and requires type cast.
Noticed when doing this that I have one more variable than I needed so
removing that also.
Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NETFILTER_DEBUG is enabled, it shouldn't change the
actions of the filtering. The message about skb->dst being NULL
is commonly triggered by dhclient, so it is useless. Make sure all
messages end in newline.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do some simple optimization on the nf_bridge_pad() function
and don't use magic constants. Eliminate a double call and
the #ifdef'd code for CONFIG_BRIDGE_NETFILTER.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup and rearrangement for better style and clarity:
Split the function nf_bridge_maybe_copy_header into two pieces
Move copy portion out of line.
Use Ethernet header size macros.
Use header file to handle CONFIG_NETFILTER_BRIDGE differences
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/xt_CONNMARK.c: In function 'target':
net/netfilter/xt_CONNMARK.c:59: warning: implicit declaration of
function 'nf_conntrack_event_cache'
The warning is due to the following .config:
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CONNTRACK_MARK=y
# CONFIG_IP_NF_CONNTRACK_EVENTS is not set
CONFIG_IP_NF_CONNTRACK_NETLINK=m
This change was introduced by:
http://www.kernel.org/git/?p=linux/kernel/git/davem/net-2.6.19.git;a=commit;h=76e4b41009b8a2e9dd246135cf43c7fe39553aa5
Proposed solution (based on the define in
include/net/netfilter/nf_conntrack_compat.h:
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ip6_pol_route_lookup(), when we finish backtracking at the
top-level root entry, we need to hold it.
Bug noticed by Mitsuru Chinen <CHINEN@jp.ibm.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts existing NLA_STRING attributes to use the new
validation features, saving a couple of temporary buffers.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces a new attribute type NLA_NUL_STRING to support NUL
terminated strings. Attributes of this kind require to carry
a terminating NUL within the maximum specified in the policy.
The `old' NLA_STRING which is not required to be NUL terminated
is extended to provide means to specify a maximum length of the
string.
Aims at easing the pain with using nla_strlcpy() on temporary
buffers.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also kills a warning while building ipv6:
net/ipv6/udp.c: In function ‘udp_v6_get_port’:
net/ipv6/udp.c:66: warning: passing argument 3 of ‘udp_get_port’ from incompatible pointer type
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates one common function which is called by
udp_v4_get_port() and udp_v6_get_port(). As a result,
* duplicated code is removed
* udp_port_rover and local port lookup can now be removed from udp.h
* further savings follow since the same function will be used by UDP-Litev4
and UDP-Litev6
In contrast to the patch sent in response to Yoshifujis comments
(fixed by this variant), the code below also removes the
EXPORT_SYMBOL(udp_port_rover), since udp_port_rover can now remain
local to net/ipv4/udp.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds transmit buffering to DCCP.
I have tested with CCID2/3 and with loss and rate limiting.
Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This shifts further sysctls into feat.h. No change in
functionality - shifting code only.
Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix mark comparison, also dump the mask to userspace when the mask is
zero, but the mark is not (in which case the mark is dumped, so the
mask is needed to make sense of it).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support masking the nfmark value before the search. The mask value is
global for all filters contained in one instance. It can only be set
when a new instance is created, all filters must specify the same mask.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fwmark masks. For compatibility a mask of 0xFFFFFFFF is used
when a mark value != 0 is sent without a mask.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a FRA_FWMASK attributes for fwmark masks. For compatibility a mask of
0xFFFFFFFF is used when a mark value != 0 is sent without a mask.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fwmark masks.
A mask of 0xFFFFFFFF is used when a mark value != 0 is sent without a mask.
Based on patch for net/ipv4/fib_rules.c by Patrick McHardy <kaber@trash.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should not be RTA_MAX+1 but FRA_MAX+1.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add missing nla_policy entry.
- type of fwmark is u32, not u8.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if we find an exact match in the hash table,
we must inspect the inexact list to look for a match
with a better priority.
Noticed by Masahide NAKAMURA <nakam@linux-ipv6.org>.
Signed-off-by: David S. Miller <davem@davemloft.net>
- Remove redundant code. Pointed out by Brian Haley <brian.haley@hp.com>.
- Unify code paths with/without CONFIG_IPV6_MIP.
- Use NIP6_FMT for IPv6 address textual presentation.
- Fold long line. Pointed out by David Miller <davem@davemloft.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
First of all, if the xfrm_state only gets used for input
packets this entropy is a complete waste.
Secondly, it is often the case that a configuration loads
many rules (perhaps even dynamically) and they don't all
necessarily ever get used.
This get_random_bytes() call was showing up in the profiles
for xfrm_state inserts which is how I noticed this.
Signed-off-by: David S. Miller <davem@davemloft.net>
This idea is from Alexey Kuznetsov.
It is common for policies to be non-prefixed. And for
that case we can optimize lookups, insert, etc. quite
a bit.
For each direction, we have a dynamically sized policy
hash table for non-prefixed policies. We also have a
hash table on policy->index.
For prefixed policies, we have a list per-direction which
we will consult on lookups when a non-prefix hashtable
lookup fails.
This still isn't as efficient as I would like it. There
are four immediate problems:
1) Lots of excessive refcounting, which can be fixed just
like xfrm_state was
2) We do 2 hash probes on insert, one to look for dups and
one to allocate a unique policy->index. Althought I wonder
how much this matters since xfrm_state inserts do up to
3 hash probes and that seems to perform fine.
3) xfrm_policy_insert() is very complex because of the priority
ordering and entry replacement logic.
4) Lots of counter bumping, in addition to policy refcounts,
in the form of xfrm_policy_count[]. This is merely used
to let code path(s) know that some IPSEC rules exist. So
this count is indexed per-direction, maybe that is overkill.
Signed-off-by: David S. Miller <davem@davemloft.net>
The source address is always non-prefixed so we should use
it to help give entropy to the bydst hash.
Signed-off-by: David S. Miller <davem@davemloft.net>
The refcounting done for timers and hash table insertions
are just wasted cycles. We can eliminate all of this
refcounting because:
1) The implicit refcount when the xfrm_state object is active
will always be held while the object is in the hash tables.
We never kfree() the xfrm_state until long after we've made
sure that it has been unhashed.
2) Timers are even easier. Once we mark that x->km.state as
anything other than XFRM_STATE_VALID (__xfrm_state_delete
sets it to XFRM_STATE_DEAD), any timer that fires will
do nothing and return without rearming the timer.
Therefore we can defer the del_timer calls until when the
object is about to be freed up during GC. We have to use
del_timer_sync() and defer it to GC because we can't do
a del_timer_sync() while holding x->lock which all callers
of __xfrm_state_delete hold.
This makes SA changes even more light-weight.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just let GC and other normal mechanisms take care of getting
rid of DST cache references to deleted xfrm_state objects
instead of walking all the policy bundles.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead, simply set all potentially aliasing existing xfrm_state
objects to have the current generation counter value.
This will make routes get relooked up the next time an existing
route mentioning these aliased xfrm_state objects gets used,
via xfrm_dst_check().
Signed-off-by: David S. Miller <davem@davemloft.net>
Each xfrm_state inserted gets a new generation counter
value. When a bundle is created, the xfrm_dst objects
get the current generation counter of the xfrm_state
they will attach to at dst->xfrm.
xfrm_bundle_ok() will return false if it sees an
xfrm_dst with a generation count different from the
generation count of the xfrm_state that dst points to.
This provides a facility by which to passively and
cheaply invalidate cached IPSEC routes during SA
database changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The grow algorithm is simple, we grow if:
1) we see a hash chain collision at insert, and
2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and
3) the number of xfrm_state objects is > the current hash mask
All of this needs some tweaking.
Remove __initdata from "hashdist" so we can use it safely at run time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Support Mobile IPv6 extension headers sorting for two transformation policies.
Mobile IPv6 extension headers should be placed after IPsec
transport mode, but before transport AH when outbound.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add sort functions to combine templates/states for IPsec.
Think of outbound transformation order we should be careful with transport AH
which must be the last of all transport ones.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sub policy can be used through netlink socket.
PF_KEY uses main only and it is TODO to support sub.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under two transformation policies it is required to merge them.
This is a platform to sort state for outbound and templates
for inbound respectively.
It will be used when Mobile IPv6 and IPsec are used at the same time.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sub policy is introduced. Main and sub policy are applied the same flow.
(Policy that current kernel uses is named as main.)
It is required another transformation policy management to keep IPsec
and Mobile IPv6 lives separate.
Policy which lives shorter time in kernel should be a sub i.e. normally
main is for IPsec and sub is for Mobile IPv6.
(Such usage as two IPsec policies on different database can be used, too.)
Limitation or TODOs:
- Sub policy is not supported for per socket one (it is always inserted as main).
- Current kernel makes cached outbound with flowi to skip searching database.
However this patch makes it disabled only when "two policies are used and
the first matched one is bypass case" because neither flowi nor bundle
information knows about transformation template size.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Add Kconfig to support sub policy.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignore to report user-space for known mobility headers rejected by
destination options header transformation.
Mobile IPv6 specification (RFC3775) says that mobility header
is used with destination options header carrying home address option
only for binding update message. Other type message cannot be used
and node must drop it silently (and must not send binding error) if
receving such packet.
To achieve it, (1) application should use transformation policy and
wild-card states to catch binding update message prior other packets
(2) kernel doesn't report the reject to user-space not to send
binding error message by application.
This patch is for (2).
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report to user-space when home address option is rejected.
In receiving this message user-space application will send Mobile IPv6 binding
error. It is rate-limited by kernel.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM_MSG_REPORT is a message as notification of state protocol and
selector from kernel to user-space.
Mobile IPv6 will use it when inbound reject is occurred at route
optimization to make user-space know a binding error requirement.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Mobile IPv6 usage, it is required to trace which secpath state is
reject factor in order to notify it to user space (to know the address
which cannot be used route optimized communication).
Based on MIPL2 kernel patch.
This patch was also written by: Henrik Petander <petander@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transformation support mobility header.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mobility header is built by user-space and sent through raw socket.
Kernel just extracts its type to flow.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like ICMPv6, mobility header is handled through raw socket.
In inbound case, check only whether ICMPv6 error should be sent as a reply
or not by kernel.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
This patch was also written by: Antti Tuominen <anttit@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Mobile IPv6 route optimization protocols to netlink interface.
Route optimization states carry care-of address.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add destination options header transformation for Mobile IPv6.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add routing header type 2 transformation for Mobile IPv6.
Based on MIPL2 kernel patch.
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mobile IPv6 defines home address option as an option of destination
options header. It is placed before fragment header then
ip6_find_1stfragopt() is fixed to know about it.
Home address option also carries final source address of the flow,
then outbound AH calculation should take care of it like routing
header case. Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 source address is replaced in receiving packet
with home address option carried by destination options header.
To send ICMPv6 error back, original address which is received one on wire
should be used. This function checks such header is included
and reverts them.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add inbound function of home address option by registering it to TLV
table for destination options header.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In receiving Mobile IPv6 home address option which is a TLV carried by
destination options header, kernel will try to mangle source adderss
of packet. Think of cloned skbuff it is required to replace it by the
parser just like routing header case.
This is a framework to achieve that to allow TLV parser to replace
inbound skbuff pointer.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a helper to search option offset from extension header which
can carry TLV option like destination options header.
Mobile IPv6 home address option will use it.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add socket option and ancillary data interface of routing header type
2. Mobile IPv6 application will use this to send binding
acknowledgement with the header without relation of confirmed route
optimization (binding).
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add inbound interface of routing header type 2 for Mobile IPv6.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transformation user interface is not only for IPsec.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For outbound transformation, bundle is checked whether it is
suitable for current flow to be reused or not. In such IPv6 case
as below, transformation may apply incorrect bundle for the flow instead
of creating another bundle:
- The policy selector has destination prefix length < 128
(Two or more addresses can be matched it)
- Its bundle holds dst entry of default route whose prefix length < 128
(Previous traffic was used such route as next hop)
- The policy and the bundle were used a transport mode state and
this time flow address is not matched the bundled state.
This issue is found by Mobile IPv6 usage to protect mobility signaling
by IPsec, but it is not a Mobile IPv6 specific.
This patch adds strict check to xfrm_bundle_ok() for each
state mode and address when prefix length is less than 128.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch transformation state is updated last used time
for each sending. Xtime is used for it like other state lifetime
expiration.
Mobile IPv6 enabled nodes will want to know traffic status of each
binding (e.g. judgement to request binding refresh by correspondent node,
or to keep home/care-of nonce alive by mobile node).
The last used timestamp is an important hint about it.
Based on MIPL2 kernel patch.
This patch was also written by: Henrik Petander <petander@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Care-of address is carried by state as a transformation option like
IPsec encryption/authentication algorithm.
Based on MIPL2 kernel patch.
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
For originated outbound IPv6 packets which will fragment, ip6_append_data()
should know length of extension headers before sending them and
the length is carried by dst_entry.
IPv6 IPsec headers fragment then transformation was
designed to place all headers after fragment header.
OTOH Mobile IPv6 extension headers do not fragment then
it is a good idea to make dst_entry have non-fragment length to tell it
to ip6_append_data().
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Outbound transformation replaces both source and destination address with
state's end-point addresses at the same time when IPsec tunnel mode.
It is also required to change them for Mobile IPv6 route optimization, but we
should care about the following differences:
- changing result is not end-point but care-of address
- either source or destination is replaced for each state
This hook is a common platform to change outbound address.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On current kernel inbound transformation state is allowed transport and
disallowed tunnel mode when mismatch is occurred between tempates and states.
As the result of adding two more modes by Mobile IPv6, this function name
is misleading. Inbound transformation can allow only transport mode
when mismatch is occurred between template and secpath.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM_STATE_WILDRECV flag is introduced; the last resort state is set
it and receives packet which is not route optimized but uses such
extension headers i.e. Mobile IPv6 signaling (binding update and
acknowledgement). A node enabled Mobile IPv6 adds the state.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Mobile IPv6 usage, routing header or destination options header is
used and it doesn't require this comparison. It is checked only for
IPsec template.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Route optimization is used with routing header and destination options
header for Mobile IPv6.
At outbound it makes header space like IPsec transport. At inbound it
does nothing because exhdrs.c functions have responsibility to update
skbuff information for these headers.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On current kernel, ip6_find_1stfragopt() is used by IPv6 IPsec to find
offset to be inserted header in outbound for transport mode. (BTW, no
usage may be needed for IPv4 case.) Mobile IPv6 requires another
logic for routing header and destination options header
respectively. This patch is common platform for the offset and adopts
it to IPsec.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a support to search transformation states by its addresses
by using source address list for Mobile IPv6 usage.
To use it from user-space, it is also added a message type for
source address as a xfrm state option.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support source address based searching.
Mobile IPv6 will use it.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be added two more transformation protocols (routing header
and destination options header) for Mobile IPv6.
xfrm_id_proto_match() can be handle zero as all, IPSEC_PROTO_ANY as
all IPsec and otherwise as exact one.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put the helper to header for future use.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transformation mode is used as either IPsec transport or tunnel.
It is required to add two more items, route optimization and inbound trigger
for Mobile IPv6.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags into
RT6_LOOKUP_F_xxx flags, and put them into ip6_route.h
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi
Signed-off-by: David S. Miller <davem@davemloft.net>
This is for developers only.
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even on RTN_ROOT node, we need to process its subtree first.
Fix NULL pointer dereference in fib6_locate().
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split up function for finding routes for redirects.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP sysctl entries are displayed in milliseconds, but stored
internally in jiffies. This results in multiple levels of msecs to
jiffies conversion and as a result produces a truncation error. This
patch makes things consistent in that we store and display defaults
in milliseconds and only convert once for use by association.
This patch also adds some sane min/max values so that we don't go off
the deep end.
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- remove unused define
- remove useless wrapper function
- use new line for expression after condition
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The matches are identical besides one looking for NEXTHDR_HOP, the other
for NEXTHDR_DEST. Remove ip6t_dst.c and handle both in ip6t_hbh.c.
Signed-off-by: Patrick McHardy <kaber@trash,net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size is verified by x_tables and isn't needed by the modules anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace open coded checksum update by nf_csum_update calls and clean up
the surrounding code a bit.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPCT_HELPER and IPCT_NATINFO bits are never set on updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses nfnetlink_has_listeners to check for listeners in
userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctnetlink dumps the mark iif the event mark happened
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the mark event. ctnetlink can use this to know if
the mark needs to be dumped.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel De Graaf <danield@iastate.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces IPv4 DSCP target by address family independent version.
This also
- utilizes dsfield.h to get/mangle DS field in IPv4/IPv6 header
- fixes Kconfig help text.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces IPv4 dscp match by address family independent version.
This also
- utilizes dsfield.h to get the DS field in IPv4/IPv6 header, and
- checks for the DSCP value from user space.
- fixes Kconfig help text.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to do multiple dereferences - sk->sk_socket->file->f_flags
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During accept/peeloff we try to copy the list of bound addresses from
the original endpoint to the new one. However, we forgot to set the flag
to say that IPv6 is allowed on the new endpoint.
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the "nomem" conditions that may occur during the
processing by the state machine functions. In most cases we delay adding
side-effect commands until all memory allocations are done.
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds more statistics info under /proc/net/sctp/snmp
that should be useful for debugging. The additional events that
are counted now include timer expirations, retransmits, packet
and data chunk discards.
The Data chunk discards include all the cases where a data chunk
is discarded including high tsn, bad stream, dup tsn and the most
useful one(out of receive buffer/rwnd).
Also moved the SCTP MIB data structures from the generic include
directories to include/sctp/sctp.h.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes various unvalidated netlink attributes causing memory
corruptions when left empty by userspace applications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replaces the struct in6_rtmsg based interface orignating from
the ioctl interface with a struct fib6_config based on. Allows
changing the interface without breaking the ioctl interface
and avoids passing on tons of parameters.
The recently introduced struct nl_info is used to pass on
netlink authorship information for notifications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a simple ip6_ins_rt() for the majority of users and
an alternative for the exception via netlink. Avoids code
obfuscation.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a simple ip6_del_rt() for the majority of users and
an alternative for the exception via netlink. Avoids code
obfuscation.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was simply making templates of functions and mostly causing a lot
of code duplication in the classifier action modules.
We solve this more cleanly by having a common "struct tcf_common" that
hash worker functions contained once in act_api.c can work with.
Callers work with real action objects that have the common struct
plus their module specific struct members. You go from a common
object to the higher level one using a "to_foo()" macro which makes
use of container_of() to do the dirty work.
This also kills off act_generic.h which was only used by act_simple.c
and keeping it around was more work than the it's value.
Signed-off-by: David S. Miller <davem@davemloft.net>
Run ethernet support through Lindent and fix up.
Applies after docbook comments patch
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add docbook style comments to ethernet support.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several symbols only used by rtnetlink and since it can
not be a module, there is no reason to export them.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes various unvalidated netlink attributes causing memory
corruptions when left empty by userspace applications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces struct fib_config replacing the ugly struct kern_rta
prone to ordering issues. Avoids creating faked netlink messages
for auto generated routes or requests via ioctl.
A new interface net/nexthop.h is added to help navigate through
nexthop configuration arrays.
A new struct nl_info will be used to carry the necessary netlink
information to be used for notifications later on.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly.
Couldn't actually measure any performance increase while testing (.3%
I consider noise), but seems like the right thing to do.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for NLM_F_ECHO allowing applications to easly
see which address have been deleted, added, or promoted.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for NLM_F_ECHO to simplify the process of identifying
inserted rules with an auto generated priority.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds rtnl_notify() to send rtnetlink notification messages and
rtnl_set_sk_err() to report notification errors as socket
errors in order to indicate the need of a resync due to loss
of events.
nlmsg_report() is added to properly document the meaning of
NLM_F_ECHO.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds nlmsg_notify() implementing proper notification logic. The
message is multicasted to all listeners in the group. The
applications the requests orignates from can request a unicast
back report in which case said socket will be excluded from the
multicast to avoid duplicated notifications.
nlmsg_multicast() is extended to take allocation flags to
allow notification in atomic contexts.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code is wrong on so many levels, please lose it so it isn't
replicated anywhere else.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- make the following needlessly global functions static:
- dn_fib.c: dn_fib_sync_down()
- dn_fib.c: dn_fib_sync_up()
- dn_rules.c: dn_fib_rule_action()
- remove the following unneeded prototype:
- dn_fib.c: dn_cache_dump()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the following needlessly global code static:
- fib6_walker_lock
- struct fib6_walker_list
- fib6_walk_continue()
- fib6_walk()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a fix to the DECnet rules compare function where we used 32bit
values rather than 16bit values. Spotted by Patrick McHardy.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a fix to Patrick McHardy's increase number of routing tables
patch for DECnet. I did just test this and it appears to be working
fine with this patch.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some improvements to robust name interface. These API's are safe
now by convention, but it is worth providing some safety checks
against future bugs.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add code to initialize rb tree nodes, and check for double deletion.
This is not a real fix, but I can make it trap sometimes and may
be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use hlist instead of list for the hash list. This saves
space, and we can check for double delete better.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code was a mess in terms of indentation. Run through Lindent
script, and cleanup the damage. Also, don't use, vim magic
comment, and substitute inline for __inline__.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the conditional compilation around HTB_HYSTERSIS
since code was splitting mid expression.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the macro's being used to obscure the locking.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HTB network scheduler had debug code that wouldn't compile
and confused and obfuscated the code, remove it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the number of possible routing tables to 2^32 by replacing the
fixed sized array of pointers by a hash table and replacing iterations
over all possible table IDs by hash table walking.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase number of possible routing tables to 2^32 by replacing iterations
over all possible table IDs by hash table walking.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the number of possible routing tables to 2^32 by replacing the
fixed sized array of pointers by a hash table and replacing iterations
over all possible table IDs by hash table walking.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce RTA_TABLE route attribute and FRA_TABLE routing rule attribute
to hold 32 bit routing table IDs. Usespace compatibility is provided by
continuing to accept and send the rtm_table field, but because of its
limited size it can only carry the low 8 bits of the table ID. This
implies that if larger IDs are used, _all_ userspace programs using them
need to use RTA_TABLE.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation of
support for a larger number of routing tables. net/ipv6 already uses u32
everywhere and needs no further changes. No functional changes are made by
this patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
config.h is automatically included by kbuild these days.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_register() doesn't change the family, so the protocols can
define it read-only. No caller ever checks return value from
sock_unregister()
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the gross custom locking done in socket code for net_family[]
with simple RCU usage. Some reordering necessary to avoid sleep issues
with sock_alloc.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make socket.c conform to current style:
* run through Lindent
* get rid of unneeded casts
* split assignment and comparsion where possible
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per Stephen Hemminger's recent patch to ipv4/fib_semantics.c this
is the same change but for DECnet.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the DECnet rules code to use the generic
rules system created by Thomas Graf <tgraf@suug.ch>.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now most inet_lookup_* functions take a host-order hnum instead
of a network-order dport because that's how it is represented
internally.
This means that users of these functions have to be careful about
using the right byte-order. To add more confusion, inet_lookup takes
a network-order dport unlike all other functions.
So this patch changes all visible inet_lookup functions to take a
dport and move all dport->hnum conversion inside them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ther is no point in using a more expensive reader/writer lock
for a low contention lock like the fib_info_lock. The only
reader case is in handling route redirects.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The callers of fib6_rule_lookup don't expect it to return NULL,
therefore it must return ip6_null_entry whenever fib_rule_lookup fails.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
By modern standards this function is way too big to be inlined. It's
even bigger than __inet_lookup_listener :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value is_setbyuser from struct ip_options is never used and set
only one time (http://linux-net.osdl.org/index.php/TODO#IPV4).
This little patch removes it from the kernel source.
Signed-off-by: Louis Nyffenegger <louis.nyffenegger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements wrapper functions that provide a convenient way
to access the sockets API for in-kernel users like sunrpc, cifs &
ocfs2 etc and any future users.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink_rcv_msg() is not longer required to parse attributes
for the neighbour tables layer, remove dependency on obsolete and
buggy rta_buf.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves netlink neighbour bits to linux/neighbour.h. Also
moves bits to be exported to userspace from net/neighbour.h
to linux/neighbour.h and removes __KERNEL__ guards, userspace
is not supposed to be using it.
rtnetlink_rcv_msg() is not longer required to parse attributes
for the neighbour layer, remove dependency on obsolete and
buggy rta_buf.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes:
Return EAFNOSUPPORT if no table matches the specified
address family.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes:
Return ENOENT if the neighbour is not found (was EINVAL)
Return EAFNOSUPPORT if no table matches the specified
address family.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the lookup in a table returns ip6_null_entry the policy routing lookup
returns it instead of continuing in the next table, which effectively means
it only searches the local table.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_null_entry doesn't have rt6i_table set, when trying to delete it the
kernel crashes dereferencing table->tb6_lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This looks like a mistake, the table ID is overwritten again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle GSO packets in nf_queue by segmenting them before queueing to
avoid breaking GSO in case they get mangled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update hardware checksums incrementally to avoid breaking GSO.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix compile breakage caused by move of IFA_F_SECONDARY to new header
file.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transforms netlink code to dump link tables to use the new
netlink api. Makes rtnl_getlink() available regardless of the
availability of the wireless extensions.
Adding copy_rtnl_link_stats() avoids the structural dependency
of struct rtnl_link_stats on struct net_device_stats and thus
avoids troubles later on.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transforms do_setlink() into rtnl_setlink() using the new
netlink api. A warning message printed to the console is
added in the event that a change request fails while part
of the change request has been comitted already. The ioctl()
based nature of net devices makes it almost impossible to
move on to atomic netlink operations without obsoleting
some of the functionality.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes various unvalidated netlink attributes causing
memory corruptions when left empty by userspace.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds rtm_to_ifaddr() transforming a netlink message to a
struct in_ifaddr. Fixes various unvalidated netlink attributes
causing memory corruptions when left empty by userspace
applications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes a theoretical memory and locking leak when the size of
the netlink header would exceed the skb tailroom.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds:
nlmsg_get_pos() return current position in message
nlmsg_trim() trim part of message
nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr
nla_put_nohdr(skb, len, data) add attribute w/o hdr
nla_find_nested() find attribute in nested attributes
Fixes nlmsg_new() to take allocation flags and consider size.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for policy routing rules including a new
local table for routes with a local destination.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds the framework to support multiple IPv6 routing tables.
Currently all automatically generated routes are put into the
same table. This could be changed at a later point after
considering the produced locking overhead.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Ab)using rt6_lock wouldn't work anymore if rt6_lock is
converted into a per table lock.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the net/Kconfig file to enable selecting the NetLabel Kconfig
options.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NetLabel support to the SELinux LSM and modify the
socket_post_create() LSM hook to return an error code. The most
significant part of this patch is the addition of NetLabel hooks into
the following SELinux LSM hooks:
* selinux_file_permission()
* selinux_socket_sendmsg()
* selinux_socket_post_create()
* selinux_socket_sock_rcv_skb()
* selinux_socket_getpeersec_stream()
* selinux_socket_getpeersec_dgram()
* selinux_sock_graft()
* selinux_inet_conn_request()
The basic reasoning behind this patch is that outgoing packets are
"NetLabel'd" by labeling their socket and the NetLabel security
attributes are checked via the additional hook in
selinux_socket_sock_rcv_skb(). NetLabel itself is only a labeling
mechanism, similar to filesystem extended attributes, it is up to the
SELinux enforcement mechanism to perform the actual access checks.
In addition to the changes outlined above this patch also includes
some changes to the extended bitmap (ebitmap) and multi-level security
(mls) code to import and export SELinux TE/MLS attributes into and out
of NetLabel.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add CIPSO/IPv4 and unlabeled packet management to the NetLabel
subsystem. The CIPSO/IPv4 changes allow the configuration of
CIPSO/IPv4 within the overall NetLabel framework. The unlabeled
packet changes allows NetLabel to pass unlabeled packets without
error.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new kernel subsystem, NetLabel, to provide explicit packet
labeling services (CIPSO, RIPSO, etc.) to LSM developers. NetLabel is
designed to work in conjunction with a LSM to intercept and decode
security labels on incoming network packets as well as ensure that
outgoing network packets are labeled according to the security
mechanism employed by the LSM. The NetLabel subsystem is configured
through a Generic NETLINK interface described in the header files
included in this patch.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the Commercial IP Security Option (CIPSO) to the IPv4
network stack. CIPSO has become a de-facto standard for
trusted/labeled networking amongst existing Trusted Operating Systems
such as Trusted Solaris, HP-UX CMW, etc. This implementation is
designed to be used with the NetLabel subsystem to provide explicit
packet labeling to LSM developers.
The CIPSO/IPv4 packet labeling works by the LSM calling a NetLabel API
function which attaches a CIPSO label (IPv4 option) to a given socket;
this in turn attaches the CIPSO label to every packet leaving the
socket without any extra processing on the outbound side. On the
inbound side the individual packet's sk_buff is examined through a
call to a NetLabel API function to determine if a CIPSO/IPv4 label is
present and if so the security attributes of the CIPSO label are
returned to the caller of the NetLabel API function.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes to the core network stack to support the NetLabel subsystem. This
includes changes to the IPv4 option handling to support CIPSO labels.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This automatically labels the TCP, Unix stream, and dccp child sockets
as well as openreqs to be at the same MLS level as the peer. This will
result in the selection of appropriately labeled IPSec Security
Associations.
This also uses the sock's sid (as opposed to the isec sid) in SELinux
enforcement of secmark in rcv_skb and postroute_last hooks.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defaults the label of socket-specific IPSec policies to be the
same as the socket they are set on.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This labels the flows that could utilize IPSec xfrms at the points the
flows are defined so that IPSec policy and SAs at the right label can
be used.
The following protos are currently not handled, but they should
continue to be able to use single-labeled IPSec like they currently
do.
ipmr
ip_gre
ipip
igmp
sit
sctp
ip6_tunnel (IPv6 over IPv6 tunnel device)
decnet
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes the security context of a security association created
for use by IKE in the acquire messages sent to IKE daemons using
PF_KEY. This would allow the daemons to include the security context
in the negotiation, so that the resultant association is unique to
that security context.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes the security context of a security association created
for use by IKE in the acquire messages sent to IKE daemons using
netlink/xfrm_user. This would allow the daemons to include the
security context in the negotiation, so that the resultant association
is unique to that security context.
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a seemless mechanism for xfrm policy selection and
state matching based on the flow sid. This also includes the necessary
SELinux enforcement pieces.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds security for IP sockets at the sock level. Security at the
sock level is needed to enforce the SELinux security policy for
security associations even when a sock is orphaned (such as in the TCP
LAST_ACK state).
This will also be used to enforce SELinux controls over data arriving
at or leaving a child socket while it's still waiting to be accepted.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts SCTP to use the new HMAC template and hash interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts IPsec to use the new HMAC template. The names of
existing simple digest algorithms may still be used to refer to their
HMAC composites.
The same structure can be used by other MACs such as AES-XCBC-MAC.
This patch also switches from the digest interface to hash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts all remaining users to use the new block cipher type
where applicable. It also changes all simple cipher operations to use
the new encrypt_one/decrypt_one interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts IPSec/ESP to use the new block cipher type where
applicable. Similar to the HMAC conversion, existing algorithm names
have been kept for compatibility.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a compatibility name field for each IPsec algorithm. This
is needed when parameterised algorithms are used. For example, "md5" will
become "hmac(md5)", and "aes" will become "cbc(aes)".
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The header file linux/crypto.h is only needed by a few files so including
it in net/xfrm.h (which is included by half of the networking stack) is a
waste. This patch moves it out of net/xfrm.h and into the specific header
files that actually need it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
fib_trie.c::check_leaf() passes host-endian where fib_semantic_match()
expects (and stores into) net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing aliases for ipt_quota and ip6t_quota to make autoload
work.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In clip_mkip(), skb->dev is dereferenced after clip_push(),
which frees up skb.
Advisory: AD_LAB-06009 (<adlab@venustech.com.cn>).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.
On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
> (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
>
> but task is already holding lock:
> (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
>
> which lock already depends on the new lock.
This turns out to be a genuine bug. The queue lock and xmit lock are
intentionally taken out of order. Two things are supposed to prevent
dead-locks from occuring:
1) When we hold the queue_lock we're supposed to only do try_lock on the
tx_lock.
2) We always drop the queue_lock after taking the tx_lock and before doing
anything else.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&dev->_xmit_lock){-+..}:
> [<c012e7b6>] lock_acquire+0x76/0xa0
> [<c0336241>] _spin_lock_bh+0x31/0x40
> [<c02d25a9>] dev_activate+0x69/0x120
This path obviously breaks assumption 1) and therefore can lead to ABBA
dead-locks.
I've looked at the history and there seems to be no reason for the lock
to be held at all in dev_watchdog_up. The lock appeared in day one and
even there it was unnecessary. In fact, people added __dev_watchdog_up
precisely in order to get around the tx lock there.
The function dev_watchdog_up is already serialised by rtnl_lock since
its only caller dev_activate is always called under it.
So here is a simple patch to remove the tx lock from dev_watchdog_up.
In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and
replace it with dev_watchdog_up.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Non-linear skbs are truncated to their linear part with mmaped IO.
Fix by using skb_copy_bits instead of memcpy.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>