The linearisation operation doesn't need to be super-optimised. So we can
replace __skb_linearize with __pskb_pull_tail which does the same thing but
is more general.
Also, most users of skb_linearize end up testing whether the skb is linear
or not so it helps to make skb_linearize do just that.
Some callers of skb_linearize also use it to copy cloned data, so it's
useful to have a new function skb_linearize_cow to copy the data if it's
either non-linear or cloned.
Last but not least, I've removed the gfp argument since nobody uses it
anymore. If it's ever needed we can easily add it back.
Misc bugs fixed by this patch:
* via-velocity error handling (also, no SG => no frags)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
hashlimit does:
if (!ht->rnd)
get_random_bytes(&ht->rnd, 4);
ignoring that 0 is also a valid random number.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
create_proc_entry must not be called with locks held. Use a mutex
instead to protect data only changed in user context.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a secmark field to IP and NF conntracks, so that security markings
on packets can be copied to their associated connections, and also
copied back to packets as required. This is similar to the network
mark field currently used with conntrack, although it is intended for
enforcement of security policy rather than network policy.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a secmark field to the skbuff structure, to allow security subsystems to
place security markings on network packets. This is similar to the nfmark
field, except is intended for implementing security policy, rather than than
networking policy.
This patch was already acked in principle by Dave Miller.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is typed wrong, and it's only assigned and used once.
So just pass in iph->daddr directly which fixes both problems.
Based upon a patch by Alexey Dobriyan.
Signed-off-by: David S. Miller <davem@davemloft.net>
All users pass 32-bit values as addresses and internally they're
compared with 32-bit entities. So, change "laddr" and "raddr" types to
__be32.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users except two expect 32-bit big-endian value. One is of
->multiaddr = ->multiaddr
variety. And last one is "%08lX".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The suseconds_t et al. are not necessarily any particular type on
every platform, so cast to unsigned long so that we can use one printf
format string and avoid warnings across the board
Signed-off-by: David S. Miller <davem@davemloft.net>
Implementation of RFC3742 limited slow start. Added as part
of the TCP highspeed congestion control module.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new module for tracking TCP state variables non-intrusively
using kprobes. It has a simple /proc interface that outputs one line
for each packet received. A sample usage is to collect congestion
window and ssthresh over time graphs.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many of the TCP congestion methods all just use ssthresh
as the minimum congestion window on decrease. Rather than
duplicating the code, just have that be the default if that
handle in the ops structure is not set.
Minor behaviour change to TCP compound. It probably wants
to use this (ssthresh) as lower bound, rather than ssthresh/2
because the latter causes undershoot on loss.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original code did a 64 bit divide directly, which won't work on
32 bit platforms. Rather than doing a 64 bit square root twice,
just implement a 4th root function in one pass using Newton's method.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP Compound is a sender-side only change to TCP that uses
a mixed Reno/Vegas approach to calculate the cwnd.
For further details look here:
ftp://ftp.research.microsoft.com/pub/tr/TR-2005-86.pdf
Signed-off-by: Angelo P. Castellani <angelo.castellani@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP Veno module is a new congestion control module to improve TCP
performance over wireless networks. The key innovation in TCP Veno is
the enhancement of TCP Reno/Sack congestion control algorithm by using
the estimated state of a connection based on TCP Vegas. This scheme
significantly reduces "blind" reduction of TCP window regardless of
the cause of packet loss.
This work is based on the research paper "TCP Veno: TCP Enhancement
for Transmission over Wireless Access Networks." C. P. Fu, S. C. Liew,
IEEE Journal on Selected Areas in Communication, Feb. 2003.
Original paper and many latest research works on veno:
http://www.ntu.edu.sg/home/ascpfu/veno/veno.html
Signed-off-by: Bin Zhou <zhou0022@ntu.edu.sg>
Cheng Peng Fu <ascpfu@ntu.edu.sg>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP Low Priority is a distributed algorithm whose goal is to utilize only
the excess network bandwidth as compared to the ``fair share`` of
bandwidth as targeted by TCP. Available from:
http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf
Original Author:
Aleksandar Kuzmanovic <akuzma@northwestern.edu>
See http://www-ece.rice.edu/networks/TCP-LP/ for their implementation.
As of 2.6.13, Linux supports pluggable congestion control algorithms.
Due to the limitation of the API, we take the following changes from
the original TCP-LP implementation:
o We use newReno in most core CA handling. Only add some checking
within cong_avoid.
o Error correcting in remote HZ, therefore remote HZ will be keeped
on checking and updating.
o Handling calculation of One-Way-Delay (OWD) within rtt_sample, sicne
OWD have a similar meaning as RTT. Also correct the buggy formular.
o Handle reaction for Early Congestion Indication (ECI) within
pkts_acked, as mentioned within pseudo code.
o OWD is handled in relative format, where local time stamp will in
tcp_time_stamp format.
Port from 2.4.19 to 2.6.16 as module by:
Wong Hoi Sing Edison <hswong3i@gmail.com>
Hung Hing Lun <hlhung3i@gmail.com>
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRE keys are 16-bit wide.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SIP connection tracking helper. Originally written by
Christian Hentschel <chentschel@arnet.com.ar>, some cleanup, minor
fixes and bidirectional SIP support added by myself.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call Forwarding doesn't need to create an expectation if both peers can
reach each other without our help. The internal_net_addr parameter
lets the user explicitly specify a single network where this is true,
but is not very flexible and even fails in the common case that calls
will both be forwarded to outside parties and inside parties. Use an
optional heuristic based on routing instead, the assumption is that
if bpth the outgoing device and the gateway are equal, both peers can
reach each other directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a port number within a packet is replaced by a differently sized
number only the packet is resized, but not the copy of the data.
Following port numbers are rewritten based on their offsets within
the copy, leading to packet corruption.
Convert the amanda helper to the textsearch infrastructure to avoid
the copy entirely.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of skipping search entries for the wrong direction simply index
them by direction.
Based on patch by Pablo Neira <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using the ID to find out where to continue dumping, take a
reference to the last entry dumped and try to continue there.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current configuration only allows to configure one manip and overloads
conntrack status flags with netlink semantic.
Signed-off-by: Patrick Mchardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag in a connection status to have a non updated timeout.
This permits to have connection that automatically die at a given
time.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
None of the existing helpers expects to get called for related ICMP
packets and some even drop them if they can't parse them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the unmaintainable ipt_recent match by a rewritten version that
should be fully compatible.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have xfrm_mode objects we can move the transport mode specific
input decapsulation code into xfrm_mode_transport. This removes duplicate
code as well as unnecessary header movement in case of tunnel mode SAs
since we will discard the original IP header immediately.
This also fixes a minor bug for transport-mode ESP where the IP payload
length is set to the correct value minus the header length (with extension
headers for IPv6).
Of course the other neat thing is that we no longer have to allocate
temporary buffers to hold the IP headers for ESP and IPComp.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the structure xfrm_mode. It is meant to represent
the operations carried out by transport/tunnel modes.
By doing this we allow additional encapsulation modes to be added
without clogging up the xfrm_input/xfrm_output paths.
Candidate modes include 4-to-6 tunnel mode, 6-to-4 tunnel mode, and
BEET modes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of locks used to manage afinfo structures can easily be reduced
down to one each for policy and state respectively. This is based on the
observation that the write locks are only held by module insertion/removal
which are very rare events so there is no need to further differentiate
between the insertion of modules like ipv6 versus esp6.
The removal of the read locks in xfrm4_policy.c/xfrm6_policy.c might look
suspicious at first. However, after you realise that nobody ever takes
the corresponding write lock you'll feel better :)
As far as I can gather it's an attempt to guard against the removal of
the corresponding modules. Since neither module can be unloaded at all
we can leave it to whoever fixes up IPv6 unloading :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only want to take receive RTT mesaurements for data
bearing frames, here in the header prediction fast path
for a pure-sender, we know that we have a pure-ACK and
thus the checks in tcp_rcv_rtt_mesaure_ts() will not pass.
Signed-off-by: David S. Miller <davem@davemloft.net>
Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any socket recv of less than this ammount will not be offloaded
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an extra argument to sk_eat_skb, and make it move early copied
packets to the async_wait_queue instead of freeing them.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Aki M Nyrhinen <anyrhine@cs.helsinki.fi>
IMHO the current fix to the problem (in_flight underflow in reno)
is incorrect. it treats the symptons but ignores the problem. the
problem is timing out packets other than the head packet when we
don't have sack. i try to explain (sorry if explaining the obvious).
with sack, scanning the retransmit queue for timed out packets is
fine because we know which packets in our retransmit queue have been
acked by the receiver.
without sack, we know only how many packets in our retransmit queue the
receiver has acknowledged, but no idea which packets.
think of a "typical" slow-start overshoot case, where for example
every third packet in a window get lost because a router buffer gets
full.
with sack, we check for timeouts on those every third packet (as the
rest have been sacked). the packet counting works out and if there
is no reordering, we'll retransmit exactly the packets that were
lost.
without sack, however, we check for timeout on every packet and end up
retransmitting consecutive packets in the retransmit queue. in our
slow-start example, 2/3 of those retransmissions are unnecessary. these
unnecessary retransmissions eat the congestion window and evetually
prevent fast recovery from continuing, if enough packets were lost.
Signed-off-by: David S. Miller <davem@davemloft.net>
Trimming the head of an skb by calling skb_pull can cause the packet
to become unaligned if the length pulled is odd. Since the length is
entirely arbitrary for a FIN packet carrying data, this is actually
quite common.
Unaligned data is not the end of the world, but we should avoid it if
it's easily done. In this case it is trivial. Since we're discarding
all of the head data it doesn't matter whether we move skb->data forward
or back.
However, it is still possible to have unaligned skb->data in general.
So network drivers should be prepared to handle it instead of crashing.
This patch also adds an unlikely marking on len < headlen since partial
ACKs on head data are extremely rare in the wild. As the return value
of __pskb_trim_head is no longer ever NULL that has been removed.
Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When snd_cwnd is smaller than 38 and the connection is in
congestion avoidance phase (snd_cwnd > snd_ssthresh), the snd_cwnd
seems to stop growing.
The additive increase was confused because C array's are 0 based.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It appears that sockaddr_in.sin_zero is not zeroed during
getsockopt(...SO_ORIGINAL_DST...) operation. This can lead
to an information leak (CVE-2006-1343).
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kmalloc fails, error path leaks data allocated from asn1_oid_decode().
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When parsing unknown sequence extensions the "son"-pointer points behind
the last known extension for this type, don't try to interpret it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The condition "> H323_ERROR_STOP" can never be true since H323_ERROR_STOP
is positive and is the highest possible return code, while real errors are
negative, fix the checks. Also only abort on real errors in some spots
that were just interpreting any return value != 0 as error.
Fixes crashes caused by use of stale data after a parsing error occured:
BUG: unable to handle kernel paging request at virtual address bfffffff
printing eip:
c01aa0f8
*pde = 1a801067
*pte = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: ip_nat_h323 ip_conntrack_h323 nfsd exportfs sch_sfq sch_red cls_fw sch_hfsc xt_length ipt_owner xt_MARK iptable_mangle nfs lockd sunrpc pppoe pppoxx
CPU: 0
EIP: 0060:[<c01aa0f8>] Not tainted VLI
EFLAGS: 00210646 (2.6.17-rc4 #8)
EIP is at memmove+0x19/0x22
eax: d77264e9 ebx: d77264e9 ecx: e88d9b17 edx: d77264e9
esi: bfffffff edi: bfffffff ebp: de6a7680 esp: c0349db8
ds: 007b es: 007b ss: 0068
Process asterisk (pid: 3765, threadinfo=c0349000 task=da068540)
Stack: <0>00000006 c0349e5e d77264e3 e09a2b4e e09a38a0 d7726052 d7726124 00000491
00000006 00000006 00000006 00000491 de6a7680 d772601e d7726032 c0349f74
e09a2dc2 00000006 c0349e5e 00000006 00000000 d76dda28 00000491 c0349f74
Call Trace:
[<e09a2b4e>] mangle_contents+0x62/0xfe [ip_nat]
[<e09a2dc2>] ip_nat_mangle_tcp_packet+0xa1/0x191 [ip_nat]
[<e0a2712d>] set_addr+0x74/0x14c [ip_nat_h323]
[<e0ad531e>] process_setup+0x11b/0x29e [ip_conntrack_h323]
[<e0ad534f>] process_setup+0x14c/0x29e [ip_conntrack_h323]
[<e0ad57bd>] process_q931+0x3c/0x142 [ip_conntrack_h323]
[<e0ad5dff>] q931_help+0xe0/0x144 [ip_conntrack_h323]
...
Found by the PROTOS c07-h2250v4 testsuite.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>