The following commandline:
root=/dev/mtdblock6 rw rootfstype=jffs2 ip=192.168.1.10:::255.255.255.0:localhost.localdomain:eth1:off console=ttyS0,115200
makes ip_auto_config fall back to DHCP and complain "IP-Config: Incomplete
network configuration information." depending on if CONFIG_IP_PNP_DHCP is
set or not.
The only way I can make ip_auto_config accept my IP config is to add an
entry for the server IP:
ip=192.168.1.10:192.168.1.15::255.255.255.0:localhost.localdomain:eth1:off
I think this is a bug since I am not using a NFS root FS.
The following patch fixes the above problem.
From: Andrew Morton <akpm@linux-foundation.org>
Davem said (in February!):
Well, first of all the change in question is not in 2.4.x either. I just
checked the current 2.4.x GIT tree and the test is exactly:
if (ic_myaddr == INADDR_NONE ||
#ifdef CONFIG_ROOT_NFS
(MAJOR(ROOT_DEV) == UNNAMED_MAJOR
&& root_server_addr == INADDR_NONE
&& ic_servaddr == INADDR_NONE) ||
#endif
ic_first_dev->next) {
which matches 2.6.x
I even checked 2.4.x when it was branched for 2.5.x and the test was the
same at the point in time too.
Looking at the proposed change a bit it appears that it is probably
correct, as it's trying to check that ROOT_DEV is nfs root. But if it is
correct then the UNNAMED_MAJOR comparison in the same code block should be
removed as it becomes superfluous.
I'm happy to apply this patch with that modification made.
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small patch to H-TCP from Douglas Leith.
Fix estimation of maxRTT. The original code ignores rtt measurements
during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this
is probably a good time to try to estimate max rtt as delayed acking
is disabled and slow start will only exit on a loss which presumably
corresponds to a maxrtt measurement. Second, the original code (via
the check htcp_ccount(ca) > 3) ignores rtt data during what it
estimates to be the first 3 round-trip times. This seems like an
unnecessary check now that the RCV timestamp are no longer used
for rtt estimation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as
well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the call to seq_open() returns != 0 then the code calls
kfree(st) but then on the very next line proceeds to
dereference the pointer - not good.
Problem spotted by the Coverity checker.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a DSACK is received, it's better to lower cwnd as it's
a sign of data receival.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_cwnd_down must check for it too as it should be conservative
in case of collapse stuff and also when receiver is trying to
lie (though that wouldn't be very successful/useful anyway).
Note:
- Separated also is_dupack and do_lost in fast_retransalert
* Much cleaner look-and-feel now
* This time it really fixes cumulative ACK with many new
SACK blocks recovery entry (I claimed this fixes with
last patch but it wasn't). TCP will now call
tcp_update_scoreboard regardless of is_dupack when
in recovery as long as there is enough fackets_out.
- Introduce FLAG_SND_UNA_ADVANCED
* Some prior_snd_una arguments are unnecessary after it
- Added helper FLAG_ANY_PROGRESS to avoid long FLAG...|FLAG...
constructs
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.
The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP. Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.
TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).
I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in a bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.
After this patch, rate halving is performed for ACK only if
packets in flight was supposedly changed too.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that netdev notifications can fail, we can use this to signal
errors during registration for IPv4/IPv6. In particular, if we
fail to allocate memory for the inet device, we can fail the netdev
registration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a path that forwards packets, IPVS should be using
skb_forward_csum instead of directly setting ip_summed
to CHECKSUM_NONE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change HTCP to use measured RTT rather than smooth RTT.
Srtt is computed using the TCP receive timestamp
options, so it is vulnerable to hostile receivers. To avoid any problems
this might cause use the measured RTT instead.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove use of received timestamp option value from RTT calculation in Cubic.
A hostile receiver may be returning a larger timestamp option than the original
value. This would cause the sender to believe the malevolent receiver had
a larger RTT and because Cubic tries to provide some RTT friendliness, the
sender would then favor the liar.
Instead, use the jiffie resolutionRTT value already computed and
passed back after ack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the API for the callback that is done after an ACK is
received. It solves a couple of issues:
* Some congestion controls want higher resolution value of RTT
(controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but
all compute a RTT in microseconds.
* Other congestion control could use RTT at jiffies resolution.
To keep API consistent the units should be the same for both cases, just the
resolution should change.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
no real bugs, just misannotations cropping up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Loading one of the LOG target fails if a different target has already
registered itself as backend for the same family. This can affect the
ipt_LOG and ipt_ULOG modules when both are loaded.
Reported and tested by: <t.artem@mailcity.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC net/ipv4/inetpeer.o
net/ipv4/inetpeer.c: In function 'unlink_from_pool':
net/ipv4/inetpeer.c:297: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c:297: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c: In function 'inet_getpeer':
net/ipv4/inetpeer.c:409: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c:409: warning: the address of 'stack' will always evaluate as 'true'
"Fix" by checking for != NULL.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Removal of rtt argument in ->cong_avoid() had missed tcp_htcp.c
instance.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).
Here is a short excerpt of the semantic patch performing
this transformation:
@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@
x =
- kmalloc
+ kzalloc
(E1,E2)
... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);
@@
expression E1,E2,E3;
@@
- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)
[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything
useful - so remove it ..
I've left a do-nothing version so that out-of-tree jprobes code will still
compile without modifications.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For yet unknown reason, something cleared SACKED_RETRANS bit
underneath FRTO.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in the times of Linux 2.2, negative values for the creat parameter
of __neigh_lookup() had a particular meaning, but no longer, so we
should pass 1 instead.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lower ip6tables, arptables and ebtables printk severity similar to
Dan Aloni's patch for iptables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.
To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.
Spotted by Jordan Russell.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global __inet_twsk_kill() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sangtae noticed the ssthresh got missed.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits)
ioatdma: add the unisys "i/oat" pci vendor/device id
ARM: Add drivers/dma to arch/arm/Kconfig
iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
iop13xx: surface the iop13xx adma units to the iop-adma driver
dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
md: remove raid5 compute_block and compute_parity5
md: handle_stripe5 - request io processing in raid5_run_ops
md: handle_stripe5 - add request/completion logic for async expand ops
md: handle_stripe5 - add request/completion logic for async read ops
md: handle_stripe5 - add request/completion logic for async check ops
md: handle_stripe5 - add request/completion logic for async compute ops
md: handle_stripe5 - add request/completion logic for async write ops
md: common infrastructure for running operations with raid5_run_ops
md: raid5_run_ops - run stripe operations outside sh->lock
raid5: replace custom debug PRINTKs with standard pr_debug
raid5: refactor handle_stripe5 and handle_stripe6 (v3)
async_tx: add the async_tx api
xor: make 'xor_blocks' a library routine for use with async_tx
dmaengine: make clients responsible for managing channels
dmaengine: refactor dmaengine around dma_async_tx_descriptor
...
Switch from formatting messages in probe routine and copying with
kfifo, to using a small circular queue of information and formatting
on read. This avoids wraparound issues with kfifo, and saves one
copy.
Also make sure to state correct license, rather than copying off some
other driver I started with.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/tcp.c: In function 'tcp_recvmsg':
net/ipv4/tcp.c:1111: warning: unused variable 'available'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Leech <christopher.leech@intel.com>
The performance wins come with having the DMA copy engine doing the copies
in parallel with the context switch. If there is enough data ready on the
socket at recv time just use a regular copy.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Make all initialized struct seq_operations in net/ const
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont wrote:
> Right. By the way, shouldn't "len" rather be signed in there?
>
> unsigned int len;
>
> /* if we're overly short, let UDP handle it */
> len = skb->len - sizeof(struct udphdr);
> if (len <= 0)
> goto udp;
It should, but the < 0 case can't happen since __udp4_lib_rcv
already makes sure that we have at least a complete UDP header.
Anyways, this patch fixes it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust structure size and don't expect pointers passed in from
userspace to be valid. Also replace an enum in an ABI structure
by a fixed size type.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert from the global expectation list to the hash table.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since conntrack currently allows to use masks for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.
This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...
Consistently use nf_ct_ as prefix for exported functions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass NULL, this also doesn't seem very useful for modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.
One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register own operations.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now memory space for help and NAT are allocated by extension
infrastructure.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
DNAT of the the RTP session is only necessary if the SIP session has
been SNATed.
Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes redundant parentheses and braces (And add one pair in a
xt_tcpudp.c macro).
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
device_cmp: the function's address is taken (call to nf_ct_iterate_cleanup)
alloc_null_binding: referenced externally
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make a number of variables const and/or remove unneeded casts.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of target checkentry functions to boolean.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the "hotdrop" variables to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.
Make xfrm4_rcv_encap() static since it is no longer called externally.
Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do same adjustment to SACK fastpath counters provided that
they're valid.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.
Include padding to put the new encap_rcv field on a 4-byte boundary.
Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.
This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>
It is clean-up for XFRM type modules and adds aliases with its
protocol:
ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
ROUTING and DSTOPTS for MIPv6
It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the code for /proc/net/tcp disable BH while iterating
over the entire established hash table. Even though we call
cond_resched_softirq for each entry, we still won't process
softirq's as regularly as we would otherwise do which results
in poor performance when the system is loaded near capacity.
This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.
The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_read_sock() currently assumes that the recv_actor() only returns
number of bytes copied. For network splice receive, we may have to
return an error in some cases. So allow the actor to return a negative
error value.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_vs currently fails to reset its ip_vs_sync_state variable if the
sync thread fails to start properly. The result is that the kernel
will report a running daemon when their actuall is none.
If you issue the following commands:
1. ipvsadm --start-daemon master --mcast-interface bla
2. ipvsadm -L --daemon
3. ipvsadm --stop-daemon master
Assuming that bla is not an actual interface, step 2 should return no
data, but instead returns:
$ ipvsadm -L --daemon
master sync daemon (mcast=bla, syncid=0)
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6f74651ae6 is found guilty
of breaking DSACK counting, which should be done only for the
SACK block reported by the DSACK instead of every SACK block
that is received along with DSACK information.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 164891aadf broke RTT
sampling of congestion control modules. Inaccurate timestamps
could be fed to them without providing any way for them to
identify such cases. Previously RTT sampler was called only if
FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
timestamps nicely. In addition, the new behavior could give an
invalid timestamp (zero) to RTT sampler if only skbs with
TCPCB_RETRANS were ACKed. This solves both problems.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This flaw does not affect any behavior (currently).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the current default of 100, Cubic and BIC perform very
poorly compared to standard Reno.
In the worst case, this change makes Cubic and BIC as aggressive as
Reno. So this change should be very safe.
Signed-off-by: David S. Miller <davem@davemloft.net>
Without FRTO, the tcp_try_to_open is never called with
lost_out > 0 (see tcp_time_to_recover). However, when FRTO is
enabled, the !tp->lost condition is not used until end of FRTO
because that way TCP avoids premature entry to fast recovery
during FRTO.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 options are not very well aligned within the packet and the
format of a CIPSO option is even worse. The result is that the CIPSO
engine in the kernel does a few unaligned accesses when parsing and
validating incoming packets with CIPSO options attached which generate
error messages on certain alignment sensitive platforms. This patch
fixes this by marking these unaligned accesses with the
get_unaliagned() macro.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current NetLabel code has some redundant APIs which allow both
"struct socket" and "struct sock" types to be used; this may have made
sense at some point but it is wasteful now. Remove the functions that
operate on sockets and convert the callers. Not only does this make
the code smaller and more consistent but it pushes the locking burden
up to the caller which can be more intelligent about the locks. Also,
perform the same conversion (socket to sock) on the SELinux/NetLabel
glue code where it make sense.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we create idev before addresses are added, it no longer makes
sense to remove them when addresses are all deleted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts changesets:
6aaf47fa48b7b5f487abde34ed91c4fc038410b4
There are still some correctness issues recently
discovered which do not have a known fix that doesn't
involve doing a full hash table scan on port bind.
So revert for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
check_compat_entry_size_and_hooks iterates over the matches and calls
compat_check_calc_match, which loads the match and calculates the
compat offsets, but unlike the non-compat version, doesn't call
->checkentry yet. On error however it calls cleanup_matches, which in
turn calls ->destroy, which can result in crashes if the destroy
function (validly) expects to only get called after the checkentry
function.
Add a compat_release_match function that only drops the module reference
on error and rename compat_check_calc_match to compat_find_calc_match to
reflect the fact that it doesn't call the checkentry function.
Reported by Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a helper module is unloaded all conntracks refering to it have their
helper pointer NULLed out, leading to lots of races. In most places this
can be fixed by proper use of RCU (they do already check for != NULL,
but in a racy way), additionally nf_conntrack_expect_related needs to
bail out when no helper is present.
Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy
and not worth fixing.
Signed-off-by: Patrick McHarrdy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC doesn't like the way Stephen initially did it:
net/ipv4/tcp_probe.c:83: warning: empty declaration
Signed-off-by: David S. Miller <davem@davemloft.net>
LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0
>/proc/sys/net/core/warnings).
The "TCP: Treason uncloaked!" message can use this facility.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously inet devices were only constructed when addresses are added
(or rarely in ipmr). Therefore the default config values they get are
the ones at the time of these operations.
Now that we're creating inet devices earlier, this changes the
behaviour of default config values in an incompatible way (see bug
#8519).
This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously once inetdev_init has been called on a device any changes
made to ipv4_devconf_dflt would have no effect on that device's
configuration.
This creates a problem since we have moved the point where
inetdev_init is called from when an address is added to where the
device is registered.
This patch is the first half of a set that tries to mimic the old
behaviour while still calling inetdev_init.
It propagates any changes to ipv4_devconf_dflt to those devices that
have not had the corresponding attribute set.
The next patch will forcibly set all values at the point where
inetdev_init was previously called.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the ipv4_devconf config members (everything except
sysctl) to an array. This allows easier manipulation which will be
needed later on to provide better management of default config values.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>