Make TSO segment transmit size decisions at send time not earlier.
The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.
This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.
Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.
A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue. This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.
However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.
Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.
As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_write_xmit() uses tcp_current_mss(), but some of it's callers,
namely __tcp_push_pending_frames(), already has this value available
already.
While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it. tcp_write_xmit() does the call for us,
so the call here can simply be removed.
Also, tcp_write_xmit() can be marked static.
Signed-off-by: David S. Miller <davem@davemloft.net>
It reimplements portions of tcp_snd_check(), so it
we move it to tcp_output.c we can consolidate it's
logic much easier in a later change.
Signed-off-by: David S. Miller <davem@davemloft.net>
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.
Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al. We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.
Signed-off-by: David S. Miller <davem@davemloft.net>
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set. It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.
When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.
Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration. That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.
Signed-off-by: David S. Miller <davem@davemloft.net>
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.
This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.
So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom(). This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.
Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.
Signed-off-by: David S. Miller <davem@davemloft.net>
I suspect "#define __ARGS(x) ()" was deprecated before I was born.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds qdisc_alloc() to share code between qdisc_create()
and qdisc_create_dflt(). Hides the qdisc alignment behind
macros and makes use of them.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce local_df to a bit field and ip_summed to a 2 bits
field thus saving 13 bits. Move bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both, 32bit and
64bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the code to load packet data into a register:
k = fentry->k;
if (k < 0) {
...
} else {
u32 _tmp, *p;
p = skb_header_pointer(skb, k, 4, &_tmp);
if (p != NULL) {
A = ntohl(*p);
continue;
}
}
skb_header_pointer checks if the requested data is within the
linear area:
int hlen = skb_headlen(skb);
if (offset + len <= hlen)
return skb->data + offset;
When offset is within [INT_MAX-len+1..INT_MAX] the addition will
result in a negative number which is <= hlen.
I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.
This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.
Thanks to Thomas Vgtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __attribute__((packed)) to ensure that the stat64 structure is
correctly laid out no matter which ABI the kernel is compiled for.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
From: Rob Punkunus <rpunkunus@nvidia.com>
Rob Punkunus recently submitted a patch to enable support for MCP51/MCP55 in
the amd74xx driver. This patch was whitespace-corrupted and didn't apply to
2.6.12 since MCP51 support was merged in the 2.6.12-rc series.
Gentoo would like to support this hardware for our upcoming release media, so
I fixed the patch, and here it is :)
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Patch from Todd Poynor
Add definition of K0DB4 SDCLK<0,3> divide-by-4 control/status bit in the
MDREFR register for Intel XScale PXA27x.
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Todd Poynor
Add support for PXA27x Standby mode, a low-power mode that retains CPU
and some peripheral state (the existing "sleep" mode is a power-power
mode that retains less state). Activated via:
echo -n standby > /sys/power/state
From: David Burrage and Todd Poynor
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Removed dead code in arch/xtensa/kernel/pci.c and use the pci_name() macro.
Fixed an error in the delay asm macro: '1' is an invalid immediate value.
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed this because I was doing some more ipc cleanups and I did the
original errno and ipc cleanups for other architectures, so it stuck out.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 4866cde064 requires finish_arch_switch
to have only one parameter instead of two.
Also fix another compile error (double declaration of account_system_vtime)
if CONFIG_VIRT_CPU_ACCOUNTING is not defined.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we receive an unrecognised abort during boot, don't try to
send a signal to pid0, but instead report the current state.
This leads to less confusing debug reports.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
No one was looking at the return value of bus_rescan_devices, and it
really wasn't anything that anyone in the kernel would ever care about.
So change it which enabled some counting code to be removed also.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add bus_find_device() and driver_find_device() which allow searching for a
device in the bus's resp. the driver's klist and obtain a reference on it.
Signed-off-by: Cornelia Huck <cohuck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The hvlpevent_queue (formally ItLpQueue) has a member called xInUseWord
which is used for serialising access to the queue. Because it's a word
(ie. 32 bit) there's a custom 32-bit version of test_and_set_bit() or
thereabouts in ItLpQueue.c.
The xInUseWord is not shared with they hypervisor, so we can replace it
with a spinlock and remove the custom code.
There is also another locking mechanism (ItLpQueueInProcess). This is
redundant because it's only manipulated while the lock's held. Remove it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently there's a per-cpu count of lpevents processed, a per-queue (ie.
global) total count, and a count by event type.
Replace all that with a count by event for each cpu. We only need to add
it up int the proc code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we count the number of lpevents processed in 3 seperate places.
One of these counters is never read, so just remove it. This means
hvlpevent_queue_process() no longer needs to return the number of events
processed.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we've renamed the xItLpQueue structure, rename the functions that
operate on it also.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The xItLpQueue is a queue of HvLpEvents that we're given by the Hypervisor.
Rename xItLpQueue to hvlpevent_queue and make the type struct hvlpevent_queue.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
External parties don't need to use ItLpQueue_getNextLpEvent() or
ItLpQueue_clearValid(), they're internal to ItLpQueue.c
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The xItLpQueue is initalised manually in iSeries_setup_arch(). Move
this code into ItLpQueue.c for a cleaner separation.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because there's only one ItLpQueue and we know where it is, ie. xItLpQueue,
there's no point passing pointers to it it around all over the place.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The iSeries code keeps a pointer to the ItLpQueue in its paca struct. But
all these pointers end up pointing to the one place, ie. xItLpQueue.
So remove the pointer from the paca struct and just refer to xItLpQueue
directly where needed.
The only complication is that the spread_lpevents logic was implemented by
having a NULL lpqueue pointer in the paca on CPUs that weren't supposed to
process events. Instead we just compare the spread_lpevents value to the
processor id to get the same behaviour.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove legacy ISA serial ports for Accent, Boca, Fourport, Hub6 and MCA
from the architecture specific serial.h include.
The only ports which remain in asm-*/serial.h are the platform specific
entries. These should really be converted by platform maintainers to
use a platform device, such as can be found in
arch/arm/mach-footbridge/isa.c
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
With DEBUG enabled, head.S includes arch/debug-macro.S. On the PXA, this
contains references to the macro io_p2v() so hardware.h needs to be
included.
Signed-off-by: Richard Purdie <rpurdie@openedhand.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert ARMs timer implementations to use readl/writel instead of accessing
the registers via a struct.
People have recently asked if accessing timers via a structure is the
"right way" and its not the Linux way. So fix this code to conform to
"The Linux Way"(tm).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>