Use get_online_cpus() to prevent cpu hotplug in situations where
for_each_online_cpu() is called.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes the last remaining section mismatch warnings in s390
architecture code. It reveals also a real bug introduced by... me
with git commit 2069e978d5
("[S390] sparsemem vmemmap: initialize memmap.")
Calling the generic vmemmap_alloc_block() function to get initialized
memory is a nice idea, however that function is __meminit annotated
and therefore the function might be gone if we try to call it later.
This can happen if a DCSS segment gets added.
So basically revert the patch and clear the memmap explicitly to fix
the original bug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
virtio tests with guests larger than 4 GB revealed that the dma_addr_t
definition for s390 did not make it into the 64bit world.
This patch changes the definition on s390 to have an u64 on 64bit and
u32 on 32bit systems.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Due to incorrect function call sequence it can happen that a tape block
request is finished before the request is taken from the block request queue.
The following sequence leads to that condition:
* tapeblock_start_request() -> start CCW program
* Request finishes -> IO interrupt
* tapeblock_end_request()
* end_that_request_last()
If blkdev_dequeue_request() has not been called before end_that_request_last(),
a kernel bug is triggered in end_that_request_last() because the request is
still queued. To solve that problem blkdev_dequeue_request() has to be called
before starting the CCW program.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 make allnoconfig fails with the following build error:
arch/s390/mm/init.c: In function 'show_mem':
arch/s390/mm/init.c:55: error: implicit declaration of function 'pfn_valid'
make[1]: *** [arch/s390/mm/init.o] Error 1
make: *** [arch/s390/mm] Error 2
This problem can by fixed ensuring that ARCH_SELECT_MEMORY_MODEL
is always turned on.
Signed-off-by: Hans-Joachim Picht <hans@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is the lguest implementation of the VIRTIO_F_NOTIFY_ON_EMPTY feature.
It is currently only published for network devices, but it is turned on for
everyone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio allows drivers to suppress callbacks (ie. interrupts) for
efficiency (no locking, it's just an optimization).
There's a similar mechanism for the host to suppress notifications
coming from the guest: in that case, we ignore the suppression if the
ring is completely full.
It turns out that life is simpler if the host similarly ignores
callback suppression when the ring is completely empty: the network
driver wants to free up old packets in a timely manner, and otherwise
has to use a timer to poll.
We have to remove the code which ignores interrupts when the driver
has disabled them (again, it had no locking and hence was unreliable
anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since commit 72e61eb40b (virtio: change config
to guest endian) config space is no longer fixed endian.
Lets change the virtio_blk_config variables.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty,
This patch is a prereq for the virtio_blk blocksize patch, please apply it
first.
Adding an u32 value to the virtio_blk_config unconvered a small bug the config
space defintions:
v is a pointer, to we have to use sizeof(*v) instead of sizeof(v).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Hello Rusty,
seems that we still have a problem with virtio_net and the enable_cb callback.
During a long running network stress tests with virtio and got the following
oops:
------------[ cut here ]------------
kernel BUG at drivers/virtio/virtio_ring.c:230!
illegal operation: 0001 [#1] SMP
Modules linked in:
CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990)
Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
Krnl Code: 00000000002d0ebc: a7110001 tmll %r1,1
00000000002d0ec0: a7740004 brc 7,2d0ec8
00000000002d0ec4: a7f40001 brc 15,2d0ec6
>00000000002d0ec8: a517fffe nill %r1,65534
00000000002d0ecc: 40103000 sth %r1,0(%r3)
00000000002d0ed0: 07f0 bcr 15,%r0
00000000002d0ed2: e31020380004 lg %r1,56(%r2)
00000000002d0ed8: a7480000 lhi %r4,0
Call Trace:
([<000000000029c0fc>] virtnet_poll+0x290/0x3b8)
[<0000000000333fb8>] net_rx_action+0x9c/0x1b8
[<00000000001394bc>] __do_softirq+0x74/0x108
[<000000000010d16a>] do_softirq+0x92/0xac
[<0000000000139826>] irq_exit+0x72/0xc8
[<000000000010a7b6>] do_extint+0xe2/0x104
[<0000000000110508>] ext_no_vtime+0x16/0x1a
Last Breaking-Event-Address:
[<00000000002d0ec4>] vring_enable_cb+0x18/0x60
I looked into the virtio_net code for some time and I think the following
scenario happened. Please look at virtnet_poll:
[...]
/* Out of packets? */
if (received < budget) {
netif_rx_complete(vi->dev, napi);
if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
&& napi_schedule_prep(napi)) {
vi->rvq->vq_ops->disable_cb(vi->rvq);
__netif_rx_schedule(vi->dev, napi);
goto again;
}
}
If an interrupt arrives after netif_rx_complete, a second poll routine can run
on a different cpu. The second check for napi_schedule_prep would prevent any
harm in the network stack, but we have called enable_cb possibly after the
disable_cb in skb_recv_done.
static void skb_recv_done(struct virtqueue *rvq)
{
struct virtnet_info *vi = rvq->vdev->priv;
/* Schedule NAPI, Suppress further interrupts if successful. */
if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
rvq->vq_ops->disable_cb(rvq);
__netif_rx_schedule(vi->dev, &vi->napi);
}
}
That means that the second poll routine runs with interrupts enabled, which is
ok, since we can handle additional interrupts. The problem is now that the
second poll routine might also call enable_cb, triggering the BUG.
The only solution I can come up with, is to remove the BUG statement in
enable_cb - similar to disable_cb. Opinions or better ideas where the oops
could come from?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that by itself, having a "hardware" random generator does very
little: you should probably run "rngd" in your guest to feed this into
the kernel entropy pool.
Included:
virtio_rng: dont use vmalloced addresses for virtio
If virtio_rng is build as a module, random_data is an address
in vmalloc space. As virtio expects guest real addresses, this
can cause any kind of funny behaviour, so lets allocate
random_data dynamically with kmalloc.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Hello Rusty,
sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag, that allows the host to specify, if the disk should
be considered read-only.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anthony Liguori points out that three different transports use the virtio code,
but each one keeps its own counter to set the virtio_device's index field. In
theory (though not in current practice) this means that names could be
duplicated, and that risk grows as more transports are created.
So we move the selection of the unique virtio_device.index into the common code
in virtio.c, which has the side-benefit of removing duplicate code.
The only complexity is that lguest and S/390 use the index to uniquely identify
the device in case of catastrophic failure before register_virtio_device() is
called: now we use the offset within the descriptor page as a unique identifier
for the printks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
The common virtio code sets the bus_id, overriding anything virtio_pci
sets anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Chris Lalancette <clalance@redhat.com> points out that virtio.c sets all device
names to '0', '1', etc, which looks silly in /proc/interrupts. We change this
from '%d' to 'virtio%d'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Fix a modprobe virtio_blk ; rmmod virtio_blk ; modprobe virtio_blk crash; this
was basically because we weren't doing "del_gendisk()" in the remove path.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (moved del_gendisk up)
Thanks to Jon Corbet & LWN. Only took me a day to join the dots.
Host->Guest netcat before (with unnecessily large receive buffers):
1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s
After:
1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
Revert "USB: EHCI: fix performance regression"
USB: fsl_usb2_udc: fix recursive lock
USB: usb-serial: option: Don't match Huawei driver CD images
USB: pl2303: another product ID
USB: add another scanner quirk
USB: Add support for ROKR W5 in unusual_devs.h
USB: Fix M600i unusual_devs entry
USB: usb-storage: unusual_devs update for Cypress ATACB
USB: EHCI: fix performance regression
USB: EHCI: fix bug in Iso scheduling
USB: EHCI: fix remote-wakeup regression
USB: EHCI: suppress unwanted error messages
USB: EHCI: fix up root-hub TT mess
USB: add all configs to the "descriptors" attribute
USB: fix possible deadlock involving sysfs attributes
USB: Firmware loader driver for USB Apple iSight camera
USB: FTDI_SIO : Add support for Matrix Orbital PID Range
Create the dev_set_name function now so that various subsystems can
start changing over to it before other changes in 2.6.27 will make it
compulsory.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit fa38dfcc56.
It wasn't really a regression and David and Alan are still working
through the issues reported.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add the interface info matching to all Huawei cards, as they all also
contain a Mass Storage Device interface (usually containing Windows
drivers) which should not get bound by this driver.
See also drivers/usb/storage/unusual_devs.h
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I've just got a USB GPRS/EDGE modem branded Manufacturer Micromax Model
MMX610U (see http://www.airtel.in/level2_t3data.aspx?path=1/106/179)
working by adding another product ID to pl2303. Modem info reports same
module as Max Arnold's i.e.SIMCOM SIM600 but with product ID 0x0612
(cf Ox0611).
From: Steve Murphy <steve@gnusis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Like the HP53{00,70} scanner other devices of the OEM Avision require
the USB_QUIRK_STRING_FETCH_255 to correct set a configuration with
"recent" Linux kernels.
Signed-off-by: René Rebe <rene@exactcode.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds support for rev 2 of an existing unusual_devs entry
enabling ROKR W5s to work. Greg, please apply.
From: Javier Smaldone <javier@smaldone.com.ar>
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out that the unusual_devs entry for the Motorola M600i needs
another flag. This patch adds it. Thanks to Atte André Jensen
<atte@ballbreaker.dk>.
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1101) updates the unusual_devs entry for the Cypress
ATACB pass-through. The protocol field is changed from US_PR_BULK to
US_PR_DEVICE, since the Cypress devices already set bInterfaceProtocol
to Bulk-only.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1099) fixes a performance regression in ehci-hcd. The
fundamental problem is that queue headers get removed from the
schedule too quickly, since the code checks for a counter advancing
rather than making an actual time-based check. The latency involved
in removing the queue header and then relinking it can severely
degrade certain kinds of workloads.
The patch replaces a simple counter with a timestamp derived from the
controller's uframe value. In addition, the delay for unlinking an
idle queue header is increased from 5 ms to 10 ms; since some
controllers (nVidia) have a latency of up to 1 ms for unlinking, this
reduces the relative impact from 20% to 10%.
Finally, a logical error left over from the IAA watchdog-timer
conversion is corrected. Now the driver will always either unlink an
idle queue header or set up a timer to unlink it later. The old code
would sometimes fail to do either.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: David Brownell <david-b@pacbell.net>
Cc: Leonid <leonidv11@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1098) changes the way ehci-hcd schedules its periodic
Iso transfers. That the current scheduling code is wrong is clear on
the face of it: Sometimes it returns -EL2NSYNC (meaning that an URB
couldn't be scheduled because it was submitted too late), but it does
this even when the URB_ISO_ASAP flag is set (meaning the URB should be
scheduled as soon as possible).
The new code properly implements as-soon-as-possible scheduling,
assigning the next unexpired slot as the URB's starting point. It
also is more careful about checking for Iso URB completion: It doesn't
bother to check for activity during frames that are already over,
and it allows for the possibility that some of the URB's packets may
have raced the hardware when they were submitted and so never got used
(the packet status is set to -EXDEV).
This fixes problems several people have experienced with USB video
applications.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1097) fixes a bug in the remote-wakeup handling in
ehci-hcd. The driver currently does not keep track of whether the
change-suspend feature is enabled for each port; the feature is
automatically reset the first time it is read. But recent changes to
the hub driver require that the feature be read at least twice in
order to work properly.
A bit-vector is added for storing the change-suspend feature values.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1096) fixes an annoying problem: When a full-speed or
low-speed device is plugged into an EHCI controller, it fails to
enumerate at high speed and then is handed over to the companion
controller. But usbcore logs a misleading and unwanted error message
when the high-speed enumeration fails.
The patch adds a new HCD method, port_handed_over, which asks whether
a port has been handed over to a companion controller. If it has, the
error message is suppressed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1095) cleans up the HCD glue and several of the EHCI
bus-glue files. The ehci->is_tdi_rh_tt flag is redundant, since it
means the same thing as the hcd->has_tt flag, so it is removed and the
other flag used in its place.
Some of the bus-glue files didn't get the relinquish_port method added
to their hc_driver structures. Although that routine currently
doesn't do anything for controllers with an integrated TT, in the
future it might. So the patch adds it where it is missing.
Lastly, some of the bus-glue files have erroneous entries for their
hc_driver's suspend and resume methods. These method pointers are
specific to PCI and shouldn't be used otherwise.
(The patch also includes an invisible whitespace fix.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
This patch (as1094) changes the output of the "descriptors" binary
attribute. Now it will contain the device descriptor followed by all
the configuration descriptors, not just the descriptor for the current
config.
Userspace libraries want to have access to the kernel's cached
descriptor information, so they can learn about device characteristics
without having to wake up suspended devices. So far the only user of
this attribute is the new libusb-1.0 library; thus changing its
contents shouldn't cause any problems.
This should be considered for 2.6.26, if for no other reason than to
minimize the range of releases in which the attribute contains only the
current config descriptor.
Also, it doesn't hurt that the patch removes the device locking --
which was formerly needed in order to know for certain which config was
indeed current.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a potential deadlock when the usb_generic driver is unbound
from a device. The problem is that generic_disconnect() is called
with the device lock held, and it removes a bunch of device attributes
from sysfs. If a user task happens to be running an attribute method
at the time, the removal will block until the method returns. But at
least one of the attribute methods (the store routine for power/level)
needs to acquire the device lock!
This patch (as1093) eliminates the deadlock by moving the calls to
create and remove the sysfs attributes from the usb_generic driver
into usb_new_device() and usb_disconnect(), where they can be invoked
without holding the device lock.
Besides, the other sysfs attributes are created when the device is
registered and removed when the device is unregistered. So it seems
only fitting for the extra attributes to be created and removed at the
same time.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Uninitialised Apple iSight drivers present with a distinctive USB ID.
Once firmware has been uploaded, they disconnect and reconnect with a
new ID. At this point they can be driven by the uvcvideo driver. As this
is unique to the Apple cameras and not functionality shared by any other
UVC devices, it makes sense to provide the firmware loading
functionality in a separate driver. This driver will read an isight.fw
file extracted from the Apple driver using the tools at
http://bersace03.free.fr/ift/ and upload it to the camera. It will also
handle the case where the device loses its firmware during hibernation
and must have it reloaded.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds support for the range of PIDs
that have been allocated for FTDI based devices
at Matrix Orbital.
A small number of units have been shipped early 2008
with a faulty USB Descriptor. Products that may have
this issue have been marked with the existing quirk to
work around the problem.
Signed-off-by: R. Molenkamp <rmolenkamp@matrixorbital.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the
error exit 'err_out_unregister' from different places once with the
'cpu_policy_rwsem' lock held, once with the lock released:
| if (ret)
| goto err_out_unregister;
| }
|
| policy->governor = NULL; /* to assure that the starting sequence is
| * run in cpufreq_set_policy */
|
| /* set default policy */
| ret = __cpufreq_set_policy(policy, &new_policy);
| policy->user_policy.policy = policy->policy;
| policy->user_policy.governor = policy->governor;
|
| unlock_policy_rwsem_write(cpu);
|
| if (ret) {
| dprintk("setting policy failed\n");
| goto err_out_unregister;
| }
This leads to the following error message in case of a failing
__cpufreq_set_policy() call:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
[<c01b4564>] unlock_policy_rwsem_write+0x30/0x40
but there are no more locks to release!
other info that might help us debug this:
1 lock held by swapper/1:
#0: (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130
stack backtrace:
[<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104)
[<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c)
r6:00000028 r5:c3c1ab80 r4:c01b4564
[<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c)
r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80
[<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30)
r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c
[<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40)
r4:c3c34884
[<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398)
[<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130)
[<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174)
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dave Jones <davej@redhat.com>
improve the sysbench ramp-up phase and its peak throughput on
a 16way NUMA box, by turning on WAKE_AFFINE:
tip/sched tip/sched+wake-affine
-------------------------------------------------
1: 700 830 +15.65%
2: 1465 1391 -5.28%
4: 3017 3105 +2.81%
8: 5100 6021 +15.30%
16: 10725 10745 +0.19%
32: 10135 10150 +0.16%
64: 9338 9240 -1.06%
128: 8599 8252 -4.21%
256: 8475 8144 -4.07%
-------------------------------------------------
SUM: 57558 57882 +0.56%
this change also improves lat_ctx from 6.69 usecs to 1.11 usec:
$ ./lat_ctx -s 0 2
"size=0k ovr=1.19
2 1.11
$ ./lat_ctx -s 0 2
"size=0k ovr=1.22
2 6.69
in sysbench it's an overall win with some weakness at the lots-of-clients
side. That happens because we now under-balance this workload
a bit. To counter that effect, turn on NEWIDLE:
wake-idle wake-idle+newidle
-------------------------------------------------
1: 830 834 +0.43%
2: 1391 1401 +0.65%
4: 3105 3091 -0.43%
8: 6021 6046 +0.42%
16: 10745 10736 -0.08%
32: 10150 10206 +0.55%
64: 9240 9533 +3.08%
128: 8252 8355 +1.24%
256: 8144 8384 +2.87%
-------------------------------------------------
SUM: 57882 58591 +1.21%
as a bonus this not only improves the many-clients case but
also improves the (more important) rampup phase.
sysbench is a workload that quickly breaks down if the
scheduler over-balances, so since it showed an improvement
under NEWIDLE this change is definitely good.
I tried to group recovery related fields nearby (non-CA_Open related
variables, to be more accurate) so that one to three cachelines would
not be necessary in CA_Open. These are now contiguously deployed:
struct sk_buff_head out_of_order_queue; /* 1968 80 */
/* --- cacheline 32 boundary (2048 bytes) --- */
struct tcp_sack_block duplicate_sack[1]; /* 2048 8 */
struct tcp_sack_block selective_acks[4]; /* 2056 32 */
struct tcp_sack_block recv_sack_cache[4]; /* 2088 32 */
/* --- cacheline 33 boundary (2112 bytes) was 8 bytes ago --- */
struct sk_buff * highest_sack; /* 2120 8 */
int lost_cnt_hint; /* 2128 4 */
int retransmit_cnt_hint; /* 2132 4 */
u32 lost_retrans_low; /* 2136 4 */
u8 reordering; /* 2140 1 */
u8 keepalive_probes; /* 2141 1 */
/* XXX 2 bytes hole, try to pack */
u32 prior_ssthresh; /* 2144 4 */
u32 high_seq; /* 2148 4 */
u32 retrans_stamp; /* 2152 4 */
u32 undo_marker; /* 2156 4 */
int undo_retrans; /* 2160 4 */
u32 total_retrans; /* 2164 4 */
...and they're then followed by URG slowpath & keepalive related
variables.
Head of the out_of_order_queue always needed for empty checks, if
that's empty (and TCP is in CA_Open), following ~200 bytes (in 64-bit)
shouldn't be necessary for anything. If only OFO queue exists but TCP
is in CA_Open, selective_acks (and possibly duplicate_sack) are
necessary besides the out_of_order_queue but the rest of the block
again shouldn't be (ie., the other direction had losses).
As the cacheline boundaries depend on many factors in the preceeding
stuff, trying to align considering them doesn't make too much sense.
Commented one ordering hazard.
There are number of low utilized u8/16s that could be combined get 2
bytes less in total so that the hole could be made to vanish (includes
at least ecn_flags, urg_data, urg_mode, frto_counter, nonagle).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent short-running wakers of short-running threads from overloading a single
cpu via wakeup affinity, and wire up disconnected debug option.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make sched_clock_cpu() return 0 before it has been initialized and avoid
corrupting its state due to doing so.
This fixes the weird printk timestamp jump reported.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Yanmin Zhang reported:
Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.
With bisect, I located the following patch:
| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling
Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>