Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In case a cpu goes idle but softirqs are pending only an error message is
printed to the console. It may take a very long time until the pending
softirqs will finally be executed. Worst case would be a hanging system.
With this patch the timer tick just continues and the softirqs will be
executed after the next interrupt. Still a delay but better than a
hanging system.
Currently we have at least two device drivers on s390 which under certain
circumstances schedule a tasklet from process context. This is a reason
why we can end up with pending softirqs when going idle. Fixing these
drivers seems to be non-trivial.
However there is no question that the drivers should be fixed.
This patch shouldn't be considered as a bug fix. It just is intended to
keep a system running even if device drivers are buggy.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Glauber <jan.glauber@de.ibm.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
Revert "USB: EHCI: fix performance regression"
USB: fsl_usb2_udc: fix recursive lock
USB: usb-serial: option: Don't match Huawei driver CD images
USB: pl2303: another product ID
USB: add another scanner quirk
USB: Add support for ROKR W5 in unusual_devs.h
USB: Fix M600i unusual_devs entry
USB: usb-storage: unusual_devs update for Cypress ATACB
USB: EHCI: fix performance regression
USB: EHCI: fix bug in Iso scheduling
USB: EHCI: fix remote-wakeup regression
USB: EHCI: suppress unwanted error messages
USB: EHCI: fix up root-hub TT mess
USB: add all configs to the "descriptors" attribute
USB: fix possible deadlock involving sysfs attributes
USB: Firmware loader driver for USB Apple iSight camera
USB: FTDI_SIO : Add support for Matrix Orbital PID Range
Create the dev_set_name function now so that various subsystems can
start changing over to it before other changes in 2.6.27 will make it
compulsory.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit fa38dfcc56.
It wasn't really a regression and David and Alan are still working
through the issues reported.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add the interface info matching to all Huawei cards, as they all also
contain a Mass Storage Device interface (usually containing Windows
drivers) which should not get bound by this driver.
See also drivers/usb/storage/unusual_devs.h
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I've just got a USB GPRS/EDGE modem branded Manufacturer Micromax Model
MMX610U (see http://www.airtel.in/level2_t3data.aspx?path=1/106/179)
working by adding another product ID to pl2303. Modem info reports same
module as Max Arnold's i.e.SIMCOM SIM600 but with product ID 0x0612
(cf Ox0611).
From: Steve Murphy <steve@gnusis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Like the HP53{00,70} scanner other devices of the OEM Avision require
the USB_QUIRK_STRING_FETCH_255 to correct set a configuration with
"recent" Linux kernels.
Signed-off-by: René Rebe <rene@exactcode.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds support for rev 2 of an existing unusual_devs entry
enabling ROKR W5s to work. Greg, please apply.
From: Javier Smaldone <javier@smaldone.com.ar>
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out that the unusual_devs entry for the Motorola M600i needs
another flag. This patch adds it. Thanks to Atte André Jensen
<atte@ballbreaker.dk>.
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1101) updates the unusual_devs entry for the Cypress
ATACB pass-through. The protocol field is changed from US_PR_BULK to
US_PR_DEVICE, since the Cypress devices already set bInterfaceProtocol
to Bulk-only.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1099) fixes a performance regression in ehci-hcd. The
fundamental problem is that queue headers get removed from the
schedule too quickly, since the code checks for a counter advancing
rather than making an actual time-based check. The latency involved
in removing the queue header and then relinking it can severely
degrade certain kinds of workloads.
The patch replaces a simple counter with a timestamp derived from the
controller's uframe value. In addition, the delay for unlinking an
idle queue header is increased from 5 ms to 10 ms; since some
controllers (nVidia) have a latency of up to 1 ms for unlinking, this
reduces the relative impact from 20% to 10%.
Finally, a logical error left over from the IAA watchdog-timer
conversion is corrected. Now the driver will always either unlink an
idle queue header or set up a timer to unlink it later. The old code
would sometimes fail to do either.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: David Brownell <david-b@pacbell.net>
Cc: Leonid <leonidv11@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1098) changes the way ehci-hcd schedules its periodic
Iso transfers. That the current scheduling code is wrong is clear on
the face of it: Sometimes it returns -EL2NSYNC (meaning that an URB
couldn't be scheduled because it was submitted too late), but it does
this even when the URB_ISO_ASAP flag is set (meaning the URB should be
scheduled as soon as possible).
The new code properly implements as-soon-as-possible scheduling,
assigning the next unexpired slot as the URB's starting point. It
also is more careful about checking for Iso URB completion: It doesn't
bother to check for activity during frames that are already over,
and it allows for the possibility that some of the URB's packets may
have raced the hardware when they were submitted and so never got used
(the packet status is set to -EXDEV).
This fixes problems several people have experienced with USB video
applications.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1097) fixes a bug in the remote-wakeup handling in
ehci-hcd. The driver currently does not keep track of whether the
change-suspend feature is enabled for each port; the feature is
automatically reset the first time it is read. But recent changes to
the hub driver require that the feature be read at least twice in
order to work properly.
A bit-vector is added for storing the change-suspend feature values.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1096) fixes an annoying problem: When a full-speed or
low-speed device is plugged into an EHCI controller, it fails to
enumerate at high speed and then is handed over to the companion
controller. But usbcore logs a misleading and unwanted error message
when the high-speed enumeration fails.
The patch adds a new HCD method, port_handed_over, which asks whether
a port has been handed over to a companion controller. If it has, the
error message is suppressed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1095) cleans up the HCD glue and several of the EHCI
bus-glue files. The ehci->is_tdi_rh_tt flag is redundant, since it
means the same thing as the hcd->has_tt flag, so it is removed and the
other flag used in its place.
Some of the bus-glue files didn't get the relinquish_port method added
to their hc_driver structures. Although that routine currently
doesn't do anything for controllers with an integrated TT, in the
future it might. So the patch adds it where it is missing.
Lastly, some of the bus-glue files have erroneous entries for their
hc_driver's suspend and resume methods. These method pointers are
specific to PCI and shouldn't be used otherwise.
(The patch also includes an invisible whitespace fix.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
This patch (as1094) changes the output of the "descriptors" binary
attribute. Now it will contain the device descriptor followed by all
the configuration descriptors, not just the descriptor for the current
config.
Userspace libraries want to have access to the kernel's cached
descriptor information, so they can learn about device characteristics
without having to wake up suspended devices. So far the only user of
this attribute is the new libusb-1.0 library; thus changing its
contents shouldn't cause any problems.
This should be considered for 2.6.26, if for no other reason than to
minimize the range of releases in which the attribute contains only the
current config descriptor.
Also, it doesn't hurt that the patch removes the device locking --
which was formerly needed in order to know for certain which config was
indeed current.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a potential deadlock when the usb_generic driver is unbound
from a device. The problem is that generic_disconnect() is called
with the device lock held, and it removes a bunch of device attributes
from sysfs. If a user task happens to be running an attribute method
at the time, the removal will block until the method returns. But at
least one of the attribute methods (the store routine for power/level)
needs to acquire the device lock!
This patch (as1093) eliminates the deadlock by moving the calls to
create and remove the sysfs attributes from the usb_generic driver
into usb_new_device() and usb_disconnect(), where they can be invoked
without holding the device lock.
Besides, the other sysfs attributes are created when the device is
registered and removed when the device is unregistered. So it seems
only fitting for the extra attributes to be created and removed at the
same time.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Uninitialised Apple iSight drivers present with a distinctive USB ID.
Once firmware has been uploaded, they disconnect and reconnect with a
new ID. At this point they can be driven by the uvcvideo driver. As this
is unique to the Apple cameras and not functionality shared by any other
UVC devices, it makes sense to provide the firmware loading
functionality in a separate driver. This driver will read an isight.fw
file extracted from the Apple driver using the tools at
http://bersace03.free.fr/ift/ and upload it to the camera. It will also
handle the case where the device loses its firmware during hibernation
and must have it reloaded.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds support for the range of PIDs
that have been allocated for FTDI based devices
at Matrix Orbital.
A small number of units have been shipped early 2008
with a faulty USB Descriptor. Products that may have
this issue have been marked with the existing quirk to
work around the problem.
Signed-off-by: R. Molenkamp <rmolenkamp@matrixorbital.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the
error exit 'err_out_unregister' from different places once with the
'cpu_policy_rwsem' lock held, once with the lock released:
| if (ret)
| goto err_out_unregister;
| }
|
| policy->governor = NULL; /* to assure that the starting sequence is
| * run in cpufreq_set_policy */
|
| /* set default policy */
| ret = __cpufreq_set_policy(policy, &new_policy);
| policy->user_policy.policy = policy->policy;
| policy->user_policy.governor = policy->governor;
|
| unlock_policy_rwsem_write(cpu);
|
| if (ret) {
| dprintk("setting policy failed\n");
| goto err_out_unregister;
| }
This leads to the following error message in case of a failing
__cpufreq_set_policy() call:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
[<c01b4564>] unlock_policy_rwsem_write+0x30/0x40
but there are no more locks to release!
other info that might help us debug this:
1 lock held by swapper/1:
#0: (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130
stack backtrace:
[<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104)
[<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c)
r6:00000028 r5:c3c1ab80 r4:c01b4564
[<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c)
r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80
[<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30)
r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c
[<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40)
r4:c3c34884
[<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398)
[<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130)
[<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174)
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dave Jones <davej@redhat.com>
improve the sysbench ramp-up phase and its peak throughput on
a 16way NUMA box, by turning on WAKE_AFFINE:
tip/sched tip/sched+wake-affine
-------------------------------------------------
1: 700 830 +15.65%
2: 1465 1391 -5.28%
4: 3017 3105 +2.81%
8: 5100 6021 +15.30%
16: 10725 10745 +0.19%
32: 10135 10150 +0.16%
64: 9338 9240 -1.06%
128: 8599 8252 -4.21%
256: 8475 8144 -4.07%
-------------------------------------------------
SUM: 57558 57882 +0.56%
this change also improves lat_ctx from 6.69 usecs to 1.11 usec:
$ ./lat_ctx -s 0 2
"size=0k ovr=1.19
2 1.11
$ ./lat_ctx -s 0 2
"size=0k ovr=1.22
2 6.69
in sysbench it's an overall win with some weakness at the lots-of-clients
side. That happens because we now under-balance this workload
a bit. To counter that effect, turn on NEWIDLE:
wake-idle wake-idle+newidle
-------------------------------------------------
1: 830 834 +0.43%
2: 1391 1401 +0.65%
4: 3105 3091 -0.43%
8: 6021 6046 +0.42%
16: 10745 10736 -0.08%
32: 10150 10206 +0.55%
64: 9240 9533 +3.08%
128: 8252 8355 +1.24%
256: 8144 8384 +2.87%
-------------------------------------------------
SUM: 57882 58591 +1.21%
as a bonus this not only improves the many-clients case but
also improves the (more important) rampup phase.
sysbench is a workload that quickly breaks down if the
scheduler over-balances, so since it showed an improvement
under NEWIDLE this change is definitely good.
Prevent short-running wakers of short-running threads from overloading a single
cpu via wakeup affinity, and wire up disconnected debug option.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make sched_clock_cpu() return 0 before it has been initialized and avoid
corrupting its state due to doing so.
This fixes the weird printk timestamp jump reported.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Yanmin Zhang reported:
Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.
With bisect, I located the following patch:
| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling
Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Coverity checker spotted a memleak introduced by commit
39106dcf85 (cpumask: use new cpus_scnprintf
function).
It seems the kfree() got lost between v2 and v3 of this patch...
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> +#define ARCH_KMALLOC_MINALIGN (sizeof(long) * 2)
> +#define ARCH_SLAB_MINALIGN (sizeof(long) * 2)
This doesn't work if SLAB is selected and slab debugging is enabled as
these are passed to the preprocessor, and the preprocessor doesn't
understand sizeof.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: fix RCU problem in cfq_cic_lookup()
block: make blktrace use per-cpu buffers for message notes
Added in elevator switch message to blktrace stream
Added in MESSAGE notes for blktraces
block: reorder cfq_queue to save space on 64bit builds
block: Move the second call to get_request to the end of the loop
splice: handle try_to_release_page() failure
splice: fix sendfile() issue with relay
Specify the minimum slab/kmalloc alignment to be 8 bytes. This fixes a
crash when SLOB is selected as the memory allocator. The FRV arch needs
this so that it can use the load- and store-double instructions without
faulting. By default SLOB sets the minimum to be 4 bytes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a typo in the header guard of asm/ipc.h.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cfq_cic_lookup() needs to properly protect ioc->ioc_data before
dereferencing it and also exclude updaters of ioc->ioc_data as well.
Also add a number of comments documenting why the existing RCU usage
is OK.
Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for
review and comments!
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is setup and make them per-cpu instead.
The default max message size of 1k is also very large, the
interface is mainly for small text notes. So shrink it to 128 bytes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Allows messages to be inserted into blktrace streams.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
saves 8 bytes of padding & increases objects/slab from 30 to 32 on my
AMD64 config
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In function get_request_wait, the second call to get_request could be
moved to the end of the while loop, because if the first call to
get_request fails, the second call will fail without sleep.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
splice currently assumes that try_to_release_page() always suceeds,
but it can return failure. If it does, we cannot steal the page.
Acked-by: Mingming Cao <cmm@us.ibm.com
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Splice isn't always incrementing the ppos correctly, which broke
relay splice.
Signed-off-by: Tom Zanussi <zanussi@comcast.net>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
pciehp: add message about pciehp_slot_with_bus option
pci hotplug core: add check of duplicate slot name
pciehp: move msleep after power off
pciehp: poll cmd completion if hotplug interrupt is disabled
pciehp: fix slow probing
pciehp: fix NULL dereference in interrupt handler
shpchp: add message about shpchp_slot_with_bus option
PCI: don't enable ASPM on devices with mixed PCIe/PCI functions
Some (broken?) platform assign the same slot name to multiple hotplug
slots. On such system, slot initialization would fail because of name
collision. The pciehp driver already have a "slot_with_bus" module
option which adds the bus number into the slot name. This patch adds
the message about this module option that will be displayed when slot
name collision is detected.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>