Second preparatory patch for fix-ideal-runtime:
Mark prev_sum_exec_runtime at the beginning of our run, the same spot
that adds our wait period to wait_runtime. This seems a more natural
location to do this, and it also reduces the code a bit:
   text    data     bss     dec     hex filename
  13397     228    1204   14829    39ed sched.o.before
  13391     228    1204   14823    39e7 sched.o.after
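The bookkeeping described above can be sketched in userspace C. This is only an illustration: the struct and function names (entity_sketch, start_run, ran_this_slice) are hypothetical, and only the two field names are taken from the patch description.

```c
#include <stdint.h>

/* Hypothetical stand-in for the scheduler entity; only the two
 * field names come from the patch description. */
struct entity_sketch {
	uint64_t sum_exec_runtime;      /* total runtime accumulated so far */
	uint64_t prev_sum_exec_runtime; /* snapshot taken when the run began */
};

/* Called when the entity is picked to run: mark the start of the run. */
static void start_run(struct entity_sketch *se)
{
	se->prev_sum_exec_runtime = se->sum_exec_runtime;
}

/* Runtime consumed since the current run began. */
static uint64_t ran_this_slice(const struct entity_sketch *se)
{
	return se->sum_exec_runtime - se->prev_sum_exec_runtime;
}
```

Taking the snapshot at the start of the run means the per-run delta is a single subtraction wherever it is needed.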
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Preparatory patch for fix-ideal-runtime:
simplify __check_preempt_curr_fair(): get rid of the integer return.
   text    data     bss     dec     hex filename
  13404     228    1204   14836    39f4 sched.o.before
  13393     228    1204   14825    39e9 sched.o.after
functionality is unchanged.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the cfs_rq->wait_runtime debug/statistics counter was not maintained
properly - fix this.
this also removes some code:
   text    data     bss     dec     hex filename
  13420     228    1204   14852    3a04 sched.o.before
  13404     228    1204   14836    39f4 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
fix niced_granularity(). The bug resulted in under-scheduling for
CPU-bound negative nice level tasks (and this in turn caused
higher than necessary latencies in nice-0 tasks).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
First, replace the check
	if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task)
with
	if (*imbalance < busiest_load_per_task)
because the current check is always false for nice-0 tasks
(SCHED_LOAD_SCALE_FUZZ is the same as busiest_load_per_task for nice-0
tasks).
With the above change, imbalance was being reset to 0 in the corner-case
condition, making the FUZZ logic fail. Fix this by leaving the imbalance
untouched and changing it only when the HT/MC optimization is found to
be needed.
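To see why the old check can never fire for nice-0 tasks, here is a small standalone sketch. The constants are assumptions chosen so that the fuzz value equals the per-task load of a nice-0 task, as the text states; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed values for illustration: the load of one nice-0 task and a
 * fuzz constant equal to it, as the text describes. */
#define NICE_0_LOAD_SKETCH     1024UL
#define LOAD_SCALE_FUZZ_SKETCH 1024UL

/* Old form: imbalance + fuzz < load. With load == fuzz and an
 * unsigned imbalance this is false for any realistic imbalance. */
static bool old_check(unsigned long imbalance,
		      unsigned long busiest_load_per_task)
{
	return imbalance + LOAD_SCALE_FUZZ_SKETCH < busiest_load_per_task;
}

/* New form: true whenever the imbalance is below one task's load. */
static bool new_check(unsigned long imbalance,
		      unsigned long busiest_load_per_task)
{
	return imbalance < busiest_load_per_task;
}
```

With an imbalance of 512 and a nice-0 per-task load, old_check() is false while new_check() is true.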
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Initialise s_flags in get_sb_mtd_aux() from the flags parameter.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
I've bisected the deadlock when many small appends are done on jffs2 down to
this commit:
commit 6fe6900e1e
Author: Nick Piggin <npiggin@suse.de>
Date: Sun May 6 14:49:04 2007 -0700
mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.
I didn't have a great look down the call chains, but this appears to fix 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd. All depending on whether the filler is async and/or can return
with a !uptodate page.
It introduced a wait to read_cache_page, as well as a
read_cache_page_async function equivalent to the old read_cache_page
without any callers.
Switching jffs2_gc_fetch_page to read_cache_page_async for the old
behavior makes the deadlocks go away, but maybe reintroduces the
use-before-uptodate problem? I don't understand the mm/fs interaction
well enough to say.
[It's fine. dwmw2.]
Signed-off-by: Jason Lunz <lunz@falooley.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Ryusuke Konishi says:
The recent truncate_complete_page() clears the dirty flag from a page
*before* calling a_ops->invalidatepage(),
  static void
  truncate_complete_page(struct address_space *mapping, struct page *page)
  {
  ...
          cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at
                                                          kernel 2.6.20
          if (PagePrivate(page))
                  do_invalidatepage(page, 0);   ---> will call
                                                     a_ops->invalidatepage()
  ...
  }
and this prevents nfs_wb_page_priority() from calling
nfs_writepage_locked(), which is expected to handle the pending
request (=nfs_page) associated with the page.
  int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
  {
  ...
          if (clear_page_dirty_for_io(page)) {
                  ret = nfs_writepage_locked(page, &wbc);
                  if (ret < 0)
                          goto out;
          }
  ...
  }
Since truncate_complete_page() will get rid of the page after
a_ops->invalidatepage() returns, the request (=nfs_page) associated
with the page becomes garbage in nfs_inode->nfs_page_tree.
------------------------
Fix this by ensuring that nfs_wb_page_priority() recognises that it may
also need to clear out non-dirty pages that have an nfs_page associated
with them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
According to the mount(2) man page, the proper error return code for the
mount(2) system call when the special device name or the mounted-on
directory name is too long is ENAMETOOLONG.
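A minimal sketch of such a length check follows. The helper name and the maxlen parameter are hypothetical; the patch description does not say where the check lives or what limit it uses.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: returns 0 if name fits in maxlen bytes
 * (excluding the terminating NUL), -ENAMETOOLONG otherwise. */
static int check_mount_name_len(const char *name, size_t maxlen)
{
	if (strlen(name) > maxlen)
		return -ENAMETOOLONG;
	return 0;
}
```

The same helper could be applied to both the device name and the mounted-on directory name before any further parsing.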
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The hostname was getting truncated in the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Don't filter the return code from the in-kernel rpcbind or NFS mount
clients. Return the real error code so that callers of the new NFS
text-based mount API can apply a useful retry strategy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The new text-based NFS mount option parsing logic doesn't recognize any
valid transport protocols due to a silly mistake in the protocol token
matching logic. This prevents basic mount requests such as:
mount.nfs server:/export /mnt -o proto=tcp
from working with the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Doh! We can't use cancel_delayed_work_sync because we may have been called
from an unmount that was being performed by nfs_automount_task.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I think that I've found and fixed the problem. There is a copy/paste bug in
the vt6421_set_dma_mode() function which causes wrong values to be written
to the PATA_UDMA_TIMING register.
This patch fixes that bug, which breaks DMA modes on the VT6421 PATA port.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
After my last patch we have a new header file for HP simulator use.
Here's code to use it for stuff that used to have `extern' statements
inline in the code. Functionality should not change with this patch.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch cleans up the `enable early console for SKI' patch
(471e7a4484), and
1. potentially allows the gensparse_defconfig to work again
(there are other problems running a generic kernel on Ski).
2. fixes the `console registered twice' problem.
3. cleans up the code by moving the `extern hpsim_cons' declaration to
a new asm/hpsim.h file.
Thanks to Jes for comments.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When dumping memory via sysrq-m it is possible to take a bogus NMI watchdog
or softlockup watchdog because the dump can take a long time on big memory
systems.
Occasionally tickle the watchdog when doing the dump.
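The pattern can be sketched in userspace as a long loop that pokes the watchdog at a fixed interval. Everything here is illustrative: touch_watchdog_stub() is a stub standing in for the kernel's watchdog-touching call, and the interval of 1024 is an assumption, since the patch description does not state one.

```c
#include <stddef.h>

static unsigned long tickles; /* counts calls to the stub below */

/* Stub standing in for the kernel's watchdog-touching call. */
static void touch_watchdog_stub(void)
{
	tickles++;
}

#define TICKLE_INTERVAL 1024 /* assumed; the patch does not state one */

/* Walk nr_pages entries, tickling the watchdog every TICKLE_INTERVAL. */
static void dump_pages_sketch(size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		if (i % TICKLE_INTERVAL == 0)
			touch_watchdog_stub();
		/* ... per-page accounting would go here ... */
	}
}
```

Touching the watchdog only every so many iterations keeps the overhead negligible while still preventing a false lockup report.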
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add additional support for CPU disable on SN platforms.
Correctly setup the smp_affinity mask for I/O error IRQs.
Restrict the use of the feature to Altix 4000 and 450 systems
running with a CPU disable capable PROM, and do not allow disabling
of CPU 0.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
vmalloc() returns a void pointer - no need to cast it.
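For illustration, the same rule with malloc(), which also returns void *. In C a void pointer converts implicitly to any object pointer type; alloc_ints() is a hypothetical helper.

```c
#include <stdlib.h>

/* malloc() stands in for vmalloc() here; both return void *. */
static int *alloc_ints(size_t n)
{
	int *buf = malloc(n * sizeof(*buf)); /* no cast needed in C */

	return buf;
}
```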
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Avoid setting the value if the symbol doesn't need to be changed or can't
be changed. Later choices may change the dependencies and thus the
possible input range.
make oldconfig from a 2.6.22 .config with CONFIG_HOTPLUG_CPU not set
was in some configurations setting CONFIG_HOTPLUG_CPU=y without asking,
even when there was no actual requirement for CONFIG_HOTPLUG_CPU.
This was triggered by SUSPEND_SMP that does a select HOTPLUG_CPU.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: SRQ fixes to enable IPoIB CM
IB/ehca: Fix Small QP regressions
This avoids the recent NFS mount regression (returning EBUSY when
mounting the same filesystem twice with different parameters).
The best I can do given the constraints appears to be to have the kernel
first look for a superblock that matches both the fsid and the
user-specified mount options, and then spawn off a new superblock if
that search fails.
Note that this is not the same as specifying nosharecache everywhere
since nosharecache will never attempt to match an existing superblock.
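The lookup strategy can be sketched roughly as follows. This is a toy model, not the NFS code: the sb_sketch struct, the fixed-size table, and the comparison of options as a plain string are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SB 8

/* Hypothetical, simplified superblock: an fsid plus an option string. */
struct sb_sketch {
	unsigned long fsid;
	char opts[64];
	int in_use;
};

static struct sb_sketch sb_table[MAX_SB];

/* Return an existing superblock only if both the fsid and the mount
 * options match; otherwise claim a free slot for a new one.
 * Returns NULL if the table is full. */
static struct sb_sketch *find_or_create_sb(unsigned long fsid,
					   const char *opts)
{
	struct sb_sketch *free_slot = NULL;

	for (size_t i = 0; i < MAX_SB; i++) {
		struct sb_sketch *sb = &sb_table[i];

		if (!sb->in_use) {
			if (!free_slot)
				free_slot = sb;
			continue;
		}
		if (sb->fsid == fsid && strcmp(sb->opts, opts) == 0)
			return sb; /* share the existing superblock */
	}
	if (!free_slot)
		return NULL;
	free_slot->fsid = fsid;
	strncpy(free_slot->opts, opts, sizeof(free_slot->opts) - 1);
	free_slot->opts[sizeof(free_slot->opts) - 1] = '\0';
	free_slot->in_use = 1;
	return free_slot;
}
```

Mounting the same fsid twice with identical options returns the same entry, while a second mount with different options gets its own entry instead of failing.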
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lguest didn't initialize the kernel stack the way a real i386 kernel
does, and ended up triggering a corner-case in the stack frame checking
that doesn't happen on native i386, and that the stack dumping didn't
handle quite right.
This makes the frame handling more correct, and tries to clarify the
code at the same time so that it's a bit more obvious what is going on.
Thanks to Rusty Russell for debugging the lguest failure.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The HPET clocksource in drivers/char/hpet.c was written as generic code
for ia64, but it is not yet ready to replace the native HPET clocksource
implementations that the i386/x86-64 architectures use.
On x86[-64], trying to register this clocksource can result in
multiple hpet-based clocksources being registered, and some users have
experienced hangs when the ia64 one is chosen on x86_64.
Eventually all three architectures may end up using the same code, but
that is not the case right now.
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Ornati <ornati@fastwebnet.it>
Cc: Bob Picco <bob.picco@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
- cxgb3 engine microcode load
cxgb3 - Fix dev->priv usage
qeth: Drop ARP packages on HiperSockets interface with NOARP attribute.
qeth: provide specific message for OSA-adapters exclusively used
qeth: crash during reboot after failing online setting
qeth: Announce tx checksumming for qeth devices in TSO/EDDP mode
qeth: dont return the return values of void functions.
qeth: enforce a rate limit for inbound scatter gather messages
qeth: ungrouping a device must not be interruptible
netxen: fix crashes during module unload
netxen: Avoid firmware load in PCI probe
PS3: fix the bug that 'ifconfig down' would hang
IOC3: Program UART predividers.
Fix ehca SRQ support so that IPoIB connected mode works:
- Report max_srq > 0 if SRQ is supported
- Report "last wqe reached" asynchronous event when base QP dies;
this is required by the IB spec and IPoIB CM relies on receiving it
when cleaning up.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The new Small QP code had a few bugs that would also make it trigger
for non-Small QPs. Fix them.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The VESA BIOS is specified to be register-clean. However, we have now
found at least one system which violates that. Thus, be as paranoid
about VESA calls as about everything else.
Huge thanks to Will Simoneau for reporting, diagnosing, and testing
this out on Dell Inspiron 5150.
Cc: Will Simoneau <simoneau@ele.uri.edu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
sched: clean up task_new_fair()
sched: small schedstat fix
sched: fix wait_start_fair condition in update_stats_wait_end()
sched: call update_curr() in task_tick_fair()
sched: make the scheduler converge to the ideal latency
sched: fix sleeper bonus limit
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] Bump driver versions
ata_piix: implement IOCFG bit18 quirk
libata: implement BROKEN_HPA horkage and apply it to affected drives
sata_promise: FastTrack TX4200 is a second-generation chip
pata_marvell: Add more identifiers
ata_piix: add Satellite U200 to broken suspend list
ata: add ATA_MWDMA* and ATA_SWDMA* defines
ata_piix: IDE mode SATA patch for Intel Tolapai
libata-core: Allow translation setting to fail
Load the engine microcode when an interface
is brought up, instead of doing it when the module
is loaded.
Loosen up tight binding between the driver and the
engine microcode version.
There is no need for microcode update with T3A boards.
Fix the file naming.
Do a better job at logging the loading activity.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the
net-2.6.24 git branch.
Without this fix, cxgb3 crashes on 2.6.23.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
A network interface can get ARP packets even when the interface has
NOARP specified. In a HiperSockets environment this disturbs receiving
systems when packets are sent on the multicast queue. (E.g. TCP/IP on
z/VM issues messages reporting invalid data on the HiperSockets
interface.)
Qeth will no longer send ARP packets on a HiperSockets interface when
the interface has the NOARP attribute.
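The decision can be sketched as a small predicate. The two constants mirror the usual Linux values of IFF_NOARP and ETH_P_ARP but are redefined locally so the sketch stands alone; the function name and the is_hipersockets parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the usual Linux definitions of IFF_NOARP and
 * ETH_P_ARP; redefined locally so the sketch stands alone. */
#define SKETCH_IFF_NOARP 0x80u
#define SKETCH_ETH_P_ARP 0x0806u

/* True if an outgoing frame should be dropped rather than sent:
 * only ARP frames on a HiperSockets interface flagged NOARP. */
static bool should_drop_tx(bool is_hipersockets, unsigned int if_flags,
			   uint16_t ethertype)
{
	return is_hipersockets &&
	       (if_flags & SKETCH_IFF_NOARP) &&
	       ethertype == SKETCH_ETH_P_ARP;
}
```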
Signed-off-by: Klaus D. Wacker <kdwacker@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Exclusive usage of OSA-cards has been introduced. Even though Linux
does not make use of it, qeth should be prepared to receive a bad RC
for some initialization steps. A meaningful message is now given
if an OSA-device is set online even though the OSA-adapter is already
exclusively used by another host.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Online setting of a qeth device may fail, for instance because of:
- out-of-memory condition when allocating qdio queues
- IDX ACTIVATE problem
- ...
Such a device is still returned in a driver_for_each_device loop
processed in qeth_reboot_event(), which calls
qeth_clear_qdio_buffers(). Make sure qeth_clear_output_buffer() is
called only if the qdio queues have been successfully allocated
during initialization of a qeth device.
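The guard can be sketched as below. The card_sketch struct and its queues_allocated flag are hypothetical simplifications; the real driver tracks queue state differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device state. */
struct card_sketch {
	bool queues_allocated; /* set once queue allocation succeeded */
	int *out_bufs;         /* NULL until the queues are allocated */
	size_t nr_out_bufs;
};

/* Clear output buffers, but only when the queues really exist.
 * Returns the number of buffers cleared. */
static size_t clear_buffers_sketch(struct card_sketch *card)
{
	size_t cleared = 0;

	if (!card->queues_allocated)
		return 0; /* online setting failed earlier: nothing to clear */

	for (size_t i = 0; i < card->nr_out_bufs; i++) {
		card->out_bufs[i] = 0;
		cleared++;
	}
	return cleared;
}
```

A device whose online setting failed is then skipped safely when the reboot handler walks all devices.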
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
TSO requires tx checksumming. For non-GSO frames in TSO/EDDP mode we
have to calculate the checksum manually.
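The patch description does not show how the checksum is computed. As a generic illustration of a software checksum, here is the standard 16-bit ones'-complement Internet checksum in plain C; the function name is hypothetical and this is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic 16-bit ones'-complement checksum over a byte buffer. */
static uint16_t inet_checksum_sketch(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	/* Add up 16-bit big-endian words. */
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8; /* pad odd byte with zero */

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```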
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>