Our performance validation on 2.6.15-rc1 caught a disastrous performance
regression on ia64 with netperf (-98%) and volanomark (-58%) compared to
the previous kernel version, 2.6.14-git7. See the following chart (result
groups 1 & 2).
http://kernel-perf.sourceforge.net/results.machine_id=26.html
We have root-caused it to commit 64c7c8f885.
This changeset broke the ia64 task resched notification. In
sched.c:resched_task(), a reschedule IPI is conditioned upon
TIF_POLLING_NRFLAG. However, the above changeset unconditionally sets
the polling thread flag for idle tasks regardless of whether pal_halt_light
is in use or not. As a result, the resched IPI is not sent from
resched_task(). And since the default behavior on ia64 is to use
pal_halt_light, we end up delaying the rescheduling of the task until the
next timer tick, which causes the performance regression.
This fixes the performance bug. I'm glad our performance suite is
turning up bad performance bugs like this in time.
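The decision described above can be modelled in plain userspace C. This is
only a sketch: the flag bit values, the struct and all function names below
are made up for illustration, and the "fixed" setup is one plausible shape
of the fix, not the actual patch.

```c
#include <stdbool.h>

/* Illustrative bit values, not the kernel's real flag numbers. */
#define TIF_NEED_RESCHED   (1u << 0)
#define TIF_POLLING_NRFLAG (1u << 1)

/* Hypothetical stand-in for the thread flags of a task. */
struct task {
    unsigned int flags;
};

/*
 * Mark the task for rescheduling and report whether a reschedule IPI
 * would be sent. A task that claims to poll the flag gets no IPI.
 */
static bool resched_task_sends_ipi(struct task *t)
{
    t->flags |= TIF_NEED_RESCHED;
    return !(t->flags & TIF_POLLING_NRFLAG);
}

/* Behavior described as broken: the polling flag is set unconditionally. */
static void idle_setup_broken(struct task *idle, bool uses_pal_halt_light)
{
    (void)uses_pal_halt_light;
    idle->flags |= TIF_POLLING_NRFLAG;
}

/* Assumed fix: only claim to poll when the idle loop really polls. */
static void idle_setup_fixed(struct task *idle, bool uses_pal_halt_light)
{
    if (!uses_pal_halt_light)
        idle->flags |= TIF_POLLING_NRFLAG;
}
```

With pal_halt_light in use, the broken setup suppresses the IPI and the idle
CPU only notices the pending reschedule at the next timer tick, while the
assumed fixed setup lets the IPI go out.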
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This avoids a BUG_ON with kref.c when SA1111 tries to register
a driver with an unregistered bus type.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
- Fix a regression in command completion, which prevented
the restart of the DMA engine after the device throws
an error.
- Pack more hardware info into the port-reset error message.
- Promote "welcome to our timeout" message from debug msg
to normal printk.
Something I've found handy countless times when users do this..
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Picked from the ubuntu-2.6 tree
The change in location for ll_rw_blk.c from drivers/block/ to block/ caused
failure to generate documentation.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/block/cciss_scsi.c:264: warning: `print_bytes' defined but not used
drivers/block/cciss_scsi.c:298: warning: `print_cmd' defined but not used
Signed-off-by: Grant Coady <gcoady@gmail.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A variable was being used in multiple conflicting ways. I also restructured
the code a bit for clarity.
Signed-off-by: Miles Bader <miles@gnu.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to use the USB_DEVICE macro here, else the modinfo aliases go all wrong.
Also, correctly terminate the table, as noted by Dave Jones <davej@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Despite the fact that md threads don't need to be signalled, and won't
respond to signals anyway, we need to have an 'interruptible' wait, else
they stay in 'D' state and add to the load average.
(akpm: the signal_pending() test is unneeded - we'll fix that up in the next
round. For now, leave it there because that's how the code used to be).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was marked deprecated "after 2.6" back in the 2.5 days. But now it
seems there isn't going to be any "after 2.6", and we deprecate by date
now. So set a date.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Being kernel-threads, nfsd servers don't get pre-empted (depending on
CONFIG). If there is a steady stream of NFS requests that can be served
from cache, an nfsd thread may hold on to a cpu indefinitely, which isn't
very friendly.
So it is good to have a cond_resched in there (just before looking for a
new request to serve), to make sure we play nice.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These exported symbols are in arch/ppc/ but missing from arch/powerpc/ for
ppc32 builds.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There have been lots of good changes to the driver lately, and userspace
will care about the version of the driver. Bump the version from 36.0 to
38.0 so that it is higher than the 37 that the 2.4 driver came out with a
few weeks ago, which doesn't have all the same changes.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sysctl.h (again) useable from userspace
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A patch by Eric was merged (f2b36db692)
and later reverted (1e4c85f97f).
Along with the above patch, another patch was posted and merged
(3d1675b41b). That patch depended on
the above patch, so it should now be reverted as well.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rather than defining our own PM option, use kernel/power/Kconfig.
This fixes build errors introduced by
bca73e4bf8
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix up booting with sparse mem enabled. Otherwise it would just
cause an early PANIC at boot.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is needed for large multinode IBM systems which have a sparse
APIC space in clustered mode, fully covering the available 8 bits.
The previous kernels would limit the local APIC number to 127,
which caused it to reject some of the CPUs at boot.
I increased the maximum and shrunk the apic_version array a bit
to make up for that (the version is only 8 bits, so we don't need
a full int to store it).
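A rough userspace illustration of the trade-off, assuming the old limit of
127 meant a 128-entry int array and the new table covers all 256 possible
8-bit APIC IDs. The constant and array names are invented for this sketch
and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table sizes: IDs 0..127 before, the full 8-bit space after. */
#define OLD_MAX_APICS 128
#define NEW_MAX_APICS 256

/* Before: one full int per APIC, although the version is only 8 bits. */
static int old_apic_version[OLD_MAX_APICS];

/* After: one byte per APIC, so doubling the entries costs no extra space. */
static uint8_t new_apic_version[NEW_MAX_APICS];

static size_t old_table_bytes(void)
{
    return sizeof(old_apic_version);
}

static size_t new_table_bytes(void)
{
    return sizeof(new_apic_version);
}
```

With a 4-byte int the old table takes 512 bytes and the new one 256, even
though the new one covers twice as many APIC IDs.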
Cc: Chris McDermott <lcm@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CONFIG_CHECKING covered some debugging code used in the early times
of the port. But it wasn't even SMP safe for quite some time
and the bugs it checked for seem to be gone.
This patch removes all the code to verify GS at kernel entry. There
haven't been any new bugs in this area for a long time.
Previously it also covered the sysctl for the page fault tracing.
That didn't make much sense because that code was unconditionally
compiled in. I made that a boot option now because it is typically
only useful at boot.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current x86_64 NUMA memory code is inconsistent when it comes to node
memory ranges. The exact behaviour varies depending on which config option
is used.
setup_node_bootmem() has start and end as arguments and these are used to
calculate the size of the node like this: (end - start). This is all fine
if end is pointing to the first non-available byte. The problem is that the
current x86_64 code sometimes treats it as the last present byte and sometimes
as the first non-available byte. The result is that some configurations might
lose a page at the end of the range.
This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU
so they all treat the end variable as the first non-available byte. This is
the same way as the single node code.
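To make the off-by-one concrete, here is a small sketch of the size
calculation. It assumes 4 KiB pages, and node_pages() is a made-up helper,
not a function from the kernel.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Number of whole pages in a node, where "end" is expected to point to
 * the first non-available byte (an exclusive end).
 */
static uint64_t node_pages(uint64_t start, uint64_t end)
{
    return (end - start) >> PAGE_SHIFT;
}
```

For a node covering 0x0 to 0xffff, passing the exclusive end 0x10000 gives
16 pages, while passing the last present byte 0xffff gives 15: the page at
the end of the range is lost.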
The patch is boot tested on dual x86_64 hardware with the above configurations,
but maybe the removed code is needed as some workaround?
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The logging for boot errors was turned off because it was broken
on some AMD systems. But give Intel EM64T systems a chance because they are
supposed to be correct there.
The advantage is that there is a chance to actually log uncorrected
machine checks after the reset.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On x86_64 arches, there is no way to choose ACPI_NUMA without having to choose
K8_NUMA. CONFIG_K8_NUMA is not needed for Intel EM64T NUMA boxes. It also
looks odd if you have to select ACPI_NUMA from the power management menu.
This patch fixes those oddities. Patch does the following:
1. Makes NUMA a config option like other arches
2. Makes topology detection options like K8_NUMA dependent on NUMA
3. Choosing ACPI NUMA detection can be done from the standard
"Processor type and features" menu
AK: I fixed up the dependencies and changed the help texts a bit
on top of Kiran's patch.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Keeping this function does not make sense because it's a (buggy) copy
of sys_time. The only difference is that now.tv_sec (which is
a time_t, i.e. a 64-bit long) is copied (and truncated) into an int
(32-bit).
The prototype is the same (they both take a long __user *), so let's drop
this and redirect it to sys_time (and make sure it exists by defining
__ARCH_WANT_SYS_TIME).
The only disadvantage is that the sys_stime definition is also compiled
(this may be fixed if needed by adding a separate __ARCH_WANT_SYS_STIME
macro, and defining it for all arch's defining __ARCH_WANT_SYS_TIME except
x86_64).
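The truncation mentioned above can be shown with fixed-width types. This
sketch does not reproduce the removed function; both helpers are
hypothetical and only illustrate what happens when a 64-bit seconds value
is squeezed into 32 bits.

```c
#include <stdint.h>

/* Does a 64-bit seconds value survive being stored in a 32-bit int? */
static int time_fits_in_32bit(int64_t now_sec)
{
    return now_sec >= INT32_MIN && now_sec <= INT32_MAX;
}

/* The low 32 bits that remain after the upper half is dropped. */
static uint32_t low_32_bits(int64_t now_sec)
{
    return (uint32_t)now_sec;
}
```

Any value past INT32_MAX no longer fits. For example, (1 << 32) + 5 keeps
only the 5.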
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current value was correct before the introduction of Intel EM64T support -
but now L1_CACHE_SHIFT_MAX can be less than L1_CACHE_SHIFT, which _is_ funny!
Between the few users of ____cacheline_maxaligned_in_smp, we also have (for
example) rcu_ctrlblk, and struct zone, with zone->{lru_,}lock. I.e. we have
a lot of excess cacheline bouncing on them.
No correctness issues, obviously. So this could even be merged for 2.6.14
(I'm not a fan of this idea, though).
CC: Andi Kleen <ak@suse.de>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not needed since x86-64 always uses the spinlock based rwsems.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
B steppings were the first shipping Opterons. memcpy/memset/copy_page/
clear_page had special optimized versions for them. These are really
old and in the minority now and the difference to the generic versions
(using rep microcode) is not that big anyway. So just remove them.
TODO: figure out optimized versions for Intel Netburst based EM64T
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Old code could retry for 10 seconds in the worst case. Only try
for one second now.
Suggested by Yinghai Lu
Cc: Yinghai.Lu@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the Intel cache detection code assumption that number of threads
sharing the cache will either be equal to number of HT or core siblings.
This also cleans up the code in general a bit.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fields obtained through cpuid vector 0x1(ebx[16:23]) and
vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not
always be the same as what is available and what OS sees. So make sure
"siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen
by OS instead of what cpuid instruction says. This will also fix the buggy BIOS
cases (for example, where cpuid on a single core cpu says there are "2" siblings
even when HT is disabled in the BIOS; see
http://bugzilla.kernel.org/show_bug.cgi?id=4359).
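For reference, the fields named above can be pulled out of raw register
values as sketched below. The helper names are invented for this example.
The "+ 1" on the vector 0x4 fields reflects that those fields are encoded
as the maximum count minus one. The register values are passed in rather
than read with the cpuid instruction, so the sketch runs anywhere.

```c
#include <stdint.h>

/* Extract bits lo..hi (inclusive) from a 32-bit register value. */
static uint32_t bits(uint32_t reg, unsigned int lo, unsigned int hi)
{
    unsigned int width = hi - lo + 1;
    uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);

    return (reg >> lo) & mask;
}

/* cpuid vector 0x1, ebx[16:23]: maximum logical processors per package. */
static uint32_t max_logical_per_package(uint32_t leaf1_ebx)
{
    return bits(leaf1_ebx, 16, 23);
}

/* cpuid vector 0x4, eax[14:25]: maximum threads sharing this cache. */
static uint32_t max_threads_sharing_cache(uint32_t leaf4_eax)
{
    return bits(leaf4_eax, 14, 25) + 1;
}

/* cpuid vector 0x4, eax[26:31]: maximum cores per package. */
static uint32_t max_cores_per_package(uint32_t leaf4_eax)
{
    return bits(leaf4_eax, 26, 31) + 1;
}
```

These are all upper bounds. A single core cpu with HT disabled in the BIOS
can still report 2 in ebx[16:23], which is why /proc/cpuinfo should show
what the OS actually sees.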
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When they were disabled before (e.g. after a panic) it's better
to keep them off, otherwise follow-on panics can happen from timer
interrupt handlers etc.
Drawback is that pageup in the console won't work anymore though.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They report 40 bits, but only have 36 bits of physical address space.
This caused problems with setting up the correct masks for MTRR.
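A minimal sketch of why the reported width matters for the masks. Both
helpers are hypothetical, and the "affected" flag stands in for whatever
check identifies the CPUs in question, which is not modelled here.

```c
#include <stdint.h>

/* Mask covering every physical address bit the CPU claims to support. */
static uint64_t phys_addr_mask(unsigned int phys_bits)
{
    return (phys_bits >= 64) ? ~0ULL : ((1ULL << phys_bits) - 1);
}

/*
 * Hypothetical workaround: on affected CPUs, do not trust a reported
 * width above 36 bits.
 */
static unsigned int effective_phys_bits(unsigned int reported_bits,
                                        int affected)
{
    return (affected && reported_bits > 36) ? 36 : reported_bits;
}
```

A mask built from the reported 40 bits covers four address bits that these
CPUs do not implement. Clamping to 36 first gives the mask that matches the
hardware.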
CPUID workaround for steppings 0F33h(supporting x86) and 0F34h(supporting x86
and EM64T). Detail info can be found at:
http://download.intel.com/design/Xeon/specupdt/30240216.pdf
http://download.intel.com/design/Pentium4/specupdt/30235221.pdf
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So far all new ones have worked and there isn't much variation because
the CPU does all the interesting bits.
So enable try_unsupported by default.
It can still be disabled with try_unsupported=0 (module) or
amd64.try_unsupported=0 (boot option).
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(no name because I'm not sure of the correct name)
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Compute the highest possible value for memnode_shift, in order to reduce
footprint of memnodemap[] to the minimum, thus making all users
(phys_to_nid(), kfree()), more cache friendly.
Before the patch :
Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 23 for the hash shift. Max adder is 3ffffffff
After the patch :
Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 33 for the hash shift.
In this case, only 2 bytes of memnodemap[] are used, instead of 2048
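The idea can be sketched in userspace as follows. This is not the code from
the patch: it is a brute-force search, the struct and function names are
invented here, and node ranges are assumed to use an exclusive end.

```c
#include <stdint.h>

/* Hypothetical node description; "end" is the first byte past the node. */
struct node_range {
    uint64_t start;
    uint64_t end;
};

/* Would two different nodes land in the same map slot with this shift? */
static int slots_conflict(const struct node_range *n, int count, int shift)
{
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            uint64_t lo_i = n[i].start >> shift;
            uint64_t hi_i = (n[i].end - 1) >> shift;
            uint64_t lo_j = n[j].start >> shift;
            uint64_t hi_j = (n[j].end - 1) >> shift;

            if (lo_i <= hi_j && lo_j <= hi_i)
                return 1;
        }
    }
    return 0;
}

/* Largest shift that still keeps every node in its own slots. */
static int highest_memnode_shift(const struct node_range *n, int count)
{
    for (int shift = 63; shift >= 0; shift--) {
        if (!slots_conflict(n, count, shift))
            return shift;
    }
    return -1;
}

/* Map entries needed to cover addresses below max_end (exclusive). */
static uint64_t memnodemap_entries(uint64_t max_end, int shift)
{
    return ((max_end - 1) >> shift) + 1;
}
```

For the two nodes in the log above, this search gives a shift of 33, and
memnodemap_entries() gives 2048 entries at shift 23 versus 2 at shift 33.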
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows running 64bit signal handlers in 64bit processes that run small
code snippets in compat mode.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With a NR_CPUS==128 kernel with CPU hotplug enabled we would waste 4MB
on per CPU data of all possible CPUs. The reason was that HOTPLUG
always set up the possible map to NR_CPUS cpus and then we needed to
allocate that much (each CPU's per CPU data is roughly 32k now).
The underlying problem is that ACPI didn't tell us how many hotplug CPUs
the platform supports. So the old code just assumed all, which would
lead to this memory wastage.
This implements some new heuristics:
- If the BIOS specified disabled CPUs in the ACPI/mptables assume they
can be enabled later (this is bending the ACPI specification a bit,
but seems like an obvious extension)
- The user can override it with a new additionals_cpus=NUM option
- Otherwise use half of the available CPUs or 2, whichever is more.
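The heuristics above can be sketched as a small function. The names and the
precedence (user option first, then the BIOS count, then the fallback) are
assumptions of this example, not a copy of the patch.

```c
/*
 * How many extra CPU slots to reserve for hotplug.
 *   available:     CPUs present at boot
 *   bios_disabled: CPUs the BIOS listed as disabled (0 if none)
 *   user_override: value of the boot option, or -1 if not given
 */
static int additional_hotplug_cpus(int available, int bios_disabled,
                                   int user_override)
{
    int half;

    if (user_override >= 0)
        return user_override;
    if (bios_disabled > 0)
        return bios_disabled;

    half = available / 2;
    return half > 2 ? half : 2;
}

/* Memory spent on per CPU data for a given number of possible CPUs. */
static unsigned long percpu_footprint(unsigned long possible_cpus,
                                      unsigned long bytes_per_cpu)
{
    return possible_cpus * bytes_per_cpu;
}
```

With 128 possible CPUs at roughly 32k each, percpu_footprint() gives the
4MB mentioned above. A 4-CPU box with no BIOS hints and no user option
reserves 2 extra slots, so it pays for 6 CPUs worth of per CPU data.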
Cc: ashok.raj@intel.com
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I got some questions on this, so just fix up the documentation.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor victory in the continuous quest against all stray externs.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>