It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it. I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.
The way NODEFER affects signals on other Unix boxes is as follows:
1) If NODEFER is set, other signals in sa_mask are still blocked.
2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).
The way NODEFER affects signals on Linux:
1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).
2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.
The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.
Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1 which even NetBSD isn't like
Linux. So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 floating-point exception handling has a bug that can cause error
code 0 to be sent instead of the proper code during signal delivery.
This is caused by unconditionally checking the IS and c1 bits from the
FPU status word when they are not always relevant. The IS bit tells
whether an exception is a stack fault and is only relevant when the
exception is IE (invalid operation.) The C1 bit determines whether a
stack fault is overflow or underflow and is only relevant when IS and IE
are set.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm trying to get the nmi working with my laptop (IBM ThinkPad G41) and after
debugging it a while, I found that the nmi code doesn't want to set it up for
this particular CPU.
Here I have:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
stepping : 1
cpu MHz : 3320.084
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl est tm2 cid xtpr
bogomips : 6642.39
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
stepping : 1
cpu MHz : 3320.084
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl est tm2 cid xtpr
bogomips : 6637.46
And the following code shows:
$ cat linux-2.6.13-rc6/arch/i386/kernel/nmi.c
[...]
void setup_apic_nmi_watchdog (void)
{
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
return;
setup_k7_watchdog();
break;
case X86_VENDOR_INTEL:
switch (boot_cpu_data.x86) {
case 6:
if (boot_cpu_data.x86_model > 0xd)
return;
setup_p6_watchdog();
break;
case 15:
if (boot_cpu_data.x86_model > 0x3)
return;
Here I get boot_cpu_data.x86_model == 0x4. So I decided to change it and
reboot. I now seem to have a working NMI. So, unless there's something know
to be bad about this processor and the NMI. I'm submitting the following
patch.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Acked-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since early CPU identify is in this information is already available
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the fix to align node_end_pfns to a proper location. The earlier fix
made the node_remap_start_vaddr to get misaligned causing remap_numa_kva to
barf again :-/
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch add stubs to allow the visws subarch to link again.
Signed-off-by: Tom Duffy <thomas.duffy.99@alumni.brown.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Another x86 subarchitecture bit I missed. This adds both
machine_emergency_restart missed in my reboot fixes and
machine_shutdown needed for kexec support.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is one more bit of breakage my x86 sub-architecture
confusion caused.
Add machine_shutdown to voyager so it will compile with CONFIG_KEXEC.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] i386: Implement machine_emergency_reboot
introduced this new function into arch/i386/reboot.c. However,
subarchitectures are entitled to implement their own copies of reboot.c
from which this new function is now missing.
It looks like visws will also need a similar fixup
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a user whose apps weren't working correctly because his "rtc" wasn't
working fully.
For the sake of simplicity, it seems sensible to always enable HPET RTC
emulation.
Remove a special config option for HPET_EMULATE_RTC and make it directly
depend on HPET_TIMER and RTC. This will avoid the hangs when EMULATE_RTC
is not configured and when some userlevel script depends on RTC interrupt,
as in:
http://bugzilla.kernel.org/show_bug.cgi?id=4904
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix bug found by Grant Coady <lkml@dodo.com.au>'s autobuild setup.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We know that the randomisation slows down some workloads on Transmeta CPUs
by quite large amounts. We think it's because the CPU needs to recode the
same x86 instructions when they pop up at a different virtual address after
a fork+exec.
So disable randomization by default on those CPUs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is
trying to solve a real problem, we must not hard-code an incomplete and
insufficient approach into a syscall, because syscalls are pretty much
for eternity. I am quite strongly convinced that this syscall must not
hit v2.6.13 in its current form.
Firstly, the syscall lacks basic syscall design: e.g. it allows the
global setting of VM policy for unprivileged users. (!) [ Imagine an
Oracle installation and a SAP installation on the same NUMA box fighting
over the 'optimal' setting for this flag. What will they do? Will they
try to set the flag to their own preferred value every second or so? ]
Secondly, it was added based on a single datapoint from Martin:
http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
where Martin characterizes the numbers the following way:
' Run-to-run variability for "make -j" is huge, so these numbers aren't
terribly useful except to see that with reclaim the benchmark still
finishes in a reasonable amount of time. '
in other words: the fundamental problem has likely not been solved, only
a tendential move into the right direction has been observed, and a
handful of numbers were picked out of a set of hugely variable results,
without showing the variability data. How much variance is there
run-to-run?
I'd really suggest to first walk the walk and see what's needed to get
stable & predictable kernel compilation numbers on that NUMA box, before
adding random syscalls to tune a particular aspect of the VM ... which
approach might not even matter once the whole picture has been analyzed
and understood!
The third, most important point is that the syscall exposes VM tuning
internals in a completely unstructured way. What sense does it make to
have a _GLOBAL_ per-node setting for 'should we go to another node for
reclaim'? If then it might make sense to do this per-app, via numalib or
so.
The change is minimalistic in that it doesnt remove the syscall and the
underlying infrastructure changes, only the user-visible changes. We
could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
/proc/sys/vm/swappiness, but even that looks quite counterproductive
when the generic approach is that we are trying to reduce the number of
external factors in the VM balance picture.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add reference count and disable ACPI PCI Interrupt Link
when no device still uses it.
Warn when drivers have not released Link at suspend time.
http://bugzilla.kernel.org/show_bug.cgi?id=3469
Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Otherwise a platform that supports ACPI based cpufreq
and boots up at lowest possible speed could stay there
forever. This because the governor may request max speed,
but the code doesn't update if there is no change in
speed, and it assumed the initial state of max speed.
http://bugzilla.kernel.org/show_bug.cgi?id=4634
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The patch addresses a problem with ACPI SCI interrupt entry, which gets
re-used, and the IRQ is assigned to another unrelated device. The patch
corrects the code such that SCI IRQ is skipped and duplicate entry is
avoided. Second issue came up with VIA chipset, the problem was caused by
original patch assigning IRQs starting 16 and up. The VIA chipset uses
4-bit IRQ register for internal interrupt routing, and therefore cannot
handle IRQ numbers assigned to its devices. The patch corrects this
problem by allowing PCI IRQs below 16.
Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While reserving KVA for lmem_maps of node, we have to make sure that
node_remap_start_pfn[] is aligned to a proper pmd boundary.
(node_remap_start_pfn[] gets its value from node_end_pfn[])
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:740: warning: unused variable `vid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:739: warning: unused variable `fid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:743: warning: unused variable `vid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:742: warning: unused variable `fid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `fid' undeclared (first use in this function)
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `vid' undeclared (first use in this function)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
For some reason I was telling my inline assembly that the
input argument was an output argument.
Playing in the trampoline code I have seen a couple of
instances where lgdt get the wrong size (because the
trampolines run in 16bit mode) so use lgdtl and lidtl to
be explicit.
Additionally gcc-3.3 and gcc-3.4 want's an lvalue for a
memory argument and it doesn't think an array of characters
is an lvalue so use a packed structure instead.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somewhere recently, the TSC got re-enabled for timekeeping on NUMAQ
machines. However, the hardware makes these get unsynchronized quite
badly. So badly, in fact, that the code to fix up the skew can just hang
on boot.
This patch re-disables them. It's nicely confined to the numaq.c file. It
would be great if this could make it into 2.6.13, I think it counts as a
bugfix.
Tested on a 16-proc 4-node NUMAQ.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
can be encoded in the current driver's 4 bit frequency
field. This patch updates the driver to support Rev F
including 6 bit FIDs and processor ID updates.
This should apply cleanly whether or not the dual-core
bugfix I sent out last week is applied. I'd prefer
that both get applied, of course.
Signed-off-by: David Keck <david.keck@amd.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
each core be created in the _cpu_init function
call. The cpufreq infrastructure doesn't call
_cpu_init for the second core in each processor.
Some systems crashed when _get was called with
an odd-numbered core because it tried to
dereference a NULL pointer since the data
structure had not been created.
The attached patch solves the problem by
initializing data structures for all shared
cores in the _cpu_init function. It should
apply to 2.6.12-rc6 and has been tested by
AMD and Sun.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch:
http://marc.theaimsgroup.com/?l=bk-commits-head&m=111955644929114&w=2
uncovered a k7m bios bug, where the VT82C686A router is reported as
being "586-compatible". The two chips have different pirq mapping, so
this leads to "irq routing conflict" on many pci devices.
The suggested fix was discussed with Aleksey Gorelov, who helped me
to identify the problem as a probable bios bug.
Signed-off-by: Giancarlo Formicuccia <giancarlo.formicuccia@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_get_thread_area does not memset to 0 its struct user_desc info before
copying it to user space... since sizeof(struct user_desc) is 16 while the
actual datas which are filled are only 12 bytes + 9 bits (across the
bitfields), there is a (small) information leak.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no help text for CONFIG_DEBUG_STACKOVERFLOW - add one.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
powernow-k8.c: In function `query_current_values_with_pending_wait':
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
machine_power_off now always switches to the boot cpu so there
is no reason for APM to also do that.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Call machine_shutdown() to move to the boot cpu
and disable apics. Both acpi_power_off and
apm_power_off want to move to the boot cpu.
and we are already disabling the local apics
so calling machine_shutdown simply reuses
code.
ia64 doesn't have a special path in power_off
for efi so there is no reason i386 should. If
we really need to call the efi power off path
the efi driver can set pm_power_off like everyone
else.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This appears to be a typo I introduced when cleaning
this code up earlier. Ooops.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_cpus_allowed is not safe in interrupt context
and disabling apics is complicated code so don't
call machine_shutdown on i386 from emergency_restart().
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with. Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is no longer valid to not replace instructions, since we depend on
different behaviour depending on CPU capabilities.
If you need to limit the capabilities of the replacements (because the
boot CPU has features that non-boot CPU's do not have, for example), you
need to explicitly disable those capabilities that are not shared across
all CPU's.
For example, if your boot CPU has FXSR, but other CPU's in your system
do not, you need to use the "nofxsr" kernel command line, not disable
instruction replacement per se.
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.
This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
These days %gs is normally the TLS segment, so it's no longer zero. As
a result, we shouldn't just assume that %fs/%gs tend to be zero
together, but test them independently instead.
Also, fix setting of debug registers to use the "next" pointer instead
of "current". It so happens that the scheduler will have set the new
current pointer before calling __switch_to(), but that's just an
implementation detail.
More fallout from the i386_ksyms.c cleanups.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch:
[PATCH] Remove i386_ksyms.c, almost
made files like smp.c do their own EXPORT_SYMBOLS. This means that all
subarchitectures that override these symbols now have to do the exports
themselves. This patch adds the exports for voyager (which is the most
affected since it has a separate smp harness). However, someone should
audit all the other subarchitectures to see if any others got broken.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.
To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.
Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
http://bugme.osdl.org/show_bug.cgi?id=4016
Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.
If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated. In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.
The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>