Convert the PDA code to use %fs rather than %gs as the segment for
per-processor data. This is because some processors show a small but
measurable performance gain for reloading a NULL segment selector (as %fs
generally is in user-space) versus a non-NULL one (as %gs generally is).
On modern processors the difference is very small, perhaps undetectable.
Some old AMD "K6 3D+" processors are noticably slower when %fs is used
rather than %gs; I have no idea why this might be, but I think they're
sufficiently rare that it doesn't matter much.
This patch also fixes the math emulator, which had not been adjusted to
match the changed struct pt_regs.
[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
sys_vm86 uses a struct kernel_vm86_regs, which is identical to pt_regs, but
adds an extra space for all the segment registers. Previously this structure
was completely independent, so changes in pt_regs had to be reflected in
kernel_vm86_regs. This changes just embeds pt_regs in kernel_vm86_regs, and
makes the appropriate changes to vm86.c to deal with the new naming.
Also, since %gs is dealt with differently in the kernel, this change adjusts
vm86.c to reflect this.
While making these changes, I also cleaned up some frankly bizarre code which
was added when auditing was added to sys_vm86.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
hi,
The motivation behind the patch below was to address messages in
/var/log/messages such as:
Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing
multiple contexts (1)
Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing
multiple contexts (2)
I can reproduce by running 'get-edid' from:
http://john.fremlin.de/programs/linux/read-edid/.
These messages come about in the log b/c the vm86 calls do not exit via
the normal system call exit paths and thus do not call
'audit_syscall_exit'. The next system call will then free the context for
itself and for the vm86 context, thus generating the above messages. This
patch addresses the issue by simply adding a call to 'audit_syscall_exit'
from the vm86 code.
Besides fixing the above error messages the patch also now allows vm86
system calls to become auditable. This is useful since strace does not
appear to properly record the return values from sys_vm86.
I think this patch is also a step in the right direction in terms of
cleaning up some core auditing code. If we can correct any other paths
that do not properly call the audit exit and entries points, then we can
also eliminate the notion of context chaining.
I've tested this patch by verifying that the log messages no longer
appear, and that the audit records for sys_vm86 appear to be correct.
Also, 'read_edid' produces itentical output.
thanks,
-Jason
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I tried to send the forcedeth maintainer an email, but it came back with:
"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."
This patch fixes this.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places.
The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
screen fits inside the one page table, as indeed it does.
The sh __do_page_fault: which handles both kernel faults (without lock) and
user mm faults (locked - though it set_pte without locking before).
The sh64 flush_cache_range and helpers: which wrongly thought callers held
page_table_lock before (only its tlb_start_vma did, and no longer does so);
moved the flush loop down, and adjusted the large versus small range decision
to consider a range which spans page tables as large.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 inline assembler cleanup.
This change encapsulates descriptor and task register management. Also,
it is possible to improve assembler generation in two cases; savesegment
may store the value in a register instead of a memory location, which
allows GCC to optimize stack variables into registers, and MOV MEM, SEG
is always a 16-bit write to memory, making the casting in math-emu
unnecessary.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the virtual 86 machine reaches an instruction which raises a General
Protection Fault (such as CLI or STI), the instruction is emulated (in
handle_vm86_fault). However, the emulation ignored the TF bit, so the
hardware debug interrupt was not invoked after such an emulated instruction
(and the DOS debugger missed it).
This patch fixes the problem by emulating the hardware debug interrupt as
the last action before control is returned to the VM86 program.
Signed-off-by: Petr Tesarik <kernel@tesarici.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There were still a few comments left refering to verify_area, and two
functions, verify_area_skas & verify_area_tt that just wrap corresponding
access_ok_skas & access_ok_tt functions, just like verify_area does for
access_ok - deprecate those.
There was also a few places that still used verify_area in commented-out
code, fix those up to use access_ok.
After applying this one there should not be anything left but finally
removing verify_area completely, which will happen after a kernel release
or two.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch solves VM86 interrupt emulation deadlock on SMP systems. The VM86
interrupt emulation has been heavily tested and works well on UP systems
after last update, but it seems to deadlock when we have used it on SMP/HT
boxes now.
It seems, that disable_irq() cannot be called from interrupts, because it
waits until disabled interrupt handler finishes
(/kernel/irq/manage.c:synchronize_irq():while(IRQ_INPROGRESS);). This
blocks one CPU after another. Solved by use disable_irq_nosync.
There is the second problem. If IRQ source is fast, it is possible, that
interrupt is sometimes processed and re-enabled by the second CPU, before
it is disabled by the first one, but negative IRQ disable depths are not
allowed. The spinlocking and disabling IRQs over call to
disable_irq_nosync/enable_irq is the only solution found reliable till now.
Signed-off-by: Michal Sojka <sojkam1@control.felk.cvut.cz>
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,
movl (%eax),%ds
movl %ds,(%eax)
To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,
mov (%eax),%ds
mov %ds,(%eax)
should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support
movw (%eax),%ds
movw %ds,(%eax)
without the 0x66 prefix. I am enclosing patches for 2.4 and 2.6 kernels
here. The resulting kernel binaries should be unchanged as before, with
old and new assemblers, if gcc never generates memory access for
unsigned gsindex;
asm volatile("movl %%gs,%0" : "=g" (gsindex));
If gcc does generate memory access for the code above, the upper bits
in gsindex are undefined and the new assembler doesn't allow it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!