This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.
Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Moved struct path from fs/namei.c to include/linux/namei.h. This allows many
places in the VFS, as well as any stackable filesystem to easily keep track of
dentry-vfsmount pairs.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename Reiserfs's struct path to struct treepath to prevent name collision
between it and struct path from fs/namei.c.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce several fsstack_copy_* functions which allow stackable filesystems
(such as eCryptfs and Unionfs) to easily copy over (currently only) inode
attributes. This prevents code duplication and allows for code reuse.
[akpm@osdl.org: Remove unneeded wrapper]
[bunk@stusta.de: fs/stack.c should #include <linux/fs_stack.h>]
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch replaces bitreverse() by bitrev32. The only users of bitreverse()
are crc32 itself and via-velocity.
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides two bit reverse functions and bit reverse table.
- reverse the order of bits in a u32 value
u8 bitrev8(u8 x);
- reverse the order of bits in a u32 value
u32 bitrev32(u32 x);
- byte reverse table
const u8 byte_rev_table[256];
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A warning is a warning, not a BUG.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The BUG changes in -mm3 need some arch support. This patch adds the UML
support needed. For the most part, it was stolen from the underlying
architecture. The exception is the kernel eip < PAGE_OFFSET test, which is
wrong for skas mode UMLs.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes x86-64 use the generic BUG machinery.
The main advantage in using the generic BUG machinery for x86-64 is that
the inlined overhead of BUG is just the ud2a instruction; the file+line
information are no longer inlined into the instruction stream. This
reduces cache pollution.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes i386 use the generic BUG machinery. There are no functional
changes from the old i386 implementation.
The main advantage in using the generic BUG machinery for i386 is that the
inlined overhead of BUG is just the ud2a instruction; the file+line(+function)
information are no longer inlined into the instruction stream. This reduces
cache pollution, and makes disassembly work properly.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds common handling for kernel BUGs, for use by architectures as
they wish. The code is derived from arch/powerpc.
The advantages of having common BUG handling are:
- consistent BUG reporting across architectures
- shared implementation of out-of-line file/line data
- implement CONFIG_DEBUG_BUGVERBOSE consistently
This means that in inline impact of BUG is just the illegal instruction
itself, which is an improvement for i386 and x86-64.
A BUG is represented in the instruction stream as an illegal instruction,
which has file/line information associated with it. This extra information is
stored in the __bug_table section in the ELF file.
When the kernel gets an illegal instruction, it first confirms it might
possibly be from a BUG (ie, in kernel mode, the right illegal instruction).
It then calls report_bug(). This searches __bug_table for a matching
instruction pointer, and if found, prints the corresponding file/line
information. If report_bug() determines that it wasn't a BUG which caused the
trap, it returns BUG_TRAP_TYPE_NONE.
Some architectures (powerpc) implement WARN using the same mechanism; if the
illegal instruction was the result of a WARN, then report_bug(Q) returns
CONFIG_DEBUG_BUGVERBOSE; otherwise it returns BUG_TRAP_TYPE_BUG.
lib/bug.c keeps a list of loaded modules which can be searched for __bug_table
entries. The architecture must call
module_bug_finalize()/module_bug_cleanup() from its corresponding
module_finalize/cleanup functions.
Unsetting CONFIG_DEBUG_BUGVERBOSE will reduce the kernel size by some amount.
At the very least, filename and line information will not be recorded for each
but, but architectures may decide to store no extra information per BUG at
all.
Unfortunately, gcc doesn't have a general way to mark an asm() as noreturn, so
architectures will generally have to include an infinite loop (or similar) in
the BUG code, so that gcc knows execution won't continue beyond that point.
gcc does have a __builtin_trap() operator which may be useful to achieve the
same effect, unfortunately it cannot be used to actually implement the BUG
itself, because there's no way to get the instruction's address for use in
generating the __bug_table entry.
[randy.dunlap@oracle.com: Handle BUG=n, GENERIC_BUG=n to prevent build errors]
[bunk@stusta.de: include/linux/bug.h must always #include <linux/module.h]
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
md_open takes ->reconfig_mutex which causes lockdep to complain. This
(normally) doesn't have deadlock potential as the possible conflict is with a
reconfig_mutex in a different device.
I say "normally" because if a loop were created in the array->member hierarchy
a deadlock could happen. However that causes bigger problems than a deadlock
and should be fixed independently.
So we flag the lock in md_open as a nested lock. This requires defining
mutex_lock_interruptible_nested.
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the old complex and crufty bd_mutex annotation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a sysfs and debugfs interface to the pktcdvd driver.
Look into the Documentation/ABI/testing/* files in the patch for more info.
Signed-off-by: Thomas Maier <balagi@justmail.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a bio write queue congestion control to the pktcdvd driver with
fixed on/off marks. It prevents that the driver consumes a unlimited
amount of write requests.
[akpm@osdl.org: sync with congestion_wait() renaming]
Signed-off-by: Thomas Maier <balagi@justmail.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make set_special_pids() static, the only caller is daemonize().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the locking of signal->tty.
Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand. And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.
(NOTE: sys_unshare() is broken wrt ->sighand locking rules)
Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
are governed by their open file handles. This leaves some holes for tty
access from signal->tty (or any other non file related tty access).
It solves the tty SLAB scribbles we were seeing.
(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
be examined by someone familiar with the security framework, I think
it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
invocations)
[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch supports "m32r-g00ff" bootloader for an OPSPUT platform.
Applying this patch, it is possible to do ATA-boot from an IDE drive or
HTTP-boot from network by m32r-g00ff.
* arch/m32r/boot/compressed/m32r_sio.c: Fix hangup on OPSPUT at boot.
* arch/m32r/kernel/io_opsput.c: IDE support for OPSPUT.
* arch/m32r/kernel/setup_opsput.c: ditto.
* include/asm-m32r/ide.h: ditto.
Signed-off-by: Kazuhiro Inaoka <inaoka@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is for supporting a synthesizable M32700 core for the Mappi-II FPGA
board.
On the core, location of MFT (Multi-Function Timer) registers is slightly
different from the M32700 chip.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The m32r kernel 2.6.18-rc1 or after cause build errors of "unknown isa
configuration" for userspace application programs, such as glibc, gdb, etc.
This is because the recent kernel do not include linux/config.h not to expose
kernel headers for userspace.
To fix the above compile errors, this patch fixes two headers ptrace.h and
sigcontext.h for m32r and makes them platform-independent.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the implicit addition of a virtual address
to the UDC registers. This should have been done
by ioremap() in the driver, not by a static map.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Don't set HWCAP_VFP in the processor support file; not only does it
depend on the processor features, but it also depends on the support
code being present. Therefore, only set it if the support code
detects that we have a VFP coprocessor attached.
Also, move the VFP handling of the coprocessor access register into
the VFP support code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Virtual memmap support for s390. Inspired by the ia64 implementation.
Unlike ia64 we need a mechanism which allows us to dynamically attach
shared memory regions.
These memory regions are accessed via the dcss device driver. dcss
implements the 'direct_access' operation, which requires struct pages
for every single shared page.
Therefore this implementation provides an interface to attach/detach
shared memory:
int add_shared_memory(unsigned long start, unsigned long size);
int remove_shared_memory(unsigned long start, unsigned long size);
The purpose of the add_shared_memory function is to add the given
memory range to the 1:1 mapping and to make sure that the
corresponding range in the vmemmap is backed with physical pages.
It also initialises the new struct pages.
remove_shared_memory in turn only invalidates the page table
entries in the 1:1 mapping. The page tables and the memory used for
struct pages in the vmemmap are currently not freed. They will be
reused when the next segment will be attached.
Given that the maximum size of a shared memory region is 2GB and
in addition all regions must reside below 2GB this is not too much of
a restriction, but there is room for improvement.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is now possible to enable/disable ERP related logging without re-compile
and re-ipl. A additional sysfs-attribute 'erplog' allows to switch the
logging non-interruptive.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The macb driver will probe for the PHY chip and read the mac address
from the MACB registers, so we don't need them in eth_platform_data
anymore.
Since u-boot doesn't currently initialize the MACB registers with the
mac addresses, the tag parsing code is kept but instead of sticking
the information into eth_platform_data, it uses it to initialize
the MACB registers (in case the boot loader didn't do it.) This code
should be unnecessary at some point in the future.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Rename portmux_set_func to at32_select_periph, add at32_select_gpio
and add flags parameter to specify the initial state of the pins.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
The e300c2 has no FPU. Its MSR[FP] is grounded to zero. If an attempt
is made to execute a floating point instruction (including floating-point
load, store, or move instructions), the e300c2 takes a floating-point
unavailable interrupt.
This patch adds support for FP emulation on the e300c2 by declaring a
new CPU_FTR_FP_TAKES_FPUNAVAIL, where FP unavail interrupts are
intercepted and redirected to the ProgramCheck exception path for
correct emulation handling.
(If we run out of CPU_FTR bits we could look to reclaim this bit by adding
support to test the cpu_user_features for PPC_FEATURE_HAS_FPU instead)
It adds a nop to the exception path for 32-bit processors with a FPU.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The powerpc version of pci_resource_to_user() and associated hooks
used by /proc/bus/pci and /sys/bus/pci mmap have been broken for some
time on machines that don't have a 1:1 mapping of devices (basically
on non-PowerMacs) and have PCI devices above 32 bits.
This attempts to fix it as well as possible.
The rule is supposed to be that pci_resource_to_user() always converts
the resources back into a BAR values since that's what the /proc
interface was supposed to deal with. However, for X to work on
platforms where PCI MMIO is not mapped 1:1, it became a habit of
platforms like powerpc to pass "fixed up" values there since X expects
to be able to use values from /proc/bus/pci/devices as offsets to mmap
of /dev/mem...
So we keep that contraption here, causing also /sys/*/resource to
expose fully absolute MMIO addresses instead of BAR values, which is
ugly, but should still work as long as those are only used to calculate
alignment within a page.
X is still broken when built 32 bits on machines where PCI MMIO can be
above 32-bit space unfortunately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To test for the existence of an RTAS function, we typically do:
foo_token = rtas_token("foo");
if (foo_token == RTAS_UNKNOWN_SERVICE)
return;
Add a rtas_service_present method, which provides a more conventional
boolean interface for testing the existence of an RTAS method.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ack_irq macro is unused and conflicts with James' work to template
the generic irq code. mask_irq and unmask_irq are also unused, so delete
those macros too.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current PowerPC code makes pci_unmap_addr(), pci_unmap_addr_set(),
and friends trivial for all 32-bit kernels. This is reasonable, since
for those kernels it is true that pci_unmap_single() does not need the
DMA address from the original DMA mapping -- in fact, it is a NOP.
However, I recently tried the tg3 driver on a PowerPC 440SPe machine,
which runs a 32-bit kernel and has non-cache-coherent PCI DMA. I
found that the tg3 driver crashed in pci_dma_sync_single_for_cpu(),
since for non-coherent systems, that function must invalidate the
cache for the DMA address range requested, and therefore it does use
the address passed in. tg3 uses a DMA address it stashes away with
pci_unmap_addr_set() and retrieves with pci_unmap_addr(). Of course,
since pci_unmap_addr() is defined to (0) right now, this doesn't work.
It seems to me that the tg3 driver is using pci_unmap_addr() in a
legitimate way -- I wouldn't want to have to teach all drivers that
they should use pci_unmap_addr() if they only need the address for
unmapping functions, but if they want the pci_dma_sync functions, then
they have to store the DMA address without the helper macros.
The right fix therefore seems to be in the definition of the macros in
<asm/pci.h> -- we should use the trivial versions only for 32-bit
kernels for coherent systems, and the real versions for both 64-bit
kernels and non-coherent systems.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It uses #ifdef __KERNEL__, so needs to be processed with unifdef.
Signed-off-by: Arnd Bergann <arnd.bergmann@de.ibm.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move pSeries_mach_cpu_die() into platforms/pseries/hotplug-cpu.c,
this allows rtas_stop_self() to be static so remove the prototype.
Wire up pSeries_mach_cpu_die() in the initcall, rather than statically
in setup.c, the initcall will still run prior to the cpu hotplug code
being callable, so there should be no change in behaviour.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (76 commits)
[ARM] 4002/1: S3C24XX: leave parent IRQs unmasked
[ARM] 4001/1: S3C24XX: shorten reboot time
[ARM] 3983/2: remove unused argument to __bug()
[ARM] 4000/1: Osiris: add third serial port in
[ARM] 3999/1: RX3715: suspend to RAM support
[ARM] 3998/1: VR1000: LED platform devices
[ARM] 3995/1: iop13xx: add iop13xx support
[ARM] 3968/1: iop13xx: add iop13xx_defconfig
[ARM] Update mach-types
[ARM] Allow gcc to optimise arm_add_memory a little more
[ARM] 3991/1: i.MX/MX1 high resolution time source
[ARM] 3990/1: i.MX/MX1 more precise PLL decode
[ARM] 3986/1: H1940: suspend to RAM support
[ARM] 3985/1: ixp4xx clocksource cleanup
[ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2)
[ARM] 3994/1: ixp23xx: fix handling of pci master aborts
[ARM] 3981/1: sched_clock for PXA2xx
[ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit
[ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit period
[ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter
...
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] replace kmalloc+memset with kzalloc
[IA64] resolve name clash by renaming is_available_memory()
[IA64] Need export for csum_ipv6_magic
[IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
[PATCH] Add support for type argument in PAL_GET_PSTATE
[IA64] tidy up return value of ip_fast_csum
[IA64] implement csum_ipv6_magic for ia64.
[IA64] More Itanium PAL spec updates
[IA64] Update processor_info features
[IA64] Add se bit to Processor State Parameter structure
[IA64] Add dp bit to cache and bus check structs
[IA64] SN: Correctly update smp_affinty mask
[IA64] sparse cleanups
[IA64] IA64 Kexec/kdump
Merge:
Atmel AT91RM9200 and AT91SAM9260 changes
General ARM developments
Disconfiguous memory cleanups
64-bit/32-bit division and sched_clock extension patches
EP93xx support changes
IOP support changes
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cut down the time between requesting a reboot
and actually getting the reboot to happen by
a quarter.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It appears that include/asm-arm/bug.h requires include/linux/stddef.h
for the definition of NULL. It seems that stddef.h was always included
indirectly in most cases, and that issue was properly fixed a while ago.
Then commit 5047f09b56 incorrectly reverted
change from commit ff10952a54 (bad dwmw2)
and the problem recently resurfaced.
Because the third argument to __bug() is never used anyway, RMK suggested
getting rid of it entirely instead of readding #include <linux/stddef.h>
which this patch does.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
make allnoconfig currently fails to build because it selects DISCONTIGMEM
without VIRTUAL_MEM_MAP. I see no particular reason this combination
ought to fail, so I fixed it by:
- Including memory_model.h in all circumstances, except when both
DISCONTIGMEM and VIRTUAL_MEM_MAP are enabled.
- Defining ia64_pfn_valid() to 1 unless VIRTUAL_MEM_MAP is enabled
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
PAL_GET_PSTATE accepts a type argument to return different kinds of
frequency information.
Refer: Intel ItaniumArchitecture Software Developer's Manual -
Volume 2: System Architecture, Revision 2.2
(http://developer.intel.com/design/itanium/manuals/245318.htm)
Add the support for type argument and use Instantaneous frequency
in the acpi driver.
Also fix a bug, where in return value of PAL_GET_PSTATE was getting compared
with 'control' bits instead of 'status' bits.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The asm version is 4.4 times faster than the generic C version and
10X smaller in code size.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rev 2.2 of Volume 2 of "Intel Itanium Architecture Software Developer's
Manual" (January 2006) adds a se bit to the Processor State Parameter
fields (pages 2:299). This patch gets the structs back in sync
with the spec.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rev 2.2 of Volume 2 of "Intel Itanium Architecture Software Developer's
Manual" (January 2006) adds a dp bit to the cache_check and bus_check
fields (pages 2:401-2:404). This patch gets the structs back in sync
with the spec.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changes and updates.
1. Remove fake rendz path and related code according to discuss with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iospic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This allows workqueue users to run just their own pending work, rather
than wait for the whole workqueue to finish running. This solves the
deadlock with networking libphy that was due to other workqueue entries
possibly needing a lock that was held by the routine that wanted to
flush its own work.
It's not wonderful: if you absolutely need to synchronize with the work
function having been executed, any user strictly speaking should have
its own completion tracking logic, since when we run things explicitly
by hand, the generic workqueue layer can no longer help us synchronize.
Also, this is strictly only usable for work that has been scheduled
without any delayed timers. You can not mix the new interface with
schedule_delayed_work().
But it's better than what we had currently.
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The iop348 processor integrates an Xscale (XSC3 512KB L2 Cache) core with a
Serial Attached SCSI (SAS) controller, multi-ported DDR2 memory
controller, 3 Application Direct Memory Access (DMA) controllers, a 133Mhz
PCI-X interface, a x8 PCI-Express interface, and other peripherals to form
a system-on-a-chip RAID subsystem engine.
The iop342 processor replaces the SAS controller with a second Xscale core
for dual core embedded applications.
The iop341 processor is the single core version of iop342.
This patch supports the two Intel customer reference platforms iq81340mc
for external storage and iq81340sc for direct attach (HBA) development.
The developer's manual is available here:
ftp://download.intel.com/design/iio/docs/31503701.pdf
Changelog:
* removed virtual addresses from resource definitions
* cleaned up some unnecessary #include's
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
[DLM] Clean up lowcomms
[GFS2] Change gfs2_fsync() to use write_inode_now()
[GFS2] Fix indent in recovery.c
[GFS2] Don't flush everything on fdatasync
[GFS2] Add a comment about reading the super block
[GFS2] Mount problem with the GFS2 code
[GFS2] Remove gfs2_check_acl()
[DLM] fix format warnings in rcom.c and recoverd.c
[GFS2] lock function parameter
[DLM] don't accept replies to old recovery messages
[DLM] fix size of STATUS_REPLY message
[GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
[DLM] fix add_requestqueue checking nodes list
[GFS2] Fix recursive locking in gfs2_getattr
[GFS2] Fix recursive locking in gfs2_permission
[GFS2] Reduce number of arguments to meta_io.c:getbuf()
[GFS2] Move gfs2_meta_syncfs() into log.c
[GFS2] Fix journal flush problem
[GFS2] mark_inode_dirty after write to stuffed file
[GFS2] Fix glock ordering on inode creation
...
In file included from include/asm/patch.h:14,
from arch/ia64/kernel/patch.c:10:
include/linux/elf.h:375: warning: "struct file" declared inside parameter list
include/linux/elf.h:375: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The IPMI BT subdriver has been patched to survive "long busy" timeouts seen
during firmware upgrades and resets. The patch never returns the HOSED state,
synthesizes response messages with meaningful completion codes, and recovers
gracefully when the hardware finishes the long busy. The subdriver now issues
a "Get BT Capabilities" command and properly uses those results. More
informative completion codes are returned on error from transaction starts;
this logic was propogated to the KCS and SMIC subdrivers. Finally, indent and
other style quirks were normalized.
Signed-off-by: Rocky Craig <rocky.craig@hp.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some commands and operations on a BMC can cause the BMC to "go away" for a
while. This can cause the automatic flag processing and other things of that
nature to timeout and generate annoying logs, or possibly cause other bad
things to happen when in firmware update mode.
Add detection of those commands (cold reset, warm reset, and any firmware
command) and turns off automatic processing for 30 seconds. It also add a
manual override either way.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass in the sysfs name from the lower-level IPMI driver, as the coming IPMI
serial driver will need that to link properly from the serial device sysfs
directory.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow nbd to expose the nbd-client daemon's PID in /sys/block/nbd<x>/pid.
This is helpful for tracking connection status of a device and for
determining which nbd devices are currently in use.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the ki_retried member from struct kiocb. I think the idea was
bounced around a while back, but Arnaldo pointed out another reason that we
should dig it up when he pointed out that the last cacheline of struct
kiocb only contains 4 bytes. By removing the debugging member, we save
more than the 8 byte on 64 bit machines.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The elf note saving code is currently duplicated over several
architectures. This cleanup patch simply adds code to a common file and
then replaces the arch-specific code with calls to the newly added code.
The only drawback with this approach is that s390 doesn't fully support
kexec-on-panic which for that arch leads to introduction of unused code.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the contents of the userspace asm/setup.h header consistent on all
architectures:
- export setup.h to userspace on all architectures
- export only COMMAND_LINE_SIZE to userspace
- frv: move COMMAND_LINE_SIZE from param.h
- i386: remove duplicate COMMAND_LINE_SIZE from param.h
- arm:
- export ATAGs to userspace
- change u8/u16/u32 to __u8/__u16/__u32
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- move some file_operations structs into the .rodata section
- move static strings from policy_types[] array into the .rodata section
- fix generic seq_operations usages, so that those structs may be defined
as "const" as well
[akpm@osdl.org: couple of fixes]
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"extern inline" generates a warning with -Wmissing-prototypes and I'm
currently working on getting the kernel cleaned up for adding this to the
CFLAGS since it will help us to avoid a nasty class of runtime errors.
If there are places that really need a forced inline, __always_inline would be
the correct solution.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"extern inline" generates a warning with -Wmissing-prototypes and I'm
currently working on getting the kernel cleaned up for adding this to the
CFLAGS since it will help us to avoid a nasty class of runtime errors.
If there are places that really need a forced inline, __always_inline would be
the correct solution.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add 3 more files to get "unifdef"ed when creating sanitized headers with
"make headers_install".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename a poorly worded PCI ID for the Geode GX and CS5535 companion chips.
The graphics processor and host bridge actually live in the northbridge on
the integrated processor, not in the companion chip.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a proper prototype for remove_inode_dquot_ref() in
include/linux/quotaops.h
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the locking self-test failures (of 'FAILURE' type) easier to debug by
printing more information.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the carta_random32.h header file. The carta_random32() function was
was put in and removed in favor of random32(). In the removal process, the
header file was forgotten.
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a common interface for all the subsystems to lock and unlock their
per-subsystem hotcpu mutexes.
When CONFIG_HOTPLUG_CPU is not set, these operations would be no-ops.
[akpm@osdl.org: macros -> inlines]
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass struct dev pointer to dma_cache_sync()
dma_cache_sync() is ill-designed in that it does not have a struct device
pointer argument which makes proper support for systems that consist of a
mix of coherent and non-coherent DMA devices hard. Change dma_cache_sync
to take a struct device pointer as first argument and fix all its callers
to pass it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dma_is_consistent() is ill-designed in that it does not have a struct
device pointer argument which makes proper support for systems that consist
of a mix of coherent and non-coherent DMA devices hard. Change
dma_is_consistent to take a struct device pointer as first argument and fix
the sole caller to pass it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On 32bits SMP platforms, 64bits i_size is protected by a seqcount
(i_size_seqcount).
When i_size is read or written, i_size_seqcount is read/written as well, so
it make sense to group these two fields together in the same cache line.
This patch moves i_size_seqcount next to i_size, and also moves i_version
to let offsetof(struct inode, i_size) being 0x40 instead of 0x3c (for
32bits platforms).
For 64 bits platforms, i_size_seqcount doesnt exist, and the move of a
'long i_version' should not introduce a new hole because of padding.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the following needlessly global functions static:
- nlm_lookup_host()
- nsm_find()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the no longer used lockdep_internal().
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
smp_call_function_single() needs to be visible in non-SMP builds, to fix:
arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer. The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag. If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.
However, the processing of this check routine takes a long time. So, this
patch introduces the garbage collection mechanism of insn_slot. It also
introduces the "dirty" flag to free_insn_slot because of efficiency.
The "clean" instruction slots (dirty flag is cleared) are released
immediately. But the "dirty" slots which are used by boosted kprobes, are
marked as garbages. collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we allocate 64k space on the user stack and use it the msgbuf for
sys_{msgrcv,msgsnd} for compat and the results are later copied in user [
by copy_in_user]. This patch introduces helper routines for
sys_{msgrcv,msgsnd} as below:
do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with
the msqid and msgflg.
do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to
the buffer. The mtype has to be copied back the user space msgbuf by the
caller.
These changes avoid the need to allocate the msgsize on the userspace (
thus removing the size limt ) and the overhead of an extra copy_in_user().
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 32 bit implementation of ktime_to_ns returns unsigned value, while the
64 bit version correctly returns an signed value. There is no current user
affected by this, but it has to be fixed, as ktime values can be negative.
Pointed-out-by: Helmut Duregger <Helmut.Duregger@student.uibk.ac.at>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has no users and it's doubtful that we'll need it again.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last thing we agreed on was to remove the macros entirely for 2.6.19,
on all architectures. Unfortunately, I think nobody actually _did_ that,
so they are still there.
[akpm@osdl.org: x86_64 fix]
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Schafer <gschafer@zip.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Workqueue functions should not leak locks, assert so, printing the
last function ran.
Use macros in lockdep.h to avoid include dependency pains.
[akpm@osdl.org: build fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement prof=sleep profiling. TASK_UNINTERRUPTIBLE sleeps will be taken
as a profile hit, and every millisecond spent sleeping causes a profile-hit
for the call site that initiated the sleep.
Sample readprofile output on i386:
306 ps2_sendbyte 1.3973
432 call_usermodehelper_keys 1.9548
484 ps2_command 0.6453
790 __driver_attach 4.7879
1593 msleep 44.2500
3976 sync_buffer 64.1290
4076 do_lookup 12.4648
8587 sync_page 122.6714
20820 total 0.0067
(NOTE: architectures need to check whether get_wchan() can be called from
deep within the wakeup path.)
akpm: we need to mark more functions __sched. lock_sock(), msleep(), others..
akpm: the contention in do_lookup() is a surprise. Presumably doing disk
reads for directory contents while holding i_mutex.
[akpm@osdl.org: various fixes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Name some of the remaning 'old_style_spin_init' locks
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is on our "Envoy" boxes which we have, according to the documentation, an
"Exar ST16C554/554D Quad UART with 16-byte Fifo's". The box also has two
other "on-board" serial ports and a modem chip.
The two on-board serial UARTs were being detected along with the first two
Exar UARTs. The last two Exar UARTs were not showing up and neither was the
modem.
This patch was the only way I could the kernel to see beyond the standard four
serial ports and get all four of the Exar UARTs to show up.
[akpm@osdl.org: build fix]
Signed-off-by: Paul B Schroeder <pschroeder@uplogix.com>
Cc: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One of the mistakes a module_param() user can make is to supply default
value of module parameter as the last argument. module_param() accepts
permissions instead. If default value is, say, 3 (-------wx), parameter
becomes world-writeable.
So far, the only remedy was to apply grep(1) and read drivers submitted
to -mm. BTDT.
With this patch applied, compiler will finally do some job.
*) bounds checking on permissions
*) world-writeable bit checking on permissions
*) compile breakage if checks trigger
First version of this check (only "& 2" part) directly caught 4 out of 7
places during my last grep.
Subject: Neverending module_param() bugs
[X] drivers/acpi/sbs.c:101:module_param(capacity_mode, int, CAPACITY_UNIT);
[X] drivers/acpi/sbs.c:102:module_param(update_mode, int, UPDATE_MODE);
[ ] drivers/acpi/sbs.c:103:module_param(update_info_mode, int, UPDATE_INFO_MODE);
[ ] drivers/acpi/sbs.c:104:module_param(update_time, int, UPDATE_TIME);
[ ] drivers/acpi/sbs.c:105:module_param(update_time2, int, UPDATE_TIME2);
[X] drivers/char/watchdog/sbc8360.c:203:module_param(timeout, int, 27);
[X] drivers/media/video/tuner-simple.c:13:module_param(offset, int, 0666);
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
public interface.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
do_exit:
taskstats_exit_alloc()
...
taskstats_exit_send()
taskstats_exit_free()
I think this is not good, let it be a single function exported to the core
kernel, taskstats_exit(), which does alloc + send + free itself.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
probe_kernel_address() purports to be generic, only it forgot to select
KERNEL_DS, so it presently won't work right on all architectures.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the parallel port (implemented as separate PCI function) on
the Oxford Semiconductor OX16PCI952.
Signed-off-by: Ryan Underwood <nemesis@icequake.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
linux/cdev.h uses struct kobject and other structs and should therefore
include them. Currently, a module either needs to add the missing includes
itself, or, in case a module includes other headers already, needs to put
<linux/cdev.h> last, which goes against a alphabetically-sorted include
list.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a DESTROY operation for block device based filesystems. With the help of
this operation, such a filesystem can flush dirty data to the device
synchronously before the umount returns.
This is needed in situations where the filesystem is assumed to be clean
immediately after unmount (e.g. ejecting removable media).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the BMAP operation for block device based filesystems. This
is needed to support swap-files and lilo.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a flag to the RELEASE message which specifies that a FLUSH operation
should be performed as well. This interface update is needed for the FreeBSD
port, and doesn't actually touch the Linux implementation at all.
Also rename the unused 'flush_flags' in the FLUSH message to 'unused'.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the signature of i_size_read(), IMINOR() and IMAJOR() because they,
or the functions they call, will never modify the argument.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stick NFS sockets in their own class to avoid some lockdep warnings. NFS
sockets are never exposed to user-space, and will hence not trigger certain
code paths that would otherwise pose deadlock scenarios.
[akpm@osdl.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a driver for the Xilinx uartlite serial controller used in boards with
the PPC405 core in the Xilinx V2P/V4 fpgas.
The hardware is very simple (baudrate/start/stopbits fixed and no break
support). See the datasheet for details:
http://www.xilinx.com/bvdocs/ipcenter/data_sheet/opb_uartlite.pdf
See http://thread.gmane.org/gmane.linux.serial/1237/ for the email thread.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the support for a large number of logical volumes. We will soon have
hardware that support up to 1024 logical volumes.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More fallout of the post 2.6.19-rc1 IRQ changes...
CC init/main.o
In file included from
/home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/include/linux/rtc.h:102,
from
/home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/include/linux/efi.h:19,
from
/home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/init/main.c:43:
/home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/include/linux/interrupt.h:67:
error: conflicting types for 'irq_handler_t'
include2/asm/irq.h:49: error: previous declaration of 'irq_handler_t' was here
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it possible to create a workqueue the worker thread of which will be
frozen during suspend, along with other kernel threads.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the loop from thaw_processes() to a separate function and call it
independently for kernel threads and user space processes so that the order
of thawing tasks is clearly visible.
Drop thaw_kernel_threads() which is never used.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp support i386 systems with PAE or without PSE.
This is done by creating temporary page tables located in resume-safe page
frames before the suspend image is restored in the same way as x86_64 does
it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Nigel Cunningham <ncunningham@linuxmail.org>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify process thawing so that we can thaw kernel space without thawing
userspace, and thaw kernelspace first. This will be useful in later
patches, where I intend to get swsusp thawing kernel threads only before
seeking to free memory.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.
[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently swsusp saves the contents of highmem pages by copying them to the
normal zone which is quite inefficient (eg. it requires two normal pages
to be used for saving one highmem page). This may be improved by using
highmem for saving the contents of saveable highmem pages.
Namely, during the suspend phase of the suspend-resume cycle we try to
allocate as many free highmem pages as there are saveable highmem pages.
If there are not enough highmem image pages to store the contents of all of
the saveable highmem pages, some of them will be stored in the "normal"
memory. Next, we allocate as many free "normal" pages as needed to store
the (remaining) image data. We use a memory bitmap to mark the allocated
free pages (ie. highmem as well as "normal" image pages).
Now, we use another memory bitmap to mark all of the saveable pages
(highmem as well as "normal") and the contents of the saveable pages are
copied into the image pages. Then, the second bitmap is used to save the
pfns corresponding to the saveable pages and the first one is used to save
their data.
During the resume phase the pfns of the pages that were saveable during the
suspend are loaded from the image and used to mark the "unsafe" page
frames. Next, we try to allocate as many free highmem page frames as to
load all of the image data that had been in the highmem before the suspend
and we allocate so many free "normal" page frames that the total number of
allocated free pages (highmem and "normal") is equal to the size of the
image. While doing this we have to make sure that there will be some extra
free "normal" and "safe" page frames for two lists of PBEs constructed
later.
Now, the image data are loaded, if possible, into their "original" page
frames. The image data that cannot be written into their "original" page
frames are loaded into "safe" page frames and their "original" kernel
virtual addresses, as well as the addresses of the "safe" pages containing
their copies, are stored in one of two lists of PBEs.
One list of PBEs is for the copies of "normal" suspend pages (ie. "normal"
pages that were saveable during the suspend) and it is used in the same way
as previously (ie. by the architecture-dependent parts of swsusp). The
other list of PBEs is for the copies of highmem suspend pages. The pages
in this list are restored (in a reversible way) right before the
arch-dependent code is called.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp use block device offsets instead of swap offsets to identify swap
locations and make it use the same code paths for writing as well as for
reading data.
This allows us to use the same code for handling swap files and swap
partitions and to simplify the code, eg. by dropping rw_swap_page_sync().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Linux kernel handles swap files almost in the same way as it handles swap
partitions and there are only two differences between these two types of swap
areas:
(1) swap files need not be contiguous,
(2) the header of a swap file is not in the first block of the partition
that holds it. From the swsusp's point of view (1) is not a problem,
because it is already taken care of by the swap-handling code, but (2) has
to be taken into consideration.
In principle the location of a swap file's header may be determined with the
help of appropriate filesystem driver. Unfortunately, however, it requires
the filesystem holding the swap file to be mounted, and if this filesystem is
journaled, it cannot be mounted during a resume from disk. For this reason we
need some other means by which swap areas can be identified.
For example, to identify a swap area we can use the partition that holds the
area and the offset from the beginning of this partition at which the swap
header is located.
The following patch allows swsusp to identify swap areas this way. It changes
swap_type_of() so that it takes an additional argument representing an offset
of the swap header within the partition represented by its first argument.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make radix tree lookups safe to be performed without locks. Readers are
protected against nodes being deleted by using RCU based freeing. Readers
are protected against new node insertion by using memory barriers to ensure
the node itself will be properly written before it is visible in the radix
tree.
Each radix tree node keeps a record of their height (above leaf nodes).
This height does not change after insertion -- when the radix tree is
extended, higher nodes are only inserted in the top. So a lookup can take
the pointer to what is *now* the root node, and traverse down it even if
the tree is concurrently extended and this node becomes a subtree of a new
root.
"Direct" pointers (tree height of 0, where root->rnode points directly to
the data item) are handled by using the low bit of the pointer to signal
whether rnode is a direct pointer or a pointer to a radix tree node.
When a reader wants to traverse the next branch, they will take a copy of
the pointer. This pointer will be either NULL (and the branch is empty) or
non-NULL (and will point to a valid node).
[akpm@osdl.org: cleanups]
[Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
[clameter@sgi.com: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we we use the lru head link of the second page of a compound page
to hold its destructor. This was ok when it was purely an internal
implmentation detail. However, hugetlbfs overrides this destructor
violating the layering. Abstract this out as explicit calls, also
introduce a type for the callback function allowing them to be type
checked. For each callback we pre-declare the function, causing a type
error on definition rather than on use elsewhere.
[akpm@osdl.org: cleanups]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_DMA is an alias of GFP_DMA. This is the last one so we
remove the leftover comment too.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_ATOMIC is an alias of GFP_ATOMIC
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_USER is an alias of GFP_USER
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NOFS is an alias of GFP_NOFS.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NOIO is an alias of GFP_NOIO with a single instance of use.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_LEVEL_MASK is only used internally to the slab and is
and alias of GFP_LEVEL_MASK.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is only used internally in the slab.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86 NUMA systems only define bootmem for node 0. alloc_bootmem_node() and
friends therefore ignore the passed pgdat and use NODE_DATA(0) in all
cases. This leads to the following warnings as we are not using the passed
parameter:
.../mm/page_alloc.c: In function 'zone_wait_table_init':
.../mm/page_alloc.c:2259: warning: unused variable 'pgdat'
One option would be to define all variables used with these macros
__attribute__ ((unused)), but this would leave us exposed should these
become genuinely unused.
The key here is that we _are_ using the value, we ignore it but that is a
deliberate action. This patch adds a nested local variable within the
alloc_bootmem_node helper to which the pgdat parameter is assigned making
it 'used'. The nested local is marked __attribute__ ((unused)) to silence
this same warning for it.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NUMA node ids are passed as either int or unsigned int almost exclusivly
page_to_nid and zone_to_nid both return unsigned long. This is a throw
back to when page_to_nid was a #define and was thus exposing the real type
of the page flags field.
In addition to fixing up the definitions of page_to_nid and zone_to_nid I
audited the users of these functions identifying the following incorrect
uses:
1) mm/page_alloc.c show_node() -- printk dumping the node id,
2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
against numa_node_id() which returns an int from cpu_to_node(), and
3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which
uses bit_set which in generic code takes an int.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all uses of kmem_cache_t (the most were left in slab.h). The
typedef for kmem_cache_t is then only necessary for other kernel
subsystems. Add a comment to that effect.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The names_cachep is used for getname() and putname(). So lets put it into
fs.h near those two definitions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs_cachep is only used in kernel/exit.c and in kernel/fork.c.
It is used to store fs_struct items so it should be placed in linux/fs_struct.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
filp_cachep is only used in fs/file_table.c and in fs/dcache.c where
it is defined.
Move it to related definitions in linux/file.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Proper place is in file.h since files_cachep uses are rated to file I/O.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
vm_area_cachep is used to store vm_area_structs. So move to mm.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move sighand_cachep definitioni to linux/signal.h
The sighand cache is only used in fs/exec.c and kernel/fork.c. It is defined
in kernel/fork.c but only used in fs/exec.c.
The sighand_cachep is related to signal processing. So add the definition to
signal.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove bio_cachep from slab.h - it no longer exists.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Node-aware allocation of skbs for the receive path.
Details:
- __alloc_skb gets a new node argument and cals the node-aware
slab functions with it.
- netdev_alloc_skb passed the node number it gets from dev_to_node
to it, everyone else passes -1 (any node)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For node-aware skb allocations we need information about the node in struct
net_device or struct device. Davem suggested to put it into struct device
which this patch does.
In particular:
- struct device gets a new int numa_node member if CONFIG_NUMA is set
- there are two new helpers, dev_to_node and set_dev_node to
transparently deal with the non-numa case
- for pci devices the node-info is set to the value we get from
pcibus_to_node.
Note that for some architectures pcibus_to_node doesn't work yet at the time
we call it currently. This is harmless and will just mean skb allocations
aren't node-local on this architectures until the implementation of
pcibus_to_node on these architectures have been updated (There are patches for
x86 and x86_64 floating around)
[akpm@osdl.org: cleanup]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have variants of kmalloc and kmem_cache_alloc that leave leak tracking to
the caller. This is used for subsystem-specific allocators like skb_alloc.
To make skb_alloc node-aware we need similar routines for the node-aware slab
allocator, which this patch adds.
Note that the code is rather ugly, but it mirrors the non-node-aware code 1:1:
[akpm@osdl.org: add module export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make kmap_atomic/kunmap_atomic denote a pagefault disabled scope. All non
trivial implementations already do this anyway.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.
Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption, this might go away in the future.
(NOTE: the extra barrier() in pagefault_disable might fix some holes on
machines which have too many registers for their own good)
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following up with the work on shared page table done by Dave McCracken. This
set of patch target shared page table for hugetlb memory only.
The shared page table is particular useful in the situation of large number of
independent processes sharing large shared memory segments. In the normal
page case, the amount of memory saved from process' page table is quite
significant. For hugetlb, the saving on page table memory is not the primary
objective (as hugetlb itself already cuts down page table overhead
significantly), instead, the purpose of using shared page table on hugetlb is
to allow faster TLB refill and smaller cache pollution upon TLB miss.
With PT sharing, pte entries are shared among hundreds of processes, the cache
consumption used by all the page table is smaller and in return, application
gets much higher cache hit ratio. One other effect is that cache hit ratio
with hardware page walker hitting on pte in cache will be higher and this
helps to reduce tlb miss latency. These two effects contribute to higher
application performance.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an arch_alloc_page to match arch_free_page.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new swap token patches replace the current token traversal algo. The old
algo had a crude timeout parameter that was used to handover the token from
one task to another. This algo, transfers the token to the tasks that are in
need of the token. The urgency for the token is based on the number of times
a task is required to swap-in pages. Accordingly, the priority of a task is
incremented if it has been badly affected due to swap-outs. To ensure that
the token doesnt bounce around rapidly, the token holders are given a priority
boost. The priority of tasks is also decremented, if their rate of swap-in's
keeps reducing. This way, the condition to check whether to pre-empt the swap
token, is a matter of comparing two task's priority fields.
[akpm@osdl.org: cleanups]
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rearrange the struct members in the 'struct zonelist_cache' structure, so
as to put the readonly (once initialized) z_to_n[] array first, where it
will come right after the zones[] array in struct zonelist.
This pretty much eliminates the chance that the two frequently written
elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap
times, will end up on the same cache line as the performance sensitive,
frequently read, never (after init) written zones[] array.
Keeping frequently written data off frequently read cache lines is good for
performance.
Thanks to Rohit Seth for the suggestion.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize the critical zonelist scanning for free pages in the kernel memory
allocator by caching the zones that were found to be full recently, and
skipping them.
Remembers the zones in a zonelist that were short of free memory in the
last second. And it stashes a zone-to-node table in the zonelist struct,
to optimize that conversion (minimize its cache footprint.)
Recent changes:
This differs in a significant way from a similar patch that I
posted a week ago. Now, instead of having a nodemask_t of
recently full nodes, I have a bitmask of recently full zones.
This solves a problem that last weeks patch had, which on
systems with multiple zones per node (such as DMA zone) would
take seeing any of these zones full as meaning that all zones
on that node were full.
Also I changed names - from "zonelist faster" to "zonelist cache",
as that seemed to better convey what we're doing here - caching
some of the key zonelist state (for faster access.)
See below for some performance benchmark results. After all that
discussion with David on why I didn't need them, I went and got
some ;). I wanted to verify that I had not hurt the normal case
of memory allocation noticeably. At least for my one little
microbenchmark, I found (1) the normal case wasn't affected, and
(2) workloads that forced scanning across multiple nodes for
memory improved up to 10% fewer System CPU cycles and lower
elapsed clock time ('sys' and 'real'). Good. See details, below.
I didn't have the logic in get_page_from_freelist() for various
full nodes and zone reclaim failures correct. That should be
fixed up now - notice the new goto labels zonelist_scan,
this_zone_full, and try_next_zone, in get_page_from_freelist().
There are two reasons I persued this alternative, over some earlier
proposals that would have focused on optimizing the fake numa
emulation case by caching the last useful zone:
1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
have seen real customer loads where the cost to scan the zonelist
was a problem, due to many nodes being full of memory before
we got to a node we could use. Or at least, I think we have.
This was related to me by another engineer, based on experiences
from some time past. So this is not guaranteed. Most likely, though.
The following approach should help such real numa systems just as
much as it helps fake numa systems, or any combination thereof.
2) The effort to distinguish fake from real numa, using node_distance,
so that we could cache a fake numa node and optimize choosing
it over equivalent distance fake nodes, while continuing to
properly scan all real nodes in distance order, was going to
require a nasty blob of zonelist and node distance munging.
The following approach has no new dependency on node distances or
zone sorting.
See comment in the patch below for a description of what it actually does.
Technical details of note (or controversy):
- See the use of "zlc_active" and "did_zlc_setup" below, to delay
adding any work for this new mechanism until we've looked at the
first zone in zonelist. I figured the odds of the first zone
having the memory we needed were high enough that we should just
look there, first, then get fancy only if we need to keep looking.
- Some odd hackery was needed to add items to struct zonelist, while
not tripping up the custom zonelists built by the mm/mempolicy.c
code for MPOL_BIND. My usual wordy comments below explain this.
Search for "MPOL_BIND".
- Some per-node data in the struct zonelist is now modified frequently,
with no locking. Multiple CPU cores on a node could hit and mangle
this data. The theory is that this is just performance hint data,
and the memory allocator will work just fine despite any such mangling.
The fields at risk are the struct 'zonelist_cache' fields 'fullzones'
(a bitmask) and 'last_full_zap' (unsigned long jiffies). It should
all be self correcting after at most a one second delay.
- This still does a linear scan of the same lengths as before. All
I've optimized is making the scan faster, not algorithmically
shorter. It is now able to scan a compact array of 'unsigned
short' in the case of many full nodes, so one cache line should
cover quite a few nodes, rather than each node hitting another
one or two new and distinct cache lines.
- If both Andi and Nick don't find this too complicated, I will be
(pleasantly) flabbergasted.
- I removed the comment claiming we only use one cachline's worth of
zonelist. We seem, at least in the fake numa case, to have put the
lie to that claim.
- I pay no attention to the various watermarks and such in this performance
hint. A node could be marked full for one watermark, and then skipped
over when searching for a page using a different watermark. I think
that's actually quite ok, as it will tend to slightly increase the
spreading of memory over other nodes, away from a memory stressed node.
===============
Performance - some benchmark results and analysis:
This benchmark runs a memory hog program that uses multiple
threads to touch alot of memory as quickly as it can.
Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of
the total 96 GBytes on the system, and using 1, 19, 37, or 55
threads (on a 56 CPU system.) System, user and real (elapsed)
timings were recorded for each run, shown in units of seconds,
in the table below.
Two kernels were tested - 2.6.18-mm3 and the same kernel with
this zonelist caching patch added. The table also shows the
percentage improvement the zonelist caching sys time is over
(lower than) the stock *-mm kernel.
number 2.6.18-mm3 zonelist-cache delta (< 0 good) percent
GBs N ------------ -------------- ---------------- systime
mem threads sys user real sys user real sys user real better
12 1 153 24 177 151 24 176 -2 0 -1 1%
12 19 99 22 8 99 22 8 0 0 0 0%
12 37 111 25 6 112 25 6 1 0 0 -0%
12 55 115 25 5 110 23 5 -5 -2 0 4%
38 1 502 74 576 497 73 570 -5 -1 -6 0%
38 19 426 78 48 373 76 39 -53 -2 -9 12%
38 37 544 83 36 547 82 36 3 -1 0 -0%
38 55 501 77 23 511 80 24 10 3 1 -1%
64 1 917 125 1042 890 124 1014 -27 -1 -28 2%
64 19 1118 138 119 965 141 103 -153 3 -16 13%
64 37 1202 151 94 1136 150 81 -66 -1 -13 5%
64 55 1118 141 61 1072 140 58 -46 -1 -3 4%
90 1 1342 177 1519 1275 174 1450 -67 -3 -69 4%
90 19 2392 199 192 2116 189 176 -276 -10 -16 11%
90 37 3313 238 175 2972 225 145 -341 -13 -30 10%
90 55 1948 210 104 1843 213 100 -105 3 -4 5%
Notes:
1) This test ran a memory hog program that started a specified number N of
threads, and had each thread allocate and touch 1/N'th of
the total memory to be used in the test run in a single loop,
writing a constant word to memory, one store every 4096 bytes.
Watching this test during some earlier trial runs, I would see
each of these threads sit down on one CPU and stay there, for
the remainder of the pass, a different CPU for each thread.
2) The 'real' column is not comparable to the 'sys' or 'user' columns.
The 'real' column is seconds wall clock time elapsed, from beginning
to end of that test pass. The 'sys' and 'user' columns are total
CPU seconds spent on that test pass. For a 19 thread test run,
for example, the sum of 'sys' and 'user' could be up to 19 times the
number of 'real' elapsed wall clock seconds.
3) Tests were run on a fresh, single-user boot, to minimize the amount
of memory already in use at the start of the test, and to minimize
the amount of background activity that might interfere.
4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.
5) Notice that the 'real' time gets large for the single thread runs, even
though the measured 'sys' and 'user' times are modest. I'm not sure what
that means - probably something to do with it being slow for one thread to
be accessing memory along ways away. Perhaps the fake numa system, running
ostensibly the same workload, would not show this substantial degradation
of 'real' time for one thread on many nodes -- lets hope not.
6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs)
ran quite efficiently, as one might expect. Each pair of threads needed
to allocate and touch the memory on the node the two threads shared, a
pleasantly parallizable workload.
7) The intermediate thread count passes, when asking for alot of memory forcing
them to go to a few neighboring nodes, improved the most with this zonelist
caching patch.
Conclusions:
* This zonelist cache patch probably makes little difference one way or the
other for most workloads on real numa hardware, if those workloads avoid
heavy off node allocations.
* For memory intensive workloads requiring substantial off-node allocations
on real numa hardware, this patch improves both kernel and elapsed timings
up to ten per-cent.
* For fake numa systems, I'm optimistic, but will have to leave that up to
Rohit Seth to actually test (once I get him a 2.6.18 backport.)
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: David Rientjes <rientjes@cs.washington.edu>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The zone table is mostly not needed. If we have a node in the page flags
then we can get to the zone via NODE_DATA() which is much more likely to be
already in the cpu cache.
In case of SMP and UP NODE_DATA() is a constant pointer which allows us to
access an exact replica of zonetable in the node_zones field. In all of
the above cases there will be no need at all for the zone table.
The only remaining case is if in a NUMA system the node numbers do not fit
into the page flags. In that case we make sparse generate a table that
maps sections to nodes and use that table to to figure out the node number.
This table is sized to fit in a single cache line for the known 32 bit
NUMA platform which makes it very likely that the information can be
obtained without a cache miss.
For sparsemem the zone table seems to be have been fairly large based on
the maximum possible number of sections and the number of zones per node.
There is some memory saving by removing zone_table. The main benefit is to
reduce the cache foootprint of the VM from the frequent lookups of zones.
Plus it simplifies the page allocator.
[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_SMP=n:
drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable'
drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable'
Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h
includes sched.h which will need spinlock.h.
So the patch breaks the _bh declarations out into a separate header and
includes it in both interrupt.h and spinlock.h.
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enhanced resolution for time measurement functions.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add support to suspend and resume, using the
H1940's bootloader
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes an error in the numbering of the disk LEDs on the
Linksys NSLU2. The error crept in because the physical location
of the LEDs has the Disk 2 LED *above* the Disk 1 LED.
Thanks to Gordon Farquharson for reporting this.
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is done in a completely lockless fashion. Bits 0 to 31 of the count
are provided by the hardware while bits 32 to 62 are stored in memory.
The top bit in memory is used to synchronize with the hardware count
half-period. When the top bit of both counters (hardware and in memory)
differ then the memory is updated with a new value, incrementing it when
the hardware counter wraps around. Because a word store in memory is
atomic then the incremented value will always be in synch with the top
bit indicating to any potential concurrent reader if the value in memory
is up to date or not wrt the needed increment. And any race in updating
the value in memory is harmless as the same value would be stored more
than once.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On ARM all divisions have to be performed "manually". For 64-bit
divisions that may take more than a hundred cycles in many cases.
With 32-bit divisions gcc already use the recyprocal of constant
divisors to perform a multiplication, but not with 64-bit divisions.
Since the kernel is increasingly relying upon 64-bit divisions it is
worth optimizing at least those cases where the divisor is a constant.
This is what this patch does using plain C code that gets optimized away
at compile time.
For example, despite the amount of added C code, do_div(x, 10000) now
produces the following assembly code (where x is assigned to r0-r1):
adr r4, .L0
ldmia r4, {r4-r5}
umull r2, r3, r4, r0
mov r2, #0
umlal r3, r2, r5, r0
umlal r3, r2, r4, r1
mov r3, #0
umlal r2, r3, r5, r1
mov r0, r2, lsr #11
orr r0, r0, r3, lsl #21
mov r1, r3, lsr #11
...
.L0:
.word 948328779
.word 879609302
which is the fastest that can be done for any value of x in that case,
many times faster than the __do_div64 code (except for the small x value
space for which the result ends up being zero or a single bit).
The fact that this code is generated inline produces a tiny increase in
.text size, but not significant compared to the needed code around each
__do_div64 call site this code is replacing.
The algorithm used has been validated on a 16-bit scale for all possible
values, and then recodified for 64-bit values. Furthermore I've been
running it with the final BUG_ON() uncommented for over two months now
with no problem.
Note that this new code is compiled with gcc versions 4.0 or later.
Earlier gcc versions proved themselves too problematic and only the
original code is used with them.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add Tg3_FLG2_IS_NIC flag to unambiguously determine whether the
device is NIC or onboard. Previously, the EEPROM_WRITE_PROT flag was
overloaded to also mean onboard. With the separation, we can
support some devices that are onboard but do not use eeprom write
protect.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disables auditing in ipsec when CONFIG_AUDITSYSCALL is
disabled in the kernel.
Also includes a bug fix for xfrm_state.c as a result of
original ipsec audit patch.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
include/net/irda/irlan_filter.h:31: warning: 'struct seq_file' declared inside parameter list
include/net/irda/irlan_filter.h:31: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A lot of cypher modes need multiplications in GF(2^128). LRW, ABL, GCM...
I use functions from this library in my LRW implementation and I will
also use them in my ABL (Arbitrary Block Length, an unencumbered (correct
me if I am wrong, wide block cipher mode).
Elements of GF(2^128) must be presented as u128 *, it encourages automatic
and proper alignment.
The library contains support for two different representations of GF(2^128),
see the comment in gf128mul.h. There different levels of optimization
(memory/speed tradeoff).
The code is based on work by Dr Brian Gladman. Notable changes:
- deletion of two optimization modes
- change from u32 to u64 for faster handling on 64bit machines
- support for 'bbe' representation in addition to the, already implemented,
'lle' representation.
- move 'inline void' functions from header to 'static void' in the
source file
- update to use the linux coding style conventions
The original can be found at:
http://fp.gladman.plus.com/AES/modes.vc8.19-06-06.zip
The copyright (and GPL statement) of the original author is preserved.
Signed-off-by: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
128bit is a common blocksize in linux kernel cryptography, so it helps to
centralize some common operations.
The code, while mostly trivial, is based on a header file mode_hdr.h in
http://fp.gladman.plus.com/AES/modes.vc8.19-06-06.zip
The original copyright (and GPL statement) of the original author,
Dr Brian Gladman, is preserved.
Signed-off-by: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch removes the following no longer used functions:
- api.c: crypto_alg_available()
- digest.c: crypto_digest_init()
- digest.c: crypto_digest_update()
- digest.c: crypto_digest_final()
- digest.c: crypto_digest_digest()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch moves command capabilities to command flags. Other than
being cleaner, saves several bytes.
We increment the nlctrl version so as to signal to user space that
to not expect the attributes. We will try to be careful
not to do this too often ;->
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The .eh_frame section contents is never written to, so it can as well
benefit from CONFIG_DEBUG_RODATA.
Diff-ed against firstfloor tree.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Since v->counter is both read and written, it should be an output as well
as an input for the asm. The current code only gets away with this because
counter is volatile. Also, according to Documents/atomic_ops.txt,
atomic_add_return should provide a memory barrier, in particular a compiler
barrier, so the asm should be marked as clobbering memory.
Test case:
#include <stdio.h>
typedef struct { int counter; } atomic_t; /* NB: no "volatile" */
#define ATOMIC_INIT(i) { (i) }
#define atomic_read(v) ((v)->counter)
static __inline__ int atomic_add_return(int i, atomic_t *v)
{
int __i = i;
__asm__ __volatile__(
"lock; xaddl %0, %1;"
:"=r"(i)
:"m"(v->counter), "0"(i));
/* __asm__ __volatile__(
"lock; xaddl %0, %1"
:"+r" (i), "+m" (v->counter)
: : "memory"); */
return i + __i;
}
int main (void) {
atomic_t a = ATOMIC_INIT(0);
int x;
x = atomic_add_return (1, &a);
if ((x!=1) || (atomic_read(&a)!=1))
printf("fail: %i, %i\n", x, atomic_read(&a));
}
Signed-off-by: Duncan Sands <baldrick@free.fr>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Nothing in include/asm-x86_64/cpufeature.h is part of the
userspace<->kernel interface.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Idle callbacks has some races when enter_idle() sets isidle and subsequent
interrupts that can happen on that CPU, before CPU goes to idle. Due to this,
an IDLE_END can get called before IDLE_START. To avoid these races, disable
interrupts before enter_idle and make sure that all idle routines do not
enable interrupts before entering idle.
Note that poll_idle() still has a this race as it has to enable interrupts
before going to idle. But, all other idle routines have the race fixed.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Tighten the requirements on both input to and output from the Dwarf2
unwinder.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-mregparm=3 has been enabled by default for some time on i386, and AFAIK
there aren't any problems with it left.
This patch removes the REGPARM config option and sets -mregparm=3
unconditionally.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
When using memmap kernel parameter in EFI boot we should also add to memory map
memory regions of runtime services to enable their mapping later.
AK: merged and cleaned up the patch
Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a small patch for i386 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.
changelog:
- add CPU_FEATURE_BTS
- add Branch Trace Store detection
signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a small patch for x86-64 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.
changelog:
- add CPU_FEATURE_BTS
- add Branch Trace Store detection
signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Function efi_get_time called not only during init kernel phase but also
during suspend (from get_cmos_time).
When it is called from get_cmos_time the corresponding runtime service
should be called in virtual and not in physical mode.
Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Narayanan, Chandramouli" <chandramouli.narayanan@intel.com>
Cc: "Jiossy, Rami" <rami.jiossy@intel.com>
Cc: "Satt, Shai" <shai.satt@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
quirks.
And add a PCI quirk for these platforms to check(which happens very late
during the boot) if the APIC routing is indeed set to default flat mode.
This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
selects physical mode instead of the logical flat(as needed for this errata
workaround).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Change the 'no_control' field in the cpu struct to a more positive
and better term 'hotpluggable'. And change(/cleanup) the logic accordingly.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add 'enable_cpu_hotplug' flag and when cleared, the hotplug control file
("online") will not be added under /sys/devices/system/cpu/cpuX/
Next patch doing PCI quirks will use this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Mechanism of selecting physical mode in genapic when cpu hotplug is enabled on
x86_64, broke the quirk(quirk_intel_irqbalance()) introduced for working
around the transposing interrupt message errata in E7520/E7320/E7525 (revision
ID 0x9 and below. errata #23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf).
This errata requires the mode to be in logical flat, so that interrupts can be
directed to more than one cpu(and thus use hardware IRQ balancing enabled by
BIOS on these platforms).
Following four patches fixes this by moving the quirk to early quirk and
forcing the x86_64 genapic selection to logical flat on these platforms.
Thanks to Shaohua for pointing out the breakage.
This patch:
Add write_pci_config_byte() to direct PCI access routines
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Make pmd_bad() symmetrical to pgd_bad() and pud_bad(). At once,
simplify them all.
TBD: tighten down the checks again as suggested by Hugh D.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The final line of /proc/<pid>/maps on x86_64 for native 64-bit
tasks shows an incorrect ending address and incorrect permissions. There
is only a single page mapped in this vsyscall region, and it is accessible
for both read and execute.
The patch below fixes this. (Since 32-bit-compat tasks have a real vma
with correct perms/range, no change is necessary for that scenario.)
Before the patch, a "cat /proc/self/maps | tail -1" shows this:
ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]
After the patch, this is the output:
ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]
Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
gcc doesn't support -mtune=core2 yet, but will be soon. Use -mtune=generic or -mtune=i686
as fallback
TBD need benchmarking for INTEL_USERCOPY etc. So far I used the same defaults as MPENTIUMM
Signed-off-by: Andi Kleen <ak@suse.de>
The function ptep_get_and_clear uses an atomic instruction sequence to get and
clear an active pte. Rather than add such an atomic operator to all virtual
machine implementations in paravirt-ops, it is easier to support the raw
atomic sequence and use either a trapping writable pagetable approach, or a
post-update notification. For the post update notification, we require the
pte_update function to be called after the access. Combine the 2-level and
3-level paging operators into one common function which does the post-update
notification, and rename the actual atomic sequences to raw_ptep_xxx
operators.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Make parameter names match function argument names for the yet to be defined
pte_update_defer accessor.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Move header includes for the nopud / nopmd types to the location of the actual
pte / pgd type definitions. This allows generic 4-level page type code to be
written before the split 2/3 level page table headers are included.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add the three bare TLB accessor functions to paravirt-ops. Most amusingly,
flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
Instead, I chose to indicate the actual flush type, kernel (global) vs. user
(non-global). Global in this sense means using the global bit in the page
table entry, which makes TLB entries persistent across CR3 reloads, not
global as in the SMP sense of invoking remote shootdowns, so the term is
confusingly overloaded.
AK: folded in fix from Zach for PAE compilation
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add APIC accessors to paravirt-ops. Unfortunately, we need two write
functions, as some older broken hardware requires workarounds for
Pentium APIC errata - this is the purpose of apic_write_atomic.
AK: replaced __inline with inline
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Allow selected bug checks to be skipped by paravirt kernels. The two most
important are the F00F workaround (which is either done by the hypervisor,
or not required), and the 'hlt' instruction check, which can break under
some hypervisors.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
1) Each hypervisor writes a probe function to detect whether we are
running under that hypervisor. paravirt_probe() registers this
function.
2) If vmlinux is booted with ring != 0, we call all the probe
functions (with registers except %esp intact) in link order: the
winner will not return.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Both lhype and Xen want to call the core of the x86 cpu detect code before
calling start_kernel.
(extracted from larger patch)
AK: folded in start_kernel header patch
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
It turns out that the most called ops, by several orders of magnitude,
are the interrupt manipulation ops. These are obvious candidates for
patching, so mark them up and create infrastructure for it.
The method used is that the ops structure has a patch function, which
is called for each place which needs to be patched: this returns a
number of instructions (the rest are NOP-padded).
Usually we can spare a register (%eax) for the binary patched code to
use, but in a couple of critical places in entry.S we can't: we make
the clobbers explicit at the call site, and manually clobber the
allowed registers in debug mode as an extra check.
And:
Don't abuse CONFIG_DEBUG_KERNEL, add CONFIG_DEBUG_PARAVIRT.
And:
AK: Fix warnings in x86-64 alternative.c build
And:
AK: Fix compilation with defconfig
And:
^From: Andrew Morton <akpm@osdl.org>
Some binutlises still like to emit references to __stop_parainstructions and
__start_parainstructions.
And:
AK: Fix warnings about unused variables when PARAVIRT is disabled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Create a paravirt.h header for all the critical operations which need to be
replaced with hypervisor calls, and include that instead of defining native
operations, when CONFIG_PARAVIRT.
This patch does the dumbest possible replacement of paravirtualized
instructions: calls through a "paravirt_ops" structure. Currently these are
function implementations of native hardware: hypervisors will override the ops
structure with their own variants.
All the pv-ops functions are declared "fastcall" so that a specific
register-based ABI is used, to make inlining assember easier.
And:
+From: Andy Whitcroft <apw@shadowen.org>
The paravirt ops introduce a 'weak' attribute onto memory_setup().
Code ordering leads to the following warnings on x86:
arch/i386/kernel/setup.c:651: warning: weak declaration of
`memory_setup' after first use results in unspecified behavior
Move memory_setup() to avoid this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
For both i386 and x86_64, copy from arch/$ARCH/lib/delay.c comments about the
used magic constants, plus a few other niceties.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andi Kleen <ak@suse.de>
include/asm-i386/delay.h | 5 ++++-
include/asm-x86_64/delay.h | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)
Port two patches from i386 to x86_64 delay.c to make sure all rounding is done
upward instead of downward.
There is no sign in commit messages that the mismatch was done on purpose, and
"delay() guarantees sleeping at least for the specified time" is still a valid
rule IMHO.
The original x86 patches are both from pre-GIT era, i.e.:
"[PATCH] round up in __udelay()" in commit
54c7e1f5cc6771ff644d7bc21a2b829308bd126f
"[PATCH] add 1 in __const_udelay()" in commit
42c77a9801b8877d8b90f65f75db758822a0bccc
(both commits are from converted BK repository to x86_64).
AK: fixed gcc warning
linux/arch/x86_64/lib/delay.c:43: warning: suggest parentheses around + or - inside shift
(did this actually work?)
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch makes it possible to compile Calgary in but not use it by
default. In this mode, use 'iommu=calgary' to activate it.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch cleans up the previous "Use BIOS supplied BBAR information"
patch. Mostly stylistic clenaups, but also check for ioremap failure
when we ioremap the BBAR rather than when trying to use it.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Laurent Vivier <Laurent.Vivier@bull.net>
Find the BBAR register address of each Calgary using the "Extended
BIOS Data Area" rather than calculating it ourselves. Also get the bus
topology (what PHB each bus is on) from Calgary rather than
calculating it ourselves.
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=7407.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Caller of probe_kernel_address shouldn't need to know that
pka is internally implemented with __get_user. So move the
__user cast into pka.
Signed-off-by: Andi Kleen <ak@suse.de>
Code that wants to use struct desc_struct cannot do so on i386 because
desc.h contains other code that will only compile on x86_64.
So extract the structure definitions into a asm-x86_64/desc_defs.h.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andi Kleen <ak@suse.de>
include/asm-x86_64/desc.h | 53 -------------------------------
include/asm-x86_64/desc_defs.h | 69 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 70 insertions(+), 52 deletions(-)
o Now CONFIG_PHYSICAL_START is being replaced with CONFIG_PHYSICAL_ALIGN.
Hardcoding the kernel physical start value creates a problem in relocatable
kernel context due to boot loader limitations. For ex, if somebody
compiles a relocatable kernel to be run from address 4MB, but this kernel
will run from location 1MB as grub loads the kernel at physical address
1MB. Kernel thinks that I am a relocatable kernel and I should run from
the address I have been loaded at. So somebody wanting to run kernel
from 4MB alignment location (for improved performance regions) can't do
that.
o Hence, Eric proposed that probably CONFIG_PHYSICAL_ALIGN will make
more sense in relocatable kernel context. At run time kernel will move
itself to a physical addr location which meets user specified alignment
restrictions.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch modifies the i386 kernel so that if CONFIG_RELOCATABLE is
selected it will be able to be loaded at any 4K aligned address below
1G. The technique used is to compile the decompressor with -fPIC and
modify it so the decompressor is fully relocatable. For the main
kernel relocations are generated. Resulting in a kernel that is relocatable
with no runtime overhead and no need to modify the source code.
A reserved 32bit word in the parameters has been assigned
to serve as a stack so we figure out where are running.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Defining __PHYSICAL_START and __KERNEL_START in asm-i386/page.h works but
it triggers a full kernel rebuild for the silliest of reasons. This
modifies the users to directly use CONFIG_PHYSICAL_START and linux/config.h
which prevents the full rebuild problem, which makes the code much
more maintainer and hopefully user friendly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
On x86_64 we have to be careful with calculating the physical
address of kernel symbols. Both because of compiler odditities
and because the symbols live in a different range of the virtual
address space.
Having a defintition of __pa_symbol that works on both x86_64 and
i386 simplifies writing code that works for both x86_64 and
i386 that has these kinds of dependencies.
So this patch adds the trivial i386 __pa_symbol definition.
Added assembly magic similar to RELOC_HIDE as suggested by Andi Kleen.
Just picked it up from x86_64.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Ld knows about 2 kinds of symbols, absolute and section
relative. Section relative symbols symbols change value
when a section is moved and absolute symbols do not.
Currently in the linker script we have several labels
marking the beginning and ending of sections that
are outside of sections, making them absolute symbols.
Having a mixture of absolute and section relative
symbols refereing to the same data is currently harmless
but it is confusing.
This must be done carefully as newer revs of ld do not place
symbols that appear in sections without data and instead
ld makes those symbols global :(
My ultimate goal is to build a relocatable kernel. The
safest and least intrusive technique is to generate
relocation entries so the kernel can be relocated at load
time. The only penalty would be an increase in the size
of the kernel binary. The problem is that if absolute and
relocatable symbols are not properly specified absolute symbols
will be relocated or section relative symbols won't be, which
is fatal.
The practical motivation is that when generating kernels that
will run from a reserved area for analyzing what caused
a kernel panic, it is simpler if you don't need to hard code
the physical memory location they will run at, especially
for the distributions.
[AK: and merged:]
o Also put a message so that in future people can be aware of it and
avoid introducing absolute symbols.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch fixes the math emulator, which had not been adjusted
to match the changed struct pt_regs.
AK: extracted from larger patch by Jeremy.
Signed-off-by: Andi Kleen <ak@suse.de>
Use the pcurrent field in the PDA to implement the "current" macro. This ends
up compiling down to a single instruction to get the current task.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Use the cpu_number in the PDA to implement raw_smp_processor_id. This is a
little simpler than using thread_info, though the cpu field in thread_info
cannot be removed since it is used for things other than getting the current
CPU in common code.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
sys_vm86 uses a struct kernel_vm86_regs, which is identical to pt_regs, but
adds an extra space for all the segment registers. Previously this structure
was completely independent, so changes in pt_regs had to be reflected in
kernel_vm86_regs. This changes just embeds pt_regs in kernel_vm86_regs, and
makes the appropriate changes to vm86.c to deal with the new naming.
Also, since %gs is dealt with differently in the kernel, this change adjusts
vm86.c to reflect this.
While making these changes, I also cleaned up some frankly bizarre code which
was added when auditing was added to sys_vm86.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
There are a few places where the change in struct pt_regs and the use of %gs
affect the userspace ABI. These are primarily debugging interfaces where
thread state can be inspected or extracted.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch is the meat of the PDA change. This patch makes several related
changes:
1: Most significantly, %gs is now used in the kernel. This means that on
entry, the old value of %gs is saved away, and it is reloaded with
__KERNEL_PDA.
2: entry.S constructs the stack in the shape of struct pt_regs, and this
is passed around the kernel so that the process's saved register
state can be accessed.
Unfortunately struct pt_regs doesn't currently have space for %gs
(or %fs). This patch extends pt_regs to add space for gs (no space
is allocated for %fs, since it won't be used, and it would just
complicate the code in entry.S to work around the space).
3: Because %gs is now saved on the stack like %ds, %es and the integer
registers, there are a number of places where it no longer needs to
be handled specially; namely context switch, and saving/restoring the
register state in a signal context.
4: And since kernel threads run in kernel space and call normal kernel
code, they need to be created with their %gs == __KERNEL_PDA.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
When a CPU is brought up, a PDA and GDT are allocated for it. The GDT's
__KERNEL_PDA entry is pointed to the allocated PDA memory, so that all
references using this segment descriptor will refer to the PDA.
This patch rearranges CPU initialization a bit, so that the GDT/PDA are set up
as early as possible in cpu_init(). Also for secondary CPUs, GDT+PDA are
preallocated and initialized so all the secondary CPU needs to do is set up
the ldt and load %gs. This will be important once smp_processor_id() and
current use the PDA.
In all cases, the PDA is set up in head.S, before a CPU starts running C code,
so the PDA is always available.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch has the basic definitions of struct i386_pda, and the segment
selector in the GDT.
asm-i386/pda.h is more or less a direct copy of asm-x86_64/pda.h. The most
interesting difference is the use of _proxy_pda, which is used to give gcc a
model for the actual memory operations on the real pda structure. No actual
reference is ever made to _proxy_pda, so it is never defined.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add o the x86-64 tree a bunch of MSRs related to performance
monitoring for the processors based on Intel Core microarchitecture.
It also adds some architectural MSRs for PEBS. A similar patch for i386 will
follow.
changelog:
- add Intel Precise-Event Based sampling (PEBS) related MSR
- add Intel Data Save (DS) Area related MSR
- add Intel Core microarchitecure performance counter MSRs
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
i386 port of the sLeAZY-fpu feature. Chuck reports that this gives him a +/-
0.4% improvement on his simple benchmark
x86_64 description follows:
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive save/restore
all the time). However for very frequent FPU users... you take an extra trap
every context switch.
The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the trap
is avoided. (the chance of the 6th time slice using FPU after the previous 5
having done so are quite high obviously).
After 256 switches, this is reset and lazy behavior is returned (until there
are 5 consecutive ones again). The reason for this is to give apps that do
longer bursts of FPU use still the lazy behavior back after some time.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Clean up the espfix code:
- Introduced PER_CPU() macro to be used from asm
- Introduced GET_DESC_BASE() macro to be used from asm
- Rewrote the fixup code in asm, as calling a C code with the altered %ss
appeared to be unsafe
- No longer altering the stack from a .fixup section
- 16bit per-cpu stack is no longer used, instead the stack segment base
is patched the way so that the high word of the kernel and user %esp
are the same.
- Added the limit-patching for the espfix segment. (Chuck Ebbert)
[jeremy@goop.org: use the x86 scaling addressing mode rather than shifting]
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Acked-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.
Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch removes the default_ldt[] array, as it has been unused since
iBCS stopped being supported. This means it is now possible to actually
set an empty LDT segment.
In order to deal with this, the set_ldt_desc/load_LDT pair has been
replaced with a single set_ldt() operation which is responsible for both
setting up the LDT descriptor in the GDT, and reloading the LDT register.
If there are no LDT entries, the LDT register is loaded with a NULL
descriptor.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Here is a patch (used by perfmon2) to detect the presence of the Precise Event
Based Sampling (PEBS) feature for i386. The patch also adds the cpu_has_pebs
macro.
- adds X86_FEATURE_PEBS
- adds cpu_has_pebs to test for X86_FEATURE_PEBS
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Here is a patch (used by perfmon2) that renames X86_FEATURE_DTES to
X86_FEATURE_DS to match Intel's documentation for the Debug Store save area on
i386. The patch also adds cpu_has_ds.
- rename X86_FEATURE_DTES to X86_FEATURE_DS to match documentation
- adds cpu_has_ds to test for X86_FEATURE_DS
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
inline function cpu_mask_to_apicid in smp.h is duplicated with macro
in mach_apic.h.
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a patch (used by perfmon2) to detect the presence of the
Precise Event Based Sampling (PEBS) feature for Intel 64-bit processors.
The patch also adds the cpu_has_pebs macro.
changelog:
- adds X86_FEATURE_PEBS
- adds cpu_has_pebs to test for X86_FEATURE_PEBS
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a patch (used by perfmon2) that renamed X86_FEATURE_DTES
to X86_FEATURE_DS to match Intel's documentation for the Debug Store
save area. The patch also adds cpu_has_ds.
changelog:
- rename X86_FEATURE_DTES to X86_FEATURE_DS to match documentation
- adds cpu_has_ds to test for X86_FEATURE_DS
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Import updates from i386's i8259.c
[MIPS] *-berr: Header inclusions for DEC bus error handlers
[MIPS] Compile __do_IRQ() when really needed
[MIPS] genirq: use name instead of typename
[MIPS] Do not use handle_level_irq for ioasic_dma_irq_type.
[MIPS] pte_offset(dir,addr): parenthesis fix
Any code that relies on the volatile would be a bug waiting to happen
anyway.
Don't encourage people to think that putting 'volatile' on data
structures somehow fixes problems. We should always use proper locking
(and other serialization) techniques.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a resubmission of patches originally created by Ingo Molnar.
The link below is the initial (?) posting of the patch.
http://marc.theaimsgroup.com/?l=linux-kernel&m=115217423929806&w=2
Remove 'volatile' from spinlock_types as it causes GCC to generate bad
code (see link) and locking should be used on kernel data.
Signed-off-by: Art Haas <ahaas@airmail.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds missing parenthesis around 'dir' argument in pte_offset()
macro definition.
It also removes an extra space in the definition of pte_offset_kernel()
macro.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (43 commits)
sh: sh775x/titan fixes for irq header changes.
sh: update r7780rp defconfig.
sh: compile fixes for header cleanup.
sh: Fixup pte_mkhuge() build failure.
sh: set KBUILD_IMAGE to something sensible.
sh: show held locks in stack trace with lockdep.
sh: platform_pata support for R7780RP
sh: stacktrace/lockdep/irqflags tracing support.
sh: Fixup movli.l/movco.l atomic ops for gcc4.
sh: dyntick infrastructure.
sh: Clock framework tidying.
sh: Turn off IRQs around get_timer_offset() calls.
sh: Get the PGD right in oops case with 64-bit PTEs.
sh: Fix store queue bitmap end.
sh: More flexible + SH7780 earlyprintk SCIF support.
sh: Fixup various PAGE_SIZE == 4096 assumptions.
sh: Fixup 4K irq stacks.
sh: dma-api channel capability extensions.
sh: Drop name overload in dma-sh.
sh: Make dma-isa depend on ISA_DMA_API.
...
* git://git.infradead.org/users/dhowells/workq-2.6:
Actually update the fixed up compile failures.
WorkQueue: Fix up arch-specific work items where possible
WorkStruct: make allyesconfig
WorkStruct: Pass the work_struct pointer instead of context data
WorkStruct: Merge the pending bit into the wq_data pointer
WorkStruct: Typedef the work function prototype
WorkStruct: Separate delayable and non-delayable events.