This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In pre-cgroup cpusets, a few config files enabled cpusets by default.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses vm_get_page_prot() to setup vma->vm_page_prot.
Though inside vm_get_page_prot() the protection flags is AND with
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).
This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/ia64/kernel/machine_kexec.c: In function `arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:131: error: `pgdat_list' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:131: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:131: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:134: error: `node_memblk' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:135: error: `NR_NODE_MEMBLKS' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:136: error: invalid application of `sizeof' to incomplete type `node_memblk_s'
arch/ia64/kernel/machine_kexec.c:137: error: dereferencing pointer to incomplete type
arch/ia64/kernel/machine_kexec.c:138: error: dereferencing pointer to incomplete type
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros
were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is
impossible to grep for them. So these names should be changed. This
discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.
makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering unnecessary pages for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package becomes a huge file, and it is
hard to install it into each system.
To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)
Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)
And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It makes more sense to make instrumentation support experimental on a
case-by-case basis.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
list_del() hardly can fail, so checking for return value is pointless
(and current code always return 0).
Nobody really cared that return value anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
allows for more flexibility in the note type for the state of 'extended
floating point' implementations in coredumps. New note types can now be
added with an appropriate #define.
This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
current users so there's are no change in behaviour.
This will let us use different note types on powerpc for the Altivec/VMX
state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
(signal processing extension) state that some embedded PowerPC cpus from
Freescale have.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sg list elements might not be continuous.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
d5a7430ddc missed a spot where we
use cpu_sibling_map and cpu_core_map. These don't exist on a
uni-processor build. Wrap #ifdef CONFIG_SMP ... #endif around it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: introduce ccflags-y, asflags-y and ldflags-y
kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
kbuild: enable use of AFLAGS and CFLAGS on commandline
kbuild: enable 'make AFLAGS=...' to add additional options to AS
kbuild: fix AFLAGS use in h8300 and m68knommu
kbuild: check for wrong use of CFLAGS
kbuild: enable 'make CFLAGS=...' to add additional options to CC
kbuild: fix up CFLAGS usage
kbuild: make modpost detect unterminated device id lists
kbuild: call export_report from the Makefile
kbuild: move Kai Germaschewski to CREDITS
kconfig/menuconfig: distinguish between selected-by-another options and comments
kconfig: tristate choices with mixed tristate and boolean values
include/linux/Kbuild: remove duplicate entries
kbuild: kill backward compatibility checks
kbuild: kill EXTRA_ARFLAGS
kbuild: fix documentation in makefiles.txt
kbuild: call make once for all targets when O=.. is used
kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
kbuild: update _shipped files for kconfig syntax cleanup
...
Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.
- fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
- For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
which returns -EINVAL.
- removed remove_pages() only used in powerpc.
- removed no-op remove_memory() in i386, sh, sparc64, x86_64.
- only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
to return -EINVAL.
Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Logic.
- set all pages in [start,end) as isolated migration-type.
by this, all free pages in the range will be not-for-use.
- Migrate all LRU pages in the range.
- Test all pages in the range's refcnt is zero or not.
Todo:
- allocate migration destination page from better area.
- confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
(I don't like this kind of page but..
- Find out pages which cannot be migrated.
- more running tests.
- Use reclaim for unplugging other memory type area.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently mobility grouping works at the MAX_ORDER_NR_PAGES level. This makes
sense for the majority of users where this is also the huge page size.
However, on platforms like ia64 where the huge page size is runtime
configurable it is desirable to group at a lower order. On x86_64 and
occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES.
This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It
uses a compile-time constant if possible and a variable where the huge page
size is runtime configurable.
It is assumed that grouping should be done at the lowest sensible order and
that the user would not want to override this. If this is not true,
page_block order could be forced to a variable initialised via a boot-time
kernel parameter.
One potential issue with this patch is that IA64 now parses hugepagesz with
early_param() instead of __setup(). __setup() is called after the memory
allocator has been initialised and the pageblock bitmaps already setup. In
tests on one IA64 there did not seem to be any problem with using
early_param() and in fact may be more correct as it guarantees the parameter
is handled before the parsing of hugepages=.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after*
set_pte(). This is too late. This patch removes lazy_mmu_prot_update and
add modfied set_pte() for flushing if necessary.
This patch flush icache of a page when
new pte has exec bit.
&& new pte has present bit
&& new pte is user's page.
&& (old *ptep is not present
|| new pte's pfn is not same to old *ptep's ptn)
&& new pte's page has no Pg_arch_1 bit.
Pg_arch_1 is set when a page is cache consistent.
I think this condition checks are much easier to understand than considering
"Where sync_icache_dcache() should be inserted ?".
pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as
clean-up. So, I added it again.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The checks for node_online in the uncached allocator are made to make sure
that memory is available on these nodes. Thus switch all the checks to use
N_HIGH_MEMORY and to N_ONLINE.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.
Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or be otherwise handled, and makes it very obvious
that something has gone wrong.
This change allows the entire process group to be taken down, rather
than just the one thread.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Equip IA64 sparsemem with a virtual memmap. This is similar to the existing
CONFIG_VIRTUAL_MEM_MAP functionality for DISCONTIGMEM. It uses a PAGE_SIZE
mapping.
This is provided as a minimally intrusive solution. We split the 128TB
VMALLOC area into two 64TB areas and use one for the virtual memmap.
This should replace CONFIG_VIRTUAL_MEM_MAP long term.
[apw@shadowen.org: convert to new helper based initialisation]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This option is true if a low-level driver can support sg
chaining. This will be removed eventually when all the drivers are
converted to support sg chaining. q->max_phys_segments is set to
SCSI_MAX_SG_SEGMENTS if false.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The variable CPPFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
This patch replace use of CPPFLAGS with KBUILD_CPPFLAGS all over the
tree and enabling one to use:
make CPPFLAGS=...
to specify additional CPP commandline options.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k, s390
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Update defonfig file for sn2 to match recent changes in config options.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The variable CFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
tree and enabling one to use:
make CFLAGS=...
to specify additional gcc commandline options.
One usecase is when trying to find gcc bugs but other
use cases has been requested too.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k
Test was simple to do a defconfig build, apply the patch and check
that nothing got rebuild.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
PM: merge device power-management source files
sysfs: add copyrights
kobject: update the copyrights
kset: add some kerneldoc to help describe what these strange things are
Driver core: rename ktype_edd and ktype_efivar
Driver core: rename ktype_driver
Driver core: rename ktype_device
Driver core: rename ktype_class
driver core: remove subsystem_init()
sysfs: move sysfs file poll implementation to sysfs_open_dirent
sysfs: implement sysfs_open_dirent
sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
sysfs: make sysfs_root a regular directory dirent
sysfs: open code sysfs_attach_dentry()
sysfs: make s_elem an anonymous union
sysfs: make bin attr open get active reference of parent too
sysfs: kill unnecessary NULL pointer check in sysfs_release()
sysfs: kill unnecessary sysfs_get() in open paths
sysfs: reposition sysfs_dirent->s_mode.
sysfs: kill sysfs_update_file()
...
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Don't take semaphore in cpufreq_quick_get()
[CPUFREQ] Support different families in fid/did to frequency conversion
[CPUFREQ] cpufreq_stats: misc cpuinit section annotations
[CPUFREQ] implement !CONFIG_CPU_FREQ stub for cpufreq_unregister_notifier()
[CPUFREQ] mark hotplug notifier callback as __cpuinit
[CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)
[CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default
[CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core
[CPUFREQ] Longhaul - Add support for PM133 northbridge
[CPUFREQ] x86: use num_online_nodes to get physical cpus numbers for
Fix the problem that kdump on INIT hung up if kdump kernel image is
not configured.
The kdump_init_notifier() on monarch CPU stops its operation at
DIE_INIT_MONARCH_LEAVE time if the kdump kernel image is not
configured. On the other hand, kdump_init_notifier() on non-monarch
CPUs get into spin because they don't know the fact the monarch stops
its operation. This is the cause of this problem. To fix this problem,
we need to check the kdump kernel image at the top of the
kdump_init_notifier() function.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that kdump on INIT causes a kernel panic if kdump
kernel image is not configured. The cause of this problem is
machine_kexec_on_init() is using printk in INIT context. It should
use ia64_mca_printk() instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The use of vector in ia64_machine_kexec() seems spurious,
and removing it simplifies the code slightly.
As suggested by Alex Williamson <alex.williamson@hp.com>
Cc: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Additional testing uncovered a situation where the MCA recovery code could
hang due to a race condition.
According to the SAL spec, SAL sends a rendezvous interrupt to all but the first
CPU that goes into MCA. This includes other CPUs that go into MCA at the same
time. Those other CPUs will go into the linux MCA handler (rather than the
slave loop) with the rendezvous interrupt pending. When all the CPUs have
completed MCA processing and the last monarch completes, freeing all the CPUs,
the CPUs with the pended rendezvous interrupt then go into the
ia64_mca_rendez_int_handler(). In ia64_mca_rendez_int_handler() the CPUs
get marked as rendezvoused, but then leave the handler (due to no MCA).
That leaves the CPUs marked as rendezvoused _before_ the next MCA event.
When the next MCA hits, the monarch will mistakenly believe that all the CPUs
are rendezvoused when they are not, opening up a window where a CPU can get
stuck in the slave loop.
This patch avoids leaving CPUs marked as rendezvoused when they are not.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing the MCA recovery code, noticed that some machines would have a
five second delay rendezvousing cpus. What was happening is that
ia64_wait_for_slaves() would check to see if all the slave CPUs had
rendezvoused. If any had not, it would wait 1 millisecond then check again.
If any CPUs had still not rendezvoused, it would wait 5 seconds before
checking again.
On some configs the rendezvous takes more than 1 millisecond, causing the code
to wait the full 5 seconds, even though the last CPU rendezvoused after only
a few milliseconds.
The fix is to check every 1 millisecond to see if all the cpus have
rendezvoused. After 5 seconds the code concludes the CPUs will never
rendezvous (same as before).
The MCA code is, by definition, not performance critical, but a needless
delay of 5 seconds is senseless. The 5 seconds also adds up quickly
when running the error injection code in a loop.
This patch both simplifies the code and removes the needless delay.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This driver for HPQ5001 devices installs a global ACPI OpRegion handler.
AML methods can use this OpRegion to call native firmware entry points.
ACPI does not define a mechanism for AML methods to call native firmware
interfaces such as PAL or SAL. This OpRegion handler adds such a mechanism.
After the handler is installed, an AML method can call native firmware by
storing the arguments and firmware entry point to specific offsets in the
OpRegion. When AML reads the "return value" offset from the OpRegion, this
handler loads up the arguments, makes the firmware call, and returns the
result.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.
Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Because it is dead code and not referenced by anybody else (that file cannot
be built modular).
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* palinfo.c:
palinfo_cpu_notifier is a CPU hotplug notifier_block, and can be
marked __cpuinitdata, and the callback function palinfo_cpu_callback()
itself can be marked __cpuinit. create_palinfo_proc_entries() is only
called from __cpuinit callback or general __init code, therefore a
candidate for __cpuinit itself. remove_palinfo_proc_entries() is only
called from __cpuinit callback or general __exit code, therefore a
candidate for __cpuexit.
* salinfo.c:
The CPU hotplug notifier_block can be __cpuinitdata. The callback
salinfo_cpu_callback() is incorrectly marked __devinit -- it must
be __cpuinit instead.
* topology.c:
cache_sysfs_init() is only called at device_initcall() time so marking
it as __cpuinit is wrong and wasteful. It should be unconditionally
__init. Also cleanup reference to hotplug notifier callback function
from this function and replace with cache_add_dev(), which could also
enable us to use other tricks to replace __cpuinit{data} annotations,
as recently discussed on this list.
cache_shared_cpu_map_setup() is only ever called from __cpuinit-marked
functions hence both its definitions (SMP or !SMP) are candidates for
__cpuinit itself. Also all_cpu_cache_info can be __cpuinitdata because
only referenced from __cpuinit code.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changing the global CPPFLAGS is not the recommended way
to add additional include dirs.
Changed to use EXTRA_CFLAGS.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Jes Sorensen <jes@sgi.com>
If scsi_add_host returned an error, the host would never be freed.
We need to call scsi_host_put() if an error happens.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits)
[SKY2]: status polling loop (post merge)
[NET]: Fix NAPI completion handling in some drivers.
[TCP]: Limit processing lost_retrans loop to work-to-do cases
[TCP]: Fix lost_retrans loop vs fastpath problems
[TCP]: No need to re-count fackets_out/sacked_out at RTO
[TCP]: Extract tcp_match_queue_to_sack from sacktag code
[TCP]: Kill almost unused variable pcount from sacktag
[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
[TCP]: Add bytes_acked (ABC) clearing to FRTO too
[IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
[NETFILTER]: x_tables: add missing ip6t_modulename aliases
[NETFILTER]: nf_conntrack_tcp: fix connection reopening
[QETH]: fix qeth_main.c
[NETLINK]: fib_frontend build fixes
[IPv6]: Export userland ND options through netlink (RDNSS support)
[9P]: build fix with !CONFIG_SYSCTL
[NET]: Fix dev_put() and dev_hold() comments
[NET]: make netlink user -> kernel interface synchronious
[NET]: unify netlink kernel socket recognition
[NET]: cleanup 3rd argument in netlink_sendskb
...
Fix up conflicts manually in Documentation/feature-removal-schedule.txt
and my new least favourite crap, the "mod_devicetable" support in the
files include/linux/mod_devicetable.h and scripts/mod/file2alias.c.
(The latter files seem to be explicitly _designed_ to get conflicts when
different subsystems work with them - that have an absolutely horrid
lack of subsystem separation!)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>