Commit Graph

7423 Commits

Author SHA1 Message Date
Paul Jackson
ef08e3b498 [PATCH] cpusets: confine oom_killer to mem_exclusive cpuset
Now the real motivation for this cpuset mem_exclusive patch series seems
trivial.

This patch keeps a task in or under one mem_exclusive cpuset from provoking an
oom kill of a task under a non-overlapping mem_exclusive cpuset.  Since only
interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
containment, there is little to gain from oom killing a task under a
non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
allocation must come from disjoint memory nodes.

This patch enables configuring a system so that a runaway job under one
mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
that might be using very high compute and memory resources for a prolonged
time.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:40 -07:00
Paul Jackson
9bf2229f88 [PATCH] cpusets: formalize intermediate GFP_KERNEL containment
This patch makes use of the previously underutilized cpuset flag
'mem_exclusive' to provide what amounts to another layer of memory placement
resolution.  With this patch, there are now the following four layers of
memory placement available:

 1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
 2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
 3) The current tasks cpuset (GFP_USER allocations constrained to here), and
 4) Specific node placement, using mbind and set_mempolicy.

These nest - each layer is a subset (same or within) of the previous.

Layer (2) above is new, with this patch.  The call used to check whether a
zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
extended to take a gfp_mask argument, and its logic is extended, in the case
that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
placement is allowed.  The definition of GFP_USER, which used to be identical
to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
cpuset_gfp_hardwall_flag patch.

GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
cpuset, so long as any node therein is not too tight on memory, but will
escape to the larger layer, if need be.

The intended use is to allow something like a batch manager to handle several
jobs, each job in its own cpuset, but using common kernel memory for caches
and such.  Swapper and oom_kill activity is also constrained to Layer (2).  A
task in or below one mem_exclusive cpuset should not cause swapping on nodes
in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
task in another such cpuset.  Heavy use of kernel memory for i/o caching and
such by one job should not impact the memory available to jobs in other
non-overlapping mem_exclusive cpusets.

This patch enables providing hardwall, inescapable cpusets for memory
allocations of each job, while sharing kernel memory allocations between
several jobs, in an enclosing mem_exclusive cpuset.

Like Dinakar's patch earlier to enable administering sched domains using the
cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
that had previously done nothing much useful other than restrict what cpuset
configurations were allowed.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:40 -07:00
Paul Jackson
f90b1d2f1a [PATCH] cpusets: new __GFP_HARDWALL flag
Add another GFP flag: __GFP_HARDWALL.

A subsequent "cpuset_zone_allowed" patch will use this flag to mark GFP_USER
allocations, and distinguish them from GFP_KERNEL allocations.

Allocations (such as GFP_USER) marked GFP_HARDWALL are constrainted to the
current tasks cpuset.  Other allocations (such as GFP_KERNEL) can steal from
the possibly larger nearest mem_exclusive cpuset ancestor, if memory is tight
on every node in the current cpuset.

This patch collides with Mel Gorman's patch to reduce fragmentation in the
standard buddy allocator, which adds two GFP flags.  This was discussed on
linux-mm in July.  Most likely, one of his flags for user reclaimable memory
can be the same as my __GFP_HARDWALL flag, under some generic name meaning its
user address space memory.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:40 -07:00
Paul Jackson
a49335ccea [PATCH] cpusets: oom_kill tweaks
This patch series extends the use of the cpuset attribute 'mem_exclusive'
to support cpuset configurations that:
 1) allow GFP_KERNEL allocations to come from a potentially larger
    set of memory nodes than GFP_USER allocations, and
 2) can constrain the oom killer to tasks running in cpusets in
    a specified subtree of the cpuset hierarchy.

Here's an example usage scenario.  For a few hours or more, a large NUMA
system at a University is to be divided in two halves, with a bunch of student
jobs running in half the system under some form of batch manager, and with a
big research project running in the other half.  Each of the student jobs is
placed in a small cpuset, but should share the classic Unix time share
facilities, such as buffered pages of files in /bin and /usr/lib.  The big
research project wants no interference whatsoever from the student jobs, and
has highly tuned, unusual memory and i/o patterns that intend to make full use
of all the main memory on the nodes available to it.

In this example, we have two big sibling cpusets, one of which is further
divided into a more dynamic set of child cpusets.

We want kernel memory allocations constrained by the two big cpusets, and user
allocations constrained by the smaller child cpusets where present.  And we
require that the oom killer not operate across the two halves of this system,
or else the first time a student job runs amuck, the big research project will
likely be first inline to get shot.

Tweaking /proc/<pid>/oom_adj is not ideal -- if the big research project
really does run amuck allocating memory, it should be shot, not some other
task outside the research projects mem_exclusive cpuset.

I propose to extend the use of the 'mem_exclusive' flag of cpusets to manage
such scenarios.  Let memory allocations for user space (GFP_USER) be
constrained by a tasks current cpuset, but memory allocations for kernel space
(GFP_KERNEL) by constrained by the nearest mem_exclusive ancestor of the
current cpuset, even though kernel space allocations will still _prefer_ to
remain within the current tasks cpuset, if memory is easily available.

Let the oom killer be constrained to consider only tasks that are in
overlapping mem_exclusive cpusets (it won't help much to kill a task that
normally cannot allocate memory on any of the same nodes as the ones on which
the current task can allocate.)

The current constraints imposed on setting mem_exclusive are unchanged.  A
cpuset may only be mem_exclusive if its parent is also mem_exclusive, and a
mem_exclusive cpuset may not overlap any of its siblings memory nodes.

This patch was presented on linux-mm in early July 2005, though did not
generate much feedback at that time.  It has been built for a variety of
arch's using cross tools, and built, booted and tested for function on SN2
(ia64).

There are 4 patches in this set:
  1) Some minor cleanup, and some improvements to the code layout
     of one routine to make subsequent patches cleaner.
  2) Add another GFP flag - __GFP_HARDWALL.  It marks memory
     requests for USER space, which are tightly confined by the
     current tasks cpuset.
  3) Now memory requests (such as KERNEL) that not marked HARDWALL can
     if short on memory, look in the potentially larger pool of memory
     defined by the nearest mem_exclusive ancestor cpuset of the current
     tasks cpuset.
  4) Finally, modify the oom killer to skip any task whose mem_exclusive
     cpuset doesn't overlap ours.

Patch (1), the one time I looked on an SN2 (ia64) build, actually saved 32
bytes of kernel text space.  Patch (2) has no affect on the size of kernel
text space (it just adds a preprocessor flag).  Patches (3) and (4) added
about 600 bytes each of kernel text space, mostly in kernel/cpuset.c, which
matters only if CONFIG_CPUSET is enabled.

This patch:

This patch applies a few comment and code cleanups to mm/oom_kill.c prior to
applying a few small patches to improve cpuset management of memory placement.

The comment changed in oom_kill.c was seriously misleading.  The code layout
change in select_bad_process() makes room for adding another condition on
which a process can be spared the oom killer (see the subsequent
cpuset_nodes_overlap patch for this addition).

Also a couple typos and spellos that bugged me, while I was here.

This patch should have no material affect.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
John Hawkes
f68f447e83 [PATCH] ia64 cpuset + build_sched_domains() mangles structures
I've already sent this to the maintainers, and this is now being sent to a
larger community audience.  I have fixed a problem with the ia64 version of
build_sched_domains(), but a similar fix still needs to be made to the
generic build_sched_domains() in kernel/sched.c.

The "dynamic sched domains" functionality has recently been merged into
2.6.13-rcN that sees the dynamic declaration of a cpu-exclusive (a.k.a.
"isolated") cpuset and rebuilds the CPU Scheduler sched domains and sched
groups to separate away the CPUs in this cpu-exclusive cpuset from the
remainder of the non-isolated CPUs.  This allows the non-isolated CPUs to
completely ignore the isolated CPUs when doing load-balancing.

Unfortunately, build_sched_domains() expects that a sched domain will
include all the CPUs of each node in the domain, i.e., that no node will
belong in both an isolated cpuset and a non-isolated cpuset.  Declaring a
cpuset that violates this presumption will produce flawed data structures
and will oops the kernel.

To trigger the problem (on a NUMA system with >1 CPUs per node):
   cd /dev/cpuset
   mkdir newcpuset
   cd newcpuset
   echo 0 >cpus
   echo 0 >mems
   echo 1 >cpu_exclusive

I have fixed this shortcoming for ia64 NUMA (with multiple CPUs per node).
A similar shortcoming exists in the generic build_sched_domains() (in
kernel/sched.c) for NUMA, and that needs to be fixed also.  The fix
involves dynamically allocating sched_group_nodes[] and
sched_group_allnodes[] for each invocation of build_sched_domains(), rather
than using global arrays for these structures.  Care must be taken to
remember kmalloc() addresses so that arch_destroy_sched_domains() can
properly kfree() the new dynamic structures.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
Brian King
38f1852759 [PATCH] block: CFQ refcounting fix
I ran across a memory leak related to the cfq scheduler. The cfq
init function increments the refcnt of the associated request_queue.

This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
only calls the elevator exit function when its refcnt goes to zero, the
request_q never gets cleaned up. It didn't look like other io schedulers were
incrementing this refcnt, so I removed the refcnt increment and it fixed the
memory leak for me.

To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
attribute to scan "- - -" repeatedly on a scsi host and watch the memory
vanish.

Signed-off-by: Brian King <brking@us.ibm.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
Max Kellermann
49e31cbac5 [PATCH] sunrpc: print unsigned integers in stats
The sunrpc stats are collected in unsigned integers, but they are printed
with '%d'.  That can result in negative numbers in /proc/net/rpc when the
highest bit of a counter is set.  The following patch changes '%d' to '%u'
where appropriate.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
John McCutchan
7ea6040b0e [PATCH] inotify: fix event loss on hardlinked files
People have run into a problem when they do this:

watch (file1, all_events);
watch (file2, some_events);

if file2 is a hard link to file1, some events will be missed because by
default we replace the mask.  The patch below adds a flag IN_MASK_ADD which
will cause inotify to add to the existing mask if present.

Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
Stephen Rothwell
8191151d09 [PATCH] Consolidate the asm-ppc*/fcntl.h files into asm-powerpc
This makes sense now that we have asm-powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
Stephen Rothwell
8d286aa5ea [PATCH] Clean up struct flock64 definitions
This patch gathers all the struct flock64 definitions (and the operations),
puts them under !CONFIG_64BIT and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
5ac353f9ba [PATCH] Clean up struct flock definitions
This patch just gathers together all the struct flock definitions except
xtensa into asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
1abf62afb6 [PATCH] Clean up the fcntl operations
This patch puts the most popular of each fcntl operation/flag into
asm-generic/fcntl.h and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
e64ca97fd8 [PATCH] Clean up the open flags
This patch puts the most popular of each open flag into asm-generic/fcntl.h
and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
2b2fa38e5f [PATCH] Consolidate asm-ppc*/fcntl.h
These two files are basically identical, so make one just include the other
(protecting the 32-bit-only parts with __powerpc64__).  Also remove some
completely unused defines.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:37 -07:00
Stephen Rothwell
9317259ead [PATCH] Create asm-generic/fcntl.h
This set of patches creates asm-generic/fcntl.h and consolidates as much as
possible from the asm-*/fcntl.h files into it.

This patch just gathers all the identical bits of the asm-*/fcntl.h files into
asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:37 -07:00
Philipp Matthias Hahn
5ba4d46dc4 [PATCH] tpm: fix tpm_atmel.c on ICH6
While installing Debian on our new IBM X41 Tablet, I tried briefly to use
the built-in Atmel TPM.  The Athmel TPM is also located on the LPC-bus of
the ICH6.  To make it work I had to apply the following patch:

Signed-off-by: Philipp Matthias Hahn <pmhahn@titan.lahn.de>
Acked-by: Kylene Jo Hall <kjhall@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:36 -07:00
Jesper Juhl
1e1a5cc77e [PATCH] isdn_v110 warning fix
Here's a small warning fix for drivers/isdn/i4l/isdn_v110.c
 drivers/isdn/i4l/isdn_v110.c:523: warning: `ret' might be used uninitialized in this function

In addition to Karsten Keil signing off on the patch, Thomas Pfeiffer also
commented on the patch, saying
"initializing ret with the value zero is correct and should be done."

Please apply.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:36 -07:00
Alex Williamson
96803820b3 [PATCH] hpet: fix drift and url
The HPET driver is using a parts per second drift factor instead of the
standard parts per million drift the time interpolator code expects.  This
patch fixes that problem and updates the URL for the HPET spec.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Cc: "Robert W. Picco" <bob.picco@hp.com>
Acked-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:36 -07:00
Antonino A. Daplas
414edcd32a [PATCH] vt: fix possible memory corruption in complement_pos
Based on a patch from Andr Pereira de Almeida <andre@cachola.com.br>

It might be possible for the saved pointer (*p) to become invalid in
between vc_resizes, so saving the screen offset instead of the screen
pointer is saner.

This bug is very hard to trigger though, but Andre probably did, if he's
submitting this patch.  Anyway, with Andre's patch, it's still possible for
the offsets to be still illegal, if the new screen size is smaller than the
old one.  So I've also added checks if the offsets are still within the
screenbuffer size.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:36 -07:00
Ralf Baechle
eed74dfcd4 [PATCH] optimise 64bit unaligned access on 32bit kernel
I've rewriten Atushi's fix for the 64-bit put_unaligned on 32-bit systems
bug to generate more efficient code.

This case has buzilla URL http://bugzilla.kernel.org/show_bug.cgi?id=5138.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:36 -07:00
Jesper Juhl
82a25b5670 [PATCH] remove verify_area(): remove fs/umsdos/notes as it only contain a verify_area related note
The file `fs/umsdos/notes' contains only a small note about a possible bug
involving verify_area().  Since umsdos is no longer in the kernel and
verify_area() is also gone, it seems to make sense that this file goes the way
of the Dodo.

After applying this patch the `fs/umsdos/' directory will be empty and can be
removed entirely.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Jesper Juhl
720a845911 [PATCH] remove verify_area(): remove or edit references to verify_area in Documentation/
Remove (or edit) remaining references to the now dead verify_area() function
from files in Documentation/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Jesper Juhl
97de50c0ad [PATCH] remove verify_area(): remove verify_area() from various uaccess.h headers
Remove the deprecated (and unused) verify_area() from various uaccess.h
headers.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Pekka Enberg
5e5d7a2229 [PATCH] pipe: remove redundant fifo_poll abstraction
Remove a redundant fifo_poll() abstraction from fs/pipe.c and adds a big
fat comment stating we set POLLERR for FIFOs too on Linux unlike most
Unices.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Kumar Gala
9c45817f41 [PATCH] Remove non-arch consumers of asm/segment.h
asm/segment.h varies greatly on different architectures but is clearly
deprecated.  Removing all non-architecture consumers will make it easier
for us to get ride of asm/segment.h all together.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:34 -07:00
Stuart McLaren
309c0a1d5d [PATCH] blk: Use blk_queue_xxx functions to set parameters
Per-queue parameters should be updated using the appropriate blk_queue_xxx
functions.

Signed-off-by: Stuart McLaren <stuart.mclaren@hp.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:34 -07:00
john stultz
b149ee2233 [PATCH] NTP: ntp-helper functions
This patch cleans up a commonly repeated set of changes to the NTP state
variables by adding two helper inline functions:

ntp_clear(): Clears the ntp state variables

ntp_synced(): Returns 1 if the system is synced with a time server.

This was compile tested for alpha, arm, i386, x86-64, ppc64, s390, sparc,
sparc64.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:34 -07:00
Ravikiran G Thirumalai
6c231b7bab [PATCH] Additions to .data.read_mostly section
Mark variables which are usually accessed for reads with __readmostly.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Pekka Enberg
39ed3fdeec [PATCH] futex: remove duplicate code
This patch cleans up the error path of futex_fd() by removing duplicate
code.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Alexey Dobriyan
580b2e3c01 [PATCH] Adapt scripts/ver_linux to new util-linux version strings
Tested with 2.12i and 2.13-pre2.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Oleg Nesterov
e752dd6cc6 [PATCH] fix send_sigqueue() vs thread exit race
posix_timer_event() first checks that the thread (SIGEV_THREAD_ID case)
does not have PF_EXITING flag, then it calls send_sigqueue() which locks
task list.  But if the thread exits in between the kernel will oops
(->sighand == NULL after __exit_sighand).

This patch moves the PF_EXITING check into the send_sigqueue(), it must be
done atomically under tasklist_lock.  When send_sigqueue() detects exiting
thread it returns -1.  In that case posix_timer_event will send the signal
to thread group.

Also, this patch fixes task_struct use-after-free in posix_timer_event.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Dave Johnson
a97c9bf33f [PATCH] fix cramfs making duplicate entries in inode cache
Every time cramfs_lookup() is called to lookup and inode for a dentry,
get_cramfs_inode() will allocate a new inode without checking to see if that
inode already exists in the inode cache.

This is fine the first time, but if the dentry cache entry(ies) associated
with that inode are aged out, but the inode entry is not aged out (which can
be quite common if the inode has buffer cache linked to it), cramfs_lookup()
will be called again and another inode will be allocated and added to the
inode cache creating a duplicate in the inode cache.

The big issue here is that the buffers associated with each inode cache entry
are not shared between the duplicates!

The older inode entries are now orphaned as no dentry points to it and won't
be freed until the buffer cache assoicated with them are first freed.  The
newest entry will have to create all new buffer cache for each part of its
file as the old buffer cache is now orphaned as well.

Patch below fixes this by making get_cramfs_inode() use the inode cache before
blindly creating a new entry every time.  This eliminates the duplicate inodes
and duplicate buffer cache.

Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Adrian Bunk
7f4bde9a34 [PATCH] remove the second arg of do_timer_interrupt()
The second arg of do_timer_interrupt() is not used in the functions, and
all callers pass NULL.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Paul Mundt <lethal@Linux-SH.ORG>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:32 -07:00
Eric Dumazet
2832e9366a [PATCH] remove file.f_maxcount
struct file cleanup: f_maxcount has an unique value (INT_MAX).  Just use
the hard-wired value.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:32 -07:00
Jesper Juhl
0730ded5be [PATCH] remove a redundant variable in sys_prctl()
The patch removes a redundant variable `sig' from sys_prctl().

For some reason, when sys_prctl is called with option == PR_SET_PDEATHSIG
then the value of arg2 is assigned to an int variable named sig.  Then sig
is tested with valid_signal() and later used to set the value of
current->pdeath_signal .

There is no reason to use this intermediate variable since valid_signal()
takes a unsigned long argument, so it can handle being passed arg2
directly, and if the call to valid_signal is OK, then we know the value of
arg2 is in the range zero to _NSIG and thus it'll easily fit in a plain int
and thus there's no problem assigning it later to current->pdeath_signal
(which is an int).

The patch gets rid of the pointless variable `sig'.
This reduces the size of kernel/sys.o in 2.6.13-rc6-mm1 by 32 bytes on my
system.

Patch has been compile tested, boot tested, and just to make damn sure I
didn't break anything I wrote a quick test app that calls
prctl(PR_SET_PDEATHSIG ...) with the entire range of values for a
unsigned long, and it behaves as expected with and without the patch.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:32 -07:00
Tejun Heo
5acd57936c [PATCH] fs: remove redundant timespec_equal test in update_atime()
In update_atime(), timespec_equal() test is done twice in succession and
the second is always false.  This patch removes the second test.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Peter Staubach
6c9c0b52b8 [PATCH] largefile support for accounting
There is a problem in the accounting subsystem in the kernel can not
correctly handle files larger than 2GB.  The output file containing the
process accounting data can grow very large if the system is large enough
and active enough.  If the 2GB limit is reached, then the system simply
stops storing process accounting data.

Another annoying problem is that once the system reaches this 2GB limit,
then every process which exits will receive a signal, SIGXFSZ.  This signal
is generated because an attempt was made to write beyond the limit for the
file descriptor.  This signal makes it look like every process has exited
due to a signal, when in fact, they have not.

The solution is to add the O_LARGEFILE flag to the list of flags used to
open the accounting file.  The rest of the accounting support is already
largefile safe.

The changes were tested by constructing a large file (just short of 2GB),
enabling accounting, and then running enough commands to cause the
accounting data generated to increase the size of the file to 2GB.  Without
the changes, the file grows to 2GB and the last command run in the test
script appears to exit due a signal when it has not.  With the changes,
things work as expected and quietly.

There are some user level changes required so that it can deal with
largefiles, but those are being handled separately.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Adrian Bunk
439c430e3d [PATCH] arm26: one -g is enough for everyone
The main Makefile is already adding -g to the CFLAGS if
CONFIG_DEBUG_INFO=y.

Not that two -g would do harm, but one works as well.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Oleg Nesterov
bc505a478d [PATCH] do_notify_parent_cldstop() cleanup
This patch simplifies the usage of do_notify_parent_cldstop(), it lessens
the source and .text size slightly, and makes the code (in my opinion) a
bit more readable.

I am sending this patch now because I'm afraid Paul will touch
do_notify_parent_cldstop() really soon, It's better to cleanup first.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Stephane Doyon
2d237c6365 [PATCH] Console blanking locking fix
I've had WARN_CONSOLE_UNLOCKED warnings when calling TIOCLINUX
TIOCL_BLANKSCREEN and TIOCL_UNBLANKSCREEN.

(I'm blind and I use a braille display.  I use those functions to blank my
laptop's screen so people don't read it, and hopefully to conserve power.)

The warnings are from these places:
do_blank_screen at drivers/char/vt.c:2754 (Not tainted)
save_screen at drivers/char/vt.c:575 (Not tainted)
do_unblank_screen at drivers/char/vt.c:2822 (Not tainted)
set_palette at drivers/char/vt.c:2908 (Not tainted)

At a glance I would think the following patch ought to fix that.  Tested on
one machine.  Could you please tell me if this is correct and/or forward
the patch where appropriate...

Signed-off-by: Stephane Doyon <s.doyon@videotron.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Neil Horman
f62c6d0a26 [PATCH] Add missing overflow check in get_blkdev_list
Patch to clean up missing overflow check in get_blkdev_list.  The printf
which adds the "Block Devices" string in /proc/devices can overflow the
presented page if get_chrdev_list eats up the entire 4k space.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:30 -07:00
Ralf Baechle
f23ef184b4 [PATCH] Delete unused do_nanosleep declaration
There is no do_nanosleep function so kill it's declaration in <linux/time.h>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:30 -07:00
Tommy S. Christensen
2de93fbf3c [PATCH] 3c59x: read current link status from phy
The phy status register must be read twice in order to get the actual link
state.

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:30 -07:00
Christoph Hellwig
c8d127418d [PATCH] remove asm-*/hdreg.h
unused and useless..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:30 -07:00
David Gibson
96d0821cac [PATCH] Fix function/macro name collision on i386 oprofile
The i386 OProfile code has a function named nmi_exit(), which collides with
the nmi_exit() macro in linux/hardirq.h.  At the moment, we get away with
it, because hardirq.h isn't included in the oprofile code.  I hit this as a
bug when working with a patch which (indirectly) adds a #include of
hardirq.h to oprofile.

Regardless, the name collision is probably not a good idea, so this patch
fixes it, renaming the oprofile function to op_nmi_exit().  It also renames
the nmi_init() and nmi_timer_init() functions similarly, for consistency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Karsten Wiese
f26fdd5992 [PATCH] CHECK_IRQ_PER_CPU() to avoid dead code in __do_IRQ()
IRQ_PER_CPU is not used by all architectures.  This patch introduces the
macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation
of dead code in __do_IRQ().

ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their
include/asm_ARCH/irq.h file.

Through grepping the tree I found the following architectures currently use
IRQ_PER_CPU:

        cris, ia64, ppc, ppc64 and parisc.

Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
H. Peter Anvin
f8eeaaf418 [PATCH] Make the bzImage format self-terminating
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Frank Sorenson <frank@tuxrocks.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Adrian Bunk
5e1efe4931 [PATCH] jffs/jffs2: remove wrong function prototypes
This patch removes prototypes for the generic_file_open and
generic_file_llseek functions.

Besides being superfluous because they are already present in fs.h, they
were also wrong because the actual functions aren't weak functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Adrian Bunk
919532a545 [PATCH] fs/Kconfig: quota help text updates
This patch contains the following updates to the help texts:
- QUOTA: most people will get the quota utilities from their
         distribution, and if not the mini-HOWTO will tell them
- QFMT_V2: quota utilities 3.01 are no longer recent, they are now
           ancient
           and 3.01 is lower than the minimal version documented in
           Documentation/Changes

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Maximilian Attems
85747f0325 [PATCH] parport: add NetMOS 9805 support
This interface is said to be commonly used in germany: "The patch has been
proven to work fine in a beige G3 Mac."
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=262324

Signed-off-by: maximilian attems <janitor@sternwelten.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:28 -07:00