Commit Graph

71160 Commits

Author SHA1 Message Date
Eric W. Biederman
49ffcf8f99 sysctl: update sysctl_check_table
Well it turns out after I dug into the problems a little more I was returning
a few false positives so this patch updates my logic to remove them.

- Don't complain about 0 ctl_names in sysctl_check_binary_path
  It is valid for someone to remove the sysctl binary interface
  and still keep the same sysctl proc interface.

- Count ctl_names and procnames as matching if they both don't
  exist.

- Only warn about missing min&max when the generic functions care.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
fc6cd25b73 sysctl: Error on bad sysctl tables
After going through the kernels sysctl tables several times it has become
clear that code review and testing is just not effective in prevent
problematic sysctl tables from being used in the stable kernel.  I certainly
can't seem to fix the problems as fast as they are introduced.

Therefore this patch adds sysctl_check_table which is called when a sysctl
table is registered and checks to see if we have a problematic sysctl table.

The biggest part of the code is the table of valid binary sysctl entries, but
since we have frozen our set of binary sysctls this table should not need to
change, and it makes it much easier to detect when someone unintentionally
adds a new binary sysctl value.

As best as I can determine all of the several hundred errors spewed on boot up
now are legitimate.

[bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
f429cd37a2 sysctl: properly register the irda binary sysctl numbers
Grumble.  These numbers should have been in sysctl.h from the beginning if we
ever expected anyone to use them.  Oh well put them there now so we can find
them and make maintenance easier.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
c65f92398e sysctl: remove the cad_pid binary sysctl path
It looks like we inadvertently killed the cad_pid binary sysctl support when
cap_pid was changed to be a struct pid.  Since no one has complained just
remove the binary path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
064b5bba0c sysctl: remove broken netfilter binary sysctls
No one has bothered to set strategy routine for the the netfilter sysctls that
return jiffies to be sysctl_jiffies.

So it appears the sys_sysctl path is unused and untested, so this patch
removes the binary sysctl numbers.

Which fixes the netfilter oops in 2.6.23-rc2-mm2 for me.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
35834ca1e4 sysctl: simplify the pty sysctl logic
Instead of having a bunch of ifdefs in sysctl.c move all of the pty sysctl
logic into drivers/char/pty.c

As well as cleaning up the logic this prevents sysctl_check_table from
complaining that the root table has a NULL data pointer on something with
generic methods.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
25398a158d sysctl: parport remove binary paths
The sysctl binary paths don't look as if they even code work, .data is not
filled in, and all of the proc_handlers look at extra1 and there is not
strategy routine.

So just kill the binary paths.

In addition this patch removes the setting of extra1 on directories.  It
doesn't look like the parport code ever examines it, and it's bad sysctl form.

[bunk@kernel.org: remove parport_device_num()]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
0d135a4a8c sysctl: remove the binary interface for aio-nr, aio-max-nr, acpi_video_flags
aio-nr, aio-max-nr, acpi_video_flags are unsigned long values which sysctl
does not handle properly with a 64bit kernel and a 32bit user space.

Since no one is likely to be using the binary sysctl values and the ascii
interface still works, this patch just removes support for the binary sysctl
interface from the kernel.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
49641b58a7 sysctl: ipv4 remove binary sysctl paths where they are broken
Currently tcp_available_congestion_control does not even attempt being read
from sys_sysctl, and ipfrag_max_dist while it works allows setting of invalid
values using sys_sysctl.

So just kill the binary sys_sysctl support for these sysctls.  If the support
is not important enough to test and get right it probably isn't important
enough to keep.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
06489b4eec sysctl: remove broken cdrom binary sysctls
The binary interface for the cdrom sysctls can't possilby work.  So remove the
binary sysctls and update the test for finding out which sysctl table entry we
are dealy with to use the procname and not the ctl_name (which I am removing).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00
Eric W. Biederman
282a821f18 sysctl: x86_64 remove unnecessary binary paths
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
baa3a2a0d2 sysctl: remove broken sunrpc debug binary sysctls
This is debug code so no need to support binary sysctl, and the binary sysctls
as they were written were not consistent with what showed up in /proc so
remove the binary sysctl support.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
428b367bff sysctl: ipv6 route flushing (kill binary path)
We don't preoperly support the sysctl binary path for flushing the ipv6
routes.  So remove support for a binary path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
d12af679bc sysctl: fix neighbour table sysctls.
- In ipv6 ndisc_ifinfo_syctl_change so it doesn't depend on binary
  sysctl names for a function that works with proc.

- In neighbour.c reorder the table to put the possibly unused entries
  at the end so we can remove them by terminating the table early.

- In neighbour.c kill the entries with questionable binary sysctl
  handling behavior.

- In neighbour.c if we don't have a strategy routine remove the
  binary path.  So we don't the default sysctl strategy routine
  on data that is not ready for it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
f5ead5cefc sysctl: remove binary sysctl support where it clearly doesn't work
These functions are all wrapper functions for the proc interface that are
needed for them to work correctly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
97aeacf492 sysctl mqueue: remove the binary sysctl numbers
Because of a conflict with FS_INODE_NR none of the binary sysctl numbers use
by mqueue, were available to user space.  So just remove them.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
49a0c45833 sysctl: Factor out sysctl_data.
There as been no easy way to wrap the default sysctl strategy routine except
for returning 0.  Which is not always what we want.  The few instances I have
seen that want different behaviour have written their own version of
sysctl_data.  While not too hard it is unnecessary code and has the potential
for extra bugs.

So to make these situations easier and make that part of sysctl more symetric
I have factord sysctl_data out of do_sysctl_strategy and exported as a
function everyone can use.

Further having sysctl_data be an explicit function makes checking for badly
formed sysctl tables much easier.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Eric W. Biederman
d8217f076b sysctl core: Stop using the unnecessary ctl_table typedef
In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't
needed and is a bit of a nuisance because it makes it harder to recognize
ctl_table is a type name.

So this patch removes it from the generic sysctl code.  Hopefully I will have
enough energy to send the rest of my patches will follow and to remove it from
the rest of the kernel.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
d32c4f2626 CIFS: ignore mode change if it's just for clearing setuid/setgid bits
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
the setuid/setgid bits.  For CIFS, skip the mode change and let the server
handle it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
188b95dd8e NFS: if ATTR_KILL_S*ID bits are set, then skip mode change
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
the setuid/setgid bits.  For NFS, skip the mode change and let the server
handle it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
6de0ec00ba VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations
When an unprivileged process attempts to modify a file that has the setuid or
setgid bits set, the VFS will attempt to clear these bits.  The VFS will set
the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call
notify_change to clear these bits and set the mode accordingly.

With a networked filesystem (NFS and CIFS in particular but likely others),
the client machine or process may not have credentials that allow for setting
the mode.  In some situations, this can lead to file corruption, an operation
failing outright because the setattr fails, or to races that lead to a mode
change being reverted.

In this situation, we'd like to just leave the handling of this to the server
and ignore these bits.  The problem is that by the time the setattr op is
called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode
change.  The setattr operation has no way to know its intent.

The following patch fixes this by making notify_change no longer clear the
ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off
to the setattr inode op.  setattr can then check for the presence of these
bits, and if they're set it can assume that the mode change was only for the
purposes of clearing these bits.

This means that we now have an implicit assumption that notify_change is never
called with ATTR_MODE and either ATTR_KILL_S*ID bit set.  Nothing currently
enforces that, so this patch also adds a BUG() if that occurs.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
cdd6fe6e2f reiserfs: turn of ATTR_KILL_S*ID at beginning of reiserfs_setattr
reiserfs_setattr can call notify_change recursively using the same
iattr struct. This could cause it to trip the BUG() in notify_change.
Fix reiserfs to clear those bits near the beginning of the function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
8a0ce7d99a knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set
It's theoretically possible for a single SETATTR call to come in that sets the
mode and the uid/gid.  In that case, don't set the ATTR_KILL_S*ID bits since
that would trip the BUG() in notify_change.  Just fix up the mode to have the
same effect.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Jeff Layton
1ac564ecab ecryptfs: allow lower fs to interpret ATTR_KILL_S*ID
Make sure ecryptfs doesn't trip the BUG() in notify_change.  This also allows
the lower filesystem to interpret ATTR_KILL_S*ID in its own way.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
ef1d7151d2 cpu hotplug: intel_cacheinfo: fix cpu hotplug error handling
- Fix resource leakage in error case within detect_cache_attributes()

- Don't register hotcpu notifier when cache_add_dev() returns error

- Introduce cache_dev_map cpumask to track whether cache interface for
  CPU is successfully added by cache_add_dev() or not.

  cache_add_dev() may fail with out of memory error. In order to
  avoid cache_remove_dev() with that uninitialized cache interface when
  CPU_DEAD event is delivered we need to have the cache_dev_map cpumask.

  (We cannot change cache_add_dev() from CPU_ONLINE event handler
  to CPU_UP_PREPARE event handler. Because cache_add_dev() needs
  to do cpuid and store the results with its CPU online.)

[nix.or.die@googlemail.com: fix a section mismatch warning]
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
d435d862ba cpu hotplug: mce: fix cpu hotplug error handling
- Clear kobject in percpu device_mce before calling sysdev_register() with

  Because mce_create_device() may fail and it leaves kobject filled with
  junk. It will be the problem when mce_create_device() will be called
  next time.

- Fix error handling in mce_create_device()

  Error handling should not do sysdev_remove_file() with not yet added
  attributes.

- Don't register hotcpu notifier when mce_create_device() returns error

- Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
881a841f4a cpu hotplug: msr: fix cpu hotplug error handling
Do msr_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
c7e38a9c27 cpu hotplug: thermal_throttle: fix cpu hotplug error handling
Do thermal_throttle_add_dev() in CPU_UP_PREPARE instead of CPU_ONLINE.

Cc: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
9780e3e968 cpu hotplug: topology: remove topology_dev_map
By previous cpu hotplug notifier change, we don't need to track topology_dev
existence for each cpu by topology_dev_map.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
a0d8cdb652 cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE
The functions in a CPU notifier chain is called with CPU_UP_PREPARE event
before making the CPU online.  If one of the callback returns NOTIFY_BAD, it
stops to deliver CPU_UP_PREPARE event, and CPU online operation is canceled.
Then CPU_UP_CANCELED event is delivered to the functions in a CPU notifier
chain again.

This CPU_UP_CANCELED event is delivered to the functions which have been
called with CPU_UP_PREPARE, not delivered to the functions which haven't been
called with CPU_UP_PREPARE.

The problem that makes existing cpu hotplug error handlings complex is that
the CPU_UP_CANCELED event is delivered to the function that has returned
NOTIFY_BAD, too.

Usually we don't expect to call destructor function against the object that
has failed to initialize.  It is like:

	err = register_something();
	if (err) {
		unregister_something();
		return err;
	}

So it is natural to deliver CPU_UP_CANCELED event only to the functions that
have returned NOTIFY_OK with CPU_UP_PREPARE event and not to call the function
that have returned NOTIFY_BAD.  This is what this patch is doing.

Otherwise, every cpu hotplug notifiler has to track whether notifiler event is
failed or not for each cpu.  (drivers/base/topology.c is doing this with
topology_dev_map)

Similary this patch makes same thing with CPU_DOWN_PREPARE and CPU_DOWN_FAILED
evnets.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
12d00f6a12 cpu hotplug: slab: fix memory leak in cpu hotplug error path
This patch fixes memory leak in error path.

In reality, we don't need to call cpuup_canceled(cpu) for now.  But upcoming
cpu hotplug error handling change needs this.

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Akinobu Mita
fbf1e473bd cpu hotplug: slab: cleanup cpuup_callback()
cpuup_callback() is too long.  This patch factors out CPU_UP_CANCELLED and
CPU_UP_PREPARE handlings from cpuup_callback().

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Andy Whitcroft
6c72ffaab9 update checkpatch.pl to version 0.11
This version brings a more cautious checkpatch.pl by default.  The more
subjective checks are only applied with the --strict option.  It also
brings the usual slew of corrections for false positives.  Of note:

  - new tree detection, the source tree will be found via the executable
  - a major revamp of the unary detection to make it more parser like
  - a new summary at the bottom of the report
  - --strict option for subjective checks
  - --file to enable checking on complete files
  - support for use in emacs "compile" window

Andy Whitcroft (27):
      Version: 0.11
      fix up cat_vet for the case where there are no control characters
      any cast to a pointer introduces a type
      cpp unary operator detection needs to float
      attributes are also valid in type definitions
      sizeof may be a bareword and makes its argument unary
      unary checks for #ifdef et al need to find end of line
      add new --file mode to handle raw source files
      add --strict/--subjective which enables the subjective tests
      add some additional standard type suffixes
      cpp #elif is also a unary prefix
      case is not a function name
      widen asm volatile exceptions
      __kprobes is a type attribute
      typeof is a unary operator
      function open parenthesis checks should check all occurances
      expand sizeof() binary exceptions
      linux/irq.h should not be recommended
      work harder to find the kernel root and add --root=
      fix --emacs mode line numbers and string concatenation warnings
      add a summary to the bottom of the main report
      loosen assignment in if checks
      update operator spacing to maintain tabs in output
      revamp unary detection
      corruption/line wrapped patches need only reporting once
      revamp s/u/be/le 8/16/32/64 bit types
      handle missing ,1 in uni-diff header

Mike D. Day (2):
      Adds support to checkpatch.pl for running in the emacs compile window.
      checkpatch: Fix line number reporting

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Dave Young
faf8c714f4 param_sysfs_builtin memchr argument fix
If memchr argument is longer than strlen(kp->name), there will be some
weird result.

It will casuse duplicate filenames in sysfs for the "nousb".  kernel
warning messages are as bellow:

sysfs: duplicate filename 'usbcore' can not be created
WARNING: at fs/sysfs/dir.c:416 sysfs_add_one()
 [<c01c4750>] sysfs_add_one+0xa0/0xe0
 [<c01c4ab8>] create_dir+0x48/0xb0
 [<c01c4b69>] sysfs_create_dir+0x29/0x50
 [<c024e0fb>] create_dir+0x1b/0x50
 [<c024e3b6>] kobject_add+0x46/0x150
 [<c024e2da>] kobject_init+0x3a/0x80
 [<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
 [<c053b9ce>] param_sysfs_builtin+0xee/0x130
 [<c053ba33>] param_sysfs_init+0x23/0x60
 [<c024d062>] __next_cpu+0x12/0x20
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052a856>] do_initcalls+0x46/0x1e0
 [<c01bdb12>] create_proc_entry+0x52/0x90
 [<c0158d4c>] register_irq_proc+0x9c/0xc0
 [<c01bda94>] proc_mkdir_mode+0x34/0x50
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa92>] kernel_init+0x62/0xb0
 [<c0104f83>] kernel_thread_helper+0x7/0x14
 =======================
kobject_add failed for usbcore with -EEXIST, don't try to register things with the same name in the same directory.
 [<c024e466>] kobject_add+0xf6/0x150
 [<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
 [<c053b9ce>] param_sysfs_builtin+0xee/0x130
 [<c053ba33>] param_sysfs_init+0x23/0x60
 [<c024d062>] __next_cpu+0x12/0x20
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052a856>] do_initcalls+0x46/0x1e0
 [<c01bdb12>] create_proc_entry+0x52/0x90
 [<c0158d4c>] register_irq_proc+0x9c/0xc0
 [<c01bda94>] proc_mkdir_mode+0x34/0x50
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa92>] kernel_init+0x62/0xb0
 [<c0104f83>] kernel_thread_helper+0x7/0x14
 =======================
Module 'usbcore' failed to be added to sysfs, error number -17
The system will be unstable now.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Andrew Morton
8f286c33f1 stop using DMA_xxBIT_MASK
Now that we have DMA_BIT_MASK(), these macros are pointless.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Borislav Petkov
34c6538413 unify DMA_..BIT_MASK definitions: v3.1
Remove redundant DMA_..BIT_MASK definitions across two drivers.  The
computation of the majority of the bitmasks is done by the compiler.  The
initial split of the patch touching each a different file got removed due
to possible git bisect breakage.

Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Reviewed-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:21 -07:00
Tony Breeds
2c62214831 Fix discrepancy between VDSO based gettimeofday() and sys_gettimeofday().
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).

This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:20 -07:00
Alexey Dobriyan
6212e3a388 Remove struct task_struct::io_wait
Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b
during 2.6.9 development.  Commit introduced io_wait field which remained
write-only than and still remains write-only.

Also garbage collect macros which "use" io_wait.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:20 -07:00
Rafael J. Wysocki
9cd9a0058d Hibernation: Enter platform hibernation state in a consistent way
Make hibernation_platform_enter() execute the enter-a-sleep-state sequence
instead of the mixed shutdown-with-entering-S4 thing.

Replace the shutting down of devices done by kernel_shutdown_prepare(), before
entering the ACPI S4 sleep state, with suspending them and the shutting down
of sysdevs with calling device_power_down(PMSG_SUSPEND) (just like before
entering S1 or S3, but the target state is now S4).   Also, disable the
nonboot CPUs before entering the sleep state (S4), which generally always is a
good idea.

This is known to fix the "double disk spin down during hibernation" on some
machines, eg.  HPC nx6325 (ref.  http://lkml.org/lkml/2007/8/7/316 and the
following thread).   Moreover, it has been reported to make
/sys/class/rtc/rtc0/wakealarm work correctly with hibernation for some users.
It also generally causes the hibernation state (ACPI S4) to be entered faster.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:20 -07:00
Rafael J. Wysocki
c7e0831d38 Hibernation: Check if ACPI is enabled during restore in the right place
The following scenario leads to total confusion of the platform firmware on
some boxes (eg. HPC nx6325):
* Hibernate with ACPI enabled
* Resume passing "acpi=off" to the boot kernel

To prevent this from happening it's necessary to check if ACPI is enabled (and
enable it if that's not the case) _right_ _after_ control has been transfered
from the boot kernel to the image kernel, before device_power_up() is called
(ie.  with interrupts disabled).   Enabling ACPI after calling
device_power_up() turns out to be insufficient.

For this reason, introduce new hibernation callback ->leave() that will be
executed before device_power_up() by the restored image kernel.   To make it
work, it also is necessary to move swsusp_suspend() from swsusp.c to disk.c
(it's name is changed to "create_image", which is more up to the point).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:20 -07:00
Rafael J. Wysocki
efa4d2fb04 Hibernation: Use temporary page tables for kernel text mapping on x86_64
Use temporary page tables for the kernel text mapping during hibernation
restore on x86_64.

Without the patch, the original boot kernel's page tables that represent the
kernel text mapping are used while the core of the image kernel is being
restored.  However, in principle, if the boot kernel is not identical to the
image kernel, the location of these page tables in the image kernel need not
be the same, so we should create a safe copy of the kernel text mapping prior
to restoring the core of the image kernel.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:20 -07:00
Rafael J. Wysocki
c30bb68c26 Hibernation: Pass CR3 in the image header on x86_64
Since we already pass the address of restore_registers() in the image header,
we can also pass the value of the CR3 register from before the hibernation in
the same way.  This will allow us to avoid using init_level4_pgt page tables
during the restore.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Rafael J. Wysocki
d158cbdf39 Hibernation: Arbitrary boot kernel support on x86_64
Make it possible to restore a hibernation image on x86_64 with the help of a
kernel different from the one in the image.

The idea is to split the core restoration code into two separate parts and to
place each of them in a different page.   The first part belongs to the boot
kernel and is executed as the last step of the image kernel's memory
restoration procedure.   Before being executed, it is relocated to a safe page
that won't be overwritten while copying the image kernel pages.

The final operation performed by it is a jump to the second part of the core
restoration code that belongs to the image kernel and has just been restored.
This code makes the CPU switch to the image kernel's page tables and restores
the state of general purpose registers (including the stack pointer) from
before the hibernation.

The main issue with this idea is that in order to jump to the second part of
the core restoration code the boot kernel needs to know its address.
 However, this address may be passed to it in the image header.   Namely, the
part of the image header previously used for checking if the version of the
image kernel is correct can be replaced with some architecture specific data
that will allow the boot kernel to jump to the right address within the image
kernel.   These data should also be used for checking if the image kernel is
compatible with the boot kernel (as far as the memory restroration procedure
is concerned).  It can be done, for example, with the help of a "magic" value
that has to be equal in both kernels, so that they can be regarded as
compatible.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Rafael J. Wysocki
d307c4a8e8 Hibernation: Arbitrary boot kernel support - generic code
Add the bits needed for supporting arbitrary boot kernels to the common
hibernation code.

To support arbitrary boot kernels, make it possible to replace the 'struct
new_utsname' and the kernel version in the hibernation image header by some
architecture specific data that will be used to verify if the image is valid
and to restore the image.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Pavel Machek
50a1efe14f s2ram: kill old debugging junk
This removes old debugging stuff, that should be no longer neccessary.  It
accessed VGA hardware (which may not be ready at this point), and used LEDs
at port 80 for debugging.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Andres Salomon
8f4ce8c32f serial: turn serial console suspend a boot rather than compile time option
Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop
the serial console from being suspended when the rest of the machine goes
to sleep.  This is incredibly useful for debugging power management-related
things; however, having it as a compile-time option has proved to be
incredibly inconvenient for us (OLPC).  There are plenty of times that we
want serial console to not suspend, but for the most part we'd like serial
console to be suspended.

This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel
boot parameter (no_console_suspend).  By default, the serial console will
be suspended along with the rest of the system; by passing
'no_console_suspend' to the kernel during boot, serial console will remain
alive during suspend.

For now, this is pretty serial console specific; further fixes could be
applied to make this work for things like netconsole.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Rafael J. Wysocki
438e2ce68d freezer: measure freezing time
Measure the time of the freezing of tasks, even if it doesn't fail.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Rafael J. Wysocki
b842ee578e freezer: be more verbose
Increase the freezer's verbosity a bit, so that it's easier to read problem
reports related to it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Rafael J. Wysocki
f059bca1c5 pm_trace displays the wrong time from the RTC
The way in which read_magic_time() displays the date read from the RTC is
apparently confusing to the users (cf.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250238).  Make it
print dates in the standard way.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00
Adrian Bunk
c3d42d7527 unexport pm_power_off_prepare
This patch removes the unused EXPORT_SYMBOL(pm_power_off_prepare).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:19 -07:00