Commit Graph

5506 Commits

Author SHA1 Message Date
Linus Torvalds
28cb5ccd30 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  Driver core: proper prototype for drivers/base/init.c:driver_init()
  kobject: kobject_uevent() returns manageable value
  kref refcnt and false positives
2006-12-21 00:02:03 -08:00
Adrian Bunk
1f21782e63 Driver core: proper prototype for drivers/base/init.c:driver_init()
Add a prototype for driver_init() in include/linux/device.h.

Also rename a static function of the same name in drivers/acpi/ibm_acpi.c to
ibm_acpi_driver_init() to fix the namespace collision.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:56:45 -08:00
Aneesh Kumar K.V
542cfce6f3 kobject: kobject_uevent() returns manageable value
Since kobject_uevent() function does not return an integer value to
indicate if its operation was completed with success or not, it is worth
changing it in order to report a proper status (success or error) instead
of returning void.

[randy.dunlap@oracle.com: Fix inline kobject functions]
Cc: Mauricio Lin <mauriciolin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:56:44 -08:00
Ralf Baechle
fb0f2b40fa PCI legacy resource fix
Since commit 368c73d4f6 the kernel will try
to update the non-writeable BAR registers 0..3 of PIIX4 IDE adapters if
pci_assign_unassigned_resources() is used to do full resource assignment of
the bus.  This fails because in the PIIX4 these BAR registers have
implicitly assumed values and read back as zero; it used to work because
the kernel just wrote zero to that register, so the read-back value
matched what was written.

The fix is a new resource flag IORESOURCE_PCI_FIXED used to mark a resource
as non-movable.  This will also be useful to keep other important system
resources from being moved around - for example system consoles on PCI
buses.

[akpm@osdl.org: cleanup]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:43 -08:00
Adrian Bunk
7e7a43c32a PCI: don't export device IDs to userspace
I don't see any good reason for exporting device IDs to userspace.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:43 -08:00
Alan Cox
1597cacbe3 PCI: Fix multiple problems with VIA hardware
This patch is designed to fix:
- Disk eating corruptor on KT7 after resume from RAM
- VIA IRQ handling
- VIA fixups for bus lockups after resume from RAM

The core of this is to add a table of resume fixups run at resume time.
We need to do this for a variety of boards and features, but particularly
we need to do this to get various critical VIA fixups done on resume.

The second part of the problem is to handle VIA IRQ number rules which
are a bit odd and need special handling for PIC interrupts. Various
patches broke various boxes and while this one may not be perfect
(hopefully it is) it ensures the workaround is applied to the right
devices only.

From: Jean Delvare <khali@linux-fr.org>

Now that PCI quirks are replayed on software resume, we can safely
re-enable the Asus SMBus unhiding quirk even when software suspend support
is enabled.

[akpm@osdl.org: fix const warning]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:43 -08:00
Michael Ellerman
d010b51c7e PCI: Add #defines for Hypertransport MSI fields
Add a few #defines for grabbing and working with the address fields
in a HT_CAPTYPE_MSI_MAPPING capability. All from the HT spec v3.00.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:43 -08:00
Michael Ellerman
687d5fe3dc PCI: Add pci_find_ht_capability() for finding Hypertransport capabilities
There are already several places in the kernel that want to search a PCI
device for a given Hypertransport capability. Although this is possible
using pci_find_capability() etc., it makes sense to encapsulate that
logic in a helper - pci_find_ht_capability().

To cater for searching exhaustively for a capability, we also provide
pci_find_next_ht_capability().

We also need to cater for the fact that the HT capability fields may be
either 3 or 5 bits wide. pci_find_ht_capability() deals with this for you,
but callers using the #defines directly must handle that themselves.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:42 -08:00
Alan Cox
d86f90f991 pci: Introduce pci_find_present
This works like pci_dev_present but instead of returning boolean returns
the matching pci_device_id entry.  This makes it much more useful.  Code
bloat is basically nil as the old boolean function is rewritten in terms of
the new one.
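
As a rough illustration only (not code from this patch), the "boolean in
terms of lookup" shape might look like the sketch below.  The demo_* names
and the simplified ID struct are hypothetical stand-ins, and the lookup
matches one (vendor, device) pair against a table rather than scanning the
PCI bus as the real helpers do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_device_id. */
struct demo_id {
	unsigned int vendor;
	unsigned int device;
};

/*
 * Return the first table entry matching (vendor, device), or NULL.
 * The table is terminated by an all-zero entry.
 */
static const struct demo_id *demo_find_present(const struct demo_id *ids,
					       unsigned int vendor,
					       unsigned int device)
{
	assert(ids != NULL);
	for (; ids->vendor || ids->device; ids++) {
		if (ids->vendor == vendor && ids->device == device)
			return ids;
	}
	return NULL;
}

/* The old boolean query becomes a thin wrapper around the lookup. */
static int demo_dev_present(const struct demo_id *ids,
			    unsigned int vendor, unsigned int device)
{
	return demo_find_present(ids, vendor, device) != NULL;
}
```

Returning the entry lets a caller read per-entry data from the match
instead of searching the table a second time.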

This will be used by the updated VIA PCI quirks, for one.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:42 -08:00
Inaky Perez-Gonzalez
42a0ee3238 pci: add class codes for Wireless RF controllers
Add PCI codes to include/linux/pci_ids.h for RF controllers; first
batch of these devices seem to be the Ultra-Wide-Band and Wireless USB
controllers (WHCI spec).

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:54:42 -08:00
Jens Axboe
da77526502 [PATCH] cfq-iosched: don't allow sync merges across queues
Currently we allow any merge, even if the io originates from different
processes. This can cause really bad starvation and unfairness, if those
ios happen to be synchronous (reads or direct writes).

So add an allow_merge hook to the io scheduler ops, so an io scheduler can
help decide whether a bio/process combination may be merged with an
existing request.
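
A minimal sketch of such an optional hook, assuming simplified stand-in
types (every demo_* name and the "owner" field are hypothetical, and the
policy shown is only one plausible reading of "no sync merges across
queues"):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for request and bio. */
struct demo_request { int owner; int sync; };
struct demo_bio     { int owner; int sync; };

struct demo_elevator_ops {
	/* Optional hook: return nonzero if bio may be merged into rq. */
	int (*allow_merge)(const struct demo_request *rq,
			   const struct demo_bio *bio);
};

/* Policy sketch: refuse to merge synchronous io from another owner. */
static int demo_cfq_allow_merge(const struct demo_request *rq,
				const struct demo_bio *bio)
{
	if (bio->sync && bio->owner != rq->owner)
		return 0;
	return 1;
}

/* Core code consults the scheduler only if it installed the hook. */
static int demo_may_merge(const struct demo_elevator_ops *ops,
			  const struct demo_request *rq,
			  const struct demo_bio *bio)
{
	assert(ops != NULL);
	if (ops->allow_merge)
		return ops->allow_merge(rq, bio);
	return 1;	/* no hook: any merge is allowed, as before */
}
```

Schedulers that leave the hook unset keep the old behaviour, so only the
scheduler that cares about fairness pays for the extra check.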

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-20 11:04:12 +01:00
Len Brown
9774f33841 merge linus into test branch 2006-12-20 02:53:13 -05:00
Len Brown
40b20c257a Pull platform-drivers into test branch 2006-12-20 02:52:17 -05:00
Yu Luming
2dec3ba8d8 output: Add display output class support
Add generic abstract layer for display output switch control.  The output
sysfs class driver provides an abstract video output layer that can be used to
hook platform specific methods to enable/disable video output device through
common sysfs interface.

Signed-off-by: Luming Yu <Luming.yu@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2006-12-20 01:46:41 -05:00
Yu Luming
519ab5f2be ACPI: video: Add dev argument for backlight_device_register
This patch set adds generic abstract layer support for acpi video driver to
have generic user interface to control backlight and output switch control by
leveraging the existing backlight sysfs class driver, and by adding a new
video output sysfs class driver.

This patch:

Add dev argument for backlight_device_register to link the class device to
real device object.  The platform specific driver should find a way to get the
real device object for their video device.

[akpm@osdl.org: build fix]
[akpm@osdl.org: fix msi-laptop.c]
Signed-off-by: Luming Yu <Luming.yu@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2006-12-20 01:42:19 -05:00
Linus Torvalds
f238085415 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  [PATCH] Generic HID layer - update MAINTAINERS
  input/hid: Supporting more keys from the HUT Consumer Page
  [PATCH] Generic HID layer - build: USB_HID should select HID
2006-12-19 10:32:40 -08:00
Jens Axboe
8e5cfc45e7 [PATCH] Fixup blk_rq_unmap_user() API
The blk_rq_unmap_user() API is not very nice. It expects the caller to
know that rq->bio has to be reset to the original bio, and it will
silently do nothing if that is not done. Instead make it explicit that
we need to pass in the first bio, by expecting a bio argument.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-19 11:12:46 +01:00
Jens Axboe
1aa4f24fe9 [PATCH] Remove queue merging hooks
We have full flexibility of merging parameters now, so we can remove the
hooks that define back/front/request merge strategies. Nobody is using
them anymore.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-19 08:33:11 +01:00
Gabriel Mansi
d5cb8d38cd [AGPGART] K8M890 support for amd-k8.
Signed-off-by: Dave Jones <davej@redhat.com>
2006-12-18 19:13:54 -05:00
Evgeniy Polyakov
a240d9f1d8 [CONNECTOR]: Replace delayed work with usual work queue.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-18 01:53:58 -08:00
Linus Torvalds
a08727bae7 Make workqueue bit operations work on "atomic_long_t"
On architectures where the atomicity of the bit operations is handled by
external means (ie a separate spinlock to protect concurrent accesses),
just doing a direct assignment on the workqueue data field (as done by
commit 4594bf159f) can cause the
assignment to be lost due to lack of serialization with the bitops on
the same word.

So we need to serialize the assignment with the locks on those
architectures (notably older ARM chips, PA-RISC and sparc32).

So rather than using an "unsigned long", let's use "atomic_long_t",
which already has a safe assignment operation (atomic_long_set()) on
such architectures.

This requires that the atomic operations use the same atomicity locks as
the bit operations do, but that is largely the case anyway.  Sparc32
will probably need fixing.

Architectures (including modern ARM with LL/SC) that implement sane
atomic operations for SMP won't see any of this matter.

Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: David Miller <davem@davemloft.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-16 09:53:50 -08:00
Linus Torvalds
0221872a3b Fix "delayed_work_pending()" macro expansion
Nobody uses it, but it was still wrong.  Using the macro argument name
'work' meant that when we used 'work' as a member name, that would also
get replaced by the macro argument.
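
The failure mode can be reproduced outside the kernel.  The sketch below
uses hypothetical demo_* types rather than the real workqueue structures;
the broken form appears only in a comment because it does not compile once
it is used.

```c
#include <assert.h>
#include <stddef.h>

struct demo_work { unsigned long pending; };
struct demo_delayed_work { struct demo_work work; };

/*
 * Broken form:
 *
 *   #define demo_pending(work) ((work)->work.pending != 0)
 *
 * demo_pending(&dw) expands to ((&dw)->&dw.pending != 0), because the
 * member name 'work' is replaced along with the macro argument.
 */

/* Fixed form: the parameter name no longer matches any member name. */
#define demo_pending(w) ((w)->work.pending != 0)

static int demo_is_pending(const struct demo_delayed_work *dw)
{
	assert(dw != NULL);
	return demo_pending(dw);
}
```

Because the preprocessor substitutes tokens without knowing about struct
members, short or unusual parameter names are the usual defence.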

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15 14:13:51 -08:00
Linus Torvalds
d1526e2cda Remove stack unwinder for now
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed.  We can put it back when
it's stable and isn't likely to make warning or bug events worse.

In the meantime, enable frame pointers for more readable stack traces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15 08:47:51 -08:00
Florian Festi
1c1e40b5ad input/hid: Supporting more keys from the HUT Consumer Page
On USB keyboards lots of hot/internet keys are not working. This patch
adds support for a number of keys from the USB HID Usage Table
(http://www.usb.org/developers/devclass_docs/Hut1_12.pdf).

It also adds several new key codes. Most of them are used on real world
keyboards I know. I added some others (KEY_+ EDITOR, GRAPHICSEDITOR, DATABASE,
NEWS, VOICEMAIL, VIDEOPHONE) to avoid "holes".

I also added KEY_ZOOMRESET as it is possible to have an inet keyboard and a
remote control in parallel and it makes sense to have them behave differently.

Signed-off-by: Florian Festi <ffesti@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2006-12-14 13:37:24 +01:00
Patrick McHardy
2bf540b73e [NETFILTER]: bridge-netfilter: remove deferred hooks
Remove the deferred hooks and all related code as scheduled in
feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:54:25 -08:00
Scott Wood
6eefd34fdc Driver core: Make platform_device_add_data accept a const pointer
platform_device_add_data() makes a copy of the data that is given to it,
and thus the parameter can be const.  This removes a warning when data
from get_property() on powerpc is handed to platform_device_add_data(),
as get_property() returns a const pointer.
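
For illustration, a userspace sketch of the same idea: a function that
only copies from its argument can take it as const.  demo_device and
demo_add_data() are hypothetical names, and malloc() stands in for the
kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device that owns the copied data. */
struct demo_device {
	void *platform_data;
	size_t size;
};

/*
 * Copy the caller's buffer into the device.  The source is only read,
 * so the parameter can be const.  Returns 0 on success, -1 if the
 * allocation fails.
 */
static int demo_add_data(struct demo_device *dev, const void *data,
			 size_t size)
{
	void *copy;

	assert(dev != NULL && data != NULL && size > 0);
	copy = malloc(size);
	if (!copy)
		return -1;
	memcpy(copy, data, size);
	dev->platform_data = copy;
	dev->size = size;
	return 0;
}
```

A caller holding a const pointer can then pass it without a cast, which
is the kind of warning the patch removes.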

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13 15:38:46 -08:00
Russell King
aef6fba4f9 [PATCH] Add missing KORENIX PCI ID's
Oops, sorry about that.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 10:06:55 -08:00
Atsushi Nemoto
9de455b207 [PATCH] Pass vma argument to copy_user_highpage().
To allow a more effective copy_user_highpage() on certain architectures,
a vma argument is added to the function and cow_user_page() allowing
the implementation of these functions to check for the VM_EXEC bit.

The main part of this patch was originally written by Ralf Baechle;
Atsushi Nemoto did the debugging.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:27:08 -08:00
Atsushi Nemoto
77fff4ae2b [PATCH] Fix COW D-cache aliasing on fork
Problem:

1. There is a process containing two threads (T1 and T2).  The
   thread T1 calls fork().  The dup_mmap() function is then called in
   T1's context.

static inline int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
	...
	flush_cache_mm(current->mm);
	...	/* A */
	(write-protect all Copy-On-Write pages)
	...	/* B */
	flush_tlb_mm(current->mm);
	...

2. When preemption happens between A and B (or on an SMP kernel), the
   thread T2 can run and modify data on COW pages without a page fault
   (the modified data will stay in cache).

3. Some time after fork() has completed, the thread T2 may take a
   write-protection page fault on a COW page.

4. The data of the COW page will then be copied to a newly allocated
   physical page (copy_cow_page()).  It reads the data via the kernel
   mapping.  The kernel mapping can have a different 'color' from the
   user space mapping of the thread T2 (dcache aliasing).  Therefore
   copy_cow_page() will copy stale data, and the modified data in the
   cache will be lost.

To let architecture code deal with this problem, allow it to override
copy_user_highpage() by defining __HAVE_ARCH_COPY_USER_HIGHPAGE in
<asm/page.h>.

The main part of this patch was originally written by Ralf Baechle;
Atsushi Nemoto did the debugging.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:27:07 -08:00
Linus Torvalds
bbc7610c06 Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  hwmon: Add MAINTAINERS entry for new ams driver
  hwmon: New AMS hardware monitoring driver
  hwmon/w83793: Add documentation and maintainer
  hwmon: New Winbond W83793 hardware monitoring driver
  hwmon: Update Rudolf Marek's e-mail address
  hwmon/f71805f: Fix the device address decoding
  hwmon/f71805f: Always create all fan inputs
  hwmon/f71805f: Add support for the Fintek F71872F/FG chip
  hwmon: New PC87427 hardware monitoring driver
  hwmon/it87: Remove the SMBus interface support
  hwmon/hdaps: Update the list of supported devices
  hwmon/hdaps: Move the DMI detection data to .data
  hwmon/pc87360: Autodetect the VRM version
  hwmon/f71805f: Document the fan control features
  hwmon/f71805f: Add support for "speed mode" fan speed control
  hwmon/f71805f: Support DC fan speed control mode
  hwmon/f71805f: Let the user adjust the PWM base frequency
  hwmon/f71805f: Add manual fan speed control
  hwmon/f71805f: Store the fan control registers
2006-12-13 09:13:19 -08:00
Robert P. J. Day
5cbded585d [PATCH] getting rid of all casts of k[cmz]alloc() calls
Run this:

	#!/bin/sh
	for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
	  echo "De-casting $f..."
	  perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
	done

And then go through and reinstate those cases where code is casting pointers
to non-pointers.

And then drop a few hunks which conflicted with outstanding work.

Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:58 -08:00
Geert Uytterhoeven
3161986224 [PATCH] fbdev: remove references to non-existent fbmon_valid_timings()
Remove references to non-existent fbmon_valid_timings()

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: James Simmons <jsimmons@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:55 -08:00
J.Bruce Fields
b591480bbe [PATCH] knfsd: nfsd4: reorganize compound ops
Define an op descriptor struct, use it to simplify nfsd4_proc_compound().

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:54 -08:00
J.Bruce Fields
a4f1706a9b [PATCH] knfsd: nfsd4: move replay_owner to cstate
Tuck away the replay_owner in the cstate while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:54 -08:00
J.Bruce Fields
ca3643171b [PATCH] knfsd: nfsd4: pass saved and current fh together into nfsd4 operations
Pass the saved and current filehandles together into all the nfsd4 compound
operations.

I want a unified interface to these operations so we can just call them by
pointer and throw out the huge switch statement.
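
Roughly, "call them by pointer" could take the shape sketched below.  All
demo_* names, the integer filehandles and the three sample operations are
hypothetical; the real nfsd4 operations take richer arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compound state holding both filehandles together. */
struct demo_cstate {
	int current_fh;
	int saved_fh;
};

/* One signature for every operation, so a table can hold them all. */
typedef int (*demo_op_fn)(struct demo_cstate *cstate, int arg);

static int demo_putfh(struct demo_cstate *cstate, int arg)
{
	cstate->current_fh = arg;
	return 0;
}

static int demo_savefh(struct demo_cstate *cstate, int arg)
{
	(void)arg;
	cstate->saved_fh = cstate->current_fh;
	return 0;
}

static int demo_restorefh(struct demo_cstate *cstate, int arg)
{
	(void)arg;
	cstate->current_fh = cstate->saved_fh;
	return 0;
}

enum { DEMO_OP_PUTFH, DEMO_OP_SAVEFH, DEMO_OP_RESTOREFH, DEMO_OP_COUNT };

static const demo_op_fn demo_ops[DEMO_OP_COUNT] = {
	[DEMO_OP_PUTFH]     = demo_putfh,
	[DEMO_OP_SAVEFH]    = demo_savefh,
	[DEMO_OP_RESTOREFH] = demo_restorefh,
};

/* A table lookup replaces the switch; unknown op numbers return -1. */
static int demo_dispatch(struct demo_cstate *cstate, int opnum, int arg)
{
	assert(cstate != NULL);
	if (opnum < 0 || opnum >= DEMO_OP_COUNT)
		return -1;
	return demo_ops[opnum](cstate, arg);
}
```

Once every operation shares one signature, adding an operation means
adding a table entry rather than another switch case.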

Also I'll eventually want a structure like this--that holds the state used
during compound processing--for deferral.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:54 -08:00
J.Bruce Fields
e571019911 [PATCH] knfsd: nfsd4: clarify units of COMPOUND_SLACK_SPACE
A comment here incorrectly states that "slack_space" is measured in words, not
bytes.  Remove the comment, and adjust a variable name and a few comments to
clarify the situation.

This is pure cleanup; there should be no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:53 -08:00
Eric W. Biederman
2154227a2c [PATCH] ncpfs: Use struct pid to track the userspace watchdog process
This patch converts the tracking of the user space watchdog process from using
a pid_t to use struct pid.  This makes us safe from pid wrap around issues and
prepares the way for the pid namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:53 -08:00
Eric W. Biederman
a71113da44 [PATCH] smbfs: Make conn_pid a struct pid
smbfs keeps track of the user space server process in conn_pid.  This converts
that track to use a struct pid instead of pid_t.  This keeps us safe from pid
wrap around issues and prepares the way for the pid namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:53 -08:00
Eric W. Biederman
3cec556a84 [PATCH] n_r3964: Use struct pid to track user space clients
Currently this driver tracks user space clients it should send signals to.  In
the presence of file descriptor passing this appears susceptible to
confusion from pid wrap around issues.

Replacing this with a struct pid prevents us from getting confused, and
prepares for a pid namespace implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:53 -08:00
Al Viro
e8c5c045d7 [PATCH] lockd endianness annotations
Annotated, all places switched to keeping status net-endian.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Robert P. J. Day
cd86128088 [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.
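
A userspace sketch of the two conversions.  demo_kcalloc() and
demo_kzalloc() are hypothetical stand-ins built on calloc(); they mirror
the argument order of kcalloc(n, size, flags) and kzalloc(size, flags)
but drop the flags argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for kcalloc(): element count first, then element size. */
static void *demo_kcalloc(size_t n, size_t size)
{
	return calloc(n, size);
}

/* Stand-in for kzalloc(): one zeroed object of the given size. */
static void *demo_kzalloc(size_t size)
{
	return calloc(1, size);
}

struct demo_item {
	int id;
	char name[16];
};

/* Was demo_kcalloc(1, sizeof(struct demo_item)). */
static struct demo_item *demo_alloc_one(void)
{
	return demo_kzalloc(sizeof(struct demo_item));
}

/* Count comes first; swapping the two arguments hides the intent. */
static struct demo_item *demo_alloc_many(size_t count)
{
	assert(count > 0);
	return demo_kcalloc(count, sizeof(struct demo_item));
}
```

Both forms return zeroed memory; the single-object form simply states
that intent more directly.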

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Ingo Molnar
3117df0453 [PATCH] lockdep: print irq-trace info on asserts
When we print an assert due to scheduling-in-atomic bugs, and if lockdep
is enabled, then the IRQ tracing information of lockdep can be printed
to pinpoint the code location that disabled interrupts. This saved me
quite a bit of debugging time in cases where the backtrace did not
identify the irq-disabling site well enough.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Ingo Molnar
5d6f647fc6 [PATCH] debug: add sysrq_always_enabled boot option
Most distributions enable sysrq support but set it to 0 by default.  Add a
sysrq_always_enabled boot option to always-enable sysrq keys.  Useful for
debugging - without having to modify the distribution's config files (which
might not be possible if the kernel is on a live CD, etc.).

Also, while at it, clean up the sysrq interfaces.

[bunk@stusta.de: make sysrq_always_enabled_setup() static]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Chen, Kenneth W
e61c90188b [PATCH] optimize o_direct on block devices
Implement block device specific .direct_IO method instead of going through
generic direct_io_worker for block device.

direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
a file system, where it needs to perform block allocation, hole detection,
extending the file on write, and tons of other corner cases.  The end result is
that it takes tons of CPU time to submit an I/O.

For a block device, the block allocation is much simpler and a tight triple
loop can be written to iterate over each iovec and each page within the iovec in
order to construct/prepare the bio structure and then subsequently submit it to
the block layer.  This significantly speeds up O_DIRECT on block devices.

[akpm@osdl.org: small speedup]
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Valerie Henson
47ae32d6a5 [PATCH] relative atime
Add "relatime" (relative atime) support.  Relative atime only updates the
atime if the previous atime is older than the mtime or ctime.  Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.
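
The rule as worded above can be sketched as a predicate.  This is an
illustration under assumptions, not the patch's code: struct demo_times
and demo_relatime_needs_update() are hypothetical, and timestamps are
plain seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for an inode's three timestamps, in seconds. */
struct demo_times {
	long atime;
	long mtime;
	long ctime;
};

/*
 * Relative atime: update atime only if the previous atime is older
 * than the mtime or the ctime.  Returns 1 if an update is needed.
 */
static int demo_relatime_needs_update(const struct demo_times *t)
{
	assert(t != NULL);
	return t->atime < t->mtime || t->atime < t->ctime;
}
```

A reader such as mutt can then still compare atime with mtime to tell
whether a file was read after its last modification.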

A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txt

Signed-off-by: Valerie Henson <val_henson@linux.intel.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Rafael J. Wysocki
8a102eed9c [PATCH] PM: Fix SMP races in the freezer
Currently, to tell a task that it should go to the refrigerator, we set the
PF_FREEZE flag for it and send a fake signal to it.  Unfortunately there
are two SMP-related problems with this approach.  First, a task running on
another CPU may be updating its flags while the freezer attempts to set
PF_FREEZE for it and this may leave the task's flags in an inconsistent
state.  Second, there is a potential race between freeze_process() and
refrigerator() in which freeze_process() running on one CPU is reading a
task's PF_FREEZE flag while refrigerator() running on another CPU has just
set PF_FROZEN for the same task and attempts to reset PF_FREEZE for it.  If
the refrigerator wins the race, freeze_process() will state that PF_FREEZE
hasn't been set for the task and will set it unnecessarily, so the task
will go to the refrigerator once again after it's been thawed.

To solve the first of these problems we need to stop using PF_FREEZE to tell
tasks that they should go to the refrigerator.  Instead, we can introduce a
special TIF_*** flag and use it for this purpose, since it is allowed to
change the other tasks' TIF_*** flags and there are special calls for it.

To avoid the freeze_process()-refrigerator() race we can make
freeze_process() always check the task's PF_FROZEN flag after it's read
its "freeze" flag.  We should also make sure that refrigerator() will
always reset the task's "freeze" flag after it's set PF_FROZEN for it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Eric Dumazet
6a2d7a955d [PATCH] SLAB: use a multiply instead of a divide in obj_to_index()
When some objects are allocated by one CPU but freed by another CPU we can
consume a lot of cycles doing divides in obj_to_index().

(Typical load on a dual processor machine where network interrupts are
handled by one particular CPU (allocating skbufs), and the other CPU is
running the application (consuming and freeing skbufs))

Here on one production server (dual-core AMD Opteron 285), I noticed this
divide took 1.20 % of CPU_CLK_UNHALTED events in kernel.  But Opterons are
quite modern CPUs and the divide is much more expensive on older
architectures:

On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
cycle for a multiply.

Doing some math, we can use a reciprocal multiplication instead of a divide.

If we want to compute V = (A / B)  (A and B being u32 quantities)
we can instead use :

V = ((u64)A * RECIPROCAL(B)) >> 32 ;

where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
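
A standalone sketch of the two formulas above, with hypothetical demo_*
names.  Because RECIPROCAL(B) is rounded up, the result is exact while
A * B stays below 2^32; for larger A it can be off by one.  B == 1 is
excluded here because its reciprocal (2^32) does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute RECIPROCAL(B) = ((1LL << 32) + (B - 1)) / B, for B >= 2. */
static uint32_t demo_reciprocal_value(uint32_t b)
{
	assert(b > 1);
	return (uint32_t)(((1ULL << 32) + (b - 1)) / b);
}

/* V = ((u64)A * RECIPROCAL(B)) >> 32: one multiply and one shift. */
static uint32_t demo_reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The reciprocal is computed once per divisor, so the cost of the real
divide is paid at setup time rather than on every lookup.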

Note :

I wrote pure C code for clarity. gcc output for i386 is not optimal but
acceptable :

mull   0x14(%ebx)
mov    %edx,%eax // part of the >> 32
xor     %edx,%edx // useless
mov    %eax,(%esp) // could be avoided
mov    %edx,0x4(%esp) // useless
mov    (%esp),%ebx

[akpm@osdl.org: small cleanups]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Paul Jackson
02a0e53d82 [PATCH] cpuset: rework cpuset_zone_allowed api
Elaborate the API for calling cpuset_zone_allowed(), so that users have to
explicitly choose between the two variants:

  cpuset_zone_allowed_hardwall()
  cpuset_zone_allowed_softwall()

Until now, whether or not you got the hardwall flavor depended solely on
whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
argument.

If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
version.

Unfortunately, this meant that users would end up with the softwall version
without thinking about it.  Since only the softwall version might sleep,
this led to bugs with possible sleeping in interrupt context on more than
one occasion.

The hardwall version requires that the current task's mems_allowed allows
the node of the specified zone (or that you're in interrupt or that
__GFP_THISNODE is set or that you're on a one cpuset system.)

The softwall version, depending on the gfp_mask, might allow a node if it
was allowed in the nearest enclosing cpuset marked mem_exclusive (which
requires taking the cpuset lock 'callback_mutex' to evaluate.)

This patch removes the cpuset_zone_allowed() call, and forces the caller to
explicitly choose between the hardwall and the softwall case.

If the caller wants the gfp_mask to determine this choice, they should (1)
be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
cpuset_zone_allowed_softwall() routine.

This adds another 100 or 200 bytes to the kernel text space, due to the few
lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
routines.  It should save a few instructions executed for the calls that
turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
set (before the call) then check (within the call) the __GFP_HARDWALL flag.

For the most critical call, from get_page_from_freelist(), the same
instructions are executed as before -- the old cpuset_zone_allowed()
routine it used to call is the same code as the
cpuset_zone_allowed_softwall() routine that it calls now.

Not a perfect win, but seems worth it, to reduce this chance of hitting a
sleeping with irq off complaint again.
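
A minimal userspace model of the split described above, offered as a sketch
rather than the kernel code.  The flag values, struct alloc_ctx and its
fields are hypothetical stand-ins, and the "one cpuset system" shortcut is
left out.  It only illustrates that the hardwall variant never needs to
sleep, while the softwall variant may have to unless __GFP_HARDWALL is set.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only. */
#define GFP_HARDWALL 0x1u
#define GFP_THISNODE 0x2u

/* Simplified stand-in for the state the real routines consult. */
struct alloc_ctx {
	unsigned long mems_allowed;   /* nodes allowed by the current task */
	unsigned long exclusive_mems; /* nodes of the nearest mem_exclusive ancestor */
	bool in_interrupt;
	bool can_sleep;
};

static bool node_in(unsigned long mask, int node)
{
	assert(node >= 0 && (unsigned)node < sizeof(mask) * CHAR_BIT);
	return (mask >> node) & 1UL;
}

/* Hardwall: only cheap checks, so it is usable from any context. */
static bool zone_allowed_hardwall(const struct alloc_ctx *c, int node,
				  unsigned int gfp)
{
	return c->in_interrupt || (gfp & GFP_THISNODE) ||
	       node_in(c->mems_allowed, node);
}

/* Softwall: falls back to the enclosing exclusive cpuset, which may sleep. */
static bool zone_allowed_softwall(const struct alloc_ctx *c, int node,
				  unsigned int gfp)
{
	if (zone_allowed_hardwall(c, node, gfp))
		return true;
	if (gfp & GFP_HARDWALL)
		return false;	/* caller asked for hardwall behaviour */
	/* The real lookup takes a mutex, so the caller must be able to sleep. */
	assert(c->can_sleep);
	return node_in(c->exclusive_mems, node);
}
```

A caller that cannot sleep would pick zone_allowed_hardwall() outright; a
caller that lets the gfp mask decide would pick zone_allowed_softwall() and
make sure it can sleep or passes the hardwall flag.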

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Christoph Lameter
55935a34a4 [PATCH] More slab.h cleanups
More cleanups for slab.h

1. Remove tabs from weird locations as suggested by Pekka

2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
   as suggested by Pekka.

3. Uses static inline for the fallback defs as also suggested by Pekka.

4. Make kmem_ptr_valid take a const * argument.

5. Separate the NUMA fallback definitions from the kmalloc_track fallback
   definitions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Christoph Lameter
2e892f43cc [PATCH] Cleanup slab headers / API to allow easy addition of new slab allocators
This is a response to an earlier discussion on linux-mm about splitting
slab.h components per allocator.  Patch is against 2.6.19-git11.  See
http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2

This patch cleans up the slab header definitions.  We define the common
functions of slob and slab in slab.h and put the extra definitions needed
for slab's kmalloc implementations in <linux/slab_def.h>.  In order to get
a greater set of common functions we add several empty functions to slob.c
and also rename slob's kmalloc to __kmalloc.

Slob does not need any special definitions since we introduce a fallback
case.  If there is no need for a slab implementation to provide its own
kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
That is sufficient for SLOB.

Sort the function in slab.h according to their functionality.  First the
functions operating on struct kmem_cache * then the kmalloc related
functions followed by special debug and fallback definitions.

Also redo a lot of comments.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Eric Dumazet
6a8ba9d121 [PATCH] reorder struct pipe_buf_operations
The fields of struct pipe_buf_operations do not have a precise layout (i.e.
not optimized to fit cache lines nor to reduce cache line ping-pong).

The bufs[] array is *large* and is placed near the beginning of the
structure, so all following fields have a large offset.  This is
unfortunate because many archs have smaller instructions when using small
offsets relative to a base register.  On x86 for example, 7-bit offsets
have smaller instruction lengths.

Moving bufs[] to the end of pipe_buf_operations permits all fields to have
small offsets, and reduces text size and icache pressure.

# size vmlinux.pre vmlinux
    text    data     bss     dec     hex filename
3268989  664356  492196 4425541  438745 vmlinux.pre
3268765  664356  492196 4425317  438665 vmlinux

So this patch reduces text size by 224 bytes on my x86_64 machine. Similar
results on ia32.
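
The effect of the reordering can be checked with offsetof().  The two
structures below are hypothetical miniatures, not the kernel's definitions;
they only show that a large array placed first pushes every later field
beyond the 0..127 range that fits a short displacement, while placing it
last keeps the scalar fields at small offsets.

```c
#include <assert.h>
#include <stddef.h>

#define NBUFS 16	/* hypothetical array length for this sketch */

/* Hypothetical element type; only its size matters here. */
struct buf {
	void *page;
	unsigned int offset, len;
};

/* Large array first: every following field gets a large offset. */
struct array_first {
	struct buf bufs[NBUFS];
	unsigned int nrbufs;
	unsigned int curbuf;
};

/* Large array last: the scalar fields keep small offsets. */
struct array_last {
	unsigned int nrbufs;
	unsigned int curbuf;
	struct buf bufs[NBUFS];
};

static_assert(offsetof(struct array_first, nrbufs) > 127,
	      "field behind the array needs a long displacement");
static_assert(offsetof(struct array_last, curbuf) <= 127,
	      "field ahead of the array fits a short displacement");
```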

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:48 -08:00
Eric W. Biederman
5f8442edfb [PATCH] Revert "[PATCH] identifier to nsproxy"
This reverts commit 373beb35cd.

No one is using this identifier yet.  The purpose of this identifier is to
export nsproxy to user space which is wrong.  nsproxy is an internal
implementation optimization, which should keep our fork times from getting
slower as we increase the number of global namespaces you don't have to
share.

Adding a global identifier like this is inappropriate because it makes
namespaces inherently non-recursive, greatly limiting what we can do with
them in the future.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:47 -08:00
Eric Dumazet
d4c3cca941 [PATCH] constify pipe_buf_operations
- pipe/splice should use const pipe_buf_operations and file_operations

- struct pipe_inode_info has an unused field "start" : get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:47 -08:00
Linus Torvalds
775ba7ad49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Fix inotify maintainers entry
  Fix typo in new debug options.
  Jon needs a new shift key.
  fs: Convert kmalloc() + memset() to kzalloc() in fs/.
  configfs.h: Remove dead macro definitions.
  kconfig: Standardize "depends" -> "depends on" in Kconfig files
  e100: replace kmalloc with kcalloc
  um: replace kmalloc+memset with kzalloc
  fix typo in net/ipv4/ip_fragment.c
  include/linux/compiler.h: reject gcc 3 < gcc 3.2
  Kconfig: fix spelling error in config KALLSYMS help text
  Remove duplicate "have to" in comment
  Fix small typo in drivers/serial/icom.c
  Use consistent casing in help message
  EXT{2,3,4}_FS: remove outdated part of the help text
2006-12-12 18:51:51 -08:00
Dave Jones
c4366889dd Merge ../linus
Conflicts:

	drivers/cpufreq/cpufreq.c
2006-12-12 17:41:41 -05:00
Robert P. J. Day
df4365ce88 configfs.h: Remove dead macro definitions.
Delete the __ATTR-related macro definitions since these are now
defined in include/linux/sysfs.h.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12 20:05:50 +01:00
Alistair John Strachan
53569ab785 include/linux/compiler.h: reject gcc 3 < gcc 3.2
The kernel doesn't compile with GCC <3.2, do not allow it to succeed if GCC
3.0.x or 3.1.x are used.
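
A sketch of the kind of check this describes, using GCC's predefined
__GNUC__ and __GNUC_MINOR__ macros.  The helper gcc3_too_old() and the
error text are made up for illustration; the actual wording in
include/linux/compiler.h may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* True for the GCC 3.0.x and 3.1.x series, which are to be rejected. */
static bool gcc3_too_old(int major, int minor)
{
	assert(major >= 0 && minor >= 0);
	return major == 3 && minor < 2;
}

/* The same test as a compile-time guard on the compiler actually in use. */
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
# if __GNUC__ == 3 && __GNUC_MINOR__ < 2
#  error "GCC 3.0.x and 3.1.x are rejected; use GCC 3.2 or later"
# endif
#endif
```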

Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12 19:28:50 +01:00
Rolf Eike Beer
93aec20400 Remove duplicate "have to" in comment
Introduced in commit 7cc13edc13.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12 19:23:02 +01:00
Linus Torvalds
659dba3480 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Fix OMAP clock prescaler to match the comment
  i2c: Refactor a kfree in i2c-dev
  i2c: Fix return value check in i2c-dev
  i2c: Enable PEC on more i2c-i801 devices
  i2c: Discard the i2c algo del_bus wrappers
  i2c: New ARM Versatile/Realview bus driver
  i2c: fix broken ds1337 initialization
  i2c: i2c-i801 documentation update
  i2c: Use the __ATTR macro where possible
  i2c: Whitespace cleanups
  i2c: Use put_user instead of copy_to_user where possible
  i2c: New Atmel AT91 bus driver
  i2c: Add support for nested i2c bus locking
  i2c: Cleanups to the i2c-nforce2 bus driver
  i2c: Add request/release_mem_region to i2c-ibm_iic bus driver
  i2c: New Philips PNX bus driver
  i2c: Delete the broken i2c-ite bus driver
  i2c: Update the list of driver IDs
  i2c: Fix documentation typos
2006-12-12 09:57:55 -08:00
Jean Delvare
8e9afcbbde hwmon/it87: Remove the SMBus interface support
This interface was useless as the LPC ISA-like interface is always
available, is faster, and is more reliable. This cuts the driver
size by some 20%.

This change is also required to later convert the it87 driver to a
platform driver, so that we can get rid of i2c-isa in a near future.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-12 18:18:28 +01:00
Ingo Molnar
99a3eb3845 [PATCH] lockdep: fix seqlock_init()
seqlock_init() needs to use spin_lock_init() for dynamic locks, so that
lockdep is notified about the presence of a new lock.

(this is a fallout of the recent networking merge, which started using
the so-far unused seqlock_init() API.)

This fix solves the following lockdep-internal warning on current -git:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
     __lock_acquire+0x10c/0x9f9
     lock_acquire+0x56/0x72
     _spin_lock+0x35/0x42
     neigh_destroy+0x9d/0x12e
     neigh_periodic_timer+0x10a/0x15c
     run_timer_softirq+0x126/0x18e
     __do_softirq+0x6b/0xe6
     do_softirq+0x64/0xd2
     ksoftirqd+0x82/0x138

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-12 08:10:44 -08:00
Boaz Harrosh
2b02a17920 [PATCH] remove blk_queue_activity_fn
While working on bidi support at struct request level
I have found that blk_queue_activity_fn is actually never used.
The only user is in ide-probe.c with this code:

	/* enable led activity for disk drives only */
	if (drive->media == ide_disk && hwif->led_act)
		blk_queue_activity_fn(q, hwif->led_act, drive);

And led_act is never initialized anywhere.
(Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18)
Unless it is all for future use of course.
(this patch is against linux-2.6-block.git as of 2006/12/4)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-12 10:22:23 +01:00
Linus Torvalds
4259cb25d4 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  [NETPOLL]: Fix local_bh_enable() warning.
  [IPVS]: Make ip_vs_sync.c <= 80col wide.
  [IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep()
  [HAMRADIO]: Fix baycom_epp.c compile failure.
  [DCCP]: Whitespace cleanups
  [DCCP] ccid3: Fixup some type conversions related to rtts
  [DCCP] ccid3: BUG-FIX - conversion errors
  [DCCP] ccid3: Reorder packet history source file
  [DCCP] ccid3: Reorder packet history header file
  [DCCP] ccid3: Make debug output consistent
  [DCCP] ccid3: Perform history operations only after packet has been sent
  [DCCP] ccid3: TX history - remove unused field
  [DCCP] ccid3: Shift window counter computation
  [DCCP] ccid3: Sanity-check RTT samples
  [DCCP] ccid3: Initialise RTT values
  [DCCP] ccid: Deprecate ccid_hc_tx_insert_options
  [DCCP]: Warn when discarding packet due to internal errors
  [DCCP]: Only deliver to the CCID rx side in charge
  [DCCP]: Simplify TFRC calculation
  [DCCP]: Debug timeval operations
  ...
2006-12-11 18:35:17 -08:00
Linus Torvalds
13d7d84e07 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits)
  [POWERPC] Generic BUG for powerpc
  [PPC] Fix compile failure do to introduction of PHY_POLL
  [POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
  [POWERPC] Remove old dcr.S
  [POWERPC] Fix SPU coredump code for max_fdset removal
  [POWERPC] Fix irq routing on some 32-bit PowerMacs
  [POWERPC] ps3: Add vuart support
  [POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
  [POWERPC] dont allow pSeries_probe to succeed without initialising MMU
  [POWERPC] micro optimise pSeries_probe
  [POWERPC] Add SPURR SPR to sysfs
  [POWERPC] Add DSCR SPR to sysfs
  [POWERPC] Fix 440SPe CPU table entry
  [POWERPC] Add support for FP emulation for the e300c2 core
  [POWERPC] of_device_register: propagate device_create_file return code
  [POWERPC] Fix mmap of PCI resource with hack for X
  [POWERPC] iSeries: head_64.o needs to depend on lparmap.s
  [POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group
  [POWERPC] Remove QE header files from lite5200.c
  [POWERPC] of_platform_make_bus_id(): make `magic' int
  ...
2006-12-11 18:24:58 -08:00
Arnaldo Carvalho de Melo
8109b02b53 [DCCP]: Whitespace cleanups
That accumulated over the last months' hackathon, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:35:00 -08:00
Gerrit Renker
1a21e49a8d [DCCP] ccid3: Finer-grained resolution of sending rates
This patch
 * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
 * simplifies the present overflow problems in calculations

Current sending rate X and the cached value X_recv of the receiver-estimated
sending rate are both scaled by 64 (2^6) in order to
 * cope with low sending rates (minimally 1 byte/second)
 * allow upgrading to use a packets-per-second implementation of CCID 3
 * avoid calculation errors due to integer arithmetic cut-off

The patch implements a revised strategy from
http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html

The only difference with regard to that strategy is that t_ipi is already
used in the calculation of the nofeedback timeout, which saves one division.
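
Why the scaling by 64 helps can be seen in a generic fixed-point sketch.
The function names and the bytes-per-interval model are assumptions made
for illustration and are not taken from the CCID 3 code.

```c
#include <assert.h>
#include <stdint.h>

#define RATE_SHIFT 6	/* scale factor 2^6 = 64, as in the text above */

/* Plain integer rate in bytes/second: truncates to 0 below 1 byte/second. */
static uint64_t unscaled_rate(uint32_t bytes, uint32_t seconds)
{
	assert(seconds != 0);
	return bytes / seconds;
}

/* Rate in units of 1/64 byte/second: small rates stay representable. */
static uint64_t scaled_rate(uint32_t bytes, uint32_t seconds)
{
	assert(seconds != 0);
	return ((uint64_t)bytes << RATE_SHIFT) / seconds;
}
```

With 16 bytes sent over 64 seconds, unscaled_rate() yields 0 while
scaled_rate() yields 16, that is 16/64 = 0.25 bytes/second.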

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:42 -08:00
Linus Torvalds
8d610dd52d Make sure we populate the initroot filesystem late enough
We should not initialize rootfs before all the core initializers have
run.  So do it as a separate stage just before starting the regular
driver initializers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 12:12:04 -08:00
Linus Torvalds
8993780a6e Make SLES9 "get_kernel_version" work on the kernel binary again
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary.  It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)

That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba:
"[PATCH] Fix linux banner utsname information").

This just restores "linux_banner" as a static string, which should fix
the version finding.  And /proc/version simply uses a different string.

To avoid wasting even that minuscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.

Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 11:34:11 -08:00
Kumar Gala
d10f73480b [PPC] Fix compile failure do to introduction of PHY_POLL
PHY_POLL is defined in <linux/phy.h> include it in <linux/fsl_devices.h> so
board code will have it defined.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2006-12-10 23:26:16 -06:00
Jean Delvare
3269711b76 i2c: Discard the i2c algo del_bus wrappers
They are all only calling i2c_del_adapter, so we may as well do
it directly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-10 21:21:33 +01:00
David Brownell
438d6c2c01 i2c: Whitespace cleanups
Remove extraneous whitespace from various i2c headers and core files,
like space-before-tab and whitespace at end of line.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-10 21:21:31 +01:00
Jiri Kosina
6ea23039cb i2c: Add support for nested i2c bus locking
This patch adds the 'level' field into the i2c_adapter structure, which is 
used to represent the 'logical' level of nesting for the purposes of 
lockdep. This field is then used in the i2c_transfer() function, to 
acquire the per-adapter bus_lock with correct nesting level.

Signed-off-by: Jiri Kosina <jikos@jikos.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-10 21:21:30 +01:00
Vitaly Wool
41561f28e7 i2c: New Philips PNX bus driver
New I2C bus driver for Philips ARM boards (Philips IP3204 I2C IP
block). This I2C controller can be found on (at least) PNX010x,
PNX52xx and PNX4008 Philips boards.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-10 21:21:29 +01:00
Jean Delvare
51fd554b65 i2c: Delete the broken i2c-ite bus driver
The rest of the ITE8172 support was already removed from MIPS tree.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-10 21:21:29 +01:00
Jean Delvare
36cfb5ccfa i2c: Update the list of driver IDs
* A chip driver ID was assigned to the Radeon, while it is an adapter
  so it needs an i2c adapter ID.
* The SAA7191 is a video decoder, not encoder.
* The icspll driver is dead, and will never be ported to Linux 2.6.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2006-12-10 21:21:28 +01:00
Linus Torvalds
bb7320d1d9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits)
  V4L/DVB 4949b: Fix container_of pointer retreival
  V4L/DVB (4949a): Fix INIT_WORK
  V4L/DVB (4949): Cxusb: codingstyle cleanups
  V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach
  V4L/DVB (4947): Cx88: trivial cleanups
  V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area
  V4L/DVB (4945): Cx88: consolidate cx22702_config structs
  V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach
  V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners
  V4L/DVB (4953): Usbvision minor fixes
  V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP
  V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations
  V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting
  V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach
  V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations
  V4L/DVB (4942): Whitespace cleanups
  V4L/DVB (4937): Usbvision cleanup and code reorganization
  V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal
  V4L/DVB (4935): Added the capability of selecting fm gain by tuner
  V4L/DVB (4934): Usbvision radio requires GainNormal at e register
  ...
2006-12-10 09:59:18 -08:00
Avi Kivity
6aa8b732ca [PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture.  The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace.  Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process.  kill(1), nice(1), top(1) work as expected.  In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode.  Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm).  Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests.  All combinations are
allowed except x86_64 guest on i386 host.  For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported.  At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch.  We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU.  Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization.  Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC.  We are working on a fix.  A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work.  That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Daniel Walker
f5f1a24a2c [PATCH] clocksource: small cleanup
Mostly changing alignment.  Just some general cleanup.

[akpm@osdl.org: build fix]
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Arjan van de Ven
4c36a5dec2 [PATCH] round_jiffies infrastructure
Introduce a round_jiffies() function as well as a round_jiffies_relative()
function.  These functions round a jiffies value to the next whole second.
The primary purpose of this rounding is to cause all "we don't care exactly
when" timers to happen at the same jiffy.

This avoids multiple timers firing within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep sleep CPU
sleep states and thus waste power.

The exact wakeup moment is skewed by the cpu number, to keep all cpus from
waking up at the exact same time (and hitting the same lock/cachelines
there)
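
A simplified sketch of rounding with a per-cpu skew.  The HZ value, the
skew of 3 jiffies per cpu and the function name are assumptions for
illustration; the real round_jiffies() may choose the rounding direction
differently, and counter wraparound is ignored here.

```c
#include <assert.h>

#define HZ       1000UL	/* assumed tick rate for this sketch */
#define CPU_SKEW 3UL	/* assumed per-cpu skew in jiffies */

/* Round j up to a whole second, shifted by a small per-cpu amount. */
static unsigned long round_to_next_second(unsigned long j, unsigned int cpu)
{
	unsigned long skew = (unsigned long)cpu * CPU_SKEW;
	unsigned long rem;

	assert(skew < HZ);
	j += skew;		/* move into the skewed time base */
	rem = j % HZ;
	if (rem != 0)
		j += HZ - rem;	/* round up to the next multiple of HZ */
	return j - skew;	/* move back; the result is never before the input */
}
```

Timers on cpu 0 asking for jiffy 1234 land on 2000, while the same request
on cpu 1 lands on 1997, so the two cpus do not wake on the same tick.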

[akpm@osdl.org: fix variable type]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
5466b456ed [PATCH] fdtable: Implement new pagesize-based fdtable allocator
This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries.  The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

    pagesize / 4
    pagesize / 2
    pagesize      <- memory allocator switch point
    pagesize * 2
    pagesize * 4
    ...etc...

Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.
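
The size series above can be computed without a loop, for example with a
bit-smearing round-up to the next power of two.  This is a sketch with
hypothetical helper names, not the code from the patch, and it treats the
request as a plain count of fds.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two without looping. */
static uint64_t roundup_pow2(uint64_t v)
{
	assert(v != 0 && v <= (UINT64_C(1) << 63));
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/* Smallest size from pagesize/4, pagesize/2, pagesize, ... that fits. */
static uint64_t fdarray_bytes(uint64_t nr_fds, uint64_t ptr_size,
			      uint64_t page_size)
{
	uint64_t need = nr_fds * ptr_size;
	uint64_t smallest = page_size / 4;

	/* page_size must be a power of two for the series to line up */
	assert(page_size >= 4 && (page_size & (page_size - 1)) == 0);
	if (need < smallest)
		need = smallest;
	return roundup_pow2(need);
}
```

With 4096-byte pages and 8-byte pointers, 128 fds fit in pagesize / 4
(1024 bytes) and 129 fds move up to pagesize / 2 (2048 bytes).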

Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop.  Even together, they will still be, by far, smaller
than the fdarray.  The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
4fd45812cb [PATCH] fdtable: Remove the free_files field
An fdtable can either be embedded inside a files_struct or standalone (after
being expanded).  When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded.  We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
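
A userspace sketch of that idea.  The struct names, the EMBEDDED_FDS
threshold and the helpers are hypothetical; only the container_of()
technique and the use of max_fds as the discriminator come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EMBEDDED_FDS 32	/* assumed capacity of the embedded table */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fdtable_sketch {
	unsigned int max_fds;
};

struct files_sketch {
	int count;
	struct fdtable_sketch fdtab;	/* the embedded table */
};

/* An expanded (standalone) table always holds more than the embedded one. */
static bool is_embedded(const struct fdtable_sketch *fdt)
{
	return fdt->max_fds <= EMBEDDED_FDS;
}

/* Recover the enclosing structure from a pointer to its embedded table. */
static struct files_sketch *owner_of(struct fdtable_sketch *fdt)
{
	assert(is_embedded(fdt));
	return container_of(fdt, struct files_sketch, fdtab);
}
```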

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
bbea9f6966 [PATCH] fdtable: Make fdarray and fdsets equal in size
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Raz Ben-Jehuda(caro)
46031f9a38 [PATCH] md: allow reads that have bypassed the cache to be retried on failure
If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery procedures.

update 1:

From: NeilBrown <neilb@suse.de>

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown <neilb@suse.de>

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bioes-that-still-need-to-be-retried.
   When we find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:20 -08:00
Alan Cox
ee2f344b33 [PATCH] ide-cd: Handle strange interrupt on the Intel ESB2
The ESB2 appears to emit spurious DMA interrupts when configured for native
mode and handling ATAPI devices.  Stratus were able to pin this bug down and
produce a patch.  This is a rework which applies the fixup only to the ESB2
(for now).  We can apply it to other chips later if the same problem is found.

This code has been tested and confirmed to fix the problem on the tested
systems.

Signed-off-by: Alan Cox <alan@redhat.com>
(Most of the hard work done by Stratus however)
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:20 -08:00
Chen, Kenneth W
06066714f6 [PATCH] sched: remove lb_stopbalance counter
Remove scheduler stats lb_stopbalance counter.  This counter can be
calculated by: lb_balanced - lb_nobusyg - lb_nobusyq.  There is no need to
create a gazillion counters when we can derive the value.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Siddha, Suresh B
783609c6cb [PATCH] sched: decrease number of load balances
Currently at a particular domain, each cpu in the sched group will do a
load balance at the frequency of balance_interval.  The more cores and
threads there are, the more cpus will be in each sched group at the SMP and
NUMA domains, and we end up spending quite a bit of time doing load
balancing in those
domains.

Fix this by making only one cpu (the first idle cpu, or the first cpu in the
group if all the cpus are busy) in the sched group do the load balance at
that particular sched domain; this load will slowly percolate down to the
other cpus within that group (when they do load balancing at lower
domains).
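
The selection rule can be sketched as follows.  The function names and the
array-of-flags representation of a sched group are hypothetical and stand
in for the scheduler's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the cpu in the group that is responsible for balancing:
 * the first idle one, or the first cpu if every cpu is busy. */
static size_t pick_balance_cpu(const bool *idle, size_t ncpus)
{
	assert(idle != NULL && ncpus > 0);
	for (size_t i = 0; i < ncpus; i++)
		if (idle[i])
			return i;
	return 0;
}

/* Every other cpu in the group skips balancing at this domain. */
static bool should_balance(const bool *idle, size_t ncpus, size_t this_cpu)
{
	return pick_balance_cpu(idle, ncpus) == this_cpu;
}
```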

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Christoph Lameter
08c183f31b [PATCH] sched: add option to serialize load balancing
Large sched domains can be very expensive to scan.  Add an option SD_SERIALIZE
to the sched domain flags.  If that flag is set then we make sure that no
other such domain is being balanced.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Christoph Lameter
c9819f4593 [PATCH] sched: use softirq for load balancing
Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.

We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.
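
As a rough sketch of the "earliest of those" computation (not the actual
patch): struct domain_sketch, before() and earliest_next_balance() are
hypothetical names, and times are plain tick counts compared in a wrap-safe
way.

```c
#include <stddef.h>

/* Hypothetical stand-in for one layer of sched domains. */
struct domain_sketch {
    unsigned long last_balance; /* tick at which it was last balanced */
    unsigned long interval;     /* rescan interval in ticks */
};

/* Wrap-safe "a is before b" for tick counters. */
static int before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Return the earliest rescan time over all layers.  'fallback' is the value
 * used when no layer is due earlier (or when there are no layers at all).
 */
unsigned long earliest_next_balance(const struct domain_sketch *sd, size_t n,
                                    unsigned long fallback)
{
    unsigned long next = fallback;

    for (size_t i = 0; i < n; i++) {
        unsigned long due = sd[i].last_balance + sd[i].interval;

        if (before(due, next))
            next = due;
    }
    return next;
}
```

The result is what would be stored in a per-runqueue "next_balance" field;
the softirq only needs to be raised once the current tick reaches it.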

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:42 -08:00
Siddha, Suresh B
ac7d550499 [PATCH] sched domain: increase the SMT busy rebalance interval
With SMT, if the logical processor is busy, load balance happens every
8msec(min)-16msec(max).  There is no need to do this so often, as this is just
for fairness (to maintain uniform runqueue lengths) and the default time slice
is 100msec anyhow.

The appended patch increases this interval to 64msec(min)-128msec(max) when the
logical processor is busy.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:42 -08:00
Andrew Morton
4a7864ca63 [PATCH] io-accounting: via taskstats
Deliver IO accounting via taskstats.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andrew Morton
f2f1f8a3b8 [PATCH] cleanup taskstats.h
Fix weird whitespace mangling in taskstats.h

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andrew Morton
7c3ab7381e [PATCH] io-accounting: core statistics
The present per-task IO accounting isn't very useful.  It simply counts the
number of bytes passed into read() and write().  So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.

(David Wright had some comments on the applicability of the present logical IO accounting:

  For billing purposes it is useless but for workload analysis it is very
  useful

  read_bytes/read_calls  average read request size
  write_bytes/write_calls average write request size

  read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
  write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                be missed

  I often look for logical larger than physical to see filesystem cache
  problems.  And the bytes/cpusec can help find applications that are
  dominating the cache and causing slow interactive response from page cache
  contention.

  I want to find the IO intensive applications and make sure they are doing
  efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
  IO commands).

This patchset adds new accounting which tries to be more accurate.  We account
for three things:

reads:

  attempt to count the number of bytes which this process really did cause
  to be fetched from the storage layer.  Done at the submit_bio() level, so it
  is accurate for block-backed filesystems.  I also attempt to wire up NFS and
  CIFS.

writes:

  attempt to count the number of bytes which this process caused to be sent
  to the storage layer.  This is done at page-dirtying time.

  The big inaccuracy here is truncate.  If a process writes 1MB to a file
  and then deletes the file, it will in fact perform no writeout.  But it will
  have been accounted as having caused 1MB of write.

  So...

cancelled_writes:

  account the number of bytes which this process caused to not happen, by
  truncating pagecache.

  We _could_ just subtract this from the process's `write' accounting.  But
  that means that some processes would be reported to have done negative
  amounts of write IO, which is silly.

  So we just report the raw number and punt this decision up to userspace.
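
  As an illustration of the decision being punted to userspace, a consumer
  of the raw counters might compute a net write figure that is clamped at
  zero.  This is a sketch only: the struct and function names are made up
  for the example, and the three fields simply mirror the three quantities
  described above.

```c
#include <stdint.h>

/* Hypothetical userspace view of the three raw per-task counters. */
struct io_counters_sketch {
    uint64_t read_bytes;            /* bytes fetched from the storage layer */
    uint64_t write_bytes;           /* bytes dirtied, i.e. caused to be written */
    uint64_t cancelled_write_bytes; /* dirtied bytes later truncated away */
};

/*
 * Net write traffic attributed to the task.  The kernel reports the raw
 * numbers; clamping at zero is a policy choice made here in userspace so
 * that no task is shown with a negative amount of write IO.
 */
uint64_t net_write_bytes(const struct io_counters_sketch *io)
{
    if (io->cancelled_write_bytes >= io->write_bytes)
        return 0;
    return io->write_bytes - io->cancelled_write_bytes;
}
```

  A task that writes 1MB and then deletes the file would report 1MB written
  and 1MB cancelled, for a net figure of zero.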

Now, we _could_ account for writes at the physical I/O level.  But

- This would require that we track memory-dirtying tasks at the per-page
  level (would require a new pointer in struct page).

- It would mean that IO statistics for a process are usually only available
  long after that process has exited.  Which means that we probably cannot
  communicate this info via taskstats.

This patch:

Wire up the kernel-private data structures and the accessor functions to
manipulate them.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
David Woodhouse
58f64d83c3 [PATCH] Fix noise in futex.h
There are some kernel-only bits in the middle of <linux/futex.h> which
should be removed in what we export to userspace.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Alexey Dobriyan
1f29bcd739 [PATCH] sysctl: remove unused "context" param
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Scott Wood
884b4aaaa2 [PATCH] rtc: Add rtc_merge_alarm()
Add rtc_merge_alarm(), which can be used by rtc drivers to turn a partially
specified alarm expiry (i.e.  most significant fields set to -1, as with the
RTC_ALM_SET ioctl()) into a fully specified expiry.

If the most significant specified field is earlier than the current time, the
least significant unspecified field is incremented.
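
A simplified sketch of that rule, for the common case where only the time of
day is specified and the day is the least significant unspecified field.
This is not the rtc_merge_alarm() implementation; struct tod_sketch and the
helper names are hypothetical, and month/year rollover is deliberately left
out.

```c
/* Hypothetical stand-in for the hour/min/sec part of an RTC time. */
struct tod_sketch {
    int hour, min, sec;
};

static long tod_to_seconds(struct tod_sketch t)
{
    return t.hour * 3600L + t.min * 60L + t.sec;
}

/*
 * Number of days to add to today's date to get the alarm's date: 1 if the
 * specified time of day is earlier than the current time, otherwise 0.
 */
int alarm_day_offset(struct tod_sketch now, struct tod_sketch alarm)
{
    return tod_to_seconds(alarm) < tod_to_seconds(now) ? 1 : 0;
}

/* Seconds from now until the fully specified expiry. */
long seconds_until_alarm(struct tod_sketch now, struct tod_sketch alarm)
{
    return tod_to_seconds(alarm) - tod_to_seconds(now)
           + 86400L * alarm_day_offset(now, alarm);
}
```

For example, an alarm set for 06:30:00 while the clock reads 23:00:00 is
earlier than the current time, so the day is incremented and the alarm
expires 7.5 hours later.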

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:40 -08:00
Randy Dunlap
5c543eff6c [PATCH] freezer.h uses task_struct fields
freezer.h uses task_struct fields so it should include sched.h.

  CC [M]  fs/jfs/jfs_txnmgr.o
In file included from fs/jfs/jfs_txnmgr.c:49:
include/linux/freezer.h: In function 'frozen':
include/linux/freezer.h:9: error: dereferencing pointer to incomplete type
include/linux/freezer.h:9: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:9: error: (Each undeclared identifier is reported only once
include/linux/freezer.h:9: error: for each function it appears in.)
include/linux/freezer.h: In function 'freezing':
include/linux/freezer.h:17: error: dereferencing pointer to incomplete type
include/linux/freezer.h:17: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'freeze':
include/linux/freezer.h:26: error: dereferencing pointer to incomplete type
include/linux/freezer.h:26: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'do_not_freeze':
include/linux/freezer.h:34: error: dereferencing pointer to incomplete type
include/linux/freezer.h:34: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'thaw_process':
include/linux/freezer.h:43: error: dereferencing pointer to incomplete type
include/linux/freezer.h:43: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:44: warning: implicit declaration of function 'wake_up_process'
include/linux/freezer.h: In function 'frozen_process':
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h:55: error: 'PF_FROZEN' undeclared (first use in this function)
fs/jfs/jfs_txnmgr.c: In function 'freezing':
include/linux/freezer.h:18: warning: control reaches end of non-void function
make[2]: *** [fs/jfs/jfs_txnmgr.o] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:40 -08:00
Jonathan Corbet
111f33564e V4L/DVB (4798): OmniVision OV7670 driver
This patch adds a V4L2 driver for the OmniVision OV7670 camera.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-10 08:51:14 -02:00
Jonathan Corbet
d905b382d7 V4L/DVB (4797): Marvell 88ALP01 "cafe" driver
A driver for the Marvell M88ALP01 "CAFE" CMOS integrated camera
controller.  This driver has been renamed "cafe_ccic" since my previous
patch set.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-10 08:51:13 -02:00
Jonathan Corbet
9c4dfadbde V4L/DVB (4796): A couple of V4L2 defines needed by Cafe Camera driver
Two defines for V4L2, needed by the Cafe camera driver:
1) Add the RGB444 image format
2) Add the "init" internal command which is separate from "reset".

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-10 08:51:12 -02:00
David S. Miller
d3dcc077bf [NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.
GLIBC uses them etc.

They are guarded by ifndef __KERNEL__ so nobody will start
accidentally using them in the kernel again, it's just for
userspace.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:33 -08:00
J Hadi Salim
93366c537b [XFRM]: Fix XFRMGRP_REPORT to use correct multicast group.
XFRMGRP_REPORT uses 0x10 which is a group that belongs
to events. The correct value is 0x20.
We should really be using xfrm_nlgroups going forward; it was tempting
to delete the definition of XFRMGRP_REPORT but it would break at
least iproute2.
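
For reference, the relation between a numbered netlink group and the legacy
bitmask value can be sketched as below.  The enum is an illustrative copy
whose ordering is assumed (events as group 5, report as group 6), and
group_to_mask() is a hypothetical helper, not a kernel function.

```c
/* Illustrative group numbering; the ordering is an assumption. */
enum nlgroup_sketch {
    GRP_NONE,
    GRP_ACQUIRE,
    GRP_EXPIRE,
    GRP_SA,
    GRP_POLICY,
    GRP_AEVENTS,
    GRP_REPORT,
};

/*
 * Legacy bitmask for a numbered group: group n maps to bit (n - 1).
 * Group 0 means "no group"; numbers above 32 do not fit in the mask.
 */
unsigned int group_to_mask(unsigned int group)
{
    if (group == 0 || group > 32)
        return 0;
    return 1u << (group - 1);
}
```

Under that numbering the events group yields 0x10 and the report group
yields 0x20, which is the collision and the correction described above.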

Signed-off-by: J Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:30 -08:00
Eric Dumazet
f0490980a1 [NET]: Force a cache line split in hh_cache in SMP.
hh_lock was converted from rwlock to seqlock by Stephen.

To get the full benefit of this change, I suggest placing the read-mostly
fields of hh_cache in a separate cache line, because hh_refcnt may be changed
quite frequently on some busy machines.
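
The idea can be shown with a small userspace struct.  This is a sketch, not
the hh_cache definition: the field names, the 64-byte line size and the use
of C11 alignas are all assumptions made for the example.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64 /* assumed cache line size in bytes */

/* Hypothetical layout: hot counter first, read-mostly fields on a new line. */
struct hh_sketch {
    int refcnt; /* written frequently */

    /* Everything from here on starts on its own cache line. */
    alignas(CACHE_LINE_SKETCH) unsigned short type; /* read mostly */
    int len;                                        /* read mostly */
    unsigned long data[4];                          /* read mostly */
};
```

Writers bumping refcnt then no longer invalidate the line that readers of
type, len and data are using, at the cost of a larger structure.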

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:29 -08:00
Thomas Graf
e07bca84cd [NETLINK]: Restore API compatibility of address and neighbour bits
Restore API compatibility due to bits moved from rtnetlink.h to
separate headers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:27 -08:00
Stephen Hemminger
3644f0cee7 [NET]: Convert hh_lock to seqlock.
The hard header cache is in the main output path, so using
seqlock instead of reader/writer lock should reduce overhead.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:20 -08:00
Jiri Kosina
4c2ae844b5 [PATCH] Generic HID layer - pb_fnmode
pb_fnmode parameter has to be passed to usbhid, both for compatibility reasons
and also because it logically belongs there.

Also removes empty hid-input.c file in drivers/usb/input.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:19 -08:00
Jiri Kosina
aa8de2f038 [PATCH] Generic HID layer - input and event reporting
hid_input_report() was needlessly USB-specific in USB HID. This patch
makes the function independent of HID implementation and fixes all
the current users. Bluetooth patches comply with this prototype.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:17 -08:00
Jiri Kosina
aa938f7974 [PATCH] Generic HID layer - hiddev
- hiddev is USB-only (agreed with Marcel Holtmann that Bluetooth currently
  doesn't need it, and future planned interface (rawhid) will be more flexible
  and usable)
- both HID and USB-hid can be now compiled as modules (wasn't possible before
  hiddev was fully separated from generic HID layer)

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:15 -08:00
Jiri Kosina
4916b3a57f [PATCH] Generic HID layer - USB API
- 'dev' in struct hid_device changed from struct usb_device to
  struct device and fixed all the users
- renamed functions which are part of USB HID API from 'hid_*' to
  'usbhid_*'
- force feedback initialization moved from common part into USB-specific
  driver
- added usbhid.h header for USB HID API users
- removed USB-specific fields from struct hid_device and moved them
  to new usbhid_device, which is pointed to by hid_device->driver_data
- fixed all USB users to use this new structure

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:14 -08:00
Jiri Kosina
229695e51e [PATCH] Generic HID layer - API
- fixed generic API (added necessary EXPORT_SYMBOL, fixed hid.h to provide correct
  prototypes)
- extended hid_device with open/close/event function pointers to driver-specific
  functions
- added driver specific driver_data to hid_device

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:12 -08:00
Jiri Kosina
dde5845a52 [PATCH] Generic HID layer - code split
The "big main" split of USB HID code into generic HID code and
USB-transport specific HID handling.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08 10:43:01 -08:00
Kiyoshi Ueda
2e93ccc193 [PATCH] dm: suspend: add noflush pushback
In device-mapper I/O is sometimes queued within targets for later processing.
For example the multipath target can be configured to store I/O when no paths
are available instead of returning it with -EIO.

This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core.  This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it.  Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application.  In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.

    DMF_NOFLUSH_SUSPENDING
    ----------------------

If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend().  It
is always cleared before dm_suspend() returns.

The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.

Target drivers can check this flag by calling dm_noflush_suspending().

    DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
    -----------------------------------

A target's map() function can now return DM_MAPIO_REQUEUE to request the
device mapper core queue the bio.

Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same.  This has been labelled 'pushback'.

The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.

    dec_pending
    -----------

dec_pending() saves the pushback request in struct dm_io->error.  Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list.  Note that this supersedes any I/O errors.

It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt).  dec_pending() checks for this and
returns -EIO if it happened.
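
The decision described in this section can be summarised with a small
sketch.  It models only the outcome for one original bio once all its clones
have ended; the types and the function name are hypothetical and none of the
locking discussed below is shown.

```c
#include <errno.h>
#include <stdbool.h>

enum io_fate_sketch { IO_ENDED, IO_PUSHED_BACK };

struct io_result_sketch {
    enum io_fate_sketch fate;
    int error; /* meaningful only when fate == IO_ENDED */
};

/*
 * requeue_requested:  some clone asked for pushback.
 * noflush_suspending: the noflush suspend is still in progress.
 * io_error:           error (0 or negative) collected from the clones.
 */
struct io_result_sketch finish_io_sketch(bool requeue_requested,
                                         bool noflush_suspending,
                                         int io_error)
{
    struct io_result_sketch r = { IO_ENDED, io_error };

    if (requeue_requested) {
        if (noflush_suspending) {
            /* Pushback supersedes any I/O error. */
            r.fate = IO_PUSHED_BACK;
            r.error = 0;
        } else {
            /* The suspend was aborted: end the I/O with -EIO. */
            r.error = -EIO;
        }
    }
    return r;
}
```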

    pushback list and pushback_lock
    -------------------------------

The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.

md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.

Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag.  So md->pushback_lock is
held when checking the flag.  Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.

Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.

The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above.  So even if the flag is cleared after the lockless
checks, the bio isn't left in md->pushback but returned to applications
with -EIO.

    Other notes on the current patch
    --------------------------------

- md->pushback is added to the struct mapped_device instead of using
  md->deferred directly because md->io_lock which protects md->deferred is
  rw_semaphore and can't be used in interrupt context like dec_pending(),
  and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.

- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
  ioctl option is specified, because I/Os generated by lock_fs() would be
  pushed back and never return if there were no valid devices.

- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
  flag is set, md->pushback must be flushed because I/Os may be queued to
  the list already.  (flush_and_out label in dm_suspend())

    Test results
    ------------

I have tested using multipath target with the next patch.

The following tests are for regression/compatibility:
  - I/Os succeed when valid paths exist;
  - I/Os fail when there are no valid paths and queue_if_no_path is not
    set;
  - I/Os are queued in the multipath target when there are no valid paths and
    queue_if_no_path is set;
  - The queued I/Os above fail when suspend is issued without the
    DM_NOFLUSH_FLAG ioctl option.  I/Os spanning 2 multipath targets also
    fail.

The following tests are for the normal code path of new pushback feature:
  - Queued I/Os in the multipath target are flushed from the target
    but don't return when suspend is issued with the DM_NOFLUSH_FLAG
    ioctl option;
  - The I/Os above are queued in the multipath target again when
    resume is issued without path recovery;
  - The I/Os above succeed when resume is issued after path recovery
    or table load;
  - Queued I/Os in the multipath target succeed when resume is issued
    with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
    spanning 2 multipath targets also succeed.

The following tests are for the error paths of the new pushback feature:
  - When the bdget_disk() fails in dm_suspend(), the
    DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
    pushback list are flushed properly.
  - When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
      o I/Os which had already been queued to the pushback list
        at the time don't return, and are re-issued at resume time;
      o I/Os which hadn't been returned at the time return with EIO.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:09 -08:00
Kiyoshi Ueda
81fdb096db [PATCH] dm: ioctl: add noflush suspend
Provide a dm ioctl option to request noflush suspending.  (See next patch for
what this is for.) As the interface is extended, the version number is
incremented.

Other than accepting the new option through the interface, there is no change
to existing behaviour.

Test results:
Confirmed the option is given from user-space correctly.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:09 -08:00
Kiyoshi Ueda
45cbcd7983 [PATCH] dm: map and endio return code clarification
Tighten the use of return values from the target map and end_io functions.
Values of 2 and above are now explicitly reserved for future use.  There are no
existing targets using such values.

The patch has no effect on existing behaviour.

o Reserve return values of 2 and above from target map functions.
  Any positive value currently indicates "mapping complete", but all
  existing drivers use the value 1.  We now make that a requirement
  so we can assign new meaning to higher values in future.

  The new definition of return values from target map functions is:
      < 0 : error
      = 0 : The target will handle the io (DM_MAPIO_SUBMITTED).
      = 1 : Mapping completed (DM_MAPIO_REMAPPED).
      > 1 : Reserved (undefined).  Previously this was the same as '= 1'.

o Reserve return values of 2 and above from target end_io functions
  for similar reasons.
  DM_ENDIO_INCOMPLETE is introduced for a return value of 1.
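
  The new definition for map functions can be read as the following
  classification.  This is a sketch for illustration: the enum and function
  names are hypothetical, and only the numeric ranges come from the table
  above.

```c
/* Hypothetical labels for what the core does with a map() return value. */
enum map_action_sketch {
    ACT_ERROR,          /* < 0: fail the io with this error */
    ACT_TARGET_OWNS_IO, /* = 0: the target will handle the io */
    ACT_DISPATCH,       /* = 1: mapping completed, send the io on */
    ACT_RESERVED,       /* > 1: reserved, must not be returned */
};

enum map_action_sketch classify_map_result(int r)
{
    if (r < 0)
        return ACT_ERROR;
    if (r == 0)
        return ACT_TARGET_OWNS_IO;
    if (r == 1)
        return ACT_DISPATCH;
    return ACT_RESERVED; /* previously treated the same as 1 */
}
```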

Test results:

  I have tested by using the multipath target.

  I/Os succeed when valid paths exist.

  I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set.

  I/Os fail when there are no valid paths and queue_if_no_path is not set.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:09 -08:00
Kiyoshi Ueda
a3d77d35be [PATCH] dm: suspend: parameter change
Change the interface of dm_suspend() so that we can pass several options
without increasing the number of parameters.  The existing 'do_lockfs' integer
parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG.

There is no functional change to the code.
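
The shape of the change, as a hedged sketch: one flags word carries what
used to be a separate integer parameter, so further options need no new
parameters.  The names below are hypothetical stand-ins, and the second
flag is only an example of a possible later addition.

```c
#include <stdbool.h>

#define SUSPEND_LOCKFS_FLAG_SKETCH  (1u << 0) /* replaces 'int do_lockfs' */
#define SUSPEND_NOFLUSH_FLAG_SKETCH (1u << 1) /* example of a later option */

struct suspend_opts_sketch {
    bool do_lockfs;
    bool noflush;
};

/* Decode the single flags argument into the individual options. */
struct suspend_opts_sketch decode_suspend_flags(unsigned int flags)
{
    struct suspend_opts_sketch o;

    o.do_lockfs = (flags & SUSPEND_LOCKFS_FLAG_SKETCH) != 0;
    o.noflush = (flags & SUSPEND_NOFLUSH_FLAG_SKETCH) != 0;
    return o;
}
```

A caller that used to pass do_lockfs = 1 would pass the lockfs flag instead,
and a caller that passed 0 would pass no flags.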

Test results:
I have tested 'dmsetup suspend' command with/without the '--nolockfs'
option and confirmed the do_lockfs value is correctly set.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:09 -08:00
Helge Deller
adf6b20654 [PATCH] fbcmap.c: mark structs const or __read_mostly
- Mark the default colormaps read-only, as nobody should be allowed to
  modify them

- Additionally mark color values as __read_mostly since they will only be
  modified (very seldom) by fb_invert_cmaps()

- Add named C99-initializers in fb_cmap structs and use the ARRAY_SIZE()
  macro

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:05 -08:00
Don Mullis
6b1b60f41e [PATCH] fault-injection: defaults likely to please a new user
Assign defaults most likely to please a new user:
 1) generate some logging output
    (verbose=2)
 2) avoid injecting failures likely to lock up UI
    (ignore_gfp_wait=1, ignore_gfp_highmem=1)

Signed-off-by: Don Mullis <dwm@meer.net>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:03 -08:00
Don Mullis
08b3df2d16 [PATCH] fault-injection: Use bool-true-false throughout
Use bool-true-false throughout.

Signed-off-by: Don Mullis <dwm@meer.net>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:03 -08:00
Akinobu Mita
329409aeda [PATCH] fault injection: stacktrace filtering
This patch provides stacktrace filtering feature.

The stacktrace filter allows failing only for the caller you are
interested in.

For example someone may want to inject kmalloc() failures into
only the e100 module.  They want to inject failures not only into direct
kmalloc() calls, but also into indirect allocations.

- e100_poll --> netif_receive_skb --> packet_rcv_spkt --> skb_clone
  --> kmem_cache_alloc

This patch makes it possible to detect function calls like this from the
stacktrace and inject failures.  The script
Documentation/fault-injection/failmodule.sh helps with this.

The range of text section of loaded e100 is expected to be
[/sys/module/e100/sections/.text, /sys/module/e100/sections/.exit.text)

So failmodule.sh stores these values into /debug/failslab/address-start
and /debug/failslab/address-end. The maximum stacktrace depth is specified
by /debug/failslab/stacktrace-depth.
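
The check this implies can be sketched as follows.  It is an illustration
only: trace_hits_range() is a hypothetical name, and the stack trace is
modelled as a plain array of return addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if any of the first max_depth callers in the trace lies in
 * the half-open address range [start, end).
 */
bool trace_hits_range(const uintptr_t *trace, size_t n, size_t max_depth,
                      uintptr_t start, uintptr_t end)
{
    size_t depth = n < max_depth ? n : max_depth;

    for (size_t i = 0; i < depth; i++)
        if (trace[i] >= start && trace[i] < end)
            return true;
    return false;
}
```

A failure would then be injected only when this returns true for the range
stored in address-start and address-end, which is how an indirect
allocation several calls below e100_poll can still be matched.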

Please see the example that demonstrates how to inject slab allocation
failures only for a specific module
in Documentation/fault-injection/fault-injection.txt

[dwm@meer.net: reject failure if any caller lies within specified range]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Don Mullis <dwm@meer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:03 -08:00
Akinobu Mita
f4f154fd92 [PATCH] fault injection: process filtering for fault-injection capabilities
This patch provides process filtering feature.
The process filter allows failing only permitted processes
by /proc/<pid>/make-it-fail

Please see the example that demonstrates how to inject slab allocation
failures into module init/cleanup code
in Documentation/fault-injection/fault-injection.txt

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:02 -08:00
Akinobu Mita
c17bb49517 [PATCH] fault-injection capability for disk IO
This patch provides fault-injection capability for disk IO.

Boot option:

fail_make_request=<interval>,<probability>,<space>,<times>

	<interval> -- specifies the interval of failures.

	<probability> -- specifies how often it should fail in percent.

	<space> -- specifies the size of free space where disk IO can be issued
		   safely in bytes.

	<times> -- specifies how many times failures may happen at most.

Debugfs:

/debug/fail_make_request/interval
/debug/fail_make_request/probability
/debug/fail_make_request/space
/debug/fail_make_request/times

Example:

	fail_make_request=10,100,0,-1
	echo 1 > /sys/block/hda/hda1/make-it-fail

generic_make_request() on /dev/hda1 fails once per 10 times.
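
How the four knobs might combine is sketched below.  This is one reading of
the descriptions above, not the patch itself: struct fault_attr_sketch and
should_fail_sketch() are hypothetical, rand() stands in for the kernel's
random source, and the example is read as interval 10 with probability 100.

```c
#include <stdbool.h>
#include <stdlib.h>

struct fault_attr_sketch {
    unsigned long interval;    /* consider failing only every Nth call */
    unsigned long probability; /* chance of failure, in percent */
    long space;                /* bytes that may still be issued safely */
    int times;                 /* failures left; -1 means no limit */
    unsigned long count;       /* calls seen so far */
};

bool should_fail_sketch(struct fault_attr_sketch *a, long size)
{
    if (a->times == 0)
        return false; /* failure budget used up */

    if (a->space > size) {
        a->space -= size; /* still inside the safe space */
        return false;
    }

    if (a->interval > 1) {
        a->count++;
        if (a->count % a->interval)
            return false; /* not the Nth call yet */
    }

    if (a->probability <= (unsigned long)(rand() % 100))
        return false; /* lost the dice roll */

    if (a->times > 0)
        a->times--;
    return true;
}
```

With interval 10, probability 100, space 0 and times -1, every tenth call
fails, which matches the "once per 10 times" behaviour in the example.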

Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:02 -08:00
Akinobu Mita
6ff1cb355e [PATCH] fault-injection capabilities infrastructure
This patch provides the base functions to implement fault-injection
capabilities.

- The function should_fail() is taken from failmalloc-1.0
  (http://www.nongnu.org/failmalloc/)

[akpm@osdl.org: cleanups, comments, add __init]
Cc: <okuji@enbug.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Don Mullis <dwm@meer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:02 -08:00
Jiri Slaby
1328d737f5 [PATCH] Char: istallion, variables cleanup
- wipe gcc -W warnings by int -> uint conversion
- move 2 global variables into their local place

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:00 -08:00
Jiri Slaby
1f8ec435e3 [PATCH] Char: istallion, eliminate typedefs
Use only struct <name> instead of defining a new type <name_t>.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:00 -08:00
Jiri Slaby
6b2c9457bb [PATCH] Char: stallion, variables cleanup
- fix `gcc -W' un/signed warnings by converting some ints -> uints.
- move 3 global variables into the functions where they are used.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:59 -08:00
Alan Cox
606d099cdd [PATCH] tty: switch to ktermios
This is the grungy swap all the occurrences in the right places patch that
goes with the updates.  At this point we have the same functionality as
before (except that sgttyb() returns speeds not zero) and are ready to
begin turning new stuff on providing nobody reports lots of bugs.

If you are a tty driver author converting an out of tree driver the only
impact should be termios->ktermios name changes for the speed/property
setting functions from your upper layers.

If you are implementing your own TCGETS function before then your driver
was broken already and it's about to get a whole lot more painful for you so
please fix it 8)

Also fill in c_ispeed/ospeed on init for most devices, although the current
code will do this for you anyway but I'd like eventually to lose that extra
paranoia

[akpm@osdl.org: bluetooth fix]
[mp3@de.ibm.com: sclp fix]
[mp3@de.ibm.com: warning fix for tty3270]
[hugh@veritas.com: fix tty_ioctl powerpc build]
[jdike@addtoit.com: uml: fix ->set_termios declaration]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:57 -08:00
Alan Cox
edc6afc549 [PATCH] tty: switch to ktermios and new framework
This is the core of the switch to the new framework.  I've split it from the
driver patches which are mostly search/replace and would encourage people to
give this one a good hard stare.

The references to BOTHER and ISHIFT are the termios values that must be
defined by a platform once it wants to turn on "new style" ioctl support.  The
code patches here ensure that providing

1. The termios overlays the ktermios in memory
2. The only new kernel only fields are c_ispeed/c_ospeed (or none)

the existing behaviour is retained.  This is true for the patches at this
point in time.
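
The two conditions above can be illustrated with a small userspace sketch.
Everything here is an assumption made for illustration: the struct names, the
field set, and NCCS_SKETCH are hypothetical and are not the kernel's real
definitions.  The sketch only shows what "overlays in memory" means: the old
layout is a prefix of the new one, so copying the old structure leaves the new
speed fields untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCCS_SKETCH 19	/* hypothetical number of control characters */

/* Old-style layout (hypothetical field set). */
struct termios_sketch {
	unsigned int  c_iflag;
	unsigned int  c_oflag;
	unsigned int  c_cflag;
	unsigned int  c_lflag;
	unsigned char c_line;
	unsigned char c_cc[NCCS_SKETCH];
};

/* Kernel-side layout: same leading fields, plus the two speed fields. */
struct ktermios_sketch {
	unsigned int  c_iflag;
	unsigned int  c_oflag;
	unsigned int  c_cflag;
	unsigned int  c_lflag;
	unsigned char c_line;
	unsigned char c_cc[NCCS_SKETCH];
	unsigned int  c_ispeed;
	unsigned int  c_ospeed;
};

/* Returns 1 when the old layout is a prefix of the new one. */
static int termios_overlays_ktermios_sketch(void)
{
	return offsetof(struct termios_sketch, c_iflag) ==
	       offsetof(struct ktermios_sketch, c_iflag) &&
	       offsetof(struct termios_sketch, c_cc) ==
	       offsetof(struct ktermios_sketch, c_cc) &&
	       sizeof(struct termios_sketch) <=
	       offsetof(struct ktermios_sketch, c_ispeed);
}

/* Copy the old layout over the start of the new one; speeds are kept. */
static void termios_to_ktermios_sketch(struct ktermios_sketch *k,
				       const struct termios_sketch *u)
{
	assert(termios_overlays_ktermios_sketch());
	memcpy(k, u, sizeof(*u));
}
```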

Future patches will define BOTHER, ISHIFT and enable newer termios structures
for each architecture, and once they are all done some of the ifdefs also
vanish.

[akpm@osdl.org: warning fix]
[akpm@osdl.org: IRDA fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:56 -08:00
Jiri Slaby
ca7ed0f22f [PATCH] Char: stallion, kill typedefs
Typedefs are considered ugly in the kernel. Eliminate them.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:55 -08:00
Jiri Slaby
3306ce3d05 [PATCH] Char: mxser_new, upgrade to 1.9.1
Change cloned experimental driver according to original 1.9.1 moxa driver.
Some int->ulong conversions, outb ~UART_IER_THRI constant.  Remove commented
stuff.

I also added a printk line with contact info: anybody who wants to test the
driver may contact me, either so that I can help debug it or just to confirm
that it works properly.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:53 -08:00
Sukadev Bhattiprolu
84d737866e [PATCH] add child reaper to pid_namespace
Add a per pid_namespace child-reaper.  This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space.  It's
also needed so containers preserve the existing semantic that pid == 1 would
reap orphaned children.

This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:52 -08:00
Cedric Le Goater
9a575a92db [PATCH] to nsproxy
Add the pid namespace framework to the nsproxy object.  The copy of the pid
namespace only increases the refcount on the global pid namespace,
init_pid_ns, and unshare is not implemented.

There is no configuration option to activate or deactivate this feature
because this is not relevant for the moment.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:52 -08:00
Sukadev Bhattiprolu
61a58c6c23 [PATCH] rename struct pspace to struct pid_namespace
Rename struct pspace to struct pid_namespace for consistency with other
namespaces (uts_namespace and ipc_namespace).  Also rename
include/linux/pspace.h to include/linux/pid_namespace.h and variables from
pspace to pid_ns.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:52 -08:00
Cedric Le Goater
373beb35cd [PATCH] identifier to nsproxy
Add an identifier to nsproxy.  The default init_ns_proxy has identifier 0 and
allocated nsproxies are given -1.

This identifier will be used by a new syscall sys_bind_ns.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:52 -08:00
Kirill Korotaev
6b3286ed11 [PATCH] rename struct namespace to struct mnt_namespace
Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developed for the containers: pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'.

Signed-off-by: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Cedric Le Goater
1ec320afdc [PATCH] add process_session() helper routine: deprecate old field
Add an anonymous union and ((deprecated)) to catch direct usage of the
session field.
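
A minimal sketch of the idea, assuming a GCC-compatible compiler.  The struct
and function names are hypothetical stand-ins, not the kernel's definitions.
Both union members share storage, but only the old name carries the
deprecated attribute, so any remaining direct user of it gets a compile-time
warning while the accessors stay quiet.

```c
#include <assert.h>
#include <stddef.h>

struct signal_sketch {
	/* Same storage, two names: the old one warns when it is used. */
	union {
		int session __attribute__((deprecated));
		int __session;
	};
};

/* Accessor that converted code is expected to call. */
static inline int process_session_sketch(const struct signal_sketch *sig)
{
	assert(sig != NULL);
	return sig->__session;
}

static inline void set_session_sketch(struct signal_sketch *sig, int session)
{
	assert(sig != NULL);
	sig->__session = session;
}
```

Writing sig->session anywhere would still compile, but with a deprecation
warning pointing at the line that needs converting.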

[akpm@osdl.org: fix various missed conversions]
[jdike@addtoit.com: fix UML bug]
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Cedric Le Goater
937949d9ed [PATCH] add process_session() helper routine
Replace occurrences of task->signal->session by a new process_session() helper
routine.

It will be useful for pid namespaces to abstract the session pid number.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
David Howells
312a0c1709 [PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant
Alter roundup_pow_of_two() so that it can make use of ilog2() on a constant to
produce a constant value, retaining the ability for an arch to override it in
the non-const case.

This permits the function to be used to initialise variables.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
David Howells
f0d1b0b30d [PATCH] LOG2: Implement a general integer log2 facility in the kernel
This facility provides three entry points:

	ilog2()		Log base 2 of unsigned long
	ilog2_u32()	Log base 2 of u32
	ilog2_u64()	Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

	int do_something(long q)
	{
		...;
		y = ilog2(x);
		...;
	}

Or can be used to statically initialise global variables with constant values:

	unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant.  They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
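
The split between the constant and non-constant cases might look roughly like
the userspace sketch below.  It assumes a GCC-compatible compiler for
__builtin_constant_p().  All names are hypothetical, the constant branch is
cut down to 8 bits for brevity, and the runtime branch uses a portable loop
where the real code would use fls().

```c
#include <assert.h>

/* Constant branch: a pure integer constant expression, 8 bits only. */
#define CONST_ILOG2_SKETCH(n) (		\
	((n) & 0x80) ? 7 :		\
	((n) & 0x40) ? 6 :		\
	((n) & 0x20) ? 5 :		\
	((n) & 0x10) ? 4 :		\
	((n) & 0x08) ? 3 :		\
	((n) & 0x04) ? 2 :		\
	((n) & 0x02) ? 1 : 0)

/* 1-based index of the highest set bit; 0 when x == 0. */
static int fls_sketch(unsigned long x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Runtime branch: log2 of zero is undefined, so reject it. */
static int runtime_ilog2_sketch(unsigned long x)
{
	assert(x != 0);
	return fls_sketch(x) - 1;
}

/* Small constants take the constant branch, everything else the runtime one. */
#define ilog2_sketch(n)						\
	((__builtin_constant_p(n) && ((n) >> 8) == 0) ?		\
		CONST_ILOG2_SKETCH(n) : runtime_ilog2_sketch(n))

/* The constant branch is usable where a constant expression is required. */
enum { LOG2_OF_27_SKETCH = CONST_ILOG2_SKETCH(27) };
```

Unlike the facility described above, this shortened constant branch does not
reject zero at compile time; it simply yields 0.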

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Josef Sipek
225a719f79 [PATCH] struct path: convert lockd
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:47 -08:00
Josef "Jeff" Sipek
0f7fc9e4d0 [PATCH] VFS: change struct file to use struct path
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.
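
As a rough illustration of the conversion, the sketch below embeds a
dentry/vfsmount pair in a file structure and keeps the old member names alive
through two #define's.  The *_sketch types are hypothetical stand-ins with
made-up fields; only the f_path, f_dentry and f_vfsmnt names come from the
description above.

```c
#include <assert.h>
#include <stddef.h>

struct dentry_sketch   { const char *name; };
struct vfsmount_sketch { const char *devname; };

/* A dentry/vfsmount pair kept together. */
struct path_sketch {
	struct vfsmount_sketch *mnt;
	struct dentry_sketch   *dentry;
};

struct file_sketch {
	struct path_sketch f_path;
/* Transitional aliases so that unconverted users still compile. */
#define f_dentry	f_path.dentry
#define f_vfsmnt	f_path.mnt
	unsigned int f_flags;
};

/* An unconverted user: still spells the member the old way. */
static const char *file_name_sketch(const struct file_sketch *file)
{
	assert(file->f_path.dentry != NULL);
	return file->f_dentry->name;
}
```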

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:41 -08:00
Josef "Jeff" Sipek
346f20ff60 [PATCH] struct path: move struct path from fs/namei.c into include/linux
Moved struct path from fs/namei.c to include/linux/namei.h.  This allows many
places in the VFS, as well as any stackable filesystem to easily keep track of
dentry-vfsmount pairs.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:40 -08:00
Josef "Jeff" Sipek
fec6d055da [PATCH] struct path: rename Reiserfs's struct path
Rename Reiserfs's struct path to struct treepath to prevent name collision
between it and struct path from fs/namei.c.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:40 -08:00
Josef "Jeff" Sipek
42cf11939b [PATCH] fsstack: Introduce fsstack_copy_{attr,inode}_*
Introduce several fsstack_copy_* functions which allow stackable filesystems
(such as eCryptfs and Unionfs) to easily copy over (currently only) inode
attributes.  This prevents code duplication and allows for code reuse.

[akpm@osdl.org: Remove unneeded wrapper]
[bunk@stusta.de: fs/stack.c should #include <linux/fs_stack.h>]
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:40 -08:00
Akinobu Mita
906d66df18 [PATCH] crc32: replace bitreverse by bitrev32
This patch replaces bitreverse() by bitrev32.  The only users of bitreverse()
are crc32 itself and via-velocity.

Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Akinobu Mita
a5cfc1ec58 [PATCH] bit reverse library
This patch provides two bit reverse functions and a bit reverse table.

- reverse the order of bits in a u8 value

	u8 bitrev8(u8 x);

- reverse the order of bits in a u32 value

	u32 bitrev32(u32 x);

- byte reverse table

	const u8 byte_rev_table[256];
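
For readers who want to see what the two functions compute, here is a
userspace sketch.  It is an assumption-laden stand-in: it reverses each byte
with mask-and-shift steps instead of the 256-entry table, and the *_sketch
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bits of one byte: swap nibbles, then pairs, then single bits. */
static uint8_t bitrev8_sketch(uint8_t x)
{
	x = (uint8_t)(((x & 0xF0) >> 4) | ((x & 0x0F) << 4));
	x = (uint8_t)(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
	x = (uint8_t)(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
	return x;
}

/* Reverse a 32-bit word: reverse each byte and swap the byte order. */
static uint32_t bitrev32_sketch(uint32_t x)
{
	return ((uint32_t)bitrev8_sketch((uint8_t)(x & 0xff)) << 24) |
	       ((uint32_t)bitrev8_sketch((uint8_t)((x >> 8) & 0xff)) << 16) |
	       ((uint32_t)bitrev8_sketch((uint8_t)((x >> 16) & 0xff)) << 8) |
	        (uint32_t)bitrev8_sketch((uint8_t)((x >> 24) & 0xff));
}

/* Small self-check that doubles as a usage example. */
static void bitrev_selfcheck_sketch(void)
{
	assert(bitrev8_sketch(0x01) == 0x80);
	assert(bitrev32_sketch(0x00000001u) == 0x80000000u);
}
```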

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Jeremy Fitzhardinge
7664c5a1da [PATCH] Generic BUG implementation
This patch adds common handling for kernel BUGs, for use by architectures as
they wish.  The code is derived from arch/powerpc.

The advantages of having common BUG handling are:
 - consistent BUG reporting across architectures
 - shared implementation of out-of-line file/line data
 - implement CONFIG_DEBUG_BUGVERBOSE consistently

This means that the inline impact of BUG is just the illegal instruction
itself, which is an improvement for i386 and x86-64.

A BUG is represented in the instruction stream as an illegal instruction,
which has file/line information associated with it.  This extra information is
stored in the __bug_table section in the ELF file.

When the kernel gets an illegal instruction, it first confirms it might
possibly be from a BUG (ie, in kernel mode, the right illegal instruction).
It then calls report_bug().  This searches __bug_table for a matching
instruction pointer, and if found, prints the corresponding file/line
information.  If report_bug() determines that it wasn't a BUG which caused the
trap, it returns BUG_TRAP_TYPE_NONE.

Some architectures (powerpc) implement WARN using the same mechanism; if the
illegal instruction was the result of a WARN, then report_bug() returns
BUG_TRAP_TYPE_WARN; otherwise it returns BUG_TRAP_TYPE_BUG.
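
The lookup described in the last two paragraphs can be sketched in userspace
as follows.  This is an illustration under stated assumptions, not the code in
lib/bug.c: the table contents, the entry layout, and every *_sketch name are
hypothetical, and a plain array stands in for the __bug_table ELF section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum bug_trap_type_sketch { TRAP_NONE_SKETCH, TRAP_WARN_SKETCH, TRAP_BUG_SKETCH };

/* One out-of-line record per BUG/WARN site (hypothetical layout). */
struct bug_entry_sketch {
	unsigned long bug_addr;	/* address of the trapping instruction */
	const char *file;
	unsigned short line;
	int is_warning;		/* nonzero for WARN-style entries */
};

/* Stand-in for the __bug_table section; addresses are made up. */
static const struct bug_entry_sketch bug_table_sketch[] = {
	{ 0x1000, "fs/example.c", 42, 0 },
	{ 0x2000, "mm/example.c",  7, 1 },
};

/* Linear search for an entry matching the faulting instruction pointer. */
static const struct bug_entry_sketch *find_bug_sketch(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(bug_table_sketch) / sizeof(bug_table_sketch[0]); i++)
		if (bug_table_sketch[i].bug_addr == addr)
			return &bug_table_sketch[i];
	return NULL;
}

/* Classify a trap: not ours, a warning, or a fatal BUG. */
static enum bug_trap_type_sketch report_bug_sketch(unsigned long addr)
{
	const struct bug_entry_sketch *bug = find_bug_sketch(addr);

	if (!bug)
		return TRAP_NONE_SKETCH;
	printf("%s at %s:%u\n", bug->is_warning ? "WARN" : "BUG",
	       bug->file, (unsigned int)bug->line);
	return bug->is_warning ? TRAP_WARN_SKETCH : TRAP_BUG_SKETCH;
}
```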

lib/bug.c keeps a list of loaded modules which can be searched for __bug_table
entries.  The architecture must call
module_bug_finalize()/module_bug_cleanup() from its corresponding
module_finalize/cleanup functions.

Unsetting CONFIG_DEBUG_BUGVERBOSE will reduce the kernel size by some amount.
At the very least, filename and line information will not be recorded for each
bug, but architectures may decide to store no extra information per BUG at
all.

Unfortunately, gcc doesn't have a general way to mark an asm() as noreturn, so
architectures will generally have to include an infinite loop (or similar) in
the BUG code, so that gcc knows execution won't continue beyond that point.
gcc does have a __builtin_trap() operator which may be useful to achieve the
same effect.  However, it cannot be used to actually implement the BUG itself,
because there's no way to get the instruction's address for use in generating
the __bug_table entry.

[randy.dunlap@oracle.com: Handle BUG=n, GENERIC_BUG=n to prevent build errors]
[bunk@stusta.de: include/linux/bug.h must always #include <linux/module.h>]
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
NeilBrown
d63a5a74de [PATCH] lockdep: avoid lockdep warning in md
md_open takes ->reconfig_mutex which causes lockdep to complain.  This
(normally) doesn't have deadlock potential as the possible conflict is with a
reconfig_mutex in a different device.

I say "normally" because if a loop were created in the array->member hierarchy
a deadlock could happen.  However that causes bigger problems than a deadlock
and should be fixed independently.

So we flag the lock in md_open as a nested lock.  This requires defining
mutex_lock_interruptible_nested.

Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Peter Zijlstra
2e7b651df1 [PATCH] remove the old bd_mutex lockdep annotation
Remove the old complex and crufty bd_mutex annotation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Thomas Maier
32694850a9 [PATCH] pktcdvd: add sysfs and debugfs interface
Add a sysfs and debugfs interface to the pktcdvd driver.

Look into the Documentation/ABI/testing/* files in the patch for more info.

Signed-off-by: Thomas Maier <balagi@justmail.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Thomas Maier
0a0fc9601d [PATCH] pktcdvd: bio write congestion using congestion_wait()
This adds a bio write queue congestion control to the pktcdvd driver with
fixed on/off marks.  It prevents the driver from consuming an unlimited
amount of write requests.
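
Fixed on/off marks amount to a simple hysteresis, which the sketch below tries
to illustrate.  It is not the driver's code: the struct, the function names,
and the threshold values used with it are hypothetical, and where the real
driver would sleep in congestion_wait() this version just reports that the
caller has to wait.

```c
#include <assert.h>
#include <stddef.h>

struct write_queue_sketch {
	int queued;		/* write requests currently held */
	int congestion_on;	/* mark at which new writes are refused */
	int congestion_off;	/* mark at which writes are accepted again */
	int congested;
};

/* Returns 1 if the request was queued, 0 if the caller must wait. */
static int queue_bio_sketch(struct write_queue_sketch *q)
{
	assert(q != NULL);
	if (q->congested)
		return 0;
	q->queued++;
	if (q->queued >= q->congestion_on)
		q->congested = 1;
	return 1;
}

/* Called when one queued request has been written out. */
static void complete_bio_sketch(struct write_queue_sketch *q)
{
	assert(q != NULL);
	if (q->queued > 0)
		q->queued--;
	if (q->congested && q->queued <= q->congestion_off)
		q->congested = 0;
}
```

Keeping the off mark below the on mark avoids flipping between the two states
on every single request.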

[akpm@osdl.org: sync with congestion_wait() renaming]
Signed-off-by: Thomas Maier <balagi@justmail.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Oleg Nesterov
ae424ae4b5 [PATCH] make set_special_pids() static
Make set_special_pids() static, the only caller is daemonize().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Peter Zijlstra
24ec839c43 [PATCH] tty: ->signal->tty locking
Fix the locking of signal->tty.

Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand.  And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.

(NOTE: sys_unshare() is broken wrt ->sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid.  Otherwise the lifetime of ttys
is governed by their open file handles.  This leaves some holes for tty
access from signal->tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
       be examined by someone familiar with the security framework, I think
       it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
       invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Dmitry Torokhov
bef986502f Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/usb/input/hid.h
2006-12-08 01:07:56 -05:00
Linus Torvalds
ea14fad0d4 Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (76 commits)
  [ARM] 4002/1: S3C24XX: leave parent IRQs unmasked
  [ARM] 4001/1: S3C24XX: shorten reboot time
  [ARM] 3983/2: remove unused argument to __bug()
  [ARM] 4000/1: Osiris: add third serial port in
  [ARM] 3999/1: RX3715: suspend to RAM support
  [ARM] 3998/1: VR1000: LED platform devices
  [ARM] 3995/1: iop13xx: add iop13xx support
  [ARM] 3968/1: iop13xx: add iop13xx_defconfig
  [ARM] Update mach-types
  [ARM] Allow gcc to optimise arm_add_memory a little more
  [ARM] 3991/1: i.MX/MX1 high resolution time source
  [ARM] 3990/1: i.MX/MX1 more precise PLL decode
  [ARM] 3986/1: H1940: suspend to RAM support
  [ARM] 3985/1: ixp4xx clocksource cleanup
  [ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2)
  [ARM] 3994/1: ixp23xx: fix handling of pci master aborts
  [ARM] 3981/1: sched_clock for PXA2xx
  [ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit
  [ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit period
  [ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter
  ...
2006-12-07 15:40:39 -08:00
Linus Torvalds
6ee7e78e7c Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] replace kmalloc+memset with kzalloc
  [IA64] resolve name clash by renaming is_available_memory()
  [IA64] Need export for csum_ipv6_magic
  [IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
  [PATCH] Add support for type argument in PAL_GET_PSTATE
  [IA64] tidy up return value of ip_fast_csum
  [IA64] implement csum_ipv6_magic for ia64.
  [IA64] More Itanium PAL spec updates
  [IA64] Update processor_info features
  [IA64] Add se bit to Processor State Parameter structure
  [IA64] Add dp bit to cache and bus check structs
  [IA64] SN: Correctly update smp_affinty mask
  [IA64] sparse cleanups
  [IA64] IA64 Kexec/kdump
2006-12-07 15:39:22 -08:00
Trond Myklebust
21b4e73692 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus
2006-12-07 16:35:17 -05:00
Trond Myklebust
34161db6b1 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus
Conflicts:

	include/linux/sunrpc/xprt.h
	net/sunrpc/xprtsock.c
Fix up conflicts with the workqueue changes.
2006-12-07 15:48:15 -05:00
Zou Nan hai
a79561134f [IA64] IA64 Kexec/kdump
Changes and updates.

1. Remove fake rendz path and related code according to discussion with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iosapic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-12-07 09:51:35 -08:00
Linus Torvalds
68380b5813 Add "run_scheduled_work()" workqueue function
This allows workqueue users to run just their own pending work, rather
than wait for the whole workqueue to finish running.  This solves the
deadlock with networking libphy that was due to other workqueue entries
possibly needing a lock that was held by the routine that wanted to
flush its own work.

It's not wonderful: if you absolutely need to synchronize with the work
function having been executed, any user strictly speaking should have
its own completion tracking logic, since when we run things explicitly
by hand, the generic workqueue layer can no longer help us synchronize.

Also, this is strictly only usable for work that has been scheduled
without any delayed timers.  You can not mix the new interface with
schedule_delayed_work().

But it's better than what we had currently.

Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 09:28:19 -08:00
Linus Torvalds
1c1afa3c05 Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
  [DLM] Clean up lowcomms
  [GFS2] Change gfs2_fsync() to use write_inode_now()
  [GFS2] Fix indent in recovery.c
  [GFS2] Don't flush everything on fdatasync
  [GFS2] Add a comment about reading the super block
  [GFS2] Mount problem with the GFS2 code
  [GFS2] Remove gfs2_check_acl()
  [DLM] fix format warnings in rcom.c and recoverd.c
  [GFS2] lock function parameter
  [DLM] don't accept replies to old recovery messages
  [DLM] fix size of STATUS_REPLY message
  [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
  [DLM] fix add_requestqueue checking nodes list
  [GFS2] Fix recursive locking in gfs2_getattr
  [GFS2] Fix recursive locking in gfs2_permission
  [GFS2] Reduce number of arguments to meta_io.c:getbuf()
  [GFS2] Move gfs2_meta_syncfs() into log.c
  [GFS2] Fix journal flush problem
  [GFS2] mark_inode_dirty after write to stuffed file
  [GFS2] Fix glock ordering on inode creation
  ...
2006-12-07 09:13:20 -08:00
Linus Torvalds
2685b267bc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
  [NETFILTER]: Fix non-ANSI func. decl.
  [TG3]: Identify Serdes devices more clearly.
  [TG3]: Use msleep.
  [TG3]: Use netif_msg_*.
  [TG3]: Allow partial speed advertisement.
  [TG3]: Add TG3_FLG2_IS_NIC flag.
  [TG3]: Add 5787F device ID.
  [TG3]: Fix Phy loopback.
  [WANROUTER]: Kill kmalloc debugging code.
  [TCP] inet_twdr_hangman: Delete unnecessary memory barrier().
  [NET]: Memory barrier cleanups
  [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.
  audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
  audit: Add auditing to ipsec
  [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n
  [IrDA]: Incorrect TTP header reservation
  [IrDA]: PXA FIR code device model conversion
  [GENETLINK]: Fix misplaced command flags.
  [NETLIK]: Add a pointer to the Generic Netlink wiki page.
  [IPV6] RAW: Don't release unlocked sock.
  ...
2006-12-07 09:05:15 -08:00
Linus Torvalds
4522d58275 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:59:11 -08:00
Andrew Morton
6cf24f031b [PATCH] elf.h: forward declare struct file
In file included from include/asm/patch.h:14,
		 from arch/ia64/kernel/patch.c:10:
  include/linux/elf.h:375: warning: "struct file" declared inside parameter list
  include/linux/elf.h:375: warning: its scope is only this definition or declaration, which is probably not what you want
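
The warnings above are what gcc prints when a prototype mentions a struct
type that has not been declared at file scope yet.  A minimal sketch of the
forward-declaration fix named in the subject line follows.  The function name
and the struct body are hypothetical, chosen only so the example is
self-contained.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declaration: makes "struct file" a file-scope type before any
 * prototype mentions it.  Without this line, the struct named in the
 * prototype below would be scoped to that parameter list only.
 */
struct file;

int dump_header_sketch(struct file *file);	/* hypothetical prototype */

/* The full definition can come later, or live in another header. */
struct file {
	int fd;
};

int dump_header_sketch(struct file *file)
{
	assert(file != NULL);
	return file->fd;
}
```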

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:48 -08:00
Corey Minyard
4d7cbac7c8 [PATCH] IPMI: Fix BT long busy
The IPMI BT subdriver has been patched to survive "long busy" timeouts seen
during firmware upgrades and resets.  The patch never returns the HOSED state,
synthesizes response messages with meaningful completion codes, and recovers
gracefully when the hardware finishes the long busy.  The subdriver now issues
a "Get BT Capabilities" command and properly uses those results.  More
informative completion codes are returned on error from transaction starts;
this logic was propagated to the KCS and SMIC subdrivers.  Finally, indent and
other style quirks were normalized.

Signed-off-by: Rocky Craig <rocky.craig@hp.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:47 -08:00
Corey Minyard
b9675136e2 [PATCH] IPMI: Add maintenance mode
Some commands and operations on a BMC can cause the BMC to "go away" for a
while.  This can cause the automatic flag processing and other things of that
nature to timeout and generate annoying logs, or possibly cause other bad
things to happen when in firmware update mode.

Add detection of those commands (cold reset, warm reset, and any firmware
command) and turn off automatic processing for 30 seconds.  Also add a
manual override either way.
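
The detection step could be sketched as below.  This is a hedged userspace
illustration, not the driver's implementation: all *_SKETCH constants and
function names are hypothetical, the numeric netfn/command values are
assumptions based on the usual IPMI command assignments, and the manual
override is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define NETFN_APP_REQUEST_SKETCH	0x06
#define NETFN_FIRMWARE_REQUEST_SKETCH	0x08
#define COLD_RESET_CMD_SKETCH		0x02
#define WARM_RESET_CMD_SKETCH		0x03
#define MAINTENANCE_TIMEOUT_MS_SKETCH	30000

struct intf_sketch {
	int maintenance_timeout_ms;	/* > 0 while automatic processing is off */
};

/* Does this request put the BMC out of reach for a while? */
static int is_maintenance_cmd_sketch(unsigned char netfn, unsigned char cmd)
{
	if (netfn == NETFN_FIRMWARE_REQUEST_SKETCH)
		return 1;
	return netfn == NETFN_APP_REQUEST_SKETCH &&
	       (cmd == COLD_RESET_CMD_SKETCH || cmd == WARM_RESET_CMD_SKETCH);
}

/* Inspect an outgoing request and (re)arm the 30 second window if needed. */
static void note_request_sketch(struct intf_sketch *intf,
				unsigned char netfn, unsigned char cmd)
{
	assert(intf != NULL);
	if (is_maintenance_cmd_sketch(netfn, cmd))
		intf->maintenance_timeout_ms = MAINTENANCE_TIMEOUT_MS_SKETCH;
}

/* Periodic timer: count the window down. */
static void tick_sketch(struct intf_sketch *intf, int elapsed_ms)
{
	assert(intf != NULL);
	if (intf->maintenance_timeout_ms > 0)
		intf->maintenance_timeout_ms -= elapsed_ms;
}

static int auto_processing_enabled_sketch(const struct intf_sketch *intf)
{
	return intf->maintenance_timeout_ms <= 0;
}
```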

Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:47 -08:00
Corey Minyard
759643b874 [PATCH] IPMI: pass sysfs name from lower level driver
Pass in the sysfs name from the lower-level IPMI driver, as the coming IPMI
serial driver will need that to link properly from the serial device sysfs
directory.

Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:47 -08:00
Paul Clements
6b39bb6548 [PATCH] nbd: show nbd client pid in sysfs
Allow nbd to expose the nbd-client daemon's PID in /sys/block/nbd<x>/pid.

This is helpful for tracking connection status of a device and for
determining which nbd devices are currently in use.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:47 -08:00
Benjamin LaHaise
97d2a80584 [PATCH] aio: remove ki_retried debugging member
Remove the ki_retried member from struct kiocb.  I think the idea was
bounced around a while back, but Arnaldo gave us another reason to dig it
up when he pointed out that the last cacheline of struct kiocb only
contains 4 bytes.  By removing the debugging member, we save more than
8 bytes on 64-bit machines.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Magnus Damm
85916f8166 [PATCH] Kexec / Kdump: Unify elf note code
The elf note saving code is currently duplicated over several
architectures.  This cleanup patch simply adds code to a common file and
then replaces the arch-specific code with calls to the newly added code.

The only drawback with this approach is that s390 doesn't fully support
kexec-on-panic, which for that arch leads to the introduction of unused code.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Helge Deller
15ad7cdcfd [PATCH] struct seq_operations and struct file_operations constification
- move some file_operations structs into the .rodata section

 - move static strings from policy_types[] array into the .rodata section

 - fix generic seq_operations usages, so that those structs may be defined
   as "const" as well

[akpm@osdl.org: couple of fixes]
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Robert P. J. Day
a0e7688df1 [PATCH] Kbuild: add 3 more header files to get properly "unifdef"ed
Add 3 more files to get "unifdef"ed when creating sanitized headers with
"make headers_install".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Mariusz Kozlowski
5296c7bec8 [PATCH] fs: reiserfs add missing brackets
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Jordan Crouse
65867beb0d [PATCH] Trivial cleanup in the PCI IDs for the CS5535
Rename a poorly worded PCI ID for the Geode GX and CS5535 companion chips.
The graphics processor and host bridge actually live in the northbridge on
the integrated processor, not in the companion chip.

Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Adrian Bunk
0da1480ec3 [PATCH] proper prototype for remove_inode_dquot_ref()
Add a proper prototype for remove_inode_dquot_ref() in
include/linux/quotaops.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Arnaldo Carvalho de Melo
12d40e43d2 [PATCH] Save some bytes in struct inode
[acme@newtoy net-2.6.20]$ pahole --cacheline 64 fs/inode.o inode
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/dcache.h:86 */
struct inode {
        struct hlist_node          i_hash;               /*     0     8 */
        struct list_head           i_list;               /*     8     8 */
        struct list_head           i_sb_list;            /*    16     8 */
        struct list_head           i_dentry;             /*    24     8 */
        long unsigned int          i_ino;                /*    32     4 */
        atomic_t                   i_count;              /*    36     4 */
        umode_t                    i_mode;               /*    40     2 */

        /* XXX 2 bytes hole, try to pack */

        unsigned int               i_nlink;              /*    44     4 */
        uid_t                      i_uid;                /*    48     4 */
        gid_t                      i_gid;                /*    52     4 */
        dev_t                      i_rdev;               /*    56     4 */
        loff_t                     i_size;               /*    60     8 */
        struct timespec            i_atime;              /*    68     8 */
        struct timespec            i_mtime;              /*    76     8 */
        struct timespec            i_ctime;              /*    84     8 */
        unsigned int               i_blkbits;            /*    92     4 */
        long unsigned int          i_version;            /*    96     4 */
        blkcnt_t                   i_blocks;             /*   100     4 */
        short unsigned int         i_bytes;              /*   104     2 */

        /* XXX 2 bytes hole, try to pack */

        spinlock_t                 i_lock;               /*   108    40 */
        struct mutex               i_mutex;              /*   148    76 */
        struct rw_semaphore        i_alloc_sem;          /*   224    64 */
        struct inode_operations *  i_op;                 /*   288     4 */
        const struct file_operations  * i_fop;           /*   292     4 */
        struct super_block *       i_sb;                 /*   296     4 */
        struct file_lock *         i_flock;              /*   300     4 */
        struct address_space *     i_mapping;            /*   304     4 */
        struct address_space       i_data;               /*   308   188 */
        struct list_head           i_devices;            /*   496     8 */
        union                      ;                     /*   504     4 */
        int                        i_cindex;             /*   508     4 */
        __u32                      i_generation;         /*   512     4 */
        /* ---------- cacheline 8 boundary ---------- */
        long unsigned int          i_dnotify_mask;       /*   516     4 */
        struct dnotify_struct *    i_dnotify;            /*   520     4 */
        struct list_head           inotify_watches;      /*   524     8 */
        struct mutex               inotify_mutex;        /*   532    76 */
        long unsigned int          i_state;              /*   608     4 */
        long unsigned int          dirtied_when;         /*   612     4 */
        unsigned int               i_flags;              /*   616     4 */
        atomic_t                   i_writecount;         /*   620     4 */
        void *                     i_security;           /*   624     4 */
        void *                     i_private;            /*   628     4 */
}; /* size: 632, sum members: 628, holes: 2, sum holes: 4 */

[acme@newtoy net-2.6.20]$

So just moving i_mode to after i_bytes we save 4 bytes by nuking both holes:

[acme@newtoy net-2.6.20]$ codiff -V /tmp/inode.o.before fs/inode.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/fs/inode.c:
  struct inode |   -4
    i_mode;
     from: umode_t               /*    40(0)     2(0) */
     to:   umode_t               /*   102(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

I've pruned all the other offset changes; only this one is of interest here.

So now we have:

[acme@newtoy net-2.6.20]$ pahole --cacheline 64 ../OUTPUT/qemu/net-2.6.20/fs/inode.o inode
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/dcache.h:86 */
struct inode {
        struct hlist_node          i_hash;               /*     0     8 */
        struct list_head           i_list;               /*     8     8 */
        struct list_head           i_sb_list;            /*    16     8 */
        struct list_head           i_dentry;             /*    24     8 */
        long unsigned int          i_ino;                /*    32     4 */
        atomic_t                   i_count;              /*    36     4 */
        unsigned int               i_nlink;              /*    40     4 */
        uid_t                      i_uid;                /*    44     4 */
        gid_t                      i_gid;                /*    48     4 */
        dev_t                      i_rdev;               /*    52     4 */
        loff_t                     i_size;               /*    56     8 */
        /* ---------- cacheline 1 boundary ---------- */
        struct timespec            i_atime;              /*    64     8 */
        struct timespec            i_mtime;              /*    72     8 */
        struct timespec            i_ctime;              /*    80     8 */
        unsigned int               i_blkbits;            /*    88     4 */
        long unsigned int          i_version;            /*    92     4 */
        blkcnt_t                   i_blocks;             /*    96     4 */
        short unsigned int         i_bytes;              /*   100     2 */
        umode_t                    i_mode;               /*   102     2 */
        spinlock_t                 i_lock;               /*   104    40 */
        struct mutex               i_mutex;              /*   144    76 */
        struct rw_semaphore        i_alloc_sem;          /*   220    64 */
        struct inode_operations *  i_op;                 /*   284     4 */
        const struct file_operations  * i_fop;           /*   288     4 */
        struct super_block *       i_sb;                 /*   292     4 */
        struct file_lock *         i_flock;              /*   296     4 */
        struct address_space *     i_mapping;            /*   300     4 */
        struct address_space       i_data;               /*   304   188 */
        struct list_head           i_devices;            /*   492     8 */
        union                      ;                     /*   500     4 */
        int                        i_cindex;             /*   504     4 */
        __u32                      i_generation;         /*   508     4 */
        /* ---------- cacheline 8 boundary ---------- */
        long unsigned int          i_dnotify_mask;       /*   512     4 */
        struct dnotify_struct *    i_dnotify;            /*   516     4 */
        struct list_head           inotify_watches;      /*   520     8 */
        struct mutex               inotify_mutex;        /*   528    76 */
        long unsigned int          i_state;              /*   604     4 */
        long unsigned int          dirtied_when;         /*   608     4 */
        unsigned int               i_flags;              /*   612     4 */
        atomic_t                   i_writecount;         /*   616     4 */
        void *                     i_security;           /*   620     4 */
        void *                     i_private;            /*   624     4 */
}; /* size: 628 */

[acme@newtoy net-2.6.20]$
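
The effect of the reordering can be reproduced on a much smaller structure.
The following is only a sketch: the struct and field names are hypothetical
stand-ins, not the real struct inode, and the offsets in the comments assume
a common ABI where uint32_t is 4-byte aligned and uint16_t is 2-byte aligned.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical "before" layout: each 16-bit member is followed by a
 * 32-bit member, so the compiler pads 2 bytes after each of them. */
struct sketch_before {
    uint32_t count;   /* offset  0, size 4 */
    uint16_t mode;    /* offset  4, size 2, then a 2-byte hole */
    uint32_t nlink;   /* offset  8, size 4 */
    uint32_t blocks;  /* offset 12, size 4 */
    uint16_t bytes;   /* offset 16, size 2, then a 2-byte hole */
    uint32_t lock;    /* offset 20, size 4 */
};                    /* size 24 */

/* Hypothetical "after" layout: the two 16-bit members sit side by side
 * and together fill one 4-byte slot, so both holes disappear. */
struct sketch_after {
    uint32_t count;   /* offset  0, size 4 */
    uint32_t nlink;   /* offset  4, size 4 */
    uint32_t blocks;  /* offset  8, size 4 */
    uint16_t bytes;   /* offset 12, size 2 */
    uint16_t mode;    /* offset 14, size 2 */
    uint32_t lock;    /* offset 16, size 4 */
};                    /* size 20 */

/* Under the ABI assumption above, the reordering saves 4 bytes. */
static_assert(sizeof(struct sketch_before) - sizeof(struct sketch_after) == 4,
              "moving mode next to bytes should remove both 2-byte holes");
```

On an ABI with different alignment rules the static_assert would fail at
compile time rather than silently report the wrong saving.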

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:43 -08:00
Ingo Molnar
2ee91f197c [PATCH] lockdep: show more details about self-test failures
Make the locking self-test failures (of 'FAILURE' type) easier to debug by
printing more information.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:43 -08:00
Stephane Eranian
0b71c8e76d [PATCH] remove useless carta_random32.h
Remove the carta_random32.h header file.  The carta_random32() function
was put in and then removed in favor of random32().  In the removal
process, the header file was forgotten.

Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:42 -08:00
Gautham R Shenoy
f7dff2b126 [PATCH] Handle per-subsystem mutexes for CONFIG_HOTPLUG_CPU not set
Provide a common interface for all the subsystems to lock and unlock their
per-subsystem hotcpu mutexes.

When CONFIG_HOTPLUG_CPU is not set, these operations would be no-ops.

[akpm@osdl.org: macros -> inlines]
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:41 -08:00
Eric Dumazet
83b7b44e1c [PATCH] fs: reorder some 'struct inode' fields to speedup i_size manipulations
On 32bits SMP platforms, 64bits i_size is protected by a seqcount
(i_size_seqcount).

When i_size is read or written, i_size_seqcount is read/written as well, so
it makes sense to group these two fields together in the same cache line.

This patch moves i_size_seqcount next to i_size, and also moves i_version
so that offsetof(struct inode, i_size) is 0x40 instead of 0x3c (for
32bits platforms).

For 64 bits platforms, i_size_seqcount doesn't exist, and the move of a
'long i_version' should not introduce a new hole because of padding.
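
To show why the two fields are always touched together, here is a minimal
userspace sketch of a sequence-counter protected 64-bit value built from two
32-bit halves.  It uses C11 atomics rather than the kernel's seqcount API,
assumes a single writer at a time, and all names (sketch_inode,
sketch_size_read, sketch_size_write) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object: the counter sits right next to the value it
 * guards, so one cache line serves both on every read and write. */
struct sketch_inode {
    atomic_uint      seq;      /* even: stable, odd: write in progress */
    _Atomic uint32_t size_lo;  /* low half of the 64-bit size */
    _Atomic uint32_t size_hi;  /* high half of the 64-bit size */
};

/* Writer: make the counter odd, update both halves, make it even again.
 * Assumes callers serialize writers among themselves. */
void sketch_size_write(struct sketch_inode *ino, uint64_t size)
{
    unsigned int s = atomic_load_explicit(&ino->seq, memory_order_relaxed);

    atomic_store_explicit(&ino->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ino->size_lo, (uint32_t)size, memory_order_relaxed);
    atomic_store_explicit(&ino->size_hi, (uint32_t)(size >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&ino->seq, s + 2, memory_order_release);
}

/* Reader: retry while a write is in progress or the counter changed
 * between the two samples, which would mean a torn read. */
uint64_t sketch_size_read(struct sketch_inode *ino)
{
    unsigned int s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&ino->seq, memory_order_acquire);
        lo = atomic_load_explicit(&ino->size_lo, memory_order_relaxed);
        hi = atomic_load_explicit(&ino->size_hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&ino->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}
```

Every read samples the counter twice and every write bumps it twice, so
placing it in a different cache line from the value would cost an extra
line per access.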

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:41 -08:00
Randy Dunlap
d9489fb606 [PATCH] kernel-doc: fix fusion and i2o docs
Correct lots of typos, kernel-doc warnings, & kernel-doc usage in fusion and
i2o drivers.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:40 -08:00
Adrian Bunk
c585646dd1 [PATCH] fs/lockd/host.c: make 2 functions static
Make the following needlessly global functions static:

 - nlm_lookup_host()
 - nsm_find()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:40 -08:00
Adrian Bunk
7ddae86095 [PATCH] make fs/jbd2/transaction.c:__kbd2_journal_temp_unlink_buffer() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:40 -08:00
Adrian Bunk
d394e122bc [PATCH] make fs/jbd/transaction.c:__journal_temp_unlink_buffer() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:40 -08:00
Adrian Bunk
d3228a887c [PATCH] make kernel/signal.c:kill_proc_info() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Adrian Bunk
ebe7e5fe4b [PATCH] remove kernel/lockdep.c:lockdep_internal
Remove the no longer used lockdep_internal().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Ingo Molnar
0231606785 [PATCH] hotplug CPU: clean up hotcpu_notifier() use
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

The compiler can skip truly unused functions just fine:

    text    data     bss     dec     hex filename
 1624412  728710 3674856 6027978  5bfaca vmlinux.before
 1624412  728710 3674856 6027978  5bfaca vmlinux.after
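
The idea can be sketched outside the kernel as follows.  This is an
illustration under assumptions, not the kernel's definition: the macro name
sketch_hotcpu_notifier, the SKETCH_HOTPLUG_CPU switch, and the
registered_callback variable are all hypothetical.

```c
#include <stddef.h>

typedef int (*cpu_callback_t)(unsigned int cpu);

/* Hypothetical stand-in for a notifier chain with a single slot. */
cpu_callback_t registered_callback = NULL;

#ifdef SKETCH_HOTPLUG_CPU
/* Hotplug enabled: actually remember the callback. */
#define sketch_hotcpu_notifier(fn, pri) \
    do { registered_callback = (fn); (void)(pri); } while (0)
#else
/* Hotplug disabled: evaluate 'fn' and discard the result.  This counts
 * as a use, so the compiler does not warn about an unused static
 * function, and it remains free to drop the function from the output. */
#define sketch_hotcpu_notifier(fn, pri) \
    do { (void)(fn); (void)(pri); } while (0)
#endif

/* A callback that would otherwise need an #ifdef around it. */
static int my_cpu_callback(unsigned int cpu)
{
    return (int)cpu;
}

void subsystem_init(void)
{
    sketch_hotcpu_notifier(my_cpu_callback, 0);
}
```

With SKETCH_HOTPLUG_CPU undefined, subsystem_init() registers nothing and
the caller needs no #ifdef of its own.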

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Randy Dunlap
83df8db9e6 [PATCH] declare smp_call_function_single in generic code
smp_call_function_single() needs to be visible in non-SMP builds, to fix:

arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Masami Hiramatsu
b4c6c34a53 [PATCH] kprobes: enable booster on the preemptible kernel
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer.  The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag.  If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.

However, the processing of this check routine takes a long time.  So, this
patch introduces the garbage collection mechanism of insn_slot.  It also
introduces the "dirty" flag to free_insn_slot because of efficiency.

The "clean" instruction slots (dirty flag is cleared) are released
immediately.  But the "dirty" slots which are used by boosted kprobes, are
marked as garbages.  collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.

Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Magnus Damm
386d9a7edd [PATCH] elf: Always define elf_addr_t in linux/elf.h
Define elf_addr_t in linux/elf.h.  The size of the type is determined using
ELF_CLASS.  This allows us to remove the defines that today are spread all
over .c and .h files.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
suzuki
651971cb72 [PATCH] Fix the size limit of compat space msgsize
Currently we allocate 64k of space on the user stack and use it as the
msgbuf for sys_{msgrcv,msgsnd} for compat, and the results are later copied
to the user buffer (by copy_in_user).  This patch introduces helper
routines for sys_{msgrcv,msgsnd} as below:

do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with
the msqid and msgflg.

do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to
the buffer.  The mtype has to be copied back to the user space msgbuf by
the caller.

These changes avoid the need to allocate the msgsize on the user stack
(thus removing the size limit) and the overhead of an extra copy_in_user().

Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Thomas Gleixner
cfd1893477 [PATCH] ktime: Fix signed / unsigned mismatch in ktime_to_ns
The 32 bit implementation of ktime_to_ns returns an unsigned value, while
the 64 bit version correctly returns a signed value.  There is no current
user affected by this, but it has to be fixed, as ktime values can be
negative.
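
A small sketch of what goes wrong when a negative time value passes through
an unsigned return type.  The helper names are hypothetical and the
conversion is a simplified stand-in for the real ktime code.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Signed conversion: negative times stay negative. */
int64_t sketch_to_ns_signed(int32_t sec, int32_t nsec)
{
    return (int64_t)sec * SKETCH_NSEC_PER_SEC + nsec;
}

/* Unsigned conversion: same bit pattern, but a negative time now reads
 * as a huge positive value, so ordering comparisons give wrong answers. */
uint64_t sketch_to_ns_unsigned(int32_t sec, int32_t nsec)
{
    return (uint64_t)sketch_to_ns_signed(sec, nsec);
}
```

With the unsigned variant, a time of -1 second compares greater than a time
of +1 second, which is the kind of surprise the signed return type avoids.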

Pointed-out-by: Helmut Duregger <Helmut.Duregger@student.uibk.ac.at>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:37 -08:00
Andrew Morton
0490366432 [PATCH] remove HASH_HIGHMEM
It has no users and it's doubtful that we'll need it again.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:37 -08:00
Peter Zijlstra
d5abe66917 [PATCH] debug: workqueue locking sanity
Workqueue functions should not leak locks; assert so, printing the
last function run.

Use macros in lockdep.h to avoid include dependency pains.

[akpm@osdl.org: build fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Ingo Molnar
ece8a684c7 [PATCH] sleep profiling
Implement prof=sleep profiling.  TASK_UNINTERRUPTIBLE sleeps will be taken
as a profile hit, and every millisecond spent sleeping causes a profile-hit
for the call site that initiated the sleep.

Sample readprofile output on i386:

   306 ps2_sendbyte                               1.3973
   432 call_usermodehelper_keys                   1.9548
   484 ps2_command                                0.6453
   790 __driver_attach                            4.7879
  1593 msleep                                    44.2500
  3976 sync_buffer                               64.1290
  4076 do_lookup                                 12.4648
  8587 sync_page                                122.6714
 20820 total                                      0.0067

(NOTE: architectures need to check whether get_wchan() can be called from
deep within the wakeup path.)

akpm: we need to mark more functions __sched.  lock_sock(), msleep(), others..

akpm: the contention in do_lookup() is a surprise.  Presumably doing disk
reads for directory contents while holding i_mutex.

[akpm@osdl.org: various fixes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Peter Zijlstra
6cfd76a26d [PATCH] lockdep: name some old style locks
Name some of the remaining 'old_style_spin_init' locks

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Andrew Morton
8984d137df [PATCH] ext4: uninline large functions
Saves nearly 4kbytes on x86.

Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:35 -08:00
Andrew Morton
3a229b39eb [PATCH] ext3: uninline large functions
Saves nearly 4kbytes on x86.

Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:35 -08:00
Paul B Schroeder
e0980dafa3 [PATCH] Exar quad port serial
This is for our "Envoy" boxes, which have, according to the documentation,
an "Exar ST16C554/554D Quad UART with 16-byte Fifo's".  The box also has two
other "on-board" serial ports and a modem chip.

The two on-board serial UARTs were being detected along with the first two
Exar UARTs.  The last two Exar UARTs were not showing up and neither was the
modem.

This patch was the only way I could get the kernel to see beyond the standard
four serial ports and get all four of the Exar UARTs to show up.

[akpm@osdl.org: build fix]
Signed-off-by:  Paul B Schroeder <pschroeder@uplogix.com>
Cc: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:35 -08:00
Alexey Dobriyan
9774a1f54f [PATCH] Compile-time check re world-writeable module params
One of the mistakes a module_param() user can make is to supply the default
value of the module parameter as the last argument.  module_param() accepts
permissions instead.  If the default value is, say, 3 (-------wx), the
parameter becomes world-writeable.

So far, the only remedy was to apply grep(1) and read drivers submitted
to -mm. BTDT.

With this patch applied, the compiler will finally do some of the work.

*) bounds checking on permissions
*) world-writeable bit checking on permissions
*) compile breakage if checks trigger

First version of this check (only "& 2" part) directly caught 4 out of 7
places during my last grep.

    Subject: Neverending module_param() bugs
    [X] drivers/acpi/sbs.c:101:module_param(capacity_mode, int, CAPACITY_UNIT);
    [X] drivers/acpi/sbs.c:102:module_param(update_mode, int, UPDATE_MODE);
    [ ] drivers/acpi/sbs.c:103:module_param(update_info_mode, int, UPDATE_INFO_MODE);
    [ ] drivers/acpi/sbs.c:104:module_param(update_time, int, UPDATE_TIME);
    [ ] drivers/acpi/sbs.c:105:module_param(update_time2, int, UPDATE_TIME2);
    [X] drivers/char/watchdog/sbc8360.c:203:module_param(timeout, int, 27);
    [X] drivers/media/video/tuner-simple.c:13:module_param(offset, int, 0666);
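
The three checks listed above can be sketched with a C11 static_assert.
This is an illustration only: SKETCH_PERM_IS_SAFE is a hypothetical name,
and the patch itself may implement the compile-time failure differently.

```c
#include <assert.h>   /* static_assert (C11) */

/* A permission value passes if it lies within 0..0777 (bounds check)
 * and does not have the world-writeable bit (octal 2) set. */
#define SKETCH_PERM_IS_SAFE(perm) \
    ((perm) >= 0 && (perm) <= 0777 && !((perm) & 2))

/* A sane permission compiles. */
static_assert(SKETCH_PERM_IS_SAFE(0644), "0644 should be accepted");

/* Passing a default value by mistake would break the build, e.g.:
 *
 *   static_assert(SKETCH_PERM_IS_SAFE(3), "default value used as perm");
 *
 * because 3 is -------wx and therefore world-writeable. */
```

The values 27 and 0666 from the list above are caught by the "& 2" part of
the check, since both have the world-writeable bit set.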

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:34 -08:00
Oleg Nesterov
34ec12349c [PATCH] taskstats: cleanup ->signal->stats allocation
Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
public interface.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:34 -08:00
Oleg Nesterov
115085ea07 [PATCH] taskstats: cleanup do_exit() path
do_exit:
	taskstats_exit_alloc()
	...
	taskstats_exit_send()
	taskstats_exit_free()

I think this is not good, let it be a single function exported to the core
kernel, taskstats_exit(), which does alloc + send + free itself.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:34 -08:00
Andrew Morton
20aa7b21b1 [PATCH] probe_kernel_address() needs to do set_fs()
probe_kernel_address() purports to be generic, only it forgot to select
KERNEL_DS, so it presently won't work right on all architectures.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:34 -08:00
Ryan Underwood
c140e11001 [PATCH] parport_pc: Add support for OX16PCI952 parallel port
Add support for the parallel port (implemented as separate PCI function) on
the Oxford Semiconductor OX16PCI952.

Signed-off-by: Ryan Underwood <nemesis@icequake.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:34 -08:00
Jan Engelhardt
5ec68b2e31 [PATCH] pull in necessary header files for cdev.h
linux/cdev.h uses struct kobject and other structs and should therefore
include the headers that define them.  Currently, a module either needs to
add the missing includes itself, or, in case a module includes other headers
already, needs to put <linux/cdev.h> last, which goes against an
alphabetically sorted include list.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:32 -08:00
Ingo Molnar
e59e2ae2c2 [PATCH] SysRq-X: show blocked tasks
Add SysRq-X support: show blocked (TASK_UNINTERRUPTIBLE) tasks only.

Useful for debugging IO stalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:32 -08:00
Miklos Szeredi
0ec7ca41f6 [PATCH] fuse: add DESTROY operation
Add a DESTROY operation for block device based filesystems.  With the help of
this operation, such a filesystem can flush dirty data to the device
synchronously before the umount returns.

This is needed in situations where the filesystem is assumed to be clean
immediately after unmount (e.g.  ejecting removable media).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:32 -08:00
Miklos Szeredi
b2d2272fae [PATCH] fuse: add bmap support
Add support for the BMAP operation for block device based filesystems.  This
is needed to support swap-files and lilo.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:32 -08:00
Miklos Szeredi
e9168c189f [PATCH] fuse: update userspace interface to version 7.8
Add a flag to the RELEASE message which specifies that a FLUSH operation
should be performed as well.  This interface update is needed for the FreeBSD
port, and doesn't actually touch the Linux implementation at all.

Also rename the unused 'flush_flags' in the FLUSH message to 'unused'.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:31 -08:00
Jan Engelhardt
48ed214d10 [PATCH] constify inode accessors
Change the signature of i_size_read(), IMINOR() and IMAJOR() because they,
or the functions they call, will never modify the argument.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:31 -08:00
Peter Korsgaard
238b8721a5 [PATCH] serial uartlite driver
Add a driver for the Xilinx uartlite serial controller used in boards with
the PPC405 core in the Xilinx V2P/V4 fpgas.

The hardware is very simple (baudrate/start/stopbits fixed and no break
support).  See the datasheet for details:

	http://www.xilinx.com/bvdocs/ipcenter/data_sheet/opb_uartlite.pdf

See http://thread.gmane.org/gmane.linux.serial/1237/ for the email thread.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:30 -08:00
Mike Miller
799202cbd0 [PATCH] cciss: add support for 1024 logical volumes
Add support for a large number of logical volumes.  We will soon have
hardware that supports up to 1024 logical volumes.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:30 -08:00
Rafael J. Wysocki
341a595850 [PATCH] Support for freezeable workqueues
Make it possible to create a workqueue whose worker thread will be
frozen during suspend, along with other kernel threads.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:29 -08:00
Rafael J. Wysocki
a9b6f562f1 [PATCH] swsusp: Untangle thaw_processes
Move the loop from thaw_processes() to a separate function and call it
independently for kernel threads and user space processes so that the order
of thawing tasks is clearly visible.

Drop thaw_kernel_threads() which is never used.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:28 -08:00
Nigel Cunningham
ff39593ad0 [PATCH] swsusp: thaw userspace and kernel space separately
Modify process thawing so that we can thaw kernel space without thawing
userspace, and thaw kernelspace first.  This will be useful in later
patches, where I intend to get swsusp thawing kernel threads only before
seeking to free memory.

Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:28 -08:00
Nigel Cunningham
7dfb71030f [PATCH] Add include/linux/freezer.h and move definitions from sched.h
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Rafael J. Wysocki
8357376d3d [PATCH] swsusp: Improve handling of highmem
Currently swsusp saves the contents of highmem pages by copying them to the
normal zone which is quite inefficient (eg.  it requires two normal pages
to be used for saving one highmem page).  This may be improved by using
highmem for saving the contents of saveable highmem pages.

Namely, during the suspend phase of the suspend-resume cycle we try to
allocate as many free highmem pages as there are saveable highmem pages.
If there are not enough highmem image pages to store the contents of all of
the saveable highmem pages, some of them will be stored in the "normal"
memory.  Next, we allocate as many free "normal" pages as needed to store
the (remaining) image data.  We use a memory bitmap to mark the allocated
free pages (ie.  highmem as well as "normal" image pages).

Now, we use another memory bitmap to mark all of the saveable pages
(highmem as well as "normal") and the contents of the saveable pages are
copied into the image pages.  Then, the second bitmap is used to save the
pfns corresponding to the saveable pages and the first one is used to save
their data.

During the resume phase the pfns of the pages that were saveable during the
suspend are loaded from the image and used to mark the "unsafe" page
frames.  Next, we try to allocate as many free highmem page frames as are
needed to load all of the image data that had been in highmem before the suspend
and we allocate so many free "normal" page frames that the total number of
allocated free pages (highmem and "normal") is equal to the size of the
image.  While doing this we have to make sure that there will be some extra
free "normal" and "safe" page frames for two lists of PBEs constructed
later.

Now, the image data are loaded, if possible, into their "original" page
frames.  The image data that cannot be written into their "original" page
frames are loaded into "safe" page frames and their "original" kernel
virtual addresses, as well as the addresses of the "safe" pages containing
their copies, are stored in one of two lists of PBEs.
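
A PBE list of this kind can be pictured with the toy model below.  It is
only a sketch: the struct layout and names are assumptions made for
illustration, and the real restore step runs in far more constrained
conditions than a plain memcpy() loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page backup entry: where the copy is, and where it belongs. */
struct pbe_model {
	void *safe_copy;         /* "safe" page holding the image data */
	void *orig_address;      /* "original" address the data must return to */
	struct pbe_model *next;  /* next entry in the list */
};

/* Copy every saved page back to its original location; return the count. */
static size_t restore_from_pbes(const struct pbe_model *list, size_t page_size)
{
	size_t restored = 0;

	for (; list != NULL; list = list->next) {
		memcpy(list->orig_address, list->safe_copy, page_size);
		restored++;
	}
	return restored;
}
```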

One list of PBEs is for the copies of "normal" suspend pages (ie.  "normal"
pages that were saveable during the suspend) and it is used in the same way
as previously (ie.  by the architecture-dependent parts of swsusp).  The
other list of PBEs is for the copies of highmem suspend pages.  The pages
in this list are restored (in a reversible way) right before the
arch-dependent code is called.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Rafael J. Wysocki
3aef83e0ef [PATCH] swsusp: use block device offsets to identify swap locations
Make swsusp use block device offsets instead of swap offsets to identify swap
locations and make it use the same code paths for writing as well as for
reading data.

This allows us to use the same code for handling swap files and swap
partitions and to simplify the code, eg.  by dropping rw_swap_page_sync().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Rafael J. Wysocki
915bae9ebe [PATCH] swsusp: use partition device and offset to identify swap areas
The Linux kernel handles swap files almost in the same way as it handles swap
partitions and there are only two differences between these two types of swap
areas:

(1) swap files need not be contiguous,

(2) the header of a swap file is not in the first block of the partition
    that holds it.

From swsusp's point of view (1) is not a problem, because it is already taken
care of by the swap-handling code, but (2) has to be taken into consideration.

In principle the location of a swap file's header may be determined with the
help of the appropriate filesystem driver.  Unfortunately, however, it requires
the filesystem holding the swap file to be mounted, and if this filesystem is
journaled, it cannot be mounted during a resume from disk.  For this reason we
need some other means by which swap areas can be identified.

For example, to identify a swap area we can use the partition that holds the
area and the offset from the beginning of this partition at which the swap
header is located.

The following patch allows swsusp to identify swap areas this way.  It changes
swap_type_of() so that it takes an additional argument representing an offset
of the swap header within the partition represented by its first argument.
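
The idea of identifying a swap area by a (partition, header offset) pair
can be sketched in userspace as follows.  This is not the kernel's
swap_type_of(); the struct, the function name, and the plain integer types
are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one active swap area. */
struct swap_area_model {
	unsigned int  dev;            /* device number of the holding partition */
	unsigned long header_offset;  /* offset of the swap header within it */
};

/* Return the index of the matching swap area, or -1 if none matches. */
static int find_swap_area(const struct swap_area_model *areas, size_t count,
			  unsigned int dev, unsigned long header_offset)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (areas[i].dev == dev &&
		    areas[i].header_offset == header_offset)
			return (int)i;
	}
	return -1;
}
```

A swap partition is simply the case where the header offset is zero, so
both kinds of swap area go through the same lookup.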

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Nick Piggin
7cf9c2c76c [PATCH] radix-tree: RCU lockless readside
Make radix tree lookups safe to be performed without locks.  Readers are
protected against nodes being deleted by using RCU based freeing.  Readers
are protected against new node insertion by using memory barriers to ensure
the node itself will be properly written before it is visible in the radix
tree.

Each radix tree node keeps a record of its height (above leaf nodes).
This height does not change after insertion -- when the radix tree is
extended, higher nodes are only inserted in the top.  So a lookup can take
the pointer to what is *now* the root node, and traverse down it even if
the tree is concurrently extended and this node becomes a subtree of a new
root.

"Direct" pointers (tree height of 0, where root->rnode points directly to
the data item) are handled by using the low bit of the pointer to signal
whether rnode is a direct pointer or a pointer to a radix tree node.
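
The low-bit trick can be sketched in plain C as below.  The macro and
helper names are made up for this example, and the sketch assumes that
both data items and nodes are at least 2-byte aligned so that bit 0 is
always free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag bit; the name is illustrative only. */
#define DIRECT_PTR_TAG ((uintptr_t)1)

/* Stand-in for a radix tree node. */
struct node_model {
	void *slots[4];
};

/* Mark a data item pointer as "direct" before storing it in the root. */
static void *tag_direct(void *item)
{
	assert(((uintptr_t)item & DIRECT_PTR_TAG) == 0); /* needs a free low bit */
	return (void *)((uintptr_t)item | DIRECT_PTR_TAG);
}

/* Non-zero if the root slot holds a data item rather than a node. */
static int is_direct(const void *rnode)
{
	return ((uintptr_t)rnode & DIRECT_PTR_TAG) != 0;
}

/* Strip the tag to recover the original data item pointer. */
static void *untag_direct(void *rnode)
{
	return (void *)((uintptr_t)rnode & ~DIRECT_PTR_TAG);
}
```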

When a reader wants to traverse the next branch, they will take a copy of
the pointer.  This pointer will be either NULL (and the branch is empty) or
non-NULL (and will point to a valid node).

[akpm@osdl.org: cleanups]
[Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
[clameter@sgi.com: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Arnaldo Carvalho de Melo
36de643786 [PATCH] Save some bytes in struct mm_struct
Before:
[acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct

/* include2/asm/processor.h:542 */
struct mm_struct {
        struct vm_area_struct *    mmap;                 /*     0     4 */
        struct rb_root             mm_rb;                /*     4     4 */
        struct vm_area_struct *    mmap_cache;           /*     8     4 */
        long unsigned int          (*get_unmapped_area)(); /*    12     4 */
        void                       (*unmap_area)();      /*    16     4 */
        long unsigned int          mmap_base;            /*    20     4 */
        long unsigned int          task_size;            /*    24     4 */
        long unsigned int          cached_hole_size;     /*    28     4 */
        /* ---------- cacheline 1 boundary ---------- */
        long unsigned int          free_area_cache;      /*    32     4 */
        pgd_t *                    pgd;                  /*    36     4 */
        atomic_t                   mm_users;             /*    40     4 */
        atomic_t                   mm_count;             /*    44     4 */
        int                        map_count;            /*    48     4 */
        struct rw_semaphore        mmap_sem;             /*    52    64 */
        spinlock_t                 page_table_lock;      /*   116    40 */
        struct list_head           mmlist;               /*   156     8 */
        mm_counter_t               _file_rss;            /*   164     4 */
        mm_counter_t               _anon_rss;            /*   168     4 */
        long unsigned int          hiwater_rss;          /*   172     4 */
        long unsigned int          hiwater_vm;           /*   176     4 */
        long unsigned int          total_vm;             /*   180     4 */
        long unsigned int          locked_vm;            /*   184     4 */
        long unsigned int          shared_vm;            /*   188     4 */
        /* ---------- cacheline 6 boundary ---------- */
        long unsigned int          exec_vm;              /*   192     4 */
        long unsigned int          stack_vm;             /*   196     4 */
        long unsigned int          reserved_vm;          /*   200     4 */
        long unsigned int          def_flags;            /*   204     4 */
        long unsigned int          nr_ptes;              /*   208     4 */
        long unsigned int          start_code;           /*   212     4 */
        long unsigned int          end_code;             /*   216     4 */
        long unsigned int          start_data;           /*   220     4 */
        /* ---------- cacheline 7 boundary ---------- */
        long unsigned int          end_data;             /*   224     4 */
        long unsigned int          start_brk;            /*   228     4 */
        long unsigned int          brk;                  /*   232     4 */
        long unsigned int          start_stack;          /*   236     4 */
        long unsigned int          arg_start;            /*   240     4 */
        long unsigned int          arg_end;              /*   244     4 */
        long unsigned int          env_start;            /*   248     4 */
        long unsigned int          env_end;              /*   252     4 */
        /* ---------- cacheline 8 boundary ---------- */
        long unsigned int          saved_auxv[44];       /*   256   176 */
        unsigned int               dumpable:2;           /*   432     4 */
        cpumask_t                  cpu_vm_mask;          /*   436     4 */
        mm_context_t               context;              /*   440    68 */
        long unsigned int          swap_token_time;      /*   508     4 */
        /* ---------- cacheline 16 boundary ---------- */
        char                       recent_pagein;        /*   512     1 */

        /* XXX 3 bytes hole, try to pack */

        int                        core_waiters;         /*   516     4 */
        struct completion *        core_startup_done;    /*   520     4 */
        struct completion          core_done;            /*   524    52 */
        rwlock_t                   ioctx_list_lock;      /*   576    36 */
        struct kioctx *            ioctx_list;           /*   612     4 */
}; /* size: 616, sum members: 613, holes: 1, sum holes: 3, cachelines: 20,
      last cacheline: 8 bytes */

After:

[acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct
/* include2/asm/processor.h:542 */
struct mm_struct {
        struct vm_area_struct *    mmap;                 /*     0     4 */
        struct rb_root             mm_rb;                /*     4     4 */
        struct vm_area_struct *    mmap_cache;           /*     8     4 */
        long unsigned int          (*get_unmapped_area)(); /*    12     4 */
        void                       (*unmap_area)();      /*    16     4 */
        long unsigned int          mmap_base;            /*    20     4 */
        long unsigned int          task_size;            /*    24     4 */
        long unsigned int          cached_hole_size;     /*    28     4 */
        /* ---------- cacheline 1 boundary ---------- */
        long unsigned int          free_area_cache;      /*    32     4 */
        pgd_t *                    pgd;                  /*    36     4 */
        atomic_t                   mm_users;             /*    40     4 */
        atomic_t                   mm_count;             /*    44     4 */
        int                        map_count;            /*    48     4 */
        struct rw_semaphore        mmap_sem;             /*    52    64 */
        spinlock_t                 page_table_lock;      /*   116    40 */
        struct list_head           mmlist;               /*   156     8 */
        mm_counter_t               _file_rss;            /*   164     4 */
        mm_counter_t               _anon_rss;            /*   168     4 */
        long unsigned int          hiwater_rss;          /*   172     4 */
        long unsigned int          hiwater_vm;           /*   176     4 */
        long unsigned int          total_vm;             /*   180     4 */
        long unsigned int          locked_vm;            /*   184     4 */
        long unsigned int          shared_vm;            /*   188     4 */
        /* ---------- cacheline 6 boundary ---------- */
        long unsigned int          exec_vm;              /*   192     4 */
        long unsigned int          stack_vm;             /*   196     4 */
        long unsigned int          reserved_vm;          /*   200     4 */
        long unsigned int          def_flags;            /*   204     4 */
        long unsigned int          nr_ptes;              /*   208     4 */
        long unsigned int          start_code;           /*   212     4 */
        long unsigned int          end_code;             /*   216     4 */
        long unsigned int          start_data;           /*   220     4 */
        /* ---------- cacheline 7 boundary ---------- */
        long unsigned int          end_data;             /*   224     4 */
        long unsigned int          start_brk;            /*   228     4 */
        long unsigned int          brk;                  /*   232     4 */
        long unsigned int          start_stack;          /*   236     4 */
        long unsigned int          arg_start;            /*   240     4 */
        long unsigned int          arg_end;              /*   244     4 */
        long unsigned int          env_start;            /*   248     4 */
        long unsigned int          env_end;              /*   252     4 */
        /* ---------- cacheline 8 boundary ---------- */
        long unsigned int          saved_auxv[44];       /*   256   176 */
        cpumask_t                  cpu_vm_mask;          /*   432     4 */
        mm_context_t               context;              /*   436    68 */
        long unsigned int          swap_token_time;      /*   504     4 */
        char                       recent_pagein;        /*   508     1 */
        unsigned char              dumpable:2;           /*   509     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        core_waiters;         /*   512     4 */
        struct completion *        core_startup_done;    /*   516     4 */
        struct completion          core_done;            /*   520    52 */
        rwlock_t                   ioctx_list_lock;      /*   572    36 */
        struct kioctx *            ioctx_list;           /*   608     4 */
}; /* size: 612, sum members: 610, holes: 1, sum holes: 2, cachelines: 20,
      last cacheline: 4 bytes */

[acme@newtoy net-2.6.20]$ codiff -V /tmp/sched.o.before kernel/sched.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/kernel/sched.c:
  struct mm_struct |   -4
    dumpable:2;
     from: unsigned int          /*   432(30)    4(2) */
     to:   unsigned char         /*   509(6)     1(2) */
< SNIP other offset changes >
 1 struct changed
[acme@newtoy net-2.6.20]$

I'm not aware of any problem about using 2 byte wide bitfields where
previously a 4 byte wide one was, holler if there is any, I wouldn't be
surprised, bitfields are things from hell.

For the curious, 432(30) means: at offset 432 from the struct start, at
offset 30 in the bitfield (yeah, it comes backwards, hellish, huh?) ditto
for 509(6), while 4(2) and 1(2) means "struct field size(bitfield size)".

Now we have a 2 bytes hole and are using only 4 bytes of the last 32
bytes cacheline, any takers? :-)
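
The same effect can be reproduced in miniature with the two toy structs
below, which borrow a few member names from the pahole output but are
otherwise hypothetical.  Bitfields of type unsigned char are an
implementation-defined extension (accepted by GCC and Clang), and the
exact sizes depend on the ABI, so the sketch only reports the difference
rather than assuming a number.

```c
#include <assert.h>
#include <stddef.h>

/* Bitfield occupies its own unsigned int; the lone char leaves a hole. */
struct layout_before {
	unsigned int  dumpable:2;
	unsigned int  cpu_vm_mask;
	char          recent_pagein;
	int           core_waiters;
};

/* Bitfield narrowed to a byte and moved next to the char. */
struct layout_after {
	unsigned int  cpu_vm_mask;
	char          recent_pagein;
	unsigned char dumpable:2;   /* non-int bitfield type: compiler extension */
	int           core_waiters;
};

/* Bytes saved by the reordering on the current ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct layout_before) - sizeof(struct layout_after);
}
```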

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Andy Whitcroft
33f2ef89f8 [PATCH] mm: make compound page destructor handling explicit
Currently we use the lru head link of the second page of a compound page
to hold its destructor.  This was ok when it was purely an internal
implementation detail.  However, hugetlbfs overrides this destructor,
violating the layering.  Abstract this out as explicit calls, also
introduce a type for the callback function allowing them to be type
checked.  For each callback we pre-declare the function, causing a type
error on definition rather than on use elsewhere.
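
A userspace sketch of the pattern is shown below.  The toy struct page,
the union used for storage, and the count_free()/destroy_compound_model()
helpers are assumptions for illustration; only the general shape (a
function typedef, explicit set/get accessors, and a pre-declaration using
the typedef) follows the description above.

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Function type for destructors, so callbacks can be type checked. */
typedef void compound_page_dtor(struct page *);

/* Toy page: the second page's list link doubles as destructor storage. */
struct page {
	unsigned long flags;
	union {
		struct { struct page *next, *prev; } lru;
		compound_page_dtor *dtor;
	} u;
};

static void set_compound_page_dtor(struct page *head, compound_page_dtor *dtor)
{
	head[1].u.dtor = dtor;
}

static compound_page_dtor *get_compound_page_dtor(struct page *head)
{
	return head[1].u.dtor;
}

/* Pre-declare via the typedef: a mismatched definition fails to compile. */
static compound_page_dtor count_free;

static int freed_pages;

static void count_free(struct page *head)
{
	(void)head;
	freed_pages++;
}

/* Hypothetical release path: look up the destructor and call it. */
static void destroy_compound_model(struct page *head)
{
	get_compound_page_dtor(head)(head);
}
```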

[akpm@osdl.org: cleanups]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Andrew Morton
1b1cec4bbc [PATCH] slab: deprecate kmem_cache_t
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter
441e143e95 [PATCH] slab: remove SLAB_DMA
SLAB_DMA is an alias of GFP_DMA. This is the last one so we
remove the leftover comment too.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
e94b176609 [PATCH] slab: remove SLAB_KERNEL
SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
f7267c0c07 [PATCH] slab: remove SLAB_USER
SLAB_USER is an alias of GFP_USER

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
e6b4f8da3a [PATCH] slab: remove SLAB_NOFS
SLAB_NOFS is an alias of GFP_NOFS.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
55acbda096 [PATCH] slab: remove SLAB_NOIO
SLAB_NOIO is an alias of GFP_NOIO with a single instance of use.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
a06d72c1dc [PATCH] slab: remove SLAB_LEVEL_MASK
SLAB_LEVEL_MASK is only used internally to the slab and is
an alias of GFP_LEVEL_MASK.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
6e0eaa4b05 [PATCH] slab: remove SLAB_NO_GROW
It is only used internally in the slab.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Andy Whitcroft
25ba77c141 [PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int
NUMA node ids are passed as either int or unsigned int almost exclusively,
yet page_to_nid and zone_to_nid both return unsigned long.  This is a
throwback to when page_to_nid was a #define and was thus exposing the real
type of the page flags field.

In addition to fixing up the definitions of page_to_nid and zone_to_nid I
audited the users of these functions identifying the following incorrect
uses:

1) mm/page_alloc.c show_node() -- printk dumping the node id,
2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
   against numa_node_id() which returns an int from cpu_to_node(), and
3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which
   uses bit_set which in generic code takes an int.
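
To show why the return type leaked from the flags field, here is a toy
model in which the node id lives in the top bits of an unsigned long and
the accessor narrows it to int.  The field width and the _model names are
assumptions for this sketch, not the kernel's configuration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout: node id stored in the top NODES_WIDTH bits. */
#define NODES_WIDTH   6
#define NODES_PGSHIFT (sizeof(unsigned long) * CHAR_BIT - NODES_WIDTH)
#define NODES_MASK    ((1UL << NODES_WIDTH) - 1)

struct page_model {
	unsigned long flags;
};

/* Return the node id as int, hiding the width of the flags word. */
static int page_to_nid_model(const struct page_model *page)
{
	return (int)((page->flags >> NODES_PGSHIFT) & NODES_MASK);
}

/* Store a node id without disturbing the other flag bits. */
static void set_page_nid_model(struct page_model *page, int nid)
{
	page->flags &= ~(NODES_MASK << NODES_PGSHIFT);
	page->flags |= ((unsigned long)nid & NODES_MASK) << NODES_PGSHIFT;
}
```

Callers can then print or compare the result as a plain int without casts.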

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
ebe29738f3 [PATCH] Remove uses of kmem_cache_t from mm/* and include/linux/slab.h
Remove all uses of kmem_cache_t (most were left in slab.h).  The
typedef for kmem_cache_t is then only necessary for other kernel
subsystems.  Add a comment to that effect.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
b86c089b83 [PATCH] Move names_cachep to linux/fs.h
The names_cachep is used for getname() and putname().  So let's put it into
fs.h near those two definitions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
aa362a83e7 [PATCH] Move fs_cachep to linux/fs_struct.h
fs_cachep is only used in kernel/exit.c and in kernel/fork.c.

It is used to store fs_struct items so it should be placed in linux/fs_struct.h

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
8b7d91eb7f [PATCH] Move filep_cachep to include/file.h
filp_cachep is only used in fs/file_table.c and in fs/dcache.c where
it is defined.

Move it to related definitions in linux/file.h.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Christoph Lameter
5d6538fcf2 [PATCH] Move files_cachep to include/file.h
Proper place is in file.h since files_cachep uses are related to file I/O.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Lameter
c43692e85f [PATCH] Move vm_area_cachep to include/mm.h
vm_area_cachep is used to store vm_area_structs. So move to mm.h.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Lameter
298ec1e2ac [PATCH] Move sighand_cachep to include/signal.h
Move sighand_cachep definition to linux/signal.h.

The sighand cache is only used in fs/exec.c and kernel/fork.c.  It is defined
in kernel/fork.c but only used in fs/exec.c.

The sighand_cachep is related to signal processing.  So add the definition to
signal.h.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Lameter
54cc211ce3 [PATCH] Remove bio_cachep from slab.h
Remove bio_cachep from slab.h - it no longer exists.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Hellwig
b30973f877 [PATCH] node-aware skb allocation
Node-aware allocation of skbs for the receive path.

Details:

  - __alloc_skb gets a new node argument and calls the node-aware
    slab functions with it.
  - netdev_alloc_skb passes the node number it gets from dev_to_node
    to it, everyone else passes -1 (any node)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Hellwig
873481367e [PATCH] add numa node information to struct device
For node-aware skb allocations we need information about the node in struct
net_device or struct device.  Davem suggested to put it into struct device
which this patch does.

In particular:

 - struct device gets a new int numa_node member if CONFIG_NUMA is set
 - there are two new helpers, dev_to_node and set_dev_node to
   transparently deal with the non-numa case
 - for pci devices the node-info is set to the value we get from
   pcibus_to_node.
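
The shape of such helpers can be sketched as follows.  The device_model
struct is a stand-in invented for this example, and returning -1 for
"no node" in the non-NUMA case is an assumption that matches the "-1 (any
node)" convention used for skb allocation.

```c
#include <assert.h>

/* Define CONFIG_NUMA before this point to model a NUMA build. */

/* Toy device: the node member exists only on NUMA builds. */
struct device_model {
	const char *name;
#ifdef CONFIG_NUMA
	int numa_node;
#endif
};

#ifdef CONFIG_NUMA
static inline int dev_to_node(const struct device_model *dev)
{
	return dev->numa_node;
}

static inline void set_dev_node(struct device_model *dev, int node)
{
	dev->numa_node = node;
}
#else
/* Non-NUMA: callers compile unchanged, and always see "any node". */
static inline int dev_to_node(const struct device_model *dev)
{
	(void)dev;
	return -1;
}

static inline void set_dev_node(struct device_model *dev, int node)
{
	(void)dev;
	(void)node;
}
#endif
```

Callers never need their own #ifdef: on a non-NUMA build the setter is a
no-op and the getter reports no preference.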

Note that for some architectures pcibus_to_node doesn't work yet at the time
we call it currently.  This is harmless and will just mean skb allocations
aren't node-local on these architectures until the implementation of
pcibus_to_node on these architectures has been updated (there are patches for
x86 and x86_64 floating around).

[akpm@osdl.org: cleanup]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Hellwig
8b98c1699e [PATCH] leak tracking for kmalloc_node
We have variants of kmalloc and kmem_cache_alloc that leave leak tracking to
the caller.  This is used for subsystem-specific allocators like skb_alloc.

To make skb_alloc node-aware we need similar routines for the node-aware slab
allocator, which this patch adds.

Note that the code is rather ugly, but it mirrors the non-node-aware code 1:1:

[akpm@osdl.org: add module export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Peter Zijlstra
ad76fb6b5a [PATCH] mm: k{,um}map_atomic() vs in_atomic()
Make kmap_atomic/kunmap_atomic denote a pagefault disabled scope.  All non
trivial implementations already do this anyway.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Peter Zijlstra
a866374aec [PATCH] mm: pagefault_{disable,enable}()
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.

Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption; this might go away in the future.
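
A single-threaded userspace model of the pairing might look like the
sketch below.  The _model names, the global counter, and the fault
handler stub are assumptions for illustration; the barrier() definition
is the usual GCC/Clang compiler barrier.

```c
#include <assert.h>

/* Compiler barrier (GCC/Clang syntax): forbids reordering across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Toy stand-in for the per-task preempt count. */
static int preempt_count_model;

static inline void pagefault_disable_model(void)
{
	preempt_count_model++;
	/* Make sure the increment is issued before any user access. */
	barrier();
}

static inline void pagefault_enable_model(void)
{
	/* Make sure user accesses are done before the count drops. */
	barrier();
	preempt_count_model--;
	barrier();
}

static inline int in_atomic_model(void)
{
	return preempt_count_model != 0;
}

/* A fault handler that must not sleep bails out instead of paging in. */
static int handle_fault_model(void)
{
	return in_atomic_model() ? -1 : 0;
}
```

The calls nest, so the handler only goes back to the sleeping path once
every disable has been matched by an enable.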

(NOTE: the extra barrier() in pagefault_disable might fix some holes on
       machines which have too many registers for their own good)

[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Chen, Kenneth W
39dde65c99 [PATCH] shared page table for hugetlb page
Following up with the work on shared page table done by Dave McCracken.  This
set of patches targets shared page tables for hugetlb memory only.

The shared page table is particularly useful in the situation of a large number
of independent processes sharing large shared memory segments.  In the normal
page case, the amount of memory saved from process' page table is quite
significant.  For hugetlb, the saving on page table memory is not the primary
objective (as hugetlb itself already cuts down page table overhead
significantly), instead, the purpose of using shared page table on hugetlb is
to allow faster TLB refill and smaller cache pollution upon TLB miss.

With PT sharing, pte entries are shared among hundreds of processes, the cache
consumption used by all the page table is smaller and in return, application
gets much higher cache hit ratio.  One other effect is that cache hit ratio
with hardware page walker hitting on pte in cache will be higher and this
helps to reduce tlb miss latency.  These two effects contribute to higher
application performance.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Nick Piggin
cc10250907 [PATCH] mm: add arch_alloc_page
Add an arch_alloc_page to match arch_free_page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Ashwin Chaugule
7602bdf2fd [PATCH] new scheme to preempt swap token
The new swap token patches replace the current token traversal algo.  The old
algo had a crude timeout parameter that was used to hand over the token from
one task to another.  The new algo transfers the token to the tasks that are in
need of the token.  The urgency for the token is based on the number of times
a task is required to swap in pages.  Accordingly, the priority of a task is
incremented if it has been badly affected due to swap-outs.  To ensure that
the token doesn't bounce around rapidly, the token holders are given a priority
boost.  The priority of tasks is also decremented if their rate of swap-ins
keeps reducing.  This way, the condition to check whether to pre-empt the swap
token is a matter of comparing two tasks' priority fields.
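
The priority scheme can be modelled roughly as below.  This is a sketch
under stated assumptions, not the patch's code: the struct, the function
names, the per-window swap-in counts, and the boost value of 2 are all
invented for illustration.

```c
#include <assert.h>

/* Hypothetical boost given to a task when it takes the token. */
#define TOKEN_HOLDER_BOOST 2

struct task_model {
	unsigned long last_swapins;  /* swap-ins seen in the previous window */
	int token_priority;          /* urgency for the swap token */
};

/* Raise priority when swap-ins grow, lower it when they shrink. */
static void account_swapins(struct task_model *t, unsigned long swapins)
{
	if (swapins > t->last_swapins)
		t->token_priority++;
	else if (swapins < t->last_swapins && t->token_priority > 0)
		t->token_priority--;
	t->last_swapins = swapins;
}

/* Boost the new holder so the token does not bounce around. */
static void grant_token(struct task_model *t)
{
	t->token_priority += TOKEN_HOLDER_BOOST;
}

/* Pre-emption is just a comparison of the two priority fields. */
static int should_preempt_token(const struct task_model *holder,
				const struct task_model *contender)
{
	return contender->token_priority > holder->token_priority;
}
```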

[akpm@osdl.org: cleanups]
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Paul Jackson
7253f4ef04 [PATCH] memory page_alloc zonelist caching reorder structure
Rearrange the struct members in the 'struct zonelist_cache' structure, so
as to put the readonly (once initialized) z_to_n[] array first, where it
will come right after the zones[] array in struct zonelist.

This pretty much eliminates the chance that the two frequently written
elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap
times, will end up on the same cache line as the performance sensitive,
frequently read, never (after init) written zones[] array.

Keeping frequently written data off frequently read cache lines is good for
performance.

Thanks to Rohit Seth for the suggestion.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:20 -08:00
Paul Jackson
9276b1bc96 [PATCH] memory page_alloc zonelist caching speedup
Optimize the critical zonelist scanning for free pages in the kernel memory
allocator by caching the zones that were found to be full recently, and
skipping them.

Remembers the zones in a zonelist that were short of free memory in the
last second.  And it stashes a zone-to-node table in the zonelist struct,
to optimize that conversion (minimize its cache footprint.)
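
The caching idea can be sketched as a small bitmap that is wiped once a
second.  The names below and the use of a caller-supplied time in seconds
are assumptions for this userspace model; the zone-to-node table is left
out.

```c
#include <assert.h>

/* Toy zonelist cache: one bit per zone that was recently found full. */
struct zlc_model {
	unsigned long fullzones;  /* bit i set: zone i was full recently */
	long last_full_zap;       /* time (seconds) the bitmap was last cleared */
};

/* Non-zero if the zone should be scanned; forgets stale state first. */
static int zlc_worth_trying(struct zlc_model *zlc, unsigned int zone_idx,
			    long now)
{
	if (now - zlc->last_full_zap >= 1) {
		/* Older than a second: assume memory may have been freed. */
		zlc->fullzones = 0;
		zlc->last_full_zap = now;
	}
	return !(zlc->fullzones & (1UL << zone_idx));
}

/* Remember that a zone had no free pages so later scans skip it. */
static void zlc_mark_full(struct zlc_model *zlc, unsigned int zone_idx)
{
	zlc->fullzones |= 1UL << zone_idx;
}
```

Because the state is per zone rather than per node, one full zone does
not cause the other zones on the same node to be skipped.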

Recent changes:

    This differs in a significant way from a similar patch that I
    posted a week ago.  Now, instead of having a nodemask_t of
    recently full nodes, I have a bitmask of recently full zones.
    This solves a problem that last week's patch had, which on
    systems with multiple zones per node (such as DMA zone) would
    take seeing any of these zones full as meaning that all zones
    on that node were full.

    Also I changed names - from "zonelist faster" to "zonelist cache",
    as that seemed to better convey what we're doing here - caching
    some of the key zonelist state (for faster access.)

    See below for some performance benchmark results.  After all that
    discussion with David on why I didn't need them, I went and got
    some ;).  I wanted to verify that I had not hurt the normal case
    of memory allocation noticeably.  At least for my one little
    microbenchmark, I found (1) the normal case wasn't affected, and
    (2) workloads that forced scanning across multiple nodes for
    memory used up to 10% fewer System CPU cycles and had lower
    elapsed clock time ('sys' and 'real').  Good.  See details, below.

    I didn't have the logic in get_page_from_freelist() for various
    full nodes and zone reclaim failures correct.  That should be
    fixed up now - notice the new goto labels zonelist_scan,
    this_zone_full, and try_next_zone, in get_page_from_freelist().

There are two reasons I pursued this alternative, over some earlier
proposals that would have focused on optimizing the fake numa
emulation case by caching the last useful zone:

 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
    have seen real customer loads where the cost to scan the zonelist
    was a problem, due to many nodes being full of memory before
    we got to a node we could use.  Or at least, I think we have.
    This was related to me by another engineer, based on experiences
    from some time past.  So this is not guaranteed.  Most likely, though.

    The following approach should help such real numa systems just as
    much as it helps fake numa systems, or any combination thereof.

 2) The effort to distinguish fake from real numa, using node_distance,
    so that we could cache a fake numa node and optimize choosing
    it over equivalent distance fake nodes, while continuing to
    properly scan all real nodes in distance order, was going to
    require a nasty blob of zonelist and node distance munging.

    The following approach has no new dependency on node distances or
    zone sorting.

See comment in the patch below for a description of what it actually does.

Technical details of note (or controversy):

 - See the use of "zlc_active" and "did_zlc_setup" below, to delay
   adding any work for this new mechanism until we've looked at the
   first zone in zonelist.  I figured the odds of the first zone
   having the memory we needed were high enough that we should just
   look there, first, then get fancy only if we need to keep looking.

 - Some odd hackery was needed to add items to struct zonelist, while
   not tripping up the custom zonelists built by the mm/mempolicy.c
   code for MPOL_BIND.  My usual wordy comments below explain this.
   Search for "MPOL_BIND".

 - Some per-node data in the struct zonelist is now modified frequently,
   with no locking.  Multiple CPU cores on a node could hit and mangle
   this data.  The theory is that this is just performance hint data,
   and the memory allocator will work just fine despite any such mangling.
   The fields at risk are the struct 'zonelist_cache' fields 'fullzones'
   (a bitmask) and 'last_full_zap' (unsigned long jiffies).  It should
   all be self correcting after at most a one second delay.

 - This still does a linear scan of the same lengths as before.  All
   I've optimized is making the scan faster, not algorithmically
   shorter.  It is now able to scan a compact array of 'unsigned
   short' in the case of many full nodes, so one cache line should
   cover quite a few nodes, rather than each node hitting another
   one or two new and distinct cache lines.

 - If both Andi and Nick don't find this too complicated, I will be
   (pleasantly) flabbergasted.

 - I removed the comment claiming we only use one cacheline's worth of
   zonelist.  We seem, at least in the fake numa case, to have put the
   lie to that claim.

 - I pay no attention to the various watermarks and such in this performance
   hint.  A node could be marked full for one watermark, and then skipped
   over when searching for a page using a different watermark.  I think
   that's actually quite ok, as it will tend to slightly increase the
   spreading of memory over other nodes, away from a memory stressed node.
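
The first item in the list above, delaying any cache work until the first
zone has been tried, can be sketched as the loop below.  This is a
simplified model and not the code in get_page_from_freelist(): the zone
type, the one-word bitmap (so at most as many zones as bits in an
unsigned long) and the lookup counter are all assumptions made for the
example.

```c
#include <stdbool.h>
#include <stddef.h>

struct zone_sketch {
	long free_pages;
};

/*
 * Scan zones in order and take one page from the first zone that has
 * any.  Returns the zone index, or -1 if every zone was full or skipped.
 * 'fullzones' is a one-word bitmap of zones recently found full.
 * 'cache_lookups' counts how often the bitmap was consulted.
 */
static int scan_zonelist(struct zone_sketch *zones, size_t nzones,
			 unsigned long *fullzones,
			 unsigned int *cache_lookups)
{
	bool zlc_active = false;	/* cache unused until the first miss */
	size_t i;

	for (i = 0; i < nzones; i++) {
		if (zlc_active) {
			(*cache_lookups)++;
			if (*fullzones & (1UL << i))
				continue;	/* recently full: skip it */
		}
		if (zones[i].free_pages > 0) {
			zones[i].free_pages--;
			return (int)i;
		}
		*fullzones |= 1UL << i;		/* this zone is full */
		zlc_active = true;		/* now start using the cache */
	}
	return -1;
}
```

When the first zone has memory, the function returns without ever reading
the bitmap, which is the common case the delay is meant to keep cheap.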

===============

Performance - some benchmark results and analysis:

This benchmark runs a memory hog program that uses multiple
threads to touch a lot of memory as quickly as it can.

Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of
the total 96 GBytes on the system, and using 1, 19, 37, or 55
threads (on a 56 CPU system.)  System, user and real (elapsed)
timings were recorded for each run, shown in units of seconds,
in the table below.

Two kernels were tested - 2.6.18-mm3 and the same kernel with
this zonelist caching patch added.  The table also shows the
percentage by which the zonelist caching sys time improves on
(is lower than) the stock *-mm kernel.

      number   2.6.18-mm3         zonelist-cache     delta (< 0 good)   percent
 GBs       N   -----------------  -----------------  -----------------  systime
 mem threads     sys  user  real    sys  user  real    sys  user  real   better
  12       1     153    24   177    151    24   176     -2     0    -1       1%
  12      19      99    22     8     99    22     8      0     0     0       0%
  12      37     111    25     6    112    25     6      1     0     0      -0%
  12      55     115    25     5    110    23     5     -5    -2     0       4%
  38       1     502    74   576    497    73   570     -5    -1    -6       0%
  38      19     426    78    48    373    76    39    -53    -2    -9      12%
  38      37     544    83    36    547    82    36      3    -1     0      -0%
  38      55     501    77    23    511    80    24     10     3     1      -1%
  64       1     917   125  1042    890   124  1014    -27    -1   -28       2%
  64      19    1118   138   119    965   141   103   -153     3   -16      13%
  64      37    1202   151    94   1136   150    81    -66    -1   -13       5%
  64      55    1118   141    61   1072   140    58    -46    -1    -3       4%
  90       1    1342   177  1519   1275   174  1450    -67    -3   -69       4%
  90      19    2392   199   192   2116   189   176   -276   -10   -16      11%
  90      37    3313   238   175   2972   225   145   -341   -13   -30      10%
  90      55    1948   210   104   1843   213   100   -105     3    -4       5%
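
The "percent systime better" column appears to be the sys delta as a
fraction of the stock kernel's sys time, truncated rather than rounded.
That reading is an assumption inferred from the figures in the table, and
the function name below is made up for the example.

```c
/*
 * Percentage by which new_sys is lower than old_sys, truncated toward
 * zero.  A negative result means the patched kernel used more sys time.
 */
static int sys_percent_better(long old_sys, long new_sys)
{
	return (int)((old_sys - new_sys) * 100 / old_sys);
}
```

For the 90 GByte, 19 thread row, sys_percent_better(2392, 2116) works out
as 276 * 100 / 2392, which truncates to 11.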

Notes:
 1) This test ran a memory hog program that started a specified number N of
    threads, and had each thread allocate and touch 1/N'th of
    the total memory to be used in the test run in a single loop,
    writing a constant word to memory, one store every 4096 bytes.
    Watching this test during some earlier trial runs, I would see
    each of these threads sit down on one CPU and stay there, for
    the remainder of the pass, a different CPU for each thread.

 2) The 'real' column is not comparable to the 'sys' or 'user' columns.
    The 'real' column is seconds wall clock time elapsed, from beginning
    to end of that test pass.  The 'sys' and 'user' columns are total
    CPU seconds spent on that test pass.  For a 19 thread test run,
    for example, the sum of 'sys' and 'user' could be up to 19 times the
    number of 'real' elapsed wall clock seconds.

 3) Tests were run on a fresh, single-user boot, to minimize the amount
    of memory already in use at the start of the test, and to minimize
    the amount of background activity that might interfere.

 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.

 5) Notice that the 'real' time gets large for the single thread runs, even
    though the measured 'sys' and 'user' times are modest.  I'm not sure what
    that means - probably something to do with it being slow for one thread to
    be accessing memory a long way away.  Perhaps the fake numa system, running
    ostensibly the same workload, would not show this substantial degradation
    of 'real' time for one thread on many nodes -- let's hope not.

 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs)
    ran quite efficiently, as one might expect.  Each pair of threads needed
    to allocate and touch the memory on the node the two threads shared, a
    pleasantly parallelizable workload.

 7) The intermediate thread count passes, when asking for a lot of memory, forcing
    them to go to a few neighboring nodes, improved the most with this zonelist
    caching patch.

Conclusions:
 * This zonelist cache patch probably makes little difference one way or the
   other for most workloads on real numa hardware, if those workloads avoid
   heavy off node allocations.
 * For memory intensive workloads requiring substantial off-node allocations
   on real numa hardware, this patch improves both kernel and elapsed timings
   up to ten percent.
 * For fake numa systems, I'm optimistic, but will have to leave that up to
   Rohit Seth to actually test (once I get him a 2.6.18 backport.)

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: David Rientjes <rientjes@cs.washington.edu>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:20 -08:00
Christoph Lameter
89689ae7f9 [PATCH] Get rid of zone_table[]
The zone table is mostly not needed.  If we have a node in the page flags
then we can get to the zone via NODE_DATA() which is much more likely to be
already in the cpu cache.

In the SMP and UP cases, NODE_DATA() is a constant pointer which allows us to
access an exact replica of zonetable in the node_zones field.  In all of
the above cases there will be no need at all for the zone table.

The only remaining case is if in a NUMA system the node numbers do not fit
into the page flags.  In that case we make sparse generate a table that
maps sections to nodes and use that table to figure out the node number.
This table is sized to fit in a single cache line for the known 32 bit
NUMA platform which makes it very likely that the information can be
obtained without a cache miss.
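
As a rough illustration of the common case, where the node number fits in
the page flags, the lookup can be sketched as below.  The bit layout, the
sizes and every "_sketch" name are assumptions made for the example and
do not reflect the kernel's actual flag encoding; node_data[] stands in
for what NODE_DATA() would return.

```c
#define ZONES_BITS_SKETCH 2	/* assumed: bits holding the zone index */
#define NODES_BITS_SKETCH 3	/* assumed: bits holding the node id */
#define NR_ZONES_SKETCH (1 << ZONES_BITS_SKETCH)
#define NR_NODES_SKETCH (1 << NODES_BITS_SKETCH)

struct zone_sketch {
	long free_pages;
};

/* Per-node data holding that node's zones, like the node_zones field. */
struct pglist_data_sketch {
	struct zone_sketch node_zones[NR_ZONES_SKETCH];
};

static struct pglist_data_sketch node_data[NR_NODES_SKETCH];

struct page_sketch {
	unsigned long flags;	/* low bits: zone index, then node id */
};

static unsigned long make_flags(unsigned int nid, unsigned int zid)
{
	return ((unsigned long)nid << ZONES_BITS_SKETCH) | zid;
}

static unsigned int page_zonenum_sketch(const struct page_sketch *page)
{
	return page->flags & (NR_ZONES_SKETCH - 1);
}

static unsigned int page_to_nid_sketch(const struct page_sketch *page)
{
	return (page->flags >> ZONES_BITS_SKETCH) & (NR_NODES_SKETCH - 1);
}

/* Reach the zone through the node's own data; no global table needed. */
static struct zone_sketch *page_zone_sketch(const struct page_sketch *page)
{
	return &node_data[page_to_nid_sketch(page)]
			.node_zones[page_zonenum_sketch(page)];
}
```

A page whose flags were built with make_flags(5, 2) resolves to
&node_data[5].node_zones[2].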

For sparsemem the zone table seems to have been fairly large based on
the maximum possible number of sections and the number of zones per node.
There is some memory saving by removing zone_table.  The main benefit is to
reduce the cache footprint of the VM from the frequent lookups of zones.
Plus it simplifies the page allocator.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:20 -08:00