Since kobject_uevent() function does not return an integer value to
indicate if its operation was completed with success or not, it is worth
changing it in order to report a proper status (success or error) instead
of returning void.
[randy.dunlap@oracle.com: Fix inline kobject functions]
Cc: Mauricio Lin <mauriciolin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since commit 368c73d4f6 the kernel will try
to update the non-writeable BAR registers 0..3 of PIIX4 IDE adapters if
pci_assign_unassigned_resources() is used to do full resource assignment of
the bus. This fails because in the PIIX4 these BAR registers have
implicitly assumed values and read back as zero; it used to work because
the kernel used to just write zero to that register the read back value did
match what was written.
The fix is a new resource flag IORESOURCE_PCI_FIXED used to mark a resource
as non-movable. This will also be useful to keep other import system
resources from being moved around - for example system consoles on PCI
busses.
[akpm@osdl.org: cleanup]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I don't see any good reason for exporting device IDs to userspace.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is designed to fix:
- Disk eating corruptor on KT7 after resume from RAM
- VIA IRQ handling
- VIA fixups for bus lockups after resume from RAM
The core of this is to add a table of resume fixups run at resume time.
We need to do this for a variety of boards and features, but particularly
we need to do this to get various critical VIA fixups done on resume.
The second part of the problem is to handle VIA IRQ number rules which
are a bit odd and need special handling for PIC interrupts. Various
patches broke various boxes and while this one may not be perfect
(hopefully it is) it ensures the workaround is applied to the right
devices only.
From: Jean Delvare <khali@linux-fr.org>
Now that PCI quirks are replayed on software resume, we can safely
re-enable the Asus SMBus unhiding quirk even when software suspend support
is enabled.
[akpm@osdl.org: fix const warning]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a few #defines for grabbing and working with the address fields
in a HT_CAPTYPE_MSI_MAPPING capability. All from the HT spec v3.00.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are already several places in the kernel that want to search a PCI
device for a given Hypertransport capability. Although this is possible
using pci_find_capability() etc., it makes sense to encapsulate that
logic in a helper - pci_find_ht_capability().
To cater for searching exhaustively for a capability, we also provide
pci_find_next_ht_capability().
We also need to cater for the fact that the HT capability fields may be
either 3 or 5 bits wide. pci_find_ht_capability() deals with this for you,
but callers using the #defines directly must handle that themselves.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This works like pci_dev_present but instead of returning boolean returns
the matching pci_device_id entry. This makes it much more useful. Code
bloat is basically nil as the old boolean function is rewritten in terms of
the new one.
This will be used by the updated VIA PCI quirks for one
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
pci: add class codes for Wireless RF controllers
Add PCI codes to include/linux/pci_ids.h for RF controllers; first
batch of these devices seem to be the Ultra-Wide-Band and Wireless USB
controllers (WHCI spec).
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently we allow any merge, even if the io originates from different
processes. This can cause really bad starvation and unfairness, if those
ios happen to be synchronous (reads or direct writes).
So add a allow_merge hook to the io scheduler ops, so an io scheduler can
help decide whether a bio/process combination may be merged with an
existing request.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch set adds generic abstract layer support for acpi video driver to
have generic user interface to control backlight and output switch control by
leveraging the existing backlight sysfs class driver, and by adding a new
video output sysfs class driver.
This patch:
Add dev argument for backlight_device_register to link the class device to
real device object. The platform specific driver should find a way to get the
real device object for their video device.
[akpm@osdl.org: build fix]
[akpm@osdl.org: fix msi-laptop.c]
Signed-off-by: Luming Yu <Luming.yu@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
[PATCH] Generic HID layer - update MAINTAINERS
input/hid: Supporting more keys from the HUT Consumer Page
[PATCH] Generic HID layer - build: USB_HID should select HID
The blk_rq_unmap_user() API is not very nice. It expects the caller to
know that rq->bio has to be reset to the original bio, and it will
silently do nothing if that is not done. Instead make it explicit that
we need to pass in the first bio, by expecting a bio argument.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We have full flexibility of merging parameters now, so we can remove the
hooks that define back/front/request merge strategies. Nobody is using
them anymore.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
On architectures where the atomicity of the bit operations is handled by
external means (ie a separate spinlock to protect concurrent accesses),
just doing a direct assignment on the workqueue data field (as done by
commit 4594bf159f) can cause the
assignment to be lost due to lack of serialization with the bitops on
the same word.
So we need to serialize the assignment with the locks on those
architectures (notably older ARM chips, PA-RISC and sparc32).
So rather than using an "unsigned long", let's use "atomic_long_t",
which already has a safe assignment operation (atomic_long_set()) on
such architectures.
This requires that the atomic operations use the same atomicity locks as
the bit operations do, but that is largely the case anyway. Sparc32
will probably need fixing.
Architectures (including modern ARM with LL/SC) that implement sane
atomic operations for SMP won't see any of this matter.
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: David Miller <davem@davemloft.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nobody uses it, but it was still wrong. Using the macro argument name
'work' meant that when we used 'work' as a member name, that would also
get replaced by the macro argument.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed. We can put it back when
it's stable and isn't likely to make warning or bug events worse.
In the meantime, enable frame pointers for more readable stack traces.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On USB keyboards lots of hot/internet keys are not working. This patch
adds support for a number of keys from the USB HID Usage Table
(http://www.usb.org/developers/devclass_docs/Hut1_12.pdf).
It also adds several new key codes. Most of them are used on real world
keyboards I know. I added some others (KEY_+ EDITOR, GRAPHICSEDITOR, DATABASE,
NEWS, VOICEMAIL, VIDEOPHONE) to avoid "holes".
I also added KEY_ZOOMRESET as it is possible to have a inet keyboard and a
remote control in parallel and it makes sense to have them behave differently.
Signed-off-by: Florian Festi <ffesti@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Remove the deferred hooks and all related code as scheduled in
feature-removal-schedule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
platform_device_add_data() makes a copy of the data that is given to it,
and thus the parameter can be const. This removes a warning when data
from get_property() on powerpc is handed to platform_device_add_data(),
as get_property() returns a const pointer.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
To allow a more effective copy_user_highpage() on certain architectures,
a vma argument is added to the function and cow_user_page() allowing
the implementation of these functions to check for the VM_EXEC bit.
The main part of this patch was originally written by Ralf Baechle;
Atushi Nemoto did the the debugging.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem:
1. There is a process containing two thread (T1 and T2). The
thread T1 calls fork(). Then dup_mmap() function called on T1 context.
static inline int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
...
flush_cache_mm(current->mm);
... /* A */
(write-protect all Copy-On-Write pages)
... /* B */
flush_tlb_mm(current->mm);
...
2. When preemption happens between A and B (or on SMP kernel), the
thread T2 can run and modify data on COW pages without page fault
(modified data will stay in cache).
3. Some time after fork() completed, the thread T2 may cause a page
fault by write-protect on a COW page.
4. Then data of the COW page will be copied to newly allocated
physical page (copy_cow_page()). It reads data via kernel mapping.
The kernel mapping can have different 'color' with user space
mapping of the thread T2 (dcache aliasing). Therefore
copy_cow_page() will copy stale data. Then the modified data in
cache will be lost.
In order to allow architecture code to deal with this problem allow
architecture code to override copy_user_highpage() by defining
__HAVE_ARCH_COPY_USER_HIGHPAGE in <asm/page.h>.
The main part of this patch was originally written by Ralf Baechle;
Atushi Nemoto did the the debugging.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
hwmon: Add MAINTAINERS entry for new ams driver
hwmon: New AMS hardware monitoring driver
hwmon/w83793: Add documentation and maintainer
hwmon: New Winbond W83793 hardware monitoring driver
hwmon: Update Rudolf Marek's e-mail address
hwmon/f71805f: Fix the device address decoding
hwmon/f71805f: Always create all fan inputs
hwmon/f71805f: Add support for the Fintek F71872F/FG chip
hwmon: New PC87427 hardware monitoring driver
hwmon/it87: Remove the SMBus interface support
hwmon/hdaps: Update the list of supported devices
hwmon/hdaps: Move the DMI detection data to .data
hwmon/pc87360: Autodetect the VRM version
hwmon/f71805f: Document the fan control features
hwmon/f71805f: Add support for "speed mode" fan speed control
hwmon/f71805f: Support DC fan speed control mode
hwmon/f71805f: Let the user adjust the PWM base frequency
hwmon/f71805f: Add manual fan speed control
hwmon/f71805f: Store the fan control registers
Run this:
#!/bin/sh
for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
echo "De-casting $f..."
perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
done
And then go through and reinstate those cases where code is casting pointers
to non-pointers.
And then drop a few hunks which conflicted with outstanding work.
Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define an op descriptor struct, use it to simplify nfsd4_proc_compound().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tuck away the replay_owner in the cstate while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the saved and current filehandles together into all the nfsd4 compound
operations.
I want a unified interface to these operations so we can just call them by
pointer and throw out the huge switch statement.
Also I'll eventually want a structure like this--that holds the state used
during compound processing--for deferral.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A comment here incorrectly states that "slack_space" is measured in words, not
bytes. Remove the comment, and adjust a variable name and a few comments to
clarify the situation.
This is pure cleanup; there should be no change in functionality.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts the tracking of the user space watchdog process from using
a pid_t to use struct pid. This makes us safe from pid wrap around issues and
prepares the way for the pid namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
smbfs keeps track of the user space server process in conn_pid. This converts
that track to use a struct pid instead of pid_t. This keeps us safe from pid
wrap around issues and prepares the way for the pid namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently this driver tracks user space clients it should send signals to. In
the presenct of file descriptor passing this is appears susceptible to
confusion from pid wrap around issues.
Replacing this with a struct pid prevents us from getting confused, and
prepares for a pid namespace implementation.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Annotated, all places switched to keeping status net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we print an assert due to scheduling-in-atomic bugs, and if lockdep
is enabled, then the IRQ tracing information of lockdep can be printed
to pinpoint the code location that disabled interrupts. This saved me
quite a bit of debugging time in cases where the backtrace did not
identify the irq-disabling site well enough.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most distributions enable sysrq support but set it to 0 by default. Add a
sysrq_always_enabled boot option to always-enable sysrq keys. Useful for
debugging - without having to modify the disribution's config files (which
might not be possible if the kernel is on a live CD, etc.).
Also, while at it, clean up the sysrq interfaces.
[bunk@stusta.de: make sysrq_always_enabled_setup() static]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement block device specific .direct_IO method instead of going through
generic direct_io_worker for block device.
direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
file system, where it needs to perform block allocation, hole detection,
extents file on write, and tons of other corner cases. The end result is
that it takes tons of CPU time to submit an I/O.
For block device, the block allocation is much simpler and a tight triple
loop can be written to iterate each iovec and each page within the iovec in
order to construct/prepare bio structure and then subsequently submit it to
the block layer. This significantly speeds up O_D on block device.
[akpm@osdl.org: small speedup]
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add "relatime" (relative atime) support. Relative atime only updates the
atime if the previous atime is older than the mtime or ctime. Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.
A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txt
Signed-off-by: Valerie Henson <val_henson@linux.intel.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, to tell a task that it should go to the refrigerator, we set the
PF_FREEZE flag for it and send a fake signal to it. Unfortunately there
are two SMP-related problems with this approach. First, a task running on
another CPU may be updating its flags while the freezer attempts to set
PF_FREEZE for it and this may leave the task's flags in an inconsistent
state. Second, there is a potential race between freeze_process() and
refrigerator() in which freeze_process() running on one CPU is reading a
task's PF_FREEZE flag while refrigerator() running on another CPU has just
set PF_FROZEN for the same task and attempts to reset PF_FREEZE for it. If
the refrigerator wins the race, freeze_process() will state that PF_FREEZE
hasn't been set for the task and will set it unnecessarily, so the task
will go to the refrigerator once again after it's been thawed.
To solve first of these problems we need to stop using PF_FREEZE to tell
tasks that they should go to the refrigerator. Instead, we can introduce a
special TIF_*** flag and use it for this purpose, since it is allowed to
change the other tasks' TIF_*** flags and there are special calls for it.
To avoid the freeze_process()-refrigerator() race we can make
freeze_process() to always check the task's PF_FROZEN flag after it's read
its "freeze" flag. We should also make sure that refrigerator() will
always reset the task's "freeze" flag after it's set PF_FROZEN for it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When some objects are allocated by one CPU but freed by another CPU we can
consume lot of cycles doing divides in obj_to_index().
(Typical load on a dual processor machine where network interrupts are
handled by one particular CPU (allocating skbufs), and the other CPU is
running the application (consuming and freeing skbufs))
Here on one production server (dual-core AMD Opteron 285), I noticed this
divide took 1.20 % of CPU_CLK_UNHALTED events in kernel. But Opteron are
quite modern cpus and the divide is much more expensive on oldest
architectures :
On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
cycle for a multiply.
Doing some math, we can use a reciprocal multiplication instead of a divide.
If we want to compute V = (A / B) (A and B being u32 quantities)
we can instead use :
V = ((u64)A * RECIPROCAL(B)) >> 32 ;
where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
Note :
I wrote pure C code for clarity. gcc output for i386 is not optimal but
acceptable :
mull 0x14(%ebx)
mov %edx,%eax // part of the >> 32
xor %edx,%edx // useless
mov %eax,(%esp) // could be avoided
mov %edx,0x4(%esp) // useless
mov (%esp),%ebx
[akpm@osdl.org: small cleanups]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Elaborate the API for calling cpuset_zone_allowed(), so that users have to
explicitly choose between the two variants:
cpuset_zone_allowed_hardwall()
cpuset_zone_allowed_softwall()
Until now, whether or not you got the hardwall flavor depended solely on
whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
argument.
If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
version.
Unfortunately, this meant that users would end up with the softwall version
without thinking about it. Since only the softwall version might sleep,
this led to bugs with possible sleeping in interrupt context on more than
one occassion.
The hardwall version requires that the current tasks mems_allowed allows
the node of the specified zone (or that you're in interrupt or that
__GFP_THISNODE is set or that you're on a one cpuset system.)
The softwall version, depending on the gfp_mask, might allow a node if it
was allowed in the nearest enclusing cpuset marked mem_exclusive (which
requires taking the cpuset lock 'callback_mutex' to evaluate.)
This patch removes the cpuset_zone_allowed() call, and forces the caller to
explicitly choose between the hardwall and the softwall case.
If the caller wants the gfp_mask to determine this choice, they should (1)
be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
cpuset_zone_allowed_softwall() routine.
This adds another 100 or 200 bytes to the kernel text space, due to the few
lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
routines. It should save a few instructions executed for the calls that
turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
set (before the call) then check (within the call) the __GFP_HARDWALL flag.
For the most critical call, from get_page_from_freelist(), the same
instructions are executed as before -- the old cpuset_zone_allowed()
routine it used to call is the same code as the
cpuset_zone_allowed_softwall() routine that it calls now.
Not a perfect win, but seems worth it, to reduce this chance of hitting a
sleeping with irq off complaint again.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More cleanups for slab.h
1. Remove tabs from weird locations as suggested by Pekka
2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
as suggested by Pekka.
3. Uses static inline for the fallback defs as also suggested by Pekka.
4. Make kmem_ptr_valid take a const * argument.
5. Separate the NUMA fallback definitions from the kmalloc_track fallback
definitions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a response to an earlier discussion on linux-mm about splitting
slab.h components per allocator. Patch is against 2.6.19-git11. See
http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2
This patch cleans up the slab header definitions. We define the common
functions of slob and slab in slab.h and put the extra definitions needed
for slab's kmalloc implementations in <linux/slab_def.h>. In order to get
a greater set of common functions we add several empty functions to slob.c
and also rename slob's kmalloc to __kmalloc.
Slob does not need any special definitions since we introduce a fallback
case. If there is no need for a slab implementation to provide its own
kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
That is sufficient for SLOB.
Sort the function in slab.h according to their functionality. First the
functions operating on struct kmem_cache * then the kmalloc related
functions followed by special debug and fallback definitions.
Also redo a lot of comments.
Signed-off-by: Christoph Lameter <clameter@sgi.com>?
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fields of struct pipe_buf_operations have not a precise layout (ie not
optimized to fit cache lines nor reduce cache line ping pongs)
The bufs[] array is *large* and is placed near the beginning of the
structure, so all following fields have a large offset. This is
unfortunate because many archs have smaller instructions when using small
offsets relative to a base register. On x86 for example, 7 bits offsets
have smaller instruction lengths.
Moving bufs[] at the end of pipe_buf_operations permits all fields to have
small offsets, and reduce text size, and icache pressure.
# size vmlinux.pre vmlinux
text data bss dec hex filename
3268989 664356 492196 4425541 438745 vmlinux.pre
3268765 664356 492196 4425317 438665 vmlinux
So this patch reduces text size by 224 bytes on my x86_64 machine. Similar
results on ia32.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 373beb35cd.
No one is using this identifier yet. The purpose of this identifier is to
export nsproxy to user space which is wrong. nsproxy is an internal
implementation optimization, which should keep our fork times from getting
slower as we increase the number of global namespaces you don't have to
share.
Adding a global identifier like this is inappropriate because it makes
namespaces inherently non-recursive, greatly limiting what we can do with
them in the future.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- pipe/splice should use const pipe_buf_operations and file_operations
- struct pipe_inode_info has an unused field "start" : get rid of it.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
Fix inotify maintainers entry
Fix typo in new debug options.
Jon needs a new shift key.
fs: Convert kmalloc() + memset() to kzalloc() in fs/.
configfs.h: Remove dead macro definitions.
kconfig: Standardize "depends" -> "depends on" in Kconfig files
e100: replace kmalloc with kcalloc
um: replace kmalloc+memset with kzalloc
fix typo in net/ipv4/ip_fragment.c
include/linux/compiler.h: reject gcc 3 < gcc 3.2
Kconfig: fix spelling error in config KALLSYMS help text
Remove duplicate "have to" in comment
Fix small typo in drivers/serial/icom.c
Use consistent casing in help message
EXT{2,3,4}_FS: remove outdated part of the help text
Delete the __ATTR-related macro definitions since these are now
defined in include/linux/sysfs.h.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The kernel doesn't compile with GCC <3.2, do not allow it to succeed if GCC
3.0.x or 3.1.x are used.
Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Introduced in commit 7cc13edc13.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c: Fix OMAP clock prescaler to match the comment
i2c: Refactor a kfree in i2c-dev
i2c: Fix return value check in i2c-dev
i2c: Enable PEC on more i2c-i801 devices
i2c: Discard the i2c algo del_bus wrappers
i2c: New ARM Versatile/Realview bus driver
i2c: fix broken ds1337 initialization
i2c: i2c-i801 documentation update
i2c: Use the __ATTR macro where possible
i2c: Whitespace cleanups
i2c: Use put_user instead of copy_to_user where possible
i2c: New Atmel AT91 bus driver
i2c: Add support for nested i2c bus locking
i2c: Cleanups to the i2c-nforce2 bus driver
i2c: Add request/release_mem_region to i2c-ibm_iic bus driver
i2c: New Philips PNX bus driver
i2c: Delete the broken i2c-ite bus driver
i2c: Update the list of driver IDs
i2c: Fix documentation typos
This interface was useless as the LPC ISA-like interface is always
available, is faster, and is more reliable. This cuts the driver
size by some 20%.
This change is also required to later convert the it87 driver to a
platform driver, so that we can get rid of i2c-isa in a near future.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
seqlock_init() needs to use spin_lock_init() for dynamic locks, so that
lockdep is notified about the presence of a new lock.
(this is a fallout of the recent networking merge, which started using
the so-far unused seqlock_init() API.)
This fix solves the following lockdep-internal warning on current -git:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
__lock_acquire+0x10c/0x9f9
lock_acquire+0x56/0x72
_spin_lock+0x35/0x42
neigh_destroy+0x9d/0x12e
neigh_periodic_timer+0x10a/0x15c
run_timer_softirq+0x126/0x18e
__do_softirq+0x6b/0xe6
do_softirq+0x64/0xd2
ksoftirqd+0x82/0x138
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While working on bidi support at struct request level
I have found that blk_queue_activity_fn is actually never used.
The only user is in ide-probe.c with this code:
/* enable led activity for disk drives only */
if (drive->media == ide_disk && hwif->led_act)
blk_queue_activity_fn(q, hwif->led_act, drive);
And led_act is never initialized anywhere.
(Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18)
Unless it is all for future use off course.
(this patch is against linux-2.6-block.git as off 2006/12/4)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits)
[POWERPC] Generic BUG for powerpc
[PPC] Fix compile failure do to introduction of PHY_POLL
[POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
[POWERPC] Remove old dcr.S
[POWERPC] Fix SPU coredump code for max_fdset removal
[POWERPC] Fix irq routing on some 32-bit PowerMacs
[POWERPC] ps3: Add vuart support
[POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
[POWERPC] dont allow pSeries_probe to succeed without initialising MMU
[POWERPC] micro optimise pSeries_probe
[POWERPC] Add SPURR SPR to sysfs
[POWERPC] Add DSCR SPR to sysfs
[POWERPC] Fix 440SPe CPU table entry
[POWERPC] Add support for FP emulation for the e300c2 core
[POWERPC] of_device_register: propagate device_create_file return code
[POWERPC] Fix mmap of PCI resource with hack for X
[POWERPC] iSeries: head_64.o needs to depend on lparmap.s
[POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group
[POWERPC] Remove QE header files from lite5200.c
[POWERPC] of_platform_make_bus_id(): make `magic' int
...
That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch
* resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
* supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
* simplifies the present overflow problems in calculations
Current sending rate X and the cached value X_recv of the receiver-estimated
sending rate are both scaled by 64 (2^6) in order to
* cope with low sending rates (minimally 1 byte/second)
* allow upgrading to use a packets-per-second implementation of CCID 3
* avoid calculation errors due to integer arithmetic cut-off
The patch implements a revised strategy from
http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html
The only difference with regard to that strategy is that t_ipi is already
used in the calculation of the nofeedback timeout, which saves one division.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
We should not initialize rootfs before all the core initializers have
run. So do it as a separate stage just before starting the regular
driver initializers.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary. It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)
That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba:
"[PATCH] Fix linux banner utsname information").
This just restores "linux_banner" as a static string, which should fix
the version finding. And /proc/version simply uses a different string.
To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.
Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PHY_POLL is defined in <linux/phy.h> include it in <linux/fsl_devices.h> so
board code will have it defined.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Remove extraneous whitespace from various i2c headers and core files,
like space-before-tab and whitespace at end of line.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
This patch adds the 'level' field into the i2c_adapter structure, which is
used to represent the 'logical' level of nesting for the purposes of
lockdep. This field is then used in the i2c_transfer() function, to
acquire the per-adapter bus_lock with correct nesting level.
Signed-off-by: Jiri Kosina <jikos@jikos.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
New I2C bus driver for Philips ARM boards (Philips IP3204 I2C IP
block). This I2C controller can be found on (at least) PNX010x,
PNX52xx and PNX4008 Philips boards.
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The rest of the ITE8172 support was already removed from MIPS tree.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
* A chip driver ID was assigned to the Radeon, while it is an adapter
so it needs an i2c adapter ID.
* The SAA7191 is a video decoder, not encoder.
* The icspll driver is dead, and will never be ported to Linux 2.6.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits)
V4L/DVB 4949b: Fix container_of pointer retreival
V4L/DVB (4949a): Fix INIT_WORK
V4L/DVB (4949): Cxusb: codingstyle cleanups
V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach
V4L/DVB (4947): Cx88: trivial cleanups
V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area
V4L/DVB (4945): Cx88: consolidate cx22702_config structs
V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach
V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners
V4L/DVB (4953): Usbvision minor fixes
V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP
V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations
V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting
V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach
V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations
V4L/DVB (4942): Whitespace cleanups
V4L/DVB (4937): Usbvision cleanup and code reorganization
V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal
V4L/DVB (4935): Added the capability of selecting fm gain by tuner
V4L/DVB (4934): Usbvision radio requires GainNormal at e register
...
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mostly changing alignment. Just some general cleanup.
[akpm@osdl.org: build fix]
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a round_jiffies() function as well as a round_jiffies_relative()
function. These functions round a jiffies value to the next whole second.
The primary purpose of this rounding is to cause all "we don't care exactly
when" timers to happen at the same jiffy.
This avoids multiple timers firing within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep sleep CPU
sleep states and thus waste power.
The exact wakeup moment is skewed by the cpu number, to avoid all cpus from
waking up at the exact same time (and hitting the same lock/cachelines
there)
[akpm@osdl.org: fix variable type]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries. The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.
The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:
pagesize / 4
pagesize / 2
pagesize <- memory allocator switch point
pagesize * 2
pagesize * 4
...etc...
Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.
Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop. Even together, they will still be, by far, smaller
than the fdarray. The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An fdtable can either be embedded inside a files_struct or standalone (after
being expanded). When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.
Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded. We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).
In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.
Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a bypass-the-cache read fails, we simply try again through the cache. If
it fails again it will trigger normal recovery precedures.
update 1:
From: NeilBrown <neilb@suse.de>
1/
chunk_aligned_read and retry_aligned_read assume that
data_disks == raid_disks - 1
which is not true for raid6.
So when an aligned read request bypasses the cache, we can get the wrong data.
2/ The cloned bio is being used-after-free in raid5_align_endio
(to test BIO_UPTODATE).
3/ We forgot to add rdev->data_offset when submitting
a bio for aligned-read
4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
so we need to invalidate the segment counts.
5/ We don't de-reference the rdev when the read completes.
This means we need to record the rdev to so it is still
available in the end_io routine. Fortunately
bi_next in the original bio is unused at this point so
we can stuff it in there.
6/ We leak a cloned bio if the target rdev is not usable.
From: NeilBrown <neilb@suse.de>
update 2:
1/ When aligned requests fail (read error) they need to be retried
via the normal method (stripe cache). As we cannot be sure that
we can process a single read in one go (we may not be able to
allocate all the stripes needed) we store a bio-being-retried
and a list of bioes-that-still-need-to-be-retried.
When find a bio that needs to be retried, we should add it to
the list, not to single-bio...
2/ We were never incrementing 'scnt' when resubmitting failed
aligned requests.
[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ESB2 appears to emit spurious DMA interrupts when configured for native
mode and handling ATAPI devices. Stratus were able to pin this bug down and
produce a patch. This is a rework which applies the fixup only to the ESB2
(for now). We can apply it to other chips later if the same problem is found.
This code has been tested and confirmed to fix the problem on the tested
systems.
Signed-off-by: Alan Cox <alan@redhat.com>
(Most of the hard work done by Stratus however)
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove scheduler stats lb_stopbalance counter. This counter can be
calculated by: lb_balanced - lb_nobusyg - lb_nobusyq. There is no need to
create gazillion counters while we can derive the value.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently at a particular domain, each cpu in the sched group will do a
load balance at the frequency of balance_interval. More the cores and
threads, more the cpus will be in each sched group at SMP and NUMA domain.
And we endup spending quite a bit of time doing load balancing in those
domains.
Fix this by making only one cpu(first idle cpu or first cpu in the group if
all the cpus are busy) in the sched group do the load balance at that
particular sched domain and this load will slowly percolate down to the
other cpus with in that group(when they do load balancing at lower
domains).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Large sched domains can be very expensive to scan. Add an option SD_SERIALIZE
to the sched domain flags. If that flag is set then we make sure that no
other such domain is being balanced.
[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.
We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With SMT, if the logical processor is busy, load balance happens for every
8msec(min)-16msec(max). There is no need to do this often, as this is just
for fairness(to maintain uniform runqueue lengths) and default time slice
anyhow is 100msec.
Appended patch increases this interval to 64msec(min)-128msec(max) when the
logical processor is busy.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The present per-task IO accounting isn't very useful. It simply counts the
number of bytes passed into read() and write(). So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.
(David Wright had some comments on the applicability of the present logical IO accounting:
For billing purposes it is useless but for workload analysis it is very
useful
read_bytes/read_calls average read request size
write_bytes/write_calls average write request size
read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
write_bytes/write_blocks ie logical/physical guess since pdflush writes can
be missed
I often look for logical larger than physical to see filesystem cache
problems. And the bytes/cpusec can help find applications that are
dominating the cache and causing slow interactive response from page cache
contention.
I want to find the IO intensive applications and make sure they are doing
efficient IO. Thus the acctcms(sysV) or csacms command would give the high
IO commands).
This patchset adds new accounting which tries to be more accurate. We account
for three things:
reads:
attempt to count the number of bytes which this process really did cause
to be fetched from the storage layer. Done at the submit_bio() level, so it
is accurate for block-backed filesystems. I also attempt to wire up NFS and
CIFS.
writes:
attempt to count the number of bytes which this process caused to be sent
to the storage layer. This is done at page-dirtying time.
The big inaccuracy here is truncate. If a process writes 1MB to a file
and then deletes the file, it will in fact perform no writeout. But it will
have been accounted as having caused 1MB of write.
So...
cancelled_writes:
account the number of bytes which this process caused to not happen, by
truncating pagecache.
We _could_ just subtract this from the process's `write' accounting. But
that means that some processes would be reported to have done negative
amounts of write IO, which is silly.
So we just report the raw number and punt this decision up to userspace.
Now, we _could_ account for writes at the physical I/O level. But
- This would require that we track memory-dirtying tasks at the per-page
level (would require a new pointer in struct page).
- It would mean that IO statistics for a process are usually only available
long after that process has exitted. Which means that we probably cannot
communicate this info via taskstats.
This patch:
Wire up the kernel-private data structures and the accessor functions to
manipulate them.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are some kernel-only bits in the middle of <linux/futex.h> which
should be removed in what we export to userspace.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add rtc_merge_alarm(), which can be used by rtc drivers to turn a partially
specified alarm expiry (i.e. most significant fields set to -1, as with the
RTC_ALM_SET ioctl()) into a fully specified expiry.
If the most significant specified field is earlier than the current time, the
least significant unspecified field is incremented.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
freezer.h uses task_struct fields so it should include sched.h.
CC [M] fs/jfs/jfs_txnmgr.o
In file included from fs/jfs/jfs_txnmgr.c:49:
include/linux/freezer.h: In function 'frozen':
include/linux/freezer.h:9: error: dereferencing pointer to incomplete type
include/linux/freezer.h:9: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:9: error: (Each undeclared identifier is reported only once
include/linux/freezer.h:9: error: for each function it appears in.)
include/linux/freezer.h: In function 'freezing':
include/linux/freezer.h:17: error: dereferencing pointer to incomplete type
include/linux/freezer.h:17: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'freeze':
include/linux/freezer.h:26: error: dereferencing pointer to incomplete type
include/linux/freezer.h:26: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'do_not_freeze':
include/linux/freezer.h:34: error: dereferencing pointer to incomplete type
include/linux/freezer.h:34: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h: In function 'thaw_process':
include/linux/freezer.h:43: error: dereferencing pointer to incomplete type
include/linux/freezer.h:43: error: 'PF_FROZEN' undeclared (first use in this function)
include/linux/freezer.h:44: warning: implicit declaration of function 'wake_up_process'
include/linux/freezer.h: In function 'frozen_process':
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: dereferencing pointer to incomplete type
include/linux/freezer.h:55: error: 'PF_FREEZE' undeclared (first use in this function)
include/linux/freezer.h:55: error: 'PF_FROZEN' undeclared (first use in this function)
fs/jfs/jfs_txnmgr.c: In function 'freezing':
include/linux/freezer.h:18: warning: control reaches end of non-void function
make[2]: *** [fs/jfs/jfs_txnmgr.o] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a V4L2 driver for the OmniVision OV7670 camera.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
A driver for the Marvell M88ALP01 "CAFE" CMOS integrated camera
controller. This driver has been renamed "cafe_ccic" since my previous
patch set.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Two defines for V4L2, needed by the Cafe camera driver:
1) Add the RGB444 image format
2) Add the "init" internal command which is separate from "reset".
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
GLIBC uses them etc.
They are guarded by ifndef __KERNEL__ so nobody will start
accidently using them in the kernel again, it's just for
userspace.
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRMGRP_REPORT uses 0x10 which is a group that belongs
to events. The correct value is 0x20.
We should really be using xfrm_nlgroups going forward; it was tempting
to delete the definition of XFRMGRP_REPORT but it would break at
least iproute2.
Signed-off-by: J Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
hh_lock was converted from rwlock to seqlock by Stephen.
To have a 100% benefit of this change, I suggest to place read mostly fields
of hh_cache in a separate cache line, because hh_refcnt may be changed quite
frequently on some busy machines.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restore API compatibility due to bits moved from rtnetlink.h to
separate headers.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>