This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.
Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_multiport and ip6t_multiport to xt_multiport.
As a result, this addes support for inversion and port range match
to IPv6 packets.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now
a user program needs to specify IPPROTO_ESP as protocol to use esp match
with IPv6. This means that ip6tables requires '-p esp' like iptables.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the *_decap_state structures which were previously
used to share state between input/post_input. This is no longer
needed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the decap_state argument from the xfrm input hook.
Previously this function allowed the input hook to share state with
the post_input hook. The latter has since been removed.
The only purpose for it now is to check the encap type. However, it
is easier and better to move the encap type check to the generic
xfrm_rcv function. This allows us to get rid of the decap state
argument altogether.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3424/2: ixp23xx: fix uncompress.h for recent CRLF decompressor change
[ARM] 3434/1: pxa i2s amsl define
[ARM] 3425/1: xsc3: need to include pgtable-hwdef.h
[ARM] Allow un-muxed syscalls to be available for everyone
[ARM] 3420/1: Missing clobber in example code
[ARM] nommu: fixups for the exception vectors
[ARM] nommu: add nommu specific Kconfig and MMUEXT variable in Makefile
[ARM] nommu: start-up code
[ARM] nommu: MPU support in boot/compressed/head.S
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Avoid "u64 foo : 32;" for gcc3 vs. gcc4 compatibility
[IA64] Export cpu cache info by sysfs
I was grepping through the code and some `grep ganularity -R .` didn't
catch what I thought. Then looking closer I saw the term "granuality"
used in only four places (in comments) and granularity in many more
places describing the same idea. Some other facts:
dictionary.com does not know such a word
define:granuality on google is not found (and pages for granuality are
mostly related to patches to the kernel)
it has not been discussed as a term on LKML, AFAICS (=Can Search)
To be consistent, I think granularity should be used everywhere.
Signed-off-by: Kalin KOZHUHAROV <kalin@thinrope.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Turn some macros into inline functions and add proper type checking as
well as being more readable. Also a minor comment adjustment.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As announced, lookup_hash() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The monochrome->color expansion routine that handles bitmaps which have
(widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian.
This is caused by a bogus bit test in slow_imageblit().
Fix.
This patch may deserve to go to the stable tree. The code has already been
well tested in little-endian machines. It's only in big-endian where there is
uncertainty and Herbert confirmed that this is the correct way to go.
It should not introduce regressions.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Generalise the Corgi backlight driver by moving the default intensity and
limit mask settings into the platform specific data structure. This enables
the driver to support other Zaurus hardware, specifically the SL-6000x (Tosa)
model.
Also change the spinlock to a mutex (the spinlock is overkill).
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Backlight class attributes are currently easy to implement incorrectly.
Moving certain handling into the backlight core prevents this whilst at the
same time makes the drivers simpler and consistent. The following changes are
included:
The brightness attribute only sets and reads the brightness variable in the
backlight_properties structure.
The power attribute only sets and reads the power variable in the
backlight_properties structure.
Any framebuffer blanking events change a variable fb_blank in the
backlight_properties structure.
The backlight driver has only two functions to implement. One function is
called when any of the above properties change (to update the backlight
brightness), the second is called to return the current backlight brightness
value. A new attribute "actual_brightness" is added to return this brightness
as determined by the driver having combined all the above factors (and any
driver/device specific factors).
Additionally, the backlight core takes care of checking the maximum brightness
is not exceeded and of turning off the backlight before device removal.
The corgi backlight driver is updated to reflect these changes.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is very common to hash a dentry and then to call lookup. If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.
In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.
For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.
If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.
This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.
The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.
Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count. I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case. Unfortunately this means you must special case the
rcu access case.
When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e. tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.
What that means is that the proc code that looks up a task and later
wants to sleep can now do:
rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
get_task_struct(task);
}
rcu_read_unlock();
The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.
Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called. Tasks can only
be looked up via rcu before release_task is called. All rcu protected
members of task_struct are freed by release_task.
Therefore we can move call_rcu from put_task_struct into release_task. And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.
The end result:
- get_task_struct is safe in an rcu context where we have just looked
up the task.
- put_task_struct() simplifies into its old pre rcu self.
This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just got nuked in mainline. Bring it back because Eric's patches use it.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To increase the strength of SCHED_BATCH as a scheduling hint we can
activate batch tasks on the expired array since by definition they are
latency insensitive tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The activated flag in task_struct is used to track different sleep types and
its usage is somewhat obfuscated. Convert the variable to an enum with more
descriptive names without altering the function.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.
Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.
In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.
Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.
On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The removal of the data field in the hrtimer structure enforces the
embedding of the timer into another data structure. nanosleep now uses a
private implementation of the most common used timer callback function
(simple task wakeup).
In order to avoid the reimplentation of such functionality all over the
place a generic hrtimer_sleeper functionality is created.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an LED trigger for IDE disk activity to the ide-disk driver.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for LED triggers to the LED subsystem. "Triggers" are events
which change the state of an LED. Two kinds of trigger are available, simple
ones which can be added to exising code with minimum disruption and complex
ones for implementing new or more complex functionality.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the foundations of a new LEDs subsystem. This patch adds a class which
presents LED devices within sysfs and allows their brightness to be
controlled.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add TIOCL_GETKMSGREDIRECT needed by the userland suspend tool to get the
current value of kmsg_redirect from the kernel so that it can save it and
restore it after resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
local_t's were defined to be unsigned. This increases confusion because
atomic_t's are signed. The patch goes through and changes all implementations
to use signed longs throughout.
Also, x86-64 was using 32-bit quantities for the value passed into local_add()
and local_sub(). Fixed.
All (actually, both) existing users have been audited.
(Also s/__inline__/inline/ in x86_64/local.h)
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Domsch noticed a startup race with the IPMI kernel thread, it was
possible (though extraordinarly unlikely) that a message could come in
before the upper layer was ready to handle it. This patch splits the
startup processing of an IPMI interface into two parts, one to get ready
and one to actually start the processes to receive messages from the
interface.
[akpm@osdl.org: cleanups]
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make baby-simple the code for /proc/devices. Based on the proven design
for /proc/interrupts.
This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:
# dd if=/proc/devices bs=1
Character devices:
1 mem
27+0 records in
27+0 records out
This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.
[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit a4a6198b80:
[PATCH] tvec_bases too large for per-cpu data
introduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
compile time. This means we can kill __init_timer_base and move
timer_base_s's content into tvec_t_base_s.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If running on a host not supporting TLS (for instance 2.4) we should report
that cleanly to the user, instead of printing not comprehensible "error 5" for
that.
Additionally, i386 and x86_64 support different ranges for
user_desc->entry_number, and we must account for that; we couldn't pass
ourselves -1 because we need to override previously existing TLS descriptors
which glibc has possibly set, so test at startup the range to use.
x86 and x86_64 existing ranges are hardcoded.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement sys_[gs]et_thread_area and the corresponding ptrace operations for
UML. This is the main chunk, additional parts follow. This implementation is
now well tested and has run reliably for some time, and we've understood all
the previously existing problems.
Their implementation saves the new GDT content and then forwards the call to
the host when appropriate, i.e. immediately when the target process is
running or on context switch otherwise (i.e. on fork and on ptrace() calls).
In SKAS mode, we must switch registers on each context switch (because SKAS
does not switches tls_array together with current->mm).
Also, added get_cpu() locking; this has been done for SKAS mode, since TT does
not need it (it does not use smp_processor_id()).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ldt-{i386,x86_64}.h is made of two different parts - some code for parsing of
LDT descriptors, which is arch-dependant, and the code to handle uml_ldt_t (an
LDT block inside UML), which is mostly arch-independant (among x86 and x86_64,
at least).
Join the common part in a single file (ldt.h) and split the rest away
(host_ldt-{i386,x86_64}.h).
This is needed because processor.h, with next patches, will start including
the LDT descriptor parsing macros in host_ldt.h, but it can't include ldt.h
because it uses semaphores (and to define semaphores one must first include
processor.h!).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
misc sparse annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Porting the patch I posted for x86_64 to i386.
http://marc.theaimsgroup.com/?l=linux-kernel&m=114178139610707&w=2
o While using kdump, after a system crash when second kernel boots, timer
vector gets (0x31) locked and CPU does not see timer interrupts
travelling from IOAPIC to APIC. Currently it does not lead to boot
failure in second kernel as timer interrupts continues to come as ExtInt
through LAPIC directly, but fixing it is good in case some boards do not
support the other mode.
o After a system crash, it is not safe to service interrupts any more,
hence interrupts are disabled. This leads to pending interrupts at
LAPIC. LAPIC sends these interrupts to the CPU during early boot of
second kernel. Other pending interrupts are discarded saying unexpected
trap but timer interrupt is serviced and CPU does not issue an LAPIC EOI
because it think this interrupt came from i8259 and sends ack to 8259.
This leads to vector 0x31 locking as LAPIC does not clear respective ISR
and keeps on waiting for EOI.
o This patch issues extra EOI for the pending interrupts who have ISR set.
o Though today only timer seems to be the special case because in early
boot it thinks interrupts are coming from i8259 and uses
mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
probably doing it in generic manner for all vectors makes sense.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's been disabled since v2.1.88
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_trylock_page() is an odd interface in that it doesn't take a reference
like the others. Now that XFS no longer uses it, and its last remaining
caller actually wants an elevated refcount, opencode that callsite and
schedule find_trylock_page() for removal.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- chips/sharp.c: make two needlessly global functions static
- move some declarations to a header file where they belong to
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Apparently the pccard_iodyn_ops declaration has been forgotten, which
results in a compilation error for m8xx_pcmcia.c
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
gcc3 thinks that a 32-bit field of a u64 type is itself a u64, so
should be printed with "%ld". gcc4 thinks it needs just "%d".
Make both versions happy by avoiding this construct.
Signed-off-by: Tony Luck <tony.luck@intel.com>
- Fix possible race of referring the setup hook from the running PCM
- Fix memory leak in an error path of proc write
- Clean up the setup hook parser
Signed-off-by: Takashi Iwai <tiwai@suse.de>
- Clean up initialization and destruction of substream instance
Now snd_pcm_open_substream() alone does most initialization jobs.
Add pcm_release callback for cleaning up at snd_pcm_release_substream()
- Tidy up PCM oss code
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This ugly hack add support for Siemens MC45 PCMCIA GPRS card (which is
identical to Possio GCC, and which is offered by one of our local GPRS
providers). Card has unfortunate feature that after poweron oxcf950 chip
is fully powered and works, but attached MC45 modem is powered down :-(
There is a special sequence (which takes 1 sec :-( ) to poweron MC45 (and
after MC45 powers on, it takes more than 2 secs until firmware fully
boots...) which needs to be executed after all powerons.
I'm really not familiar with PCMCIA subsystem, so I have no idea whether I
should issue request_region() on rest of oxcf950 address range (0-7 is
UART, 8-F are special configuration registers), or how this should be
better integrated with PM system and so on - I just put it in same place
where another hack already lived...
Card uses 18.432MHz XTAL, so to get it to work you must add lines below to
the /etc/pcmcia/serial.opts.
case "$MANFID-$FUNCID-$PRODID_1-$PRODID_2-$PRODID_3-$PRODID_4" in
'030c,0003-2-GPRS-CARD--')
SERIAL_OPTS="baud_base 1152000"
;;
esac
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Instead of the two status values struct pcmcia_device->p_state and state,
use descriptive bitfields. Most value-checking in drivers was invalid, as
the core now only calls the ->remove() (a.k.a. detach) function in case the
attachement _and_ configuration was successful.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Remove the unused DEV_RELEASE_PENDING flag, and move the DEV_SUSPEND flag
into the p_dev structure, and make use of it at the core level.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
dev_link_t * and client_handle_t both mean struct pcmcai_device * by now.
Therefore, remove all such indirections.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Embed dev_link_t into struct pcmcia_device(), as they basically address the
same entity. The actual contents of dev_link_t will be cleaned up step by step.
This patch includes a bugfix from and signed-off-by Andrew Morton.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Rename pcmcia_device.state (which is used in very few places) to p_state
in order to avoid a namespace collision when moving the deprecated
dev_link_t into struct pcmcia_device
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
As we do not allow setting Vcc in the pcmcia core, and Vpp1 and
Vpp2 can only be set to the same value, a lot of code can be
streamlined.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Handle the _modifying_ operation sm91c92_cs requires in
pcmcia_modify_configuration, so that the only remaining users
of pcmcia_release_configuration() are within the pcmcia core
module.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
In all but one case, the suspend and resume functions of PCMCIA drivers
contain mostly of calls to pcmcia_release_configuration() and
pcmcia_request_configuration(). Therefore, move this code out of the
drivers and into the core.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Convert the remaining drivers which use pcmcia_release_io or
pcmcia_release_irq, and remove the EXPORT of these symbols.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
pcmcia_disable_device(struct pcmcia_device *p_dev) performs the necessary
cleanups upon device or driver removal: it calls the appropriate
pcmcia_release_* functions, and can replace (most) of the current drivers'
_release() functions.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This patch adds support for the Compact Flash controller integrated in
the Atmel AT91RM9200 processor.
Signed-off-by: Andrew Victor <andrew@sanpeople.com
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
BasePort, NumPorts and Attributes are or can be embedded in
struct resource, so remove them.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
If the kernel is configured to not include the deprecated PCMCIA ioctl,
some code doesn't need to be built.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Remove the compatibility wrappers, as they can (now) also be implemented
using macros. Please continue using these wrappers instead of new functions
until a new API has stabilized.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Use mutexes in the PCMICA core, as they suffice for what needs to be done.
Includes a bugfix from and Signed-off-by Andrew Morton.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Access the PCMCIA config_t struct (one per device function) using
a pointer in struct pcmcia_device, instead of looking them up in
an array.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Previously we added NET_IP_ALIGN so an architecture can override the
padding done to align headers. The next step is to allow the skb
headroom to be overridden.
We currently always reserve 16 bytes to grow into, meaning all DMAs
start 16 bytes into a cacheline. On ppc64 we really want DMA writes to
start on a cacheline boundary, so we increase that headroom to one
cacheline.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (24 commits)
[PARISC] Fix double free when removing HIL drivers
[PARISC] Add atomic_sub_and_test
[PARISC] Enabled some NLS modules in a500, b180 and c3000 defconfigs
[PARISC] Kill duplicated EXPORT_SYMBOL warnings
[PARISC] Move ioremap EXPORT_SYMBOL from parisc_ksyms.c
[PARISC] Make local_t use atomic_long_t
[PARISC] Update defconfigs
[PARISC] Add PREEMPT support
[PARISC] More useful readwrite lock helpers
[PARISC] Convert HIL drivers to use input_allocate_device
[PARISC] Fixup CONFIG_EISA a bit
[PARISC] getsockopt should be ENTRY_COMP
[PARISC] Remove obsolete CONFIG_DEBUG_IOREMAP
[PARISC] Temporary FIXME for ioremapping EISA regions
[PARISC] Enable ioremap functionality unconditionally
[PARISC] Fix stifb with IOREMAP and a 64-bit kernel
[PARISC] Add CONFIG_HPPA_IOREMAP to conditionally enable ioremap
[PARISC] Add STRICT_MM_TYPECHECKS
[PARISC] Fix IOREMAP with a 64-bit kernel
[PARISC] Add parisc implementation of flush_kernel_dcache_page()
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: RMPP support for additional classes
IB/mad: include GID/class when matching receives
IB/mthca: Fix section mismatch problems
IPoIB: Fix oops with raw sockets
IB/mthca: Fix check of size in SRQ creation
IB/srp: Fix unmapping of fake scatterlist
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] sata_mv: three bug fixes
[PATCH] libata: ata_dev_init_params() fixes
[PATCH] libata: Fix interesting use of "extern" and also some bracketing
[PATCH] libata: Simplex and other mode filtering logic
[PATCH] libata - ATA is both ATA and CFA
[PATCH] libata: Add ->set_mode hook for odd drivers
[PATCH] libata: BMDMA handling updates
[PATCH] libata: kill trailing whitespace
[PATCH] libata: add FIXME above ata_dev_xfermask()
[PATCH] libata: cosmetic changes in ata_bus_softreset()
[PATCH] libata: kill E.D.D.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ioremap() should prefer WB over UC
[IA64] Add __mca_table to the DISCARD list in gate.lds
[IA64] Move __mca_table out of the __init section
[IA64] simplify some condition checks in iosapic_check_gsi_range
[IA64] correct some messages and fixes some minor things
[IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode()
[IA64-SGI] sn_hwperf use of num_online_cpus()
[IA64] optimize flush_tlb_range on large numa box
[IA64] lazy_mmu_prot_update needs to be aware of huge pages
This enables the caller to migrate pages from one address space page
cache to another. In buzz word marketing, you can do zero-copy file
copies!
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix up some ISA/EISA stuff.
(Note: isa_ accessors have been removed from asm/io.h)
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Remove CONFIG_DEBUG_IOREMAP, it's now obsolete and won't work anyway.
Remove it from lib/KConfig since it was only available on parisc.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Enable CONFIG_HPPA_IOREMAP by default and remove all now unnecessary code.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Instead of making it a #define in asm/io.h, allow user to select
to turn on IOREMAP from the config menu.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
We need to do a little renaming of our original syntax because
of the difference in arguments.
Signed-off-by: James Bottomley <jejb@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
This should now allow SG_IO and fuse to function correctly on our
platform.
Signed-off-by: James Bottomley <jejb@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Add __mca_table to the DISCARD list for the gate.lds linker script to
avoid broken linker references when linking the final vmlinux file.
Also add comment to include/asm-ia64/asmmacros.h to avoid anyone else
hitting this problem in the future.
Credits to James Bottomley <James.Bottomley@SteelEye.com> for spotting
the DISCARD list in gate.lds.S
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add RMPP support for additional management classes that support it.
Also, validate RMPP is consistent with management class specified.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Patch from Lennert Buytenhek
Adapt ixp23xx uncompress.h to a081568d70.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Marc-Andre Hebert
The error concerns a bit mask define for the AMSL bit of the SACR1 register in the 2.6 kernel tree. The AMSL is bit 0 and it was defined as so in the 2.4 kernel tree but it is inccorrectly set as bit 1 (a reserved bit) in the 2.6 kernel tree.
Signed-off-by: Marc-Andre Hebert <marcandreh@humanware.ca>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It's been a while since the un-muxed socket and ipc syscalls were
introduced, so make the unistd.h number definitions visible for
non-EABI as well as EABI.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add a field to the host_set called 'flags' (was host_set_flags changed
to suit Jeff)
Add a simplex_claimed field so we can remember who owns the DMA channel
Add a ->mode_filter() hook to allow drivers to filter modes
Add docs for mode_filter and set_mode
Filter according to simplex state
Filter cable in core
This provides the needed framework to support all the mode rules found
in the PATA world. The simplex filter deals with 'to spec' simplex DMA
systems found in older chips. The cable filter avoids duplicating the
same rules in each chip driver with PATA. Finally the mode filter is
neccessary because drive/chip combinations have errata that forbid
certain modes with some drives or types of ATA object.
Drive speed setup remains per channel for now and the filters now use
the framework Tejun put into place which cleans them up a lot from the
older libata-pata patches.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some hardware doesn't want the usual mode setup logic running. This
allows the hardware driver to replace it for special cases in the least
invasive way possible.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is the minimal patch set to enable the current code to be used with
a controller following SFF (ie any PATA and early SATA controllers)
safely without crashes if there is no BMDMA area or if BMDMA is not
assigned by the BIOS for some reason.
Simplex status is recorded but not acted upon in this change, this isn't
a problem with the current drivers as none of them are for simplex
hardware. A following diff will deal with that.
The flags in the probe structure remain ->host_set_flags although Jeff
asked me to rename them, simply because the rename would break the usual
Linux rules that old code should break when there are changes. not
compile and run and then blow up/eat your computer/etc. Renaming this
later is a trivial exercise once a better name is chosen.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On a allyesconfig'ured kernel:
Size Uses Wasted Name and definition
===== ==== ====== ================================================
95 162 12075 netif_wake_queue include/linux/netdevice.h
129 86 9265 dev_kfree_skb_any include/linux/netdevice.h
127 56 5885 netif_device_attach include/linux/netdevice.h
73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h
46 60 1534 netif_device_detach include/linux/netdevice.h
119 16 1485 __netif_rx_schedule include/linux/netdevice.h
143 5 492 netif_rx_schedule include/linux/netdevice.h
81 7 366 netif_schedule include/linux/netdevice.h
netif_wake_queue is big because __netif_schedule is a big inline:
static inline void __netif_schedule(struct net_device *dev)
{
if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
unsigned long flags;
struct softnet_data *sd;
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
dev->next_sched = sd->output_queue;
sd->output_queue = dev;
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_restore(flags);
}
}
static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
__netif_schedule(dev);
}
By de-inlining __netif_schedule we are saving a lot of text
at each callsite of netif_wake_queue and netif_schedule.
__netif_rx_schedule is also big, and it makes more sense to keep
both of them out of line.
Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
instead... oh well.
netif_device_attach/detach are not hot paths, we can deinline them too.
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits)
[PATCH] powerpc: Remove oprofile spinlock backtrace code
[PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus
[PATCH] powerpc: Add oprofile calltrace support
[PATCH] for_each_possible_cpu: ppc
[PATCH] for_each_possible_cpu: powerpc
[PATCH] lock PTE before updating it in 440/BookE page fault handler
[PATCH] powerpc: Kill _machine and hard-coded platform numbers
ppc: Fix compile error in arch/ppc/lib/strcase.c
[PATCH] git-powerpc: WARN was a dumb idea
[PATCH] powerpc: a couple of trivial compile warning fixes
powerpc: remove OCP references
powerpc: Make uImage default build output for MPC8540 ADS
powerpc: move math-emu over to arch/powerpc
powerpc: use memparse() for mem= command line parsing
ppc: fix strncasecmp prototype
[PATCH] powerpc: make ISA floppies work again
[PATCH] powerpc: Fix some initcall return values
[PATCH] powerpc: Workaround for pSeries RTAS bug
[PATCH] spufs: fix __init/__exit annotations
[PATCH] powerpc: add hvc backend for rtas
...
Remove oprofile spinlock backtrace code now we have proper calltrace
support. Also make MMCRA sihv and sipr bits a variable since they may
change in future cpus. Finally, MMCRA should be a 64bit quantity.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add oprofile calltrace support to powerpc. Disable spinlock backtracing
now we can use calltrace info.
(Updated to work on both 32bit and 64bit by me).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix 44x and BookE page fault handler to correctly lock PTE before
trying to pte_update() it, otherwise this PTE might be swapped out
after pte_present() check but before pte_uptdate() call, resulting in
corrupted PTE. This can happen with enabled preemption and low memory
condition.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal(). This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit. It is also a
preparation for the further tref/pids rework.
This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.
We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.
Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__exit_signal() is private to release_task() now. I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much more simple and straightforward.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to
copy_sighand().
This matches copy_signal/cleanup_signal naming, and I think it is easier to
follow.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__exit_signal() does important cleanups atomically under ->siglock. It is
also called from copy_process's error path. This is not good, for example we
can't move __unhash_process() under ->siglock for that reason.
We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
'bad_fork_cleanup_sighand:' label. For copy_process() case it is sufficient
to just backout copy_signal(), nothing more.
Again, nobody can see this task yet. For CLONE_THREAD case we just decrement
signal->count, otherwise nobody can see this ->signal and we can free it
lockless.
This patch assumes it is safe to do exit_thread_group_keys() without
tasklist_lock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only caller of exit_sighand(tsk) is copy_process's error path. We can
call __exit_sighand() directly and kill exit_sighand().
This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
external references, nobody can see it, and
IF (clone_flags & CLONE_SIGHAND)
At least 'current' has a reference to ->sighand, this
means atomic_dec_and_test(sighand->count) can't be true.
ELSE
Nobody can see this ->sighand, this means we can free it
without any locking.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add lock_task_sighand() helper and converts group_send_sig_info() to use
it. Hopefully we will have more users soon.
This patch also removes '!sighand->count' and '!p->usage' checks, I think
they both are bogus, racy and unneeded (but probably it makes sense to
restore them as BUG_ON()s).
->sighand is cleared and it's ->count is decremented in release_task() with
sighand->siglock held, so it is a bug to have '!p->usage || !->count' after
we already locked and verified it is the same. On the other hand, an
already dead task without ->sighand can have a non-zero ->usage due to
ptrace, for example.
If we read the stale value of ->sighand we must see the change after
spin_lock(), because that change was done while holding that same old
->sighand.siglock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch borrows a clever Hugh's 'struct anon_vma' trick.
Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.
But this means we don't need to defer 'kmem_cache_free(sighand)'. We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.
To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
daemonize() calls set_special_pids(1,1), while init and kernel threads spawned
from init/main.c:init() run with 0,0 special pids. This patch changes
INIT_SIGNALS() so that that they run with ->pgrp == ->session == 1 also. This
patch relies on fact that swapper's pid == 1.
Now we have no hashed zero pids in pid_hash[].
User-space visibible change is that now /sbin/init runs with (1,1) special
pids and becomes a session leader.
Quoting Eric W. Biederman:
>
> daemonize consuming pids (1,1) then consumes pgrp 1. So that when
> /sbin/init calls setsid() it thinks /sbin/init is a process group
> leader and setsid() fails. So /sbin/init wants pgrp 1 session 1
> but doesn't get it. I am pretty certain daemonize did not exist so
> /sbin/init got pgrp 1 session 1 in 2.4.
>
> That is the bug that is being fixed.
>
> This patch takes things one step farther and essentially calls
> setsid() for pid == 1 before init is execed. That is new behavior
> but it cleans up the kernel as we now do not need to support the
> case of a process without a process group or a session.
>
> The only process that could have possibly cared was /sbin/init
> and it already calls setsid() because it doesn't want that.
>
> If this was going to break anything noticeable the change in behavior
> from 2.4 to 2.6 would have already done that.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fork_idle() does unhash_process() just after copy_process(). Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.
copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process(). We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].
We don't need to hash pid == 0 in pidmap_init(). free_pidmap() is never
called with pid == 0 arg, so it will never be reused. So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.
However with this patch we don't hash pid == 0 for PIDTYPE_PID case. We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().
This patch kills theese macros, they mix two different things: setting
process's parent and registering it in init_task.tasks list. Callers are
updated to do these actions by hand.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently. This patch removes this argument.
No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.
Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups. The leader is already in
the EXIT_DEAD state so no one cares about it's pids. The only concern for
the leader is that __unhash_process called from release_task will function
correctly. If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.
For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.
Currently de_thread is also adding the task to the task list which is just
silly.
Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.
So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.
The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kill_sl function doesn't exist in the kernel so a prototype is completely
unnecessary.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Basically this patch moves the generic tunnel protocol stuff out of
xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c
and tunnel6 respectively.
The reason for this is that the problem that Hugo uncovered is only
the tip of the iceberg. The real problem is that when we removed the
dependency of ipip on xfrm4_tunnel we didn't really consider the module
case at all.
For instance, as it is it's possible to build both ipip and xfrm4_tunnel
as modules and if the latter is loaded then ipip simply won't load.
After considering the alternatives I've decided that the best way out of
this is to restore the dependency of ipip on the non-xfrm-specific part
of xfrm4_tunnel. This is acceptable IMHO because the intention of the
removal was really to be able to use ipip without the xfrm subsystem.
This is still preserved by this patch.
So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
the arbitration between the two. The order of processing is determined
by a simple integer which ensures that ipip gets processed before
xfrm4_tunnel.
The situation for ICMP handling is a little bit more complicated since
we may not have enough information to determine who it's for. It's not
a big deal at the moment since the xfrm ICMP handlers are basically
no-ops. In future we can deal with this when we look at ICMP caching
in general.
The user-visible change to this is the removal of the TUNNEL Kconfig
prompts. This makes sense because it can only be used through IPCOMP
as it stands.
The addition of the new modules shouldn't introduce any problems since
module dependency will cause them to be loaded.
Oh and I also turned some unnecessary pskb's in IPv6 related to this
patch to skb's.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] Provide Cirrus EP93xx AMBA PL010 serial support.
[SERIAL] amba-pl010: allow platforms to specify modem control method
[SERIAL] Remove obsoleted au1x00_uart driver
[SERIAL] Small time UART configuration fix for AU1100 processor
Patch from Lennert Buytenhek
This patch adds support for the Intel ixp23xx series of CPUs. The
ixp23xx is an XSC3 based CPU with 512K of L2 cache, a 64bit 66MHz PCI
interface, two DDR RAM interfaces, QDR RAM interfaces, two gigabit
MACs, two 10/100 MACs, expansion bus, four microengines, a Media and
Switch Fabric unit almost identical to the one on the ixp2400, two
xscale (8250ish) UARTs and a bunch of other stuff.
This patch adds the core ixp23xx support code, and support for the
ADI Engineering Roadrunner, Intel IXDP2351, and IP Fabrics Double
Espresso platforms.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Add support for the LogicPD PXA270 Card Engine.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
This patch adds support for the new XScale v3 core. This is an
ARMv5 ISA core with the following additions:
- L2 cache
- I/O coherency support (on select chipsets)
- Low-Locality Reference cache attributes (replaces mini-cache)
- Supersections (v6 compatible)
- 36-bit addressing (v6 compatible)
- Single instruction cache line clean/invalidate
- LRU cache replacement (vs round-robin)
I attempted to merge the XSC3 support into proc-xscale.S, but XSC3
cores have separate errata and have to handle things like L2, so it
is simpler to keep it separate.
L2 cache support is currently a build option because the L2 enable
bit must be set before we enable the MMU and there is no easy way to
capture command line parameters at this point.
There are still optimizations that can be done such as using LLR for
copypage (in theory using the exisiting mini-cache code) but those
can be addressed down the road.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block:
[BLOCK] cfq-iosched: seek and async performance fixes
[PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
[PATCH] cfq-iosched: small cfq_choose_req() optimization
[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all occurences of 0xff.. in calls to function pci_set_dma_mask()
and pci_set_consistant_dma_mask() with the corresponding DMA_xBIT_MASK from
linux/dma-mapping.h.
Signed-off-by: Matthias Gehre <M.Gehre@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nowadays, even Debian stable ships a microcode_ctl utility recent enough to no
longer use this ioctl.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Tigran Aivazian <tigran_aivazian@symantec.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups
The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
for_each_cpu() is a for-loop over cpu_possible_map. for_each_online_cpu is
for-loop cpu over cpu_online_map. .....for_each_cpu() is not sufficiently
explicit and can lead to mistakes.
This patch adds for_each_possible_cpu() in preparation for the removal of
for_each_cpu().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize select and poll by a using stack space for small fd sets
This brings back an old optimization from Linux 2.0. Using the stack is
faster than kmalloc. On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)
It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages. This can add up - e.g. if you have 10
daemons blocking in poll/select you save 40KB of memory.
I did a patch for this long ago, but it was never applied. This version is
a reimplementation of the old patch that tries to be less intrusive. I
only did the minimal changes needed for the stack allocation.
The cut off point before external memory is allocated is currently at
832bytes. The system calls always allocate this much memory on the stack.
These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers. There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]
I suspect more optimizations might be possible, but they would be more
complicated. One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
Problem is when to flush the file descriptors out though.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some quick backport bits from the libata PATA work to fix things found in
the sis driver. The piix driver needs some fixes too but those are way to
large and need someone working on old IDE with time to do them.
This patch fixes the case where random bits get loaded into SIS timing
registers according to the description of the correct behaviour from
Vojtech Pavlik. It also adds the SiS5517 ATA16 chipset which is not
currently supported by the driver. Thanks to Conrad Harriss for loaning me
the machine with the 5517 chipset.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Renumber the recently-added POLLREMOVE and POLLRDHUP to line up with the other
architectures.
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On ppc64 we look at a profiling register to work out the sample address and
if it was in userspace or kernel.
The backtrace interface oprofile_add_sample does not allow this. Create
oprofile_add_ext_sample and make oprofile_add_sample use it too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add driver support for general purpose I/O feature of the Synclink GT
adapters.
Signed-off-by: Paul Fulghum <paulkf@micrgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that Christoph Lameter's atomic_long_t support is merged in mainline,
might as well convert asm-generic/local.h to use it, so the same code can
be used for both sizes of 32 and 64-bit unsigned longs.
akpm sayeth:
Q:
Is there any particular reason why these routines weren't simply
implemented with local_save/restore_flags, if they are only meant to
guarantee atomicity to the local cpu? I'm sure on most platforms this
would be more efficient than using an atomic...
A:
The whole _point_ of local_t is to avoid local_irq_disable(). It's
designed to exploit the fact that many CPUs can do incs and decs in a way
which is atomic wrt local interrupts, but not atomic wrt SMP.
But this patch makes sense, because asm-generic/local.h is just a fallback
implementation for architectures which either cannot perform these
local-irq-atomic operations, or its maintainers haven't yet got around to
implementing them.
We need more work done on local_t in the 2.6.17 timeframe - they're defined as
unsigned long, but some architectures implement them as signed long.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix up some RTC whitespace and style
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reading the CMOS clock on x86 and some other arches currently takes up to one
second because it synchronizes with the CMOS second tick-over. This delay
shows up at boot time as well a resume time.
This is the currently the most substantial boot time delay for machines that
are working towards instant-on capability. Also, a quick back of the envelope
calculation (.5sec * 2M users * 1 boot a day * 10 years) suggests it has cost
Linux users in the neighborhood of a million man-hours.
An earlier thread on this topic is here:
http://groups.google.com/group/linux.kernel/browse_frm/thread/8a24255215ff6151/2aa97e66a977653d?hl=en&lr=&ie=UTF-8&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D1To2R-2S7-11%40gated-at.bofh.it#2aa97e66a977653d
..from which the consensus seems to be that it's no longer desirable.
In my view, there are basically four cases to consider:
1) networked, need precise walltime: use NTP
2) networked, don't need precise walltime: use NTP anyway
3) not networked, don't need sub-second precision walltime: don't care
4) not networked, need sub-second precision walltime:
get a network or a radio time source because RTC isn't good enough anyway
So this patch series simply removes the synchronization in favor of a simple
seqlock-like approach using the seconds value.
Note that for purposes of timer accuracy on wakeup, this patch will cause us
to fire timers up to one second late. But as the current timer resume code
will already sync once (or more!), it's no worse for short timers.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.
We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Detect whether a given process is seeky and if so disable (mostly) the
idle window if it is. We still allow just a little idle time, just enough
to allow that process to submit a new request. That is needed to maintain
fairness across priority groups.
In some cases, we could setup several async queues. This is not optimal
from a performance POV, since we want all async io in one queue to perform
good sorting on it. It also impacted sync queues, as async io got too much
slice time.
Signed-off-by: Jens Axboe <axboe@suse.de>
There are at least 14 different implementations of WARN() in the tree already.
The build fails all over the place.
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As per the corresponding change to the serial drivers, arrange
for ARM decompressors to give CRLF. Move the common putstr code
into misc.c such that machines only need to supply "putc" and
"flush" functions.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.
Signed-off-by: Jens Axboe <axboe@suse.de>
We used to assume that a DMA mapping request with a NULL dev was for
ISA DMA. This assumption was broken at some point. Now we explicitly
pass the detected ISA PCI device in the floppy setup.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These are some updates from both Ryan and Arnd for the hvc_console
driver:
The main point is to enable the inclusion of a console driver
for rtas, which is currrently needed for the cell platform.
Also shuffle around some data-type declarations and moves some
functions out of include/asm-ppc64/hvconsole.h and into a new
drivers/char/hvc_console.h file.
Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com>
Signed-off-by: Arnd Bergmann <abergman@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to export ppc64_firmware_features for modules. Before we do that
I think we should probably rename it to powerpc_firmware_features.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Export validate_sp so we can use it in the oprofile calltrace code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
- No one uses op_counter_config.valid, so remove it
- No need to ifdef around function protypes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
32-bit CHRP machines are now supported only in arch/powerpc, as are
all 64-bit PowerPC processors. This means that we don't use
Open Firmware on any platform in arch/ppc any more.
This makes PReP support a single-platform option like every other
platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM
is gone from arch/ppc. CONFIG_PPC_PREP is the option that selects
PReP support and is generally what has replaced
CONFIG_PPC_MULTIPLATFORM within arch/ppc.
_machine is all but dead now, being #defined to 0.
Updated Makefiles, comments and Kconfig options generally to reflect
these changes.
Signed-off-by: Paul Mackerras <paulus@samba.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: drop duplicate assignment in request_sock
[IPSEC]: Fix tunnel error handling in ipcomp6
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Don't make debugfs depend on DEBUG_KERNEL
[PATCH] Fix blktrace compile with sysfs not defined
[PATCH] unused label in drivers/block/cciss.
[BLOCK] increase size of disk stat counters
[PATCH] blk_execute_rq_nowait-speedup
[PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
[BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
[PATCH] kzalloc() conversion in drivers/block
[PATCH] update max_sectors documentation
... being careful that mutex_trylock is inverted wrt down_trylock
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows user-space to access data safely. This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.
It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.
A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
check_reshape checks validity and does things that can be done instantly -
like adding devices to raid1. start_reshape initiates a restriping process to
convert the whole array.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of checkpointing at each stripe, only checkpoint when a new write
would overwrite uncheckpointed data. Block any write to the uncheckpointed
area. Arbitrarily checkpoint at least every 3Meg.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We allow the superblock to record an 'old' and a 'new' geometry, and a
position where any conversion is up to. The geometry allows for changing
chunksize, layout and level as well as number of devices.
When using verion-0.90 superblock, we convert the version to 0.91 while the
conversion is happening so that an old kernel will refuse the assemble the
array. For version-1, we use a feature bit for the same effect.
When starting an array we check for an incomplete reshape and restart the
reshape process if needed. If the reshape stopped at an awkward time (like
when updating the first stripe) we refuse to assemble the array, and let
user-space worry about it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds raid5_reshape and end_reshape which will start and finish the
reshape processes.
raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage
accidental use.
Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.
and Make sure that you have backups, just in case.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the core of the resize/expand process.
sync_request notices if a 'reshape' is happening and acts accordingly.
It allocated new stripe_heads for the next chunk-wide-stripe in the target
geometry, marking them STRIPE_EXPANDING.
Then it finds which stripe heads in the old geometry can provide data needed
by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to
read all blocks on those stripes.
Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a
STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then
is written out and released.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to allow that different stripes are of different effective sizes, and
use the appropriate size. Also, when a stripe is being expanded, we must
block any IO attempts until the stripe is stable again.
Key elements in this change are:
- each stripe_head gets a 'disk' field which is part of the key,
thus there can sometimes be two stripe heads of the same area of
the array, but covering different numbers of devices. One of these
will be marked STRIPE_EXPANDING and so won't accept new requests.
- conf->expand_progress tracks how the expansion is progressing and
is used to determine whether the target part of the array has been
expanded yet or not.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
data structure.
This requires allocating new stripes in a new kmem_cache. If this succeeds,
we copy cache pages over and release the old stripes and kmem_cache.
We then allocate new pages. If that fails, we leave the stripe cache at it's
new size. It isn't worth the effort to shrink it back again.
Unfortuanately this means we need two kmem_cache names as we, for a short
period of time, we have two kmem_caches. So they are raid5/%s and
raid5/%s-alt
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The remainder of this batch implements raid5 reshaping. Currently the only
shape change that is supported is added a device, but it is envisioned that
changing the chunksize and layout will also be supported, as well as changing
the level (e.g. 1->5, 5->6).
The reshape process naturally has to move all of the data in the array, and so
should be used with caution. It is believed to work, and some testing does
support this, but wider testing would be great for increasing my confidence.
You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth.
This is because mdadm need to take a copy of a 'critical section' at the
start of the array incase there is a crash at an awkward moment. On restart,
mdadm will restore the critical section and allow reshape to continue.
I hope to release a 2.4-pre by early next week - it still needs a little more
polishing.
This patch:
Previously the array of disk information was included in the raid5 'conf'
structure which was allocated to an appropriate size. This makes it awkward
to change the size of that array. So we split it off into a separate
kmalloced array which will require a little extra indexing, but is much easier
to grow.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adding bd_claim_by_kobject() function which takes kobject as additional
signature of holder device and creates sysfs symlinks between holder device
and claimed device. bd_release_from_kobject() is a counterpart of
bd_claim_by_kobject.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all the CONFIG_SYSFS stuff. That's supposed to all be implemented up
in header files.
Yes, the CONFIG_SYSFS=n data structures will be a little larger than
necessary, but that's a tradeoff we can decide to make.
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Creating "slaves" and "holders" directories in /sys/block/<disk> and
creating "holders" directory under /sys/block/<disk>/<partition>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow drive geometry to be stored with a new DM_DEV_SET_GEOMETRY ioctl.
Device-mapper will now respond to HDIO_GETGEO. If the geometry information is
not available, zero will be returned for all of the parameters.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This flag should be set for a virtual device iff it is set for all
underlying devices.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem ID
10de:0215.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a driver for the ST M48T86 / Dallas DS12887 RTC.
This is a platform driver. The platform device must provide I/O routines to
access the RTC.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the I2C driver ids to i2c-id.h in preparation of the I2C
direct probing method.
This is kept separate so that it can be integrated to
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch, completely optional, removes from drivers/i2c/chips all the
drivers that are implemented in the new RTC subsystem.
It should be noted that none of the current driver is actually integrated,
i.e. usable without further patches.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the basic RTC subsystem infrastructure to the kernel.
rtc/class.c - registration facilities for RTC drivers
rtc/interface.c - kernel/rtc interface functions
rtc/hctosys.c - snippet of code that copies hw clock to sw clock
at bootup, if configured to do so.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes from the ARM subsytem some of the rtc-related functions
that have been included in the RTC subsystem. It also fixes some naming
collisions.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
futex.h updates:
- get rid of FUTEX_OWNER_PENDING - it's not used
- reduce ROBUST_LIST_LIMIT to a saner value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- fix: initialize the robust list(s) to NULL in copy_process.
- doc update
- cleanup: rename _inuser to _inatomic
- __user cleanups and other small cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and
wire up the new syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire
up the new syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
32-bit syscall compatibility support. (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the core infrastructure for robust futexes: structure definitions, the new
syscalls and the do_exit() based cleanup mechanism.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patchset provides a new (written from scratch) implementation of robust
futexes, called "lightweight robust futexes". We believe this new
implementation is faster and simpler than the vma-based robust futex solutions
presented before, and we'd like this patchset to be adopted in the upstream
kernel. This is version 1 of the patchset.
Background
----------
What are robust futexes? To answer that, we first need to understand what
futexes are: normal futexes are special types of locks that in the
noncontended case can be acquired/released from userspace without having to
enter the kernel.
A futex is in essence a user-space address, e.g. a 32-bit lock variable
field. If userspace notices contention (the lock is already owned and someone
else wants to grab it too) then the lock is marked with a value that says
"there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to
wait for the other guy to release it. The kernel creates a 'futex queue'
internally, so that it can later on match up the waiter with the waker -
without them having to know about each other. When the owner thread releases
the futex, it notices (via the variable value) that there were waiter(s)
pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up. Once all
waiters have taken and released the lock, the futex is again back to
'uncontended' state, and there's no in-kernel state associated with it. The
kernel completely forgets that there ever was a futex at that address. This
method makes futexes very lightweight and scalable.
"Robustness" is about dealing with crashes while holding a lock: if a process
exits prematurely while holding a pthread_mutex_t lock that is also shared
with some other process (e.g. yum segfaults while holding a pthread_mutex_t,
or yum is kill -9-ed), then waiters for that lock need to be notified that the
last owner of the lock exited in some irregular way.
To solve such types of problems, "robust mutex" userspace APIs were created:
pthread_mutex_lock() returns an error value if the owner exits prematurely -
and the new owner can decide whether the data protected by the lock can be
recovered safely.
There is a big conceptual problem with futex based mutexes though: it is the
kernel that destroys the owner task (e.g. due to a SEGFAULT), but the kernel
cannot help with the cleanup: if there is no 'futex queue' (and in most cases
there is none, futexes being fast lightweight locks) then the kernel has no
information to clean up after the held lock! Userspace has no chance to clean
up after the lock either - userspace is the one that crashes, so it has no
opportunity to clean up. Catch-22.
In practice, when e.g. yum is kill -9-ed (or segfaults), a system reboot is
needed to release that futex based lock. This is one of the leading
bugreports against yum.
To solve this problem, 'Robust Futex' patches were created and presented on
lkml: the one written by Todd Kneisel and David Singleton is the most advanced
at the moment. These patches all tried to extend the futex abstraction by
registering futex-based locks in the kernel - and thus give the kernel a
chance to clean up.
E.g. in David Singleton's robust-futex-6.patch, there are 3 new syscall
variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER.
The kernel attaches such robust futexes to vmas (via
vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are
searched to see whether they have a robust_head set.
Lots of work went into the vma-based robust-futex patch, and recently it has
improved significantly, but unfortunately it still has two fundamental
problems left:
- they have quite complex locking and race scenarios. The vma-based
patches had been pending for years, but they are still not completely
reliable.
- they have to scan _every_ vma at sys_exit() time, per thread!
The second disadvantage is a real killer: pthread_exit() takes around 1
microsecond on Linux, but with thousands (or tens of thousands) of vmas every
pthread_exit() takes a millisecond or more, also totally destroying the CPU's
L1 and L2 caches!
This is very much noticeable even for normal process sys_exit_group() calls:
the kernel has to do the vma scanning unconditionally! (this is because the
kernel has no knowledge about how many robust futexes there are to be cleaned
up, because a robust futex might have been registered in another task, and the
futex variable might have been simply mmap()-ed into this process's address
space).
This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than
that: the overhead makes robust futexes impractical for any type of generic
Linux distribution.
So it became clear to us, something had to be done. Last week, when Thomas
Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he
found a handful of new races and we were talking about it and were analyzing
the situation. At that point a fundamentally different solution occured to
me. This patchset (written in the past couple of days) implements that new
solution. Be warned though - the patchset does things we normally dont do in
Linux, so some might find the approach disturbing. Parental advice
recommended ;-)
New approach to robust futexes
------------------------------
At the heart of this new approach there is a per-thread private list of robust
locks that userspace is holding (maintained by glibc) - which userspace list
is registered with the kernel via a new syscall [this registration happens at
most once per thread lifetime]. At do_exit() time, the kernel checks this
user-space list: are there any robust futex locks to be cleaned up?
In the common case, at do_exit() time, there is no list registered, so the
cost of robust futexes is just a simple current->robust_list != NULL
comparison. If the thread has registered a list, then normally the list is
empty. If the thread/process crashed or terminated in some incorrect way then
the list might be non-empty: in this case the kernel carefully walks the list
[not trusting it], and marks all locks that are owned by this thread with the
FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any).
The list is guaranteed to be private and per-thread, so it's lockless. There
is one race possible though: since adding to and removing from the list is
done after the futex is acquired by glibc, there is a few instructions window
for the thread (or process) to die there, leaving the futex hung. To protect
against this possibility, userspace (glibc) also maintains a simple per-thread
'list_op_pending' field, to allow the kernel to clean up if the thread dies
after acquiring the lock, but just before it could have added itself to the
list. Glibc sets this list_op_pending field before it tries to acquire the
futex, and clears it after the list-add (or list-remove) has finished.
That's all that is needed - all the rest of robust-futex cleanup is done in
userspace [just like with the previous patches].
Ulrich Drepper has implemented the necessary glibc support for this new
mechanism, which fully enables robust mutexes. (Ulrich plans to commit these
changes to glibc-HEAD later today.)
Key differences of this userspace-list based approach, compared to the vma
based method:
- it's much, much faster: at thread exit time, there's no need to loop
over every vma (!), which the VM-based method has to do. Only a very
simple 'is the list empty' op is done.
- no VM changes are needed - 'struct address_space' is left alone.
- no registration of individual locks is needed: robust mutexes dont need
any extra per-lock syscalls. Robust mutexes thus become a very lightweight
primitive - so they dont force the application designer to do a hard choice
between performance and robustness - robust mutexes are just as fast.
- no per-lock kernel allocation happens.
- no resource limits are needed.
- no kernel-space recovery call (FUTEX_RECOVER) is needed.
- the implementation and the locking is "obvious", and there are no
interactions with the VM.
Performance
-----------
I have benchmarked the time needed for the kernel to process a list of 1
million (!) held locks, using the new method [on a 2GHz CPU]:
- with FUTEX_WAIT set [contended mutex]: 130 msecs
- without FUTEX_WAIT set [uncontended mutex]: 30 msecs
I have also measured an approach where glibc does the lock notification [which
it currently does for !pshared robust mutexes], and that took 256 msecs -
clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do.
(1 million held locks are unheard of - we expect at most a handful of locks to
be held at a time. Nevertheless it's nice to know that this approach scales
nicely.)
Implementation details
----------------------
The patch adds two new syscalls: one to register the userspace list, and one
to query the registered list pointer:
asmlinkage long
sys_set_robust_list(struct robust_list_head __user *head,
size_t len);
asmlinkage long
sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
size_t __user *len_ptr);
List registration is very fast: the pointer is simply stored in
current->robust_list. [Note that in the future, if robust futexes become
widespread, we could extend sys_clone() to register a robust-list head for new
threads, without the need of another syscall.]
So there is virtually zero overhead for tasks not using robust futexes, and
even for robust futex users, there is only one extra syscall per thread
lifetime, and the cleanup operation, if it happens, is fast and
straightforward. The kernel doesnt have any internal distinction between
robust and normal futexes.
If a futex is found to be held at exit time, the kernel sets the highest bit
of the futex word:
#define FUTEX_OWNER_DIED 0x40000000
and wakes up the next futex waiter (if any). User-space does the rest of
the cleanup.
Otherwise, robust futexes are acquired by glibc by putting the TID into the
futex field atomically. Waiters set the FUTEX_WAITERS bit:
#define FUTEX_WAITERS 0x80000000
and the remaining bits are for the TID.
Testing, architecture support
-----------------------------
I've tested the new syscalls on x86 and x86_64, and have made sure the parsing
of the userspace list is robust [ ;-) ] even if the list is deliberately
corrupted.
i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the
new glibc code (on x86_64 and i386), and it works for his robust-mutex
testcases.
All other architectures should build just fine too - but they wont have the
new syscalls yet.
Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline
function before writing up the syscalls (that function returns -ENOSYS right
now).
This patch:
Add placeholder futex_atomic_cmpxchg_inuser() implementations to every
architecture that supports futexes. It returns -ENOSYS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add ptr_to_compat() to s390 - needed by the new robust-futex code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
untested. CHECKME: am i right about the 0x7fffffffUL masking?
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just about every architecture defines some macros to do operations on pfns.
They're all virtually identical. This patch consolidates all of them.
One minor glitch is that at least i386 uses them in a very skeletal header
file. To keep away from #include dependency hell, I stuck the new
definitions in a new, isolated header.
Of all of the implementations, sh64 is the only one that varied by a bit.
It used some masks to ensure that any sign-extension got ripped away before
the arithmetic is done. This has been posted to that sh64 maintainers and
the development list.
Compiles on x86, x86_64, ia64 and ppc64.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Helper functions for for_each_online_pgdat/for_each_zone look too big to be
inlined. Speed of these helper macro itself is not very important. (inner
loops are tend to do more work than this)
This patch make helper function to be out-of-lined.
inline out-of-line
.text 005c0680 005bf6a0
005c0680 - 005bf6a0 = FE0 = 4Kbytes.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By using for_each_online_pgdat(), pgdat_list is not necessary now. This patch
removes it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is
sorted by node_boot_start.
Only nodes against which init_bootmem() is called are linked to the list.
(i386 allocates bootmem only from one node(0) not from all online nodes.)
A summary:
1. for_each_online_pgdat() traverses all *online* nodes.
2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch defines for_each_online_pgdat() as a replacement of
for_each_pgdat()
Now, online nodes are managed by node_online_map. But for_each_pgdat()
uses pgdat_link to iterate over all nodes(pgdat). This means management
structure for online pgdat is duplicated.
I think using node_online_map for for_each_pgdat() is simple and sane
rather ather than pgdat_link. New macro is named as
for_each_online_pgdat(). Following patch will fix callers of
for_each_pgdat().
The bootmem allocater uses for_each_pgdat() before pgdat initialization. I
don't think it's sane. Following patch will fix it.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes zone_mem_map.
pfn_to_page uses pgdat, page_to_pfn uses zone. page_to_pfn can use pgdat
instead of zone, which is only one user of zone_mem_map. By modifing it,
we can remove zone_mem_map.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
UML can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparc can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sh64 can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Curnow <rc@rc0.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sh can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PPC can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ARM can use generic funcs.
PFN_TO_NID, LOCAL_MAP_NR are defined by sub-archs.
Signed-off-by: KAMEZAWA Hirotuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alpha can use generic funcs.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PowerPC can use generic ones.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM.
Each arch has its own page_to_pfn(), pfn_to_page() for each models.
But most of them can use the same arithmetic.
This patch adds asm-generic/memory_model.h, which includes generic
page_to_pfn(), pfn_to_page() definitions for each memory model.
When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are
used instead of macro. This is enabled by some archs and reduces
text size.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new sched domain for representing multi-core with shared caches
between cores. Consider a dual package system, each package containing two
cores and with last level cache shared between cores with in a package. If
there are two runnable processes, with this appended patch those two
processes will be scheduled on different packages.
On such systems, with this patch we have observed 8% perf improvement with
specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
users).
This new domain will come into play only on multi-core systems with shared
caches. On other systems, this sched domain will be removed by domain
degeneration code. This new domain can be also used for implementing power
savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
I will post another patch for power savings policy soon)
Most of the arch/* file changes are for cpu_coregroup_map() implementation.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can now make some code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. it makes some of the code nicer.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cache_fresh is now only used in cache.c, so unexport it.
Part of cache_fresh (setting CACHE_VALID) should really be done under the
lock, while part (calling cache_revisit_request etc) must be done outside the
lock. So we split it up appropriately.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This has been replaced by more traditional code.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The C++-like 'template' approach proves to be too ugly and hard to work with.
The old 'template' won't go away until all users are updated.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These were an unnecessary wart. Also only have one 'DefineSimpleCache..'
instead of two.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current svc_expkey holds a pointer to the svc_export structure, so updates to
that structure have to be in-place, which is a wart on the whole cache
infrastruct. So we break that linkage and just do a second lookup.
If this became a performance issue, it would be possible to put a direct link
back in which was only used conditionally. i.e. when an object is replaced
in the cache, we set a flag in the old object. When dereferencing the link
from svc_expkey, if the flag is set, we drop the reference and do a fresh
lookup.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'auth_domain's are simply handles on internal data structures. They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch define a new autofs packet for autofs v5 and updates the waitq.c
functions to handle the additional packet type.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update autofs4 version.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a gcc warning about losing qualifiers to the first argument of
copy_from_user. The typeof change for correctness, and fixes a lot of the
warnings, but there are some cases where x has some extra qualifiers, like
volatile, which copy_from_user can't know about. For these, the void * cast
seems to be necessary.
Also cleaned up some of the whitespace and got rid of the emacs comment at the
bottom.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On CHRP machines we are supposed to call into firmware (RTAS)
periodically, to give it a chance to check for errors and other
events. Under ppc we had some special code in timer_interrupt
to do this, but that didn't get transferred over to arch/powerpc.
Instead, we use an array of timer_list structs, one per CPU,
and use add_timer_on to make sure each one gets called on the
appropriate CPU.
With this we can remove the heartbeat_* elements of the ppc_md
struct.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The kernel's representation of the disk statistics uses the type unsigned
which is 32b on both 32b and 64b platforms. Unfortunately, most system
tools that work with these numbers that are exported in /proc/diskstats
including iostat read these numbers into unsigned longs. This works fine
on 32b platforms and when the number of IO transactions are small on 64b
platforms. However, when the numbers wrap on 64b platforms & you read the
numbers into unsigned longs, and compare the numbers to previous readings,
then you get an unsigned representation of a negative number. This looks
like a very large 64b number & gives you bizarre readouts in iostat:
ilc4: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
ilc4: sda 5.50 0.00 143.96 0.00 307496983987862656.00 0.00 153748491993931328.00 0.00 2136028725038430.00 7.94 55.12 5.59 80.42
Though fixing iostat in user space is possible, and a quick survey
indicates that several other similar tools also use unsigned longs when
processing /proc/diskstats. Therefore, it seems like a better approach
would be to extend the length of the disk_stats structure on 64b
architectures to 64b. The following patch does that. It should not affect
the operation on 32b platforms.
Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
Since pSeries only wants to do something different in the idle loop when
there is no work to do, we can simplify the code by implementing
ppc_md.power_save functions instead of complete idle loops. There are
two versions: one for shared-processor partitions and one for dedicated-
processor partitions.
With this we also do a cede_processor() call on dedicated processor
partitions if the poll_pending() call indicates that the hypervisor
has work it wants to do.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
loops. It brings over the concept of having a ppc_md.power_save
function from 32-bit to ARCH=powerpc, which lets us get rid of
native_idle(). With this we will also be able to simplify the idle
handling for pSeries and cell.
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc32: Reorganize and complete MPC52xx initial cpu setup
This patch splits up the CPU setup into a generic part and a
platform specific part. We also add a few missing init at the
same time.
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc32: Adds support for the PCI hostbridge in MPC5200B
Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently have a hack to flip the boot cpu and its secondary thread
to logical cpuid 0 and 1. This means the logical - physical mapping will
differ depending on which cpu is boot cpu. This is most apparent on
kexec, where we might kexec on any cpu and therefore change the mapping
from boot to boot.
The patch below does a first pass early on to work out the logical cpuid
of the boot thread. We then fix up some paca structures to match.
Ive also removed the boot_cpuid_phys variable for ppc64, to be
consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.
This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.
The signal1 & signal2 files may be mmap()'d by regular user
processes. The cntl and mfc file, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which can not be guaranteed with user mmaps,
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.
This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.
I'm still undecided wether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.
This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc currently declares some of its own system calls
in <asm/unistd.h>, but not all of them. That place also
contains remainders of the now almost unused kernel syscall
hack.
- Add a new <asm/syscalls.h> with clean declarations
- Include that file from every source that implements one
of these
- Get rid of old declarations in <asm/unistd.h>
This patch is required as a base for implementing system
calls from an SPU, but also makes sense as a general
cleanup.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>