dentry cache uses sophisticated RCU technology (and prefetching if
available) but touches 2 cache lines per dentry during hlist lookup.
This patch moves d_hash in the same cache line than d_parent and d_name
fields so that :
1) One cache line is needed instead of two.
2) the hlist_for_each_rcu() prefetching has a chance to bring all the
needed data in advance, not only the part that includes d_hash.next.
I also changed one old comment that was wrong for 64bits.
A further optimisation would be to separate dentry in two parts, one that
is mostly read, and one writen (d_count/d_lock) to avoid false sharing on
SMP/NUMA but this would need different field placement depending on 32bits
or 64bits platform.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now the real motivation for this cpuset mem_exclusive patch series seems
trivial.
This patch keeps a task in or under one mem_exclusive cpuset from provoking an
oom kill of a task under a non-overlapping mem_exclusive cpuset. Since only
interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
containment, there is little to gain from oom killing a task under a
non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
allocation must come from disjoint memory nodes.
This patch enables configuring a system so that a runaway job under one
mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
that might be using very high compute and memory resources for a prolonged
time.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes use of the previously underutilized cpuset flag
'mem_exclusive' to provide what amounts to another layer of memory placement
resolution. With this patch, there are now the following four layers of
memory placement available:
1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
3) The current tasks cpuset (GFP_USER allocations constrained to here), and
4) Specific node placement, using mbind and set_mempolicy.
These nest - each layer is a subset (same or within) of the previous.
Layer (2) above is new, with this patch. The call used to check whether a
zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
extended to take a gfp_mask argument, and its logic is extended, in the case
that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
placement is allowed. The definition of GFP_USER, which used to be identical
to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
cpuset_gfp_hardwall_flag patch.
GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
cpuset, so long as any node therein is not too tight on memory, but will
escape to the larger layer, if need be.
The intended use is to allow something like a batch manager to handle several
jobs, each job in its own cpuset, but using common kernel memory for caches
and such. Swapper and oom_kill activity is also constrained to Layer (2). A
task in or below one mem_exclusive cpuset should not cause swapping on nodes
in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
task in another such cpuset. Heavy use of kernel memory for i/o caching and
such by one job should not impact the memory available to jobs in other
non-overlapping mem_exclusive cpusets.
This patch enables providing hardwall, inescapable cpusets for memory
allocations of each job, while sharing kernel memory allocations between
several jobs, in an enclosing mem_exclusive cpuset.
Like Dinakar's patch earlier to enable administering sched domains using the
cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
that had previously done nothing much useful other than restrict what cpuset
configurations were allowed.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add another GFP flag: __GFP_HARDWALL.
A subsequent "cpuset_zone_allowed" patch will use this flag to mark GFP_USER
allocations, and distinguish them from GFP_KERNEL allocations.
Allocations (such as GFP_USER) marked GFP_HARDWALL are constrainted to the
current tasks cpuset. Other allocations (such as GFP_KERNEL) can steal from
the possibly larger nearest mem_exclusive cpuset ancestor, if memory is tight
on every node in the current cpuset.
This patch collides with Mel Gorman's patch to reduce fragmentation in the
standard buddy allocator, which adds two GFP flags. This was discussed on
linux-mm in July. Most likely, one of his flags for user reclaimable memory
can be the same as my __GFP_HARDWALL flag, under some generic name meaning its
user address space memory.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
People have run into a problem when they do this:
watch (file1, all_events);
watch (file2, some_events);
if file2 is a hard link to file1, some events will be missed because by
default we replace the mask. The patch below adds a flag IN_MASK_ADD which
will cause inotify to add to the existing mask if present.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes sense now that we have asm-powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch gathers all the struct flock64 definitions (and the operations),
puts them under !CONFIG_64BIT and cleans up the arch files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch just gathers together all the struct flock definitions except
xtensa into asm-generic/fcntl.h.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch puts the most popular of each fcntl operation/flag into
asm-generic/fcntl.h and cleans up the arch files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch puts the most popular of each open flag into asm-generic/fcntl.h
and cleans up the arch files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These two files are basically identical, so make one just include the other
(protecting the 32-bit-only parts with __powerpc64__). Also remove some
completely unused defines.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This set of patches creates asm-generic/fcntl.h and consolidates as much as
possible from the asm-*/fcntl.h files into it.
This patch just gathers all the identical bits of the asm-*/fcntl.h files into
asm-generic/fcntl.h.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've rewriten Atushi's fix for the 64-bit put_unaligned on 32-bit systems
bug to generate more efficient code.
This case has buzilla URL http://bugzilla.kernel.org/show_bug.cgi?id=5138.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
asm/segment.h varies greatly on different architectures but is clearly
deprecated. Removing all non-architecture consumers will make it easier
for us to get ride of asm/segment.h all together.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch cleans up a commonly repeated set of changes to the NTP state
variables by adding two helper inline functions:
ntp_clear(): Clears the ntp state variables
ntp_synced(): Returns 1 if the system is synced with a time server.
This was compile tested for alpha, arm, i386, x86-64, ppc64, s390, sparc,
sparc64.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
struct file cleanup: f_maxcount has an unique value (INT_MAX). Just use
the hard-wired value.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is no do_nanosleep function so kill it's declaration in <linux/time.h>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
unused and useless..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IRQ_PER_CPU is not used by all architectures. This patch introduces the
macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation
of dead code in __do_IRQ().
ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their
include/asm_ARCH/irq.h file.
Through grepping the tree I found the following architectures currently use
IRQ_PER_CPU:
cris, ia64, ppc, ppc64 and parisc.
Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Upgrade the request_firmware_nowait function to not start the hotplug
action on a firmware update.
This patch is tested along with dell_rbu driver on i386 and x86-64 systems.
Signed-off-by: Abhay Salunke <Abhay_Salunke@dell.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the /proc/sysvipc/shm|sem|msg files to use the generic seq_file
implementation for struct ipc_ids.
Signed-off-by: Mike Waychison <mikew@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When registering an RPC cache, cache_register() always sets the owner as the
sunrpc module. However, there are RPC caches owned by other modules. With
the incorrect owner setting, the real owning module can be removed potentially
with an open reference to the cache from userspace.
For example, if one were to stop the nfs server and unmount the nfsd
filesystem, the nfsd module could be removed eventhough rpc.idmapd had
references to the idtoname and nametoid caches (i.e.
/proc/net/rpc/nfs4.<cachename>/channel is still open). This resulted in a
system panic on one of our machines when attempting to restart the nfs
services after reloading the nfsd module.
The following patch adds a 'struct module *owner' field in struct
cache_detail. The owner is further assigned to the struct proc_dir_entry
in cache_register() so that the module cannot be unloaded while user-space
daemons have an open reference on the associated file under /proc.
Signed-off-by: Bruce Allan <bwa@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seems pointless to require .c files to test CONFIG_PNP_DEBUG and
conditionally define DEBUG before including <linux/pnp.h>. Just test
CONFIG_PNP_DEBUG directly in pnp.h.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Newer Sony VAIO models (VGN-S480, VGN-S460, VGN-S3XP etc) use a new method to
initialize the SPIC device. The new way to initialize (and disable) the
device comes directly from the AML code in the _CRS, _SRS and _DIS methods
from the DSDT table. This patch adds support for the new models.
Signed-off-by: Erik Waling <erikw@acc.umu.se>
Signed-off-by: Stelian Pop <stelian@popies.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If /etc/mtab is a regular file all of the mount options (of a file system)
are written to /etc/mtab by the mount command. The quota tools look there
for the quota strings for their operation. If, however, /etc/mtab is a
symlink to /proc/mounts (a "good thing" in some environments) the tools
don't write anything - they assume the kernel will take care of things.
While the quota options are sent down to the kernel via the mount system
call and the file system codes handle them properly unfortunately there is
no code to echo the quota strings into /proc/mounts and the quota tools
fail in the symlink case.
The attached patchs modify the EXT[2|3] and JFS codes to add the necessary
hooks. The show_options function of each file system in these patches
currently deal with only those things that seemed related to quotas;
especially in the EXT3 case more can be done (later?).
Jan Kara also noted the difficulty in moving these changes above the FS
codes responding similarly to myself to Andrew's comment about possible
VFS migration. Issue summary:
- FS codes have to process the entire string of options anyway.
- Only FS codes that use quotas must have a show_options function (for
quotas to work properly) however quotas are only used in a small number
of FS.
- Since most of the quota using FS support other options these FS codes
should have the a show_options function to show those options - and the
quota echoing becomes virtually negligible.
Based on feedback I have modified my patches from the original:
JFS a missing patch has been restored to the posting
EXT[2|3] and JFS always use the show_options function
- Each FS has at least one FS specific option displayed
- QUOTA output is under a CONFIG_QUOTA ifdef
- a follow-on patch will add a multitude of options for each FS
EXT[2|3] and JFS "quota" is treated as "usrquota"
EXT3 journalled data check for journalled quota removed
EXT[2|3] mount when quota specified but not compiled in
- no changes from my original patch. I tested the patch and the codes
warn but
- still mount. With all due respection I believe the comments
otherwise were a
- misread of the patch. Please reread/test and comment. XFS patch
removed - the XFS team already made the necessary changes EXT3 mixing
old and new quotas are handled differently (not purely exclusive)
- if old and new quotas for the same type are used together the old
type is silently depricated for compatability (e.g. usrquota and
usrjquota)
- mixing of old and new quotas is an error (e.g. usrjquota and
grpquota)
Signed-off-by: Mark Bellon <mbellon@mvista.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't
very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE
so that we can change it if necessary when a new vector is added.
Because of include file ordering problems, doing this necessitated the
extraction of the AT_* symbols into a standalone header file.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All users have been converted.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens:
->bi_set is totally unnecessary bloat of struct bio. Just define a proper
destructor for the bio and it already knows what bio_set it belongs too.
Peter:
Fixed the bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global functions static
- journal.c: remove the unused global function __journal_internal_check
and move the check to journal_init
- remove the following write-only global variable:
- journal.c: current_journal
- remove the following unneeded EXPORT_SYMBOL:
- journal.c: journal_recover
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When I first wrote the compat layer patches, I was somewhat cavalier about
the definition of compat_uid_t and compat_gid_t (or maybe I just
misunderstood :-)). This patch makes the compat types much more consistent
with the types we are being compatible with and hopefully will fix a few
bugs along the way.
compat type type in compat arch
__compat_[ug]id_t __kernel_[ug]id_t
__compat_[ug]id32_t __kernel_[ug]id32_t
compat_[ug]id_t [ug]id_t
The difference is that compat_uid_t is always 32 bits (for the archs we
care about) but __compat_uid_t may be 16 bits on some.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here's the latest version of relayfs, against linux-2.6.11-mm2. I'm hoping
you'll consider putting this version back into your tree - the previous
rounds of comment seem to have shaken out all the API issues and the number
of comments on the code itself have also steadily dwindled.
This patch is essentially the same as the relayfs redux part 5 patch, with
some minor changes based on reviewer comments. Thanks again to Pekka
Enberg for those. The patch size without documentation is now a little
smaller at just over 40k. Here's a detailed list of the changes:
- removed the attribute_flags in relay open and changed it to a
boolean specifying either overwrite or no-overwrite mode, and removed
everything referencing the attribute flags.
- added a check for NULL names in relayfs_create_entry()
- got rid of the unnecessary multiple labels in relay_create_buf()
- some minor simplification of relay_alloc_buf() which got rid of a
couple params
- updated the Documentation
In addition, this version (through code contained in the relay-apps tarball
linked to below, not as part of the relayfs patch) tries to make it as easy
as possible to create the cooperating kernel/user pieces of a typical and
common type of logging application, one where kernel logging is kicked off
when a user space data collection app starts and stops when the collection
app exits, with the data being automatically logged to disk in between. To
create this type of application, you basically just include a header file
(relay-app.h, included in the relay-apps tarball) in your kernel module,
define a couple of callbacks and call an initialization function, and on
the user side call a single function that sets up and continuously monitors
the buffers, and writes data to files as it becomes available. Channels
are created when the collection app is started and destroyed when it exits,
not when the kernel module is inserted, so different channel buffer sizes
can be specified for each separate run via command-line options. See the
README in the relay-apps tarball for details.
Also included in the relay-apps tarball are a couple examples
demonstrating how you can use this to create quick and dirty kernel
logging/debugging applications. They are:
- tprintk, short for 'tee printk', which temporarily puts a kprobe on
printk() and writes a duplicate stream of printk output to a relayfs
channel. This could be used anywhere there's printk() debugging code
in the kernel which you'd like to exercise, but would rather not have
your system logs cluttered with debugging junk. You'd probably want
to kill klogd while you do this, otherwise there wouldn't be much
point (since putting a kprobe on printk() doesn't change the output
of printk()). I've used this method to temporarily divert the packet
logging output of the iptables LOG target from the system logs to
relayfs files instead, for instance.
- klog, which just provides a printk-like formatted logging function
on top of relayfs. Again, you can use this to keep stuff out of your
system logs if used in place of printk.
The example applications can be found here:
http://prdownloads.sourceforge.net/dprobes/relay-apps.tar.gz?download
From: Christoph Hellwig <hch@lst.de>
avoid lookup_hash usage in relayfs
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.
When enabled then per-CPU watchdog threads are started, which try to run
once per second. If they get delayed for more than 10 seconds then a
callback from the timer interrupt detects this condition and prints out a
warning message and a stack dump (once per lockup incident). The feature
is otherwise non-intrusive, it doesnt try to unlock the box in any way, it
only gets the debug info out, automatically, and on all CPUs affected by
the lockup.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
(which at least on UP usually means an immediate context switch to one of
the waiter threads). This waiter wakes up and after a few instructions it
attempts to acquire the cv internal lock, but that lock is still held by
the thread calling pthread_cond_signal. So it goes to sleep and eventually
the signalling thread is scheduled in, unlocks the internal lock and wakes
the waiter again.
Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
to avoid this performance issue, but it was removed when locks were
redesigned to the 3 state scheme (unlocked, locked uncontended, locked
contended).
Following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force in place of
lll_mutex_unlock is not enough and probably why it has been disabled at
that time:
The number is value in cv->__data.__lock.
thr1 thr2 thr3
0 pthread_cond_wait
1 lll_mutex_lock (cv->__data.__lock)
0 lll_mutex_unlock (cv->__data.__lock)
0 lll_futex_wait (&cv->__data.__futex, futexval)
0 pthread_cond_signal
1 lll_mutex_lock (cv->__data.__lock)
1 pthread_cond_signal
2 lll_mutex_lock (cv->__data.__lock)
2 lll_futex_wait (&cv->__data.__lock, 2)
2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
# FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2 lll_mutex_unlock_force (cv->__data.__lock)
0 cv->__data.__lock = 0
0 lll_futex_wake (&cv->__data.__lock, 1)
1 lll_mutex_lock (cv->__data.__lock)
0 lll_mutex_unlock (cv->__data.__lock)
# Here, lll_mutex_unlock doesn't know there are threads waiting
# on the internal cv's lock
Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
but it will cost us not one, but 2 extra syscalls and, what's worse, one of
these extra syscalls will be done for every single waiting loop in
pthread_cond_*wait.
We would need to use lll_mutex_unlock_force in pthread_cond_signal after
requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.
Another alternative is to do the unlocking pthread_cond_signal needs to do
(the lock can't be unlocked before lll_futex_wake, as that is racy) in the
kernel.
I have implemented both variants, futex-requeue-glibc.patch is the first
one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
The kernel interface allows userland to specify how exactly an unlocking
operation should look like (some atomic arithmetic operation with optional
constant argument and comparison of the previous futex value with another
constant).
It has been implemented just for ppc*, x86_64 and i?86, for other
architectures I'm including just a stub header which can be used as a
starting point by maintainers to write support for their arches and ATM
will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been
(lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.
With the following benchmark on UP x86-64 I get:
for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
time elf/ld.so --library-path .:nptl-orig /tmp/bench
real 0m0.655s user 0m0.253s sys 0m0.403s
real 0m0.657s user 0m0.269s sys 0m0.388s
time elf/ld.so --library-path .:nptl-requeue /tmp/bench
real 0m0.496s user 0m0.225s sys 0m0.271s
real 0m0.531s user 0m0.242s sys 0m0.288s
time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
real 0m0.380s user 0m0.176s sys 0m0.204s
real 0m0.382s user 0m0.175s sys 0m0.207s
The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
Will post a new version (just x86-64 fixes so that the patch
applies against pthread_cond_signal.S) to libc-hacker ml soon.
Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at least
check if the threads that should have been woken up are woken up and
whether the arithmetic operation in the kernel gave the expected results.
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a multi-part message in MIME format. If the cpu lacks 3DNOW
feature, we can use a normal prefetcht0 instruction instead of NOP5.
"prefetchw (%rxx)" and "prefetcht0 (%rxx)" have the same length, ranging
from 3 to 5 bytes depending on the register. So this patch even helps
AMD64, shortening the length of the code.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When handling writes to /proc/irq, current code is re-programming rte
entries directly. This is not recommended and could potentially cause
chipset's to lockup, or cause missing interrupts.
CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
interrupt is pending. The same needs to be done for /proc/irq handling as well.
Otherwise user space irq balancers are really not doing the right thing.
- Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
lack of a generic name.
- added move_irq out of IRQ_BALANCE, and added this same to X86_64
- Added new proc handler for write, so we can do deferred write at irq
handling time.
- Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
it now shows only active cpu masks, or exactly what was set.
- Provided a common move_irq implementation, instead of duplicating
when using generic irq framework.
Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
Tested UP builds as well.
MSI testing: tbd: I have cards, need to look for a x-over cable, although I
did test an earlier version of this patch. Will test in a couple days.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Zwane Mwaikambo <zwane@holomorphy.com>
Grudgingly-acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The function prototypes for iosapic_enable_intr() and
iosapic_pci_fixup() in include/asm-ia64/iosapic.h are no longer
needed. This patch removes them. The original patch has been posted by
Satoru Takeuchi.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The config option 'CONFIG_ACPI_DEALLOCATE_IRQ' is no longer
needed. This patch removes it.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The function prototype for handl_IRQ_event() in include/asm-ia64/irq.h
is no longer needed. This patch removes it.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- Add user specified context to all uCM events. Users will not retrieve
any events associated with the context after destroying the corresponding
cm_id.
- Provide the ib_cm_init_qp_attr() call to userspace clients of the CM.
This call may be used to set QP attributes properly before modifying the QP.
- Fixes some error handling synchonization and cleanup issues.
- Performs some minor code cleanup.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Patch from Tony Lindgren
This patch syncs the mainline kernel with linux-omap tree.
The highlights of the patch are:
- Start adding 24xx support by Paul Mundt
- Clean-up of cpu detection by Dirk Behme and Tony Lindgren
- Add DSP header by Toshihiro Kobayashi
- Add support for mtd-xip by Vladimir Barinov
- Add various new mux registers
- Move OMAP specific serial defines back to serial.h
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add the Simtec Anubis to the list of supported
machines in the arch/arm/mach-s3c2410 directory.
This ensures the core peripherals are registered,
the timer source is configured and the correct
power-management is enabled.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Hi Jeff,
This is version 19 of the Wireless Extensions. It was supposed
to be the fallback of the WPA API changes, but people seem quite happy
about it (especially Jouni), so the patch is rather small.
The patch has been fully tested with 2.6.13 and various
wireless drivers, and is in its final version. Would you mind pushing
that into Linus's kernel so that the driver and the apps can take
advantage ot it ?
It includes :
o iwstat improvement (explicit dBm). This is the result of
long discussions with Dan Williams, the authors of
NetworkManager. Thanks to him for all the fruitful feedback.
o remove pointer from event stream. I was not totally sure if
this pointer was 32-64 bits clean, so I'd rather remove it and be at
peace with it.
o remove linux header from wireless.h. This has long been
requested by people writting user space apps, now it's done, and it
was not even painful.
o final deprecation of spy_offset. You did not like it, it's
now gone for good.
o Start deprecating dev->get_wireless_stats -> debloat netdev
o Add "check" version of event macros for ieee802.11
stack. Jiri Benc doesn't like the current macros, we aim to please ;-)
All those changes, except the last one, have been bit-roting on
my web pages for a while...
Patches for most kernel drivers will follow. Patches for the
Orinoco and the HostAP drivers have been sent to their respective
maintainers.
Have fun...
Jean
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch adds a driver for a new 1000 Mbit ethernet NIC. It is
integrated on the south bridge that is used for our Cell Blades.
The code gets the MAC address from the Open Firmware device tree, so it
won't compile on platforms other than ppc64.
This is the first public release, so I don't expect the first version to
get merged, but I'd aim for integration within the 2.6.13 time frame.
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Ax2asc was still using a static buffer for all invocations which isn't
exactly SMP-safe. Change ax2asc to take an additional result buffer as
the argument. Change all callers to provide such a buffer.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new timestamp get/set routines should have const attribute
on parameters (helps to indicate direction).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch kills __ip_ct_expect_unlink_destroy and export
unlink_expect as ip_ct_unlink_expect. As it was discussed [1], the function
__ip_ct_expect_unlink_destroy is a bit confusing so better do the following
sequence: ip_ct_destroy_expect and ip_conntrack_expect_put.
[1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-August/020794.html
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the NAT module is loaded when connections are already confirmed
it must not change their tuples anymore. This is especially important
with CONFIG_NETFILTER_DEBUG, the netfilter listhelp functions will
refuse to remove an entry from a list when it can not be found on
the list, so when a changed tuple hashes to a new bucket the entry
is kept in the list until and after the conntrack is freed.
Allocate the exact conntrack tuple for NAT for already confirmed
connections or drop them if that fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A permanent expectation exists until timeing out and can expect
multiple related connections.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SUPPORT_SYSRQ is false, gcc can emit warnings for
the uart_handle_sysrq_char() that results. Using an
empty inline returning zero kills the warning.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we set the class bit in kernel SLB entries, and clear it on
user SLB entries. On POWER5, ERAT entries created in real mode have
the class bit clear. So to avoid flushing kernel ERAT entries on each
context switch, this patch inverts our usage of the class bit, setting
it on user SLB entries and clearing it on kernel SLB entries.
Booted on POWER5 and G5.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move oprofile_impl.h into include/asm-ppc64 in preparation for moving
oprofile_model into cpu feature struct.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the CPU_FTR_PMC8 feature now we encode the number of PMCs
directly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a field in the cputable struct to store the number of PMCs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch merges several include files from
asm-ppc and asm-ppc64 into the new asm-powerpc.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Merged several nearly-identical header files from asm-ppc and asm-ppc64
into asm-powerpc.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
cmpxchg has the following code:
__typeof__(*(ptr)) _o_ = (o);
__typeof__(*(ptr)) _n_ = (n);
Unfortunately it makes gcc 4.0 store and load the variables to the stack.
Eg in atomic_dec_and_test we get:
stw r10,112(r1)
stw r9,116(r1)
lwz r9,112(r1)
lwz r0,116(r1)
x86 is just casting the values so do that instead. Also change __xchg*
and __cmpxchg* to take unsigned values, removing a few sign extensions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Consolidate the early console and PPCDBG code in udbg.c
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Take udbg out of ppc_md. Allows us to not overwrite early udbg inits
when assigning ppc_md.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
inline pmac_call_feature references ppc_md so include asm/machdep.h
in asm/pmac_feature.h
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ipv4 and ipv6 protocols need to access it unconditionally.
SYSCTL=n build failure reported by Russell King.
Signed-off-by: David S. Miller <davem@davemloft.net>
Every file should #include the header files containing the prototypes
of it's global functions.
In this case this showed that the prototype of irlan_print_filter()
was wrong which is also corrected in this patch.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
at the moment, the list_head semantics are
list_add(node, head)
whereas current klist semantics are
klist_add(head, node)
This is bound to cause confusion, and since klist is the newcomer, it
should follow the list_head semantics.
I also added missing include guards to klist.h
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch from Richard Purdie
This patch updates the PCMCIA pxa2xx_sharpsl driver to support multiple scoop
devices by adding a scoop to pcmcia slot mapping structure. It adds platform
support for poodle, is known to work on spitz (which is dual slot) and
should also support collie with a minor amount of further work.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Drop the I2C_ACK_TEST ioctl, which was commented out. It never really
existed (not after 1999 anyway), and there is no such thing as a ack
test on I2C/SMBus anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I2C_DEVNAME and i2c_clientname were introduced in 2.5.68 [1] to help
media/video driver authors who wanted their code to be compatible with
both Linux 2.4 and 2.6. The cause of the incompatibility has gone since
[2], so I think we can get rid of them, as they tend to make the code
harder to read and longer to preprocess/compile for no more benefit.
I'd hope nobody seriously attempts to keep media/video driver compatible
across Linux trees anymore, BTW.
[1] http://marc.theaimsgroup.com/?l=linux-kernel&m=104930186524598&w=2
[2] http://www.linuxhq.com/kernel/v2.6/0-test3/include/linux/i2c.h
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Delete an outdated comment about i2c_algorithm.id being computed
from algo->id.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The I2C_ALGO_* constants have no more users, delete them. Also update
the comments in i2c-id.h so that they reflect the current state of the
file.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In theory, there should be no more users of I2C_ALGO_* at this point.
However, it happens that several drivers were using I2C_ALGO_* for
adapter ids, so we need to correct these before we can get rid of all
the I2C_ALGO_* definitions.
Note that this also fixes a bug in media/video/tvaudio.c:
/* don't attach on saa7146 based cards,
because dedicated drivers are used */
if ((adap->id & I2C_ALGO_SAA7146))
return 0;
This test was plain broken, as it would succeed for many more adapters
than just the saa7146: any those id would share at least one bit with
the saa7146 id. We are really lucky that the few other adapters we want
this driver to work with did not fulfill that condition.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Merge the algorithm id part (16 upper bits) of the i2c adapters ids
into the definition of the adapters ids directly. After that, we don't
need to OR both ids together for each i2c_adapter structure.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are no more users of i2c_algorithm.id, so we can finally drop
this structure member.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the adapter id rather than the algorithm id to detect the i2c-isa
pseudo-adapter. This saves one level of dereferencing, and the
algorithm ids will soon be gone anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The name member of the i2c_algorithm is never used, although all
drivers conscientiously fill it. We can drop it completely, this
structure doesn't need to have a name.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I see very little reason why vid_from_reg is inlined. It is not
exactly short, its parameters are seldom known in advance, and it is
never called in speed critical areas. Uninlining it should cause
little performance loss if any, and saves a signficant space as well
as compilation time.
As suggested by Alexey Dobriyan, I am leaving vid_to_reg inline for now,
as it is short and has a single user so far.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Delete DEFAULT_VRM from hwmon-vid.h, it has no more users.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The only part left in i2c-sensor is the VRM/VRD/VID handling code.
This is in no way related to i2c, so it doesn't belong there. Move
the code to hwmon, where it belongs.
Note that not all hardware monitoring drivers do VRM/VRD/VID
operations, so less drivers depend on hwmon-vid than there were
depending on i2c-sensor.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The only thing left in i2c-sensor.h are module parameter definition
macros. It's only an extension of what i2c.h offers, and this extension
is not sensors-specific. As a matter of fact, a few non-sensors drivers
use them. So we better merge them in i2c.h, and get rid of i2c-sensor.h
altogether.
Signed-off-by: Jean Delvare <khali@linux-fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The i2c_detect function has no more user, delete it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We now have two identical structures, i2c_address_data in i2c-sensor.h
and i2c_client_address_data in i2c.h. We can kill one of them, I choose
to keep the one in i2c.h as it makes more sense (this structure is not
specific to sensors.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The way i2c-sensor handles forced addresses could be optimized. It
defines a structure (i2c_force_data) to associate a module parameter
with a given kind value, but in fact this kind value is always the
index of the structure in each array it is used in. So this additional
value can be omitted, and still be deduced in the code handling these
arrays.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for kind-forced addresses to i2c_probe, like i2c_detect
has for (essentially) hardware monitoring drivers.
Note that this change will slightly increase the size of the drivers
using I2C_CLIENT_INSMOD, with no immediate benefit. This is a
requirement if we want to merge i2c_probe and i2c_detect though, and
seems a reasonable price to pay in comparison with the previous
cleanups which saved much more than that (such as the i2c-isa cleanup
or the i2c address ranges removal.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move SENSORS_LIMIT from i2c-sensor.h to hwmon.h, as it is in no way
related to i2c.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We could inline i2c_adapter_id, as it is really, really short. Doing
so saves a few bytes both in i2c-core and in the drivers using this
function.
before after diff
drivers/hwmon/adm1026.ko 41344 41305 -39
drivers/hwmon/asb100.ko 27325 27246 -79
drivers/hwmon/gl518sm.ko 20824 20785 -39
drivers/hwmon/it87.ko 26419 26380 -39
drivers/hwmon/lm78.ko 21424 21385 -39
drivers/hwmon/lm85.ko 41034 40939 -95
drivers/hwmon/w83781d.ko 39561 39514 -47
drivers/hwmon/w83792d.ko 32979 32932 -47
drivers/i2c/i2c-core.ko 24708 24531 -177
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I would like to announce support for W83792D chip. This driver was developed
by Winbond Electronics Corp. I added sysfs attributes callbacks infrastructure
plus various code fixes and codingstyle cleanups. I would like to thank Winbond
for supporting free software.
This patch is against 2.6.13rc3 plus hwmon-class and hwmon-split.
Separate patch for documantation and hwmon class register will follow.
Signed-off-by: Rudolf Marek <r.marek@sh.cvut.cz>
Signed-off-by: Chunhao Huang <DZShen@Winbond.com.tw>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move the definitions of i2c_is_isa_client and i2c_is_isa_adapter from
i2c.h to i2c-isa.h. Only hybrid drivers still need them.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kill normal_isa in header files, documentation and all chip drivers, as
it is no more used.
normal_i2c could be renamed to normal, but I decided not to do so at the
moment, so as to limit the number of changes. This might be done later
as part of the i2c_probe/i2c_detect merge.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Convert i2c-isa from a dumb i2c_adapter into a pseudo i2c-core for ISA
hardware monitoring drivers. The isa i2c_adapter is no more registered
with i2c-core, drivers have to explicitely connect to it using the new
i2c_isa_{add,del}_driver interface.
At this point, all ISA chip drivers are useless, because they still
register with i2c-core in the hope i2c-isa is registered there as well,
but it isn't anymore.
The fake bus will be named i2c-9191 in sysfs. This is the number it
already had internally in various places, so it's not exactly new,
except that now the number is seen in userspace as well. This shouldn't
be a problem until someone really has 9192 I2C busses in a given system
;)
The fake bus will no more show in "i2cdetect -l", as it won't be seen by
i2c-dev anymore (not being registered with i2c-core), which is a good
thing, as i2cdetect/i2cdump/i2cset cannot operate on this fake bus
anyway.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Temporarily export a few structures and functions from i2c-core, because we
will soon need them in i2c-isa.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the sysfs class "hwmon" for use by hardware monitoring
(sensors) chip drivers. It also fixes up the related Kconfig/Makefile
bits.
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move the inline function kobj_to_i2c_client() from max6875.c to i2c.h.
Signed-off-by: Ben Gardner <bgardner@wabtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On s390 the lock value used for spinlocks consists of the lower 32 bits of the
PSW that holds the lock. If this address happens to be on a four gigabyte
boundary the lock is left unlocked. This allows other cpus to grab the same
lock and enter a lock protected code path concurrently. In theory this can
happen if the vmalloc area for the code of a module crosses a 4 GB boundary.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
debug feature changes/bug fixes:
- Use get_clock() function instead of private inline assembly.
- Use 'struct timeval' instead of 'struct timespec' for call to
tod_to_timeval(). Now the microsecond part of the timestamp is correct
again.
- Fix a locking problem: when creating a snapshot of the current content
of the debug areas, lock the entire debug_info object.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new machine check handler still has a few bugs.
1) The system entry time has to be stored in the machine check handler,
2) the machine check return psw may not be stored at the usual place
because it might overwrite the return psw of the interrupted context,
3) the return address for the call to s390_handle_mcck in the i/o interrupt
handler is not correct,
4) the system call cleanup has to take the different save area of the
machine check handler into account,
5) the machine check handler may not call UPDATE_VTIME before
CREATE_STACK_FRAME, and
6) the io leave path needs a critical section cleanup to make sure that the
TIF_MCCK_PENDING bit is really checked before switching back to user space.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file seems to be an accident.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We were leaking pmd pages when 3_LEVEL_PGTABLES was enabled. This fixes that.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a lot of code which is duplicated between the 2 and 3 level
implementation, with the only difference that the 3-level implementation is a
bit more generalized (instead of accessing directly pte_t.pte, it uses the
appropriate access macros).
So this code is joined together.
As obvious, a "core code nice cleanup" is not a "stability-friendly patch" so
usual care applies.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Normally, activate_mm() is called from exec(), and thus it used to be a
no-op because we use a completely new "MM context" on the host (for
instance, a new process), and so we didn't need to flush any "TLB entries"
(which for us are the set of memory mappings for the host process from the
virtual "RAM" file).
Kernel threads, instead, are usually handled in a different way. So, when
for AIO we call use_mm(), things used to break and so Benjamin implemented
activate_mm(). However, that is only needed for AIO, and could slow down
exec() inside UML, so be smart: detect being called for AIO (via
PF_BORROWED_MM) and do the full flush only in that situation.
Comment also the caller so that people won't go breaking UML without
noticing. I also rely on the caller's locks for testing current->flags.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which
can be used by UML to singlestep a process: it will receive SINGLESTEP
interceptions for normal instructions and syscalls, but syscall execution will
be skipped just like with PTRACE_SYSEMU.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike <jdike@addtoit.com>,
Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
except that the kernel does not execute the requested syscall; this is useful
to improve performance for virtual environments, like UML, which want to run
the syscall on their own.
In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
and on exit, and each time you also have two context switches; with SYSEMU you
avoid the 2nd stop and so save two context switches per syscall.
Also, some architectures don't have support in the host for changing the
syscall number via ptrace(), which is currently needed to skip syscall
execution (UML turns any syscall into getpid() to avoid it being executed on
the host). Fixing that is hard, while SYSEMU is easier to implement.
* This version of the patch includes some suggestions of Jeff Dike to avoid
adding any instructions to the syscall fast path, plus some other little
changes, by myself, to make it work even when the syscall is executed with
SYSENTER (but I'm unsure about them). It has been widely tested for quite a
lot of time.
* Various fixed were included to handle the various switches between
various states, i.e. when for instance a syscall entry is traced with one of
PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
Basically, this is done by remembering which one of them was used even after
the call to ptrace_notify().
* We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
to make do_syscall_trace() notice that the current syscall was started with
SYSEMU on entry, so that no notification ought to be done in the exit path;
this is a bit of a hack, so this problem is solved in another way in next
patches.
* Also, the effects of the patch:
"Ptrace - i386: fix Syscall Audit interaction with singlestep"
are cancelled; they are restored back in the last patch of this series.
Detailed descriptions of the patches doing this kind of processing follow (but
I've already summed everything up).
* Fix behaviour when changing interception kind #1.
In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
only after doing the debugger notification; but the debugger might have
changed the status of this flag because he continued execution with
PTRACE_SYSCALL, so this is wrong. This patch fixes it by saving the flag
status before calling ptrace_notify().
* Fix behaviour when changing interception kind #2:
avoid intercepting syscall on return when using SYSCALL again.
A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
crashes.
The problem is in arch/i386/kernel/entry.S. The current SYSEMU patch
inhibits the syscall-handler to be called, but does not prevent
do_syscall_trace() to be called after this for syscall completion
interception.
The appended patch fixes this. It reuses the flag TIF_SYSCALL_EMU to
remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
the flag is unused in the depicted situation.
* Fix behaviour when changing interception kind #3:
avoid intercepting syscall on return when using SINGLESTEP.
When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had
problems with singlestepping on UML in SKAS with SYSEMU. It looped
receiving SIGTRAPs without moving forward. EIP of the traced process was
the same for all SIGTRAPs.
What's missing is to handle switching from PTRACE_SYSCALL_EMU to
PTRACE_SINGLESTEP in a way very similar to what is done for the change from
PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.
I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
notified and then wake ups the process; the syscall is executed (or skipped,
when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU), and
do_syscall_trace() is called again. Since we are on the return path of a
SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
we must still avoid notifying the parent of the syscall exit. Now, this
behaviour is extended even to resuming with PTRACE_SINGLESTEP.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the builtin functions for memset/memclr/memcpy, special optimizations for
page operations have dedicated functions now. Uninline memmove/memchr and
move all functions into a single file and clean it up a little.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move a few cache functions into its own file and fix flush_icache_range() so
it can handle both kernel and user addresses correctly (assuming context is
set correctly).
Turn copy_to_user_page/copy_from_user_page into inline functions and add a
missing cache flush.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The timers lack .suspend/.resume methods. Because of this, jiffies got a
big compensation after a S3 resume. And then softlockup watchdog reports
an oops. This occured with HPET enabled, but it's also possible for other
timers.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds type-checking to pm_message_t, so that people can't confuse it
with int or u32. It also allows us to fix "disk yoyo" during suspend (disk
spinning down/up/down).
[We've tried that before; since that cpufreq problems were fixed and I've
tried make allyes config and fixed resulting damage.]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
for_each_cpu walks through all processors in cpu_possible_map, which is
defined as cpu_callout_map on i386 and isn't initialised until all
processors have been booted. This breaks things which do for_each_cpu
iterations early during boot. So, define cpu_possible_map as a bitmap with
NR_CPUS bits populated. This was triggered by a patch i'm working on which
does alloc_percpu before bringing up secondary processors.
From: Alexander Nyberg <alexn@telia.com>
i386-boottime-for_each_cpu-broken.patch
i386-boottime-for_each_cpu-broken-fix.patch
The SMP version of __alloc_percpu checks the cpu_possible_map before
allocating memory for a certain cpu. With the above patches the BSP cpuid
is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
machines (as soon as someone tries to dereference something allocated via
__alloc_percpu, which in fact is never allocated since the cpu is not set
in cpu_possible_map).
Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a clone operation for pgd updates.
This helps complete the encapsulation of updates to page tables (or pages
about to become page tables) into accessor functions rather than using
memcpy() to duplicate them. This is both generally good for consistency
and also necessary for running in a hypervisor which requires explicit
updates to page table entries.
The new function is:
clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
dst - pointer to pgd range anwhere on a pgd page
src - ""
count - the number of pgds to copy.
dst and src can be on the same page, but the range must not overlap
and must not cross a page boundary.
Note that I ommitted using this call to copy pgd entries into the
software suspend page root, since this is not technically a live paging
structure, rather it is used on resume from suspend. CC'ing Pavel in case
he has any feedback on this.
Thanks to Chris Wright for noticing that this could be more optimal in
PAE compiles by eliminating the memset.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a notify to the die_nmi notify that the system is about to
be taken down. If the notify is handled with a NOTIFY_STOP return, the
system is given a new lease on life.
We also change the nmi watchdog to carry on if die_nmi returns.
This give debug code a chance to a) catch watchdog timeouts and b) possibly
allow the system to continue, realizing that the time out may be due to
debugger activities such as single stepping which is usually done with
"other" cpus held.
Signed-off-by: George Anzinger<george@mvista.com>
Cc: Keith Owens <kaos@ocs.com.au>
Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pushf/popf in switch_to are ONLY used to switch IOPL. Making this
explicit in C code is more clear. This pushf/popf pair was added as a
bugfix for leaking IOPL to unprivileged processes when using
sysenter/sysexit based system calls (sysexit does not restore flags).
When requesting an IOPL change in sys_iopl(), it is just as easy to change
the current flags and the flags in the stack image (in case an IRET is
required), but there is no reason to force an IRET if we came in from the
SYSENTER path.
This change is the minimal solution for supporting a paravirtualized Linux
kernel that allows user processes to run with I/O privilege. Other
solutions require radical rewrites of part of the low level fault / system
call handling code, or do not fully support sysenter based system calls.
Unfortunately, this added one field to the thread_struct. But as a bonus,
on P4, the fastest time measured for switch_to() went from 312 to 260
cycles, a win of about 17% in the fast case through this performance
critical path.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Privilege checking cleanup. Originally, these diffs were much greater, but
recent cleanups in Linux have already done much of the cleanup. I added
some explanatory comments in places where the reasoning behind certain
tests is rather subtle.
Also, in traps.c, we can skip the user_mode check in handle_BUG(). The
reason is, there are only two call chains - one via die_if_kernel() and one
via do_page_fault(), both entering from die(). Both of these paths already
ensure that a kernel mode failure has happened. Also, the original check
here, if (user_mode(regs)) was insufficient anyways, since it would not
rule out BUG faults from V8086 mode execution.
Saving the %ss segment in show_regs() rather than assuming a fixed value
also gives better information about the current kernel state in the
register dump.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some more assembler cleanups I noticed along the way.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Noticed by Chuck Ebbert: the .ldt entry of the TSS was set up incorrectly.
It never mattered since this was a leftover from old times, so remove it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also, setting PDPEs in PAE mode does not require atomic operations, since the
PDPEs are cached by the processor, and only reloaded on an explicit or
implicit reload of CR3.
Since the four PDPEs must always be present in an active root, and the kernel
PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using
task gates (which reload CR3). Actually, much of this is moot, since the user
PDPEs are never updated either, and the only usage of task gates is by the
doublefault handler. It appears the only place PGDs get updated in PAE mode
is in init_low_mappings() / zap_low_mapping() for initial page table creation
and recovery from ACPI sleep state, and these sites are safe by inspection.
Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on
P4.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC can generate better code around descriptor update and access functions
when there is not an explicit "eax" register constraint.
Testing: You won't boot if this is messed up, since the TSS descriptor will be
corrupted. Verified the assembler and booted.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 inline assembler cleanup.
This change encapsulates descriptor and task register management. Also,
it is possible to improve assembler generation in two cases; savesegment
may store the value in a register instead of a memory location, which
allows GCC to optimize stack variables into registers, and MOV MEM, SEG
is always a 16-bit write to memory, making the casting in math-emu
unnecessary.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 arch cleanup. Introduce the serialize macro to serialize processor
state. Why the microcode update needs it I am not quite sure, since wrmsr()
is already a serializing instruction, but it is a microcode update, so I will
keep the semantic the same, since this could be a timing workaround. As far
as I can tell, this has always been there since the original microcode update
source.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 Inline asm cleanup. Use cr/dr accessor functions.
Also, a potential bugfix. Also, some CR accessors really should be volatile.
Reads from CR0 (numeric state may change in an exception handler), writes to
CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
was not there to begin with, and in no case should kernel memory be clobbered,
except when doing a TLB flush, which already has memory clobber.
I noticed that page invalidation does not have a memory clobber. I can't find
a bug as a result, but there is definitely a potential for a bug here:
#define __flush_tlb_single(addr) \
__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is subarch update for ES7000. I've modified platform check code and
removed unnecessary OEM table parsing for newer systems that don't use OEM
information during boot. Parsing the table in fact is causing problems,
and the platform doesn't get recognized. The patch only affects the ES7000
subach.
Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 generic subarchitecture requires explicit dmi strings or command line
to enable bigsmp mode. The patch below removes that restriction, and uses
bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and
xAPIC support.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The memory descriptors that comprise the EFI memory map are not fixed in
stone such that the size could change in the future. This uses the memory
descriptor size obtained from EFI to iterate over the memory map entries
during boot. This enables the removal of an x86 specific pad (and ifdef)
in the EFI header. I also couldn't stomach the broken up nature of the
function to put EFI runtime calls into virtual mode any longer so I fixed
that up a bit as well.
For reference, this patch only impacts x86.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch has fixed the following warnings.
arch/mips/kernel/genex.S:250:5: warning: "CONFIG_64BIT" is not defined
arch/mips/math-emu/cp1emu.c:1128:5: warning: "__mips64" is not defined
arch/mips/math-emu/cp1emu.c:1206:5: warning: "__mips64" is not defined
arch/mips/math-emu/cp1emu.c:1270:5: warning: "__mips64" is not defined
arch/mips/math-emu/cp1emu.c:323:5: warning: "__mips64" is not defined
arch/mips/math-emu/cp1emu.c:808:5: warning: "__mips64" is not defined
arch/mips/math-emu/cp1emu.c:953:5: warning: "__mips64" is not defined
arch/mips/mm/tlbex.c:519:5: warning: "CONFIG_64BIT" is not defined
include/asm/reg.h:73:5: warning: "CONFIG_64BIT" is not defined
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the MIPS coherency configuration such that we always keep the mapping
state in <asm/pci.h> when we need to on non-coherent platforms.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- MIPS Denmark does no longer exist; the PCI vendor ID is now owned by
MIPS Technologies.
- Add ID for SOC-it, MIPS's system controller.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the virtual MIPS system that is emulated by Qemu. See
http://www.linux-mips.org/wiki/Qemu for a detailed current status.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the one file which managed to survive the removel of HP Laserjet
support.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There seem to be no more users or interest in the NEC Osprey evaluation
system for the NEC VR4181 SOC which is an old part anyway, so remove the
code. More information on the Osprey can be found at
http://www.linux-mips.org/wiki/Osprey.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch has updated IRQ handling for vr41xx.
o added common IRQ dispatch
o changed IRQ number in int-handler.S
o added resource management to icu.c
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to indicate to the hypervisor that it needs to save our VMX
registers when switching partitions on a shared-processor system, just as
it needs to for FP and PMC registers.
This could be made to be on-demand when VMX is used, but we don't do that
for FP nor PMC right now either so let's not overcomplicate things.
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: <engebret@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updates and enhancement to the ppc32 mv64x60 code:
- move code to get mem size from mem ctlr to bootwrapper
- address some errata in the mv64360 pic code
- some minor cleanups
- export one of the bridge's regs via sysfs so user daemon can watch for
extraction events
Signed-off-by: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add declaration and cacheable_memcpy(). I'll be needing this function in
new 4xx EMAC driver I'm going to submit to netdev soon.
IMHO, the better place for the declaration would be asm-powerpc/string.h,
unfortunately, ppc64 doesn't have this function, so asm-ppc/system.h is the
next best place.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add dcr_base field to ocp_func_mal_data. This is preparation step for the
new EMAC driver.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move 4xx PHY_MODE_XXX defines to asm-ppc/ibm_ocp.h. This is a preparation
step for the new EMAC driver.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While ppc32 has the CONFIG_HZ Kconfig option, it wasnt actually being used.
Connect it up and set all platforms to 250Hz. This pretty much mimics the
ppc64 patch from Anton Blanchard.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the ability to identify an SOC by a name and id. There are cases in
which the integer identifier is not sufficient to specify a specific SOC.
In these cases we can use a string to further qualify the match.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
flush_dcache_icache_page() will be called on an instruction page fault. We
can't sleep in the fault handler, so use kmap_atomic() instead of just
kmap() for the Book-E case.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Renamed global variables used to convey if the watchdog is enabled and
periodicity of the timer and moved the declarations into a header for these
variables
Signed-off-by: Matt McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a field to struct ocp_func_emac_data that allows
platform-specific unsupported PHY features to be passed in to the ibm_emac
ethernet driver.
This patch also adds some logic for the Bamboo eval board to populate this
field based on the dip switches on the board. This is a workaround for the
improperly biased RJ-45 sockets on the Rev. 0 Bamboo.
Signed-off-by: Wade Farnsworth <wfarnsworth@mvista.com>
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Added ppc_sys device and system definitions for PowerQUICC II devices.
This will allow drivers for PQ2 to be proper platform device drivers.
Which can be shared on PQ3 processors with the same peripherals.
Signed-off-by: Matt McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GFP flags must be passed as unisgned int __nocast these days, else we'll
get tons of sparse warnings in every driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the SPD823TS board is no longer maintained and thus being removed
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the REDWOOD board is no longer maintained and thus being removed
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the OAK board is no longer maintained and thus being removed
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the MCPN765 board is no longer maintained and thus being removed
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the ASH board is no longer maintained and thus being removed
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add page_state info to the per-node meminfo file in sysfs. This is mostly
just for informational purposes.
The lack of this information was brought up recently during a discussion
regarding pagecache clearing, and I put this patch together to test out one
of the suggestions.
It seems like interesting info to have, so I'm submitting the patch.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Any architecture that has hardware updated A/D bits that require
synchronization against other processors during PTE operations can benefit
from doing non-atomic PTE updates during address space destruction.
Originally done on i386, now ported to x86_64.
Doing a read/write pair instead of an xchg() operation saves the implicit
lock, which turns out to be a big win on 32-bit (esp w PAE).
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new accessor for PTEs, which passes the full hint from the mmu_gather
struct; this allows architectures with hardware pagetables to optimize away
atomic PTE operations when destroying an address space. Removing the
locked operation should allow better pipelining of memory access in this
loop. I measured an average savings of 30-35 cycles per zap_pte_range on
the first 500 destructions on Pentium-M, but I believe the optimization
would win more on older processors which still assert the bus lock on xchg
for an exclusive cacheline.
Update: I made some new measurements, and this saves exactly 26 cycles over
ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
full address space destruction.
pte_clear_full is not yet used, but is provided for future optimizations
(in particular, when running inside of a hypervisor that queues page table
updates, the full hint allows us to avoid queueing unnecessary page table
update for an address space in the process of being destroyed.
This is not a huge win, but it does help a bit, and sets the stage for
further hypervisor optimization of the mm layer on all architectures.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is used only in slab.c and each architecture gets to define whcih
underlying type is to be used.
Seems a bit silly - move it to slab.c and use the same type for all
architectures: unsigned int.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I don't think we need to call hugetlb_clean_stale_pgtable() anymore
in 2.6.13 because of the rework with free_pgtables(). It now collect
all the pte page at the time of munmap. It used to only collect page
table pages when entire one pgd can be freed and left with staled pte
pages. Not anymore with 2.6.13. This function will never be called
and We should turn it into a BUG_ON.
I also spotted two problems here, not Adam's fault :-)
(1) in huge_pte_alloc(), it looks like a bug to me that pud is not
checked before calling pmd_alloc()
(2) in hugetlb_clean_stale_pgtable(), it also missed a call to
pmd_free_tlb. I think a tlb flush is required to flush the mapping
for the page table itself when we clear out the pmd pointing to a
pte page. However, since hugetlb_clean_stale_pgtable() is never
called, so it won't trigger the bug.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
patch later in the series. Instead of repeating (_PAGE_PRESENT |
_PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Version 6 of the ARM architecture introduces the concept of 16MB pages
(supersections) and 36-bit (40-bit actually, but nobody uses this) physical
addresses. 36-bit addressed memory and I/O and ARMv6 can only be mapped
using supersections and the requirement on these is that both virtual and
physical addresses be 16MB aligned. In trying to add support for ioremap()
of 36-bit I/O, we run into the issue that get_vm_area() allows for a
maximum of 512K alignment via the IOREMAP_MAX_ORDER constant. To work
around this, we can:
- Allocate a larger VM area than needed (size + (1ul << IOREMAP_MAX_ORDER))
and then align the pointer ourselves, but this ends up with 512K of
wasted VM per ioremap().
- Provide a new __get_vm_area_aligned() API and make __get_vm_area() sit
on top of this. I did this and it works but I don't like the idea
adding another VM API just for this one case.
- My preferred solution which is to allow the architecture to override
the IOREMAP_MAX_ORDER constant with it's own version.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
_PAGE_FILE does not indicate whether a file is in page / swap cache, it is
set just for non-linear PTE's. Correct the comment for i386, x86_64, UML.
Also clearify _PAGE_NONE.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a capability check to sys_set_zone_reclaim(). This syscall is not
something that should be available to a user.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This bitop does not need to be atomic because it is performed when there will
be no references to the page (ie. the page is being freed).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch was recently discussed on linux-mm:
http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2
I inherited a large code base from Ray for page migration. There was a
small patch in there that I find to be very useful since it allows the
display of the locality of the pages in use by a process. I reworked that
patch and came up with a /proc/<pid>/numa_maps that gives more information
about the vma's of a process. numa_maps is indexes by the start address
found in /proc/<pid>/maps. F.e. with this patch you can see the page use
of the "getty" process:
margin:/proc/12008 # cat maps
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-200000000002c000 r-xp 00000000 08:04 516 /lib/ld-2.3.3.so
2000000000038000-2000000000040000 rw-p 00028000 08:04 516 /lib/ld-2.3.3.so
2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000260000-2000000000268000 ---p 00208000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923 /usr/lib/locale/en_US.utf8/LC_CTYPE
2000000000300000-2000000000308000 r--s 00000000 08:04 60071467 /usr/lib/gconv/gconv-modules.cache
2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399 /sbin/mingetty
6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399 /sbin/mingetty
6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0 [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0 [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0 [vdso]
cat numa_maps
2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
getty uses ld.so. The first vma is the code segment which is used by 43
other processes and the pages are evenly distributed over the 4 nodes.
The second vma is the process specific data portion for ld.so. This is
only one page.
The display format is:
<startaddress> Links to information in /proc/<pid>/map
<memory policy> This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
MaxRef= <maximum reference to a page in this vma>
Pages= <Nr of pages in use>
Mapped= <Nr of pages with mapcount >
Anon= <nr of anonymous pages>
Nx= <Nr of pages on Node x>
The content of the proc-file is self-evident. If this would be tied into
the sparsemem system then the contents of this file would not be too
useful.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The idea of a swap_device_lock per device, and a swap_list_lock over them all,
is appealing; but in practice almost every holder of swap_device_lock must
already hold swap_list_lock, which defeats the purpose of the split.
The only exceptions have been swap_duplicate, valid_swaphandles and an
untrodden path in try_to_unuse (plus a few places added in this series).
valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
demand attention. However, with the hold time in get_swap_pages so much
reduced, I've not yet found a load and set of swap device priorities to show
even swap_duplicate benefitting from the split. Certainly the split is mere
overhead in the common case of a single swap device.
So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
(generally we seem to prefer an _ in the name, and not hide in a macro).
If someone can show a regression in swap_duplicate, then probably we should
add a hashlock for the swap_map entries alone (shorts being anatomic), so as
to help the case of the single swap device too.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_swap_page has often shown up on latency traces, doing lengthy scans while
holding two spinlocks. swap_list_lock is already dropped, now scan_swap_map
drop swap_device_lock before scanning the swap_map.
While scanning for an empty cluster, don't worry that racing tasks may
allocate what was free and free what was allocated; but when allocating an
entry, check it's still free after retaking the lock. Avoid dropping the lock
in the expected common path. No barriers beyond the locks, just let the
cookie crumble; highest_bit limit is volatile, but benign.
Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
SWP_SCANNING reference count while in scan_swap_map, swapoff wait for that to
fall - just use schedule_timeout, we don't want to burden scan_swap_map
itself, and it's very unlikely that anyone can really still be in
scan_swap_map once swapoff gets this far.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The swap header's unsigned int last_page determines the range of swap pages,
but swap_info has been using int or unsigned long in some cases: use unsigned
int throughout (except, in several places a local unsigned long is useful to
avoid overflows when adding).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The "Adding %dk swap" message shows the number of swap extents, as a guide to
how fragmented the swapfile may be. But a useful further guide is what total
extent they span across (sometimes scarily large).
And there's no need to keep nr_extents in swap_info: it's unused after the
initial message, so save a little space by keeping it on stack.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are several comments that swap's extent_list.prev points to the lowest
extent: that's not so, it's extent_list.next which points to it, as you'd
expect. And a couple of loops in add_swap_extent which go all the way through
the list, when they should just add to the other end.
Fix those up, and let map_swap_page search the list forwards: profiles shows
it to be twice as quick that way - because prefetch works better on how the
structs are typically kmalloc'ed? or because usually more is written to than
read from swap, and swap is allocated ascendingly?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Someone mentioned that almost all the architectures used basically the same
implementation of get_order. This patch consolidates them into
asm-generic/page.h and includes that in the appropriate places. The
exceptions are ia64 and ppc which have their own (presumably optimised)
versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This splits up sparse_index_alloc() into two pieces. This is needed
because we'll allocate the memory for the second level in a different place
from where we actually consume it to keep the allocation from happening
underneath a lock
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With cleanups from Dave Hansen <haveblue@us.ibm.com>
SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
mem_sections. This two level layout scheme is able to achieve smaller
memory requirements for SPARSEMEM with the tradeoff of an additional shift
and load when fetching the memory section. The current SPARSEMEM
implementation is a one dimensional array of mem_sections which is the
default SPARSEMEM configuration. The patch attempts isolates the
implementation details of the physical layout of the sparsemem section
array.
SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
memory_present() calls. This is not always feasible, so architectures
which do not need it may allocate everything statically by using
SPARSEMEM_STATIC.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME. Architecture
platforms with a very sparse physical address space would likely want to
select this option. For those architecture platforms that don't select the
option, the code generated is equivalent to SPARSEMEM currently in -mm.
I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.
ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
pointers to mem_sections. This two level layout scheme is able to achieve
smaller memory requirements for SPARSEMEM with the tradeoff of an
additional shift and load when fetching the memory section. The current
SPARSEMEM -mm implementation is a one dimensional array of mem_sections
which is the default SPARSEMEM configuration. The patch attempts isolates
the implementation details of the physical layout of the sparsemem section
array.
ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.
I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
and tested with aim.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend mapping of the consumer usage page in hid-input.c to handle
more cases appearing on new USB keyboards.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This is part of Thomas Gleixner's generic IRQ patch, which converts
ARM to use the generic IRQ subsystem. Here, we wrap calls to
desc->handler() in an inline function, desc_handle_irq(). This
reduces the size of Thomas' patch since the changes become more
localised.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is part of Thomas Gleixner's generic IRQ patch, which converts
ARM to use the generic IRQ subsystem. Here, we rename two of the
irq_chip methods - wake becomes set_wake, and type becomes set_type.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Adds a new ios for setting the chip select pin on MMC cards. Needed on
SD controllers which use this pin for other things and therefore cannot
have it pulled high at all times.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fixed a problem with the internal Owner ID allocation and
deallocation mechanisms for control method execution and
recursive method invocation. This should eliminate the
OWNER_ID_LIMIT exceptions and "Invalid OwnerId" messages
seen on some systems. Recursive method invocation depth
is currently limited to 255. (Alexey Starikovskiy)
http://bugzilla.kernel.org/show_bug.cgi?id=4892
Completely eliminated all vestiges of support for the
"module-level executable code" until this support is
fully implemented and debugged. This should eliminate the
NO_RETURN_VALUE exceptions seen during table load on some
systems that invoke this support.
http://bugzilla.kernel.org/show_bug.cgi?id=5162
Fixed a problem within the resource manager code where
the transaction flags for a 64-bit address descriptor were
handled incorrectly in the type-specific flag byte.
Consolidated duplicate code within the address descriptor
resource manager code, reducing overall subsystem code size.
Signed-off-by: Robert Moore <Robert.Moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in these two files.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Need pfn_valid macro, even on MMUless platforms.
Enclose the macro args of __pa and __va in parentheses.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All we need to do is resegment the queue so that
we record SACK information accurately. The edges
of the SACK blocks guide our resegmenting decisions.
With help from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
I've finally found a potential cause of the sk_forward_alloc underflows
that people have been reporting sporadically.
When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
check sk_forward_alloc even though a large amount of time may have
elapsed since we allocated the page. In the mean time someone could've
come along and liberated packets and reclaimed sk_forward_alloc memory.
This patch makes tcp_sendmsg check sk_forward_alloc every time as we
do in do_tcp_sendpages.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces sk_stream_wmem_schedule as a short-hand for
the sk_forward_alloc checking on egress.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The crypto layer currently uses in_atomic() to determine whether it is
allowed to sleep. This is incorrect since spin locks don't always cause
in_atomic() to return true.
Instead of that, this patch returns to an earlier idea of a per-tfm flag
which determines whether sleeping is allowed. Unlike the earlier version,
the default is to not allow sleeping. This ensures that no existing code
can break.
As usual, this flag may either be set through crypto_alloc_tfm(), or
just before a specific crypto operation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tun/tap only supports the EN10MB ARP type. For use with
wireless and other networking types it should be possible to set the
ARP type via an ioctl.
Patch v2: Included check that the tap interface is down before changing the
link type out from underneath it
Signed-off-by: Mike Kershaw <dragorn@kismetwireless.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Nicolas Pitre
The prototype for sys_fadvise64_64() is:
long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
The argument list is therefore as follows on legacy ABI:
fd: type int (r0)
offset: type long long (r1-r2)
len: type long long (r3-sp[0])
advice: type int (sp[4])
With EABI this becomes:
fd: type int (r0)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
advice: type int (sp[8])
Not only do we have ABI differences here, but the EABI version requires
one additional word on the syscall stack.
To avoid the ABI mismatch and the extra stack space required with EABI
this syscall is now defined with a different argument ordering
on ARM as follows:
long sys_arm_fadvise64_64(int fd, int advice, loff_t offset, loff_t len)
This gives us the following ABI independent argument distribution:
fd: type int (r0)
advice: type int (r1)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
Now, since the syscall entry code takes care of 5 registers only by
default including the store of r4 to the stack, we need a wrapper to
store r5 to the stack as well. Because that wrapper was missing and was
always required this means that sys_fadvise64_64 never worked on ARM and
therefore we can safely reuse its syscall number for our new
sys_arm_fadvise64_64 interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This kills warnings when building drivers/ide/ide-iops.c
and puts us in-line with what other platforms do here.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Sascha Hauer
This patch adds support for setting and getting RTS / CTS via
set_mtctrl / get_mctrl functions.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from David Vrabel
Correct the ioread* and iowrite* functions. In particular, add an offset to the cookie in ioport_map so we can map I/O port ranges starting from 0 (0 is for reporting errors).
Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
- notifying the PROM of specific features that are supported by the OS.
This is used to enable PROM feature if and only if the corresponding
feature is implemented in the OS
- fetch feature sets that are supported by the current PROM. This allows
the OS to selectively enable features when the PROM support is available.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The start_tx and stop_tx methods were passed a flag to indicate
whether the start/stop was from the tty start/stop callbacks, and
some drivers used this flag to decide whether to ask the UART to
immediately stop transmission (where the UART supports such a
feature.)
There are other cases when we wish this to occur - when CTS is
lowered, or if we change from soft to hard flow control and CTS
is inactive. In these cases, this flag was false, and we would
allow the transmitter to drain before stopping.
There is really only one case where we want to let the transmitter
drain before disabling, and that's when we run out of characters
to send.
Hence, re-jig the start_tx and stop_tx methods to eliminate this
flag, and introduce new functions for the special "disable and
allow transmitter to drain" case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The idea behind a RAID class is to provide a uniform interface to all
RAID subsystems (both hardware and software) in the kernel.
To do that, I've made this class a transport class that's entirely
subsystem independent (although the matching routines have to match per
subsystem, as you'll see looking at the code). I put it in the scsi
subdirectory purely because I needed somewhere to play with it, but it's
not a scsi specific module.
I used a fusion raid card as the test bed for this; with that kind of
card, this is the type of class output you get:
jejb@titanic> ls -l /sys/class/raid_devices/20\:0\:0\:0/
total 0
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-0 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:0/20:1:0:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-1 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:1/20:1:1:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 device -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:0:0/20:0:0:0/
-r--r--r-- 1 root root 16384 Aug 16 17:21 level
-r--r--r-- 1 root root 16384 Aug 16 17:21 resync
-r--r--r-- 1 root root 16384 Aug 16 17:21 state
So it's really simple: for a SCSI device representing a hardware raid,
it shows the raid level, the array state, the resync % complete (if the
state is resyncing) and the underlying components of the RAID (these are
exposed in fusion on the virtual channel 1).
As you can see, this type of information can be exported by almost
anything, including software raid.
The more difficult trick, of course, is going to be getting it to
perform configuration type actions with writable attributes.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
One of the changes in the attribute_container code in the scsi-misc tree
was to add a lock to protect the list of devices per container. This,
unfortunately, leads to potential scheduling while atomic problems if
there's a sleep in the function called by a trigger.
The correct solution is to use the kernel klist infrastructure instead
which allows lockless traversal of a list.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Peter Staubach pointed out that it is not correct to check
current->personality & PER_LINUX32 (this will have false
hits on several other personality values).
Signed-off-by: Tony Luck <tony.luck@intel.com>
ATAPI is getting close to being ready. To increase exposure, we enable
the code in the upstream kernel, but default it to off (present
behavior). Users must pass atapi_enabled=1 as a module option (if
module) or on the kernel command line (if built in) to turn on
discovery of their ATAPI devices.
Add register_sound_special_device() function to allow assignment of
device pointer to a specific OSS device for HAL.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
PCM Midlevel,ALSA<-OSS emulation,USB USX2Y
This patch removes open_flag from struct _snd_pcm_substream.
All of its uses are substituted by querying struct _snd_pcm_substream's
member ffile instead.
Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
YMFPCI driver
Implements mixer controls for the volume of each playback substream of
the main PCM device.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Documentation,AD1816A driver
Added clockfreq module option for the card with a different clock frequency
than 33kHz.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
AC97 Codec,PCI drivers
I've made the review changes and as requested I've pasted the RFC by
Nicolas below:-
'I would like to know what people think of the following patch. It
allows for a codec on an AC97 bus to be shared with other drivers which
are completely unrelated to audio. It registers a new bus type, and
whenever a codec instance is created then a device for it is also
registered with the driver model using that bus type. This allows, for
example, to use the extra features of the UCB1400 like the touchscreen
interface and the additional GPIOs and ADCs available on that chip for
battery monitoring. I have a working UCB1400 touchscreen driver here
that simply registers with the driver model happily working alongside
with audio features using this.'
Changes over RFC:-
o Now matches codec name within codec group.
o Added ac97_dev_release() to stop kernel complaining about no release
method for device.
o Added 'config SND_AC97_BUS' to sound/pci/Kconfig and moved 'config
SND_AC97_CODEC' out with the PCI=n statement.
o module is now called snd-ac97-bus
Signed-off-by: Liam Girdwood <liam.girdwood@wolfsonmicro.com>
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Documentation,CS46xx driver,EMU10K1/EMU10K2 driver,AD1848 driver
SB16/AWE driver,CMIPCI driver,ENS1370/1+ driver,RME32 driver
RME96 driver,ICE1712 driver,ICE1724 driver,KORG1212 driver
RME HDSP driver,RME9652 driver
This patch changes .iface to SNDRV_CTL_ELEM_IFACE_MIXER whre _PCM or
_HWDEP was used in controls that are not associated with a specific PCM
(sub)stream or hwdep device, and changes some controls that got
inconsitent .iface values due to copy+paste errors. Furthermore, it
makes sure that all control that do use _PCM or _HWDEP use the correct
number in the .device field.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
AC97 Codec
o Enhanced current WM97xx support to provide additional controls and
use the kcontrol suffix naming convention.
o Added AC97_HAS_NO_MIC, AC97_HAS_NO_TONE and AC97_HAS_NO_STD_PCM.
o Cleaned up WM97xx related comments.
o Removed some wm9713 double mono controls and replaced with stereo
controls.
Signed-off-by: Liam Girdwood <liam.girdwood@wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix build problem found by compiling driver with DEBUG defined that used tcp.h.
Since pr_debug(arg) expands to printk("<7>" arg) the argument
needs to be string that can be concatenated.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can put the __softirq_pending mask in the cpudata,
no need for the silly NR_CPUS array in kernel/softirq.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Implemented a full bytewise compare to determine if a table load
request is attempting to load a duplicate table. The compare is
performed if the table signatures and table lengths match. This
will allow different tables with the same OEM Table ID and
revision to be loaded.
Although the BIOS is technically violating the ACPI spec when
this happens -- it does happen -- so Linux must handle it.
Signed-off-by: Robert Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
While ppc64 has the CONFIG_HZ Kconfig option, it wasnt actually being
used. Connect it up and set all platforms to 250Hz.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Here's the 970MP's PVR (processor version register) entry for oprofile.
Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
They differed in either simple comments or in the protecting ifdefs.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move the identical files from include/asm-ppc{,64}/ to
include/asm-powerpc/. Remove hdreg.h completely as it is unused in
the tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ppc and ppc64 trees are hopefully going to merge over time, so this
patch begins the process by creating a place for the merging of the
header files.
Create include/asm-powerpc (and move linkage.h into it from
asm-{ppc,ppc64} since we don't like empty directories). Modify the
ppc and ppc64 Makefiles to cope.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Create vio_bus_ops so that we just pass a structure to vio_bus_init
instead of three separate function pointers.
Rearrange vio.h to avoid forward references. vio.h only needs
struct device_node from prom.h so remove the include and just
declare it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Take some assignments out of vio_register_device_common and
rename it to vio_register_device.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With CONFIG_HUGETLB_PAGE=n:
In file included from kernel/sysctl.c:37:
include/linux/hugetlb.h:104:1: warning: "hugetlb_free_pgd_range" redefined
In file included from include/linux/mm.h:36,
from kernel/sysctl.c:23:
include/asm/pgtable.h:492:1: warning: this is the location of the previous definition
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
So that applications can set dccp_sock->dccps_pkt_size, that in turn
is used in the CCID3 half connection init routines to set
ccid3hc[tr]x_s and use it in its rate calculations.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some shub2 changes were not in the tree when Greg cleaned up the sn2
region definitions in 1b66776da7, so this
one didn't get fixed.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This target allows users to modify the hoplimit header field of the
IPv6 header.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new iptables target allows manipulation of the TTL of an IPv4 packet.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rest of endian warnings now belongs to tr.c exclusively.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* RCU versions of hlist_***_rcu
* fib_alias partial rcu port just whats needed now.
Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally written by Henrik Nordstrom <hno@marasystems.com>, taken
from netfilter patch-o-matic and added ip6_tables support.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally written by Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>,
taken from netfilter patch-o-matic and fixed up to work with current
kernels.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new field to net device to hold the permanent
hardware address, and adds a new generic ethtool_op function to
get that address.
Signed-off-by: Jon Wetzel <jon_wetzel@dell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes timestamp, timestamp echo, and elapsed time to use units of 10
usecs as per DCCP spec. This has been tested to verify that times are correct.
Also fixed up length and used hton/ntoh more.
Still to add in later patches:
- actually use elapsed time to adjust RTT
(commented out as was prior to this patch)
- send options at times more closely following the spec
(content is now correct)
Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocols that make extensive use of SKB cloning,
for example TCP, eat at least 2 allocations per
packet sent as a result.
To cut the kmalloc() count in half, we implement
a pre-allocation scheme wherein we allocate
2 sk_buff objects in advance, then use a simple
reference count to free up the memory at the
correct time.
Based upon an initial patch by Thomas Graf and
suggestions from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
This variant is needed to satisfy sparse __user annotations.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Of this type, mostly:
CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETLINK_ADD_MEMBERSHIP/NETLINK_DROP_MEMBERSHIP are used to join/leave
groups, NETLINK_PKTINFO is used to enable nl_pktinfo control messages
for received packets to get the extended destination group number.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the group number allows increasing the number of groups without
beeing limited by the size of the bitmask. It introduces one limitation
for netlink users: messages can't be broadcasted to multiple groups anymore,
however this feature was never used inside the kernel.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ads a new "connbytes" match that utilizes the CONFIG_NF_CT_ACCT
per-connection byte and packet counters. Using it you can do things like
packet classification on average packet size within a connection.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As proposed by Andi Kleen, this is required esp. for x86_64 architecture,
where 64bit code needs 8byte aligned 64bit data types, but 32bit userspace
apps will only align to 4bytes.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this the previous setup is back, i.e. tcp_diag can be built as a module,
as dccp_diag and both share the infrastructure available in inet_diag.
If one selects CONFIG_INET_DIAG as module CONFIG_INET_TCP_DIAG will also be
built as a module, as will CONFIG_INET_DCCP_DIAG, if CONFIG_IP_DCCP was
selected static or as a module, if CONFIG_INET_DIAG is y, being statically
linked CONFIG_INET_TCP_DIAG will follow suit and CONFIG_INET_DCCP_DIAG will be
built in the same manner as CONFIG_IP_DCCP.
Now to aim at UDP, converting it to use inet_hashinfo, so that we can use
iproute2 for UDP sockets as well.
Ah, just to show an example of this new infrastructure working for DCCP :-)
[root@qemu ~]# ./ss -dane
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 0 *:5001 *:* ino:942 sk:cfd503a0
ESTAB 0 0 127.0.0.1:5001 127.0.0.1:32770 ino:943 sk:cfd50a60
ESTAB 0 0 127.0.0.1:32770 127.0.0.1:5001 ino:947 sk:cfd50700
TIME-WAIT 0 0 127.0.0.1:32769 127.0.0.1:5001 timer:(timewait,3.430ms,0) ino:0 sk:cf209620
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next changeset will introduce net/ipv4/tcp_diag.c, moving the code that was put
transitioanlly in inet_diag.c.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next changeset will rename tcp_diag.[ch] to inet_diag.[ch].
I'm taking this longer route so as to easy review, making clear the changes
made all along the way.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next changeset will rename tcp_diag to inet_diag and move the tcp_diag code out
of it and into a new tcp_diag.c, similar to the net/dccp/diag.c introduced in
this changeset, completing the transition to a generic inet_diag
infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.
Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the same way as was done with the v4 counterparts, this will be moved
to inet6_hashtables.c.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix gcc-3.4.x warning about iplicit operator precedence in NF_QUEUE_NR()
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
# grep -r 'netif_carrier_o[nf]' linux-2.6.12 | wc -l
246
# size vmlinux.org vmlinux.carrier
text data bss dec hex filename
4339634 1054414 259296 5653344 564360 vmlinux.org
4337710 1054414 259296 5651420 563bdc vmlinux.carrier
And this ain't an allyesconfig kernel!
Signed-off-by: David S. Miller <davem@davemloft.net>
I obviously wanted to use bitwise-or, not logical or.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also changes the list_for_each_entry_safe_continue behaviour to match its
kerneldoc comment, that is, to start after the pos passed.
Also adds several helper functions from previously open coded fragments, making
the code more clear.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
With ugly ifdefs, etc, but this actually:
1. keeps the existing ABI, i.e. no need to recompile the iproute2
utilities if not interested in DCCP.
2. Provides all the tcp_diag functionality in DCCP, with just a
small patch that makes iproute2 support DCCP.
Of course I'll get this cleaned-up in time, but for now I think its
OK to be this way to quickly get this functionality.
iproute2-ss050808 patch at:
http://vger.kernel.org/~acme/iproute2-ss050808.dccp.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
minimal renaming/moving done in this changeset to ease review.
Most of it is just changes of struct tcp_sock * to struct sock * parameters.
With this we move to a state closer to two interesting goals:
1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
for any INET transport protocol that has struct inet_hashinfo and are
derived from struct inet_connection_sock. Keeps the userspace API, that will
just not display DCCP sockets, while newer versions of tools can support
DCCP.
2. INET generic transport pluggable Congestion Avoidance infrastructure, using
the current TCP CA infrastructure with DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using most of the infrastructure TCP uses, with a dccp_death_row,
etc. As per my current interpretation of the draft what we have with
this changeset seems to be all we need (or very close to it 8)).
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also export the ones that will be used in the next changeset, when
DCCP uses this infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That groups all of the tables and variables associated to the TCP timewait
schedulling/recycling/killing code, that now can be isolated from the TCP
specific code and used by other transport protocols, such as DCCP.
Next changeset will move this code to net/ipv4/inet_timewait_sock.c
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the usage of packet type into the SKB control
buffer. After this patch it is now possible to shrink the sk_buff
structure and redefine its pkt_type.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the sparse warnings "implicit cast to nocast type"
for the priority or gfp_mask parameters of the memory allocations.
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the remote port negotiation (RPN) of the RFCOMM
protocol for Bluetooth.
Signed-off-by: J. Suter <jsuter@hardwave.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HCI page scan repetition mode change event contains the actual
page scan repetition mode for the remote device. It is the same
value that is received from an inquiry response and it can be used
to make further reconnections faster.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements a workaround for buggy Bluetooth 1.2 devices from
Silicon Wave. Their inquiry results with RSSI contain the page scan mode
field. This field was removed in the final Bluetooth 1.2 specification.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP. It
permits matching of port numbers, packet type and options.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The protocol header files in <linux/foo.h> are usually structured in a
way to be included by userspace code. The top section consists of
general protocol structure definitions, typedefs, enums - followed by
an #ifdef __KERNEL__ section.
Currently <linux/dccp.h> doesn't follow that convention and can
therefore not be used from userspace. However, for example iptables'
libipt_dccp.c actually needs various definitions from there.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check whether pf is too large in order to prevent array overflow.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a /proc/net/netfilter/nf_queue file, similar to the
recently-added /proc/net/netfilter/nf_log. It indicates which queue
handler is registered to which protocol family. This is useful since
there are now multiple queue handlers in the treee (ip[6]_queue,
nfnetlink_queue).
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for passing the real 'physical' device ifindex
down to userspace via nfnetlink_log and nfnetlink_queue.
This feature basically obsoletes net/bridge/netfilter/ebt_ulog.c, and
it is likely ebt_ulog.c will die with one of the next couple of
patches.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Used in the dccp CCID3 code, that is going to be submitted RSN.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also improves reqsk_queue_prune and renames it to
inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
and inet_request_sock objects, not just with request_sock ones thus
belonging to inet_request_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Development to this point was done on a subversion repository at:
http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/
This repository will be kept at this site for the foreseable future,
so that interested parties can see the history of this code,
attributions, etc.
If I ever decide to take this offline I'll provide the full history at
some other suitable place.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code contributed by Stephen Hemminger.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this we're very close to getting all of the current TCP
refactorings in my dccp-2.6 tree merged, next changeset will export
some functions needed by the current DCCP code and then dccp-2.6.git
will be born!
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Completing the previous changeset, this also generalises tcp_v4_synq_add,
renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the
DCCP tree, which I plan to merge RSN.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.
The functions that operate on these members were renamed, using a
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of tcp_create_openreq_child, will be used in
dccp_create_openreq_child, and is a nice sock function anyway.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the parts of tcp_time_wait that are not TCP specific, tcp_time_wait uses
it and so will dccp_time_wait.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
And also some TIME_WAIT functions.
[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 282955 13122 9312 305389 4a8ed net/ipv4/built-in.o
/tmp/after.size: 281566 13122 9312 304000 4a380 net/ipv4/built-in.o
[acme@toy net-2.6.14]$
I kept them still inlined, will uninline at some point to see what
would be the performance difference.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This paves the way to generalise the rest of the sock ID lookup
routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
kernels (where IPv6 is always built as a module):
[root@qemu ~]# grep tw_sock /proc/slabinfo
tw_sock_TCPv6 0 0 128 31 1
tw_sock_TCP 0 0 96 41 1
[root@qemu ~]#
Now if a protocol wants to use the TIME_WAIT generic infrastructure it
only has to set the sk_prot->twsk_obj_size field with the size of its
inet_timewait_sock derived sock and proto_register will create
sk_prot->twsk_slab, for now its only for INET sockets, but we can
introduce timewait_sock later if some non INET transport protocolo
wants to use this stuff.
Next changesets will take advantage of this new infrastructure to
generalise even more TCP code.
[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o
/tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o
[acme@toy net-2.6.14]$
Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
(qemu host)).
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[acme@toy net-2.6.14]$ grep built-in /tmp/before /tmp/after
/tmp/before: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o
/tmp/after: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o
Will be used in DCCP, not exporting it right now not to get in Adrian
Bunk's exported-but-not-used-on-modules radar 8)
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It really just makes the existing code be a helper function that
tcp_v4_hash and tcp_unhash uses, specifying the right inet_hashinfo,
tcp_hashinfo.
One thing I'll investigate at some point is to have the inet_hashinfo
pointer in sk_prot, so that we get all the hashtable information from
the sk pointer, this can lead to some extra indirections that may well
hurt performance/code size, we'll see. Ultimate idea would be that
sk_prot would provide _all_ the information about a protocol
implementation.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of places just needs the states, not even linux/tcp.h, where this
enum was, needs it.
This speeds up development of the refactorings as less sources are
rebuilt when things get moved from net/tcp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also expose all of the tcp_hashinfo members, i.e. killing those
tcp_ehash, etc macros, this will more clearly expose already generic
functions and some that need just a bit of work to become generic, as
we'll see in the upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This required moving tcp_bucket_cachep to inet_hashinfo.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to use nested nfattr structures for ip_conntrack_expect. This is
bogus, since ip_conntrack and ip_conntrack_expect are communicated in
different netlink message types. both should be encoded at the top level
attributes, no extra nesting required. This patch addresses the issue.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, every nfnetlink subsystem had to specify it's
attribute count. However, in reality the attribute count depends on
the message type within the subsystem, not the subsystem itself. This
patch moves 'attr_count' from 'struct nfnetlink_subsys' into
nfnl_callback to fix this.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
refcnt underflow: the reference count is decremented when a conntrack
entry is removed from the hash but it is not incremented when entering
new entries.
missing protection of process context against softirq context: all
cache operations need to locally disable softirqs to avoid races.
Additionally the event cache can't be initialized when a packet
enteres the conntrack code but needs to be initialized whenever we
cache an event and the stored conntrack entry doesn't match the
current one.
incorrect flushing of the event cache in ip_ct_iterate_cleanup:
without real locking we can't flush the cache for different CPUs
without incurring races. The cache for different CPUs can only be
flushed when no packets are going through the
code. ip_ct_iterate_cleanup doesn't need to drop all references, so
flushing is moved to the cleanup path.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should really be in a inet_connection_sock, but I'm leaving it
for a later optimization, when some more fields common to INET
transport protocols now in tcp_sk or inet_sk will be chunked out into
inet_connection_sock, for now its better to concentrate on getting the
changes in the core merged to leave the DCCP tree with only DCCP
specific code.
Next changesets will take advantage of this move to generalise things
like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later
receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
the future, when tcp_tw_bucket gets transformed into the struct
timewait_sock hierarchy.
tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
moved to sk_prot.
A cascade of incremental changes will ultimately make the tcp_lookup
functions be fully generic.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to break down the complexity of the series of patches,
making it very clear that this one just does:
1. renames tcp_ prefixed hashtable functions and data structures that
were already mostly generic to inet_ to share it with DCCP and
other INET transport protocols.
2. Removes not used functions (__tb_head & tb_head)
3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
tcp_v4_build_header)
Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
__tcp_put_port, generic and get others like tcp_destroy_sock closer to generic
(tcp_orphan_count will go to sk->sk_prot to allow this).
Eventually most of these functions will be used passing the transport protocol
inet_hashinfo structure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be shared with DCCP (and others), this is the start of a series of patches
that will expose the already generic TCP hash table routines.
The few changes noticed when calling gcc -S before/after on a pentium4 were of
this type:
movl 40(%esp), %edx
cmpl %esi, 472(%edx)
je .L168
- pushl $291
+ pushl $272
pushl $.LC0
pushl $.LC1
pushl $.LC2
[acme@toy net-2.6.14]$ size net/ipv4/tcp_ipv4.before.o net/ipv4/tcp_ipv4.after.o
text data bss dec hex filename
17804 516 140 18460 481c net/ipv4/tcp_ipv4.before.o
17804 516 140 18460 481c net/ipv4/tcp_ipv4.after.o
Holler if some weird architecture has issues with things like this 8)
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a generic (layer3 independent) version of what ipt_ULOG is already
doing for IPv4 today. ipt_ULOG, ebt_ulog and finally also ip[6]t_LOG will
be deprecated by this mechanism in the long term.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is in preparation to nfnetlink_log:
- loggers now have to register struct nf_logger instead of nf_logfn
- nf_log_unregister() replaced by nf_log_unregister_pf() and
nf_log_unregister_logger()
- add comment to ip[6]t_LOG.h to assure nobody redefines flags
- add /proc/net/netfilter/nf_log to tell user which logger is currently
registered for which address family
- if user has configured logging, but no logging backend (logger) is
available, always spit a message to syslog, not just the first time.
- split ip[6]t_LOG.c into two parts:
Backend: Always try to register as logger for the respective address family
Frontend: Always log via nf_log_packet() API
- modify all users of nf_log_packet() to accomodate additional argument
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From tcp_v4_rebuild_header, that already was pretty generic, I only
needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and
establish the requirement that INET transport layer protocols that
want to use this function map TCP_SYN_SENT to its equivalent state.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From tcp_v4_setup_caps, that always is preceded by a call to
__sk_dst_set, so coalesce this sequence into sk_setup_caps, removing
one call to a TCP function in the IP layer.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This operation was already generic and DCCP will use it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take account of whether a socket is bound to a particular device when
selecting an IPv6 raw socket to receive a packet. Also perform this
check when receiving IPv6 packets with router alert options.
Signed-off-by: Andrew McDonald <andrew@mcdonald.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add new nfnetlink_queue module
- Add new ipt_NFQUEUE and ip6t_NFQUEUE modules to access queue numbers 1-65535
- Mark ip_queue and ip6_queue Kconfig options as OBSOLETE
- Update feature-removal-schedule to remove ip[6]_queue in December
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- split netfiler verdict in 16bit verdict and 16bit queue number
- add 'queuenum' argument to nf_queue_outfn_t and its users ip[6]_queue
- move NFNL_SUBSYS_ definitions from enum to #define
- introduce autoloading for nfnetlink subsystem modules
- add MODULE_ALIAS_NFNL_SUBSYS macro
- add nf_unregister_queue_handlers() to register all handlers for a given
nf_queue_outfn_t
- add more verbose DEBUGP macro definition to nfnetlink.c
- make nfnetlink_subsys_register fail if subsys already exists
- add some more comments and debug statements to nfnetlink.c
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rerouting functionality is required by the core, therefore it has
to be implemented by the core and not in individual queue handlers.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Remove bogus code for compiling netlink as module
- Add module refcounting support for modules implementing a netlink
protocol
- Add support for autoloading modules that implement a netlink protocol
as soon as someone opens a socket for that protocol
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is nothing IPv4-specific in it. In fact, it was already used by
IPv6, too... Upcoming nfnetlink_queue code will use it for any kind
of packet.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead, set it in one place, namely the beginning of
netif_receive_skb().
Based upon suggestions from Jamal Hadi Salim.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global function:
- xfrm4_state.c: xfrm4_state_fini
- remove the following unneeded EXPORT_SYMBOL's:
- ip_output.c: ip_finish_output
- ip_output.c: sysctl_ip_default_ttl
- fib_frontend.c: ip_dev_find
- inetpeer.c: inet_peer_idlock
- ip_options.c: ip_options_compile
- ip_options.c: ip_options_undo
- net/core/request_sock.c: sysctl_max_syn_backlog
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding just wants the device before the skb_bond()
decapsulation occurs, so simply pass that original
device into packet_type->func() as an argument.
It remains to be seen whether we can use this same
exact thing to get rid of skb->input_dev as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ctnetlink subsystem for userspace-access to ip_conntrack table.
This allows reading and updating of existing entries, as well as
creating new ones (and new expect's) via nfnetlink.
Please note the 'strange' byte order: nfattr (tag+length) are in host
byte order, while the payload is always guaranteed to be in network
byte order. This allows a simple userspace process to encapsulate netlink
messages into arch-independent udp packets by just processing/swapping the
headers and not knowing anything about the actual payload.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the private element from skbuff, that is only used by
HIPPI. Instead it uses skb->cb[] to hold the additional data that is
needed in the output path from hard_header to device driver.
PS: The only qdisc that might potentially corrupt this cb[] is if
netem was used over HIPPI. I will take care of that by fixing netem
to use skb->stamp. I don't expect many users of netem over HIPPI
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce "nfnetlink" (netfilter netlink) layer. This layer is used as
transport layer for all userspace communication of the new upcoming
netfilter subsystems, such as ctnetlink, nfnetlink_queue and some day even
the mythical pkttables ;)
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a notifier chain based event mechanism for ip_conntrack state
changes. As opposed to the previous implementations in patch-o-matic, we
do no longer need a field in the skb to achieve this.
Thanks to the valuable input from Patrick McHardy and Rusty on the idea
of a per_cpu implementation.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the "list" member of struct sk_buff, as it is entirely
redundant. All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.
Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller. I did some basic
testing on my notebook and it seems to work.
The only real in-tree user of nfcache was IPVS, who only needs a
single bit. Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them. Maybe the IPVS guys can resolve that to further save space.
Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(
The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way how to do it without any skb fields, so it's safe
to remove it.
- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. currently get's their own
compile-conditional skb->ipvs_property field. IPVS maintainers can
decide to move this bit elswhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discussed at netconf'05, we convert nfmark and conntrack-mark to be
32bits even on 64bit architectures.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Richard Purdie
Add some extra pxa27x register definitions needed for the Sharp
SL-C3000 (Spitz).
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
There is no need to define __ARCH_WANT_SYS_FADVISE64 on ARM since it
only serves to compile in a compatibility wrapper for sys_fadvise64
which never was tied to any syscall number.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add the definitions for the S3C2410_CLKSLOW registers to
the header files, and show the values when the system
starts up
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Deepak Saxena
This patch implements the set_irq_type() hooks for configuring GPIO
IRQ type and updates all the platforms to use it instead of the
gpio_line_config() function which is now used to configure input
vs. output on the pins.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It appears that a memory barrier soon after a mispredicted
branch, not just in the delay slot, can cause the hang
condition of this cpu errata.
So move them out-of-line, and explicitly put them into
a "branch always, predict taken" delay slot which should
fully kill this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the spinlock routines were moved out of line into
kernel/spinlock.c this made it so that the debugging
spinlocks record lock acquisition program counts in the
kernel/spinlock.c functions not in their callers.
This makes the debugging info kind of useless.
So record the correct caller's program counter and
now this feature is useful once more.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed sparc architecture specific users of asm/segment.h and
asm-sparc/segment.h itself
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed sparc64 architecture specific users of asm/segment.h and
asm-sparc64/segment.h itself
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current uncorrectable error handling was poor enough
that the processor could just loop taking the same
trap over and over again. Fix things up so that we
at least get a log message and perhaps even some register
state.
In the process, much consolidation became possible,
particularly with the correctable error handler.
Prefix assembler and C function names with "spitfire"
to indicate that these are for Ultra-I/II/IIi/IIe only.
More work is needed to make these routines robust and
featureful to the level of the Ultra-III error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
* ieee1394_device_id has kernel_ulong_t field after an odd number of
__u32 ones. Since mod_devicetable.h is included both from kernel and
from host build helper, we may be in trouble if we are building on
32bit host for 64bit target - userland sees unsigned long long,
kernel sees unsigned long and while their sizes match, alignments
might not. Fixed by forcing alignment. Fortunately, almost nobody
else needs that - the rest of such fields is naturally aligned as it
is.
* of_device_id has void * in it. Host userland helpers need
kernel_ulong_t instead, since their void * might have nothing to do
with the kernel one. Fixed in the same way it's done for similar
problems in pcmcia_device_id (ifdef __KERNEL__).
* pcmcia_device_id has the same problem as ieee1394_device_id. Fixed
the same way.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paulus, I think this is now a reasonable candidate for the post-2.6.13
queue.
Relax address restrictions for hugepages on ppc64
Presently, 64-bit applications on ppc64 may only use hugepages in the
address region from 1-1.5T. Furthermore, if hugepages are enabled in
the kernel config, they may only use hugepages and never normal pages
in this area. This patch relaxes this restriction, allowing any
address to be used with hugepages, but with a 1TB granularity. That
is if you map a hugepage anywhere in the region 1TB-2TB, that entire
area will be reserved exclusively for hugepages for the remainder of
the process's lifetime. This works analagously to hugepages in 32-bit
applications, where hugepages can be mapped anywhere, but with 256MB
(mmu segment) granularity.
This patch applies on top of the four level pagetable patch
(http://patchwork.ozlabs.org/linuxppc64/patch?id=1936).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch moves power4_enable_pmcs() to arch/ppc64/kernel/pmc.c.
I've tested it on P5 LPAR and P4. It does what it used to.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If both CONFIG_XMON and CONFIG_XMON_DEFAULT is enabled in the .config,
there is no way to disable xmon again. setup_system calls first xmon_init,
later parse_early_param. So a new 'xmon=off' cmdline option will do the right
thing.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can now remove CONFIG_MSCHUNKS as it doesn't do anything interesting
anymore.
The only macro in abs_addr.h which is called by non-iSeries code is
phys_to_abs(), so remove the other dummy implementations, and we add a
firmware feature check to phys_to_abs().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We no longer need the lmb code to know about abs and phys addresses, so
remove the physbase variable from the lmb_property struct.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
abs_to_phys() is a macro that turns out to do nothing, and also has the
unfortunate property that it's not the inverse of phys_to_abs() on iSeries.
The following is for my benefit as much as everyone else.
With CONFIG_MSCHUNKS enabled, the lmb code is changed such that it keeps
a physbase variable for each lmb region. This is used to take the possibly
discontiguous lmb regions and present them as a contiguous address space
beginning from zero.
In this context each lmb region's base address is its "absolute" base
address, and its physbase is it's "physical" address (from Linux's point of
view). The abs_to_phys() macro does the mapping from "absolute" to "physical".
Note: This is not related to the iSeries mapping of physical to absolute
(ie. Hypervisor) addresses which is maintained with the msChunks structure.
And the msChunks structure is not controlled via CONFIG_MSCHUNKS.
Once upon a time you could compile for non-iSeries with CONFIG_MSCHUNKS
enabled. But these days CONFIG_MSCHUNKS depends on CONFIG_PPC_ISERIES, so
for non-iSeries code abs_to_phys() is a no-op.
On iSeries we always have one lmb region which spans from 0 to
systemcfg->physicalMemorySize (arch/ppc64/kernel/iSeries_setup.c line 383).
This region has a base (ie. absolute) address of 0, and a physbase address
of 0 (as calculated in lmb_analyze() (arch/ppc64/kernel/lmb.c line 144)).
On iSeries, abs_to_phys(aa) is defined as lmb_abs_to_phys(aa), which finds
the lmb region containing aa (and there's only one, ie. 0), and then does:
return lmb.memory.region[0].physbase + (aa - lmb.memory.region[0].base)
physbase == base == 0, so you're left with "return aa".
So remove abs_to_phys(), and lmb_abs_to_phys() which is the implementation
of abs_to_phys() for iSeries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
physRpn_to_absRpn is a no-op on non-iSeries platforms, remove the two
redundant calls.
There's only one caller on iSeries so fold the logic in there so we can get
rid of it completely.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The only caller of chunk_offset() and abs_chunk() is phys_to_abs(), so
fold the former two into the latter.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rename the msChunks struct to get rid of the StUdlY caps and make it a bit
clearer what it's for.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Chunks are 256KB, so use constants for the size/shift/mask, rather than
getting them from the msChunks struct. The iSeries debugger (??) might still
need access to the values in the msChunks struct, so we keep them around
for now, but set them from the constant values.
Replace msChunks_entry typedef with regular u32.
Simplify msChunks_alloc() to manipulate klimit directly, rather than via
a parameter.
Move msChunks_alloc() and msChunks into iSeries_setup.c, as that's where
they're used.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The msChunks code was written to work on pSeries, but now it's only used on
iSeries. This means there's no need to do PTRRELOC anymore, so remove it all.
A few places were getting "extern reloc_offset()" from abs_addr.h, move it
into system.h instead.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make firmware_has_feature() evaluate at compile time for the non pSeries
case and tidy up code where possible.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Create the firmware_has_feature() inline and move the firmware feature
stuff into its own header file.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The firmware_features field of struct cpu_spec should really be a separate
variable as the firmware features do not depend on the chip and the
bitmask is constructed independently. By removing it, we save 112 bytes
from the cpu_specs array and we access the bitmask directly instead of via
the cur_cpu_spec pointer.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On ppc64 machines with segment tables, CPU0's segment table is at a
fixed address, currently 0x9000. This patch moves it to the free
space at 0x6000, just below the fwnmi data area. This saves 8k of
space in vmlinux and the runtime kernel image.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Comments in head.S suggest that the iSeries naca has a fixed address,
because tools expect to find it there. The only tool which appears to
access the naca is addRamDisk, but both the in-kernel version and the
version used in RHEL and SuSE in fact locate the NACA the same way as
the hypervisor does, by following the pointer in the hvReleaseData
structure.
Since the requirement for a fixed address seems to be obsolete, this
patch removes the naca from head.S and replaces it with a normal C
initializer.
For good measure, it removes an old version of addRamDisk.c which was
sitting, unused, in the ppc32 tree.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch just splits out the pSeries specific parts of vio.c.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch allows us to have a different bus if matching function for
each platform.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since the iSeries vio iommu tables cannot be used until after the vio bus has
been initialised, move the initialisation of the tables to there.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch splits the iSeries specific parts out of vio.c.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch updates the format of the flattened device-tree passed
between the boot trampoline and the kernel to support a more compact
representation, for use by embedded systems mostly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Implement 4-level pagetables for ppc64
This patch implements full four-level page tables for ppc64, thereby
extending the usable user address range to 44 bits (16T).
The patch uses a full page for the tables at the bottom and top level,
and a quarter page for the intermediate levels. It uses full 64-bit
pointers at every level, thus also increasing the addressable range of
physical memory. This patch also tweaks the VSID allocation to allow
matching range for user addresses (this halves the number of available
contexts) and adds some #if and BUILD_BUG sanity checks.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds back the code that was taken out, thus re-enabling:
* The PHY Layer to initialize without crashing
* Drivers to actually connect to PHYs
* The entire PHY Control Layer
This patch is used by the gianfar driver, and other drivers which are in
development.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
- changes license of all code from OSL+GPL to plain ole GPL
- except for NVIDIA, who hasn't yet responded about sata_nv
- copyright holders were already contacted privately
- adds info in each driver about where hardware/protocol docs may be
obtained
- where I have made major contributions, updated copyright dates
Debug variables and procfs dir should be "ieee80211", not "ipw".
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
IEEE 802.11 code has no business touching payloads of EAPOL frames.
There are some EAPOL structures defined for debugging and these were
confusingly called EAP types which they are not. Let's just remove these
before someone else starts using them in the kernel.
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
No need to maintain support for WIRELESS_EXT < 17 since this kernel
tree is already using WIRELESS_EXT 18.
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This one removes struct scsi_request entirely from sd. In the process,
I noticed we have no callers of scsi_wait_req who don't immediately
normalise the sense, so I updated the API to make it take a struct
scsi_sense_hdr instead of simply a big sense buffer.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This one's slightly more difficult. The transport class uses
REQ_FAILFAST, so another interface (scsi_execute) had to be invented to
take the extra flag. Also, the sense functions are shifted around to
allow spi_execute to place data directly into a struct scsi_sense_hdr.
With this change, there's probably a lot of unnecessary sense buffer
allocation going on which we can fix later.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
After this, we just have some drivers, all the ULDs and the SPI
transport class using scsi_wait_req().
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
attribute_container_classdev_to_container is an exported function of the
attribute_container.c file. However, there's no prototype for it. Now
I actually want to use it, so add one.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Original From: Mike Christie <michaelc@cs.wisc.edu>
Add scsi_execute_req() as a replacement for scsi_wait_req()
Fixed up various pieces (added REQ_SPECIAL and caught req use after
free)
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some nodes can have large holes on x86-64.
This fixes problems with the VM allowing too many dirty pages because it
overestimates the number of available RAM in a node. In extreme cases you
can end up with all RAM filled with dirty pages which can lead to deadlocks
and other nasty behaviour.
This patch just tells the VM about the known holes from e820. Reserved
(like the kernel text or mem_map) is still not taken into account, but that
should be only a few percent error now.
Small detail is that the flat setup uses the NUMA free_area_init_node() now
too because it offers more flexibility.
(akpm: lotsa thanks to Martin for working this problem out)
Cc: Martin Bligh <mbligh@mbligh.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently had a BUG_ON() go off spuriously on a gcc 4.0 compiled kernel.
It turns out gcc-4.0 was removing a sign extension while earlier gcc
versions would not. Thinking this to be a compiler bug, I submitted a
report:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23422
It turns out we need to cast the input in order to tell gcc to sign extend
it.
Thanks to Andrew Pinski for his help on this bug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to support P-state transitions on ia64. This driver is based on ACPI,
and uses the ACPI processor driver interface to find out the P-state support
information for the processor. This driver plugs into generic cpufreq
infrastructure.
Once this driver is loaded successfully, ondemand/userspace governor can be
used to change the CPU frequency dynamically based on load or on request from
userspace process.
Refer :
ACPI specification -
http://www.acpi.info
P-state related PAL calls -
http://developer.intel.com/design/itanium/downloads/24869909.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to abstract irq_affinity down to the pci provider level since
different SGI hardware implements this in different ways.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
From: Michael Wu <flamingice@sourmilk.net>
This patch:
- fixes misc. whitespace/comments
- replaces u16 with __le16/__be16 where appropriate
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
From: Gertjan van Wingerde <gwingerde@home.nl>
Attached patch cleans up the long lists of #defines for status codes,
reason codes, and information elements.
Signed-off-by: Gertjan van Wingerde <gwingerde@home.nl>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
From: Gertjan van Wingerde <gwingerde@home.nl>
Attached patch updates the definitions of the generic ieee80211 stack to
the latest versions of the published 802.11x specification suite.
Signed-off-by: Gertjan van Wingerde <gwingerde@home.nl>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Delete the ability to build an ACPI kernel that does
not include PCI support. When such a machine is created
and it requires a tuned kernel, send a patch.
http://bugzilla.kernel.org/show_bug.cgi?id=1364
Signed-off-by: Len Brown <len.brown@intel.com>
Clean up of SGI SN partitioning related code.
The SN_SAL_GET_SN_INFO SAL call returns the partition ID, making
the SN_SAL_SYSCTL_PARTITION_GET SAL call redundant. Remove sn_partid
and use sn_partition_id.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a new exported function for determining the nearest node
with CPUs for I/O nodes and fix a bug where the hwperf dynamic
misc device was being registered before misc_init().
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Bugfix to export PCI topology information in /proc/sgi_sn/sn_topology.
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently, region numbers are defined in several files, with several
names. For example, we have REGION_KERNEL in asm/page.h and
RGN_KERNEL in pgtable.h
We also have address definitions that should depend on the
RGN_XXX macros, but are currently just long constants.
The following patch reorganises all the definitions so that they have
the same form (RGN_XXX), are in one place, and that addresses that
depend on RGN_XXX are derived from them.
(This is a necessary but not sufficient patch to allow UML-like
operation on IA64).
Thanks to David Mosberger for catching the change I missed in mmu_context.h.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Parenthesize "p" to avoid ambiguity. No callers have a problem today;
this is just to clean up the bad form.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As pointed out in the following thread, the CLOCK_TICK_RATE setting for
IXP4xx is incorrect b/c the HW ignores the lowest 2 bits of the LATCH
value.
http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2005-August/030950.html
Tnx to George Anziger and Egil Hjelmeland for finding the issue.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
remove the bogus games with explicit ifdefs on __CHECKER__
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a bunch of functions switched from volatile to __attribute__((noreturn)) and
from const to __attribute_pure__
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
extern on physid_2_cpu[] does not belong in smp.h - the thing is static.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
alpha xchg has to be a macro - alpha disables always_inline and if that
puppy does not get inlined, we immediately blow up on undefined reference.
Happens even on gcc3; with gcc4 that happens a _lot_.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fixed kconfig dependencies on ISA_DMA_API for parts of sound/* that rely
on it.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We ran into the limit with the maximum number of waiters at one of our sites.
This patch increases the number of possible waiters from 2^15 to 2^31 by using
a long for the counter in struct rw_semaphore. S390 and alpha already do this.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Kenneth Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer
as the result. Breaks ROSE completly and AX.25 if UID policy set to deny.
o While the list structure of AX.25's UID to callsign mapping table was
properly protected by a spinlock, it's elements were not refcounted
resulting in a race between removal and usage of an element.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket flag cleanups that went into 2.6.12-rc1 are basically oring
the flags of an old socket into the socket just being created.
Unfortunately that one was just initialized by sock_init_data(), so already
has SOCK_ZAPPED set. As the result zapped sockets are created and all
incoming connection will fail due to this bug which again was carefully
replicated to at least AX.25, NET/ROM or ROSE.
In order to keep the abstraction alive I've introduced sock_copy_flags()
to copy the socket flags from one sockets to another and used that
instead of the bitwise copy thing. Anyway, the idea here has probably
been to copy all flags, so sock_copy_flags() should be the right thing.
With this the ham radio protocols are usable again, so I hope this will
make it into 2.6.13.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interrupts from devices sharing the same IRQ could cause
ata_host_intr to finish commands being processed by atapi_packet_task
if the commands are using ATA_PROT_ATAPI_NODATA or ATA_PROT_ATAPI_DMA
protocol. This is because libata interrupt handler is unaware that
interrupts are not expected during that period. This patch adds
ATA_FLAG_NOINTR flag to tell the interrupt handler that we're not
expecting interrupts.
Note that once proper HSM is implemented for interrupt-driven PIO,
this should be merged into it and this flag will be removed.
ahci.c is a different kind of beast, so it's left alone.
* The following drivers use ata_qc_issue_prot and ata_interrupt, so
changes in libata core will do.
ata_piix sata_sil sata_svw sata_via sata_sis sata_uli
* The following drivers use ata_qc_issue_prot and custom intr handler.
They need this change to work correctly.
sata_nv sata_vsc
* The following drivers use custom issue function and intr handler.
Currently all custom issue functions don't support ATAPI, so this
change is irrelevant, updated for consistency and to avoid later
mistakes.
sata_promise sata_qstor sata_sx4
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This bug could cause oopses and page state corruption, because ncpfs
used the generic page-cache symlink handlign functions. But those
functions only work if the page cache is guaranteed to be "stable", ie a
page that was installed when the symlink walk was started has to still
be installed in the page cache at the end of the walk.
We could have fixed ncpfs to not use the generic helper routines, but it
is in many ways much cleaner to instead improve on the symlink walking
helper routines so that they don't require that absolute stability.
We do this by allowing "follow_link()" to return a error-pointer as a
cookie, which is fed back to the cleanup "put_link()" routine. This
also simplifies NFS symlink handling.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC 4.x really dislikes the games we are playing in
unaligned.c, and the cleanest way to fix this is to
move things into assembler.
Noted by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no point in having the host name duplicated between
the mmc_host structure and the encapsulated class device
structure.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Create a mmc_host class to allow enumeration of MMC host controllers
even though they have no card(s) inserted.
Patch based on work by Pierre Ossman.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Not only was this unused, but its somewhat eccentric declaration
of "static inline const unsigned long" gives gcc4 heartburn.
Signed-off-by: Tony Luck <tony.luck@intel.com>
BCM5785 (HT1000) is a Opteron Southbridge from Serverworks/Broadcom that
incorporates a single channel ATA100 IDE controller that is functionally
identical to the Serverworks CSB6 IDE controller. This patch adds support
for the new PCI device ID and also the support for this controller.
Signed-off-by: Narendra Sankar <nsankar@broadcom.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Adds support for Netcell Revolution to pci-ide generic driver by including
it in the list of devices matched. Includes the Revolution in the list of
simplex devices forced into DMA mode.
Signed-off-by: Matt Gillette <matt.gillette@netcell.com>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Fixes the incorrect DCR base value for the 440SP SRAM controller.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes build on 4xx stb03xxx when general purpose dma engine support is
enabled.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add inotify and ioprio syscall stubs to SH64.
Signed-off-by: Robert Love <rml@novell.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add inotify and ioprio syscall stubs to SH.
Signed-off-by: Robert Love <rml@novell.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Down the road we want to eliminate the use of the global kernel lock entirely
from the NFS client. To do this, we need to protect the fields in the
nfs_inode structure adequately. Start by serializing updates to the
"cache_validity" field.
Note this change addresses an SMP hang found by njw@osdl.org, where processes
deadlock because nfs_end_data_update and nfs_revalidate_mapping update the
"cache_validity" field without proper serialization.
Test plan:
Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on
large SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce atomic bitops to manipulate the bits in the nfs_inode structure's
"flags" field.
Using bitops means we can use a generic wait_on_bit call instead of an ad hoc
locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from
nfs_inode at the same time.
The other new flags field will continue to use bitmask and logic AND and OR.
This permits several flags to be set at the same time efficiently. The
following patch adds a spin lock to protect these flags, and this spin lock
will later cover other fields in the nfs_inode structure, amortizing the cost
of using this type of serialization.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Certain bits in nfsi->flags can be manipulated with atomic bitops, and some
are better manipulated via logical bitmask operations.
This patch splits the flags field into two. The next patch introduces atomic
bitops for one of the fields.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add platform device data for the SA11x0 MCP device. This allows
platforms to customise the configuration of the SA11x0 MCP device
according to their needs.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Shub2 provides a much improved mechanism for issuing internode
TLB purges. Add code to support the newer mechanism. There is also
some debug code (disabled) that is useful for testing.
Collect statistics on the number, type & duration of TLB purges.
This data will be useful for making future improvements in the algorithms.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the SN address macros so that they work on both shub1
and shub2. Most of the code to support shub2 was added last year
but this patch fixes a few bugs and adds macros to help generate
both processor-specific physical addresses & numalink physical
addresses. More cleanup & optimization will be done later.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Paulus suggested that we put xLparMap in its own .c file so that we can
generate a .s file to be included into head.S. This doesn't get around
the problem of having it at a fixed address, but it makes it more
palatable.
It would be good if this could be included in 2.6.13 as it solves our
build problems with various versions of binutils and gcc. In
particular, it allows us to build an iSeries kernel on Debian unstable
using their biarch compiler.
This has been built and booted on iSeries and built for pSeries and g5.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On the 6700/6702 PXH part, a MSI may get corrupted if an ACPI hotplug
driver and SHPC driver in MSI mode are used together.
This patch will prevent MSI from being enabled for the SHPC as part of
an early pci quirk, as well as on any pci device which sets the no_msi
bit.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Ebbert noticed that the desc_empty macro is incorrect. Fix it.
Thankfully, this is not used as a security check, but it can falsely
overwrite TLS segments with carefully chosen base / limits. I do not
believe this is an issue in practice, but it is a kernel bug.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
[ x86-64 had the same problem, and the same fix. Linus ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the client performs an exclusive create and opens the file for writing,
a Netapp filer will first create the file using the mode 01777. It does this
since an NFSv3/v4 exclusive create cannot immediately set the mode bits.
The 01777 mode then gets put into the inode->i_mode. After the file creation
is successful, we then do a setattr to change the mode to the correct value
(as per the NFS spec).
The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so
the latter retains the 01777 mode. A bit later, the VFS notices this, and calls
remove_suid(). This of course now resets the file mode to inode->i_mode & 0777.
Hey presto, the file mode on the server is now magically changed to 0777. Duh...
Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a MOVE_SELF event to inotify. It is sent whenever the inode
you are watching is moved. We need this event so that we can catch
something like this:
- app1:
watch /etc/mtab
- app2:
cp /etc/mtab /tmp/mtab-work
mv /etc/mtab /etc/mtab~
mv /tmp/mtab-work /etc/mtab
app1 still thinks it's watching /etc/mtab but it's actually watching
/etc/mtab~.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IEEE 802.11 has a capability field flag called ESS, but ieee80211 had
renamed this to BSS for some reason. hostap has been using
WLAN_CAPABILITY_ESS and since that matches with the standard, lets use
it as the name for this define. Add WLAN_CAPABILITY_BSS as a backwards
compatibility name for the same bit since ieee80211 and ipw2200 are
using this and there are versions outside kernel tree that expect to
find this define name.
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
IEEE 802.11 frame control has two bits reserved for protocol
version. IEEE80211_FCTL_VERS was not used anywhere, but I would assume
it was supposed to be a mask for the protocol field and as such, it
should be 0x0003, not 0x0002. This matches with WLAN_FC_PVER
definition in hostap.
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This reverts commits
71db63acff
[PATCH] increase PCIBIOS_MIN_IO on x86
and
0b2bfb4e7f
ACPI: increase PCIBIOS_MIN_IO on x86
since Lukas Sandströ<lukass@etek.chalmers.se> reports that this breaks
his on-board nvidia audio.
We should re-visit this later. For now we revert the change
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently tried to construct a totally generic transport class and
found there were certain features missing from the current abstract
transport class. Most notable is that you have to hang the data on the
class_device but most of the API is framed in terms of the generic
device, not the class_device.
These changes are two fold
- Provide the class_device to all of the setup and configure APIs
- Provide and extra API to take the device and the attribute class and
return the corresponding class_device
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch is necessary if we begin exposing underlying physical disks
(which can attach to the SPI transport class) of the hardware RAID
cards, since we don't want any SPI parameters binding to the RAID
devices.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There was a rather silly and embarrassing typo in the sh _syscall6().
For the syscall ABI we have the trapa value specified as 0x10 + number
of arguments, this was being set incorrectly in the _syscall6() case
which ended up causing some problems for users.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The patch below should fix a race which could cause stale TLB entries.
Specifically, when 2 CPUs ended up racing for entrance to
wrap_mmu_context(). The losing CPU would find that by the time it
acquired ctx.lock, mm->context already had a valid value, but then it
failed to (re-)check the delayed TLB flushing logic and hence could
end up using a context number when there were still stale entries in
its TLB. The fix is to check for delayed TLB flushes only after
mm->context is valid (non-zero). The patch also makes GCC v4.x
happier by defining a non-volatile variant of mm_context_t called
nv_mm_context_t.
Signed-off-by: David Mosberger-Tang <David.Mosberger@acm.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes a race during initialization with the NAPI softirq
processing by using an RCU approach.
This race was discovered when refill_skbs() was added to
the setup code.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add limited retry logic to netpoll_send_skb
Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter.
We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many instances of
skb->protocol = htons(ETH_P_*);
skb->protocol = __constant_htons(ETH_P_*);
and
skb->protocol = *_type_trans(...);
Most of *_type_trans() are already endian-annotated, so, let's shift
attention on other warnings.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Altix patch to add an SN pci provider for TIOCE, which is SGI's
PCI Express implementation.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to add TIO "huge-window" address support to sn_dma_flush().
Update copyright in affected files.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix patch to abstract the force_interrupt() mechanism away from the
pcibr provider.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cosmetic altix patch to rename SGI_PCIBR_ERROR to something more generic and
remove a duplicate #define.
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
1. Nontemporal store for spin unlock.
A nontemporal store will not update the LRU setting for the cacheline. The
cacheline with the lock may therefore be evicted faster from the cpu
caches. Doing so may be useful since it increases the chance that the
exclusive cache line has been evicted when another cpu is trying to
acquire the lock.
The time between dropping and reacquiring a lock on the same cpu is
typically very small so the danger of the cacheline being
evicted is negligible.
2. Avoid semaphore operation in write_unlock and use nontemporal store
write_lock uses a cmpxchg like the regular spin_lock but write_unlock uses
clear_bit which requires a load and then a loop over a cmpxchg. The
following patch makes write_unlock simply use a nontemporal store to clear
the highest 8 bits. We will then still have the lower 3 bytes (24 bits)
left to count the readers.
Doing the byte store will reduce the number of possible readers from 2^31
to 2^24 = 16 million.
These patches were discussed already:
http://marc.theaimsgroup.com/?t=111472054400001&r=1&w=2http://marc.theaimsgroup.com/?l=linux-ia64&m=111401837707849&w=2
The nontemporal stores will only work using GCC. If a compiler is used
that does not support inline asm then fallback C code is used. This
will preserve the byte store but not be able to do the nontemporal stores.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following stupid compile error that happens
when CONFIG_HOTPLUG is not defined on ia64.
arch/ia64/kernel/built-in.o(.text+0x712): In function `acpi_unregister_ioapic':
: undefined reference to `iosapic_remove'
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch from Ben Dooks
Rename the s3c2410_report_oc() to s3c2410_usb_report_oc()
as this is an usb specific function.
Change port power on the usb-simtec implementation to only
power up the output if both are set, as per the usb 1.1
specification
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Unfortunately, we can't use the "user" bit in the page tables to
control whether a page table entry is "global" or "asid" specific,
since the vector page is mapped as "user" accessible but is not
process specific.
Therefore, give direct control of the ARMv6 "nG" (not global)
bit to the mm layers.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. Move hwif_to_node to ide.h
2. Use hwif_to_node in ide-disk.c
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the now unused fsnotify_unlink & fsnotify_rmdir code.
Compile tested.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>