e_start acquires svc_export_cache.hash_lock, and e_stop releases it. Add
lock annotations to these two functions so that sparse can check callers
for lock pairing, and so that sparse will not complain about these
functions since they intentionally use locks in this manner.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist. This will cause the nfs server to listen on that socket.
To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file will list all ports that nfsd has open.
Default when TCP enabled will be
ipv4 udp 0.0.0.0 2049
ipv4 tcp 0.0.0.0 2049
Later, the list of ports will be settable.
'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Separate out the code for creating a new service, and for creating initial
sockets.
Some of these new functions will have multiple callers soon.
[akpm@osdl.org: cleanups]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.
Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.
This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.
Note that this doesn't make it possible to change versions while the server is
running. This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.
However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.
So:
- add an option to lockd_up saying which protocol is needed
- Always open sockets for which an explicit port was given, otherwise
only open a socket of the type required
- Change nfsd to do one lockd_up per socket rather than one per thread.
This
- removes the dependancy on CONFIG_NFSD_TCP
- means that lockd may open sockets other than at startup
- means that lockd will *not* listen on UDP if the only
mounts are TCP mount (and nfsd hasn't started).
The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that node_to_cpumask() is available to modules for all
supported combinations of architecture and CONFIG_NUMA.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that the cpu_online_map and cpu_possible_map bitmasks, and
hence all the macros in <linux/cpumask.h> that require them, are available to
modules for all supported combinations of architecture and CONFIG_SMP.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As reported in http://bugzilla.kernel.org/show_bug.cgi?id=6970, ISDN can issue
excessively-long udelays, which triggers a build-time error on ARM.
This is very sucky of ISDN, but I doubt if anyone is going to suddenly fix it.
So change the macro to do the microsecond counting itself.
Cc: <tch@wpkg.org>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch to the Siemens Gigaset driver fixes the compile warning
"ignoring return value of 'class_device_create_file', declared with
attribute warn_unused_result" appearing with CONFIG_ENABLE_MUST_CHECK=y in
release 2.6.18-rc1-mm1.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Acked-by: Hansjoerg Lipp <hjlipp@web.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A simple module to test Linux Kernel Dump mechanism. This module uses
jprobes to install/activate pre-defined crash points. At different crash
points, various types of crashing scenarios are created like a BUG(),
panic(), exception, recursive loop and stack overflow. The user can
activate a crash point with specific type by providing parameters at the
time of module insertion. Please see the file header for usage
information. The module is based on the Linux Kernel Dump Test Tool by
Fernando <http://lkdtt.sourceforge.net>.
This module could be merged with mainline. Jprobes is used here so that the
context in which crash point is hit, could be maintained. This implements
all the crash points as done by LKDTT except the one in the middle of
tasklet_action().
Signed-off-by: Ankita Garg <ankita@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock. This patch moves kfree function out scope of
kretprobe_lock.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When kprobe is re-entered, the re-entered kprobe kernel path will will call
atomic_notifier_call_chain function, if this function is kprobed that will
incur numerous kprobe recursive fault. This patch disallows kprobes on
atomic_notifier_call_chain function.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Documentation/kprobes.txt updated to reflect:
o In-kernel symbol resolution
o CONFIG_KALLSYMS dependency
o Usage of JPROBE_ENTRY
o Addition of regs_return_value()
Also update the references list and usage examples to use correct module
interfaces.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the regs_return_value() macro to extract the return value in an
architecture agnostic manner, given the pt_regs.
Other architecture maintainers may want to add similar helpers.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kallsyms_lookup_name() allows for <module:symbol> style specification for
looking up symbol addresses. Handle the case where the user specifies
<module:.symbol> on powerpc, given that 64-bit powerpc uses function
descriptors.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In an effort to make kprobe modules more portable, here is a patch that:
o Introduces the "symbol_name" field to struct kprobe.
The symbol->address resolution now happens in the kernel in an
architecture agnostic manner. 64-bit powerpc users no longer have
to specify the ".symbols"
o Introduces the "offset" field to struct kprobe to allow a user to
specify an offset into a symbol.
o The legacy mechanism of specifying the kprobe.addr is still supported.
However, if both the kprobe.addr and kprobe.symbol_name are specified,
probe registration fails with an -EINVAL.
o The symbol resolution code uses kallsyms_lookup_name(). So
CONFIG_KPROBES now depends on CONFIG_KALLSYMS
o Apparantly kprobe modules were the only legitimate out-of-tree user of
the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
o Modify tcp_probe.c that uses the kprobe interface so as to make it
work on multiple platforms (in its earlier form, the code wouldn't
work, say, on powerpc)
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replaces the pid_t value with a struct pid to avoid pid wrap around
problems.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem with remembering a user space process by its pid is that it is
possible that the process will exit, pid wrap around will occur.
Converting to a struct pid avoid that problem, and paves the way for
implementing a pid namespace.
Also since usb is the only user of kill_proc_info_as_uid rename
kill_proc_info_as_uid to kill_pid_info_as_uid and have the new version take
a struct pid.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This has been needed for a long time, but now with the advent of a
reference counted struct pid there are real consequences for getting this
wrong.
Someone I think it was Oleg Nesterov pointed out that this construct was
missing locking, when I introduced struct pid. After taking time to review
the locking construct already present I figured out which lock needs to be
taken. The other paths that access f_owner.pid take either the f_owner
read or the write lock.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Message queues can signal a process waiting for a message.
This patch replaces the pid_t value with a struct pid to avoid pid wrap
around problems.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
to introduce struct pspace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define a per-container pid space object. And create one instance of this
object, init_pspace, to define the entire pid space. Subsequent patches
will provide/use interfaces to create/destroy pid spaces.
Its a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h
where it will be used in subsequent patches to define pid spaces.
Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285
[akpm@osdl.org: cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I think it is hardly possible to read the current do_each_task_pid(). The
new version is much simpler and makes the code smaller.
Only the do_each_task_pid change is tested, the do_each_pid_task isn't.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use struct pidmap instead of pidmap_t.
This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
to kill pidmap_t.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use struct pidmap instead of pidmap_t.
Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As part of an SMP cleanliness pass over UML, I consted a bunch of
structures in order to not have to document their locking. One of these
structures was a struct tty_operations. In order to const it in UML
without introducing compiler complaints, the declaration of
tty_set_operations needs to be changed, and then all of its callers need to
be fixed.
This patch declares all struct tty_operations in the tree as const. In all
cases, they are static and used only as input to tty_set_operations. As an
extra check, I ran an i386 allyesconfig build which produced no extra
warnings.
53 drivers are affected. I checked the history of a bunch of them, and in
most cases, there have been only a handful of maintenance changes in the
last six months. serial_core.c was the busiest one that I looked at.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
they aren't yet (don't just update timestamp unconditionally). Uninline
the hash function to save 259 Bytes.
This tiny inode change which may improve cache behaviour also shaves off 8
Bytes from file_update_time() on i386.
Included a tiny codestyle cleanup, too.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
File handles can be requested to send sigio and sigurg to processes. By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I took a good hard look at the locking and it appears the locking on vt_pid
is the console semaphore. Every modified path is called under the console
semaphore except reset_vc when it is called from fn_SAK or do_SAK both of
which appear to be in interrupt context. In addition I need to be careful
because in the presence of an oops the console_sem may be arbitrarily
dropped.
Which leads me to conclude the current locking is inadequate for my needs.
Given the weird cases we could hit because of oops printing instead of
introducing an extra spin lock to protect the data and keep the pid to
signal and the signal to send in sync, I have opted to use xchg on just the
struct pid * pointer instead.
Due to console_sem we will stay in sync between vt_pid and vt_mode except
for a small window during a SAK, or oops handling. SAK handling should
kill any user space process that care, and oops handling we are broken
anyway. Besides the worst that can happen is that I try to send the wrong
signal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is such a rare path it took me a while to figure out how to test
this after soring out the locking.
This patch does several things.
- The variables used are moved into a structure and declared in vt_kern.h
- A spinlock is added so we don't have SMP races updating the values.
- Instead of raw pid_t value a struct_pid is used to guard against
pid wrap around issues, if the daemon to spawn a new console dies.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As we stop storing pid_t's and move to storing struct pid *. We need a way to
get the pid_t from the struct pid to report to user space what we have stored.
Having a clean well defined way to do this is especially important as we move
to multiple pid spaces as may need to report a different value to the caller
depending on which pid space the caller is in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pids aren't something that drivers should care about. However there are a lot
of helper layers in the kernel that do care, and are built as modules. Before
I can convert them to using struct pid instead of pid_t I need to export the
appropriate symbols so they can continue to be built.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the signal functions all either take a task or a pid_t argument.
This patch implements variants that take a struct pid *. After all of the
users have been update it is my intention to remove the variants that take a
pid_t as using pid_t can be more work (an extra hash table lookup) and
difficult to get right in the presence of multiple pid namespaces.
There are two kinds of functions introduced in this patch. The are the
general use functions kill_pgrp and kill_pid which take a priv argument that
is ultimately used to create the appropriate siginfo information, Then there
are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation
helpers that take an explicit siginfo.
The distinction is made because filling out an explcit siginfo is tricky, and
will be even more tricky when pid namespaces are introduced.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To avoid pid rollover confusion the kernel needs to work with struct pid *
instead of pid_t. Currently there is not an iterator that walks through all
of the tasks of a given pid type starting with a struct pid. This prevents us
replacing some pid_t instances with struct pid. So this patch adds
do_each_pid_task which walks through the set of task for a given pid type
starting with a struct pid.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the last round of cleaning up the pid hash table a more general struct pid
was introduced, that can be referenced counted.
With the more general struct pid most if not all places where we store a pid_t
we can now store a struct pid * and remove the need for a hash table lookup,
and avoid any possible problems with pid roll over.
Looking forward to the pid namespaces struct pid * gives us an absolute form a
pid so we can compare and use them without caring which pid namespace we are
in.
This patchset introduces the infrastructure needed to use struct pid instead
of pid_t, and then it goes on to convert two different kernel users that
currently store a pid_t value.
There are a lot more places to go but this is enough to get the basic idea.
Before we can merge a pid namespace patch all of the kernel pid_t users need
to be examined. Those that deal with user space processes need to be
converted to using a struct pid *. Those that deal with kernel processes need
to converted to using the kthread api. A rare few that only use their current
processes pid values get to be left alone.
This patch:
task_session returns the struct pid of a tasks session.
task_pgrp returns the struct pid of a tasks process group.
task_tgid returns the struct pid of a tasks thread group.
task_pid returns the struct pid of a tasks process id.
These can be used to avoid unnecessary hash table lookups, and to implement
safe pid comparisions in the face of a pid namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
assume the directories have an associated task, and cannot currently be used
on the /proc root directory because it does not have such a task.
This small changes allows for base.c to be simplified and later when multiple
pid spaces are introduced it makes getting the needed context information
trivial.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There were enough changes in my last round of cleaning up proc I had to break
up the patch series into smaller chunks, and my last chunk never got resent.
This patchset gives proc dynamic inode numbers (the static inode numbers were
a pain to maintain and prevent all kinds of things), and removes the horrible
switch statements that had to be kept in sync with everything else. Being
fully table driver takes us 90% of the way of being able to register new
process specific attributes in proc.
This patch:
Group the functions by what they implement instead of by type of operation.
As it existed base.c was quickly approaching the point where it could not be
followed.
No functionality or code changes asside from adding/removing forward
declartions are implemented in this patch.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls. For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.
This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.
Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.
This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned. For directory that are
either created or destroyed during the readdir you may or may not see them.
Furthermore you may seek to a directory offset you have previously seen.
These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects. Plus it is a simple semantic to
implement reliable service. It is just a matter of calling readdir a
second time if you are wondering if something new has show up.
These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.
The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm. Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There
are only 40 cache lines for the entire 32K pid space. A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.
If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.
In addition this takes no additional locks and is actually less code than
what we are doing now.
Also another very subtle bug in this area has been fixed. It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader. This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.
Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.
[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When listing loaded modules during an oops or panic, also list each
module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).
If a module is did not taint the kernel, it is just listed like
usbcore
but if it did taint the kernel, it is listed like
wizmodem(PF)
Example:
[ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[ 3260.121729] [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
[ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0
[ 3260.121748] Oops: 0002 [1] SMP
[ 3260.121753] CPU 1
[ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
[ 3260.121785] Pid: 5556, comm: bash Tainted: P 2.6.18-git10 #1
[Alternatively, I can look into listing tainted flags with 'lsmod',
but that won't help in oopsen/panics so much.]
[akpm@osdl.org: cleanup]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>