android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Tejun Heo	cf3680b90c	printk: fix possible printk overrun printk recursion detection prepends message to printk_buf and offsets printk_buf when actual message is printed but it forgets to trim buffer length accordingly. This can result in overrun in extreme cases. Fix it. [ mingo@elte.hu: bug was introduced by me via: commit `32a7600668` Author: Ingo Molnar <mingo@elte.hu> Date: Fri Jan 25 21:07:58 2008 +0100 printk: make printk more robust by not allowing recursion ] Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-26 07:42:37 -08:00
Dale Farnsworth	1481197b50	Subject: lockdep: include all lock classes in all_lock_classes Add each lock class to the all_lock_classes list when it is first registered. Previously, lock classes were added to all_lock_classes when the lock class was first used. Since one of the uses of the list is to find unused locks, this didn't work well. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 23:03:02 +01:00
Harvey Harrison	67ca7bde2e	sched: fix signedness warnings in sched.c Unsigned long values are always assigned to switch_count, make it unsigned long. kernel/sched.c:3897:15: warning: incorrect type in assignment (different signedness) kernel/sched.c:3897:15: expected long switch_count kernel/sched.c:3897:15: got unsigned long <noident> kernel/sched.c:3921:16: warning: incorrect type in assignment (different signedness) kernel/sched.c:3921:16: expected long switch_count kernel/sched.c:3921:16: got unsigned long <noident> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:17 +01:00
Ingo Molnar	7eee3e677d	sched: clean up __pick_last_entity() a bit Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:17 +01:00
Balbir Singh	70eee74b70	sched: remove duplicate code from sched_fair.c pick_task_entity() duplicates existing code. This functionality can be easily obtained using rb_last(). Avoid code duplication by using rb_last(). Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:17 +01:00
Ingo Molnar	6892b75e60	sched: make early bootup sched_clock() use safer do not call sched_clock() too early. Not only might rq->idle not be set up - but pure per-cpu data might not be accessible either. this solves an ia64 early bootup hang with CONFIG_PRINTK_TIME=y. Tested-by: Tony Luck <tony.luck@gmail.com> Acked-by: Tony Luck <tony.luck@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:16 +01:00
Linus Torvalds	04e2f1741d	Add memory barrier semantics to wake_up() & co Oleg Nesterov and others have pointed out that on some architectures, the traditional sequence of set_current_state(TASK_INTERRUPTIBLE); if (CONDITION) return; schedule(); is racy wrt another CPU doing CONDITION = 1; wake_up_process(p); because while set_current_state() has a memory barrier separating setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION variable, there is no such memory barrier on the wakeup side. Now, wake_up_process() does actually take a spinlock before it reads and sets the task state on the waking side, and on x86 (and many other architectures) that spinlock is in fact equivalent to a memory barrier, but that is not generally guaranteed. The write that sets CONDITION could move into the critical region protected by the runqueue spinlock. However, adding a smp_wmb() to before the spinlock should now order the writing of CONDITION wrt the lock itself, which in turn is ordered wrt the accesses within the spinlock (which includes the reading of the old state). This should thus close the race (which probably has never been seen in practice, but since smp_wmb() is a no-op on x86, it's not like this will make anything worse either on the most common architecture where the spinlock already gave the required protection). Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 18:05:03 -08:00
Li Zefan	bc231d2a04	cgroup: remove dead code in cgroup_get_rootdir() Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:25 -08:00
Li Zefan	68db38f153	cgroup: remove duplicate code in find_css_set() The list head res->tasks gets initialized twice in find_css_set(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:25 -08:00
Li Zefan	8d53d55d27	cgroup: fix subsys bitops Cgroup uses unsigned long for subsys bitops, not unsigned long long. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:24 -08:00
Li Zefan	f777073848	cgroup: fix memory leak in cgroup_get_sb() opts.release_agent is not kfree()ed in all necessary places. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:24 -08:00
Li Zefan	a043e3b2c6	cgroup: fix comments fix: - comments about need_forkexit_callback - comments about release agent - typo and comment style, etc. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:24 -08:00
Srinivasa Ds	4362758279	kprobes: refuse kprobe insertion on add/sub_preempt_counter() Kprobes makes use of preempt_disable(),preempt_enable_noresched() and these functions inturn call add/sub_preempt_count(). So we need to refuse user from inserting probe in to these functions. This patch disallows user from probing add/sub_preempt_count(). Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:13:24 -08:00
Thomas Gleixner	a0c1e9073e	futex: runtime enable pi and robust functionality Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default implementation returns -ENOSYS, which is currently not handled inside of the futex guts. Futex PI calls and robust list exits with a held futex result in an endless loop in the futex code on architectures which have no support. Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would add a fair amount of extra if/else constructs to the already complex code. It is also not possible to disable the robust feature before user space tries to register robust lists. Compile time disabling is not a good idea either, as there are already architectures with runtime detection of futex_atomic_cmpxchg_inatomic support. Detect the functionality at runtime instead by calling cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization code. This is guaranteed to fail, but the call of futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled. On architectures, which use the asm-generic implementation or have a runtime CPU feature detection, a -ENOSYS return value disables the PI/robust features. On architectures with a working implementation the call returns -EFAULT and the PI/robust features are enabled. The relevant syscalls return -ENOSYS and the robust list exit code is blocked, when the detection fails. Fixes http://lkml.org/lkml/2008/2/11/149 Originally reported by: Lennart Buytenhek Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Cc: Riku Voipio <riku.voipio@movial.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:15 -08:00
Thomas Gleixner	3e4ab747ef	futex: fix init order When the futex init code fails to initialize the futex pseudo file system it returns early without initializing the hash queues. Should the boot succeed then a futex syscall which tries to enqueue a waiter on the hashqueue will crash due to the unitilialized plist heads. Initialize the hash queues before the filesystem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Cc: Riku Voipio <riku.voipio@movial.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:15 -08:00
Harvey Harrison	de4fc64f0f	markers: fix sparse warnings in markers.c char can be unsigned kernel/marker.c:64:20: error: dubious one-bit signed bitfield kernel/marker.c:65:14: error: dubious one-bit signed bitfield Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:14 -08:00
Rafael J. Wysocki	3a2d5b7001	PM: Introduce PM_EVENT_HIBERNATE callback state During the last step of hibernation in the "platform" mode (with the help of ACPI) we use the suspend code, including the devices' ->suspend() methods, to prepare the system for entering the ACPI S4 system sleep state. But at least for some devices the operations performed by the ->suspend() callback in that case must be different from its operations during regular suspend. For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and pass it to the device drivers' ->suspend() methods during the last phase of hibernation, so that they can distinguish this case and handle it as appropriate. Modify the drivers that handle PM_EVENT_SUSPEND in a special way and need to handle PM_EVENT_HIBERNATE in the same way. These changes are necessary to fix a hibernation regression related to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 10:40:04 -08:00
Linus Torvalds	20f8d2a493	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (26 commits) PM: Make suspend_device() static PCI ACPI: Fix comment describing acpi_pci_choose_state Hibernation: Handle DEBUG_PAGEALLOC on x86 ACPI: fix build warning ACPI: TSC breaks atkbd suspend ACPI: remove is_processor_present prototype acer-wmi: Add DMI match for mail LED on Acer TravelMate 4200 series ACPI: sparse fix, replace macro with static function ACPI: thinkpad-acpi: add tablet-mode reporting ACPI: thinkpad-acpi: minor hotkey_radio_sw fixes ACPI: thinkpad-acpi: improve thinkpad-acpi input device documentation ACPI: thinkpad-acpi: issue input events for tablet swivel events ACPI: thinkpad-acpi: make the video output feature optional ACPI: thinkpad-acpi: synchronize input device switches ACPI: thinkpad-acpi: always track input device open/close ACPI: thinkpad-acpi: trivial fix to documentation ACPI: thinkpad-acpi: trivial fix to module_desc typo intel_menlo: extract return values using PTR_ERR ACPI video: check for error from thermal_cooling_device_register ACPI thermal: extract return values using PTR_ERR ...	2008-02-21 16:33:19 -08:00
Kay Sievers	120fc3d77a	modules: do not try to add sysfs attributes if !CONFIG_SYSFS Thanks to Alexey for the testing and the fix of the fix. Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-02-21 15:27:08 -08:00
Rafael J. Wysocki	8a235efad5	Hibernation: Handle DEBUG_PAGEALLOC on x86 Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by checking if the pages to be copied are marked as present in the kernel mapping and temporarily marking them as present if that's not the case. No functional modifications are introduced if CONFIG_DEBUG_PAGEALLOC is unset. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2008-02-21 02:15:28 -05:00
Thomas Gleixner	89d694b9db	genirq: do not leave interupts enabled on free_irq The default_disable() function was changed in commit: `76d2160147` genirq: do not mask interrupts by default It removed the mask function in favour of the default delayed interrupt disabling. Unfortunately this also broke the shutdown in free_irq() when the last handler is removed from the interrupt for those architectures which rely on the default implementations. Now we can end up with a enabled interrupt line after the last handler was removed, which can result in spurious interrupts. Fix this by adding a default_shutdown function, which is only installed, when the irqchip implementation does provide neither a shutdown nor a disable function. [@stable: affected versions: .21 - .24 ] Pointed-out-by: Michael Hennerich <Michael.Hennerich@analog.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Tested-by: Michael Hennerich <Michael.Hennerich@analog.com>	2008-02-19 10:43:58 +01:00
S.Caglar Onur	188fd89d53	genirq: spurious.c: use time_* macros The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. So following patch implements usage of the time_after() macro, defined at linux/jiffies.h, which deals with wrapping correctly Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-19 10:43:58 +01:00
Eric Paris	b0abcfc146	Audit: use == not = in if statements Clearly this was supposed to be an == not an = in the if statement. This patch also causes us to stop processing execve args once we have failed rather than continuing to loop on failure over and over and over. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-18 18:46:28 -08:00
Pavel Machek	db4315d6f5	timer_list: print relative expiry time signed Relative expiry time can get negative, so it should be signed. Signed-off-by: Pavel Machek <Pavel@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-17 17:29:38 +01:00
Linus Torvalds	c24ce1d887	Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt * git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: hrtimer: catch expired CLOCK_REALTIME timers early hrtimer: check relative timeouts for overflow	2008-02-14 21:27:52 -08:00
Jan Blunck	cf28b4863f	d_path: Make d_path() use a struct path d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to reflect this. [akpm@linux-foundation.org: fix build in mm/memory.c] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Bryan Wu <bryan.wu@analog.com> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:09 -08:00
Jan Blunck	44707fdf59	d_path: Use struct path in struct avc_audit_data audit_log_d_path() is a d_path() wrapper that is used by the audit code. To use a struct path in audit_log_d_path() I need to embed it into struct avc_audit_data. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	6ac08c39a1	Use struct path in fs_struct * Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	1d957f9bf8	Introduce path_put() * Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	4ac9137858	Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 `6894212` 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	db74ece990	Dont touch fs_struct in usermodehelper This test seems to be unnecessary since we always have rootfs mounted before calling a usermodehelper. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:32 -08:00
Thomas Gleixner	63070a79ba	hrtimer: catch expired CLOCK_REALTIME timers early A CLOCK_REALTIME timer, which has an absolute expiry time less than the clock realtime offset calls with a negative delta into the clock events code and triggers the WARN_ON() there. This is a false positive and needs to be prevented. Check the result of timer->expires - timer->base->offset right away and return -ETIME right away. Thanks to Frans Pop, who reported the problem and tested the fixes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Frans Pop <elendil@planet.nl>	2008-02-14 22:08:30 +01:00
Thomas Gleixner	5a7780e725	hrtimer: check relative timeouts for overflow Various user space callers ask for relative timeouts. While we fixed that overflow issue in hrtimer_start(), the sites which convert relative user space values to absolute timeouts themself were uncovered. Instead of putting overflow checks into each place add a function which does the sanity checking and convert all affected callers to use it. Thanks to Frans Pop, who reported the problem and tested the fixes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Tested-by: Frans Pop <elendil@planet.nl>	2008-02-14 22:08:30 +01:00
Mathieu Desnoyers	fb40bd78b0	Linux Kernel Markers: support multiple probes RCU style multiple probes support for the Linux Kernel Markers. Common case (one probe) is still fast and does not require dynamic allocation or a supplementary pointer dereference on the fast path. - Move preempt disable from the marker site to the callback. Since we now have an internal callback, move the preempt disable/enable to the callback instead of the marker site. Since the callback change is done asynchronously (passing from a handler that supports arguments to a handler that does not setup the arguments is no arguments are passed), we can safely update it even if it is outside the preempt disable section. - Move probe arm to probe connection. Now, a connected probe is automatically armed. Remove MARK_MAX_FORMAT_LEN, unused. This patch modifies the Linux Kernel Markers API : it removes the probe "arm/disarm" and changes the probe function prototype : it now expects a va_list * instead of a "...". If we want to have more than one probe connected to a marker at a given time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it, connecting a second probe handler to a marker will fail. It allow us, for instance, to do interesting combinations : Do standard tracing with LTTng and, eventually, to compute statistics with SystemTAP, or to have a special trigger on an event that would call a systemtap script which would stop flight recorder tracing. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Mason <mmlnx@us.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: David Smith <dsmith@redhat.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:20 -08:00
Nishanth Aravamudan	064d9efe94	hugetlb: fix overcommit locking proc_doulongvec_minmax() calls copy_to_user()/copy_from_user(), so we can't hold hugetlb_lock over the call. Use a dummy variable to store the sysctl result, like in hugetlb_sysctl_handler(), then grab the lock to update nr_overcommit_huge_pages. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Reported-by: Miles Lane <miles.lane@gmail.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:18 -08:00
Harvey Harrison	b5606c2d44	remove final fastcall users fastcall always expands to empty, remove it. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:18 -08:00
Paul E. McKenney	fbf6bfca76	rcupdate: fix comment This comment caused some consternation during fastcall removal. Make it truthful. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:18 -08:00
Linus Torvalds	3174ffaa93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: rt-group: refure unrunnable tasks sched: rt-group: clean up the ifdeffery sched: rt-group: make rt groups scheduling configurable sched: rt-group: interface sched: rt-group: deal with PI sched: fix incorrect irq lock usage in normalize_rt_tasks() sched: fair-group: separate tg->shares from task_group_lock hrtimer: more hrtimer_init_sleeper() fallout.	2008-02-13 08:22:41 -08:00
Peter Zijlstra	b68aa2300c	sched: rt-group: refure unrunnable tasks Refuse to accept or create RT tasks in groups that can't run them. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:40 +01:00
Peter Zijlstra	bccbe08a60	sched: rt-group: clean up the ifdeffery Clean up some of the excessive ifdeffery introduces in the last patch. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:40 +01:00
Peter Zijlstra	052f1dc7eb	sched: rt-group: make rt groups scheduling configurable Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:40 +01:00
Peter Zijlstra	9f0c1e560c	sched: rt-group: interface Change the rt_ratio interface to rt_runtime_us, to match rt_period_us. This avoids picking a granularity for the ratio. Extend the /sys/kernel/uids/<uid>/ interface to allow setting the group's rt_runtime. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:39 +01:00
Peter Zijlstra	23b0fdfc92	sched: rt-group: deal with PI Steven mentioned the fun case where a lock holding task will be throttled. Simple fix: allow groups that have boosted tasks to run anyway. If a runnable task in a throttled group gets boosted the dequeue/enqueue done by rt_mutex_setprio() is enough to unthrottle the group. This is ofcourse not quite correct. Two possible ways forward are: - second prio array for boosted tasks - boost to a prio ceiling (this would also work for deadline scheduling) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:39 +01:00
Peter Zijlstra	4cf5d77a6e	sched: fix incorrect irq lock usage in normalize_rt_tasks() lockdep spotted this bogus irq locking. normalize_rt_tasks() can be called from hardirq context through sysrq-n Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:39 +01:00
Peter Zijlstra	8ed3699682	sched: fair-group: separate tg->shares from task_group_lock On Mon, 2008-02-11 at 15:09 +0300, Denis V. Lunev wrote: > BUG: sleeping function called from invalid context > at /home/den/src/linux-netns26/kernel/mutex.c:209 > in_atomic():1, irqs_disabled():0 > no locks held by swapper/0. > Pid: 0, comm: swapper Not tainted 2.6.24 #304 > > Call Trace: > <IRQ> [<ffffffff80252d1e>] ? __debug_show_held_locks+0x15/0x27 > [<ffffffff8022c2a8>] __might_sleep+0xc0/0xdf > [<ffffffff8049f1df>] mutex_lock_nested+0x28/0x2a9 > [<ffffffff80231294>] sched_destroy_group+0x18/0xea > [<ffffffff8023e835>] sched_destroy_user+0xd/0xf > [<ffffffff8023e8c1>] free_uid+0x8a/0xab > [<ffffffff80233e24>] __put_task_struct+0x3f/0xd3 > [<ffffffff80236708>] delayed_put_task_struct+0x23/0x25 > [<ffffffff8026fda7>] __rcu_process_callbacks+0x8d/0x215 > [<ffffffff8026ff52>] rcu_process_callbacks+0x23/0x44 > [<ffffffff8023a2ae>] __do_softirq+0x79/0xf8 > [<ffffffff8020f8c3>] ? profile_pc+0x2a/0x67 > [<ffffffff8020d38c>] call_softirq+0x1c/0x30 > [<ffffffff8020f689>] do_softirq+0x61/0x9c > [<ffffffff8023a233>] irq_exit+0x51/0x53 > [<ffffffff8021bd1a>] smp_apic_timer_interrupt+0x77/0xad > [<ffffffff8020ce3b>] apic_timer_interrupt+0x6b/0x70 > <EOI> [<ffffffff8020b0dd>] ? default_idle+0x43/0x76 > [<ffffffff8020b0db>] ? default_idle+0x41/0x76 > [<ffffffff8020b09a>] ? default_idle+0x0/0x76 > [<ffffffff8020b186>] ? cpu_idle+0x76/0x98 separate the tg->shares protection from the task_group lock. Reported-by: Denis V. Lunev <den@openvz.org> Tested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:39 +01:00
Peter Zijlstra	720a2592cf	hrtimer: more hrtimer_init_sleeper() fallout. Missed an instance... futex_lock_pi() hrtimer_init_sleeper() rt_mutex_timed_lock() rt_mutex_timed_fastlock() rt_mutex_slowlock() hrtimer_start() Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-13 15:45:36 +01:00
H. Peter Anvin	c98aa86df3	timeconst.pl: correct reversal of USEC_TO_HZ and HZ_TO_USEC The USEC_TO_HZ and HZ_TO_USEC constant sets were mislabelled, with seriously incorrect results. This among other things manifested itself as cpufreq not working when a tickless kernel was configured. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Tested-by: Carlos R. Mafra <crmafra@ift.unesp.br> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-12 14:29:26 -08:00
Oleg Nesterov	c289b074b6	hrtimer: don't modify restart_block->fn in restart functions hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is pointless and complicates its usage. Note that if sys_restart_syscall() doesn't actually happen, we have a bogus "pending" restart->fn anyway, this is harmless. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-10 10:48:03 +01:00
Oleg Nesterov	416529374b	hrtimer: fix rmtp/restarts handling in compat_sys_nanosleep() Spotted by Pavel Emelyanov and Alexey Dobriyan. compat_sys_nanosleep() implicitly uses hrtimer_nanosleep_restart(), this can't work. Make a suitable compat_nanosleep_restart() helper. Introduced by commit `c70878b4e0` hrtimer: hook compat_sys_nanosleep up to high res timer code Also, set ->addr_limit = KERNEL_DS before doing hrtimer_nanosleep(), this func was changed by the previous patch and now takes the "__user " parameter. Thanks to Ingo Molnar for fixing the bug in this patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-10 10:48:03 +01:00
Oleg Nesterov	080344b988	hrtimer: fix rmtp handling in hrtimer_nanosleep() Spotted by Pavel Emelyanov and Alexey Dobriyan. hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to the local variable which lives in the caller's stack frame. This means that if sys_restart_syscall() actually happens and it is interrupted as well, we don't update the user-space variable, but write into the already dead stack frame. Introduced by commit `04c227140f` hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier Change the callers to pass "__user rmtp" to hrtimer_nanosleep(), and change hrtimer_nanosleep() to use copy_to_user() to actually update rmtp. Small problem remains. man 2 nanosleep states that rtmp should be written if nanosleep() was interrupted (it says nothing whether it is OK to update rmtp if nanosleep returns 0), but (with or without this patch) we can dirty rem even if nanosleep() returns 0. NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other bugs. Fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> include/linux/hrtimer.h \| 2 - kernel/hrtimer.c \| 51 +++++++++++++++++++++++++----------------------- kernel/posix-timers.c \| 14 +------------ 3 files changed, 30 insertions(+), 37 deletions(-)	2008-02-10 10:48:03 +01:00
john stultz	e13a2e61dd	ntp: correct inconsistent interval/tick_length usage clocksource initialization and error accumulation. This corrects a 280ppm drift seen on some systems using acpi_pm, and affects other clocksources as well (likely to a lesser degree). Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-10 10:48:03 +01:00
S.Çağlar Onur	c1cb795338	Update kernel/.gitignore with new auto-generated files Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-09 23:27:01 -08:00
Linus Torvalds	dde0013782	Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc [POWERPC] Enable hotplug memory remove for 64-bit powerpc [POWERPC] Add remove_memory() for 64-bit powerpc [POWERPC] Make cell IOMMU fixed mapping printk more useful [POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops [POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges [POWERPC] Fix cell IOMMU null pointer explosion on old firmwares [POWERPC] spufs: Fix timing dependent false return from spufs_run_spu [POWERPC] spufs: No need to have a runnable SPU for libassist update [POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write [POWERPC] spufs: Fix state_mutex leaks [POWERPC] Disable G5 NAP mode during SMU commands on U3	2008-02-08 09:31:42 -08:00
Ralf Baechle	46f4f8f665	IRQ_NOPROBE helper functions Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq method may crash the system because handle_percpu_irq does not check IRQ_WAITING. This for example hits the MIPS Qemu configuration. This patch provides two helper functions set_irq_noprobe and set_irq_probe to set rsp. clear the IRQ_NOPROBE flag. The only current caller is MIPS code but this really belongs into generic code. As an aside, interrupt probing these days has become a mostly obsolete if not dangerous art. I think Linux interrupts should be changed to default to non-probing but that's subject of this patch. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Acked-and-tested-by: Rob Landley <rob@landley.net> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:42 -08:00
Yi Yang	06b2a76d25	Add new string functions strict_strto* and convert kernel params to use them Currently, for every sysfs node, the callers will be responsible for implementing store operation, so many many callers are doing duplicate things to validate input, they have the same mistakes because they are calling simple_strtol/ul/ll/uul, especially for module params, they are just numeric, but you can echo such values as 0x1234xxx, 07777888 and 1234aaa, for these cases, module params store operation just ignores succesive invalid char and converts prefix part to a numeric although input is acctually invalid. This patch tries to fix the aforementioned issues and implements strict_strtox serial functions, kernel/params.c uses them to strictly validate input, so module params will reject such values as 0x1234xxxx and returns an error: write error: Invalid argument Any modules which export numeric sysfs node can use strict_strtox instead of simple_strtox to reject any invalid input. Here are some test results: Before applying this patch: [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# After applying this patch: [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak -bash: echo: write error: Invalid argument [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak -bash: echo: write error: Invalid argument [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak -bash: echo: write error: Invalid argument [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak -bash: echo: write error: Invalid argument [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak 4096 [root@yangyi-dev /]# [akpm@linux-foundation.org: fix compiler warnings] [akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de] Signed-off-by: Yi Yang <yi.y.yang@intel.com> Cc: Greg KH <greg@kroah.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:41 -08:00
Sam Ravnborg	fa7303e22c	cpu: fix section mismatch warnings for enable_nonboot_cpus Fix following warning: WARNING: o-x86_64/kernel/built-in.o(.text+0x36d8b): Section mismatch in reference from the function enable_nonboot_cpus() to the function .cpuinit.text:_cpu_up() enable_nonboot_cpus() are used solely from CONFIG_CONFIG_PM_SLEEP_SMP=y and PM_SLEEP_SMP imply HOTPLUG_CPU therefore the reference to _cpu_up() is valid. Annotate enable_nonboot_cpus() with __ref to silence modpost. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:41 -08:00
Pavel Emelyanov	48d13e483c	Don't operate with pid_t in rtmutex tester The proper behavior to store task's pid and get this task later is to get the struct pid pointer and get the task with the pid_task() call. Make it for rt_mutex_waiter->deadlock_task_pid field. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:41 -08:00
Pavel Emelyanov	8dc86af006	Use find_task_by_vpid in posix timers All the functions that need to lookup a task by pid in posix timers obtain this pid from a user space, and thus this value refers to a task in the same namespace, as the current task lives in. So the proper behavior is to call find_task_by_vpid() here. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:41 -08:00
H. Peter Anvin	bdc807871d	avoid overflows in kernel/time.c When the conversion factor between jiffies and milli- or microseconds is not a single multiply or divide, as for the case of HZ == 300, we currently do a multiply followed by a divide. The intervening result, however, is subject to overflows, especially since the fraction is not simplified (for HZ == 300, we multiply by 300 and divide by 1000). This is exposed to the user when passing a large timeout to poll(), for example. This patch replaces the multiply-divide with a reciprocal multiplication on 32-bit platforms. When the input is an unsigned long, there is no portable way to do this on 64-bit platforms there is no portable way to do this since it requires a 128-bit intermediate result (which gcc does support on 64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but since the output is a 32-bit integer in the cases affected, just simplify the multiply-divide (3/10 instead of 300/1000). The reciprocal multiply used can have off-by-one errors in the upper half of the valid output range. This could be avoided at the expense of having to deal with a potential 65-bit intermediate result. Since the intent is to avoid overflow problems and most of the other time conversions are only semiexact, the off-by-one errors were considered an acceptable tradeoff. At Ralf Baechle's suggestion, this version uses a Perl script to compute the necessary constants. We already have dependencies on Perl for kernel compiles. This does, however, require the Perl module Math::BigInt, which is included in the standard Perl distribution starting with version 5.8.0. In order to support older versions of Perl, include a table of canned constants in the script itself, and structure the script so that Math::BigInt isn't required if pulling values from said table. Running the script requires that the HZ value is available from the Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the architectures which didn't already have it (alpha, cris, frv, h8300, m32r, m68k, m68knommu, sparc, v850, and xtensa.) It does not touch the sh or sh64 architectures, since Paul Mundt has dealt with those separately in the sh tree. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Ralf Baechle <ralf@linux-mips.org>, Cc: Sam Ravnborg <sam@ravnborg.org>, Cc: Paul Mundt <lethal@linux-sh.org>, Cc: Richard Henderson <rth@twiddle.net>, Cc: Michael Starvik <starvik@axis.com>, Cc: David Howells <dhowells@redhat.com>, Cc: Yoshinori Sato <ysato@users.sourceforge.jp>, Cc: Hirokazu Takata <takata@linux-m32r.org>, Cc: Geert Uytterhoeven <geert@linux-m68k.org>, Cc: Roman Zippel <zippel@linux-m68k.org>, Cc: William L. Irwin <sparclinux@vger.kernel.org>, Cc: Chris Zankel <chris@zankel.net>, Cc: H. Peter Anvin <hpa@zytor.com>, Cc: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:39 -08:00
Joe Perches	7ef3d2fd17	printk_ratelimit() functions should use CONFIG_PRINTK Makes an embedded image a bit smaller. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:39 -08:00
Li Zefan	6d141c3ff6	workqueue: make delayed_work_timer_fn() static delayed_work_timer_fn() is a timer function, make it static. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:37 -08:00
Adrian Bunk	a36219ac93	The scheduled 'time' option removal The scheduled removal of the 'time' option. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:36 -08:00
Jesper Juhl	efae09f3e9	Nuke duplicate header from sysctl.c Don't include linux/security.h twice in kernel/sysctl.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:34 -08:00
Jesper Juhl	f8db694e46	Nuke a duplicate include from profile.c Remove duplicate inclusion of linux/profile.h from kernel/profile.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:34 -08:00
Jesper Juhl	2dc9c91315	Nuke duplicate include from printk.c Remove the duplicate inclusion of linux/jiffies.h from kernel/printk.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:34 -08:00
Jan Beulich	8b21985c91	constify tables in kernel/sysctl_check.c Remains the question whether it is intended that many, perhaps even large, tables are compiled in without ever having a chance to get used, i.e. whether there shouldn't #ifdef CONFIG_xxx get added. [akpm@linux-foundation.org: fix cut-n-paste error] Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Harvey Harrison	7ad5b3a505	kernel: remove fastcall in kernel/* [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Li Zefan	3eb056764d	time: fix typo in comments Fix typo in comments. BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise checkpatch.pl will be complaining. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Li Zefan	cf4fc6cb76	timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres Function timekeeping_is_continuous() no longer checks flag CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename the function accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Li Zefan	0b858e6ff9	clockevent: simplify list operations list_for_each_safe() suffices here. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Li Zefan	818c357802	clocksource: remove redundant code Flag CLOCK_SOURCE_WATCHDOG is cleared twice. Note clocksource_change_rating() won't do anyting with the cs flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pavel Emelyanov	146a505d49	Get rid of the kill_pgrp_info() function There's only one caller left - the kill_pgrp one - so merge these two functions and forget the kill_pgrp_info one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pavel Emelyanov	d5df763b81	Clean up the kill_something_info This is the first step (of two) in removing the kill_pgrp_info. All the users of this function are in kernel/signal.c, but all they need is to call __kill_pgrp_info() with the tasklist_lock read-locked. Fortunately, one of its users is the kill_something_info(), which already needs this lock in one of its branches, so clean these branches up and call the __kill_pgrp_info() directly. Based on Oleg's view of how this function should look. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pavel Emelyanov	6c5f3e7b43	Pidns: make full use of xxx_vnr() calls Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were _all_ converted to operate on the current pid namespace. After this each call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo) one. Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where appropriate. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Oleg Nesterov	fea9d17554	ITIMER_REAL: convert to use struct pid signal_struct->tsk points to the ->group_leader and thus we have the nasty code in de_thread() which has to change it and restart ->real_timer if the leader is changed. Use "struct pid *leader_pid" instead. This also allows us to kill now unneeded send_group_sig_info(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Oleg Nesterov	d36174bc2b	uglify kill_pid_info() to fix kill() vs exec() race kill_pid_info()->pid_task() could be the old leader of the execing process. In that case it is possible that the leader will be released before we take siglock. This means that kill_pid_info() (and thus sys_kill()) can return a false -ESRCH. Change the code to retry when lock_task_sighand() fails. The endless loop is not possible, __exit_signal() both clears ->sighand and does detach_pid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:28 -08:00
Oleg Nesterov	ac9a8e3f0f	sys_getsid: don't use ->nsproxy directly With the new semantics of find_vpid() we don't need to play with ->nsproxy explicitely, _vxx() do the right things. Also s/tasklist/rcu/. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:28 -08:00
Eric W. Biederman	44c4e1b258	pid: Extend/Fix pid_vnr pid_vnr returns the user space pid with respect to the pid namespace the struct pid was allocated in. What we want before we return a pid to user space is the user space pid with respect to the pid namespace of current. pid_vnr is a very nice optimization but because it isn't quite what we want it is easy to use pid_vnr at times when we aren't certain the struct pid was allocated in our pid namespace. Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the parent process reported in the core dumps and the parent process in get_signal_to_deliver. So unless the performance impact is huge having an interface that does what we want instead of always what we want should be much more reliable and much less error prone. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Eric W. Biederman	161550d74c	pid: sys_wait... fixes This modifies do_wait and eligible child to take a pair of enum pid_type and struct pid *pid to precisely specify what set of processes are eligible to be waited for, instead of the raw pid_t value from sys_wait4. This fixes a bug in sys_waitid where you could not wait for children in just process group 1. This fixes a pid namespace crossing case in eligible_child. Allowing us to wait for a processes in our current process group even if our current process group == 0. This allows the no child with this pid case to be optimized. This allows us to optimize the pid membership test in eligible child to be optimized. This even closes a theoretical pid wraparound race where in a threaded parent if two threads are waiting for the same child and one thread picks up the child and the pid numbers wrap around and generate another child with that same pid before the other thread is scheduled (teribly insanely unlikely) we could end up waiting on the second child with the same pid# and not discover that the specific child we were waiting for has exited. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	5dee1707df	move the related code from exit_notify() to exit_signals() The previous bugfix was not optimal, we shouldn't care about group stop when we are the only thread or the group stop is in progress. In that case nothing special is needed, just set PF_EXITING and return. Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify(). So, from the performance POV the only difference is that we don't trust !signal_pending() until we take ->siglock. But this in fact fixes another ___pure___ theoretical minor race. __group_complete_signal() finds the task without PF_EXITING and chooses it as the target for signal_wake_up(). But nothing prevents this task from exiting in between without noticing the pending signal and thus unpredictably delaying the actual delivery. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	6806aac6d2	sys_setsid: remove now unneeded session != 1 check Eric's "fix clone(CLONE_NEWPID)" eliminated the last reason for this hack. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	d12619b5ff	fix group stop with exit race do_signal_stop() counts all sub-thread and sets ->group_stop_count accordingly. Every thread should decrement ->group_stop_count and stop, the last one should notify the parent. However a sub-thread can exit before it notices the signal_pending(), or it may be somewhere in do_exit() already. In that case the group stop never finishes properly. Note: this is a minimal fix, we can add some optimizations later. Say we can return quickly if thread_group_empty(). Also, we can move some signal related code from exit_notify() to exit_signals(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	430c623121	start the global /sbin/init with 0,0 special pids As Eric pointed out, there is no problem with init starting with sid == pgid == 0, and this was historical linux behavior changed in 2.6.18. Remove kernel_init()->__set_special_pids(), this is unneeded and complicates the rules for sys_setsid(). This change and the previous change in daemonize() mean that /sbin/init does not need the special "session != 1" hack in sys_setsid() any longer. We can't remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so update the comment only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	297bd42b15	move daemonized kernel threads into the swapper's session Daemonized kernel threads run in the init's session. This doesn't match the behaviour of kthread_create()'ed threads, and this is one of the 2 reasons why we need a special hack in sys_setsid(). Now that set_special_pids() was changed to use struct pid, not pid_t, we can use init_struct_pid and set 0,0 special pids. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	8520d7c7f8	teach set_special_pids() to use struct pid Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	e4cc0a9c87	fix setsid() for sub-namespace /sbin/init sys_setsid() still deals with pid_t's from the global namespace. This means that the "session > 1" check can't help for sub-namespace init, setsid() can't succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links. Remove the usage of task_struct->pid and convert the code to use "struct pid". This also simplifies and speedups the code, saves one find_pid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	4e021306cf	sys_setpgid(): simplify pid/ns interaction sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice versa. Use "struct pid" more consistently. Saves one find_vpid() and eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the code. Also use the same_thread_group() helper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	c543f1ee08	wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense. The exit_state was EXIT_ZOMBIE when the function was called, and another thread can change it to EXIT_DEAD right after the check. The second condition is not possible, detached non-traced threads were already filtered out by eligible_child(), we didn't drop tasklist since then. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	3a515e4a62	wait_task_continued/zombie: don't use task_pid_nr_ns() lockless Surprise, the other two wait_task_*() functions also abuse the task_pid_nr_ns() function, and may cause read-after-free or report nr == 0 in wait_task_continued(). wait_task_zombie() doesn't have this problem, but it is still better to cache pid_t rather than call task_pid_nr_ns() three times on the saved pid_namespace. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	f2cc3eb133	do_wait: fix security checks Imho, the current usage of security_task_wait() is not logical. Suppose we have the single child p, and security_task_wait(p) return -EANY. In that case waitpid(-1) returns this error. Why? Isn't it better to return ECHLD? We don't really have reapable children. Now suppose that child was stolen by gdb. In that case we find this child on ->ptrace_children and set flag = 1, but we don't check that the child was denied. So, do_wait(..., WNOHANG) returns 0, this doesn't match the behaviour above. Without WNOHANG do_wait() blocks only to return the error later, when the child will be untraced. Inho, really strange. I think eligible_child() should return the error only if the child's pid was requested explicitly, otherwise we should silently ignore the tasks which were nacked by security_task_wait(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	96fabbf55a	do_wait: cleanup delay_group_leader() usage eligible_child() == 2 means delay_group_leader(). With the previous patch this only matters for EXIT_ZOMBIE task, we can move that special check to the only place it is really needed. Also, with this patch we don't skip security_task_wait() for the group leaders in a non-empty thread group. I don't really understand the exact semantics of security_task_wait(), but imho this change is a bugfix. Also rearrange the code a bit to kill an ugly "check_continued" backdoor. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Roland McGrath <roland@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	1bad95c3be	wait_task_stopped(): remove unneeded delay_group_leader check wait_task_stopped() doesn't need the "delay_group_leader" parameter. If the child is not traced it must be a group leader. With or without subthreads ->group_stop_count == 0 when the whole task is stopped. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Mika Penttila <mika.penttila@kolumbus.fi> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	20686a309a	ptrace_stop: fix racy nonstop_code setting If the tracer is gone and we are not going to stop, ptrace_stop() sets ->exit_code = nostop_code. However, the tracer could actually clear the exit code before detaching. In that case get_signal_to_deliver() "resends" the signal which was cancelled by the debugger. For example, it is possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in STOPPED state. Change the behaviour of ptrace_stop(). If the caller is ptrace notify(), we should always clear ->exit_code. If the caller is get_signal_to_deliver(), we should not touch it at all. To do so, change the nonstop_code parameter to "bool clear_code" and change the callers accordingly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	9cbab81005	do_wait: factor out "retval != 0" checks Every branch if the main "if" statement does the same code at the end. Move it down. Also, fix the indentation. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	ee7c82da83	wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace wait_task_stopped() has multiple races with SIGCONT/SIGKILL. tasklist_lock does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info reported (including exit_code) may be wrong. In fact, the code under write_lock_irq(tasklist_lock) is not safe. The child may be PTRACE_DETACH'ed at this time by another subthread, in that case it is possible we are no longer its ->parent. Change wait_task_stopped() to take ->siglock before inspecting the task. This guarantees that the child can't resume and (for example) clear its ->exit_code, so we don't need to use xchg(&p->exit_code) and re-check. The only exception is ptrace_stop() which changes ->state and ->exit_code without ->siglock held during abort. But this can only happen if both the tracer and the tracee are dying (coredump is in progress), we don't care. With this patch wait_task_stopped() doesn't move the child to the end of the ->parent list on success. This optimization could be restored, but in that case we have to take write_lock(tasklist) and do some nasty checks. Also change the do_wait() since we don't return EAGAIN any longer. [akpm@linux-foundation.org: fix up after Willy renamed everything] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	6405f7f467	ptrace_stop: fix the race with ptrace detach+attach If the tracer went away (may_ptrace_stop() failed), ptrace_stop() drops tasklist and then changes the ->state from TASK_TRACED to TASK_RUNNING. This can fool another tracer which attaches to us in between. Change the ->state under tasklist_lock to ensure that ptrace_check_attach() can't wrongly succeed. Also, remove the unnecessary mb(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	c0c0b649d6	ptrace_check_attach: remove unneeded ->signal != NULL check It is not possible to see the PT_PTRACED task without ->signal/sighand under tasklist_lock, release_task() does ptrace_unlink() first. If the task was already released before, ptrace_attach() can't succeed and set PT_PTRACED. Remove this check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	34a1738f7d	kill my_ptrace_child() Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED" inline and simplify the corresponding logic in do_wait: we can't find the child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace() either sets TASK_STOPPED or wakes up the tracee. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Oleg Nesterov	6b39c7bfbd	kill PT_ATTACHED Since the patch "Fix ptrace_attach()/ptrace_traceme()/de_thread() race" commit `f5b40e363a` we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock. This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer" which were needed to catch the "ptrace attach is in progress" case. We can also remove the flag itself since nobody else uses it. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:26 -08:00
Andrew Morton	92dfc9dc7b	fix "modules: make module_address_lookup() safe" Get the constness right, avoid nasty cast. Cc: Ingo Molnar <mingo@elte.hu> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:24 -08:00

1 2 3 4 5 ...

3517 Commits