android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Al Viro	a33e675100	sanitize audit_ipc_obj() * get rid of allocations * make it return void * simplify callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-04 15:14:39 -05:00
Al Viro	f3298dc4f2	sanitize audit_socketcall * don't bother with allocations * now that it can't fail, make it return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-04 15:14:39 -05:00
Al Viro	4f6b434fee	don't reallocate buffer in every audit_sockaddr() No need to do that more than once per process lifetime; allocating/freeing on each sendto/accept/etc. is bloody pointless. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-04 15:14:39 -05:00
KOSAKI Motohiro	8916edef58	getrusage: RUSAGE_THREAD should return ru_utime and ru_stime Impact: task stats regression fix Original getrusage(RUSAGE_THREAD) implementation can return ru_utime and ru_stime. But commit "f06febc: timers: fix itimer/many thread hang" broke it. this patch restores it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-04 14:04:00 +01:00
Linus Torvalds	7d3b56ba37	Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits) x86: setup_per_cpu_areas() cleanup cpumask: fix compile error when CONFIG_NR_CPUS is not defined cpumask: use alloc_cpumask_var_node where appropriate cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t x86: use cpumask_var_t in acpi/boot.c x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids sched: put back some stack hog changes that were undone in kernel/sched.c x86: enable cpus display of kernel_max and offlined cpus ia64: cpumask fix for is_affinity_mask_valid() cpumask: convert RCU implementations, fix xtensa: define __fls mn10300: define __fls m32r: define __fls h8300: define __fls frv: define __fls cris: define __fls cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS cpumask: zero extra bits in alloc_cpumask_var_node cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ cpumask: convert mm/ ...	2009-01-03 12:04:39 -08:00
Linus Torvalds	61420f59a5	Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting	2009-01-03 11:56:24 -08:00
Mike Travis	6ca09dfc9f	sched: put back some stack hog changes that were undone in kernel/sched.c Impact: prevents panic from stack overflow on numa-capable machines. Some of the "removal of stack hogs" changes in kernel/sched.c by using node_to_cpumask_ptr were undone by the early cpumask API updates, and causes a panic due to stack overflow. This patch undoes those changes by using cpumask_of_node() which returns a 'const struct cpumask '. In addition, cpu_coregoup_map is replaced with cpu_coregroup_mask further reducing stack usage. (Both of these updates removed 9 FIXME's!) Also: Pick up some remaining changes from the old 'cpumask_t' functions to the new 'struct cpumask ' functions. Optimize memory traffic by allocating each percpu local_cpu_mask on the same node as the referring cpu. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 19:00:09 +01:00
Ingo Molnar	6bdf197b04	ia64: cpumask fix for is_affinity_mask_valid() Impact: build fix on ia64 ia64's default_affinity_write() still had old cpumask_t usage: /home/mingo/tip/kernel/irq/proc.c: In function `default_affinity_write': /home/mingo/tip/kernel/irq/proc.c:114: error: incompatible type for argument 1 of `is_affinity_mask_valid' make[3]: * [kernel/irq/proc.o] Error 1 make[3]: * Waiting for unfinished jobs.... update it to cpumask_var_t. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 18:59:33 +01:00
Ingo Molnar	263ec6457b	cpumask: convert RCU implementations, fix Impact: cleanup This warning: kernel/rcuclassic.c: In function ‘rcu_start_batch’: kernel/rcuclassic.c:397: warning: passing argument 1 of ‘cpumask_andnot’ from incompatible pointer type triggers because one usage site of rcp->cpumask was not converted to to_cpumask(rcp->cpumask). There's no ill effects of this bug. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 18:59:25 +01:00
Mike Travis	7eb1955336	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 18:53:31 +01:00
Darren Hart	90621c40cc	futex: catch certain assymetric (get\|put)_futex_key calls Impact: add debug check Following up on my previous key reference accounting patches, this patch will catch puts on keys that haven't been "got". This won't catch nested get/put mismatches though. Build and boot tested, with minimal desktop activity and a run of the open_posix_testsuite in LTP for testing. No warnings logged. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-02 23:10:44 +01:00
Linus Torvalds	b840d79631	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually	2009-01-02 11:44:09 -08:00
Mike Galbraith	0a582440ff	sched: fix sched_slice() Impact: fix bad-interactivity buglet Fix sched_slice() to emit a sane result whether a task is currently enqueued or not. Signed-off-by: Mike Galbraith <efault@gmx.de> Tested-by: Jayson King <dev@jaysonking.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/sched_fair.c \| 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-)	2009-01-02 17:10:43 +01:00
Rusty Russell	5db0e1e9e0	cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ Impact: cleanup Simple replacement, now the _nr is redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@redhat.com>	2009-01-01 10:12:29 +10:30
Rusty Russell	41c7bb9588	cpumask: convert rest of files in kernel/ Impact: Reduce stack usage, use new cpumask API. Mainly changing cpumask_t to 'struct cpumask' and similar simple API conversion. Two conversions worth mentioning: 1) we use cpumask_any_but to avoid a temporary in kernel/softlockup.c, 2) Use cpumask_var_t in taskstats_user_cmd(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com>	2009-01-01 10:12:28 +10:30
Rusty Russell	e0b582ec56	cpumask: convert kernel/cpu.c Impact: Reduce kernel stack and memory usage, use new cpumask API. Use cpumask_var_t for take_cpu_down() stack var, and frozen_cpus. Note that notify_cpu_starting() can be called before core_initcall allocates frozen_cpus, but the NULL check is optimized out by gcc for the CONFIG_CPUMASK_OFFSTACK=n case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:28 +10:30
Rusty Russell	c309b917ca	cpumask: convert kernel/profile.c Impact: Reduce kernel memory usage, use new cpumask API. Avoid a static cpumask_t for prof_cpu_mask, and an on-stack cpumask_t in prof_cpu_mask_write_proc. Both become cpumask_var_t. prof_cpu_mask is only allocated when profiling is on, but the NULL checks are optimized out by gcc for the !CPUMASK_OFFSTACK case. Also removed some strange and unnecessary casts. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:27 +10:30
Rusty Russell	bd232f97b3	cpumask: convert RCU implementations Impact: use new cpumask API. rcu_ctrlblk contains a cpumask, and it's highly optimized so I don't want a cpumask_var_t (ie. a pointer) for the CONFIG_CPUMASK_OFFSTACK case. It could use a dangling bitmap, and be allocated in __rcu_init to save memory, but for the moment we use a bitmap. (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK, so we use a bitmap here to show we really mean it). We remove on-stack cpumasks, using cpumask_var_t for rcu_torture_shuffle_tasks() and for_each_cpu_and in force_quiescent_state(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:26 +10:30
Rusty Russell	d036e67b40	cpumask: convert kernel/irq Impact: Reduce stack usage, use new cpumask API. ALPHA mod! Main change is that irq_default_affinity becomes a cpumask_var_t, so treat it as a pointer (this effects alpha). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:26 +10:30
Rusty Russell	6b954823c2	cpumask: convert kernel time functions Impact: Use new APIs Convert kernel/time functions to use struct cpumask *. Note the ugly bitmap declarations in tick-broadcast.c. These should be cpumask_var_t, but there was no obvious initialization function to put the alloc_cpumask_var() calls in. This was safe. (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK, so we use a bitmap here to show we really mean it). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-01-01 10:12:25 +10:30
Rusty Russell	e7577c50f2	cpumask: convert kernel/workqueue.c Impact: Reduce memory usage, use new cpumask API. cpu_populated_map becomes a cpumask_var_t, and cpu_singlethread_map is simply a cpumask pointer: it's simply the cpumask containing the first possible CPU anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:25 +10:30
Rusty Russell	a45185d2d7	cpumask: convert kernel/compat.c Impact: Reduce stack usage, use new cpumask API. Straightforward conversion; cpumasks' size is given by cpumask_size() (now a variable rather than fixed) and on-stack cpu masks use cpumask_var_t. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:24 +10:30
Rusty Russell	f1fc057c79	cpumask: remove any_online_cpu() users: kernel/ Impact: Remove obsolete API usage any_online_cpu() is a good name, but it takes a cpumask_t, not a pointer. There are several places where any_online_cpu() doesn't really want a mask arg at all. Replace all callers with cpumask_any() and cpumask_any_and(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-01-01 10:12:23 +10:30
Rusty Russell	4462344ee9	cpumask: convert kernel trace functions further Impact: Reduce future memory usage, use new cpumask API. Since the last patch was created and acked, more old cpumask users slipped into kernel/trace. Mostly trivial conversions, except struct trace_iterator's "started" member becomes a cpumask_var_t. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-01 10:12:23 +10:30
Rusty Russell	9e01c1b74c	cpumask: convert kernel trace functions Impact: Reduce future memory usage, use new cpumask API. (Eventually, cpumask_var_t will be allocated based on nr_cpu_ids, not NR_CPUS). Convert kernel trace functions to use struct cpumask API: 1) Use cpumask_copy/cpumask_test_cpu/for_each_cpu. 2) Use cpumask_var_t and alloc_cpumask_var/free_cpumask_var everywhere. 3) Use on_each_cpu instead of playing with current->cpus_allowed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>	2009-01-01 10:12:22 +10:30
Rusty Russell	4f4b6c1a94	cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: core Impact: cleanup In future, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: James Morris <jmorris@namei.org> Cc: Eric Biederman <ebiederm@xmission.com>	2009-01-01 10:12:15 +10:30
Linus Torvalds	db200df0b3	Merge branch 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sparseirq: move __weak symbols into separate compilation unit sparseirq: work around __weak alias bug sparseirq: fix hang with !SPARSE_IRQ sparseirq: set lock_class for legacy irq when sparse_irq is selected sparseirq: work around compiler optimizing away __weak functions sparseirq: fix desc->lock init sparseirq: do not printk when migrating IRQ descriptors sparseirq: remove duplicated arch_early_irq_init() irq: simplify for_each_irq_desc() usage proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c irq: for_each_irq_desc() move to irqnr.h hrtimer: remove #include <linux/irq.h>	2008-12-31 09:00:59 -08:00
Martin Schwidefsky	79741dd357	[PATCH] idle cputime accounting The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Martin Schwidefsky	457533a7d3	[PATCH] fix scaled & unscaled cputime accounting The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Rusty Russell	2ca1a61583	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/x86/kernel/io_apic.c	2008-12-31 23:05:57 +10:30
Thomas Gleixner	1c5745aa38	sched_clock: prevent scd->clock from moving backwards, take #2 Redo: `5b7dba4`: sched_clock: prevent scd->clock from moving backwards which had to be reverted due to s2ram hangs: `ca7e716`: Revert "sched_clock: prevent scd->clock from moving backwards" ... this time with resume restoring GTOD later in the sequence taken into account as well. The "timekeeping_suspended" flag is not very nice but we cannot call into GTOD before it has been properly resumed and the scheduler will run very early in the resume sequence. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-31 09:53:21 +01:00
Ingo Molnar	818fa7f390	Merge branch 'tracing/kmemtrace' into tracing/kmemtrace2	2008-12-31 08:19:48 +01:00
Ingo Molnar	5fdf7e5975	Merge branch 'linus' into tracing/kmemtrace Conflicts: mm/slub.c	2008-12-31 08:14:29 +01:00
Linus Torvalds	6a94cb7306	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: (184 commits) [XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI [XFS] handle unaligned data in xfs_bmbt_disk_get_all [XFS] avoid memory allocations in xfs_fs_vcmn_err [XFS] Fix speculative allocation beyond eof [XFS] Remove XFS_BUF_SHUT() and friends [XFS] Use the incore inode size in xfs_file_readdir() [XFS] set b_error from bio error in xfs_buf_bio_end_io [XFS] use inode_change_ok for setattr permission checking [XFS] add a FMODE flag to make XFS invisible I/O less hacky [XFS] resync headers with libxfs [XFS] simplify projid check in xfs_rename [XFS] replace b_fspriv with b_mount [XFS] Remove unused tracing code [XFS] Remove unnecessary assertion [XFS] Remove unused variable in ktrace_free() [XFS] Check return value of xfs_buf_get_noaddr() [XFS] Fix hang after disallowed rename across directory quota domains [XFS] Fix compile with CONFIG_COMPAT enabled move inode tracing out of xfs_vnode. move vn_iowait / vn_iowake into xfs_aops.c ...	2008-12-30 17:48:25 -08:00
Huang Weiyi	1af237a099	tracing: removed duplicated #include Removed duplicated #include in kernel/trace/trace.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-30 17:35:40 -08:00
Linus Torvalds	526ea064f9	Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: oprofile: select RING_BUFFER ring_buffer: adding EXPORT_SYMBOLs oprofile: fix lost sample counter oprofile: remove nr_available_slots() oprofile: port to the new ring_buffer ring_buffer: add remaining cpu functions to ring_buffer.h oprofile: moving cpu_buffer_reset() to cpu_buffer.h oprofile: adding cpu_buffer_entries() oprofile: adding cpu_buffer_write_commit() oprofile: adding cpu buffer r/w access functions ftrace: remove unused function arg in trace_iterator_increment() ring_buffer: update description for ring_buffer_alloc() oprofile: set values to default when creating oprofilefs oprofile: implement switch/case in buffer_sync.c x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c x86/oprofile: reordering IBS code in op_model_amd.c oprofile: fix typo oprofile: whitspace changes only oprofile: update comment for oprofile_add_sample() oprofile: comment cleanup	2008-12-30 17:31:25 -08:00
Linus Torvalds	6de71484cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (98 commits) sparc: move select of ARCH_SUPPORTS_MSI sparc: drop SUN_IO sparc: unify sections.h sparc: use .data.init_task section for init_thread_union sparc: fix array overrun check in of_device_64.c sparc: unify module.c sparc64: prepare module_64.c for unification sparc64: use bit neutral Elf symbols sparc: unify module.h sparc: introduce CONFIG_BITS sparc: fix hardirq.h removal fallout sparc64: do not export pus_fs_struct sparc: use sparc64 version of scatterlist.h sparc: Commonize memcmp assembler. sparc: Unify strlen assembler. sparc: Add asm/asm.h sparc: Kill memcmp_32.S code which has been ifdef'd out for centuries. sparc: replace for_each_cpu_mask_nr with for_each_cpu sparc: fix sparse warnings in irq_32.c sparc: add include guards to kernel.h ...	2008-12-30 17:23:31 -08:00
Linus Torvalds	1dff81f20c	Merge branch 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block: (43 commits) bio: get rid of bio_vec clearing bounce: don't rely on a zeroed bio_vec list cciss: simplify parameters to deregister_disk function cfq-iosched: fix race between exiting queue and exiting task loop: Do not call loop_unplug for not configured loop device. loop: Flush possible running bios when loop device is released. alpha: remove dead BIO_VMERGE_BOUNDARY Get rid of CONFIG_LSF block: make blk_softirq_init() static block: use min_not_zero in blk_queue_stack_limits block: add one-hit cache for disk partition lookup cfq-iosched: remove limit of dispatch depth of max 4 times quantum nbd: tell the block layer that it is not a rotational device block: get rid of elevator_t typedef aio: make the lookup_ioctx() lockless bio: add support for inlining a number of bio_vecs inside the bio bio: allow individual slabs in the bio_set bio: move the slab pointer inside the bio_set bio: only mempool back the largest bio_vec slab cache block: don't use plugging on SSD devices ...	2008-12-30 17:20:05 -08:00
Linus Torvalds	179475a3b4	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, sparseirq: clean up Kconfig entry x86: turn CONFIG_SPARSE_IRQ off by default sparseirq: fix numa_migrate_irq_desc dependency and comments sparseirq: add kernel-doc notation for new member in irq_desc, -v2 locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP sparseirq, xen: make sure irq_desc is allocated for interrupts sparseirq: fix !SMP building, #2 x86, sparseirq: move irq_desc according to smp_affinity, v7 proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ sparse irqs: add irqnr.h to the user headers list sparse irqs: handle !GENIRQ platforms sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build sparseirq: fix Alpha build failure sparseirq: fix typo in !CONFIG_IO_APIC case x86, MSI: pass irq_cfg and irq_desc x86: MSI start irq numbering from nr_irqs_gsi x86: use NR_IRQS_LEGACY sparse irq_desc[] array: core kernel and x86 changes genirq: record IRQ_LEVEL in irq_desc[] irq.h: remove padding from irq_desc on 64bits	2008-12-30 16:20:19 -08:00
Linus Torvalds	bb758e9637	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix warning in kernel/hrtimer.c x86: make sure we really have an hpet mapping before using it x86: enable HPET on Fujitsu u9200 linux/timex.h: cleanup for userspace posix-timers: simplify de_thread()->exit_itimers() path posix-timers: check ->it_signal instead of ->it_pid to validate the timer posix-timers: use "struct pid" instead of "struct task_struct" nohz: suppress needless timer reprogramming clocksource, acpi_pm.c: put acpi_pm_read_slow() under CONFIG_PCI nohz: no softirq pending warnings for offline cpus hrtimer: removing all ur callback modes, fix hrtimer: removing all ur callback modes, fix hotplug hrtimer: removing all ur callback modes x86: correct link to HPET timer specification rtc-cmos: export second NVRAM bank Fixed up conflicts in sound/drivers/pcsp/pcsp.c and sound/core/hrtimer.c manually.	2008-12-30 16:16:21 -08:00
Linus Torvalds	5f34fe1cfc	Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits) stacktrace: provide save_stack_trace_tsk() weak alias rcu: provide RCU options on non-preempt architectures too printk: fix discarding message when recursion_bug futex: clean up futex_(un)lock_pi fault handling "Tree RCU": scalable classic RCU implementation futex: rename field in futex_q to clarify single waiter semantics x86/swiotlb: add default swiotlb_arch_range_needs_mapping x86/swiotlb: add default phys<->bus conversion x86: unify pci iommu setup and allow swiotlb to compile for 32 bit x86: add swiotlb allocation functions swiotlb: consolidate swiotlb info message printing swiotlb: support bouncing of HighMem pages swiotlb: factor out copy to/from device swiotlb: add arch hook to force mapping swiotlb: allow architectures to override phys<->bus<->phys conversions swiotlb: add comment where we handle the overflow of a dma mask on 32 bit rcu: fix rcutorture behavior during reboot resources: skip sanity check of busy resources swiotlb: move some definitions to header swiotlb: allow architectures to override swiotlb pool allocation ... Fix up trivial conflicts in arch/x86/kernel/Makefile arch/x86/mm/init_32.c include/linux/hardirq.h as per Ingo's suggestions.	2008-12-30 16:10:19 -08:00
Ingo Molnar	3fd4bc015e	tracing/kmemtrace: export kmemtrace_mark_alloc_node() / kmemtrace_mark_free() Impact: build fix Also fix up Kconfig dependencies and include files. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 16:06:00 +01:00
Frederic Weisbecker	36994e58a4	tracing/kmemtrace: normalize the raw tracer event to the unified tracing API Impact: new tracer plugin This patch adapts kmemtrace raw events tracing to the unified tracing API. To enable and use this tracer, just do the following: echo kmemtrace > /debugfs/tracing/current_tracer cat /debugfs/tracing/trace You will have the following output: # tracer: kmemtrace # # # ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER # FREE \| \| \| \| \| \| \| \| # \| type_id 1 call_site 18446744071565527833 ptr 18446612134395152256 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 That was to stay backward compatible with the format output produced in inux/tracepoint.h. This is the default ouput, but note that I tried something else. If you change an option: echo kmem_minimalistic > /debugfs/trace_options and then cat /debugfs/trace, you will have the following output: # tracer: kmemtrace # # # ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER # FREE \| \| \| \| \| \| \| \| # \| - C 0xffff88007c088780 file_free_rcu + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc + K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname - C 0xffff88007cad6000 putname + K 992 1000 000000d0 0xffff880079045b58 -1 alloc_inode + K 768 1024 000080d0 0xffff88007c096400 -1 alloc_pipe_info + K 240 240 000000d0 0xffff8800790dca50 -1 d_alloc + K 272 320 000080d0 0xffff88007c088780 -1 get_empty_filp + K 272 320 000080d0 0xffff88007c088000 -1 get_empty_filp Yeah I shall confess kmem_minimalistic should be: kmem_alternative. Whatever, I find it more readable but this a personal opinion of course. We can drop it if you want. On the ALLOC/FREE column, + means an allocation and - a free. On the type column, you have K = kmalloc, C = cache, P = page I would like the flags to be GFP_* strings but that would not be easy to not break the column with strings.... About the node...it seems to always be -1. I don't know why but that shouldn't be difficult to find. I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would be more easy to find the tracer headers if they are all in their common directory. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 09:36:13 +01:00
Ingo Molnar	7a51cffbd1	relayfs: replace BUG() with WARN_ON() in relay_late_setup_files() Impact: turn boot crash into boot warning This BUG() can trigger: [ 16.684131] initcall fail_page_alloc_debugfs+0x0/0xc1 returned 0 after 0 usecs [ 16.692035] calling kmemtrace_setup_late+0x0/0xd5 @ 1 [ 16.700087] relay_late_setup_files: CPU 1 has no buffer, it must have! [ 16.704044] ------------[ cut here ]------------ [ 16.708030] kernel BUG at kernel/relay.c:680! [ 16.708030] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 16.708030] last sysfs file: [ 16.708030] [ 16.708030] Pid: 1, comm: swapper Not tainted (2.6.28-tip-03903-g9a39f58-dirty #13207) System Product Name [ 16.708030] EIP: 0060:[<c01604ae>] EFLAGS: 00010246 CPU: 1 [ 16.708030] EIP is at relay_late_setup_files+0x8c/0x176 Reduce it to a more reportable WARN_ONCE(). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 06:57:11 +01:00
Darren Hart	42d35d48ce	futex: make futex_(get\|put)_key() calls symmetric Impact: cleanup This patch makes the calls to futex_get_key_refs() and futex_drop_key_refs() explicitly symmetric by only "putting" keys we successfully "got". Also cleanup a couple return points that didn't "put" after a successful "get". Build and boot tested on an x86_64 system. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 06:38:15 +01:00
Rusty Russell	ce47d974f7	cpumask: arch_send_call_function_ipi_mask: core Impact: new API to reduce stack usage We're weaning the core code off handing cpumask's around on-stack. This introduces arch_send_call_function_ipi_mask(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:05:17 +10:30
Rusty Russell	54b11e6d57	cpumask: smp_call_function_many() Impact: Implementation change to remove cpumask_t from stack. Actually change smp_call_function_mask() to smp_call_function_many(). We avoid cpumasks on the stack in this version. (S390 has its own version, but that's going away apparently). We have to do some dancing to figure out if 0 or 1 other cpus are in the mask supplied and the online mask without allocating a tmp cpumask. It's still fairly cheap. We allocate the cpumask at the end of the call_function_data structure: if allocation fails we fallback to smp_call_function_single rather than using the baroque quiescing code (which needs a cpumask on stack). (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: npiggin@suse.de Cc: axboe@kernel.dk	2008-12-30 09:05:16 +10:30
Rusty Russell	3fa4152069	cpumask: make set_cpu_/init_cpu_ out-of-line They're only for use in boot/cpu hotplug code anyway, and this avoids the use of deprecated cpu_*_map. Stephen Rothwell points out that gcc 4.2.4 (on powerpc at least) didn't like the cast away of const anyway: include/linux/cpumask.h: In function 'set_cpu_possible': include/linux/cpumask.h:1052: warning: passing argument 2 of 'cpumask_set_cpu' discards qualifiers from pointer target type So this kills two birds with one stone. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:05:16 +10:30
Rusty Russell	b3199c025d	cpumask: switch over to cpu_online/possible/active/present_mask: core Impact: cleanup This implements the obsolescent cpu_online_map in terms of cpu_online_mask, rather than the other way around. Same for the other maps. The documentation comments are also updated to refer to _mask rather than _map. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2008-12-30 09:05:14 +10:30
Rusty Russell	33edcf133b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-12-30 08:02:35 +10:30
Ingo Molnar	a103e2ab73	tracing/selftest: remove TRACE_CONT reference Impact: build fix TRACE_CONT is gone - fix up the self-test too. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 15:07:47 +01:00
Ingo Molnar	f7d48cbde5	tracing/ftrace: make trace_find_cmdline() generally available Impact: build fix On !CONFIG_CONTEXT_SWITCH_TRACER trace_find_cmdline() is not defined: kernel/trace/trace_output.c: In function 'trace_ctxwake_print': kernel/trace/trace_output.c:499: error: implicit declaration of function 'trace_find_cmdline' kernel/trace/trace_output.c:499: warning: assignment makes pointer from integer without a cast Move it to the generic section in trace.h. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 13:06:24 +01:00
Frederic Weisbecker	e302cf3f96	tracing/branch-tracer: adapt to the stat tracing API Impact: refactor the branch tracer This patch adapts the branch tracer to the tracing API. This is a proof of concept because the branch tracer implements two "stat tracing" that were split in two files. So I added an option to the branch tracer: stat_all_branch. If it is set, then trace_stat will output all of the branches entries stats. Otherwise, it will print the annotated branches. Its is a kind of quick trick, waiting for a better solution. By default, the annotated branches stat are sorted by incorrect branch prediction percentage. Ie: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 0 1 100 native_smp_prepare_cpus smpboot.c 1228 0 1 100 hpet_rtc_timer_reinit hpet.c 1057 0 18032 100 sched_info_queued sched_stats.h 223 0 684 100 yield_task_fair sched_fair.c 984 0 282 100 pre_schedule_rt sched_rt.c 1263 0 13414 100 sched_info_dequeued sched_stats.h 178 0 21724 100 sched_info_switch sched_stats.h 270 0 1 100 get_signal_to_deliver signal.c 1820 0 8 100 __cancel_work_timer workqueue.c 560 0 212 100 verify_export_symbols module.c 1509 0 17 100 __rmqueue_fallback page_alloc.c 793 0 43 100 clear_page_mlock internal.h 129 0 124 100 try_to_unmap_anon rmap.c 1021 0 53 100 try_to_unmap_anon rmap.c 1013 0 6 100 vma_address rmap.c 232 0 3301 100 try_to_unmap_file rmap.c 1082 0 466 100 try_to_unmap_file rmap.c 1077 0 1 100 mem_cgroup_create memcontrol.c 1090 0 3 100 inotify_find_update_watch inotify.c 726 2 30163 99 perf_counter_task_sched_out perf_counter.c 385 1 2935 99 percpu_free allocpercpu.c 138 1544 297672 99 dentry_lru_del_init dcache.c 153 8 1074 99 input_pass_event input.c 86 1390 76781 98 mapping_unevictable pagemap.h 50 280 6665 95 pick_next_task_rt sched_rt.c 889 750 4826 86 next_pidmap pid.c 194 2 8 80 blocking_notifier_chain_regist notifier.c 220 36 130 78 ioremap_pte_range ioremap.c 22 1093 3247 74 IS_ERR err.h 34 1023 2908 73 sched_slice sched_fair.c 445 22 60 73 disk_put_part genhd.h 206 [...] It enables a developer to quickly address the source of incorrect branch predictions. Note that this sorting would be better with a second sort on the number of incorrect predictions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:55:46 +01:00
Frederic Weisbecker	dbd0b4b330	tracing/ftrace: provide the base infrastructure for histogram tracing Impact: extend the tracing API The goal of this patch is to normalize and make more easy the implementation of statistical (histogram) tracing. It implements a trace_stat file into the /debugfs/tracing directory where one can print a one-shot output of statistics/histogram entries. A tracer has to provide two basic iterator callbacks: stat_start() => the first entry stat_next(prev, idx) => the next one. Note that it is adapted for arrays or hash tables or lists.... since it provides a pointer to the previous entry and the current index of the iterator. These two callbacks are called to get a snapshot of the statistics at each opening of the trace_stat file because. The values are so updated between two "cat trace_stat". And the tracer is free to lock its datas during the iteration to keep consistent values. Since it is almost always interesting to sort statisticals values to address the problems by priority, this infrastructure provides a "sorting" of the stat entries too if desired. A tracer has just to provide a stat_cmp callback to compare two entries and the stat tracing infrastructure will build a sorted list of the given entries. A last callback, called stat_headers, can be implemented by a tracer to output headers on its trace. If one of these callbacks is changed on runtime, it just have to signal it to the stat tracing API by calling the init_tracer_stat() helper. Changes in V2: - Fix a memory leak if the user opens multiple times the trace_stat file without closing it. Now we always free our list before rebuilding it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:55:45 +01:00
Steven Rostedt	f633cef020	ftrace: change trace.c to use registered events Impact: rework trace.c to use new event register API Almost every ftrace event has to implement its output display in trace.c through a different function. Some events did not handle all the formats (trace, latency-trace, raw, hex, binary), and this method does not scale well. This patch converts the format functions to use the event API to find the event and and print its format. Currently, we have a print function for trace, latency_trace, raw, hex and binary. A trace_nop_print is available if the event wants to avoid output on a particular format. Perhaps other tracers could use this in the future (like mmiotrace and function_graph). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:46:12 +01:00
Steven Rostedt	f0868d1e23	ftrace: set up trace event hash infrastructure Impact: simplify/generalize/refactor trace.c The trace.c file is becoming more difficult to maintain due to the growing number of events. There is several formats that an event may be printed. This patch sets up the infrastructure of an event hash to allow for events to register how they should be printed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:46:11 +01:00
Steven Rostedt	c47956d9ae	ftrace: remove obsolete print continue functionality Impact: cleanup, remove obsolete code Now that the ring buffer used by ftrace allows for variable length entries, we do not need the 'cont' feature of the buffer. This code makes other parts of ftrace more complex and by removing this it simplifies the ftrace code. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:46:10 +01:00
Yinghai Lu	43a256322a	sparseirq: move __weak symbols into separate compilation unit GCC has a bug with __weak alias functions: if the functions are in the same compilation unit as their call site, GCC can decide to inline them - and thus rob the linker of the opportunity to override the weak alias with the real thing. So move all the IRQ handling related __weak symbols to kernel/irq/chip.c. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 12:15:49 +01:00
Ingo Molnar	0f01f07fad	Merge branches 'tracing/docs', 'tracing/function-graph-tracer' and 'linus' into tracing/core	2008-12-29 11:25:11 +01:00
Jens Axboe	abf137dd77	aio: make the lookup_ioctx() lockless The mm->ioctx_list is currently protected by a reader-writer lock, so we always grab that lock on the read side for doing ioctx lookups. As the workload is extremely reader biased, turn this into an rcu hlist so we can make lookup_ioctx() lockless. Get rid of the rwlock and use a spinlock for providing update side exclusion. There's usually only 1 entry on this list, so it doesn't make sense to look into fancier data structures. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:50 +01:00
Nikanth Karthikesan	7c0990c7ee	Do not free io context when taking recursive faults in do_exit When taking recursive faults in do_exit, if the io_context is not null, exit_io_context() is being called. But it might decrement the refcount more than once. It is better to leave this task alone. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:43 +01:00
Lachlan McIlroy	0a8c5395f9	[XFS] Fix merge failures Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/linux-2.6/xfs_cred.h fs/xfs/linux-2.6/xfs_globals.h fs/xfs/linux-2.6/xfs_ioctl.c fs/xfs/xfs_vnodeops.h Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-29 16:47:18 +11:00
David S. Miller	e3c6d4ee54	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/sparc64/kernel/idprom.c	2008-12-28 20:19:47 -08:00
Ingo Molnar	b2e2fe9962	sparseirq: work around __weak alias bug Impact: fix boot crash if the kernel is built with certain GCC versions GCC has a bug with __weak alias functions: if the functions are in the same compilation unit as their call site, GCC can decide to inline them - and thus rob the linker of the opportunity to override the weak alias with the real thing. This can lead to the boot crash reported by Kamalesh Babulal: ACPI: Core revision 20080926 Setting APIC routing to flat BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: [<ffffffff8021f9a8>] add_pin_to_irq_cpu+0x14/0x74 PGD 0 Oops: 0000 [#1] SMP [...] So move the arch_init_chip_data() function from handle.c to manage.c. Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-29 00:19:55 +01:00
Linus Torvalds	96faec945f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits) allow stripping of generated symbols under CONFIG_KALLSYMS_ALL kbuild: strip generated symbols from *.ko kbuild: simplify use of genksyms kernel-doc: check for extra kernel-doc notations kbuild: add headerdep used to detect inclusion cycles in header files kbuild: fix string equality testing in tags.sh kbuild: fix make tags/cscope kbuild: fix make incompatibility kbuild: remove TAR_IGNORE setlocalversion: add git-svn support setlocalversion: print correct subversion revision scripts: improve the decodecode script scripts/package: allow custom options to rpm genksyms: allow to ignore symbol checksum changes genksyms: track symbol checksum changes tags and cscope support really belongs in a shell script kconfig: fix options to check-lxdialog.sh kbuild: gen_init_cpio expands shell variables in file names remove bashisms from scripts/extract-ikconfig kbuild: teach mkmakfile to be silent ...	2008-12-28 15:13:48 -08:00
Linus Torvalds	a39b863342	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) sched: fix warning in fs/proc/base.c schedstat: consolidate per-task cpu runtime stats sched: use RCU variant of list traversal in for_each_leaf_rt_rq() sched, cpuacct: export percpu cpuacct cgroup stats sched, cpuacct: refactoring cpuusage_read / cpuusage_write sched: optimize update_curr() sched: fix wakeup preemption clock sched: add missing arch_update_cpu_topology() call sched: let arch_update_cpu_topology indicate if topology changed sched: idle_balance() does not call load_balance_newidle() sched: fix sd_parent_degenerate on non-numa smp machine sched: add uid information to sched_debug for CONFIG_USER_SCHED sched: move double_unlock_balance() higher sched: update comment for move_task_off_dead_cpu sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares sched/rt: removed unneeded defintion sched: add hierarchical accounting to cpu accounting controller sched: include group statistics in /proc/sched_debug sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER sched: clean up SCHED_CPUMASK_ALLOC ...	2008-12-28 12:27:58 -08:00
Linus Torvalds	b0f4b285d7	Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits) sched, trace: update trace_sched_wakeup() tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3 Revert "x86: disable X86_PTRACE_BTS" ring-buffer: prevent false positive warning ring-buffer: fix dangling commit race ftrace: enable format arguments checking x86, bts: memory accounting x86, bts: add fork and exit handling ftrace: introduce tracing_reset_online_cpus() helper tracing: fix warnings in kernel/trace/trace_sched_switch.c tracing: fix warning in kernel/trace/trace.c tracing/ring-buffer: remove unused ring_buffer size trace: fix task state printout ftrace: add not to regex on filtering functions trace: better use of stack_trace_enabled for boot up code trace: add a way to enable or disable the stack tracer x86: entry_64 - introduce FTRACE_ frame macro v2 tracing/ftrace: add the printk-msg-only option tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp() x86, bts: correctly report invalid bts records ... Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits being already partly merged by the SH merge.	2008-12-28 12:21:10 -08:00
Yinghai Lu	12026ea16a	sparseirq: fix hang with !SPARSE_IRQ Impact: fix hang Suresh report his two sockets system only works with SPARSE_IRQ enable it turns out we miss the setting desc->irq so provide early_irq_init() even !SPARSE_IRQ to set desc->irq Reported-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-27 17:52:07 +01:00
Yinghai Lu	fa6beb37b0	sparseirq: set lock_class for legacy irq when sparse_irq is selected Impact: add lockdep annotation to legacy IRQ descs Warnings resulting out of this were not seen in practice, but it's prudent to initialize the legacy descriptors to the lock class as well, symmetric to how we do it with other descriptors. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-27 17:51:45 +01:00
Yinghai Lu	13a0c3c269	sparseirq: work around compiler optimizing away __weak functions Impact: fix panic on null pointer with sparseirq Some GCC versions seem to inline the weak global function, when that function is empty. Work it around, by making the functions return a (dummy) integer. Signed-off-by: Yinghai <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-27 13:24:00 +01:00
Ingo Molnar	793f7b12a0	sparseirq: fix desc->lock init Impact: cleanup init_one_irq_desc() does not initialize the desc->lock properly - you cannot init a lock by memcpying some other lock on it. This happens to work right now (because irq_desc_init is never in use), but it's a dangerous construct nevertheless, so fix it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-27 09:29:22 +01:00
Ingo Molnar	8b07cd4451	sparseirq: do not printk when migrating IRQ descriptors Impact: reduce printk noise There were a couple of leftover KERN_DEBUG debugging printks, remove them. Also clarify an error message. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-27 09:29:21 +01:00
Rusty Russell	be4d638c15	cpumask: Replace cpu_coregroup_map with cpu_coregroup_mask cpu_coregroup_map returned a cpumask_t: it's going away. (Note, the sched part of this patch won't apply meaningfully to the sched tree, but I'm posting it to show the goal). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com>	2008-12-26 22:23:43 +10:30
Yinghai Lu	00c2363487	sparseirq: remove duplicated arch_early_irq_init() Impact: clean up We already have a weak copy of this function in init/main.c Signed-off-by: Yinghai <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 10:43:30 +01:00
Frederic Weisbecker	412d0bb553	tracing/function-graph-tracer: strip ending newlines on comments Impact: tracer output improvement Ending newlines are appended automatically on comments by the function graph tracer because the newline needs to be placed after the "*/" comment characters. So if the user puts an ending newline, we want to strip it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 10:42:14 +01:00
KOSAKI Motohiro	18eefedfe8	irq: simplify for_each_irq_desc() usage Impact: cleanup all for_each_irq_desc() usage point have !desc check. then its check can move into for_each_irq_desc() macro. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 09:48:18 +01:00
KOSAKI Motohiro	26ddd8d5ca	proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c Impact: cleanup irq_desc can be NULL when CONFIG_SPARSE_IRQ=y only. therefore, NULL checking can move into kstat_irqs_cpu() of SPARSE_IRQ version. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: "Yinghai Lu" <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 09:48:18 +01:00
KOSAKI Motohiro	f9af0e7091	irq: for_each_irq_desc() move to irqnr.h Impact: cleanup before CONFIG_SPARSE_IRQ age, for_each_irq_desc() sat in irqnr.h and could be called from generic code. CONFIG_SPARSE_IRQ breaks this assumption, but SPARSE_IRQ version for_each_irq_desc() also can move into irqnr.h easily. Also, this patch unifies CONFIG_SPARSE_IRQ and !CONFIG_SPARSE_IRQ for_each_irq_desc(). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 09:48:17 +01:00
KOSAKI Motohiro	51bc39f4ba	hrtimer: remove #include <linux/irq.h> Impact: cleanup <linux/irq.h> can be removed and should be, because: - hrtimer doesn't use any irq feature. - <linux/irq.h> shouldn't be include from generic code. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 09:48:16 +01:00
Ingo Molnar	32e8d18683	Merge branches 'timers/clocksource', 'timers/hpet', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/rtc' into timers/core	2008-12-25 18:02:25 +01:00
Ingo Molnar	860cf8894b	Merge branches 'irq/sparseirq', 'irq/genirq' and 'irq/urgent'; commit 'v2.6.28' into irq/core	2008-12-25 16:27:54 +01:00
Ingo Molnar	6638101c11	Merge branches 'core/debugobjects', 'core/iommu', 'core/locking', 'core/printk', 'core/rcu', 'core/resources', 'core/softirq' and 'core/stacktrace' into core/core	2008-12-25 14:06:29 +01:00
Ingo Molnar	cc37d3d206	Merge branch 'core/futexes' into core/core	2008-12-25 13:54:14 +01:00
Ingo Molnar	b594deb0cc	Merge branch 'core/debug' into core/core	2008-12-25 13:53:11 +01:00
Ingo Molnar	0b271ef452	Merge commit 'v2.6.28' into core/core	2008-12-25 13:51:46 +01:00
Ingo Molnar	4e202284e6	Merge branch 'sched/urgent'; commit 'v2.6.28' into sched/core	2008-12-25 13:42:23 +01:00
Ingo Molnar	5250d329e3	Merge branches 'tracing/ftrace', 'tracing/hw-branch-tracing' and 'tracing/ring-buffer'; commit 'v2.6.28' into tracing/core	2008-12-25 13:11:00 +01:00
Peter Zijlstra	468a15bb4c	sched, trace: update trace_sched_wakeup() Impact: extend the wakeup tracepoint with the info whether the wakeup was real Add the information needed to distinguish 'real' wakeups from 'false' wakeups. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-25 13:10:21 +01:00
Ingo Molnar	9212ddb5ea	stacktrace: provide save_stack_trace_tsk() weak alias Impact: build fix Some architectures have not implemented save_stack_trace_tsk() yet: fs/built-in.o: In function `proc_pid_stack': base.c:(.text+0x3f140): undefined reference to `save_stack_trace_tsk' So warn about that if the facility is used. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-25 11:44:43 +01:00
Ingo Molnar	12d79bafb7	rcu: provide RCU options on non-preempt architectures too Impact: build fix Some old architectures still do not use kernel/Kconfig.preempt, so the moving of the RCU options there broke their build: In file included from /home/mingo/tip/include/linux/sem.h:81, from /home/mingo/tip/include/linux/sched.h:69, from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9: /home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration" Move these options back to init/Kconfig, which every architecture includes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-25 09:31:28 +01:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Ingo Molnar	db8862eafe	Merge branch 'linus' into tracing/hw-branch-tracing	2008-12-24 21:08:26 +01:00
Li Zefan	20ca9b3f4c	cgroups: avoid accessing uninitialized data in failure path If cgroup_get_rootdir() failed, free_cg_links() will be called in the failure path, but tmp_cg_links hasn't been initialized at that time. I introduced this bug in the 2.6.27 merge window. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-23 15:58:21 -08:00
Sharyathi Nagesh	e368d3a836	cgroups: suppress bogus warning messages Remove spurious warning messages that are thrown onto the console during cgroup operations. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Sharyathi Nagesh <sharyathi@in.ibm.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-23 15:58:21 -08:00
Vaidyanathan Srinivasan	36dffab679	sched: nominate preferred wakeup cpu, fix Andrew Morton reported: > kernel/sched.c: In function 'schedule': > kernel/sched.c:3679: warning: 'active_balance' may be used uninitialized in this function > > This warning is correct - the code is buggy. In sched.c load_balance_newidle, there's real potential use of uninitialised variable - fix it. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 22:37:29 +01:00
Steven Rostedt	98db8df777	ring-buffer: prevent false positive warning Impact: eliminate false WARN_ON message If an interrupt goes off after the setting of the local variable tail_page and before incrementing the write index of that page, the interrupt could push the commit forward to the next page. Later a check is made to see if interrupts pushed the buffer around the entire ring buffer by comparing the next page to the last commited page. This can produce a false positive if the interrupt had pushed the commit page forward as stated above. Thanks to Jiaying Zhang for finding this race. Reported-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 18:45:26 +01:00
Steven Rostedt	a8ccf1d6f6	ring-buffer: fix dangling commit race Impact: fix stuck trace-buffers If an interrupt comes in during the rb_set_commit_to_write and pushes the tail page forward just at the right time, the commit updates will miss the adding of the interrupt data. This will cause the commit pointer to cease from moving forward. Thanks to Jiaying Zhang for finding this race. Reported-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 18:45:25 +01:00
Lachlan McIlroy	27a0464a6c	[XFS] Fix merge conflict in fs/xfs/xfs_rename.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/xfs_rename.c Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:34:26 +11:00
Thomas Gleixner	3d44cc3e01	Null pointer deref with hrtimer_try_to_cancel() Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW commit `2d42244ae7` (clocksource: introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only available to read out the raw not NTP adjusted system time. The above commit did not prevent that a posix timer can be created with that clockid. The timer_create() syscall succeeds and initializes the timer to a non existing hrtimer base. When the timer is deleted either by timer_delete() or by the exit() cleanup the kernel crashes. Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the posix clock function to no_timer_create which returns an error code. Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-20 14:13:45 -08:00
Markus Metzger	bf53de907d	x86, bts: add fork and exit handling Impact: introduce new ptrace facility Add arch_ptrace_untrace() function that is called when the tracer detaches (either voluntarily or when the tracing task dies); ptrace_disable() is only called on a voluntary detach. Add ptrace_fork() and arch_ptrace_fork(). They are called when a traced task is forked. Clear DS and BTS related fields on fork. Release DS resources and reclaim memory in ptrace_untrace(). This releases resources already when the tracing task dies. We used to do that when the traced task dies. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-20 09:15:46 +01:00
Yinghai Lu	b909895739	sparseirq: fix numa_migrate_irq_desc dependency and comments Impact: reduce kconfig variable scope and clean up Bartlomiej pointed out that the config dependencies and comments are not right. update it depend to NUMA, and fix some comments Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 22:56:02 +01:00
Hiroshi Shimamoto	26cc271db7	printk: fix discarding message when recursion_bug Impact: fix truncated recursion bug message printout When recursion_bug is true, kernel discards original message because printk_buf contains recursion_bug_msg with NULL terminator. The sizeof(recursion_bug_msg) makes this, use strlen() to get correct length without NULL terminator. Reported-by: Toshikazu Nakayama <nakayama.ts@ncos.nec.co.jp> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 22:52:47 +01:00
Jan Beulich	9bb482476c	allow stripping of generated symbols under CONFIG_KALLSYMS_ALL Building upon parts of the module stripping patch, this patch introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y. Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64) kernels I tested with. The patch also does away with the need to special case the kallsyms- internal symbols by making them available even in the first linking stage. While it is a generated file, the patch includes the changes to scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure here is. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2008-12-19 22:47:10 +01:00
Pekka J Enberg	213cc06079	ftrace: introduce tracing_reset_online_cpus() helper Impact: cleanup This patch factors out common code from multiple tracers into a tracing_reset_online_cpus() function and converts the tracers to use it. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 16:29:34 +01:00
Rafael J. Wysocki	baa5835df1	Hibernate: Replace unnecessary evaluation of pfn_to_page() Replace one evaluation of pfn_to_page() in copy_data_pages() with the value of a local variable containing the right number already. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>	2008-12-19 04:40:35 -05:00
Rafael J. Wysocki	846705deb0	Hibernate: Take overlapping zones into account (rev. 2) It has been requested to make hibernation work with memory hotplugging enabled and for this purpose the hibernation code has to be reworked to take the possible overlapping of zones into account. Thus, rework the hibernation memory bitmaps code to prevent duplication of PFNs from occuring and add checks to make sure that one page frame will not be marked as saveable many times. Additionally, use list.h lists instead of open-coded lists to implement the memory bitmaps. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2008-12-19 04:40:35 -05:00
Rafael J. Wysocki	69643279a8	Hibernate: Do not oops on resume if image data are incorrect During resume from hibernation using the userland interface image data are being passed from the used space process to the kernel. These data need not be valid, but currently the kernel sometimes oopses if it gets invalid image data, which is wrong. Make the kernel return error codes to the user space in such cases. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>	2008-12-19 04:40:35 -05:00
Rafael J. Wysocki	3f4b0ef7f2	ACPI hibernate: Add a mechanism to save/restore ACPI NVS memory According to the ACPI Specification 3.0b, Section 15.3.2, "OSPM will call the _PTS control method some time before entering a sleeping state, to allow the platform's AML code to update this memory image before entering the sleeping state. After the system awakes from an S4 state, OSPM will restore this memory area and call the _WAK control method to enable the BIOS to reclaim its memory image." For this reason, implement a mechanism allowing us to save the NVS memory during hibernation and to restore it during the subsequent resume. Based on a patch by Zhang Rui. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Cc: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-12-19 04:40:34 -05:00
Zhang Rui	3fe0313e6e	Hibernate: Call platform_begin before swsusp_shrink_memory Call platform_begin() before swsusp_shrink_memory() so that we can always allocate enough memory to save the ACPI NVS region from platform_begin(). Signed-off-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2008-12-19 04:40:34 -05:00
Ingo Molnar	30cd324e97	Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core Conflicts: include/linux/ftrace.h	2008-12-19 09:42:40 +01:00
Ingo Molnar	9924da434a	sched: fix warning in kernel/sched.c Impact: fix cpumask conversion bug this warning: kernel/sched.c: In function ‘find_busiest_group’: kernel/sched.c:3429: warning: passing argument 1 of ‘__first_cpu’ from incompatible pointer type shows that we forgot to convert a new patch to the new cpumask APIs. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:57 +01:00
Vaidyanathan Srinivasan	ad273b32e4	sched: activate active load balancing in new idle cpus Impact: tweak task balancing to save power more agressively Active load balancing is a process by which migration thread is woken up on the target CPU in order to pull current running task on another package into this newly idle package. This method is already in use with normal load_balance(), this patch introduces this method to new idle cpus when sched_mc is set to POWERSAVINGS_BALANCE_WAKEUP. This logic provides effective consolidation of short running daemon jobs in a almost idle system The side effect of this patch may be ping-ponging of tasks if the system is moderately utilised. May need to adjust the iterations before triggering. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:54 +01:00
Vaidyanathan Srinivasan	7eb52dfa70	sched: bias task wakeups to preferred semi-idle packages Impact: tweak task wakeup to save power more agressively Preferred wakeup cpu (from a semi idle package) has been nominated in find_busiest_group() in the previous patch. Use this information in sched_mc_preferred_wakeup_cpu in function wake_idle() to bias task wakeups if the following conditions are satisfied: - The present cpu that is trying to wakeup the process is idle and waking the target process on this cpu will potentially wakeup a completely idle package - The previous cpu on which the target process ran is also idle and hence selecting the previous cpu may wakeup a semi idle cpu package - The task being woken up is allowed to run in the nominated cpu (cpu affinity and restrictions) Basically if both the current cpu and the previous cpu on which the task ran is idle, select the nominated cpu from semi idle cpu package for running the new task that is waking up. Cache hotness is considered since the actual biasing happens in wake_idle() only if the application is cache cold. This technique will effectively move short running bursty jobs in a mostly idle system. Wakeup biasing for power savings gets automatically disabled if system utilisation increases due to the fact that the probability of finding both this_cpu and prev_cpu idle decreases. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:52 +01:00
Vaidyanathan Srinivasan	7a09b1a27b	sched: nominate preferred wakeup cpu Impact: extend load-balancing code (no change in behavior yet) When the system utilisation is low and more cpus are idle, then the process waking up from sleep should prefer to wakeup an idle cpu from semi-idle cpu package (multi core package) rather than a completely idle cpu package which would waste power. Use the sched_mc balance logic in find_busiest_group() to nominate a preferred wakeup cpu. This info can be stored in appropriate sched_domain, but updating this info in all copies of sched_domain is not practical. Hence this information is stored in root_domain struct which is one copy per partitioned sched domain. The root_domain can be accessed from each cpu's runqueue and there is one copy per partitioned sched domain. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:50 +01:00
Vaidyanathan Srinivasan	d5679bd119	sched: favour lower logical cpu number for sched_mc balance Impact: change load-balancing direction to match that of irqbalanced Just in case two groups have identical load, prefer to move load to lower logical cpu number rather than the present logic of moving to higher logical number. find_busiest_group() tries to look for a group_leader that has spare capacity to take more tasks and freeup an appropriate least loaded group. Just in case there is a tie and the load is equal, then the group with higher logical number is favoured. This conflicts with user space irqbalance daemon that will move interrupts to lower logical number if the system utilisation is very low. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:48 +01:00
Gautham R Shenoy	afb8a9b70b	sched: framework for sched_mc/smt_power_savings=N Impact: extend range of /sys/devices/system/cpu/sched_mc_power_savings Currently the sched_mc/smt_power_savings variable is a boolean, which either enables or disables topology based power savings. This patch extends the behaviour of the variable from boolean to multivalued, such that based on the value, we decide how aggressively do we want to perform powersavings balance at appropriate sched domain based on topology. Variable levels of power saving tunable would benefit end user to match the required level of power savings vs performance trade-off depending on the system configuration and workloads. This version makes the sched_mc_power_savings global variable to take more values (0,1,2). Later versions can have a single tunable called sched_power_savings instead of sched_{mc,smt}_power_savings. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:46 +01:00
Darren Hart	b56863630d	futex: clean up futex_(un)lock_pi fault handling Impact: cleanup Some apparently left over cruft code was complicating the fault logic: Testing if uval != -EFAULT doesn't have any meaning, get_user() sets ret to either 0 or -EFAULT, there's no need to compare uval, especially not against EFAULT which it will never be. This patch removes the superfluous test and clarifies the comment blocks. Build and boot tested on an 8way x86_64 system. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:20:21 +01:00
Ingo Molnar	c71dd42db2	tracing: fix warnings in kernel/trace/trace_sched_switch.c these warnings: kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_register’: kernel/trace/trace_sched_switch.c:96: warning: passing argument 1 of ‘register_trace_sched_wakeup_new’ from incompatible pointer type kernel/trace/trace_sched_switch.c:112: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_unregister’: kernel/trace/trace_sched_switch.c:121: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type Trigger because sched_wakeup_new tracepoints need the same trace signature as sched_wakeup - which was changed recently. Fix it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 01:05:38 +01:00
Ingo Molnar	3bddb9a324	tracing: fix warning in kernel/trace/trace.c this warning: kernel/trace/trace.c: In function ‘print_lat_fmt’: kernel/trace/trace.c:1826: warning: unused variable ‘state’ Triggers because 'state' has become unused - remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 01:01:25 +01:00
Ingo Molnar	b2e3c0adec	hrtimers: fix warning in kernel/hrtimer.c this warning: kernel/hrtimer.c: In function ‘hrtimer_cpu_notify’: kernel/hrtimer.c:1574: warning: unused variable ‘dcpu’ is caused because 'dcpu' is only used in the CONFIG_HOTPLUG_CPU case. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 00:45:32 +01:00
Paul E. McKenney	64db4cfff9	"Tree RCU": scalable classic RCU implementation This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 21:56:04 +01:00
Ingo Molnar	d110ec3a1e	Merge branch 'linus' into core/rcu	2008-12-18 21:54:49 +01:00
KOSAKI Motohiro	74c8a61304	locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP Impact: simplify code commit "08678b0: generic: sparse irqs: use irq_desc() [...]" introduced the irq_desc_lock_class variable. But it is used only if CONFIG_SPARSE_IRQ=Y or CONFIG_TRACE_IRQFLAGS=Y. Otherwise, following warnings happen: CC kernel/irq/handle.o kernel/irq/handle.c:26: warning: 'irq_desc_lock_class' defined but not used Actually, current early_init_irq_lock_class has a bit strange and messy ifdef. In addition, it is not valueable. 1. this function is protected by !CONFIG_SPARSE_IRQ, but that is not necessary. if CONFIG_SPARSE_IRQ=Y, desc of all irq number are initialized by NULL at first - then this function calling is safe. 2. this function protected by CONFIG_TRACE_IRQFLAGS too. but it is not necessary either, because lockdep_set_class() doesn't have bad side effect even if CONFIG_TRACE_IRQFLAGS=n. This patch bloat kernel size a bit on CONFIG_TRACE_IRQFLAGS=n and CONFIG_SPARSE_IRQ=Y - but that's ok. early_init_irq_lock_class() is not a fastpatch at all. To avoid messy ifdefs is more important than a few bytes diet. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 14:35:53 +01:00
Ken Chen	9c2c48020e	schedstat: consolidate per-task cpu runtime stats Impact: simplify code When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time. These two stats are exactly the same. Given that task->se.sum_exec_runtime is always accumulated by the core scheduler, sched_info can reuse that data instead of duplicate the accounting. Signed-off-by: Ken Chen <kenchen@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 13:54:01 +01:00
Lai Jiangshan	6d102bc68f	tracing/ring-buffer: remove unused ring_buffer size Impact: remove dead code struct ring_buffer.size is not set after ring_buffer is initialized or resized. it is always 0. we can use "buffer->pages * PAGE_SIZE" to get ring_buffer's size Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 13:49:56 +01:00
Thomas Gleixner	3d9101e925	trace: fix task state printout Impact: fix occasionally incorrect trace output The tracing code has interesting varieties of printing out task state. Unfortunalely only one of the instances is correct as it copies the code from sched.c:sched_show_task(). The others are plain wrong as they treatthe bitfield as an integer offset into the character array. Also the size check of the character array is wrong as it includes the trailing \0. Use a common state decoder inline which does the Right Thing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 13:03:09 +01:00
Steven Rostedt	ea3a6d6d60	ftrace: add not to regex on filtering functions Impact: enhancement Ingo Molnar has asked about a way to remove items from the filter lists. Currently, you can only add or replace items. The way items are added to the list is through opening one of the list files (set_ftrace_filter or set_ftrace_notrace) via append. If the file is opened for truncate, the list is cleared. echo spin_lock > /debug/tracing/set_ftrace_filter The above will replace the list with only spin_lock echo spin_lock >> /debug/tracing/set_ftrace_filter The above will add spin_lock to the list. Now this patch adds: echo '!spin_lock' >> /debug/tracing/set_ftrace_filter This will remove spin_lock from the list. The limited glob features of these lists also can be notted. echo '!spin_' >> /debug/tracing/set_ftrace_filter This will remove all functions that start with 'spin_' Note: echo '!spin_' > /debug/tracing/set_ftrace_filter will simply clear out the list (notice the '>' instead of '>>') Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 12:57:09 +01:00
Steven Rostedt	e05a43b744	trace: better use of stack_trace_enabled for boot up code Impact: clean up Andrew Morton suggested to use the stack_tracer_enabled variable to decide whether or not to start stack tracing on bootup. This lets us remove the start_stack_trace variable. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 12:56:56 +01:00
Steven Rostedt	f38f1d2aa5	trace: add a way to enable or disable the stack tracer Impact: enhancement to stack tracer The stack tracer currently is either on when configured in or off when it is not. It can not be disabled when it is configured on. (besides disabling the function tracer that it uses) This patch adds a way to enable or disable the stack tracer at run time. It defaults off on bootup, but a kernel parameter 'stacktrace' has been added to enable it on bootup. A new sysctl has been added "kernel.stack_tracer_enabled" to let the user enable or disable the stack tracer at run time. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 12:56:24 +01:00
Darren Hart	73500ac545	futex: rename field in futex_q to clarify single waiter semantics Impact: simplify code I've tripped over the naming of this field a couple times. The futex_q uses a "waiters" list to represent a single blocked task and then calles wake_up_all(). This can lead to confusion in trying to understand the intent of the code, which is to have a single futex_q for every task waiting on a futex. This patch corrects the problem, using a single pointer to the waiting task, and an appropriate call to wake_up, rather than wake_up_all. Compile and boot tested on an 8way x86_64 machine. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 11:42:23 +01:00
Ingo Molnar	948a7b2b5e	Merge branch 'irq/sparseirq' into cpus4096 Conflicts: arch/x86/kernel/io_apic.c Merge irq/sparseirq here, to resolve conflicts.	2008-12-17 13:16:08 +01:00
Ingo Molnar	1f3f424a6b	Merge branch 'linus' into cpus4096	2008-12-17 13:07:48 +01:00
Frederic Weisbecker	66896a85cf	tracing/ftrace: add the printk-msg-only option Impact: display ftrace_printk messages "as is" By default, ftrace_printk() messages find their output with some other informations like pid, caller, ... Sometimes a developer just want to have the ftrace_printk left "as is", without other information. This is done by providing a default-off option called printk-msg-only. To enable it, just do `echo printk-msg-only > /debugfs/tracing/trace_options` Before the patch: <...>-2739 [000] 145.692153: __might_sleep: I'm an ftrace_printk msg in __might_sleep <...>-2739 [000] 145.692155: __might_sleep: I'm another ftrace_printk msg in __might_sleep After the patch and the printk-msg-only option enabled: I'm an ftrace_printk msg in __might_sleep I'm another ftrace_printk msg in __might_sleep Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-17 00:26:36 +01:00
Frederic Weisbecker	2c2d7329d8	tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp() Impact: prevent a trace recursion After some tests with function graph tracer under x86-32, I saw some recursions caused by ring_buffer_time_stamp() that calls preempt_enable_no_notrace() which calls preempt_schedule() which is traced itself. This patch re-enables preemption without rescheduling. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-17 00:26:35 +01:00
Yinghai Lu	48a1b10aff	x86, sparseirq: move irq_desc according to smp_affinity, v7 Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes if CONFIG_NUMA_MIGRATE_IRQ_DESC is set: - make irq_desc to go with affinity aka irq_desc moving etc - call move_irq_desc in irq_complete_move() - legacy irq_desc is not moved, because they are allocated via static array for logical apic mode, need to add move_desc_in_progress_in_same_domain, otherwise it will not be moved ==> also could need two phases to get irq_desc moved. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-17 00:14:01 +01:00
Paul E. McKenney	343e9099c8	rcu: fix rcutorture behavior during reboot Impact: fix very rare reboot hang Because rcutorture ignored all signals, it does not terminate in response to the signals sent at shutdown time. This can cause strange failures due to its continuing to make use of kernel function too late in the shutdown sequence. This patch therefore adds a shutdown notifier to rcutorture, causing it to shut down in response to a reboot or an orderly shutdown. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-17 00:04:40 +01:00
Arjan van de Ven	3ac52669c7	resources: skip sanity check of busy resources Impact: reduce false positives in iomem_map_sanity_check() Some drivers (vesafb) only map/reserve a portion of a resource. If then some other driver comes in and maps the whole resource, the current code WARN_ON's. This is not the intent of the checks in iomem_map_sanity_check(); rather these checks want to warn when crossing hardware resources only. This patch skips BUSY resources as suggested by Linus. Note: having two drivers talk to the same hardware at the same time is obviously not optimal behavior, but that's a separate story. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 23:30:49 +01:00
Bharata B Rao	80f40ee4a0	sched: use RCU variant of list traversal in for_each_leaf_rt_rq() Impact: fix potential of rare crash for_each_leaf_rt_rq() walks an RCU protected list (rq->leaf_rt_rq_list), but doesn't use list_for_each_entry_rcu(). Fix this. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 21:39:14 +01:00
Ken Chen	e9515c3c9f	sched, cpuacct: export percpu cpuacct cgroup stats This patch export per-cpu CPU cycle usage for a given cpuacct cgroup. There is a need for a user space monitor daemon to track group CPU usage on per-cpu base. It is also useful for monitoring CFS load balancer behavior by tracking per CPU group usage. Signed-off-by: Ken Chen <kenchen@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 12:15:01 +01:00
Ken Chen	720f54988e	sched, cpuacct: refactoring cpuusage_read / cpuusage_write Impact: micro-optimize the code on 64-bit architectures In the thread regarding to 'export percpu cpuacct cgroup stats' http://lkml.org/lkml/2008/12/7/13 akpm pointed out that current cpuacct code is inefficient. This patch refactoring the following: * make cpu_rq locking only on 32-bit * change iterator to each_present_cpu instead of each_possible_cpu to make it hotplug friendly. It's a bit of code churn, but I was rewarded with 160 byte code size saving on x86-64 arch and zero code size change on i386. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 12:15:00 +01:00
Ingo Molnar	9dfc3bc7d2	Merge branches 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/hw-branch-tracing' into tracing/core	2008-12-16 12:03:38 +01:00
Peter Zijlstra	34f28ecd0f	sched: optimize update_curr() Impact: micro-optimization Skip the hard work when there is none. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 09:46:33 +01:00
Mike Galbraith	03e89e4574	sched: fix wakeup preemption clock Impact: sharpen the wakeup-granularity to always be against current scheduler time It was possible to do the preemption check against an old time stamp. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 09:45:38 +01:00
Paul Menage	307257cf47	cgroups: fix a race between rmdir and remount When a cgroup is removed, it's unlinked from its parent's children list, but not actually freed until the last dentry on it is released (at which point cgrp->root->number_of_cgroups is decremented). Currently rebind_subsystems checks for the top cgroup's child list being empty in order to rebind subsystems into or out of a hierarchy - this can result in the set of subsystems bound to a hierarchy being removed-but-not-freed cgroup. The simplest fix for this is to forbid remounts that change the set of subsystems on a hierarchy that has removed-but-not-freed cgroups. This bug can be reproduced via: mkdir /mnt/cg mount -t cgroup -o ns,freezer cgroup /mnt/cg mkdir /mnt/cg/foo sleep 1h < /mnt/cg/foo & rmdir /mnt/cg/foo mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg kill $! Though the above will cause oops in -mm only but not mainline, but the bug can cause memory leak in mainline (and even oops) Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-15 16:27:07 -08:00
Linus Torvalds	ca7e716c78	Revert "sched_clock: prevent scd->clock from moving backwards" This reverts commit `5b7dba4ff8`, which caused a regression in hibernate, reported by and bisected by Fabio Comolli. This revert fixes http://bugzilla.kernel.org/show_bug.cgi?id=12155 http://bugzilla.kernel.org/show_bug.cgi?id=12149 Bisected-by: Fabio Comolli <fabio.comolli@gmail.com> Requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-14 16:23:17 -08:00
Rusty Russell	968ea6d80e	Merge ../linux-2.6-x86 Conflicts: arch/x86/kernel/io_apic.c kernel/sched.c kernel/sched_stats.h	2008-12-13 21:55:51 +10:30
Rusty Russell	320ab2b0b1	cpumask: convert struct clock_event_device to cpumask pointers. Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>	2008-12-13 21:20:26 +10:30
Rusty Russell	0de26520c7	cpumask: make irq_set_affinity() take a const struct cpumask Impact: change existing irq_chip API Not much point with gentle transition here: the struct irq_chip's setaffinity method signature needs to change. Fortunately, not widely used code, but hits a few architectures. Note: In irq_select_affinity() I save a temporary in by mangling irq_desc[irq].affinity directly. Ingo, does this break anything? (Folded in fix from KOSAKI Motohiro) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: jeremy@xensource.com Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>	2008-12-13 21:20:26 +10:30
Rusty Russell	29c0177e6a	cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: mingo@redhat.com Cc: tony.luck@intel.com Cc: ralf@linux-mips.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: cl@linux-foundation.org Cc: srostedt@redhat.com	2008-12-13 21:20:25 +10:30
Rusty Russell	98a79d6a50	cpumask: centralize cpu_online_map and cpu_possible_map Impact: cleanup Each SMP arch defines these themselves. Move them to a central location. Twists: 1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a CONFIG_INIT_ALL_POSSIBLE for this rather than break them. 2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'. Those archs simply have phys_cpu_present_map replaced everywhere. 3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky so I just manipulate them both in sync. 4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map' declarations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Travis <travis@sgi.com> Cc: ink@jurassic.park.msu.ru Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: tony.luck@intel.com Cc: takata@linux-m32r.org Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: paulus@samba.org Cc: schwidefsky@de.ibm.com Cc: lethal@linux-sh.org Cc: wli@holomorphy.com Cc: davem@davemloft.net Cc: jdike@addtoit.com Cc: mingo@redhat.com	2008-12-13 21:19:41 +10:30
Oleg Nesterov	899921025b	posix-timers: check ->it_signal instead of ->it_pid to validate the timer Impact: clean up, speed up ->it_pid (was ->it_process) has also a special meaning: if it is NULL, the timer is under deletion or it wasn't initialized yet. We can check ->it_signal != NULL instead, this way we can - simplify sys_timer_create() a bit - remove yet another check from lock_timer() - move put_pid(->it_pid) into release_posix_timer() which runs outside of ->it_lock Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 17:00:34 +01:00
Oleg Nesterov	27af4245b6	posix-timers: use "struct pid" instead of "struct task_struct" Impact: restructure, clean up code k_itimer holds the ref to the ->it_process until sys_timer_delete(). This allows to pin up to RLIMIT_SIGPENDING dead task_struct's. Change the code to use "struct pid *" instead. The patch doesn't kill ->it_process, it places ->it_pid into the union. ->it_process is still used by do_cpu_nanosleep() as before. It would be trivial to change the nanosleep code as well, but since it uses it_process in a special way I think it is better to keep this field for grep. The patch bloats the kernel by 104 bytes and it also adds the new pointer, ->it_signal, to k_itimer. It is used by lock_timer() to verify that the found timer was not created by another process. It is not clear why do we use the global database (and thus the global idr_lock) for posix timers. We still need the signal_struct->posix_timers which contains all useable timers, perhaps it is better to use some form of per-process array instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 17:00:07 +01:00
Woodruff, Richard	001474491f	nohz: suppress needless timer reprogramming In my device I get many interrupts from a high speed USB device in a very short period of time. The system spends a lot of time reprogramming the hardware timer which is in a slower timing domain as compared to the CPU. This results in the CPU spending a huge amount of time waiting for the timer posting to be done. All of this reprogramming is useless as the wake up time has not changed. As measured using ETM trace this drops my reprogramming penalty from almost 60% CPU load down to 15% during high interrupt rate. I can send traces to show this. Suppress setting of duplicate timer event when timer already stopped. Timer programming can be very costly and can result in long cpu stall/wait times. [akpm@linux-foundation.org: coding-style fixes] [tglx@linutronix.de: move the check to the right place and avoid raising the softirq for nothing] Signed-off-by: Richard Woodruff <r-woodruff2@ti.com> Cc: johnstul@us.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 16:55:31 +01:00
Ingo Molnar	8299608f14	Merge branches 'irq/sparseirq', 'x86/quirks' and 'x86/reboot' into cpus4096 We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the cpus4096 tree because the io-apic changes in the sparseirq change conflict with the cpumask changes in the cpumask tree, and we want to resolve those.	2008-12-12 13:49:24 +01:00
Ingo Molnar	45ab6b0c76	Merge branch 'sched/core' into cpus4096 Conflicts: include/linux/ftrace.h kernel/sched.c	2008-12-12 13:48:57 +01:00
Heiko Carstens	d65bd5ecb2	sched: add missing arch_update_cpu_topology() call arch_reinit_sched_domains() used to call arch_update_cpu_topology() via arch_init_sched_domains(). This call got lost with `e761b77252` ("cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)". So we might end up with outdated and missing cpus in the cpu core maps (architecture used to call arch_reinit_sched_domains if cpu topology changed). This adds a call to arch_update_cpu_topology in partition_sched_domains which gets called whenever scheduling domains get updated. Which is what is supposed to happen when cpu topology changes. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 13:47:22 +01:00
Heiko Carstens	ee79d1bdb6	sched: let arch_update_cpu_topology indicate if topology changed Change arch_update_cpu_topology so it returns 1 if the cpu topology changed and 0 if it didn't change. This will be useful for the next patch which adds a call to this function in partition_sched_domains. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 13:47:21 +01:00
Ingo Molnar	81444a7995	Merge branch 'tracing/fastboot' into cpus4096	2008-12-12 12:43:05 +01:00
Peter Zijlstra	cbc34ed1ac	sched: fix tracepoints in scheduler The trace point only caught one of many places where a task changes cpu, put it in the right place to we get all of them. Change the signature while we're at it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 12:08:26 +01:00
Frederic Weisbecker	f8b755ac8e	tracing/function-graph-tracer: Output arrows signal on hardirq call/return Impact: make more obvious the hardirq calls in the output When a hardirq is triggered inside the codeflow on output, we have now two arrows that indicate the entry and return of the hardirq. 0) \| bit_waitqueue() { 0) 0.880 us \| __phys_addr(); 0) 2.699 us \| } 0) \| __wake_up_bit() { 0) ==========> \| smp_apic_timer_interrupt() { 0) 0.797 us \| native_apic_mem_write(); 0) 0.715 us \| exit_idle(); 0) \| irq_enter() { 0) 0.722 us \| idle_cpu(); 0) 5.519 us \| } 0) \| hrtimer_interrupt() { 0) \| ktime_get() { 0) \| ktime_get_ts() { 0) 0.805 us \| getnstimeofday(); [...] 0) ! 108.528 us \| } 0) \| irq_exit() { 0) \| do_softirq() { 0) \| __do_softirq() { 0) 0.895 us \| __local_bh_disable(); 0) \| run_timer_softirq() { 0) 0.827 us \| hrtimer_run_pending(); 0) 1.226 us \| _spin_lock_irq(); 0) \| _spin_unlock_irq() { 0) 6.550 us \| } 0) 0.924 us \| _local_bh_enable(); 0) + 12.129 us \| } 0) + 13.911 us \| } 0) 0.707 us \| idle_cpu(); 0) + 17.009 us \| } 0) ! 137.419 us \| } 0) <========== \| 0) 1.045 us \| } 0) ! 148.908 us \| } 0) ! 151.022 us \| } 0) ! 153.022 us \| } 0) 0.963 us \| journal_mark_dirty(); 0) 0.925 us \| __brelse(); Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 11:14:09 +01:00
Ingo Molnar	c1dfdc7597	Merge commit 'v2.6.28-rc8' into sched/core	2008-12-12 10:29:35 +01:00
Markus Metzger	a93751cab7	x86, bts, ftrace: adapt the hw-branch-tracer to the ds.c interface Impact: restructure code, cleanup Remove BTS bits from the hw-branch-tracer (renamed from bts-tracer) and use the ds interface. Signed-off-by: Markus Metzger <markut.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 08:08:14 +01:00
Heiko Carstens	fa116ea35e	nohz: no softirq pending warnings for offline cpus Impact: remove false positive warning After a cpu was taken down during cpu hotplug (read: disabled for interrupts) it still might have pending softirqs. However take_cpu_down makes sure that the idle task will run next instead of ksoftirqd on the taken down cpu. The idle task will call tick_nohz_stop_sched_tick which might warn about pending softirqs just before the cpu kills itself completely. However the pending softirqs on the dead cpu aren't a problem because they will be moved to an online cpu during CPU_DEAD handling. So make sure we warn only for online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 07:27:01 +01:00
Robert Richter	c4f50183f9	ring_buffer: adding EXPORT_SYMBOLs I added EXPORT_SYMBOL_GPLs for all functions part of the API (ring_buffer.h). This is required since oprofile is using the ring buffer and the compilation as modules would fail otherwise. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 06:54:55 +01:00
Linus Torvalds	f9fc05e762	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: CPU remove deadlock fix	2008-12-10 14:41:06 -08:00
Hugh Dickins	b88ed20594	fix mapping_writably_mapped() Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped test in 2.6.7! Bad bad bug, good good find. The i_mmap_writable count must be incremented for VM_SHARED (just as i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock) when dup_mmap() copies the vma for fork: it has its own more optimal version of __vma_link_file(), and I missed this out. So the count was later going down to 0 (dangerous) when one end unmapped, then wrapping negative (inefficient) when the other end unmapped. The only impact on x86 would have been that setting a mandatory lock on a file which has at some time been opened O_RDWR and mapped MAP_SHARED (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN when it should succeed, or succeed when it should fail. But those architectures which rely on flush_dcache_page() to flush userspace modifications back into the page before the kernel reads it, may in some cases have skipped the flush after such a fork - though any repetitive test will soon wrap the count negative, in which case it will flush_dcache_page() unnecessarily. Fix would be a two-liner, but mapping variable added, and comment moved. Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 14:40:45 -08:00
Hugh Dickins	9c24624727	KSYM_SYMBOL_LEN fixes Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my `966c8c12dc` sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:54 -08:00
Tom Zanussi	fbb5b7ae4b	relayfs: fix infinite loop with splice() Running kmemtraced, which uses splice() on relayfs, causes a hard lock on x86-64 SMP. As described by Tom Zanussi: It looks like you hit the same problem as described here: commit `8191ecd1d1` splice: fix infinite loop in generic_file_splice_read() relay uses the same loop but it never got noticed or fixed. Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Tested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:52 -08:00
Robert Richter	e2ac8ef576	ftrace: remove unused function arg in trace_iterator_increment() This removes the unused cpu function parameter. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Robert Richter <robert.richter@amd.com>	2008-12-10 14:20:12 +01:00
Robert Richter	68814b58c5	ring_buffer: update description for ring_buffer_alloc() Trivial patch. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Robert Richter <robert.richter@amd.com>	2008-12-10 14:20:11 +01:00
Brian King	9a2bd244e1	sched: CPU remove deadlock fix Impact: fix possible deadlock in CPU hot-remove path This patch fixes a possible deadlock scenario in the CPU remove path. migration_call grabs rq->lock, then wakes up everything on rq->migration_queue with the lock held. Then one of the tasks on the migration queue ends up calling tg_shares_up which then also tries to acquire the same rq->lock. [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0 [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4 [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0 [c000000058eab640] c00000000007ada4 .complete+0x54/0x80 [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8 [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0 [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4 [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108 [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8 [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50 [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114 [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40 Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-09 19:27:03 +01:00
Al Viro	48887e63d6	[PATCH] fix broken timestamps in AVC generated by kernel threads Timestamp in audit_context is valid only if ->in_syscall is set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-09 02:27:41 -05:00
Randy Dunlap	7f0ed77d24	[patch 1/1] audit: remove excess kernel-doc Delete excess kernel-doc notation in kernel/auditsc.c: Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry' Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-09 02:27:40 -05:00
Al Viro	a64e64944f	[PATCH] return records for fork() both to child and parent Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-09 02:27:38 -05:00
Eric Paris	a3f07114e3	[PATCH] Audit: make audit=0 actually turn off audit Currently audit=0 on the kernel command line does absolutely nothing. Audit always loads and always uses its resources such as creating the kernel netlink socket. This patch causes audit=0 to actually disable audit. Audit will use no resources and starting the userspace auditd daemon will not cause the kernel audit system to activate. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-09 02:27:37 -05:00
Serge E. Hallyn	94d6a5f734	user namespaces: document CFS behavior Documented the currently bogus state of support for CFS user groups with user namespaces. In particular, all users in a user namespace should be children of the user which created the user namespace. This is yet to be implemented. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-12-09 09:25:53 +11:00
Peter Zijlstra	a0a99b227d	hrtimer: removing all ur callback modes, fix > Ingo, this addition fixes the hotplug issue on my machine And because we're all human... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 17:20:38 +01:00
Ingo Molnar	e726f5f91e	tracing/function-graph-tracer: fix 'flags' variable mismatch this warning: kernel/trace/trace.c: In function ‘trace_vprintk’: kernel/trace/trace.c:3626: warning: ‘flags’ may be used uninitialized in this function shows some confusion about irq_flags / flags use here. We already have irq_flags so remove the extra flags variable. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 16:55:53 +01:00
Vaidyanathan Srinivasan	efbe027e95	sched: idle_balance() does not call load_balance_newidle() Impact: fix SD_BALANCE_NEWIDLEand broaden its use load_balance_newidle() does not get called if SD_BALANCE_NEWIDLE is set at higher level domain (3-CPU) and not in low level domain (2-MC). pulled_task is initialised to -1 and checked for non-zero which is always true if the lowest level sched_domain does not have SD_BALANCE_NEWIDLE flag set. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 16:48:00 +01:00
Frederic Weisbecker	380c4b1411	tracing/function-graph-tracer: append the tracing_graph_flag Impact: Provide a way to pause the function graph tracer As suggested by Steven Rostedt, the previous patch that prevented from spinlock function tracing shouldn't use the raw_spinlock to fix it. It's much better to follow lockdep with normal spinlock, so this patch adds a new flag for each task to make the function graph tracer able to be paused. We also can send an ftrace_printk whithout worrying of the irrelevant traced spinlock during insertion. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 15:11:45 +01:00
Frederic Weisbecker	8e1b82e086	tracing/function-graph-tracer: turn tracing_selftest_running into an int Impact: cleanup Apply some suggestions of Steven Rostedt: _turn tracing_selftest_running into a simple int (no need of an atomic_t) _set it __read_mostly _fix a comment style Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 15:11:44 +01:00
Frederic Weisbecker	8b96f01198	tracing/function-graph-tracer: introduce __notrace_funcgraph to filter special functions Impact: trace more functions When the function graph tracer is configured, three more files are not traced to prevent only four functions to be traced. And this impacts the normal function tracer too. arch/x86/kernel/process_64/32.c: I had crashes when I let this file traced. After some debugging, I saw that the "current" task point was changed inside__swtich_to(), ie: "write_pda(pcurrent, next_p);" inside process_64.c Since the tracer store the original return address of the function inside current, we had crashes. Only __switch_to() has to be excluded from tracing. kernel/module.c and kernel/extable.c: Because of a function used internally by the function graph tracer: __kernel_text_address() To let the other functions inside these files to be traced, this patch introduces the __notrace_funcgraph function prefix which is __notrace if function graph tracer is configured and nothing if not. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 15:11:44 +01:00
Yinghai Lu	99d093d128	x86: use NR_IRQS_LEGACY Impact: cleanup Introduce NR_IRQS_LEGACY instead of hard coded number. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 14:31:52 +01:00
Yinghai Lu	0b8f1efad3	sparse irq_desc[] array: core kernel and x86 changes Impact: new feature Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with NR_CPUS set to large values. The goal is to be able to scale up to much larger NR_IRQS value without impacting the (important) common case. To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of irq_desc pointers. When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc, this also makes the IRQ descriptors NUMA-local (to the site that calls request_irq()). This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now uses desc->chip_data for x86 to store irq_cfg. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 14:31:51 +01:00
Ken Chen	5436499e60	sched: fix sd_parent_degenerate on non-numa smp machine Impact: optimize the sched domains tree some more The addition of SD_SERIALIZE flag added to SD_NODE_INIT prevented top level dummy numa sched_domain to be properly degenerated on non-numa smp machine. The reason is that in sd_parent_degenerate(), it found that the child and parent does not have comon sched_domain flags due to SD_SERIALIZE. However, for non-numa smp box, the top level is a dummy with a single sched_group. Filter out SD_SERIALIZE if it is on non-numa machine to properly degenerate top level node sched_domain. this will cut back some of the sd domain walk in the load balancer code. Signed-off-by: Ken Chen <kenchen@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 13:52:44 +01:00
Ingo Molnar	4d117c5c6b	Merge branch 'sched/urgent' into sched/core	2008-12-08 13:52:00 +01:00
Frederic Weisbecker	decbec3838	tracing/function-graph-tracer: implement a print_headers function Impact: provide trace headers to explain a bit the output This patch implements the print_headers callback for the function graph tracer. These headers are output according to the current trace options. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 13:24:52 +01:00
Serge E. Hallyn	7657d90497	user namespaces: require cap_set{ug}id for CLONE_NEWUSER While ideally CLONE_NEWUSER will eventually require no privilege, the required permission checks are currently not there. As a result, CLONE_NEWUSER has the same effect as a setuid(0)+setgroups(1,"0"). While we already require CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems appropriate. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-12-08 09:16:27 +11:00
Serge E. Hallyn	c37bbb0fdc	user namespaces: let user_ns be cloned with fairsched (These two patches are in the next-unacked branch of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6. If they get some ACKs, then I hope to feed this into security-next. After these two, I think we're ready to tackle userns+capabilities) Fairsched creates a per-uid directory under /sys/kernel/uids/. So when you clone(CLONE_NEWUSER), it tries to create /sys/kernel/uids/0, which already exists, and you get back -ENOMEM. This was supposed to be fixed by sysfs tagging, but that was postponed (ok, rejected until sysfs locking is fixed). So, just as with network namespaces, we just don't create those directories for user namespaces other than the init. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-12-08 09:16:22 +11:00
Steven Rostedt	21bbecdaae	ftrace: use init_struct_pid as swapper pid Impact: clean up Using (struct pid *)-1 as the pointer for ftrace_swapper_pid is a little confusing for others. This patch uses the address of the actual init pid structure instead. This change is only for clarity. It does not affect the code itself. Hopefully soon the swapper tasks will all have their own pid structure and then we can clean up the code a bit more. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-05 14:51:29 +01:00
Frederic Weisbecker	21a8c466f9	tracing/ftrace: provide the macro task_curr_ret_stack() Impact: cleanup As suggested by Steven Rostedt, this patch provide a new macro task_curr_ret_stack() to move the cpp conditionnal CONFIG into the linux/ftrace.h headers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-05 14:47:44 +01:00
Frederic Weisbecker	77d683f3e0	tracing/ftrace: fix the check of ftrace_trace_task Impact: fix default empty traces on function-graph-tracer The actual ftrace_trace_task() checks if ftrace_pid_trace is allocated and return 1 if it is true. If it is NULL, it will check the bit of pid tracing flag for the current task (which are not set by default). So by default, a task is not traced. Actually all tasks should be traced by default and filter_by_pid when ftrace_pid_trace is allocated. The appropriate condition should be to return 1 if filter_by_pid is set. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acke-dby: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-05 14:47:43 +01:00
Frederic Weisbecker	ff32504fdc	tracing/ftrace: don't insert TRACE_PRINT during selftests Impact: fix tracer selfstests false results After setting a ftrace_printk somewhere in th kernel, I saw the Function tracer selftest failing. When a selftest occurs, the ring buffer is lurked to see if some entries were inserted. But concurrent insertion such as ftrace_printk could occured at the same time and could give false positive or negative results. This patch prevent prevent from TRACE_PRINT entries insertion during selftests. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-05 14:47:43 +01:00
Ingo Molnar	970987beb9	Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core	2008-12-05 14:45:22 +01:00
Linus Torvalds	bbeba4c35c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: [PATCH] fix bogus argument of blkdev_put() in pktcdvd [PATCH 2/2] documnt FMODE_ constants [PATCH 1/2] kill FMODE_NDELAY_NOW [PATCH] clean up blkdev_get a little bit [PATCH] Fix block dev compat ioctl handling [PATCH] kill obsolete temporary comment in swsusp_close()	2008-12-04 21:45:44 -08:00
Linus Torvalds	4857339d7c	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: time: catch xtime_nsec underflows and fix them posix-cpu-timers: fix clock_gettime with CLOCK_PROCESS_CPUTIME_ID	2008-12-04 21:40:29 -08:00
Linus Torvalds	3b666ce6a2	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0 documentation: local_ops fix on_each_cpu	2008-12-04 21:39:41 -08:00
Lachlan McIlroy	14d676f56f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-12-05 15:27:43 +11:00
David S. Miller	0871420fad	sparc64: Add tsb-ratio sysctl. Add a sysctl to tweak the RSS limit used to decide when to grow the TSB for an address space. In order to avoid expensive divides and multiplies only simply positive and negative powers of two are supported. The function computed takes the number of TSB translations that will fit at one time in the TSB of a given size, and either adds or subtracts a percentage of entries. This final value is the RSS limit. See tsb_size_to_rss_limit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-04 09:16:59 -08:00
Peter Zijlstra	37810659ea	hrtimer: removing all ur callback modes, fix hotplug Impact: fix hrtimer locking (reported by lockdep) in the CPU hotplug case This addition fixes the hotplug locking issue on my machine Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-04 11:31:25 +01:00

... 2 3 4 5 6 ...

5851 Commits